Assignment 10, Ling 645/CMSC 723, Fall 1997


10.1  Using Hidden Markov Models for part-of-speech tagging

Re-read Allen, Sec 7.3 from p. 195 to p. 201, but don't worry about
the Viterbi algorithm; we're covering only the "brute force algorithm"
Allen refers to on p. 201, which is to compute ALL POSSIBLE
part-of-speech category sequences and compare their probabilities
given the observed words.  AN IMPORTANT NOTE TO AVOID CONFUSION: Allen
uses C to denote part of speech sequences and W for words, but in
class I used W to denote the part of speech sequences and the letter O
for the sequence of words.  Make sure you're aware of this notational
difference when you're comparing your class notes to the book.

    (a)  Given the table of frequencies in Allen, Figure 7.5 (p. 199),
	 what would the estimated probability of "flies like flowers"
	 be according to a unigram model?  Use maximum likelihood
	 estimates to estimate the probabilities from frequencies.
	 (Don't just give a number; show how you got it.) [10 pts]

    (b)  Using the probability estimates in the rightmost column of
	 Allen's Figure 7.4, what is the estimated probability of 
	 "N V N" according to a bigram model?  (Again, show why.)
	 Notice (p. 197) that Allen uses a special symbol, a 0 with 
	 a slash through it, as a special "start" symbol, in just the way
	 we used '#' in Problem 9.1(b) of your previous assignment. [10 pts]

    (c)  Derive an expression for Pr("N V N" | "flies like flowers")
	 using Bayes' rule.  Give the corresponding expression
	 for Pr("N N V" | "flies like flowers").  Using the numbers
	 in Figures 7.5 and 7.6, show which of these two probabilities
	 is higher.  [20 pts]

         NOTE: Figure 7.6 is incomplete!  Use .059 as the value of 
         Pr(flowers | N), and use .07 as the value of Pr(flowers | V). 

    (d)  Draw a diagram identifying the two components of the noisy
         channel model for part of speech tagging, showing the part of
	 speech sequence "N V N" and the sentence "flies like flowers"
	 in the appropriate places.  Next to each component, give an
	 expression for the probability computed by that component.
         [5 pts]

10.2  [45 pts] Read Church and Hanks, "Word Association Norms, Mutual
      Information, and Lexicography".  If you have not already done
      so, I recommend that you download the statistical
      software described in Problem 9.3 of the previous assignment,
      execute Run as described there, and have a look at "infile.mi".
      [Note that I will use the terms "mutual information" and "the
      association ratio" more or less interchangeably.]

      (a) The association ratio can be expressed as log(A/B).
          Explain what A and B are, and why A > B would indicate 
          that two words are associated.  Give an example.

      (b) Explain how mutual information might be used by the following 
	  people, including both WHAT they might do and WHY mutual
	  information might be useful or problematic for the task.
	  Support any arguments you make by reference to the paper, to
	  material covered in class, and/or by reference to your
	  experience using the statistical software.

	(i)   A lexicographer writing the "Dictionary of English as Used on 
	      the World Wide Web".

	(ii)  A technical writer at Sun Microsystems (a major computer 
	      manufacturer) writing the glossary to be used with a new
	      set of software documentation (length approximately 200,000
	      words).

	(iii) A publishing house that needs to quickly create a back-of-the
	      book index for a new psychology textbook.

	(iv)  A computational linguist creating a lexicon and grammar 
	      for English following the approach James Allen uses in
	      his textbook.

       The grading for this problem will focus on what you write for
       part (b), including such factors as correctness, the logic of the
       points you make, the quality of writing, etc.
Return to the course home page.