Assignment 10, Ling 645/CMSC 723, Fall 1997

# Assignment 10, Ling 645/CMSC 723, Fall 1997

10.1  Using Hidden Markov Models for part-of-speech tagging

Re-read Allen, Sec 7.3 from p. 195 to p. 201, but don't worry about
the Viterbi algorithm; we're covering only the "brute force algorithm"
Allen refers to on p. 201, which is to compute ALL POSSIBLE
part-of-speech category sequences and compare their probabilities
given the observed words.  AN IMPORTANT NOTE TO AVOID CONFUSION: Allen
uses C to denote part of speech sequences and W for words, but in
class I used W to denote the part of speech sequences and the letter O
for the sequence of words.  Make sure you're aware of this notational
difference when you're comparing your class notes to the book.

(a)  Given the table of frequencies in Allen, Figure 7.5 (p. 199),
what would the estimated probability of "flies like flowers"
be according to a unigram model?  Use maximum likelihood
estimates to estimate the probabilities from frequencies.
(Don't just give a number; show how you got it.) [10 pts]

(b)  Using the probability estimates in the rightmost column of
Allen's Figure 7.4, what is the estimated probability of
"N V N" according to a bigram model?  (Again, show why.)
Notice (p. 197) that Allen uses a special symbol, a 0 with
a slash through it, as a special "start" symbol, in just the way
we used '#' in Problem 9.1(b) of your previous assignment. [10 pts]

(c)  Derive an expression for Pr("N V N" | "flies like flowers")
using Bayes' rule.  Give the corresponding expression
for Pr("N N V" | "flies like flowers").  Using the numbers
in Figures 7.5 and 7.6, show which of these two probabilities
is higher.  [20 pts]

NOTE: Figure 7.6 is incomplete!  Use .059 as the value of
Pr(flowers | N), and use .07 as the value of Pr(flowers | V).

(d)  Draw a diagram identifying the two components of the noisy
channel model for part of speech tagging, showing the part of
speech sequence "N V N" and the sentence "flies like flowers"
in the appropriate places.  Next to each component, give an
expression for the probability computed by that component.
[5 pts]

10.2  [45 pts] Read Church and Hanks, "Word Association Norms, Mutual
Information, and Lexicography".  If you have not already done
software described in Problem 9.3 of the previous assignment,
execute Run as described there, and have a look at "infile.mi".
[Note that I will use the terms "mutual information" and "the
association ratio" more or less interchangeably.]

(a) The association ratio can be expressed as log(A/B).
Explain what A and B are, and why A > B would indicate
that two words are associated.  Give an example.

(b) Explain how mutual information might be used by the following
people, including both WHAT they might do and WHY mutual
information might be useful or problematic for the task.
Support any arguments you make by reference to the paper, to
material covered in class, and/or by reference to your
experience using the statistical software.

(i)   A lexicographer writing the "Dictionary of English as Used on
the World Wide Web".

(ii)  A technical writer at Sun Microsystems (a major computer
manufacturer) writing the glossary to be used with a new
set of software documentation (length approximately 200,000
words).

(iii) A publishing house that needs to quickly create a back-of-the
book index for a new psychology textbook.

(iv)  A computational linguist creating a lexicon and grammar
for English following the approach James Allen uses in
his textbook.

The grading for this problem will focus on what you write for
part (b), including such factors as correctness, the logic of the
points you make, the quality of writing, etc.