Note: Any solution that mentions dice, balls, urns, or rain is not entertaining or interesting, but we didn't take off points for that! A good example (adapted from a student's paper): Let X = renting an apartment, Let Y = being a graduate student. P(X|Y) is very high, since we can't afford to buy a house and a cardboard box on the sidewalk wouldn't protect our textbooks. P(Y|X) is fairly low overall, since many people rent apartments but not many people in the general population are graduate students.
Bayes's Theorem: P(B|A) = P(A|B) * P(B) / P(A) Proof: (1) P(A|B) = P(A,B) / P(B) [Def. of conditional probability] (2) P(B|A) = P(A,B) / P(A) [Def. of conditional probability] (3) implies P(A,B) = P(B|A) * P(A) [Multiplied both sides of (2) by P(A)] (4) Therefore [Using (3) to replace P(A,B) in (1)] P(B|A) = [P(B|A) * P(A)] / P(B).
A maximum likelihood estimate (MLE) of probability is computed by taking the raw count and dividing by the total, so for a general N-gram model the relevant maximum likelihood estimates are: (1) P(w[n-N+1],...,w[n-1],n) = Count(w[n-N+1],...,w[n-1],w[n])/Total (2) P(w[n-N+1],...,w[n-1]) = Count(w[n-N+1],...,w[n-1])/Total What we want is the conditional probability P(w[n-N+1],...,w[n-1],n) (3) P(w[n] | w[n-N+1],...,w[n-1]) = ------------------------- P(w[n-N+1],...,w[n-1]) Substituting (1) and (2) into the numerator and denominator of (3), the Total in the numerator and denominator cancel out, so we get the general formula for the MLE of probability in an N-gram model: Count(w[n-N+1],...,w[n-1],w[n]) (4) P(w[n] | w[n-N+1],...,w[n-1]) = --------------------------------- Count(w[n-N+1],...,w[n-1]) Note that Jurafsky and Martin give you (4), so you didn't actually need to go through steps (1)-(3) by yourself! They're included here so you can fully understand what's going on. Given (4), you need only show how this general formula applies for a trigram model, where N = 3: Count(w[n-2],w[n-1],w[n]) P(w[n] | w[n-2],...,w[n-1]) = --------------------------------- Count(w[n-2],w[n-1]) Note that throughout I have used the notation w[i] to indicate that w has subscript i.