This is the schedule of topics for Computational Linguistics II, Spring 2010.
Unless otherwise specified, readings are from Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in.
THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the class mailing list or email me for "official" dates.
Class  Topic  Readings*  Assignments  Other

Jan 27  Course administrivia, semester plan; some statistical NLP fundamentals 
Ch 1, 2.1.[1-9] (for review). Historical overview; probability spaces; finite-state and Markov models; Bayes' Rule; Bayesian updating; conjugate priors 
Assignment 1  Language Log (the linguistics blog), Hal Daume's NLP blog (excellent blog, often technical machine learning stuff, but just as often more general interest) 
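For the Bayesian updating and conjugate priors listed above, here is a minimal illustrative sketch (invented numbers, not course code) of Beta-Binomial conjugate updating:

```python
# Sketch: Beta-Binomial conjugate updating for a coin's heads
# probability. All numbers are illustrative.

def beta_update(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing coin flips.

    With a Beta(alpha, beta) prior on p and a binomial likelihood,
    the posterior is Beta(alpha + heads, beta + tails): conjugacy
    means updating is just adding counts.
    """
    return alpha + heads, beta + tails

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Start from a uniform prior Beta(1, 1); observe 7 heads, 3 tails.
a, b = beta_update(1, 1, heads=7, tails=3)
print(beta_mean(a, b))  # posterior mean = 8/12 ≈ 0.667
```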
Feb 3  Words and lexical association 
Ch 5. Zipf's law; collocations; hypothesis testing; mutual information 
Assignment 2  Dunning (1993); Goodman S (1999), "Toward evidence-based medical statistics. 1: The P value fallacy," Ann Intern Med 130 (12): 995–1004, PMID 10383371; Goodman S (1999), "Toward evidence-based medical statistics. 2: The Bayes factor," Ann Intern Med 130 (12): 1005–13, PMID 10383350; Kilgarriff (2005); Gries (2005); Bland and Altman (1995) 
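One of the Ch 5 association measures, pointwise mutual information, can be sketched in a few lines (the bigram counts below are invented for illustration):

```python
import math

# Sketch: pointwise mutual information (PMI) for a bigram, using
# maximum-likelihood estimates from invented counts.

def pmi(count_xy, count_x, count_y, n_bigrams):
    """PMI(x, y) = log2( P(x, y) / (P(x) P(y)) )."""
    p_xy = count_xy / n_bigrams
    p_x = count_x / n_bigrams
    p_y = count_y / n_bigrams
    return math.log2(p_xy / (p_x * p_y))

# A collocation-like pair co-occurring far more often than chance:
print(pmi(count_xy=30, count_x=100, count_y=150, n_bigrams=100000))
```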
Feb 10  Snow day! 

Feb 17  Information theory 
Ch 2.2, Ch 6 Information theory essentials; entropy, relative entropy, mutual information; noisy channel model; cross entropy and perplexity 
Assignment 3  
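The entropy and perplexity definitions from Ch 2.2 / Ch 6 can be sketched directly (illustrative only; the distribution is invented):

```python
import math

# Sketch: entropy (in bits) and perplexity of a discrete
# distribution, matching the standard definitions.

def entropy(probs):
    """H(p) = -sum_x p(x) log2 p(x)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def perplexity(probs):
    """Perplexity = 2 ** H(p)."""
    return 2 ** entropy(probs)

uniform = [0.25] * 4
print(entropy(uniform))     # 2.0 bits
print(perplexity(uniform))  # 4.0
```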
Feb 24  Maximum likelihood estimation and Expectation Maximization 
Skim Ch 9-10, Chapter 6 of Lin and Dyer (forthcoming). Maximum likelihood estimation overview; quick review of smoothing; HMM review; deriving the forward-backward algorithm as an instance of EM; Viterbi algorithm. 
An empirical study of smoothing techniques for language modeling (Stanley Chen and Joshua Goodman, Technical report TR-10-98, Harvard University, August 1998); revised Chapter 4 from the updated Jurafsky and Martin textbook. 
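The Viterbi algorithm covered in this class can be sketched as follows (toy HMM with invented parameters; the forward algorithm replaces the max with a sum):

```python
# Sketch: Viterbi decoding for an HMM. The toy weather/activity
# model below is invented for illustration.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    # best[t][s]: probability of the best path ending in s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (best[t - 1][r] * trans_p[r][s] * emit_p[s][obs[t]], r)
                for r in states)
            best[t][s] = prob
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ["Rainy", "Sunny"]
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(["walk", "shop", "clean"], states, start, trans, emit))
# ['Sunny', 'Rainy', 'Rainy']
```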

March 3  Probabilistic grammars and parsing 
Ch 11-12, Abney (1996). Parsing as inference; distinction between logic and control; memoization and dynamic programming; review of CKY. Defining PCFGs; PCKY (inside probabilities); Viterbi CKY; revisiting EM: the inside-outside algorithm. CFG extensions (grandparent/parent nodes, lexicalization); syntactic dependency trees. 
Assignment 4  Jason Eisner's great parsing song; Pereira (2000); Detlef Prescher, A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars; McClosky, Charniak, and Johnson (2006), Effective Self-Training for Parsing 
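CKY recognition, the starting point for the probabilistic version above, can be sketched as follows (toy CNF grammar, invented; PCKY replaces set union with max or sum over rule probabilities):

```python
from itertools import product

# Sketch: CKY recognition for a grammar in Chomsky Normal Form.
# Grammar and sentence are toy examples.

def cky_recognize(words, unary, binary, start="S"):
    """True iff the grammar derives the word sequence.

    unary:  dict terminal -> set of nonterminals (A -> w)
    binary: dict (B, C)   -> set of nonterminals (A -> B C)
    """
    n = len(words)
    # chart[i][j]: nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(unary.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary.get((b, c), set())
    return start in chart[0][n]

unary = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
binary = {("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
print(cky_recognize(["she", "eats", "fish"], unary, binary))  # True
```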
Mar 10  Advanced topic: parsing with latent variables [Guest presenter/facilitator: Vlad Eidelman] 
Petrov and Klein, Learning and Inference for Hierarchically Split PCFGs  Take-home midterm handed out  
Mar 17  Spring Break 
Have fun!  
Mar 24  Supervised classification [Guest lecture: Eric Hardisty] 
Ch 16 (except 16.2). Supervised learning: k-nearest neighbor classification; naive Bayes; decision lists; decision trees; transformation-based learning (Sec 10.4); linear classifiers; the kernel trick; perceptrons; SVM basics. 
Extra credit assignment, worth 50% of a homework assignment  
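One of the linear classifiers listed for this class, the perceptron, can be sketched as follows (tiny invented dataset; the mistake-driven update rule is the whole algorithm):

```python
# Sketch: binary perceptron with a bias feature. The AND-style
# dataset below is invented for illustration.

def perceptron_train(data, epochs=10):
    """data is [(features, label)] with labels in {-1, +1}.
    Returns weights; the last component is the bias."""
    dim = len(data[0][0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        for x, y in data:
            xb = list(x) + [1.0]  # append bias feature
            score = sum(wi * xi for wi, xi in zip(w, xb))
            if y * score <= 0:    # mistake-driven update
                w = [wi + y * xi for wi, xi in zip(w, xb)]
    return w

def predict(w, x):
    xb = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1

data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w = perceptron_train(data)
print([predict(w, x) for x, _ in data])  # [-1, -1, -1, 1]
```

Because the data is linearly separable, the perceptron convergence theorem guarantees the loop stops making updates after finitely many mistakes.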
March 31  Evaluation in NLP 
Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook. Evaluation paradigms for NLP; parser evaluation 
Team project handed out  
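The precision/recall/F-measure arithmetic underlying most NLP evaluation, including labeled-bracket parser evaluation, can be sketched as follows (the bracket sets are invented):

```python
# Sketch: precision, recall, and F1 over sets of gold vs. predicted
# items. Labeled-bracket parser evaluation uses the same arithmetic
# with (label, span) brackets as the items.

def prf(gold, predicted):
    """Return (precision, recall, F1) for two sets."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Invented gold vs. system brackets for one sentence:
print(prf({"NP:0-2", "VP:2-5", "S:0-5"}, {"NP:0-2", "VP:3-5", "S:0-5"}))
```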
Apr 7  More on supervised learning: maximum entropy models and conditional random fields [Guest lecture: Chris Dyer] 
Using maximum entropy for text classification (Kamal Nigam, John Lafferty, Andrew McCallum); Shallow Parsing with Conditional Random Fields (Fei Sha and Fernando Pereira). The maximum entropy principle; maxent classifiers (for predicting a single variable); CRFs (for predicting interacting variables); L2 regularization. 
Optionally, some good introductory material appears in Adam Berger's maxent tutorial, Dan Klein and Chris Manning's Maxent Models, Conditional Estimation, and Optimization, without the Magic, and Noah Smith's notes on log-linear models (which provides explicit details for a lot of the math). Another useful reading, focused on estimating the parameters of maxent models, is A comparison of algorithms for maximum entropy parameter estimation (Rob Malouf). Also, Manning and Schuetze section 16.2 can be read as supplementary material. Of historical interest: Adwait Ratnaparkhi's A Simple Introduction to Maximum Entropy Models for Natural Language Processing (1997).  
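The maxent (log-linear) conditional probability computation can be sketched as follows (weights and features are invented; in practice the weights are fit by the estimation algorithms Malouf compares):

```python
import math

# Sketch: the probability a maxent classifier assigns to each label,
# p(y|x) proportional to exp(sum of weights on active features).
# The weights below are invented.

def maxent_prob(weights, features, labels):
    """p(y|x) = exp(score(y)) / sum_y' exp(score(y')), where score(y)
    sums the weights of the (feature, y) pairs that fire."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in features)
              for y in labels}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}

weights = {("contains:goal", "sports"): 1.5,
           ("contains:goal", "politics"): -0.5}
probs = maxent_prob(weights, ["contains:goal"], ["sports", "politics"])
print(probs["sports"] > probs["politics"])  # True
```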
Apr 14  Unsupervised methods and topic modeling [topic tentative] [Guest lecture: Jordan Boyd-Graber] 
Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated
(new version to be linked shortly); M. Steyvers and T. Griffiths (2007), Latent Semantic Analysis: A Road to Meaning
[tentative] Graphical model representations of generative models; MLE, MAP, and Bayesian inference; Markov Chain Monte Carlo (MCMC) and Gibbs sampling; Latent Dirichlet Allocation (LDA) 
Blei, Ng, and Jordan (2003), Latent Dirichlet Allocation  
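The Gibbs sampling loop can be illustrated on a toy two-variable joint (the table is invented; collapsed Gibbs for LDA iterates the same resample-one-variable-at-a-time step over topic assignments):

```python
import random

# Sketch: Gibbs sampling over two binary variables with a known
# joint distribution, just to show the resampling loop.

joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def conditional(var, other_val):
    """p(var = 1 | the other variable fixed), from the joint table."""
    if var == "x":
        p0, p1 = joint[(0, other_val)], joint[(1, other_val)]
    else:
        p0, p1 = joint[(other_val, 0)], joint[(other_val, 1)]
    return p1 / (p0 + p1)

def gibbs(n_samples, seed=0):
    rng = random.Random(seed)
    x, y = 0, 0
    samples = []
    for _ in range(n_samples):
        x = 1 if rng.random() < conditional("x", y) else 0
        y = 1 if rng.random() < conditional("y", x) else 0
        samples.append((x, y))
    return samples

samples = gibbs(20000)
# The empirical marginal p(x = 1) should approach 0.3 + 0.4 = 0.7.
print(sum(x for x, _ in samples) / len(samples))
```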
Apr 21  Word sense disambiguation 
Ch 8.5, 15.{1,2,4}. Semantic similarity; relatedness; synonymy; polysemy; homonymy; entailment; ontology-based similarity measures; vector representations and similarity measures; sketch of LSA. Characterizing the WSD problem; WSD as a supervised classification problem. Lesk algorithm; semi-supervised learning and Yarowsky's algorithm; WSD in applications; WSD evaluation. 
Optional: Adam Kilgarriff (1997), I don't believe in word senses, Computers and the Humanities 31(2), pp. 91-113; Philip Resnik (2006), WSD in NLP Applications (Google Books)  
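The simplified Lesk algorithm from this class can be sketched as follows (glosses are invented stand-ins for real dictionary definitions):

```python
# Sketch: simplified Lesk -- pick the sense whose gloss has the
# largest bag-of-words overlap with the context. Glosses invented.

def lesk(context_words, sense_glosses):
    """Return the sense whose gloss shares the most words with
    the context."""
    context = set(context_words)
    return max(sense_glosses,
               key=lambda s: len(context & set(sense_glosses[s])))

glosses = {
    "bank/finance": ["institution", "money", "deposits", "loans"],
    "bank/river": ["sloping", "land", "beside", "water", "river"],
}
context = "he sat on the grassy land beside the river".split()
print(lesk(context, glosses))  # bank/river
```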
Apr 28  Machine translation 
Ch 13 and Adam Lopez, Statistical Machine Translation, ACM Computing Surveys 40(3), Article 8, pages 1-49, August 2008. Historical view of MT approaches; noisy channel for SMT; IBM Models 1 and 4; HMM distortion model; going beyond word-level models 
Also potentially useful or of interest:
Kevin Knight, A Statistical MT Tutorial Workbook;
Mihalcea and Pedersen (2003); Philip Resnik, Exploiting Hidden Meanings: Using Bilingual Text for Monolingual Annotation, in Alexander Gelbukh (ed.), Lecture Notes in Computer Science 2945: Computational Linguistics and Intelligent Text Processing, Springer, 2004, pp. 283-299. 
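EM training for IBM Model 1 translation probabilities can be sketched as follows (tiny invented parallel corpus; no NULL word, uniform initialization):

```python
from collections import defaultdict

# Sketch: EM for IBM Model 1 lexical translation probabilities
# t(f | e). The two-sentence corpus below is invented.

def ibm1(corpus, iterations=10):
    """corpus: list of (source_words, target_words) pairs.
    Returns t[(f, e)] approximating p(f | e)."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for fs, es in corpus:
            for f in fs:
                z = sum(t[(f, e)] for e in es)     # E step:
                for e in es:                       # expected counts
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():            # M step:
            t[(f, e)] = c / total[e]               # renormalize
    return t

corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split())]
t = ibm1(corpus)
print(t[("das", "the")] > t[("haus", "the")])  # True
```

Even this tiny corpus is enough for EM to discover that "das" pairs with "the", since they co-occur in both sentence pairs.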

May 5 [tentative]  Phrase-based statistical MT 
This material may be folded into the previous class in order to make room for a different topic. Papineni, Roukos, Ward and Zhu (2001), BLEU: A Method for Automatic Evaluation of Machine Translation. Components of a phrase-based system: language modeling, translation modeling; sentence alignment, word alignment, phrase extraction, parameter tuning, decoding, rescoring, evaluation. 
Take-home final handed out  Koehn, PHARAOH: A Beam Search Decoder for Phrase-Based Statistical Machine Translation; Koehn (2004) presentation on PHARAOH decoder 
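The BLEU measure from the Papineni et al. reading can be sketched at sentence level (single reference, n-grams up to 2 only; the sentences are toy examples, and real BLEU is computed over a corpus with n up to 4):

```python
import math
from collections import Counter

# Sketch: simplified sentence-level BLEU following the clipped
# n-gram precision + brevity penalty recipe.

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(clipped / max(1, sum(cand.values())))
    if 0 in precisions:
        return 0.0
    # Brevity penalty punishes candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat is on the mat".split()
print(bleu("the cat is on the mat".split(), ref))   # 1.0
print(bleu("the the the the".split(), ref))          # 0.0
```

Clipping is what stops the degenerate candidate "the the the the" from getting unigram credit four times for a word the reference contains only twice.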