This is the schedule of topics for Computational Linguistics II, Spring 2006.
Readings are from Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing, unless otherwise specified. The "other" column has optional links pointing either to material you should already know (but might want to review) or to related material you might be interested in.
THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the class mailing list or email me for "official" dates.
Class  Topic  Readings*  Assignments  Other

Jan 25  Course administrivia, semester plan; corpus-driven and computational linguistics 
Ch 1, 2.1.[1-9] (for review) Word counts; tokenization; frequency and Zipf's law; concordances 
Assignment 0 (given in class)  Corpus Colossal (The Economist, 20 Jan 2005); Language Log; Resnik and Elkiss (DRAFT); Linguist's Search Engine 
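The word-count and Zipf's law material above can be illustrated with a short sketch (toy text and naive tokenization, not course code):

```python
# Illustrative sketch: tokenize a text, count word frequencies, and rank
# words by frequency. Zipf's law says frequency is roughly proportional
# to 1/rank in natural text (this toy sample is far too small to show it).
from collections import Counter

def word_counts(text):
    # Naive tokenization: lowercase, split on whitespace,
    # strip surrounding punctuation.
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return Counter(t for t in tokens if t)

text = "the cat sat on the mat and the dog sat on the log"
counts = word_counts(text)
ranked = counts.most_common()  # (word, frequency) pairs, most frequent first
print(ranked[0])  # → ('the', 4)
```

A real experiment would use a sizable corpus and plot log-frequency against log-rank, where Zipf's law predicts a roughly straight line.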
Words and lexical association 
Ch 5 Collocations; mutual information; hypothesis testing 
Assignment 1a, Assignment 1b  Dunning (1993), Bland and Altman (1995)  
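Pointwise mutual information, one of the association scores covered in Ch 5, can be sketched with toy numbers (the counts below are hypothetical, not from the readings):

```python
# Pointwise mutual information for a candidate collocation:
# PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ).
# High PMI means x and y co-occur far more often than chance predicts.
import math

def pmi(count_xy, count_x, count_y, n):
    # n = total number of bigrams observed in the corpus.
    p_xy = count_xy / n
    p_x = count_x / n
    p_y = count_y / n
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts: "strong" occurs 100 times, "tea" 50 times, and the
# bigram "strong tea" 8 times, in a corpus of 10,000 bigrams.
score = pmi(8, 100, 50, 10_000)
print(round(score, 2))  # → 4.0
```

As Ch 5 discusses, raw PMI is unreliable for low-frequency events, which is one motivation for the hypothesis-testing approaches (e.g., Dunning's log-likelihood ratio) also covered this week.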
Information theory, n-gram models 
Ch 2.2, Ch 6 Information theory essentials; noisy channel model; maximum likelihood estimation 
Assignment 2  
Smoothing; hidden Markov models 
Ch 9-10 Smoothing methods; review of forward and Viterbi algorithms; EM and the forward-backward algorithm 
Assignment 3  
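As a refresher on the Viterbi review above, here is a compact sketch (standard textbook recursion; the weather HMM below is a common toy example, not from M&S):

```python
# Viterbi algorithm: find the most probable hidden-state sequence of an HMM
# given an observation sequence, via dynamic programming.
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = probability of the best path ending in state s at time t.
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy weather HMM: hidden weather states, observed activities.
states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p))
# → ['Sunny', 'Rainy', 'Rainy']
```

The forward algorithm has the same structure with `sum` in place of `max`, and the forward-backward (Baum-Welch) EM procedure builds on both; in practice all three are run in log space to avoid underflow.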
Treebanks and probabilistic parsing 
Ch 11-12, Abney (1996) PCFGs; inside probabilities; dependency-based models; NLP evaluation paradigms and parser evaluation 
Pereira (2000); Detlef Prescher, A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars.  
EM Revisited: Inside-Outside Algorithm 
A fairly general, intuitive schema for deriving EM update equations, and the inside-outside algorithm as an instance of it.  Assignment 4  
Mar 15  Guest Lecture: Jimmy Lin on Information Retrieval 
Ch 8.5, 15.{1,2,4}  Lecture slides  Take-home midterm assigned 
Mar 22  Spring Break 
none  
Mar 29  Supervised classification 
Ch 16 Experimental setups and evaluation; k-nearest neighbor classification; naive Bayes; decision lists; decision trees 
Assignment 5 (Project) Due April 26 
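The naive Bayes classifier listed above can be sketched in a few lines (toy documents and add-one smoothing assumed; this is an illustration, not assigned code):

```python
# Multinomial naive Bayes text classification: choose the class c
# maximizing P(c) * prod_w P(w | c), computed in log space.
import math
from collections import Counter, defaultdict

def train(docs):
    # docs: list of (list_of_words, label) pairs.
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify(words, class_counts, word_counts, vocab):
    n_docs = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c in class_counts:
        score = math.log(class_counts[c] / n_docs)  # log prior
        total = sum(word_counts[c].values())
        for w in words:
            # Laplace (add-one) smoothing over the vocabulary.
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

docs = [("sleek fast car".split(), "auto"),
        ("fast engine car".split(), "auto"),
        ("tasty cheap food".split(), "food"),
        ("tasty fresh food".split(), "food")]
model = train(docs)
print(classify("fast car".split(), *model))  # → auto
```

The "naive" conditional-independence assumption is clearly false for natural language, yet as Ch 16 notes the classifier is a surprisingly strong baseline.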

Apr 5  Maximum entropy models 
Ch 16 The maximum entropy principle; log-linear models; feature selection for supervised classification 
Other useful readings include Adwait Ratnaparkhi's A Simple Introduction to Maximum Entropy Models for Natural Language Processing (1997) and A Maximum Entropy Model for Part-Of-Speech Tagging (EMNLP 1996); Adam Berger's maxent tutorial; and Noah Smith's notes on log-linear models.  
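The log-linear form behind maxent models can be sketched directly (the features and weights below are made up for illustration; in practice the weights are learned by maximizing conditional likelihood):

```python
# Log-linear (maximum entropy) model:
# P(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x),
# where Z(x) normalizes over all candidate labels.
import math

def loglinear_prob(x, y, labels, features, weights):
    def score(lbl):
        return sum(w * f(x, lbl) for w, f in zip(weights, features))
    z = sum(math.exp(score(lbl)) for lbl in labels)  # partition function
    return math.exp(score(y)) / z

# Hypothetical binary features for a two-class tagging decision.
features = [
    lambda x, y: 1.0 if x.endswith("ing") and y == "VERB" else 0.0,
    lambda x, y: 1.0 if x[0].isupper() and y == "NOUN" else 0.0,
]
weights = [2.0, 1.5]
p = loglinear_prob("running", "VERB", ["VERB", "NOUN"], features, weights)
print(round(p, 3))  # → 0.881
```

Only the first feature fires for ("running", "VERB"), so the model prefers VERB; with both weights at zero the distribution would be uniform, which is the maximum-entropy starting point.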
Apr 12  Word sense disambiguation 
Ch 7 Characterizing the WSD problem; WSD evaluation; unsupervised methods/Lesk's algorithm; supervised techniques; semi-supervised learning and Yarowsky's algorithm 

Apr 19  Word sense disambiguation in NLP applications 
Resnik (2006), "WSD in NLP Applications" (to appear in Edmonds and Agirre 2006)  "Traditional" WSD in IR, QA, MT, and related applications 

Apr 26  Machine translation 
Ch 13 and Adam Lopez, Statistical Machine Translation (survey article, submitted)  Historical view of MT approaches; noisy channel for SMT; IBM models 1 and 4; HMM distortion model; going beyond word-level models 
Mihalcea and Pedersen (2003); Philip Resnik, Exploiting Hidden Meanings: Using Bilingual Text for Monolingual Annotation. In Alexander Gelbukh (ed.), Lecture Notes in Computer Science 2945: Computational Linguistics and Intelligent Text Processing, Springer, 2004, pp. 283-299.  
May 3  Phrase-based statistical MT  Components of a phrase-based system: language modeling, translation modeling; sentence alignment, word alignment, phrase extraction, parameter tuning, decoding, rescoring, evaluation.  Assignment 6  Koehn, PHARAOH: A Beam Search Decoder for Phrase-Based Statistical Machine Translation 
May 10  Computational approaches to human language acquisition 
Mintz, T. H. (2006). Finding the verbs: distributional cues to categories available to young learners. In K. Hirsh-Pasek & R. M. Golinkoff (Eds.), Action Meets Word: How Children Learn Verbs, pp. 31-63. New York: Oxford University Press.