Schedule of Topics

This is the schedule of topics for Computational Linguistics II, Spring 2006.

Readings are from Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing, unless otherwise specified. The "Other" entries contain optional links pointing either to material you should already know (but might want to review) or to related material you might be interested in.

Note also that some topic areas may take longer than expected, so keep an eye on the class mailing list or e-mail me for "official" dates.

Jan 25: Course administrivia, semester plan; corpus-driven and computational linguistics
  Readings: Ch 1, 2.1.[1-9] (for review)
  Topics: Word counts; tokenization; frequency and Zipf's law; concordances
  Assignments: Assignment 0 (given in class)
  Other: Corpus Colossal (The Economist, 20 Jan 2005); Language Log; Resnik and Elkiss (DRAFT); Linguist's Search Engine

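As a toy illustration of the word-count and Zipf's-law topics above: the sketch below counts word frequencies in a small invented text and prints rank times frequency, which Zipf's law predicts to be roughly constant across ranks. The corpus and all numbers here are made up for illustration, not taken from course materials.

```python
from collections import Counter

# Tiny invented corpus, whitespace "tokenization" only.
text = (
    "the cat sat on the mat and the dog sat on the rug "
    "the cat and the dog saw the mat"
)
tokens = text.split()
counts = Counter(tokens)

# Sort words by descending frequency; Zipf's law says rank * freq
# should be roughly constant (very roughly, on a corpus this small).
ranked = sorted(counts.items(), key=lambda kv: -kv[1])
for rank, (word, freq) in enumerate(ranked, start=1):
    print(rank, word, freq, rank * freq)
```

On a real corpus you would use a proper tokenizer and plot log rank against log frequency instead of eyeballing the products.
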
Words and lexical association
  Readings: Ch 5
  Topics: Collocations; mutual information; hypothesis testing
  Assignments: Assignment 1a, Assignment 1b
  Other: Dunning (1993), Bland and Altman (1995)

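The pointwise mutual information score used for collocation detection, one of the topics above, fits in a few lines. The toy sentence and the maximum-likelihood probability estimates below are invented for illustration only.

```python
import math
from collections import Counter

tokens = "new york is a big city and new york is busy".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
n = len(tokens)

def pmi(x, y):
    """PMI(x, y) = log2 P(x, y) / (P(x) P(y)), estimated by MLE."""
    p_xy = bigrams[(x, y)] / (n - 1)   # n - 1 adjacent bigram positions
    p_x, p_y = unigrams[x] / n, unigrams[y] / n
    return math.log2(p_xy / (p_x * p_y))

print(pmi("new", "york"))  # large and positive: behaves like a collocation
```

As Ch 5 discusses, raw PMI is unreliable for low counts, which is why hypothesis tests such as Dunning's likelihood ratio are covered alongside it.
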
Information theory, n-gram models
  Readings: Ch 2.2, Ch 6
  Topics: Information theory essentials; noisy channel model; maximum likelihood estimation
  Assignments: Assignment 2

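To connect the information-theory essentials with maximum likelihood estimation, here is a minimal sketch (toy data invented) that fits an MLE unigram model and computes its entropy in bits per token:

```python
import math
from collections import Counter

tokens = "a b a a b a c".split()   # toy data, invented for illustration
counts = Counter(tokens)
n = len(tokens)

# Maximum-likelihood unigram model: P(w) = count(w) / N
probs = {w: c / n for w, c in counts.items()}

# Entropy H(P) = -sum_w P(w) log2 P(w), in bits per token
entropy = -sum(p * math.log2(p) for p in probs.values())
print(round(entropy, 3))  # prints 1.379
```

Cross-entropy of a model against held-out data, computed the same way but with model probabilities inside the log, is the standard way to compare n-gram models (Ch 2.2).
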
Smoothing; hidden Markov models
  Readings: Ch 9-10
  Topics: Smoothing methods; review of the forward and Viterbi algorithms; EM and the forward-backward algorithm
  Assignments: Assignment 3

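Since the Viterbi algorithm is listed above as review material, here is a compact reference implementation over a toy two-state HMM. The states, words, and all probabilities are invented purely for illustration.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # v[t][s] = (probability of the best path ending in state s at time t,
    #            back-pointer to the best previous state)
    v = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        v.append({})
        for s in states:
            prev = max(states, key=lambda r: v[t - 1][r][0] * trans_p[r][s])
            v[t][s] = (v[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]],
                       prev)
    # Recover the best path by following back-pointers from the best end state.
    best = max(states, key=lambda s: v[-1][s][0])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(v[t][path[-1]][1])
    return list(reversed(path))

# Toy two-state "tagger" with invented probabilities.
states = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.4, "V": 0.6}, "V": {"N": 0.7, "V": 0.3}}
emit_p = {"N": {"dog": 0.6, "barks": 0.1}, "V": {"dog": 0.1, "barks": 0.7}}
print(viterbi(["dog", "barks"], states, start_p, trans_p, emit_p))  # ['N', 'V']
```

A real implementation would work in log space to avoid underflow on long sequences; the forward algorithm has the same shape with `sum` in place of `max`.
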
Treebanks and probabilistic parsing
  Readings: Ch 11-12, Abney (1996)
  Topics: PCFGs; inside probabilities; dependency-based models; NLP evaluation paradigms and parser evaluation
  Other: Pereira (2000); Detlef Prescher, A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars

EM Revisited: Inside-Outside Algorithm
  Topics: A fairly general, intuitive schema for deriving EM update equations, and the inside-outside algorithm as an instance of it
  Assignments: Assignment 4

Mar 15: Guest Lecture: Jimmy Lin on Information Retrieval
  Readings: Ch 8.5, 15.{1,2,4}
  Assignments: Take-home midterm assigned
  Other: Lecture slides

Mar 22: Spring Break

Mar 29: Supervised classification
  Readings: Ch 16
  Topics: Experimental setups and evaluation; k-nearest neighbor classification; naive Bayes; decision lists; decision trees
  Assignments: Assignment 5 (Project), due April 26

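As a sketch of the naive Bayes topic above: a minimal multinomial naive Bayes text classifier with add-one (Laplace) smoothing. The tiny training set and labels are invented for illustration.

```python
import math
from collections import Counter

# Toy labeled training data (invented).
train = [("spam", "buy cheap pills now"),
         ("spam", "cheap pills cheap"),
         ("ham",  "meeting agenda for monday"),
         ("ham",  "monday meeting notes")]

class_counts = Counter(label for label, _ in train)
word_counts = {c: Counter() for c in class_counts}
for label, text in train:
    word_counts[label].update(text.split())
vocab = {w for c in word_counts for w in word_counts[c]}

def predict(text):
    """argmax_c log P(c) + sum_w log P(w | c), with add-one smoothing."""
    def log_score(c):
        total = sum(word_counts[c].values())
        score = math.log(class_counts[c] / len(train))
        for w in text.split():
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        return score
    return max(class_counts, key=log_score)

print(predict("cheap pills"))     # spam
print(predict("monday meeting"))  # ham
```

The same experimental-setup questions listed above (train/test splits, evaluation metrics) apply to this and every other classifier in the row.
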
Apr 5: Maximum entropy models
  Readings: Ch 16
  Topics: The maximum entropy principle; log-linear models; feature selection for supervised classification
  Other: Adwait Ratnaparkhi, A Simple Introduction to Maximum Entropy Models for Natural Language Processing (1997) and A Maximum Entropy Model for Part-of-Speech Tagging (EMNLP 1996); Adam Berger's maxent tutorial; Noah Smith's notes on log-linear models

Apr 12: Word sense disambiguation
  Readings: Ch 7
  Topics: Characterizing the WSD problem; WSD evaluation; unsupervised methods and Lesk's algorithm; supervised techniques; semi-supervised learning and Yarowsky's algorithm

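The gloss-overlap idea behind Lesk's algorithm, listed among the unsupervised methods above, can be sketched in its "simplified Lesk" form: pick the sense whose dictionary gloss shares the most words with the context. The two-sense mini inventory below is invented for illustration.

```python
# Invented two-sense inventory for the ambiguous word "bank".
senses = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river":   "the sloping land beside a body of water such as a river",
}

def lesk(context, senses):
    """Simplified Lesk: sense with maximal context/gloss word overlap."""
    ctx = set(context.lower().split())
    return max(senses, key=lambda s: len(ctx & set(senses[s].split())))

print(lesk("he sat on the bank of the river fishing", senses))  # bank/river
```

A serious version would remove stopwords and stem both gloss and context; as-is, function words like "the" can inflate overlaps.
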
Apr 19: Word sense disambiguation in NLP applications
  Readings: Resnik (2006), "WSD in NLP Applications" (to appear in Edmonds and Agirre 2006)
  Topics: "Traditional" WSD in IR, QA, MT, and related applications

Apr 26: Machine translation
  Readings: Ch 13 and Adam Lopez, Statistical Machine Translation (survey article, submitted)
  Topics: Historical view of MT approaches; noisy channel for SMT; IBM Models 1 and 4; HMM distortion model; going beyond word-level models
  Other: Mihalcea and Pedersen (2003); Philip Resnik, Exploiting Hidden Meanings: Using Bilingual Text for Monolingual Annotation, in Alexander Gelbukh (ed.), Lecture Notes in Computer Science 2945: Computational Linguistics and Intelligent Text Processing, Springer, 2004, pp. 283-299

May 3: Phrase-based statistical MT
  Topics: Components of a phrase-based system: language modeling, translation modeling, sentence alignment, word alignment, phrase extraction, parameter tuning, decoding, rescoring, evaluation
  Assignments: Assignment 6
  Other: Koehn, PHARAOH: A Beam Search Decoder for Phrase-Based Statistical Machine Translation

May 10: Computational approaches to human language acquisition
  Readings: Mintz, T. H. (2006). Finding the verbs: distributional cues to categories available to young learners. In K. Hirsh-Pasek & R. M. Golinkoff (Eds.), Action Meets Word: How Children Learn Verbs, pp. 31-63. New York: Oxford University Press. [link]

Readings are from Manning and Schuetze unless otherwise specified. Do the reading before the class where it is listed!

Return to course home page

This page last updated 5 April 2006.

Many thanks to David Chiang, Bonnie Dorr, Christof Monz, and Amy Weinberg for discussions about the syllabus. Responsibility for the outcome is, of course, completely indeterminate. :-)