This is the schedule of topics for Computational Linguistics II, Spring 2010.
Unless otherwise specified, readings are from Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in.
THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the class mailing list or e-mail me for "official" dates.
Class | Topic | Readings* | Assignments | Other |
---|---|---|---|---|
Jan 27 | Course administrivia, semester plan; some statistical NLP fundamentals | Ch 1, 2.1.[1-9] (for review). Historical overview; probability spaces; finite-state and Markov models; Bayes' Rule; Bayesian updating; conjugate priors | Assignment 1 | Language Log (the linguistics blog); Hal Daume's NLP blog (an excellent blog, often technical machine learning material, but just as often of more general interest) |
Feb 3 | Words and lexical association | Ch 5. Zipf's law; collocations; hypothesis testing; mutual information | Assignment 2 | Dunning (1993); Goodman, S. (1999), "Toward evidence-based medical statistics. 1: The P value fallacy", Ann Intern Med 130(12): 995-1004, PMID 10383371; Goodman, S. (1999), "Toward evidence-based medical statistics. 2: The Bayes factor", Ann Intern Med 130(12): 1005-1013, PMID 10383350; Kilgarriff (2005); Gries (2005); Bland and Altman (1995) |
Feb 10 | Snow day! | | | |
Feb 17 | Information theory | Ch 2.2, Ch 6. Information theory essentials; entropy, relative entropy, mutual information; noisy channel model; cross entropy and perplexity | Assignment 3 | |
Feb 24 | Maximum likelihood estimation and Expectation Maximization | Skim Ch 9-10; Chapter 6 of Lin and Dyer (forthcoming). Maximum likelihood estimation overview; quick review of smoothing; HMM review; deriving the forward-backward algorithm as an instance of EM; Viterbi algorithm | | An Empirical Study of Smoothing Techniques for Language Modeling (Stanley Chen and Joshua Goodman, Technical Report TR-10-98, Harvard University, August 1998); revised Chapter 4 from the updated Jurafsky and Martin textbook |
March 3 | Probabilistic grammars and parsing | Ch 11-12, Abney (1996). Parsing as inference; the distinction between logic and control; memoization and dynamic programming; review of CKY; defining PCFGs; probabilistic CKY (inside probabilities); Viterbi CKY; revisiting EM: the inside-outside algorithm; CFG extensions (grandparent parent nodes, lexicalization); syntactic dependency trees | Assignment 4 | Jason Eisner's great parsing song; Pereira (2000); Detlef Prescher, A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars; McClosky, Charniak, and Johnson (2006), Effective Self-Training for Parsing |
Mar 10 | Advanced topic: parsing with latent variables [Guest presenter/facilitator: Vlad Eidelmann] | Petrov and Klein, Learning and Inference for Hierarchically Split PCFGs | Take-home midterm handed out | |
Mar 17 | Spring Break | Have fun! | | |
Mar 24 | Supervised classification [Guest lecture: Eric Hardisty] | Ch 16 (except 16.2). Supervised learning: k-nearest neighbor classification; naive Bayes; decision lists; decision trees; transformation-based learning (Sec 10.4); linear classifiers; the kernel trick; perceptrons; SVM basics | Extra credit assignment, worth 50% of a homework assignment | |
March 31 | Evaluation in NLP | Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox, and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook. Evaluation paradigms for NLP; parser evaluation | Team project handed out | |
Apr 7 | More on supervised learning: maximum entropy models and conditional random fields [Guest lecture: Chris Dyer] | Using Maximum Entropy for Text Classification (Kamal Nigam, John Lafferty, Andrew McCallum); Shallow Parsing with Conditional Random Fields (Fei Sha and Fernando Pereira). The maximum entropy principle; maxent classifiers (for predicting a single variable); CRFs (for predicting interacting variables); L2 regularization | | Optionally, some good introductory material appears in Adam Berger's maxent tutorial; Dan Klein and Chris Manning's Maxent Models, Conditional Estimation, and Optimization, without the Magic; and Noah Smith's notes on loglinear models (which provide explicit details for much of the math). Another useful reading, focused on estimating the parameters of maxent models, is A Comparison of Algorithms for Maximum Entropy Parameter Estimation (Rob Malouf). Manning and Schuetze section 16.2 can also be read as supplementary material. Of historical interest: Adwait Ratnaparkhi's A Simple Introduction to Maximum Entropy Models for Natural Language Processing (1997). |
Apr 14 | Unsupervised methods and topic modeling [topic tentative] [Guest lecture: Jordan Boyd-Graber] | Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated (new version to be linked shortly); M. Steyvers and T. Griffiths (2007), Latent Semantic Analysis: A Road to Meaning. [Tentative] Graphical model representations of generative models; MLE, MAP, and Bayesian inference; Markov chain Monte Carlo (MCMC) and Gibbs sampling; Latent Dirichlet Allocation (LDA) | | Blei, Ng, and Jordan (2003), Latent Dirichlet Allocation |
Apr 21 | Word sense disambiguation | Ch 8.5, 15.{1,2,4}. Semantic similarity; relatedness; synonymy; polysemy; homonymy; entailment; ontology-based similarity measures; vector representations and similarity measures; sketch of LSA. Characterizing the WSD problem; WSD as a supervised classification problem; Lesk algorithm; semi-supervised learning and Yarowsky's algorithm; WSD in applications; WSD evaluation | | Optional: Adam Kilgarriff (1997), I don't believe in word senses, Computers and the Humanities 31(2), pp. 91-113; Philip Resnik (2006), WSD in NLP Applications (Google Books) |
Apr 28 | Machine translation | Ch 13 and Adam Lopez, Statistical Machine Translation, ACM Computing Surveys 40(3), Article 8, pages 1-49, August 2008. Historical view of MT approaches; noisy channel for SMT; IBM Models 1 and 4; HMM distortion model; going beyond word-level models | | Also potentially useful or of interest: Kevin Knight, A Statistical MT Tutorial Workbook; Mihalcea and Pedersen (2003); Philip Resnik, Exploiting Hidden Meanings: Using Bilingual Text for Monolingual Annotation, in Alexander Gelbukh (ed.), Lecture Notes in Computer Science 2945: Computational Linguistics and Intelligent Text Processing, Springer, 2004, pp. 283-299 |
May 5 [tentative] | Phrase-based statistical MT | Papineni, Roukos, Ward, and Zhu (2001), BLEU: a Method for Automatic Evaluation of Machine Translation. Components of a phrase-based system: language modeling, translation modeling; sentence alignment, word alignment, phrase extraction, parameter tuning, decoding, rescoring, evaluation. (This material may be folded into the previous class in order to make room for a different topic.) | Take-home final handed out | Koehn, PHARAOH: A Beam Search Decoder for Phrase-Based Statistical Machine Translation; Koehn (2004) presentation on the PHARAOH decoder |