Schedule of Topics

This is the schedule of topics for Computational Linguistics II, Spring 2014.

In readings, "M&S" refers to Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in. Make sure to do your reading before the class where it is listed!

In addition, some topic areas may take longer than expected, so keep an eye on the online class discussions for "official" dates.

Class Topic
Readings* Assignments Other
Jan 29 Course administrivia, semester plan; some statistical NLP fundamentals
M&S Ch 1, 2.1.[1-9] (for review)
Assignment 1
Language Log (the linguistics blog), Hal Daumé's NLP blog (excellent blog, often technical machine learning stuff, but just as often more general interest)
Feb 5 Words and lexical association
M&S Ch 5
Assignment 2
Dunning (1993) is a classic and valuable to read if you're trying to use mutual information or chi-squared and getting inflated values for low-frequency observations. Moore (2004) is a less widely cited but very valuable discussion about how to judge the significance of rare events.

A really important paper by Ioannidis about problems with statistical hypothesis testing is Why Most Published Research Findings Are False; for a more recent and very readable discussion see Trouble at the Lab, The Economist, Oct 19, 2013 and the really great accompanying video.

Kilgarriff (2005) is a fun and contrarian read regarding the use of hypothesis testing methodology specifically in language research.

Named entities represent another form of lexical association. Named entity recognition is introduced in Ch 22 of Jurafsky and Martin and in Ch 7 of the NLTK book.

Feb 12 Information theory
M&S Ch 2.2, M&S Ch 6

Optional: Piantadosi et al. (2011), Word lengths are optimized for efficient communication; Jaeger (2010), Redundancy and reduction: Speakers manage syntactic information density

Assignment 3
Cover and Thomas (1991) is a great, highly readable introduction to information theory. The first few chapters go into all of the concepts from this lecture with greater rigor but a lot of clarity.

Maurits et al. (2010), Why are some word orders more common than others? A uniform information density account. See also the syllabus for a 2009 seminar taught by Dan Jurafsky and Michael Ramscar, Information-Theoretic Models of Language and Cognition, which looks as if it was awesome.

Feb 19 Maximum likelihood estimation and Expectation Maximization
Skim M&S Ch 9-10, Chapter 6 of Lin and Dyer. Read my EM recipe discussion.
Assignment 4
Feb 26 Bayesian inference and modeling

Overview of final exam project

Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated. No need to work through all the equations in Section 2 in detail, but read carefully enough to understand the concepts.

Read M. Steyvers and T. Griffiths (2007), Probabilistic Topic Models, in Latent Semantic Analysis: A Road to Meaning, and/or review the CL1 topic modeling lecture (notes, video).

Do one of EC1, EC2, or EC3 from Assignment 4. (Worth 50% of a usual homework, and not due until 4:30pm Friday March 7.)

For a very nice and brief summary of LDA, including a really clear explanation of the corresponding Gibbs sampler (with pseudocode!), see Section 5 of Gregor Heinrich, Parameter estimation for text analysis.

I will touch on supervised topic models, particularly in the context of the project; I recommend reading Blei and McAuliffe, Supervised Topic Models (though note that we will not be talking about variational EM). Also relevant is Nguyen, Boyd-Graber, and Resnik, Lexical and Hierarchical Topic Regression.

If you're interested in going back to the source for LDA, see Blei, Ng, and Jordan (2003), Latent Dirichlet Allocation.

Mar 5 Supervised classification
M&S Ch 16 except 16.2.1; Hearst et al. 1998 Support Vector Machines (cleaner copy here)
I picked Hearst et al. (1998) as the SVM reading because it's the clearest, shortest possible introduction. There are many other good things to read at, including a "best tutorials" section, broken out by introductory, intermediate, and advanced, under Tutorials. Feel free to go with one of the other tutorials (the ones I've seen used most often are Burges (1998) and Smola et al. (1999)) instead of Hearst if you want a meatier introduction.

Optional: Ratnaparkhi (1996), A Maximum Entropy Model for Part of Speech Tagging, or, if you want a little more detail, Ratnaparkhi (1997), A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Ratnaparkhi 1996 began the popularization of maxent in NLP. Noah Smith's (2004) Log-Linear Models is a nice alternative introduction expressed in a vocabulary that is more consistent with current work.

Mar 12 Deep learning
Read sections 1, 3 and 4 of Yoshua Bengio, Learning Deep Architectures for AI. Other sources we are likely to discuss include: Lillian Lee, Measures of Distributional Similarity; Hinrich Schuetze, Word Space; and Mikolov et al., Linguistic Regularities in Continuous Space Word Representations. Recommended: the nice overview of representation learning in sections 1-4 of Bengio et al., Representation Learning: A Review and New Perspectives, and the background on the skip-gram approach in word2vec found in Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality. Background on Mikolov et al.'s Linguistic Regularities paper is in Mikolov et al., Recurrent neural network based language model.
Mar 19 Spring Break
Have fun!
Mar 26 Evaluation in NLP
Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook.
Take-home midterm
Apr 2 Structured prediction
Ke Wu, Discriminative Sequence Labeling

Noah Smith's Structured prediction for NLP tutorial slides (ICML'09)

For relevant background on semirings see Joshua Goodman, Semiring Parsing. For an example of very interesting recent work in this area particularly from an automata-theoretic angle, see Chris Dyer, A Formal Model of Ambiguity and its Applications in Machine Translation.
Apr 9 More on structured prediction: CRF, structured perceptron, structural SVM
Noah Smith, Linguistic Structure Prediction, esp. Sections 3.5.2-3.7. (The book is available online for UMD and many other university IP addresses.) Also of interest:
Apr 16 Guest lecture: Doug Oard on information retrieval
Douglas W. Oard, Jerome White, Jiaul Paik, Rashmi Sankepally and Aren Jansen, "The FIRE 2013 Question Answering for the Spoken Web Task", Fifth Forum for Information Retrieval Evaluation, 8 pages, New Delhi, India, 2013.
Apr 23 Machine translation
Ch 13 and Adam Lopez, Statistical Machine Translation, in ACM Computing Surveys 40(3), Article 8, pages 1-49, August 2008.

Also potentially useful or of interest:
Apr 30 Machine translation continued

May 7 Projects discussion
