This is the schedule of topics for Computational Linguistics II, Spring 2014.
In readings, "M&S" refers to Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in. Make sure to do your reading before the class where it is listed!
THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the online class discussions for "official" dates.
|Jan 29||Course administrivia, semester plan; some statistical NLP fundamentals
||M&S Ch 1, 2.1.[1-9] (for review)
||Assignment 1||Language Log (the linguistics blog), Hal Daumé's NLP blog (excellent blog, often technical machine learning stuff, but just as often more general interest)|
|Feb 5||Words and lexical association
||M&S Ch 5
||Assignment 2|| Dunning (1993) is a classic and valuable to read if you're trying to use mutual information or chi-squared and getting inflated values for low-frequency observations.
Moore (2004) is a less widely cited but very valuable discussion about how to judge the significance of rare events.
A really important paper by Ioannidis about problems with statistical hypothesis testing is Why Most Published Research Findings Are False; for a more recent and very readable discussion see Trouble at the Lab, The Economist, Oct 19, 2013, and the really great accompanying video.
Kilgarriff (2005) is a fun and contrarian read regarding the use of hypothesis testing methodology specifically in language research.
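To see concretely why low-frequency observations inflate association scores, here is a minimal sketch comparing pointwise mutual information for a frequent word pair versus a pair of hapaxes (all counts and the corpus size are made up for illustration):

```python
import math

def pmi(c_xy, c_x, c_y, n):
    """Pointwise mutual information from bigram and unigram counts."""
    return math.log2((c_xy / n) / ((c_x / n) * (c_y / n)))

n = 1_000_000  # hypothetical corpus size

# A frequent, genuinely associated pair: a moderate score
print(pmi(1000, 5000, 5000, n))   # ≈ 5.3
# Two hapaxes that co-occur once, possibly by pure chance: an enormous score
print(pmi(1, 1, 1, n))            # ≈ 19.9
```

A single chance co-occurrence of two rare words dominates the ranking, which is exactly the pathology Dunning's likelihood-ratio test is designed to address.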
|Feb 12||Information theory
||M&S Ch 2.2, M&S Ch 6
Optional: Piantadosi et al. (2011), Word lengths are optimized for efficient communication; Jaeger (2010), Redundancy and reduction: Speakers manage syntactic information density
Cover and Thomas (1991) is a great, highly readable introduction to information theory. The first few chapters go into all of the concepts from this lecture with greater rigor but a lot of clarity.
Maurits et al. (2010), Why are some word orders more common than others? A uniform information density account. See also the syllabus for a 2009 seminar taught by Dan Jurafsky and Michael Ramscar, Information-Theoretic Models of Language and Cognition, which looks as if it was awesome.
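As a quick refresher on the central quantities from this lecture, here is a small self-contained sketch of entropy and KL divergence (the distributions are made up):

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl(p, q):
    """KL divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy(uniform))     # 2.0 bits: maximal for four outcomes
print(entropy(skewed))      # lower: a skewed distribution is more predictable
print(kl(skewed, uniform))  # nonnegative; zero iff the two distributions match
```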
|Feb 19||Maximum likelihood estimation and Expectation Maximization
||Skim M&S Ch 9-10 and Chapter 6 of Lin and Dyer; read my EM recipe discussion.
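For intuition about the E-step/M-step recipe, here is a minimal sketch of EM for the classic two-coin mixture: each session of flips came from one of two biased coins, but the coin identity is hidden. The counts and starting parameters below are made up for illustration:

```python
# Each observation: (heads, flips) for one session with one of two biased
# coins, where which coin was used is hidden.
flips = [(5, 10), (9, 10), (8, 10), (4, 10), (7, 10)]

def em(data, theta_a, theta_b, iters=50):
    for _ in range(iters):
        # E-step: expected counts, weighting each session by the posterior
        # probability that it came from coin A
        h_a = t_a = h_b = t_b = 0.0
        for h, n in data:
            like_a = theta_a ** h * (1 - theta_a) ** (n - h)
            like_b = theta_b ** h * (1 - theta_b) ** (n - h)
            w = like_a / (like_a + like_b)
            h_a += w * h;       t_a += w * (n - h)
            h_b += (1 - w) * h; t_b += (1 - w) * (n - h)
        # M-step: maximum likelihood re-estimates from the expected counts
        theta_a = h_a / (h_a + t_a)
        theta_b = h_b / (h_b + t_b)
    return theta_a, theta_b

print(em(flips, 0.6, 0.5))  # converges to roughly (0.80, 0.52)
```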
|Feb 26||Bayesian inference and modeling
Overview of final exam project
|Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated. No need to work through all the equations in Section 2 in detail, but read carefully enough to understand the concepts.||Do one of EC1, EC2, or EC3 from Assignment 4. (Worth 50% of a usual homework, and not due until 4:30pm Friday March 7)||
For a very nice and brief summary of LDA, including a really clear explanation of the corresponding Gibbs sampler (with pseudocode!), see Section 5 of Gregor Heinrich, Parameter estimation for text analysis.
I will touch on supervised topic models, particularly in the context of the project; I recommend reading Blei and McAuliffe, Supervised Topic Models (though note that we will not be talking about variational EM). Also relevant is Nguyen, Boyd-Graber, and Resnik, Lexical and Hierarchical Topic Regression.
If you're interested in going back to the source for LDA, see Blei, Ng, and Jordan (2003), Latent Dirichlet Allocation.
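To make Heinrich's pseudocode concrete, here is a minimal collapsed Gibbs sampler for LDA on a toy corpus. The documents, vocabulary, and hyperparameters are all made up, and there are no convergence diagnostics; this is a sketch of the resampling loop only:

```python
import random

# Toy corpus: documents as lists of word ids
docs = [[0, 1, 2, 0], [2, 3, 4], [4, 5, 4, 5], [0, 2, 5]]
V, K = 6, 2             # vocabulary size, number of topics
alpha, beta = 0.5, 0.1  # symmetric Dirichlet hyperparameters

random.seed(0)
z = [[random.randrange(K) for _ in doc] for doc in docs]  # topic assignments
ndk = [[0] * K for _ in docs]       # document-topic counts
nkw = [[0] * V for _ in range(K)]   # topic-word counts
nk = [0] * K                        # topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1

for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove this token's assignment from the counts...
            ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
            # ...and resample from the full conditional:
            # p(z=j | rest) ∝ (ndk + alpha) * (nkw + beta) / (nk + V*beta)
            weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + V * beta)
                       for j in range(K)]
            r = random.random() * sum(weights)
            for j, wt in enumerate(weights):
                r -= wt
                if r <= 0:
                    k = j
                    break
            z[d][i] = k
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1

print(nkw)  # topic-word counts after sampling
```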
|Mar 5||Supervised classification
||M&S Ch 16 except 16.2.1;
Hearst et al. 1998 Support Vector Machines (cleaner copy here)
I picked Hearst et al. (1998) as the SVM reading because it's the clearest, shortest possible introduction. There are many other good things to read at svms.org, including a "best tutorials" section, broken out by introductory, intermediate, and advanced, under Tutorials. Feel free to go with one of the other tutorials (the ones I've seen used most often are Burges (1998) and Smola et al. (1999)) instead of Hearst if you want a meatier introduction.
Optional: Ratnaparkhi (1996), A Maximum Entropy Model for Part of Speech Tagging, or, if you want a little more detail, Ratnaparkhi (1997), A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Ratnaparkhi 1996 began the popularization of maxent in NLP. Noah Smith's (2004) Log-Linear Models is a nice alternative introduction expressed in a vocabulary that is more consistent with current work.
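Since a two-class maxent model is just logistic regression, here is a minimal sketch of one trained by gradient ascent on made-up data; the gradient is the familiar observed-minus-expected feature counts:

```python
import math

# Made-up training data: (feature vector, label)
data = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.1, 1.0], 0), ([0.0, 0.9], 0)]
w = [0.0, 0.0]

def p(x, w):
    """P(y=1 | x) under the current weights."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))

for _ in range(200):
    # Gradient of the log-likelihood: observed minus expected feature counts
    grad = [sum((y - p(x, w)) * x[j] for x, y in data) for j in range(2)]
    w = [wj + 0.5 * gj for wj, gj in zip(w, grad)]

print([round(p(x, w)) for x, _ in data])  # recovers the labels: [1, 1, 0, 0]
```

(A real maxent tagger adds regularization and many sparse features per context, but the update is the same in spirit.)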
|Mar 12||Deep learning
||Read sections 1, 3 and 4 of Yoshua Bengio, Learning Deep Architectures for AI. Other sources we are likely to discuss include: Lillian Lee, Measures of Distributional Similarity, Hinrich Schuetze, Word Space, Mikolov et al., Linguistic Regularities in Continuous Space Word Representations.||Recommended: the nice overview of representation learning in sections 1-4 of Bengio et al. Representation Learning: A Review and New Perspectives, and the background on the skip-gram approach in word2vec found in Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality. Background on Mikolov et al.'s Linguistic Regularities paper is in Mikolov et al. Recurrent neural network based language model.|
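The vector-offset idea behind Mikolov et al.'s Linguistic Regularities paper can be illustrated with a toy sketch; the 3-d "embeddings" below are invented purely for illustration (real models use hundreds of dimensions):

```python
import math

def cos(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Invented toy embeddings
emb = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.8, 0.1, 0.7],
    "man":   [0.3, 0.9, 0.0],
    "woman": [0.3, 0.3, 0.8],
    "apple": [0.9, 0.0, 0.0],
}

# king - man + woman ≈ queen: the offset approximately encodes the relation
target = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]
best = max((wd for wd in emb if wd not in ("king", "man", "woman")),
           key=lambda wd: cos(emb[wd], target))
print(best)  # queen
```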
|Mar 19||Spring Break
|March 26||Evaluation in NLP||Resnik and Lin, Evaluation of NLP Systems, Ch 11 of
Alex Clark, Chris Fox and Shalom Lappin, eds., The Handbook of Computational Linguistics and Natural
Language Processing (Blackwell).
|April 2||Structured prediction||
Ke Wu, Discriminative Sequence Labeling
Noah Smith's Structured prediction for NLP tutorial slides (ICML'09)
|For relevant background on semirings see Joshua Goodman, Semiring Parsing. For an example of very interesting recent work in this area particularly from an automata-theoretic angle, see Chris Dyer, A Formal Model of Ambiguity and its Applications in Machine Translation.|
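Goodman's point about semirings can be demonstrated with a tiny sketch: the same chain recursion computes the Viterbi score under (max, *) and the total path probability under (+, *). The toy emission and transition scores below are made up:

```python
import operator

def chain_score(obs, trans, plus, times, zero):
    """Score a state chain. obs is a list of {state: emission score} dicts;
    trans maps (s, t) to a transition score; plus/times/zero define the semiring."""
    prev = obs[0]
    for col in obs[1:]:
        cur = {}
        for t, emit in col.items():
            total = zero
            for s, sc in prev.items():
                total = plus(total, times(sc, trans.get((s, t), zero)))
            cur[t] = times(total, emit)
        prev = cur
    total = zero
    for sc in prev.values():
        total = plus(total, sc)
    return total

obs = [{"A": 0.6, "B": 0.4}, {"A": 0.5, "B": 0.5}]
trans = {("A", "A"): 0.7, ("A", "B"): 0.3, ("B", "A"): 0.4, ("B", "B"): 0.6}

best = chain_score(obs, trans, max, operator.mul, 0.0)            # Viterbi: (max, *)
total = chain_score(obs, trans, operator.add, operator.mul, 0.0)  # inside: (+, *)
print(best, total)  # best path ≈ 0.21, total ≈ 0.5
```

Swapping in other semirings (e.g. log probabilities, or Boolean recognition) reuses the identical recursion, which is the abstraction Goodman formalizes.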
|Apr 9||More on structured prediction: CRF, structured perceptron, structural SVM||Noah Smith, Linguistic Structure Prediction, esp. Sections 3.5.2-3.7. (The book is available online for UMD and many other university IP addresses.)||
|April 16||Guest lecture: Doug Oard on information retrieval||Douglas W. Oard, Jerome White, Jiaul Paik, Rashmi Sankepally and Aren Jansen, "The FIRE 2013 Question Answering for the Spoken Web Task", Fifth Forum for Information Retrieval Evaluation, 8 pages, New Delhi, India, 2013.|
|Apr 23||Machine translation
||M&S Ch 13 and Adam Lopez, Statistical Machine Translation,
ACM Computing Surveys 40(3), Article 8, 49 pages, August 2008.
|April 30||Machine translation continued
|May 7||Projects discussion
Return to course home page