This is the schedule of topics for Computational Linguistics II, Spring 2016.
In readings, "M&S" refers to Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in. Make sure to do your reading before the class where it is listed!
THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the online class discussions for "official" dates.
See CL Colloquium Talks for possible extra credit each week.
Class | Topic | Readings* | Assignments | Other |
---|---|---|---|---|
Jan 27 | Course administrivia, semester plan; some statistical NLP fundamentals | M&S Ch 1, 2.1.[1-9] (for review) | Assignment 1 | Language Log (the linguistics blog); Hal Daumé's NLP blog (an excellent blog: often technical machine learning material, but just as often of more general interest; be sure to read the comment threads too, since they're often excellent) |
Feb 3 | Lexical association measures and hypothesis testing | M&S Ch 5 | Assignment 2 | Dunning (1993) is a classic and valuable to read if you're trying to use mutual information or chi-squared and getting inflated values for low-frequency observations (a short illustrative sketch of this effect appears below the schedule). Moore (2004) is a less widely cited but very valuable discussion of how to judge the significance of rare events. A really important paper by Ioannidis about problems with statistical hypothesis testing is Why Most Published Research Findings Are False; for a very readable discussion see Trouble at the Lab, The Economist, Oct 19, 2013, and the really great accompanying video. (Show that one to your friends and family!) For an interesting response, see Most Published Research Findings Are False--But a Little Replication Goes a Long Way. Kilgarriff (2005) is a fun and contrarian read regarding the use of hypothesis testing methodology specifically in language research. Named entities represent another form of lexical association; named entity recognition is introduced in Jurafsky and Martin, Ch 22, and in Ch 7 of the NLTK book. |
Feb 10 | Information theory | M&S Ch 2.2, M&S Ch 6; Piantadosi et al. (2011), Word lengths are optimized for efficient communication; Jaeger (2010), Redundancy and reduction: Speakers manage syntactic information density | Assignment 3 | Cover and Thomas (1991) is a great, highly readable introduction to information theory; the first few chapters go into many of the concepts from this lecture with greater rigor but a lot of clarity. See also Maurits et al. (2010), Why are some word orders more common than others? A uniform information density account, as well as the syllabus for a 2009 seminar taught by Dan Jurafsky and Michael Ramscar, Information-Theoretic Models of Language and Cognition, which looks as if it was awesome. |
Feb 17 | Expectation Maximization | Skim M&S Ch 9-10, Chapter 6 of Lin and Dyer. Read my EM recipe discussion. | Assignment 4 | |
Feb 24 | Neural network language models (Lecture by Yogarshi Vyas) | Bengio et al. (2003); Mikolov et al. (2010); Mikolov et al. (2013) | Before next class, review Yogarshi's slides and do the reading. | |
March 2 | More on neural network models, deep learning, embeddings | Yoav Goldberg, A Primer on Neural Network Models for Natural Language Processing. Read with an emphasis on Sections 1-4 (largely covered in last week's lecture) and Section 5 (embeddings); also look over Sections 10-11 (RNNs) and 12 (recursive NNs), and the Wikipedia page on autoencoders. | Assignment 5, worth 50% of a typical homework assignment: do one of the mini-projects from Assignment 4. (If you already did one of these for the extra credit, you must do the other one!) Note that Assignment 5 is not an extra-credit assignment. | Recommended: Sections 1, 3, and 4 of Yoshua Bengio, Learning Deep Architectures for AI. Also recommended: the nice overview of representation learning in Sections 1-4 of Bengio et al., Representation Learning: A Review and New Perspectives. Other useful background reading for a broader perspective: Lillian Lee, Measures of Distributional Similarity; Hinrich Schuetze, Word Space; Mikolov et al., Linguistic Regularities in Continuous Space Word Representations. |
March 9 | Bayesian graphical modeling, Gibbs sampling | Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated. M. Steyvers and T. Griffiths (2007), Latent Semantic Analysis: A Road to Meaning, and/or review the CL1 topic modeling lecture (notes, video). | | |
Mar 16 | Spring Break | Have fun! | | |
March 23 | Parsing | M&S Ch 11 (esp. pp. 381-388) and Ch 12 (esp. pp. 408-423, 448-455) | Assignment 6 | A really nice article introducing parsing as inference is Shieber et al., Principles and Implementation of Deductive Parsing (see Sections 1-3), with a significant advance by Joshua Goodman, Semiring Parsing. |
Mar 30 | Hal Daumé, guest lecture | A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, Stéphane Ross, Geoff J. Gordon, and J. Andrew Bagnell, AISTATS 2011. | Take-home midterm handed out | |
April 6 | Structured prediction | Resources: Noah Smith, Linguistic Structure Prediction; Joshua Goodman, Semiring Parsing | Project handed out. Project plans due next Wednesday. | |
April 13 | Evaluation | No class: Philip home sick. Please use the class time to meet about your projects, and please read Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook. | ||
April 20 | More on structured prediction; Machine translation | Ke Wu, Discriminative Sequence Labeling; Koehn, Statistical Machine Translation; Chiang, A Hierarchical Phrase-Based Model for Statistical Machine Translation | | Ratnaparkhi (1996), A Maximum Entropy Model for Part of Speech Tagging, or, if you want a little more detail, Ratnaparkhi (1997), A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Ratnaparkhi 1996 began the popularization of maxent in NLP. Noah Smith's (2004) Log-Linear Models is a nice alternative introduction expressed in a vocabulary that is more consistent with current work. Also of interest: Montreal's LISA laboratory is an epicenter for work on neural machine translation. |
Apr 27 | Framing (with guest Sarah Oates) | Entman (2003), Cascading Activation: Contesting the White House's Frame After 9/11; Oates, Framing and Agenda-Setting Theory: Widening the Linguistic Lens; Greene, Stephan, and Philip Resnik, More than Words: Syntactic Packaging and Implicit Sentiment, NAACL (2009); Nguyen et al., Tea Party in the House: A Hierarchical Ideal Point Topic Model and Its Application to Republican Legislators in the 112th Congress, ACL 2015. | I have been called for jury duty on April 27. Prof. Sarah Oates of the Journalism School has kindly agreed to do a guest lecture, which will be either longer (if jury duty keeps me from class) or shorter (if it doesn't). In the latter case, she'll speak from the social science perspective in the first part of class, and I will talk about computational approaches to these issues in the second part. Either way, you will be required to write a short summary, which will count as part of your homework grade. | |
May 4 | More on machine translation | M&S Ch 13 and Adam Lopez, Statistical Machine Translation, ACM Computing Surveys 40(3), Article 8, pages 1-49, August 2008. | | |
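
The Feb 3 notes mention getting inflated association scores for low-frequency observations; here is a small, purely illustrative sketch of that effect. The corpus size, counts, and function names below are invented for illustration (they are not from any course assignment or dataset): pointwise mutual information scores a pair seen only once higher than a genuinely frequent collocation, while a Dunning-style log-likelihood ratio (G²) ranks them the other way around.

```python
# Purely illustrative sketch: invented counts, not course data.
import math

def pmi(c_xy, c_x, c_y, n):
    """Pointwise mutual information: log2( P(x,y) / (P(x) P(y)) )."""
    return math.log2((c_xy / n) / ((c_x / n) * (c_y / n)))

def llr(c_xy, c_x, c_y, n):
    """Dunning-style log-likelihood ratio (G^2) over the 2x2 contingency table."""
    observed = [c_xy,                  # x and y together
                c_x - c_xy,            # x without y
                c_y - c_xy,            # y without x
                n - c_x - c_y + c_xy]  # neither
    expected = [c_x * c_y / n,
                c_x * (n - c_y) / n,
                (n - c_x) * c_y / n,
                (n - c_x) * (n - c_y) / n]
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

N = 1000000  # hypothetical corpus size

# A frequent, genuinely associated pair: PMI is about 6.6, G^2 is about 743.
print(pmi(100, 1000, 1000, N), llr(100, 1000, 1000, N))

# A pair seen exactly once: PMI jumps to about 13.3, but G^2 is only about 17.
print(pmi(1, 10, 10, N), llr(1, 10, 10, N))
```

The point is not that PMI is useless, just that its ranking is dominated by rare events; Dunning (1993) and Moore (2004), linked above, discuss the issue properly.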