This is the schedule of topics for Computational Linguistics II, Spring 2017.
In readings, "M&S" refers to Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in. Make sure to do your reading before the class where it is listed!
THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the online class discussions for "official" dates.
See CL Colloquium Talks for possible extra credit each week.
|Jan 25||Course organization, semester plan; knowledge-driven and data-driven NLP
||M&S Ch 1, 2.1.[1-9] (for review)
|Language Log (the linguistics blog); Hal Daumé's NLP blog (an excellent blog: often technical machine learning material, but just as often of more general interest; read the comment threads too, because they are often excellent)|
|Feb 1||Lexical association measures and hypothesis testing
||M&S Ch 5
Dunning (1993) is a classic and valuable to read if you're trying to use mutual information or chi-squared and getting inflated values for low-frequency observations.
Moore (2004) is a less widely cited but very valuable discussion about how to judge the significance of rare events.
A really important paper by Ioannidis about problems with statistical hypothesis testing is Why Most Published Research Findings Are False; for a very readable discussion see Trouble at the Lab, The Economist, Oct 19, 2013, and the really great accompanying video. (Show that one to your friends and family!) For an interesting response, see Most Published Research Findings Are False--But a Little Replication Goes a Long Way.
Kilgarriff (2005) is a fun and contrarian read regarding the use of hypothesis testing methodology specifically in language research.
|Feb 8||Information theory
||M&S Ch 2.2, M&S Ch 6
Piantadosi et al. (2011), Word lengths are optimized for efficient communication; Jaeger (2010), Redundancy and reduction: Speakers manage syntactic information density
Cover and Thomas (1991) is a great, highly readable introduction to information theory. The first few chapters go into many concepts from this lecture with greater rigor but a lot of clarity.
Maurits et al. (2010), Why are some word orders more common than others? A uniform information density account. See also the syllabus for a 2009 seminar taught by Dan Jurafsky and Michael Ramscar, Information-Theoretic Models of Language and Cognition, which looks as if it was awesome.
Roger Levy provides a formal proof that uniform information density minimizes the "difficulty" of interpreting utterances. The proof assumes that, for any given word i in an utterance, the difficulty of processing it is some power k of its surprisal with k > 1.
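To make the assumption above concrete, here is a minimal numerical sketch (not Levy's proof). It assumes per-word difficulty is exactly surprisal raised to the power k and compares two hypothetical four-word utterances that carry the same total surprisal; the surprisal values and the function name `total_difficulty` are made up for illustration.

```python
# Illustrative sketch only: the surprisal values below are invented,
# and difficulty = surprisal ** k is the assumption stated in the text.

def total_difficulty(surprisals, k):
    """Sum per-word difficulty, assuming difficulty = surprisal ** k."""
    return sum(s ** k for s in surprisals)

# Two hypothetical 4-word utterances with the same total surprisal (8 bits).
uniform = [2.0, 2.0, 2.0, 2.0]   # information spread evenly
peaked = [0.5, 0.5, 0.5, 6.5]    # information concentrated on one word

for k in (1, 2, 3):
    print(k, total_difficulty(uniform, k), total_difficulty(peaked, k))
```

With k = 1 the two totals are equal, since only the total surprisal matters. With k > 1 the function x^k is convex, so by Jensen's inequality the evenly spread utterance has the lower total, which is the intuition behind the uniform information density result.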
|Feb 15||HMM review and Expectation Maximization
||Skim M&S Ch 9-10 and Chapter 6 of Lin and Dyer. Read my EM recipe discussion.
Recommended reading (and code to look at!): Dirk Hovy's Interactive tutorial on the Forward-Backward Expectation Maximization algorithm. Note that although his IPython notebook is designed to be interactive, you can also simply read it.
|Feb 22||Bayesian graphical modeling, Gibbs sampling||Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated.||Assignment 5: Do the readings for next week, about which there will be a short in-class quiz|
|March 1||Topic models
||M. Steyvers and T. Griffiths (2007), Probabilistic Topic Models, in Latent Semantic Analysis: A Road to Meaning; and/or review the CL1 topic modeling lecture (notes, video).
|March 8||Context-free parsing||M&S Ch 11 (esp. pp. 381-388) and Ch 12 (esp. pp. 408-423, 448-455)||Assignment 7. This is a lighter assignment worth 50% of a regular homework.||A really nice article introducing parsing as inference is Shieber et al., Principles and Implementation of Deductive Parsing (see Sections 1-3), with a significant advance by Joshua Goodman, Semiring Parsing.|
|March 15||Guest lecture: Allyson Ettinger, Psycholinguistics for Computational Linguists
||No required readings||Take-home midterm handed out, due 11:59pm March 18|
|March 22||Spring Break
|March 29||More on parsing; Evaluation||
Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook.
Cohen and Howe, How Evaluation Guides AI Research
|April 5||Neural network models, deep learning, embeddings (tentative)
|Yoav Goldberg, A Primer on Neural Network Models for Natural Language Processing. Read with an emphasis on Sections 1-4 and Section 5 (embeddings); also look over Sections 10-11 (RNNs) and Section 12 (recursive NNs), as well as the Wikipedia page on autoencoders.||Project handed out. Project plans due in one week.||
Recommended: Sections 1, 3 and 4 of Yoshua Bengio, Learning Deep Architectures for AI. Also recommended: the nice overview of representation learning in sections 1-4 of Bengio et al. Representation Learning: A Review and New Perspectives.
Other useful background reading for broader perspective: Lillian Lee, Measures of Distributional Similarity; Hinrich Schuetze, Word Space; Mikolov et al., Linguistic Regularities in Continuous Space Word Representations.
|April 12||Guest lecture: Han-Chin Shing, Advanced topics in vector space models||Necessary background: Turney and Pantel (2010), From Frequency to Meaning: Vector Space Models of Semantics||Recommended: Faruqui et al. (2015), Retrofitting Word Vectors to Semantic Lexicons; Fyshe et al. (2015), A Compositional and Interpretable Semantic Space; Labutov and Lipson (2013), Re-embedding Words|
|April 19||Machine translation
||Koehn, Statistical Machine Translation; Chiang, A Hierarchical Phrase-Based Model for Statistical Machine Translation; M&S Ch 13; and Adam Lopez, Statistical Machine Translation, ACM Computing Surveys 40(3), Article 8, pages 1-49, August 2008.||
|April 26||More on machine translation; structured prediction
||Noah Smith's (2004) Log-Linear Models (selections); Ke Wu, Discriminative Sequence Labeling||
Historical background: Ratnaparkhi (1996), A Maximum Entropy Model for Part of Speech Tagging, or, if you want a little more detail, Ratnaparkhi (1997), A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Ratnaparkhi (1996) began the popularization of maxent in NLP.
For relevant background on semirings see Joshua Goodman, Semiring Parsing. For an example of very interesting recent work in this area particularly from an automata-theoretic angle, see Chris Dyer, A Formal Model of Ambiguity and its Applications in Machine Translation.
Also potentially of interest:
Montreal's LISA laboratory is an epicenter for work on neural machine translation.
|May 3||Relating neural MT and structured prediction||Kyunghyun Cho, Natural Language Understanding with Distributed Representation, Chapter 6; Noah Smith's (2004) Log-Linear Models (emphasis on model, not estimation)|
|May 10||Text analysis in computational social science||Project is due 11:59pm May 17|