This is the schedule of topics for Computational Linguistics II, Spring 2017.
In readings, "M&S" refers to Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in. Make sure to do your reading before the class where it is listed!
THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep
an eye on the online class discussions for "official"
dates.
See CL Colloquium Talks for possible extra credit each week.
Class  Topic  Readings*  Assignments  Other 

Jan 25  Course organization, semester plan; knowledge-driven and data-driven NLP 
M&S Ch 1, 2.1.[1-9] (for review) 
Assignment 1

Language Log (the linguistics blog); Hal Daumé's NLP blog (an excellent blog, often technical machine learning material but just as often of more general interest; read the comment threads too, since they are often excellent) 
Feb 1  Lexical association measures and hypothesis testing 
M&S Ch 5 
Assignment 2

Dunning (1993) is a classic and valuable to read if you're trying to use mutual information or chi-squared and getting inflated values for low-frequency observations.
Moore (2004) is a less widely cited but very valuable discussion about how to judge the significance of rare events.
A really important paper by Ioannidis about problems with statistical hypothesis testing is Why Most Published Research Findings Are False; for a very readable discussion see Trouble at the Lab, The Economist, Oct 19, 2013 and the really great accompanying video. (Show that one to your friends and family!) For an interesting response, see Most Published Research Findings Are False - But a Little Replication Goes a Long Way. Kilgarriff (2005) is a fun and contrarian read regarding the use of hypothesis testing methodology specifically in language research. Named entities represent another form of lexical association. Named entity recognition is introduced in Jurafsky and Martin, Ch 22 and Ch 7 of the NLTK book. 
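To make the Dunning point concrete, here is a minimal sketch (not taken from the readings) comparing pointwise mutual information with the log-likelihood ratio statistic G^2 on a 2x2 contingency table. The function names and all of the counts are made up for illustration: one word pair is seen once in a million bigrams, the other a hundred times.

```python
import math

def pmi(c12, c1, c2, n):
    """Pointwise mutual information (in bits) of a word pair.
    c12 = pair count, c1 and c2 = counts of each word, n = total bigrams."""
    return math.log2(c12 * n / (c1 * c2))

def g2(c12, c1, c2, n):
    """Log-likelihood ratio statistic 2 * sum(O * ln(O / E)) over the 2x2 table."""
    observed = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
    rows = [c1, n - c1]
    cols = [c2, n - c2]
    expected = [rows[0] * cols[0] / n, rows[0] * cols[1] / n,
                rows[1] * cols[0] / n, rows[1] * cols[1] / n]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)

n = 1_000_000
rare = (1, 1, 1, n)            # hypothetical pair seen exactly once
frequent = (100, 1000, 1000, n)  # hypothetical pair seen 100 times

print("PMI:", pmi(*rare), pmi(*frequent))
print("G^2:", g2(*rare), g2(*frequent))
```

With these invented counts, PMI ranks the one-off pair above the well-attested one, while G^2 ranks them the other way around, which is the low-frequency inflation the reading is concerned with.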
Feb 8  Information theory 
M&S Ch 2.2, M&S Ch 6; Piantadosi et al. (2011), Word lengths are optimized for efficient communication; Jaeger (2010), Redundancy and reduction: Speakers manage syntactic information density 
Assignment 3 
Cover and Thomas (1991) is a great, highly readable introduction to information theory. The first few chapters go into many concepts from this lecture with greater rigor but a lot of clarity. Maurits et al. (2010), Why are some word orders more common than others? A uniform information density account. See also the syllabus for a 2009 seminar taught by Dan Jurafsky and Michael Ramscar, Information-Theoretic Models of Language and Cognition, which looks as if it was awesome. Roger Levy provides a formal proof that uniform information density minimizes the "difficulty" of interpreting utterances. The proof assumes that, for any given word i in an utterance, the difficulty of processing it is some power k of its surprisal with k > 1. 
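A small numerical illustration of the uniform information density claim, offered as a sketch rather than as Levy's proof: if per-word difficulty is surprisal raised to a power k > 1, then for a fixed total amount of information, spreading it evenly gives the lowest total difficulty (this follows from the convexity of x^k). The function name and the surprisal values below are invented for the example.

```python
def total_difficulty(surprisals, k):
    """Total difficulty, assuming each word costs surprisal ** k."""
    return sum(s ** k for s in surprisals)

# Two hypothetical ways to spread 12 bits of information over 4 words.
uniform = [3.0, 3.0, 3.0, 3.0]
peaked = [1.0, 1.0, 1.0, 9.0]

k = 2  # any exponent greater than 1 gives the same ordering
uniform_cost = total_difficulty(uniform, k)
peaked_cost = total_difficulty(peaked, k)
print(uniform_cost, peaked_cost)
```

With k = 1 the two utterances cost the same, since only the total surprisal matters; the advantage of the uniform spread appears only once k exceeds 1.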
Feb 15  HMM review and Expectation Maximization 
Skim M&S Ch 9-10, Chapter 6 of Lin and Dyer. Read my EM recipe discussion. 
Assignment 4

Recommended reading (and code to look at!): Dirk Hovy's Interactive tutorial on the Forward-Backward Expectation Maximization algorithm. Note that although his IPython notebook is designed to be interactive, you can also simply read it.

Feb 22  Bayesian graphical modeling, Gibbs sampling 
Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated.
M. Steyvers and T. Griffiths (2007), Latent Semantic Analysis: A Road to Meaning and/or review the CL1 topic modeling lecture (notes, video).

Assignment 5: Do the readings for next week, about which there will be a short in-class quiz  
March 1  Topic models

M. Steyvers and T. Griffiths (2007), Latent Semantic Analysis: A Road to Meaning and/or review the CL1 topic modeling lecture (notes, video).

Assignment 6  
March 8  Context-free parsing  M&S Ch 11 (esp. pp. 381-388) and Ch 12 (esp. pp. 408-423, 448-455)  Assignment 7. This is a lighter assignment worth 50% of a regular homework.  A really nice article introducing parsing as inference is Shieber et al., Principles and Implementation of Deductive Parsing (see Sections 1-3), with a significant advance by Joshua Goodman, Semiring Parsing. 
Mar 15  Guest lecture. Allyson Ettinger: Psycholinguistics for Computational Linguists 
No required readings  Take-home midterm handed out, due 11:59pm March 18  
Mar 22  Spring Break 
Have fun!  
March 29  More on parsing; Evaluation 
Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook. Cohen and Howe, How Evaluation Guides AI Research 
Assignment 8  
April 5  Neural network models, deep learning, embeddings (tentative)

Yoav Goldberg, A Primer on Neural Network Models for Natural Language Processing. Read with an emphasis on Sections 1-4, and Section 5 (embeddings); also look over 10-11 (RNNs) and 12 (recursive NNs) and the Wikipedia page on autoencoders.  Project handed out. Project plans due in one week. 
Recommended: Sections 1, 3 and 4 of Yoshua Bengio, Learning Deep Architectures for AI. Also recommended: the nice overview of representation learning in sections 1-4 of Bengio et al., Representation Learning: A Review and New Perspectives.
Other useful background reading for broader perspective: Lillian Lee, Measures of Distributional Similarity, Hinrich Schuetze, Word Space, Mikolov et al., Linguistic Regularities in Continuous Space Word Representations. 
April 12  Guest lecture: Han-Chin Shing, Advanced topics in vector space models  Necessary background: Turney and Pantel (2010), From Frequency to Meaning: Vector Space Models of Semantics  Recommended: Faruqui et al. (2015), Retrofitting Word Vectors to Semantic Lexicons; Fyshe et al. (2015), A Compositional and Interpretable Semantic Space; Labutov and Lipson (2013), Re-embedding Words  
April 19  Machine translation 
Koehn, Statistical Machine Translation; Chiang, A Hierarchical Phrase-Based Model for Statistical Machine Translation; M&S Ch 13 and Adam Lopez, Statistical Machine Translation, ACM Computing Surveys 40(3), Article 8, pages 1-49, August 2008. 


April 26  More on machine translation; structured prediction 
Noah Smith's (2004) Log-Linear Models (selections); Ke Wu, Discriminative Sequence Labeling. 
Historical background: Ratnaparkhi (1996), A Maximum Entropy Model for Part of Speech Tagging, or, if you want a little more detail, Ratnaparkhi (1997), A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Ratnaparkhi 1996 began the popularization of maxent in NLP.
For relevant background on semirings see Joshua Goodman, Semiring Parsing. For an example of very interesting recent work in this area, particularly from an automata-theoretic angle, see Chris Dyer, A Formal Model of Ambiguity and its Applications in Machine Translation. Also potentially of interest:
Montreal's LISA laboratory is an epicenter for work on neural machine translation. 

May 3  Relating neural MT and structured prediction  Kyunghyun Cho, Natural Language Understanding with Distributed Representation, Chapter 6; Noah Smith's (2004) Log-Linear Models (emphasis on model, not estimation)  
May 10  Text analysis in computational social science  Project is due 11:59pm May 17 