This is the schedule of topics for Computational Linguistics II, Spring 2018.
In readings, "M&S" refers to Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in. Make sure to do your reading before the class where it is listed!
THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep
an eye on the online class discussions for "official"
dates.
See CL Colloquium Talks for possible extra credit each week.
Class  Topic 
Readings*  Assignments  Other 

Jan 24  Course organization, semester plan; knowledge-driven and data-driven NLP 
M&S Ch 1, 2.1.[1-9] (for review) 
Assignment 1

Language Log (the linguistics blog); Hal Daumé's NLP blog (an excellent blog, often about technical machine learning but just as often of more general interest; read the comment threads too, since they are often excellent) 
Jan 31  Lexical association measures and hypothesis testing 
M&S Ch 5 
Assignment 2

Dunning (1993) is a classic and valuable to read if you're trying to use mutual information or chi-squared and getting inflated values for low-frequency observations.
Moore (2004) is a less widely cited but very valuable discussion about how to judge the significance of rare events.
A really important paper by Ioannidis about problems with statistical hypothesis testing is Why Most Published Research Findings Are False; for a very readable discussion see Trouble at the Lab, The Economist, Oct 19, 2013 and the really great accompanying video. (Show that one to your friends and family!) For an interesting response, see Most Published Research Findings Are False - But a Little Replication Goes a Long Way. Kilgarriff (2005) is a fun and contrarian read regarding the use of hypothesis testing methodology specifically in language research. Named entities represent another form of lexical association. Named entity recognition is introduced in Jurafsky and Martin, Ch 22 and Ch 7 of the NLTK book. 
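To make the point about inflated values for low-frequency observations concrete, here is a minimal Python sketch comparing pointwise mutual information with the log-likelihood ratio statistic (G^2) discussed by Dunning. The function names and all counts are invented for illustration; they do not come from the readings or from any particular corpus.

```python
import math

def pmi(c_xy, c_x, c_y, n):
    """Pointwise mutual information (base 2) of a word pair from raw counts.

    c_xy: count of the pair, c_x and c_y: counts of each word,
    n: total number of pairs observed. Assumes c_xy > 0.
    """
    return math.log2((c_xy * n) / (c_x * c_y))

def g2(c_xy, c_x, c_y, n):
    """Log-likelihood ratio statistic G^2 for the 2x2 contingency table."""
    observed = [
        c_xy,                  # x together with y
        c_x - c_xy,            # x without y
        c_y - c_xy,            # y without x
        n - c_x - c_y + c_xy,  # neither x nor y
    ]
    row_totals = [c_x, c_x, n - c_x, n - c_x]
    col_totals = [c_y, n - c_y, c_y, n - c_y]
    total = 0.0
    for o, r, c in zip(observed, row_totals, col_totals):
        if o > 0:  # a cell with zero observations contributes nothing
            # expected count under independence is r * c / n
            total += o * math.log(o * n / (r * c))
    return 2.0 * total

# Hypothetical counts: a pair seen once (both words occur only once)
# versus a pair seen 100 times, in a corpus of one million pairs.
rare = (1, 1, 1, 1_000_000)
frequent = (100, 200, 300, 1_000_000)

for name, counts in [("rare", rare), ("frequent", frequent)]:
    print(name, "PMI:", round(pmi(*counts), 2), "G^2:", round(g2(*counts), 2))
```

With these made-up counts, PMI ranks the pair seen once above the pair seen 100 times, while G^2 ranks them the other way around, which is the behavior the Dunning reading addresses.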
Feb 7  Information theory 
M&S Ch 2.2, M&S Ch 6; Piantadosi et al. (2011), Word lengths are optimized for efficient communication; Jaeger (2010), Redundancy and reduction: Speakers manage syntactic information density 
Assignment 3 
Cover and Thomas (1991) is a great, highly readable introduction to information theory. The first few chapters go into many concepts from this lecture with greater rigor but a lot of clarity. Maurits et al. (2010), Why are some word orders more common than others? A uniform information density account. See also the syllabus for a 2009 seminar taught by Dan Jurafsky and Michael Ramscar, Information-Theoretic Models of Language and Cognition, which looks as if it was awesome. Roger Levy provides a formal proof that uniform information density minimizes the "difficulty" of interpreting utterances. The proof assumes that, for any given word i in an utterance, the difficulty of processing it is some power k of its surprisal with k > 1. 
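As a numerical illustration of the assumption behind the uniform information density result (not the proof itself), the Python sketch below compares two hypothetical utterances with the same total surprisal, one spread evenly across words and one concentrated on a single word. The function name and the surprisal values are made up for illustration. When k > 1 the function x^k is convex, so by Jensen's inequality the even spread should give the smaller total.

```python
def total_difficulty(surprisals, k):
    """Sum of per-word difficulties, each modeled as surprisal ** k."""
    return sum(s ** k for s in surprisals)

# Two hypothetical four-word utterances, each with 8 bits of total surprisal.
uniform = [2.0, 2.0, 2.0, 2.0]  # information spread evenly
peaked = [0.5, 0.5, 0.5, 6.5]   # information concentrated on the last word

for k in (1, 1.5, 2, 3):
    print(
        "k =", k,
        "uniform:", total_difficulty(uniform, k),
        "peaked:", total_difficulty(peaked, k),
    )
```

At k = 1 the two totals are equal, so the distribution of information makes no difference; for each k > 1 shown here the uniform utterance has the lower total difficulty.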
Feb 14  HMMs and Expectation Maximization 
Skim M&S Ch 9-10, Chapter 6 of Lin and Dyer. Read
my EM recipe discussion. 
Assignment 4

Recommended reading (and code to look at!): Dirk Hovy's Interactive tutorial on the Forward-Backward Expectation Maximization algorithm. Note that although his iPython notebook is designed to be interactive, you can also simply read it.

Feb 21  Guest lecture (Han-Chin Shing): Reduced-dimensionality representations for words  Efficient Estimation of Word Representations in Vector Space (with a focus on the network architecture of CBOW and Skip-Gram); Distributed Representations of Words and Phrases and their Compositionality (with a focus on hierarchical softmax and negative sampling); Deep Learning with PyTorch: A 60 Minute Blitz (a really good tutorial for PyTorch)  Assignment 5  
Feb 28  Reduced-dimensionality representations for documents: Gibbs sampling and topic models 
Read Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated;
watch Jordan Boyd-Graber's 2013 CL1 topic modeling lecture (20 minutes, slides/notes available here).

Assignment 6 
Recommended reading: Steyvers and Griffiths (2007), Probabilistic Topic Models, in Latent Semantic Analysis: A Road to Meaning.

March 7  Context-free parsing  M&S Ch 11 (esp. pp. 381-388) and Ch 12 (esp. pp. 408-423, 448-455)  Assignment 7  A really nice article introducing parsing as inference is Shieber et al., Principles and Implementation of Deductive Parsing (see Sections 1-3), with a significant advance by Joshua Goodman, Semiring Parsing. 
Mar 14  Evaluation 
Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook. Cohen and Howe, How Evaluation Guides AI Research 
See Pereira, Formal grammar and information theory: together again? for discussion of probabilistic grammar and the argument made by Chomsky involving the sentences Colorless green ideas sleep furiously and Furiously sleep ideas green colorless.  
Mar 21  Spring Break 
Have fun!  
Mar 28  Guest lecture (Joe Barrow): Sequence and seq2seq models  No required readings  Take-home midterm handed out, due 11:59pm Sunday April 1.  
April 4  Deep learning and linguistic structure: a broader perspective

Yoav Goldberg, A Primer on Neural Network Models for Natural Language Processing. Read with an emphasis on Sections 1-4, and Section 5 (embeddings); also look over 10-11 (RNNs) and 12 (recursive NNs) and the Wikipedia page on autoencoders.  Project handed out. Project plans due in one week. 
Recommended: Sections 1, 3 and 4 of Yoshua Bengio, Learning Deep Architectures for AI. Also recommended: the nice overview of representation learning in Sections 1-4 of Bengio et al., Representation Learning: A Review and New Perspectives.
Other useful background reading for broader perspective: Lillian Lee, Measures of Distributional Similarity, Hinrich Schuetze, Word Space, Mikolov et al., Linguistic Regularities in Continuous Space Word Representations. 
April 11  Machine translation 
Koehn, Statistical Machine Translation; Chiang, A Hierarchical Phrase-Based Model for Statistical Machine Translation; M&S Ch 13 and Adam Lopez, Statistical Machine Translation, ACM Computing Surveys 40(3), Article 8, pages 1-49, August 2008; Wu et al. (2016), Google's neural machine translation system: Bridging the gap between human and machine translation.  Work on your project from here on out!  
April 18  Text analysis in computational social science 


April 25  Text analysis in computational social science, continued  
May 2  Structured prediction 
Reference material in: Noah Smith, Structured Prediction for Natural Language Processing; Ke Wu, Discriminative Sequence Labeling 
Some useful historical background: Ratnaparkhi (1996), A Maximum Entropy Model for Part of Speech Tagging, or, if you want a little more detail, Ratnaparkhi (1997), A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Ratnaparkhi 1996 began the popularization of maxent in NLP.
For relevant background on semirings see Joshua Goodman, Semiring Parsing. For an example of very interesting recent work in this area, particularly from an automata-theoretic angle, see Chris Dyer, A Formal Model of Ambiguity and its Applications in Machine Translation as well as cdec. Also potentially of interest:


May 9  Tentative: Natural language "understanding"  Bill MacCartney's excellent tutorial on semantic parsing  Project is due 11:59pm ET May 18 