Schedule of Topics

This is the schedule of topics for Computational Linguistics II, Spring 2018.

In readings, "M&S" refers to Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in. Make sure to do your reading before the class where it is listed!

In addition, some topic areas may take longer than expected, so keep an eye on the online class discussions for "official" dates.

See CL Colloquium Talks for possible extra credit each week.

Class Topic
Readings* Assignments Other
Jan 24 Course organization, semester plan; knowledge-driven and data-driven NLP
M&S Ch 1, 2.1.[1-9] (for review)
Assignment 1

Language Log (the linguistics blog); Hal Daumé's NLP blog (an excellent blog, often about technical machine learning but just as often of more general interest; read the comment threads too, because they are often excellent)
Jan 31 Lexical association measures and hypothesis testing
M&S Ch 5
Assignment 2

Dunning (1993) is a classic and valuable to read if you're trying to use mutual information or chi-squared and getting inflated values for low-frequency observations. Moore (2004) is a less widely cited but very valuable discussion about how to judge the significance of rare events.
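
To make the low-frequency problem concrete, here is a minimal sketch that compares pointwise mutual information with a Dunning-style log-likelihood ratio statistic on a 2x2 contingency table. The function names and the counts are invented for illustration and are not taken from the assignment or from Dunning's paper.

```python
import math

def pmi(c12, c1, c2, n):
    """Pointwise mutual information (log base 2) for a word pair.

    c12: count of the pair, c1 and c2: counts of each word, n: total pairs.
    """
    return math.log2((c12 * n) / (c1 * c2))

def g2(c12, c1, c2, n):
    """Log-likelihood ratio statistic G^2 for the 2x2 table of a word pair."""
    observed = [
        c12,                # w1 with w2
        c1 - c12,           # w1 without w2
        c2 - c12,           # w2 without w1
        n - c1 - c2 + c12,  # neither word
    ]
    row_totals = [c1, c1, n - c1, n - c1]
    col_totals = [c2, n - c2, c2, n - c2]
    total = 0.0
    for o, r, c in zip(observed, row_totals, col_totals):
        expected = r * c / n
        if o > 0:  # empty cells contribute nothing to the sum
            total += o * math.log(o / expected)
    return 2.0 * total

# Hypothetical counts: a pair seen once, and a pair seen 1000 times.
rare = (1, 1, 1, 1_000_000)
frequent = (1000, 2000, 2000, 1_000_000)

print("PMI  rare:", pmi(*rare), " frequent:", pmi(*frequent))
print("G^2  rare:", g2(*rare), " frequent:", g2(*frequent))
```

With these made-up counts, PMI ranks the pair seen only once above the pair seen 1000 times, while G^2 ranks them the other way around, which is the kind of inflation the paragraph above refers to.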

A really important paper by Ioannidis about problems with statistical hypothesis testing is Why Most Published Research Findings Are False; for a very readable discussion see Trouble at the Lab, The Economist, Oct 19, 2013 and the really great accompanying video. (Show that one to your friends and family!) For an interesting response, see Most Published Research Findings Are False--But a Little Replication Goes a Long Way.

Kilgarriff (2005) is a fun and contrarian read regarding the use of hypothesis testing methodology specifically in language research.

Named entities represent another form of lexical association. Named entity recognition is introduced in Jurafsky and Martin, Ch 22 and Ch 7 of the NLTK book.

Feb 7 Information theory
M&S Ch 2.2, M&S Ch 6

Piantadosi et al. (2011), Word lengths are optimized for efficient communication; Jaeger (2010), Redundancy and reduction: Speakers manage syntactic information density

Assignment 3

Cover and Thomas (1991) is a great, highly readable introduction to information theory. The first few chapters go into many concepts from this lecture with greater rigor but a lot of clarity.

Maurits et al. (2010), Why are some word orders more common than others? A uniform information density account. See also the syllabus for a 2009 seminar taught by Dan Jurafsky and Michael Ramscar, Information-Theoretic Models of Language and Cognition, which looks as if it was awesome.

Roger Levy provides a formal proof that uniform information density minimizes the "difficulty" of interpreting utterances. The proof assumes that, for any given word i in an utterance, the difficulty of processing it is some power k of its surprisal with k > 1.
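
The intuition behind that assumption can be checked numerically. The sketch below is only an illustration of the convexity argument, not Levy's proof: it assumes total difficulty is the sum of per-word surprisals raised to the power k, and it compares two invented surprisal profiles with the same total.

```python
def total_difficulty(surprisals, k):
    """Sum of per-word difficulties, each taken to be surprisal ** k."""
    return sum(s ** k for s in surprisals)

# Two hypothetical four-word utterances carrying the same total surprisal (8 bits).
uniform = [2.0, 2.0, 2.0, 2.0]
peaked = [0.5, 0.5, 0.5, 6.5]

for k in (1, 2, 3):
    print(k, total_difficulty(uniform, k), total_difficulty(peaked, k))
```

For k = 1 the two profiles are equally difficult, so uniformity buys nothing; for k > 1 the function s ** k is convex, and the evenly spread profile has the lower total.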

Feb 14 HMMs and Expectation Maximization
Skim M&S Ch 9-10, Chapter 6 of Lin and Dyer. Read my EM recipe discussion.
Assignment 4

Recommended reading (and code to look at!): Dirk Hovy's Interactive tutorial on the Forward-Backward Expectation Maximization algorithm. Note that although his IPython notebook is designed to be interactive, you can also simply read it.
Feb 21 Guest lecture (Han-Chin Shing): Reduced-dimensionality representations for words
Efficient Estimation of Word Representations in Vector Space (with a focus on the network architecture of CBOW and SkipGram); Distributed Representations of Words and Phrases and their Compositionality (with a focus on hierarchical softmax and negative sampling); Deep Learning with PyTorch: A 60 Minute Blitz (a really good tutorial for PyTorch)
Assignment 5
Feb 28 Reduced-dimensionality representations for documents: Gibbs sampling and topic models
Read Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated; watch Jordan Boyd-Graber's 2013 CL1 topic modeling lecture (20 minutes; slides/notes available here).
Assignment 6

Recommended reading: Steyvers and Griffiths (2007), Latent Semantic Analysis: A Road to Meaning.

March 7 Context-free parsing
M&S Ch 11 (esp. pp. 381-388) and Ch 12 (esp. pp. 408-423, 448-455)
Assignment 7

A really nice article introducing parsing as inference is Shieber et al., Principles and Implementation of Deductive Parsing (see Sections 1-3), with a significant advance by Joshua Goodman, Semiring Parsing.
Mar 14 Evaluation
Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook.
Cohen and Howe, How Evaluation Guides AI Research
See Pereira, Formal grammar and information theory: together again? for discussion of probabilistic grammar and the argument made by Chomsky involving the sentences "Colorless green ideas sleep furiously" and "Furiously sleep ideas green colorless".
Mar 21 Spring Break
Have fun!
Mar 28 Guest lecture (Joe Barrow): Sequence and seq2seq models
No required readings
Take-home midterm handed out, due 11:59pm Sunday April 1.
April 4 Deep learning and linguistic structure: a broader perspective

Yoav Goldberg, A Primer on Neural Network Models for Natural Language Processing. Read with an emphasis on Sections 1-4 and Section 5 (embeddings); also look over Sections 10-11 (RNNs) and 12 (recursive NNs), and the Wikipedia page on autoencoders.
Project handed out. Project plans due in one week.

Recommended: Sections 1, 3 and 4 of Yoshua Bengio, Learning Deep Architectures for AI. Also recommended: the nice overview of representation learning in Sections 1-4 of Bengio et al., Representation Learning: A Review and New Perspectives.

Other useful background reading for broader perspective: Lillian Lee, Measures of Distributional Similarity, Hinrich Schuetze, Word Space, Mikolov et al., Linguistic Regularities in Continuous Space Word Representations.

April 11 Machine translation
Koehn, Statistical Machine Translation; Chiang, A Hierarchical Phrase-Based Model for Statistical Machine Translation; M&S Ch 13 and Adam Lopez, Statistical Machine Translation, ACM Computing Surveys 40(3), Article 8, pages 1-49, August 2008; Wu et al. (2016), Google's neural machine translation system: Bridging the gap between human and machine translation.
Work on your project from here on out!
April 18 Text analysis in computational social science
April 25 Text analysis in computational social science, continued
May 2 Structured prediction
Reference material in: Noah Smith, Structured Prediction for Natural Language Processing; Ke Wu, Discriminative Sequence Labeling.

Some useful historical background: Ratnaparkhi (1996), A Maximum Entropy Model for Part of Speech Tagging, or, if you want a little more detail, Ratnaparkhi (1997), A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Ratnaparkhi (1996) began the popularization of maxent in NLP.

For relevant background on semirings, see Joshua Goodman, Semiring Parsing. For an example of very interesting recent work in this area, particularly from an automata-theoretic angle, see Chris Dyer, A Formal Model of Ambiguity and its Applications in Machine Translation, as well as cdec.

Also potentially of interest:

May 9 Tentative: Natural language "understanding"
Bill MacCartney's excellent tutorial on semantic parsing
Project is due 11:59pm ET May 18