Computational Linguistics I
CMSC 723
Fall 2012
Computational linguistics (CL) is the science of doing what linguists
do with language, but using computers. Natural language processing
(NLP) is the engineering discipline of doing what people do with
language, but using computers. Despite the title, most of this course
is actually about NLP (not CL), but we'll still do CL along the way.
CL and NLP are broad fields and we cannot possibly cover everything.
We will cover both rule-based and statistical approaches to a wide
variety of challenging problems in natural language processing and
computational linguistics. We will discover that
language ambiguity is the rabid wolf of NLP, and develop
techniques to try to tame it. Along the way, we will see some
linguistic theories developed specifically for computational
linguistics, which shed some light on what sorts of linguistic models
make sense computationally.
Prerequisites: You must be able to program. You must find
language interesting. Anyone who has taken an undergrad AI course, a
machine learning course, an algorithms course, or LING 689/889
(Computational Psycholinguistics) should be able to do well in this
course. That said, it is also a prerequisite that you be willing to
work hard and catch up on things you don't know on your own. In
particular, the following are considered background material and I
will not cover them (though you must know their contents): Unix for
Poets, very basic prob/stats, and slightly less basic stats.
The official textbook is Speech and Language Processing (Second Edition) by Dan Jurafsky and James Martin (ISBN 978-0-13-605234-0). Several other books are recommended but not required.
The purpose of grading (in my mind) is to provide extra incentive for
you to keep up with the material and to ensure that you exit the class
as a computational linguistics genius. If everyone gets an A, that
would make me happy (sadly, it hasn't happened yet). The components
of grading are:

| 20% | Written homeworks. There are eleven written homeworks (roughly one per week); your lowest score is dropped, and each of the remaining ten is worth 2% of your final grade. Homeworks are graded on a high-pass (100%), low-pass (50%), or fail (0%) basis and must be completed individually. (The initial homework, HW00, is not graded, but is required if you do not want to fail.) |
| 30% | Programming projects. There are three programming projects, each worth 10% of your final grade. You will be graded on both code correctness and your analysis of the results. Projects must be completed in teams of two or three students, with cross-department (e.g., CS and linguistics) teams highly encouraged. |
| 20% | Midterm exam. There will be an in-class "midterm" exam in early November. |
| 30% | Final exam. The final exam is a take-home practical project of your choosing (you must clear it with me). During the final exam slot, you will give brief presentations of your work (probably poster presentations; this is TBD). |
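To make the weighting concrete, here is a minimal sketch in Python of how these components combine, using hypothetical score values (the function name and inputs are illustrative, not part of the course materials):

```python
# A minimal sketch of the grading arithmetic above, with made-up
# (hypothetical) scores. Each graded homework is high-pass (1.0),
# low-pass (0.5), or fail (0.0); the lowest of the eleven is dropped.

def final_grade(hw_scores, project_scores, midterm, final_project):
    """All inputs are fractions in [0, 1]; returns the course grade."""
    # Written homeworks: drop the lowest, keep ten at 2% each (20% total).
    kept = sorted(hw_scores, reverse=True)[:10]
    homeworks = 0.20 * sum(kept) / 10
    # Programming projects: three projects, 10% each (30% total).
    projects = 0.10 * sum(project_scores)
    # Midterm exam (20%) and final take-home project (30%).
    return homeworks + projects + 0.20 * midterm + 0.30 * final_project

# Ten high-passes, one fail (dropped), perfect projects and exams -> 1.0
print(final_grade([1.0] * 10 + [0.0], [1.0, 1.0, 1.0], 1.0, 1.0))
```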
| Date | Topics | Required Readings | Suggested Readings | Due |
|------|--------|-------------------|--------------------|-----|
| 30 Aug | Welcome to Computational Linguistics: What is this class about, linguistic phenomena | - | - | - |
| 04 Sep | Regular languages: Finite state machines and baby morphology | 2-2.2, 3-3.4 | - | HW00 |
| 06 Sep | N-gram models: Language modeling and information theory | 4-4.2, 4.4-4.5 | - | HW01 |
| 11 Sep | Noisy channel models: Automatic morphological disambiguation | 3.5, 5.9, Link (3, 4-4.2) | - | - |
| 13 Sep | Unsupervised learning via EM: In-class example: morphological disambiguation | Link | - | HW02 |
| 18 Sep | Unsupervised learning II: Word alignment and model 1 | 25.5-25.6 | - | - |
| 20 Sep | Phonological change: Cognate lists and diachronic linguistics | Link (not all will make sense: that's ok) | - | HW03 |
| 25 Sep | Introduction to syntax: Flavors, tests and goals | Link | - | P1 |
| 27 Sep | Part of speech tagging: Finite state solutions | 5.1-5.5 | - | HW04 |
| 02 Oct | Syntactic parsing: Treebanks, PCFGs and the CKY algorithm | 13, 13.2-13.4.1, 14.2 | - | - |
| 04 Oct | Dependency grammars: Graph-based models | Link (through 3.3), 12.7 | - | HW05 |
| 09 Oct | Left-to-right parsing: Efficient and psycholinguistically plausible | 12.9, 14.10, Link | Link | - |
| 11 Oct | Discussion about projects (and catch-up...) | TBD | - | HW06 |
| 16 Oct | Working Day (come with P2 questions) | None | - | - |
| 18 Oct | Categorial grammars: Representation and parsing | Link or Link | - | P2+HW07 |
| 23 Oct | Lambda calculus/first-order logic: Semantic interpretation | 17.1, 17.3 | Link | - |
| 25 Oct | Lexical semantics: Word sense disambiguation | 19-19.3, 20.7 | - | HW08 |
| 01 Nov | Multilingual semantics: Lexical substitution | Link // Link | - | Proposals |
| 01 Nov | 6pm, AVW 3258! Images and words: Multimodal inference of meaning | None | - | HW09 |
| 06 Nov | No class: Hal is sick | - | - | - |
| 08 Nov | Textual inference: Entailment and paraphrasing | Link (sec 1.0, 1.3, 2.4, 4.2, 4.3) | - | - |
| 13 Nov | Words and actions: Following instructions | Link | Link | P3 |
| 15 Nov | Anaphora and coreference: Supervised and unsupervised approaches | Link | - | - |
| 20 Nov | Midterm review | - | - | - |
| 27 Nov | Midterm (one double-sided page of notes allowed) | - | - | - |
| 29 Nov | Multilingual language processing: Shared representations and learning history | None | - | Progress |
| 04 Dec | Topic modeling (Guest speaker: Jordan Boyd-Graber) | Link | - | HW10 |
| 06 Dec | Modeling conversations (Guest speaker: Philip Resnik) | Link | Link | - |
| 11 Dec | Computational Phonology (Guest speaker: Ewan Dunbar) | Link | - | HW11 |