Multilingual Modeling (CMSC 828F)

Multilingual Modeling
CMSC 828F
Fall 2011

Schedule:	Fri 10:30-11:45
Location:	CSI 3118
Instructor:	Hal Daume III
Office Hours:	AVW 3227; TBD or by appointment
Piazza:	UMD/cs828ff11

Background and Description

There are a lot of languages out there. The goal of this course is to learn about a handful of different styles of techniques that are relevant to modeling them as a whole. We will discuss questions of natural language processing (building systems to deal with lots of languages), computational linguistics (using and uncovering latent structure in language) and computational psycholinguistics (explaining why things are as they are). The course will be almost entirely project-based, with teams that (to the extent possible) span departments.

The class will operate as follows:

Weeks 1-3: I will discuss several types of models that seem relevant to multilingual modeling, such as probabilistic models, finite-state techniques and linear algebraic models. Interleaved with these, I will discuss some potential project ideas that I think are interesting.
Week 4: You will each, individually, propose a project idea. Based on these proposals, we will begin refining, merging and whittling down proposals.
Week 5: Final project teams (hopefully 3-4 teams) are decided, with concrete project proposals.
Weeks 6-9: We will all read papers relevant to the projects that you are, at this point, actively working on. We will continue with iterative refinement of the projects.
Week 10: Project updates.
Weeks 11-12: Continued reading, work and reflection.
Week 13: Final project presentations, submit papers :).

To be clear, while there are projects that I think are interesting, part of the point of this course is to learn from each other. I would be perfectly happy if everyone came in with their own problem and we all figured out together how to solve it. Some ideas I have for projects include things like: joint syntactic models over dozens of languages using typological knowledge; uniform information density accounts of typological generalizations; spelling-to-pronunciation modeling across many languages; using (human) second-language acquisition knowledge to help (machine) second-language acquisition; modeling cross-linguistic discourse structure. Or anything else you know about that's fun and interesting that I've never thought of!

Prereqs: Students should have taken at least one of the following courses: Computational Linguistics I, Machine Learning or Computational Modeling of Language. (Or you should at least have some knowledge about these things.) You should also be very interested in this topic: please do not take this course unless you're really going to be involved.