INFM 718G: Data-Intensive Information Processing Applications

I'll try to post lectures by 4pm on the day of lecture. Readings for the next week will be posted by 4pm the previous week. Readings for each week are listed below the date of each class. (E.g., for the first week, read "The Unreasonable Effectiveness of Data" and the original MapReduce paper.) Do not rely on this page for homework due dates; each assignment is due sometime during the week listed, but not necessarily on the day of class.

Date Subject Assignment Due Lecture

Cancelled: Snow Day

2/3 Introduction: Computation and Storage at Scale [PDF]
Before class, please read:
2/10 Hadoop/Data Nuts and Bolts Assignment 1a [PDF]

Skim White Chapters 1-4

Look at Protocol Buffers if you have time (sorry, I added this late).

2/17 The MapReduce Programming Environment Assignment 1b [PDF]

Read White Chapters 5-6

Read Lin and Dyer Chapter 3

2/24 Text Retrieval [PDF]

Compute tf-idf with Dumbo

3/3 Graph Algorithms Assignment 2 [PDF]

Read Lin and Dyer Chapter 5

Explanation of PageRank (Optional)

3/10 Language Models [PDF]

Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean (2007). Large Language Models in Machine Translation. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning.

3/17 Midterm
3/24 No Class: Spring Break
3/31 Web-Scale Databases Assignment 3 [PDF]

Read White Chapters 11-13

4/7 Probabilistic Models Assignment 4 (Due: 4/11) [PDF]

Read Lin and Dyer Chapter 6

Read the paper on Latent Dirichlet Allocation

Rabiner's Tutorial on Hidden Markov Models (Optional)

4/14 Digging into Data at Scale Assignment 5 [PDF]
4/21 Building and Administering Hadoop Clusters [PDF]

Read White Chapters 9-10

4/28 Final Exam Assignment 6
5/5 Project Presentations Project Writeup (Due: 5/10)