INST 734
Information Retrieval Systems
Fall 2015
Module 2: Monday September 7 to Sunday September 13
This module takes us through the process of getting from a collection
of text documents to the inverted index that we will use in the next
module as a basis for ranked retrieval. The module is designed to be
completed in 12 hours over 7 days. As with every module, you must
complete all components of this module by midnight on the evening
of the indicated end date for the module (in this case, Sunday,
September 13).
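As a rough illustration of where this module ends up, the sketch below builds a tiny inverted index in Python. It is a minimal sketch and not the course's own implementation: the function names (tokenize, build_inverted_index), the sample documents, and the simple lowercase-and-split tokenization are all assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to a sorted list of the IDs of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sorted postings lists make Boolean operations such as AND easy to merge.
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical three-document collection, used only for illustration.
docs = {
    1: "Information retrieval systems",
    2: "Retrieval of text documents",
    3: "Text indexing systems",
}
index = build_inverted_index(docs)
print(index["retrieval"])  # [1, 2]
print(index["text"])       # [2, 3]
```

Each entry pairs a term with its postings list, the documents in which the term occurs. A real system would also record term frequencies or positions to support ranked retrieval and phrase queries.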
Module Checklist
The recommended order for completing the activities in this module is:
Check ELMS to see if you have an additional reading summary assigned to you
this week. If so, complete that summary by midnight Thursday and
submit it using ELMS. Be sure to follow the guidance on page 2 of
the Example Reading Summary that was provided in Module 1.
Project Overview (6 minutes) [Correction: The optional text chat will
be on Thursday September 17 at 5 PM]
Optional Supplementary Videos
A lecture on
Natural Language Processing from Cheng Zhai's
University of Illinois MOOC on Text Retrieval and Search
Engines. You will need a (free) Coursera account to access
this video.
A lecture on Implementation of Text Retrieval Systems from Cheng Zhai's University of
Illinois MOOC on Text Retrieval and Search Engines. You will
need a (free) Coursera account to access this video.
A lecture on System Implementation: Inverted Index Construction from Cheng Zhai's University of
Illinois MOOC on Text Retrieval and Search Engines. You will
need a (free) Coursera account to access this video.
A lecture on The Inverted
Index from Chris Manning's Stanford MOOC on Natural
Language Processing.
A lecture on Boolean
Retrieval from Chris Manning's Stanford MOOC on Natural
Language Processing.
A lecture on Phrase
Queries from Chris Manning's Stanford MOOC on Natural
Language Processing.
A lecture on Indexing
from Chirag Shah's 2014 online Introduction to IR Systems
Development course at Rutgers. (This is a large video file
download, not streaming video, so allow some time.)
A guest lecture by Bill Graham on Introduction
to Hadoop from Marti Hearst's Fall 2012 i290 course on
Analyzing Big Data with Twitter at the University of
California, Berkeley.
Finally, complete Exercise E2. Like all
assignments that you are asked to turn in, this is due at midnight on
the last day of the module (which, except for the last module, is
always a Sunday night). Note that you are allowed to work with other
students on exercises, but you must type in the results yourself (no
cut and paste -- see the course description for details).
Doug Oard
Last modified: Thu Dec 14 08:22:23 2017