LBSC 796/INFM 718R: Homework 4

Please turn in this homework by linking it from your course Web page DURING CLASS on the due date. Please do not link it before that time.

Assignment adapted from James Allan's CMPSCI 646 course (Fall, 2004) at U. Mass.

Part 2: Evaluating Systems

This Excel spreadsheet contains three separate sets of judgments for the hits examined in Homework 2. For one of the topics, you will:

  1. analyze agreement on the relevance judgments
  2. adjudicate the judgments
  3. use the adjudicated set to evaluate both Teoma and Google

1. Agreement on Relevance Judgments

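The above spreadsheet should contain three sets of judgments for every document (Web page). The first question you'll answer is: How often do judges agree on relevance? Assuming each judgment is simply relevant or not relevant, there are four possibilities:

  1. all three judges say the document is relevant
  2. two judges say relevant and one says not relevant
  3. one judge says relevant and two say not relevant
  4. all three judges say the document is not relevant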

For your chosen topic, determine how often each case occurs, both as a count and as a percentage of all judged hits. Turn this information in. Then pick three hits on which the judges disagree, and briefly speculate why this may be so. Try to employ the concepts of relevance discussed in lecture. Turn this in.
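If you would rather script the tally than count by hand, the following is a minimal sketch, not a required method. It assumes judgments are binary (1 = relevant, 0 = not relevant), so that the four cases correspond to three, two, one, or zero judges marking a hit relevant. The function name, the dictionary layout, and the hit identifiers are all hypothetical; the spreadsheet's actual layout may differ.

```python
from collections import Counter


def agreement_breakdown(judgments):
    """Count how many hits fall into each agreement case.

    `judgments` maps a hit identifier to a tuple of three 0/1 judgments
    (1 = relevant); this layout is an assumption, not the spreadsheet's.
    Returns {number_of_relevant_votes: (count, percentage)}.
    """
    total = len(judgments)
    votes_per_hit = Counter(sum(votes) for votes in judgments.values())
    breakdown = {}
    for relevant_votes in (3, 2, 1, 0):
        count = votes_per_hit[relevant_votes]
        percentage = 100.0 * count / total if total else 0.0
        breakdown[relevant_votes] = (count, percentage)
    return breakdown


# Hypothetical judgments for four hits, one tuple entry per judge.
example = {
    "hit1": (1, 1, 1),
    "hit2": (1, 0, 1),
    "hit3": (0, 0, 0),
    "hit4": (0, 1, 0),
}
for relevant_votes, (count, pct) in agreement_breakdown(example).items():
    print(f"{relevant_votes} of 3 judges say relevant: {count} hits ({pct:.1f}%)")
```

Hits with two or one relevant votes are the ones where the judges disagree, so they are the candidates for the three cases you discuss.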

2. Adjudication

Adjudication is simply the process of reconciling inconsistent judgments. Do this by simple majority voting: with three judges, a hit takes whichever judgment at least two of them gave. You do not need to turn anything in for this, but you will need the results for Part 3.
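Majority voting can be sketched in a few lines. This example assumes binary judgments (1 = relevant, 0 = not relevant) and an odd number of judges, so ties cannot occur; the function name and data layout are hypothetical.

```python
def adjudicate(judgments):
    """Reduce each hit's judgments to one value by simple majority vote.

    `judgments` maps a hit identifier to a tuple of 0/1 judgments
    (assumed layout). A hit is adjudicated relevant (1) when more than
    half of its judges marked it relevant, otherwise not relevant (0).
    """
    return {
        hit: int(2 * sum(votes) > len(votes))
        for hit, votes in judgments.items()
    }


# Hypothetical judgments: two of three judges decide the outcome.
example = {"hit1": (1, 0, 1), "hit2": (0, 0, 1), "hit3": (1, 1, 1)}
print(adjudicate(example))
```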

3. Evaluation of Teoma and Google

Now, evaluate Teoma and Google using the adjudicated relevance judgments you just created (for the topic you chose). Make sure you are pooling judgments from both systems! Turn in the following information for both search engines:
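To make the pooling step concrete, here is a rough sketch under stated assumptions. Pooling is taken to mean that the set of known relevant documents is drawn from the union of both engines' hits. Precision at a cutoff and recall relative to the pool are used purely as illustrations and are not necessarily the measures this assignment asks for. All function names, document identifiers, and judgments are hypothetical.

```python
def pool(*ranked_lists):
    """Return the union of the documents retrieved by any system."""
    pooled = set()
    for hits in ranked_lists:
        pooled.update(hits)
    return pooled


def precision_at_k(ranked_hits, adjudicated, k):
    """Fraction of the top-k cutoff filled by adjudicated-relevant hits.

    Divides by k even if fewer than k hits were returned (one common
    convention); unjudged documents count as not relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = sum(adjudicated.get(doc, 0) for doc in ranked_hits[:k])
    return relevant / k


def pooled_recall(ranked_hits, adjudicated, pooled):
    """Fraction of the pool's relevant documents that this system found."""
    relevant_in_pool = {doc for doc in pooled if adjudicated.get(doc, 0) == 1}
    if not relevant_in_pool:
        return 0.0
    return len(relevant_in_pool & set(ranked_hits)) / len(relevant_in_pool)


# Hypothetical ranked hits and adjudicated judgments (1 = relevant).
teoma_hits = ["a", "b", "c"]
google_hits = ["b", "d", "e"]
adjudicated = {"a": 1, "b": 0, "c": 1, "d": 1, "e": 0}

pooled = pool(teoma_hits, google_hits)
for name, hits in (("Teoma", teoma_hits), ("Google", google_hits)):
    p = precision_at_k(hits, adjudicated, 3)
    r = pooled_recall(hits, adjudicated, pooled)
    print(f"{name}: precision@3 = {p:.2f}, pooled recall = {r:.2f}")
```

Because the relevant set comes from the pool rather than from either engine alone, a document found only by Google still counts against Teoma's recall, and vice versa.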

In addition, answer the following questions:

Last update: October 12, 2007 (deletion of interpolated precision)