Relevance Judgment and Evaluation Results

This assignment applies only to teams that are performing the Structured Evaluation Exercise as their term project.

The first part of this assignment, relevance judgment, is performed only by the requesting party. Once the relevance judgments are complete, the entire team works together to compute the evaluation results.

Relevance judgments are made only on the sampled documents, and only by the requesting party. During this period the requesting party should not communicate with any other member of their team (except in class, or to arrange schedules for computing the evaluation results). The conception of relevance applied by the requesting party should be the one that resulted from their complaint, production request, Boolean negotiation, and any other "meet and confer" activities. Their opinion of relevance may change during the assessment process; if it does, they may need to reassess some documents. Ultimately, all sampled documents must be assessed to a standard of relevance that is internally consistent and that strictly matches what the requesting party believes they have communicated to the responding party as their intent.

Relevance judgments should be recorded in machine-readable form, with document identifiers copied exactly rather than retyped (use cut and paste if you are working with spreadsheets or other manually updated files). The search system will probably also provide a simple way to record relevance using on-screen buttons, although that feature has not yet been implemented. Once the relevance judgments have been completed, they should be emailed to the instructors; after that they may not be changed for any reason. Discrepancies discovered later may, of course, be noted.
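Because the judgments cannot be changed once they are sent, it may be worth checking the file mechanically before emailing it. The sketch below assumes a hypothetical format that this handout does not prescribe: a plain-text list of sampled document IDs, one per line, and a tab-separated judgments file with one `doc_id<TAB>0|1` line per document (1 meaning relevant). The function names `parse_judgments` and `check_coverage` are illustrative only; adapt the parsing to whatever the instructors or the search system actually produce.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def parse_judgments(lines: Iterable[str]) -> Dict[str, int]:
    """Parse tab-separated 'doc_id<TAB>0|1' lines (hypothetical format) into {doc_id: 0 or 1}."""
    judgments: Dict[str, int] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"missing judgment for {row[0]!r}")
        doc_id, label = row[0].strip(), row[1].strip()
        if label not in ("0", "1"):
            raise ValueError(f"judgment for {doc_id!r} must be 0 or 1, got {label!r}")
        if doc_id in judgments:
            raise ValueError(f"{doc_id!r} is judged more than once")
        judgments[doc_id] = int(label)
    return judgments


def check_coverage(sample_ids: Iterable[str],
                   judgments: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected).

    missing    -- sampled IDs that have no judgment
    unexpected -- judged IDs that are not in the sample (often a retyping error)
    """
    sampled = {s.strip() for s in sample_ids if s.strip()}
    judged = set(judgments)
    return sorted(sampled - judged), sorted(judged - sampled)
```

To use it, read the sample list (for example with `read().splitlines()`), pass the open judgments file to `parse_judgments`, and confirm that both lists returned by `check_coverage` are empty before submitting. Comparing the two ID sets, rather than just counting lines, catches a mistyped identifier even when the totals happen to agree.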

The second part of the assignment requires estimating Precision, Recall, and F for both the Boolean result set and the responding party's result set. Because the requesting party selected the sampling strategy, computing these estimates is principally their responsibility, but everyone can, and should, chip in. If you used the interactive task's sampling strategy, the formulae in the track overview paper and a calculator should be all you need. If you used the Ad Hoc task's sampling strategy, you will need scripts (available from the track coordinators), and running them will require a bit of computer savvy.
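The formulae in the track overview paper are the authoritative ones; the sketch below only illustrates the general idea behind a stratified estimate, so you can sanity-check your calculator work. It assumes that the collection is partitioned into strata, that each stratum lies wholly inside or wholly outside each result set, and that documents within a stratum were sampled uniformly at random. It also assumes that "F" means the balanced F measure (F1). The `Stratum` structure and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Stratum:
    """One stratum of a stratified sample (hypothetical structure, not the track's format).

    size        -- number of documents in the stratum (N_h)
    sampled     -- number of those documents that were sampled and judged (n_h)
    relevant    -- number of sampled documents judged relevant (r_h)
    in_boolean  -- True if the whole stratum lies inside the Boolean result set
    in_response -- True if the whole stratum lies inside the responding party's result set
    """
    size: int
    sampled: int
    relevant: int
    in_boolean: bool
    in_response: bool


def estimated_relevant(s: Stratum) -> float:
    """Scale the sampled fraction relevant up to the whole stratum: N_h * r_h / n_h."""
    if s.sampled <= 0:
        raise ValueError("every stratum needs at least one judged document")
    return s.size * s.relevant / s.sampled


def estimate_prf(strata: Iterable[Stratum],
                 in_set: Callable[[Stratum], bool]) -> Tuple[float, float, float]:
    """Estimate (precision, recall, F1) for the result set selected by in_set."""
    strata = list(strata)
    # Estimated number of relevant documents in the whole collection.
    total_relevant = sum(estimated_relevant(s) for s in strata)
    # The size of the result set is known exactly; only relevance is estimated.
    retrieved = sum(s.size for s in strata if in_set(s))
    relevant_retrieved = sum(estimated_relevant(s) for s in strata if in_set(s))

    precision = relevant_retrieved / retrieved if retrieved else 0.0
    recall = relevant_retrieved / total_relevant if total_relevant else 0.0
    # Balanced F measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Calling `estimate_prf(strata, lambda s: s.in_boolean)` gives the estimates for the Boolean result set, and `estimate_prf(strata, lambda s: s.in_response)` gives those for the responding party's result set. Note that recall depends on the estimated number of relevant documents outside both result sets, so a handful of relevant documents found in a large, sparsely sampled stratum can move recall substantially.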

The final, and most important, part of the results preparation is an analysis of why the results came out the way they did. Was the sampling appropriate? Did assessment errors cause serious problems? What is it about the Boolean query that caused it to do well (or poorly)? This analysis need not be completed by the due date for this assignment, but you will need it for your presentation the following week. All you need to turn in for this assignment are the relevance judgments (early, as soon as they are frozen) and the P, R, and F values for the two result sets.