LBSC 796 Homework 3

*** The file names here will probably change ***

All of the following files are available from any glue machine:

/software/hawk3/trec/qrels/qrels.latimes    TREC Relevance judgments
/software/hawk3/trec/latimes/*              Documents (one file per day)
/software/hawk3/trec/topics/topics.301-350  TREC Topics

The following additional files are specifically for homework 3 - they
include the portions of the above files that were actually used to
produce the output files that you will use:

/software/hawk3/lbsc708a/hw3/queries.titles      What InQuery got
/software/hawk3/lbsc708a/hw3/inquery.top1000     What it gave back
/software/hawk3/lbsc708a/hw3/compute_statistics  Compute the answers
/software/hawk3/lbsc708a/hw3/trec_eval           Used by compute_statistics

The same files are also temporarily available from any WAM machine at:

/pub/wangjq/qrels.latimes       TREC Relevance judgments
/pub/wangjq/latimes/*           Documents (samples only, one file per day)
/pub/wangjq/topics.301-350      TREC Topics
/pub/wangjq/queries.titles      What InQuery got
/pub/wangjq/inquery.top1000     What it gave back
/pub/wangjq/compute_statistics  Compute the answers
/pub/wangjq/trec_eval           Used by compute_statistics

To compute the statistics, just do the following:

cd ~
cp /pub/wangjq/compute_statistics ~
cat compute_statistics
chmod +x compute_statistics
./compute_statistics
pico hw3.statistics

The first two commands get you a copy of the shell script
The next command lets you look at it
The next command makes your copy of the shell script executable
The next two commands run the shell script and let you look at the
results
The last command prints the results on the white printer outside my office

Lots of variations on these commands will work equally well, but this
recipe should work if it all looks like Greek and you just want to
know what to type :-)

The results have 10 sections, one for each query, followed by a
summary section (numbered 10 instead of 301-310).  In the summary
section you can see the 11 "interpolated recall - precision averages"
(the precision at recall levels 0.0, 0.1, ..., 1.0, averaged over the
10 queries) and the "noninterpolated average precision".
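
If you are curious where these numbers come from, here is a minimal
Python sketch of how the 11 interpolated values and the noninterpolated
average precision can be computed for a single query.  It is not the
trec_eval code; the function names and the ten-document ranking are made
up for illustration.  It assumes the usual definitions: the interpolated
precision at a recall level is the highest precision found at that
recall level or any higher one, and the noninterpolated average
precision is the sum of the precision at each relevant document
retrieved, divided by the total number of relevant documents.

```python
def relevant_points(relevance):
    """Return (relevant_found_so_far, precision) at each relevant document."""
    points = []
    hits = 0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits, hits / rank))
    return points


def interpolated_precision(relevance, total_relevant):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0 for one query.

    Assumes total_relevant > 0.  Recall after `hits` relevant documents is
    hits / total_relevant; the test below compares it with i / 10 using
    integers so that no rounding error creeps in.
    """
    points = relevant_points(relevance)
    curve = []
    for i in range(11):
        reachable = [p for hits, p in points if hits * 10 >= i * total_relevant]
        # A recall level that is never reached gets precision 0.
        curve.append(max(reachable, default=0.0))
    return curve


def average_precision(relevance, total_relevant):
    """Noninterpolated average precision for one query (total_relevant > 0).

    Relevant documents that were never retrieved contribute 0.
    """
    return sum(p for _, p in relevant_points(relevance)) / total_relevant


# Hypothetical ranking: 1 = relevant, 0 = not relevant, best match first.
# Four documents are relevant in total, but only three were retrieved.
ranking = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
for i, value in enumerate(interpolated_precision(ranking, 4)):
    print(f"at {i / 10:.2f}  {value:.4f}")
print(f"average precision  {average_precision(ranking, 4):.4f}")
```

The summary section then averages each of these per-query values over
all of the queries.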

Please plot a recall-precision graph using the 11 values in the
summary section.  You can either use graph paper and a pencil or a
graphing or spreadsheet program like Microsoft Excel to do this.
Please also mark the overall (noninterpolated) average precision on
your graph.
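
If you go the spreadsheet route and would rather not type into cells,
the following Python sketch writes the values to a file that Excel can
open.  Everything in it is an assumption for illustration: the numbers
are placeholders (all zeros) that you would replace by hand with the
values from your own hw3.statistics, and the output file name
hw3_plot.csv is made up.

```python
import csv
import io

# Placeholder values only: replace them with the 11 numbers from the
# summary section of your own hw3.statistics file.
RECALL_LEVELS = [i / 10 for i in range(11)]
PRECISION = [0.0] * 11       # hypothetical, fill in by hand
AVERAGE_PRECISION = 0.0      # hypothetical, fill in by hand


def build_csv(recall_levels, precision, average_precision):
    """Return CSV text with one row per recall level.

    The third column repeats the average precision on every row so that a
    spreadsheet can draw it as a horizontal line on the same chart.
    """
    if len(recall_levels) != len(precision):
        raise ValueError("need one precision value per recall level")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["recall", "precision", "average_precision"])
    for recall, value in zip(recall_levels, precision):
        writer.writerow([f"{recall:.1f}", f"{value:.4f}", f"{average_precision:.4f}"])
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical output file name; open it in the spreadsheet program.
    with open("hw3_plot.csv", "w", newline="") as handle:
        handle.write(build_csv(RECALL_LEVELS, PRECISION, AVERAGE_PRECISION))
```

Plotting the first column against the second as a line chart gives the
graph; the third column is one way to mark the average precision.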

In addition to this recall-precision graph, examine (but do not turn
in) the statistics for the individual queries and provide brief
answers to the following questions:

1) Is there much variation in the precision at various levels of
recall on a query-by-query basis?  Why?

2) Would it be more useful to report the precision at various fixed
document cutoffs instead of the precision at various levels of recall?
Why or why not?
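
In case the term in question 2 is unfamiliar, here is a minimal Python
sketch of precision at a fixed document cutoff.  The function name and
the ten-document ranking are made up for illustration, and the sketch
assumes that a list shorter than the cutoff is still divided by the
cutoff (missing ranks count as not relevant).

```python
def precision_at_cutoff(relevance, cutoff):
    """Fraction of the first `cutoff` ranked documents that are relevant."""
    if cutoff <= 0:
        raise ValueError("cutoff must be a positive number of documents")
    found = sum(1 for is_relevant in relevance[:cutoff] if is_relevant)
    return found / cutoff


# Hypothetical ranking: 1 = relevant, 0 = not relevant, best match first.
ranking = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
for cutoff in (5, 10):
    print(f"At {cutoff:4d} docs:  {precision_at_cutoff(ranking, cutoff):.4f}")
```

Note that this measure never looks at how many relevant documents exist
for the query, while recall does.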

If you would like to see how results are reported at TREC, check out
the publications section of http://trec.nist.gov

If you would like to learn more about the trec_eval software you can
obtain your own copy at ftp://ftp.cs.cornell.edu/pub/smart/

----------------------------------------- End ------------------------