Run Submission

This assignment applies only to teams that are performing the Structured Evaluation Exercise as their term project.

For this task, the entire team EXCEPT the requesting party will work together to identify a set of documents that (to the extent possible) are topically relevant to the production request. A second set of documents, those that exactly match the Final Boolean Query, should also be submitted. Note that the Final Boolean Query does NOT define relevance; a document is topically relevant if and only if it is judged relevant in the opinion of the REQUESTING PARTY.

The team is required to work with only ONE of the three production requests. In general, they should select a production request that they expect to be relatively rich in relevant documents, since evaluation results for topics to which no documents are relevant are rather boring.

To produce the Boolean result set, simply run the Boolean Query using the search system provided by Tamer Elsayed. You will need to learn Lucene's query language; a Web search will find tutorials.

You may use any technique that you like to produce the set of (hopefully!) relevant documents. One technique that should work reasonably well is to iteratively form progressively more complex queries (e.g., using the pearl growing technique from LBSC 650), examining some search results at each iteration, and confirming your choices with the topic authority periodically (unless, of course, you ARE the topic authority, which is permissible). Teamwork can be helpful here, since two minds have been shown to be better than one (and, paradoxically, often less good than two really should be).

Once you have a really great query, then you need to estimate where to put the cutoff. Your goal is to optimize the balanced F measure, which is achieved by putting the cutoff at the position that will make recall and precision equal (recall generally increases with deeper cutoffs, precision generally decreases with deeper cutoffs). You can use a sampling strategy to try to estimate what cutoff rank will result in the optimal value for F.
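One way to carry out such a sampling strategy is sketched below in Python. This is only an illustration, not a required method. It assumes you split your ranked list into consecutive rank bands, judge a small random sample from each band, and scale the sample up to estimate how many relevant documents each band holds. The balanced F measure at a cutoff is then F = 2PR / (P + R), where P is estimated precision and R is estimated recall. The function name best_cutoff, the band sizes, and the sample counts are all hypothetical. The sketch also assumes that the ranked list contains essentially all of the relevant documents; if it does not, recall will be overestimated.

```python
def f_measure(precision, recall):
    """Balanced F measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_cutoff(bands):
    """Estimate F at the end of each rank band and return the best cutoff.

    bands: list of (band_size, num_sampled, num_sampled_relevant) tuples,
           in best-first rank order. These values come from your own
           sampling; the numbers used below are made up.
    Returns (best_row, all_rows), where each row is
    (cutoff_rank, est_precision, est_recall, est_f).
    """
    # Scale each sample up to an estimated relevant count for its band.
    est_relevant = [
        size * relevant / sampled if sampled else 0.0
        for size, sampled, relevant in bands
    ]
    # ASSUMPTION: every relevant document appears somewhere in the list.
    total_relevant = sum(est_relevant)

    rows = []
    cutoff = 0
    found = 0.0
    for (size, _, _), band_relevant in zip(bands, est_relevant):
        cutoff += size
        found += band_relevant
        precision = found / cutoff
        recall = found / total_relevant if total_relevant else 0.0
        rows.append((cutoff, precision, recall, f_measure(precision, recall)))

    return max(rows, key=lambda row: row[3]), rows


if __name__ == "__main__":
    # Hypothetical sample: 10 documents judged in each of four rank bands.
    sample = [(100, 10, 9), (100, 10, 6), (300, 10, 2), (500, 10, 0)]
    best, rows = best_cutoff(sample)
    for cutoff, p, r, f in rows:
        print(f"cutoff={cutoff:5d}  P={p:.3f}  R={r:.3f}  F={f:.3f}")
    print("estimated best cutoff:", best[0])
```

With these made-up numbers the estimated F peaks at rank 200, where estimated precision (0.75) and recall (about 0.71) are close to each other. Narrower bands give a finer-grained cutoff at the cost of more judging, and small samples give noisy estimates, so treat the result as a rough guide.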

Once you have chosen your set of documents, give the instructors the document identifiers for all of the documents that you believe to be relevant. If your technique produces a best-first order (as the technique above will), leave your documents sorted in that order and note that fact with your submission. At the same time (or earlier), provide the Boolean result set (in any order). If your files are large, you may need to use a memory stick to transfer the files.

This assignment is due at 6 PM on the date indicated in the syllabus.