INST 734
Information Retrieval Systems
Fall 2015
Project Batch Evaluation Results (Assignment P11)
Once you have completed your batch evaluation, you will turn in a
written report describing your evaluation results. Here's a brief
outline of what should be in the report:
- Describe the problem that you are solving (not much detail is
needed on this for class purposes) [0.5-1 page; all of these page
counts assume your report is single-spaced]
- Describe the design of your evaluation, generally including: what
test collections you used, why you selected those test collections,
what topics you used, why you selected those topics, how you
formulated the queries for those topics, what comparison function you
evaluated, what parameter(s) you varied, how you varied those
parameter(s), why you varied those parameter(s) in that way, what
evaluation measure(s) you used, and why you selected those evaluation
measure(s). [1-2 pages]
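As a concrete reminder of how one widely used evaluation measure is
defined, here is a minimal sketch of (uninterpolated) average
precision for a single topic. The document IDs and relevance
judgments below are invented for illustration only; they are not
drawn from any test collection:

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one topic: sum precision@k at each rank k
    where a relevant document appears, then divide by the total number
    of relevant documents (retrieved or not)."""
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

# Toy example: relevant docs d1 and d4 are ranked at positions 1 and 3,
# so AP = (1/1 + 2/3) / 2.
ap = average_precision(["d1", "d2", "d4", "d3"], {"d1", "d4"})
print(round(ap, 3))
```

Mean average precision (MAP) is then just this value averaged over
your topics.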
- Present your results using one or more tables and/or graphs
(either way is fine), along with some explanatory comments. These
should include your evaluation results, of course, but you will want
to present them in a manner that facilitates informative comparisons.
If you will present your batch evaluation results in your video
presentation (assignment P14; you can choose either your batch
evaluation or your user study for P14), then you will want to be sure
to present your results in a way that you can easily reuse in that
video. Don't forget to do statistical significance tests for the
differences that you will want to focus on in the next section (not
for every pair of values, but for the ones that will be a part of the
story that emerges from your analysis). [1-2 pages, most of which
will be tables or graphs]
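For the significance tests mentioned above, the usual approach is a
paired test over per-topic score differences. One simple option that
needs only the standard library is a sign-flipping randomization
(permutation) test, sketched below; the per-topic scores are invented
for illustration, not real results:

```python
import random

# Hypothetical per-topic average precision scores for two system
# configurations (illustrative numbers only).
system_a = [0.42, 0.31, 0.55, 0.20, 0.48, 0.36, 0.61, 0.27, 0.44, 0.39]
system_b = [0.45, 0.30, 0.60, 0.24, 0.50, 0.35, 0.66, 0.31, 0.47, 0.43]

def paired_permutation_test(a, b, trials=10000, seed=0):
    """Two-sided paired randomization test: under the null hypothesis
    the sign of each topic's difference is arbitrary, so we flip signs
    at random and count how often the permuted mean difference is at
    least as extreme as the observed one."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    extreme = 0
    for _ in range(trials):
        permuted = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(permuted) / len(permuted)) >= observed:
            extreme += 1
    return extreme / trials

p = paired_permutation_test(system_a, system_b)
print(f"p-value: {p:.4f}")
```

A paired t-test or Wilcoxon signed-rank test over the same per-topic
differences would also be acceptable; the point is to pair by topic
rather than comparing the two overall means directly.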
- Analysis of your results. Don't get too carried away here -- the
key is to draw some interesting conclusions from your evaluation that
go beyond the simple numbers. You should do more than just say whether
the numbers went up or down. Why did they go up or down? Were the
results different on different collections? Why? Were the results
different for different queries? Why? Do those sorts of comparisons
lead to different conclusions when you use different evaluation
measures? Why? [1-2 pages]
- Some appropriate way of ending (e.g., remarks on limitations, one
or more suggestions for future work, etc.) [0.5 page]
The total should be about 6 pages. Several good examples are
available in the TREC proceedings or in the SIGIR proceedings (the
latter are available through the ACM Digital Library in Research Port).
Submit your batch evaluation report using ELMS. I will provide
feedback to you by email.
Doug Oard
Last modified: Sat Oct 24 10:16:28 2015