INST 734
Information Retrieval Systems
Spring 2018
Project User Evaluation Design (Assignment P6)
This document applies only to instructor-designed projects.
The goal for this part of the project is to design the user study that
will represent one of the two major parts of your project (the other
will be the batch evaluation you will conduct).
You should think of your batch evaluation and your user study as ways
of answering different aspects of the same question: "How well does
our system help the user achieve their goals?" Because user studies
are generally more expensive than batch evaluations (i.e., they take
more time to plan, execute and analyze if done well), it is prudent to
focus your user study on aspects of the question that are not amenable
to batch evaluation. Some examples of these kinds of questions are:
- Do novice users find the system easy to learn?
- Can users easily learn to formulate effective queries using our
system?
- Are there common mistakes or misunderstandings that could be
addressed by a better design?
Those are just examples -- you can surely think of many more. Which
brings us to the second important point: user studies must be focused
if they are to be useful. In other words, you need to decide on a few
questions you most want to answer. Think back to exercise E1 and the
issues that came to your mind as you tried a new system for a new
task. Thinking like a user is one important key to asking the right
questions.
Once you have decided what you want to know, you are ready to choose a
study design. There are two basic kinds of study designs:
- Quantitative. In these, you do most of the work before the
study by selecting an independent variable (what you change), one or
more dependent variables (what you measure), and a very specific
study protocol (e.g., what you ask people to search for, in what
order they perform those search tasks, and how much time you will
allow for each). This advance work makes the analysis after the
study very simple -- you just plot the relationship between the
independent variable and each dependent variable on a graph and draw
your conclusions (often with the help of a statistical analysis to
determine the degree to which those conclusions would generalize to
other users). This approach can be a good choice if someone on your
team has already taken a course on study design (e.g., LBSC 701, LBSC
802, PSYC 601, or EDMS 645).
- Observational. In these, the only thing you do before the study
is design a fairly general study protocol (e.g., "take half an hour
and search for two different things that interest you"). Then during
the study you collect data about what happens in several ways (e.g.,
through over-the-shoulder observation, with a system-generated log
file, and with a post-session interview). The real work then begins,
as you draw insights from your observations by defining a consistent
set of things that are of interest to you (e.g., recovering from
mistakes) and identifying where they occurred in your data.
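To make the observational analysis step a bit more concrete, here is a minimal sketch of how coded observations might be tallied once you have settled on a set of codes. Everything in it is an assumption made for illustration: the participant labels, the code names, the tuple layout, and the data values are invented, and your own notes or log files may well look different.

```python
from collections import Counter

# Hypothetical coded observations, one tuple per event:
# (participant, minutes into the session, code you assigned).
# These values are made up for illustration only.
observations = [
    ("P1", 3, "query_reformulation"),
    ("P1", 7, "recovering_from_mistake"),
    ("P2", 2, "recovering_from_mistake"),
    ("P2", 9, "query_reformulation"),
    ("P2", 15, "recovering_from_mistake"),
]


def count_codes(obs):
    """Count how often each code was assigned across all sessions."""
    return Counter(code for _, _, code in obs)


def occurrences(obs, code):
    """List (participant, minute) pairs where a given code occurred."""
    return [(who, minute) for who, minute, c in obs if c == code]


code_counts = count_codes(observations)
mistake_events = occurrences(observations, "recovering_from_mistake")
```

The counts tell you which things of interest came up most often, and the list of occurrences tells you where to go back in your notes, log, or interview transcript to look at each one.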
Generally, observational studies are necessary if you want to see
things that can only be seen when people are doing an
internally-motivated task (because the protocol in a quantitative
study must standardize what people search for in order to make it
possible to compare performance under different circumstances).
Quantitative studies are normally used to compare two variants of your
system under controlled conditions (e.g., simple and advanced search
interfaces).
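As a rough illustration of the quantitative case, the sketch below compares one dependent variable (task completion time) across two values of the independent variable (which interface was used). It assumes that each participant used both interfaces, so the measurements can be paired by participant. The numbers are invented placeholders, not real results, and the sketch stops short of any statistical test.

```python
from statistics import mean

# Hypothetical task completion times in seconds, one entry per
# participant, listed in the same participant order for both
# conditions. These values are made up for illustration only.
times = {
    "simple": [210, 185, 240, 200],
    "advanced": [180, 190, 205, 170],
}


def condition_means(times_by_condition):
    """Mean of the dependent variable for each condition."""
    return {name: mean(vals) for name, vals in times_by_condition.items()}


def paired_differences(first, second):
    """Per-participant difference (first minus second)."""
    return [a - b for a, b in zip(first, second)]


means = condition_means(times)
diffs = paired_differences(times["simple"], times["advanced"])
mean_diff = mean(diffs)
```

With only four participants a difference like this is suggestive at best, which is one reason to look at the per-participant differences and not just the two means.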
Once you have chosen goals for your study and the kind of study design
you want to use, you should consult one or more examples of studies
using that kind of design. Here's one of each that you could start
with:
Of course, your study shouldn't be as ambitious as either of these.
You'll probably want to recruit study participants from the members of
the class, so you should probably limit the size of your study to four
users and the length of your sessions to two hours (including training
time, which you shouldn't scrimp on if you hope to learn anything from
the study).
Once you have a study design, you should test it on someone who did
not contribute to your study design and who will not be one of your
actual study participants. This is called a "pilot study", and you
will almost surely learn of the need for some improvements (e.g., in
how you do training, or in how you collect data). You can do this
with almost anyone, since you are not studying them; you are simply
trying out your study design. But it is best if you don't use a
member of your team (because they know too much!).
Normally you would want to recruit study participants who are
representative of the people who would really use a system like the
one you are building, and if you can find such people and motivate
them to participate that would be great (but see below on IRB
requirements if you want to make a publishable study out of this).
But in practice, most of you will actually recruit your classmates.
The only restriction on this is that if two teams are working on the
same problem they should not recruit from each other.
With that as background, here is what should be in your plan:
- Identify your team and describe the system that you have been
assigned to evaluate.
- State the questions you seek to answer. These are your choice --
they are not assigned to you. Don't be too ambitious, but do
select questions that are worth answering.
- Identify which kind of study design you plan to do (quantitative
or observational).
- Present a detailed study design. This should include details on
both the pilot study and your actual study, including the
number of participants, what you will ask them to do, how you
will collect data, and how you will do the analysis of the
results.
- Finish up with a brief description of the kinds of results you
expect to be able to report. Don't be too specific here -- you
are doing the study because you don't actually know the answer to
this question in detail yet. But the key is to be able to
confirm that the study you are doing will be able to provide you
with answers to the questions you have asked.
As general guidelines, your study design should fit comfortably on
four pages, and your study (including all planning, the pilot study,
the analysis, and reporting results) should require around 50% of the
total time available for your project and analysis (8 weeks * 5 hours
per week * the number of people on your team * 50% ... for a 2-person
team this number is around 40 hours).
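The time budget arithmetic above can be written out as a small function so that you can plug in your own team size. The function name and parameter names are just illustrative choices; the default values are the ones stated in the guideline.

```python
def study_hours(team_size, weeks=8, hours_per_week=5, study_share=0.5):
    """Approximate hours available for the user study.

    Multiplies weeks * hours per week * team size * the share of
    project time devoted to the user study.
    """
    return weeks * hours_per_week * team_size * study_share


two_person_budget = study_hours(2)
three_person_budget = study_hours(3)
```

For a 2-person team this gives 40 hours, matching the figure above, and for a 3-person team it gives 60.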
If you plan to publish the results of your study, advance approval
from the University of Maryland Institutional Review Board (IRB) is
required. If your results will not be published, that step is not
required. IRB approval can take a month, so start that process well
before the assignment due date if you want to be able to
publish your results.
Submit your user evaluation design using ELMS.
I will send you comments on your submission, but (as with all the
pieces) the overall project grade will be assigned holistically rather
than being determined by a fixed formula.
Doug Oard
Last modified: Sun Mar 4 18:20:16 2018