UMCP: iSchool: INST 734: Spring 2018: User Evaluation Design

INST 734
Information Retrieval Systems
Spring 2018
Project User Evaluation Design (Assignment P6)

This document applies only to instructor-designed projects.

The goal for this part of the project is to design the user study that will represent one of the two major parts of your project (the other will be the batch evaluation you will conduct).

You should think of your batch evaluation and your user study as ways of answering different aspects of the same question: "How well does our system help the user achieve their goals?" Because user studies are generally more expensive than batch evaluations (i.e., they take more time to plan, execute and analyze if done well), it is prudent to focus your user study on aspects of the question that are not amenable to batch evaluation. Some examples of these kinds of questions are:

Do novice users find the system easy to learn?
Can users easily learn to formulate effective queries using our system?
Are there common mistakes or misunderstandings that could be addressed by a better design?

Those are just examples -- you can surely think of many more. Which brings us to the second important point: user studies must be focused if they are to be useful. In other words, you need to decide on a few questions you most want to answer. Think back to exercise E1 and the issues that came to your mind as you tried a new system for a new task. Thinking like a user is one important key to asking the right questions.

Once you have decided what you want to know, you are ready to choose a study design. There are two basic kinds of study designs:

Quantitative. In these, you do most of the work before the study by selecting an independent variable (what you change), one or more dependent variables (what you measure), and a very specific study protocol (e.g., what you ask people to search for, in what order they perform those search tasks, and how much time you will allow for each). This advance work makes the analysis after the study very simple -- you just plot each dependent variable against the independent variable on a graph and draw your conclusions (often with the help of a statistical analysis to determine the degree to which those conclusions would generalize to other users). This approach can be a good choice if someone on your team has already taken a course on study design (e.g., LBSC 701, LBSC 802, PSYC 601, or EDMS 645).
Observational. In these, the only thing you do before the study is design a fairly general study protocol (e.g., "take half an hour and search for two different things that interest you"). Then during the study you collect data about what happens in several ways (e.g., through over-the-shoulder observation, with a system-generated log file, and with a post-session interview). The real work then begins, as you draw insights from your observations by defining a consistent set of things that are of interest to you (e.g., recovering from mistakes) and identifying where they occurred in your data.
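
To make the last step of an observational study concrete, here is a minimal sketch of tagging observations with a consistent set of codes and then finding where each one occurred. Everything in it is an assumption made for illustration: the participant labels, the session times, and the code names are hypothetical placeholders, not a coding scheme required by this assignment and not data from any real study.

```python
from collections import Counter, defaultdict

# Hypothetical coded observations: (participant, minutes into session, code).
# The codes below are an example scheme only; choose your own.
observations = [
    ("P1", 3, "query_reformulation"),
    ("P1", 7, "recovering_from_mistake"),
    ("P2", 2, "query_reformulation"),
    ("P2", 11, "recovering_from_mistake"),
    ("P2", 15, "recovering_from_mistake"),
]

def tally_codes(obs):
    """Count how often each code occurs across all participants."""
    return Counter(code for _, _, code in obs)

def occurrences_by_participant(obs, code):
    """Map each participant to the session times at which `code` was observed."""
    times = defaultdict(list)
    for participant, minute, observed_code in obs:
        if observed_code == code:
            times[participant].append(minute)
    return dict(times)

counts = tally_codes(observations)
where = occurrences_by_participant(observations, "recovering_from_mistake")
```

The counts tell you which issues came up most often; the per-participant times tell you where to look in your notes or log file for the details.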

Generally, observational studies are necessary if you want to see things that can only be seen when people are doing an internally-motivated task (because the protocol in a quantitative study must standardize what people search for in order to make it possible to compare performance under different circumstances). Quantitative studies are normally used to compare two variants of your system under controlled conditions (e.g., simple and advanced search interfaces).
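
For a quantitative comparison of two variants, the analysis might look something like the sketch below. It assumes a within-subjects design (every participant uses both interfaces), one dependent variable (task completion time in seconds), and a paired t statistic as the statistical analysis; none of those choices is prescribed here, and the function name and the numbers are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """Return (mean difference, paired t statistic) for matched measurements.

    a[i] and b[i] must come from the same participant under the two conditions.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    mean_diff = mean(diffs)
    return mean_diff, mean_diff / (sd / sqrt(len(diffs)))

# Hypothetical completion times (seconds) for four participants.
simple_ui = [120, 95, 140, 110]
advanced_ui = [100, 90, 120, 105]

mean_diff, t = paired_t(simple_ui, advanced_ui)
```

The t statistic would then be compared against a t distribution with n - 1 degrees of freedom (here, 3). With only a handful of participants, expect wide uncertainty and treat any such result as suggestive rather than conclusive.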

Once you have chosen goals for your study and the kind of study design you want to use, you should consult one or more examples of studies using that kind of design. Here's one of each that you could start with:

A quantitative study comparing two variants of a system for doing cross-language information retrieval (which is the focus of Module 11)
An observational study of a cross-language information retrieval system

Of course, your study shouldn't be as ambitious as either of these. You'll probably want to recruit study participants from the members of the class, so you should probably limit the size of your study to four users and the length of your sessions to two hours (including training time, which you shouldn't scrimp on if you hope to learn anything from the study).
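
If you choose a quantitative design with a fixed set of search tasks, one way to decide the order in which each of your four participants performs them is simple rotation, so that every task appears in every position exactly once. The sketch below is only one possible approach, offered as an assumption rather than a requirement; the task labels and the function name are hypothetical.

```python
def rotated_orders(tasks):
    """Return one task order per participant by cyclic rotation.

    With n tasks this yields n orders in which each task occupies
    each position exactly once (a cyclic Latin square).
    """
    n = len(tasks)
    return [[tasks[(start + i) % n] for i in range(n)] for start in range(n)]

tasks = ["T1", "T2", "T3", "T4"]  # hypothetical task labels
orders = rotated_orders(tasks)    # orders[k] is the order for participant k
```

Rotation balances the position of each task but not which task comes immediately before which, so it reduces ordering effects without eliminating them.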

Once you have a study design, you should test it on someone who did not contribute to your study design and who will not be one of your actual study participants. This is called a "pilot study", and you will almost surely learn of the need for some improvements (e.g., in how you do training, or in how you collect data). You can do this with almost anyone, since you are not studying them; you are simply trying out your study design. But it is best if you don't use a member of your team (because they know too much!).

Normally you would want to recruit study participants who are representative of the people who would really use a system like the one you are building, and if you can find such people and motivate them to participate that would be great (but see below on IRB requirements if you want to make a publishable study out of this). But in practice, most of you will actually recruit your classmates. The only restriction on this is that if two teams are working on the same problem they should not recruit from each other.

With that as background, here is what should be in your plan:

Identify your team and describe the system that you have been assigned to evaluate.
State the questions you seek to answer. These are your choice -- they are not assigned to you. Don't be too ambitious, but do select questions that are worth answering.
Identify which kind of study design you plan to do (observational or quantitative).
Present a detailed study design. This should include details on both the pilot study and your actual study, including the number of participants, what you will ask them to do, how you will collect data, and how you will do the analysis of the results.
Finish up with a brief description of the kinds of results you expect to be able to report. Don't be too specific here -- you are doing the study because you don't actually know the answer to this question in detail yet. But the key is to be able to confirm that the study you are doing will be able to provide you with answers to the questions you have asked.

As general guidelines, your study design should fit comfortably on four pages, and your study (including all planning, the pilot study, the analysis, and reporting results) should require around 50% of the total time available for your project and analysis (8 weeks * 5 hours per week * the number of people on your team * 50% ... for a two-person team this comes to 40 hours).
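
The time budget above is easy to work out for other team sizes. The helper below is just a sketch of that arithmetic; the function name and parameter names are made up here, and the defaults are the figures given in the guideline.

```python
def study_hours(team_size, weeks=8, hours_per_week=5, study_share=0.5):
    """Approximate hours to budget for the user study across the whole team."""
    return weeks * hours_per_week * team_size * study_share

two_person = study_hours(2)    # 8 * 5 * 2 * 0.5
three_person = study_hours(3)  # 8 * 5 * 3 * 0.5
```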

If you plan to publish the results of your study, advance approval from the University of Maryland Institutional Review Board (IRB) is required. If your results will not be published, that step is not required. IRB approval can take a month, so get started on this well before the assignment due date if you do want to be able to publish your results.

Submit your user evaluation design using ELMS.

I will send you comments on your submission, but (as with all the pieces) the overall project grade will be assigned holistically rather than being determined by a fixed formula.

Doug Oard

Last modified: Sun Mar 4 18:20:16 2018
