SIGIR 2007 Proceedings, Session 11: Interaction

Supporting Multiple Information-Seeking Strategies in a Single System Framework

Xiaojun Yuan
College of Computing and Information, University at Albany, SUNY
Albany, NY
+1 518 591 8746
xyuan@albany.edu

Nicholas J. Belkin
School of Communication, Information & Library Studies, Rutgers University
New Brunswick, NJ
+1 732 932 7500
nick@belkin.rutgers.edu

ABSTRACT
This paper reports on an experiment comparing the retrieval effectiveness of an integrated interactive information retrieval (IIR) system, which adapts to support different information-seeking strategies, with that of a standard baseline IIR system. The experiment, with 32 subjects each searching on 8 different topics, indicates that using the integrated IIR system resulted in significantly better performance, including user satisfaction with search results, significantly more effective interaction, and significantly better usability than using the baseline system.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - search process.

General Terms
Design, Experimentation, Human Factors.

Keywords
Scanning, searching, adaptive information retrieval, interactive information retrieval, personalization

1. INTRODUCTION
It has been clear for some years that people engage in a variety of different information-seeking strategies (ISSs) when interacting with information systems, and also that standard interactive information retrieval (IIR) systems are designed to support only one such strategy, specified searching for one or more items. Although there have been some attempts to design systems (or to propose frameworks for systems) which will support more than one ISS (e.g.
[4] [11]), for the most part this issue has been ignored in both research and operational IIR systems. This situation seems to be due predominantly to three factors: a lack of recognition of the problem itself; the lack of a theoretical structure which might provide a framework within which multiple ISSs could be supported; and the inherent difficulty of the task itself.

This paper reports on an experiment which compares an integrated IIR system, which adapts to support different ISSs within a single information-seeking episode, with a rather generic IIR system, modeled on standard operational IIR systems and designed to support searching through query specification. The integrated system is based on an explicit model of IIR which attempts to relate various characteristics of the user and the system, including the user's context, to different ISSs in which the person might engage, and to relate the different ISSs to one another in a systematic way. In the following sections, we describe the theoretical basis for the integrated IIR system, discuss related work, describe the two systems and the experiment, present and discuss its results, and draw some conclusions based on those results.

2. BACKGROUND
2.1 ISSs and IIR
Belkin and colleagues suggested in [4] that the variety of people's ISSs could be captured by a classification scheme consisting of four binary-valued facets (Figure 1). These facets were derived from an analysis of the information-seeking and information retrieval literatures. Their claim was that a given ISS could be characterized by a specific combination of values of the four facets, yielding 16 distinct ISSs, in what they characterized as a "space of ISSs". On the basis of these facets, the idea of a searcher moving from one ISS to another in this space, and a two-level hypertext IR model, they proposed a design for an IIR system which could support both "browsing" and "querying".

Facet                        Values
Method of Seeking            (Scanning; Searching)
Goal of Seeking              (Learning; Selecting)
Mode of Seeking              (Recognition; Specification)
Resource Interacted with     (Information; Meta-information)

Figure 1. Facets of Information-Seeking Strategies (after [4])

Subsequently, Cool and Belkin [6] extended this model somewhat, based on an empirical study of knowledge workers. Figure 2 displays the relevant facets from [6], and their values, which are the basis for the description of ISSs used in our study. In particular, in the work reported here, we differentiated between two basic classes of ISSs: those characterized by the method of searching, and those characterized by the method of scanning, within the specific information behavior of access.

Information Behavior Facet (this facet includes a variety of different types of such behaviors; for the study reported here, only the "Access" behavior was considered)
Access
  · Method: Scanning ... Searching
  · Mode: Recognition ... Specification
Objects Interacted with Facet
  · Level: Information ... Meta-information
  · Medium: Image, written text, speech, ...
  · Quantity: One object, set of objects, database
Common Dimensions of Interaction Facet
  · Information object: Part ... Whole
  · Systematicity: Random ... Systematic
  · Degree: Selective ... Exhaustive
Interaction Criteria Facet
  · e.g. accuracy, alphabet, authority, date, person

Figure 2. A faceted classification of ISSs (based on [6])

Belkin [2] proposed that an information-seeking episode could be construed as a sequence of different types of interactions with information, or different ISSs, each of which could be "optimally" supported by different combinations of various retrieval techniques (see Figure 3). Thus, there would be different choices of representation, comparison, etc. techniques for best support of any particular ISS. This model of IIR is the basis for the implementation of support techniques for different ISSs in our experimental integrated IIR system.

[Figure 3. A model of support for different interactions with information, or ISSs (after [2]): over time, the user (goals, tasks, ...) interacts with information (type, medium, mode, level) through judgment, use, interpretation and modification, supported by representation, comparison, summarization, navigation and visualization techniques.]
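To make the idea of the model concrete, the following is a minimal, hypothetical Java sketch (ours, not code from the systems described later) of how an ISS might be represented as a combination of facet values from Figure 2 and mapped to a bundle of support techniques in the spirit of Figure 3. The enum names and the particular technique bundles are assumptions for illustration; the two example mappings mirror the scanning-based and searching-based ISSs described in Section 3.

// Illustrative only: an ISS as a combination of facet values (cf. Figures 1-2),
// mapped to a combination of support techniques (cf. Figure 3).
import java.util.EnumSet;
import java.util.Set;

enum Method { SCANNING, SEARCHING }
enum Mode { RECOGNITION, SPECIFICATION }
enum Level { INFORMATION, META_INFORMATION }

enum Technique { BEST_MATCH, DATABASE_SUMMARY, CLUSTERING, TABLE_OF_CONTENTS, SCROLLING, FOLLOWING_LINKS }

record ISS(Method method, Mode mode, Level level) {
    // Choose a combination of techniques hypothesized to support this ISS well.
    Set<Technique> supportTechniques() {
        if (method == Method.SCANNING && level == Level.META_INFORMATION) {
            return EnumSet.of(Technique.BEST_MATCH, Technique.DATABASE_SUMMARY, Technique.SCROLLING);
        }
        if (method == Method.SEARCHING && mode == Mode.SPECIFICATION) {
            return EnumSet.of(Technique.BEST_MATCH, Technique.CLUSTERING, Technique.FOLLOWING_LINKS);
        }
        return EnumSet.of(Technique.BEST_MATCH); // fallback for other points in the space of ISSs
    }
}

An information-seeking episode would then correspond to a sequence of such ISS values, with the interface reconfiguring its support techniques at each step.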
2.2 Related work
It is widely recognized that traditional information systems based solely on searching leave room for improvement. Some work has been done to investigate the possibilities of integrating separate IR systems, separate retrieval models, or multiple ISSs into a single system framework. Frisse and Cousins [8] developed a system which integrated hypertext and probabilistic retrieval models. Hearst et al. [9] combined a browsing system, a vector-space best-match retrieval system, and a visualization system into one system. However, it is difficult to optimally support different ISSs by completely integrating two different models, and combining originally separate systems makes it hard to use the results related to one ISS to support another.

Croft and Thompson [12] proposed a system which supported both browsing and specified search. Belkin, Marchetti, and Cool [4] designed an IR interface (BRAQUE) which supported multiple ISSs and allowed seamless movement from one specific ISS to another. In 1995, Belkin, Cool, Stein and Thiel [3] designed a dialogue-based interactive system, MERIT, which supported several ISSs and structured changes of ISSs in a mixed-initiative dialogue. All of these systems made an effort to integrate multiple ISSs in a single framework, but their effectiveness remains to be evaluated empirically.

The ScentTrails method developed by Olston and Chi [11] integrated browsing and searching to help people find information on the web. It used hyperlink highlighting to indicate paths to search results, so that people can easily combine browsing and searching. The method was evaluated in a preliminary user study whose results showed that it allowed people to locate information more quickly than by searching or browsing alone. However, due to the small sample size (twelve subjects), the generalizability of the results is unclear, and the method remains to be implemented and tested in a larger context.

3. TASKS, ISSs, AND SUPPORT
3.1 Introduction
In order to test whether an integrated IIR system would better support people in their information seeking than a non-integrated IIR system, we began by identifying "proto-typical" problematic situations, or information problems, or tasks, which we believed would lead people to engage in a variety of ISSs. In the following sub-sections, we identify two such tasks, describe how a person might address them according to our scheme of ISSs (terms which characterize the ISSs are italicized), and indicate the relationships between these ISSs and corresponding support techniques which, in combination, we hypothesize would support them well (using the framework of Figure 3). These two general tasks were used as the basis for the ten specific topics that were given to subjects in our experiment, and the support techniques were the basis of the integrated system design.

3.2 Scanning, then searching
Task 1: A person is very interested in one particular topic. S/he wants to find some good documents on this topic from a system which is composed of several databases, but has no idea which of the many possible databases to search.

Description: Given this situation, this person needs first to scan the whole system to identify the best databases for the particular topic, and then to conduct a systematic search on those databases for the specific topic. This person needs to compare the descriptions of the contents of the different databases in order to choose the appropriate ones. Since the person does not know which databases are good, s/he needs to scan the meta-information of the databases in order to recognize the best databases. In order to get some meta-information about the databases, s/he issues a query. That query would be compared, using a best-match technique, against the index terms associated with each database. The meta-information is then displayed by representing each database by the posting frequencies of the index terms. Each database is summarized by the number of documents indexed by the terms in the query, and by some description of its contents based on the most frequently indexed terms. By doing this, a person could easily see how topics are covered in the databases and how they are related to each other. This representation will allow the person to easily compare the different databases and decide which ones look more interesting by scrolling through them.
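As an illustration of the kind of meta-information summary this implies, the following self-contained Java sketch (our illustration, not the paper's implementation) reports, for each database, the number of documents its index associates with each query term. The in-memory posting-count structure is an assumption made for the example.

// Illustrative sketch: summarize databases by query-term posting counts,
// so a searcher can scan meta-information and recognize promising databases.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DatabaseSummarizer {
    // databaseName -> (term -> number of documents in that database indexed by the term)
    private final Map<String, Map<String, Integer>> postingCounts;

    DatabaseSummarizer(Map<String, Map<String, Integer>> postingCounts) {
        this.postingCounts = postingCounts;
    }

    /** For each database, report how many documents are indexed by each query term. */
    Map<String, Map<String, Integer>> summarize(List<String> queryTerms) {
        Map<String, Map<String, Integer>> summary = new LinkedHashMap<>();
        postingCounts.forEach((db, termCounts) -> {
            Map<String, Integer> perTerm = new LinkedHashMap<>();
            for (String term : queryTerms) {
                perTerm.put(term, termCounts.getOrDefault(term, 0));
            }
            summary.put(db, perTerm);
        });
        return summary;
    }
}

A display such as the database summary screen described in Section 4.3 (Figure 7) could be generated directly from such a summary, with one row per database and one column per query term.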
Next, the person needs to conduct a systematic search within one or more of the chosen databases for the specific topic, in order to find documents of interest. The person needs to formulate a query based on the given task. The query would be compared, using a best-match technique, against the index terms associated with the chosen databases. The results of the query can be represented by clustering, because clustering shows the relationships among the documents, as well as the relationships between documents in the clusters and query terms or other terms that might turn out to be useful. Query-based clusters (that is, clusters customized to reflect the information problem described in a query) would be displayed, because it is known that displays clustered by topicality help people locate relevant information (cf. [10]). It is also believed that clusters can show the person, at a glance, the relationships between the different clusters in the specific database. To accomplish this, we need to get the retrieval results and cluster them. Each cluster has a short summary giving the number of documents in the cluster and the number of documents tailored to the particular topic. The person can then decide which clusters are desirable and drill down to the documents within the clusters.
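The query-based clustering described above could be realized in many ways; the following self-contained Java sketch is our illustration, not the LEMUR clustering used in the actual systems. It groups retrieved documents with a simple single-pass, cosine-similarity criterion over term-frequency vectors; the similarity threshold and the seed-document comparison are assumptions made for the example.

// Illustrative sketch: single-pass clustering of retrieved documents by cosine
// similarity of term-frequency vectors (a stand-in for the query-based
// clustering of retrieval results described in Section 3.2).
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ResultClusterer {
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            na += e.getValue() * e.getValue();
        }
        for (int v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Assign each retrieved document to the first cluster it is similar enough to. */
    static List<List<Map<String, Integer>>> cluster(List<Map<String, Integer>> docs, double threshold) {
        List<List<Map<String, Integer>>> clusters = new ArrayList<>();
        for (Map<String, Integer> doc : docs) {
            List<Map<String, Integer>> best = null;
            double bestSim = threshold;
            for (List<Map<String, Integer>> c : clusters) {
                double sim = cosine(doc, c.get(0));   // compare against the cluster's seed document
                if (sim >= bestSim) { best = c; bestSim = sim; }
            }
            if (best == null) { best = new ArrayList<>(); clusters.add(best); }
            best.add(doc);
        }
        return clusters;
    }
}

Cluster labels such as those shown later in Figure 5 could then be chosen from the highest-frequency terms of each resulting cluster.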
From our theoretical framework (Figure 3), an information-seeking behavior can be seen as movement from one ISS to another, and different combinations of support techniques could optimally support a given type of ISS. In this task, a combination of support techniques such as best match, database summaries, indexing and scrolling is used to support a scanning-based ISS. A combination of support techniques such as best match, clustering, clustered display of retrieval results, and following links is used to support a searching-based ISS. The interactions between the user and the system are related to the situation, tasks, goals, etc.

3.3 Searching, then scanning
Task 2: A person is in the process of preparing a talk for a conference. S/he recalls some germane comments from a known electronic book but cannot remember the exact contents, and needs to find the exact quotations. S/he recalls that a certain electronic book might be very helpful, but cannot remember the exact name of the book.

Description: Given this situation, this person first needs to search the system to find the book, and then to scan through the book to get the comments needed. This person has a vague recollection of a book that s/he saw. S/he needs to improve her/his knowledge of some characteristics of the book, such as author, title, etc. Thus, s/he might need to search the system on terminological fragments of those data elements. In this situation, it would be good to give the person an opportunity to see something about the different characteristics of the book that s/he might remember. The items in the database, catalog or electronic books would be indexed to support a best-match technique within different fields such as title, author, date, publisher, publication place, etc. The retrieved results would be displayed as complete citations. The person can then see the table of contents of each book by following links from each citation. Next, since the person has only a general idea about the quotation, s/he needs to scan through the meta-information to identify some candidate quotation pages. This person might look initially at the table of contents of the book for places where the quotations might occur, then go to those pages and scan through them roughly to see whether the desired quotations are there and, if so, record the quotations.

In this task, a combination of support techniques such as table-of-contents visualization and scrolling is used to support a scanning-based ISS. A combination of support techniques such as best match, indexing, fielded query search and following links is used to support a searching-based ISS. The interactions between the user and the system are related to the situation, tasks, goals, etc.

4. SYSTEM DESIGN AND IMPLEMENTATION
4.1 General design issues
Both the baseline system and the integrated system were constructed using Java and the LEMUR toolkit (http://www.cs.cmu.edu/~lemur), using Indri indexing, passage indexing, structured-query retrieval and document clustering. As test collections, we used the 2004 TREC HARD collection of eight news databases (see [1] for details), and a specially prepared database of 50 books downloaded from Project Gutenberg (http://www.gutenberg.org/wiki/Main_Page). The specific combinations of support techniques used in the integrated system had been tested separately, to learn whether each was effective for supporting the individual ISSs; the results of those experiments indicated that they were indeed superior to the baseline system [13].

Both the baseline and integrated systems have the same general interface structure. They begin with an introductory screen asking the user to choose one of several functionalities. Choosing one leads to a screen which has a query box and "search" button at the top, a large results display area, a column on the right (the top of which displays the topic, the bottom being a space for saving results), and a horizontal bar across the bottom of the screen with navigation buttons for returning (or going) to other screens.

4.2 Baseline system
Our baseline system used the default LEMUR parameters for indexing, comparison and ranking. Because of the nature of the tasks, and the two different databases, the opening screen offers the choice of searching for books on a specific topic, or searching for news articles on a specific topic. Having chosen, the searcher arrives at the appropriate search/results screen and enters a query. The only difference between the book and news-article choices is that in the former, the paragraphs of the books are the indexed units, and all paragraphs in all books are ranked; in the latter, the entire article is the indexed unit, and all databases in the news collection are searched. In the book version, the first line of each paragraph is displayed; in the news-article version, the title is displayed. In both cases, the source (book; database) is also displayed. Clicking on a first line or title displays the full paragraph or article; clicking on a source displays the ranked list of paragraphs in the book, or the ranked list of articles retrieved from the specific database. Figure 4 shows a news-article retrieval result for the baseline system.

[Figure 4. Baseline system, news-article retrieval result]
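The two retrieval granularities described for the baseline system (paragraph-level units for books, article-level units for news) can be pictured with a small, hypothetical Java sketch of how indexing units might be produced from the two collections. The class and method names are ours, chosen for illustration, and are not the LEMUR/Indri API.

// Illustrative sketch: building indexing units at two granularities, as in the
// baseline system (book paragraphs vs. whole news articles).
import java.util.ArrayList;
import java.util.List;

record IndexingUnit(String sourceId, String displayLine, String text) {}

class UnitBuilder {
    /** Books are indexed paragraph by paragraph; the first line is used for display. */
    static List<IndexingUnit> fromBook(String bookId, List<String> paragraphs) {
        List<IndexingUnit> units = new ArrayList<>();
        for (String p : paragraphs) {
            String firstLine = p.split("\n", 2)[0];
            units.add(new IndexingUnit(bookId, firstLine, p));
        }
        return units;
    }

    /** News articles are indexed whole; the title is used for display. */
    static IndexingUnit fromArticle(String databaseId, String title, String body) {
        return new IndexingUnit(databaseId, title, body);
    }
}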
4.3 Integrated system
The integrated system begins with a screen containing four choices: (a) learning about the databases; (b) learning about the content coverage of the databases with respect to a given topic; (c) searching for books on a specific topic; (d) searching for news articles on a specific topic.

Choosing (a) leads to a search/results screen which lists the names of the databases and the number of documents in each; this includes the book database. The user can select one or more of the databases and enter a query which searches in those choices. If news-article databases are chosen, the results of the search are displayed as a list of clusters, with highly ranked cluster terms as a label, listing the first three titles of each cluster (Figure 5). Clicking on a cluster link displays a ranked list of all (maximum of 30) titles and their sources in the cluster. Clicking on a title in either the cluster or the subsequent list display shows the entire article; clicking on a source in the list shows the ranked results for the query for that database. If the book database is chosen, the query results in a ranked list of complete citations of the retrieved books; clicking on a book leads to a display of the table of contents of the book in a column on the left of the screen. Clicking on one of the items in the table of contents displays that part of the book (Figure 6).

[Figure 5. Integrated system, clustered results display]
[Figure 6. Integrated system, book table of contents display]

Choosing option (b) leads to a screen which lists ten queries, each related to one of the eight test topics or the two training topics. The user chooses one of these, which leads to a search/results screen. In the news-article tasks, this lists each database and the number of occurrences of each query term in that database (Figure 7). The user can then choose one or more databases on which to do subsequent searches, or go directly to one database. In both cases, the result is a display as in Figure 5. For a book task, the search/results screen shows the number of books retrieved for each query term.

[Figure 7. Integrated system, database summary display]

Choosing option (c), searching for books on a topic, leads to a fielded search/results screen. The user enters values in the fields to perform a query; the results (complete citations) are displayed according to how well they satisfy the Boolean conditions of the query (Figure 8; see the scoring sketch below). Clicking on any citation leads to the display of Figure 6.

[Figure 8. Integrated system, book search results display]

Choosing option (d) leads the user directly to the search/results screen shown in Figure 5.
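One simple way to rank citations "according to how well they satisfy the Boolean conditions" of a fielded query, as in option (c), is to score each book by the number of field conditions it matches. The following Java sketch is our illustration of that idea, with generic field names; it is not the system's actual retrieval code.

// Illustrative sketch: rank book records by how many fielded query conditions
// they satisfy (cf. the fielded book search of option (c)).
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class FieldedBookSearch {
    /** Count how many non-empty query fields are (case-insensitively) contained in the record. */
    static int score(Map<String, String> record, Map<String, String> query) {
        int satisfied = 0;
        for (Map.Entry<String, String> cond : query.entrySet()) {
            String wanted = cond.getValue();
            if (wanted == null || wanted.isBlank()) continue;   // empty field: no condition
            String actual = record.getOrDefault(cond.getKey(), "");
            if (actual.toLowerCase().contains(wanted.toLowerCase())) satisfied++;
        }
        return satisfied;
    }

    /** Order citations by descending number of satisfied conditions. */
    static List<Map<String, String>> rank(List<Map<String, String>> records, Map<String, String> query) {
        return records.stream()
                .sorted(Comparator.comparingInt((Map<String, String> r) -> score(r, query)).reversed())
                .toList();
    }
}

For example, a query with both a title fragment and a publisher filled in would rank books matching both fields above books matching only one of them.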
5. CONDUCT OF EXPERIMENT
5.1 Experimental design
The underlying hypothesis for this experiment is: the integrated system, designed to support scanning and searching, performs better in supporting integrated tasks requiring both scanning and searching than the baseline system.

This was a within-subject design, in which subjects performed searches using each of the two systems, first one system, then the other. For each system, subjects first performed a search on a training topic, then searched on four different topics. These four topics belong to two task categories, the finding-news-articles task and the finding-comments task, as described in Section 3. The first test topic was of the same task type as the training topic, the second topic was of the other task type, and so on. The order of the task types and topics was rotated across subjects, and the experiment was replicated by exchanging the order of the two systems. This design led to 32 subjects.

5.2 Subjects
Thirty-two Rutgers University graduate students participated in this experiment. Demographic characteristics of the subjects are shown in Table 1. Computer and information-searching experience were determined in an entry questionnaire, using a 7-point scale, with 1=low and 7=high. Those data are displayed in Table 2.

Table 1. Demographic characteristics of the subjects
Characteristic   Value                              No.
Age              <30                                22
                 30-39                               3
                 40-49                               4
                 >=50                                3
Current major    Library and Information Science     9
                 Communication                       5
                 Computer Science                    3
                 Political Science                   3
                 Anthropology                        2
                 Biomedical Engineering              2
                 Others                              8
Degree earned    Master                             18
                 Bachelor                           14

Table 2. Computer and searching experience of the subjects (7-point low to high scale)
Type of computer/search experience              Mean (s.d.)
Computer daily use                              6.91 (0.39)
Expertise with computers                        5.34 (1.15)
Searching experience with library catalogs      5.63 (1.29)
Searching experience with commercial systems    3.88 (1.91)
Searching experience with the WWW               6.72 (0.58)
Searching experience with other systems         1.14 (0.38)
Frequency of searching                          6.50 (0.92)
Success in finding information                  6.06 (0.95)
Expertise in searching                          5.28 (0.58)
Number of years of searching experience         7.34 (2.24)

5.3 Tasks and topics
Each of the two task types has five topics, including one training topic, all following the same general structures described in Sections 3.2 and 3.3. Below, we give an example topic for each task type.

Task type 1 (finding documents). Topic: As a graduate student, you are asked to write an essay about high blood pressure for one of your courses. You are supposed to get the information you need from a system that is composed of several databases. Each database has many news articles on a variety of topics, but you have no idea which databases are good for this topic. You believe it would be interesting to discover methods that reduce high blood pressure, and would like to collect news articles that identify different methods. Task: Please find as many different methods as possible. For each method, please copy the title or link of the article which discusses that method, and paste it into the answer box. For each article that you copy, please type or copy the method(s) that it identifies. If an article discusses more than one method, you only need to copy and paste the article once. If there are several articles which discuss the same methods, you only need to copy and paste one such article.

Task type 2 (finding comments). Topic: You are in the process of preparing a talk on the history of Rome. There are a lot of books available on this topic, but what you are interested in are the wars of Julius Caesar. You recall that some comments from an electronic book might be very useful for the talk. You cannot remember the exact name of the book, but you believe that it was published by a publisher in New York. The comments are about the strategies that Caesar used on the battlefield to win the Battle of Pharsalia. You cannot remember the exact comments, but would like to quote them in your talk. Task: Please find the relevant comments in the book, copy the single best paragraph, and paste it into the answer box. Also, please copy the title of the book and paste it into the answer box.
5.4 Data collection
Log data, including the interaction between the user and the system, were collected using the computer logs and logging software ("TechSmith Morae 1.3"). Morae was also used to record what the user said during the whole search process, as well as the exit interview. An entry questionnaire gathered demographic and other background information; a pre-search questionnaire elicited information about subjects' knowledge of the topic; a post-search questionnaire elicited opinions about the particular search; a post-system questionnaire collected opinions about the specific system; and an exit interview compared search experience and opinions of the two systems.

5.5 Conduct
When subjects arrived, they completed an informed consent form, which included detailed instructions about the experiment, and then the entry questionnaire. Next, they were given a training topic to practice with the first system they would use. Then, for each topic, they filled out a pre-search questionnaire, conducted the search and saved the answers in the given place. When they felt that a satisfactory answer was saved, or they ran out of time (subjects had up to 12 minutes per search), they answered a brief post-search questionnaire. This procedure continued until four topics in the first system were completed, after which they filled out a post-system questionnaire and were given a three-minute break. The same procedure was followed for the next set of topics using the second system, after which the exit interview was given. Each subject was paid $30 in cash or equivalent value (gift card) after completing the experiment.

Judgment of result correctness for task type 2 topics was accomplished by constructing the topics so that there was only one paragraph in the book database which provided a completely correct answer; paragraphs which had only some of the required information were judged partially correct. Figure 9 shows the distribution of the different correctness values for each topic.

[Figure 9. Result correctness by topic: number of responses judged correct (2), partially correct (1) and incorrect (0) for Topics 5-8.]

6. RESULTS
6.1 Performance
The measures of performance were the time taken to complete the task; the subjects' satisfaction with their results; the correctness of the results (incorrect, partially correct, correct); and aspectual recall. Aspectual recall and correctness were the performance measures for task type 1 and task type 2, respectively. Time was determined through logging; satisfaction with results through the post-search questionnaires, on a scale of 1 (unsatisfied) to 7 (completely satisfied); correctness was determined by the investigators, according to the task definition, and was graded on a 3-point scale, 0-2; and aspectual recall was determined by pooling all of the aspects identified for each topic by all of the subjects. Aspectual recall, a measure developed in the TREC Interactive Track (cf. [7]), is the ratio of the aspects of the search topic identified in the documents saved by the subject to the total number of aspects of the topic.
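Written as a formula (our notation, not the authors'), aspectual recall for a subject on a topic is

\[
\mathit{AR} \;=\; \frac{\lvert A_{\mathrm{saved}} \cap A_{\mathrm{topic}} \rvert}{\lvert A_{\mathrm{topic}} \rvert}
\]

where $A_{\mathrm{topic}}$ is the set of aspects of the topic identified in the pooled judgments across all subjects, and $A_{\mathrm{saved}}$ is the subset of those aspects present in the documents the subject saved.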
Table 3 displays the performance results. The significance tests were ANOVA (time), Wilcoxon signed-rank (result satisfaction), Pearson chi-square (result correctness), and ANOVA (aspectual recall), respectively. For result satisfaction, the details are Z=-2.633, p=0.008; for aspectual recall, F=6.951, p=0.009. Although the result-correctness difference between the baseline and integrated systems is not significant, it is in favor of the integrated system. Time for the integrated system is slightly more than for the baseline system, but the difference is not significant.

Table 3. Performance measures (** significant at <.01 level); mean (standard deviation)
System       Time (mins)   Result satisfaction (1-7)   Result correctness (0-2)   Aspectual recall
Baseline     8.94 (3.05)   4.86 (1.77)                 0.97 (0.84)                0.44 (0.21)
Integrated   9.11 (2.91)   5.40** (1.43)               1.17 (0.77)                0.54** (0.21)

6.2 Interaction and effort
The characteristics of user interaction with the system, and of user effort, were measured by: the number of iterations (i.e. queries) in a search; the total number of documents or paragraphs saved at the end of the search; the number of documents or books viewed during a search; and the mean query length per search. Table 4 displays these results. The number of iterations per search is significantly lower in the integrated system (ANOVA, F=4.516, p=0.035), and the query length is significantly higher in that system (ANOVA, F=34.571, p<0.001). There were no significant differences in the other two variables.

Table 4. Interaction and effort measures (* significant at <.05 level, ** significant at <.01 level); mean (standard deviation)
Measure                                       Baseline      Integrated
Number of iterations                          3.81 (3.65)   2.96* (2.68)
Number of final saved documents/paragraphs    4.55 (2.22)   4.58 (2.37)
Number of documents/books viewed              7.98 (4.71)   9.64 (8.75)
Query length                                  3.39 (1.20)   4.78** (2.39)
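Comparable tests can be run with standard libraries. Assuming Apache Commons Math 3 is on the classpath, the following Java sketch shows the kinds of ANOVA and Wilcoxon signed-rank comparisons reported in this section; the per-subject score arrays are placeholders invented for illustration, not the study's data, and this is our example rather than the authors' analysis code.

// Illustrative sketch: the kinds of significance tests reported in Section 6,
// using Apache Commons Math 3 (assumed dependency). Data arrays are placeholders.
import java.util.Arrays;
import org.apache.commons.math3.stat.inference.OneWayAnova;
import org.apache.commons.math3.stat.inference.WilcoxonSignedRankTest;

public class SignificanceTests {
    public static void main(String[] args) {
        // One-way ANOVA over the two groups (as used for time and aspectual recall).
        double[] baselineRecall = {0.41, 0.52, 0.38, 0.47};     // placeholder per-subject values
        double[] integratedRecall = {0.50, 0.61, 0.49, 0.58};
        OneWayAnova anova = new OneWayAnova();
        double pAnova = anova.anovaPValue(Arrays.asList(baselineRecall, integratedRecall));
        System.out.println("ANOVA p-value: " + pAnova);

        // Wilcoxon signed-rank test over paired ratings (as used for satisfaction and usability).
        double[] baselineSatisfaction = {4, 5, 4, 6};           // placeholder paired ratings
        double[] integratedSatisfaction = {5, 6, 5, 7};
        WilcoxonSignedRankTest wilcoxon = new WilcoxonSignedRankTest();
        double pWilcoxon = wilcoxon.wilcoxonSignedRankTest(baselineSatisfaction, integratedSatisfaction, false);
        System.out.println("Wilcoxon signed-rank p-value: " + pWilcoxon);
    }
}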
6.3 Usability
The usability of the systems was measured in two general ways. The first had to do with subjects' perceptions of ease of learning to use the system, ease of use of the system, understanding of the system, and usefulness of the system (all on 7-point scales, 1=low, 7=high). The results (Table 5) show that subjects found the integrated system to be significantly (Wilcoxon signed-rank test) easier to use (Z=-2.264, p=0.024), and significantly more useful (Z=-2.522, p=0.012), than the baseline system.

The second general approach to evaluating usability measured issues concerned with searching. The variables here were perceptions of ease of starting a search, ease of doing the search, satisfaction with search results (reported in Table 3), and whether the subject had enough time to do the search (all on the same type of 7-point scale). The results (Table 6) are that it was significantly (Wilcoxon signed-rank test) easier to start the search in the integrated system (Z=-2.239, p=0.025), and that there was a significant difference in favor of the integrated system in subjects' perception of having had enough time to do the search (Z=-2.466, p=0.014). Although the difference in ease of searching is not significant (Z=-1.341, p=0.180), it is, again, in the integrated system's favor.

Table 5. Use usability measures (* significant at <.05 level); mean (standard deviation)
System       Easy to learn   Easy to use    Understandability   Usefulness
Baseline     5.25 (1.52)     4.72 (1.49)    5.09 (1.33)         4.47 (1.34)
Integrated   5.53 (1.11)     5.38* (1.32)   5.25 (1.30)         5.44* (1.16)

Table 6. Search usability measures (* significant at <.05 level); mean (standard deviation)
System       Ease of starting   Ease of searching   Enough time
Baseline     5.42 (1.60)        5.05 (1.65)         5.51 (1.65)
Integrated   5.76* (1.27)       5.37 (1.32)         5.93* (1.27)

7. DISCUSSION
The results of our experiment demonstrate that the integrated system that we designed, which adapts to different ISSs within the course of a single information-seeking episode, appears to have significant advantages over a baseline system which is designed to support specified searching only. There was no measure on which the baseline system, built using current state-of-the-art technology and the standard current support techniques, outperformed the integrated system. Furthermore, in each of our evaluation categories, the integrated system significantly outperformed the baseline system on at least two measures.

With respect to our performance measures, although there was no significant difference between the systems with respect to the time taken to complete the task, this is probably an artifact of the design, which limited search time to twelve minutes. The advantage of the integrated system for both aspectual recall and subject satisfaction with results was highly significant, and there was no significant difference in correctness of results between the systems.

There were significant differences in favor of the integrated system on two interaction measures: the number of iterations (that is, queries) per search was significantly lower, and the mean query length was significantly higher. Since it is known that longer queries perform better in best-match systems, the latter result is of some general interest.

Both systems were, in many respects, novel to the subjects, but the integrated system was thought by them to be significantly easier to use, and significantly more useful, with respect to the tasks. It was also significantly easier for the subjects to start their tasks in the integrated system, and when asked whether they had sufficient time to do the search, they gave significantly more positive responses for the integrated system.

We considered whether the results could have arisen from any systematic differences between the subjects with respect to their topic expertise or familiarity. The data on these factors (Table 7) seem not to support this, as the subjects' mean self-reported expertise and familiarity, measured on a 7-point low to high scale, are uniformly low for all topics, with rather low standard deviations as well. Only one subject indicated topic familiarity of 7, for only one topic; nine subjects indicated topic familiarity of 6, over four topics; these data are insufficient to investigate any possible interaction of familiarity with system.

Table 7. Topic familiarity and expertise (7-point low to high scale); mean (s.d.)
Topic No.            Topic                                       Topic familiarity   Topic expertise
Training (book)      History of America                          1.56 (0.91)         1.41 (0.71)
Training (article)   Air pollution                               3.72 (1.42)         3.00 (1.32)
1                    Global warming                              4.22 (1.29)         3.44 (1.39)
2                    High blood pressure                         3.31 (1.28)         2.66 (1.36)
3                    International trade in cotton               1.91 (1.23)         1.78 (1.10)
4                    Auto safety                                 2.94 (1.32)         2.47 (1.14)
5                    Development of airplane models              1.88 (1.07)         1.56 (0.84)
6                    Childhood education                         2.69 (1.69)         2.16 (1.32)
7                    Development of the domestic bird business   1.63 (0.94)         1.50 (0.80)
8                    History of Rome                             2.16 (1.37)         1.84 (1.17)

Thus, it appears that our basic hypothesis, that an integrated system which adapts to support different ISSs during the course of an information-seeking episode better supports information seeking than a system designed to support only the standard ISS of specified searching, is itself supported.
8. CONCLUSIONS
In this paper, we presented an interactive information retrieval (IIR) system which adapts to support a searcher in a variety of different ways during the course of an information-seeking episode. This system is based on a theoretical model of IIR which construes an information-seeking episode as a person's moving from one information-seeking strategy (ISS) to another. On the basis of this theoretical model and a classification of ISSs, we identified different combinations of IR techniques which we hypothesized would best support the different ISSs that a searcher might engage in while attempting to resolve particular kinds of information problems. This analysis led to the implementation of an integrated IIR system which adapts to support both scanning and searching behaviors within a single framework.

To see whether an integrated system of the sort that we implemented would in fact better support human information seeking than the kind of IIR system in current use, which is designed to support only one kind of ISS, specified searching by comparing a query to a set of information objects, we conducted a user-oriented experiment. The experiment compared user performance and behavior in our integrated system to that in a baseline system which emulated the support offered by most standard IIR systems. The results of the experiment demonstrated a substantial and significant advantage for the integrated system in terms of objective and subjective performance, degree of user interaction with the system, and usability.

These results, we believe, speak strongly in favor of the general concept of designing IIR systems explicitly to support different ISSs. They also demonstrate that it is indeed possible to support quite different behaviors within a single system framework which searchers can actually understand and use effectively, and that a principled approach to designing such systems is possible. We believe that we have shown through this study, at least to a limited extent, that a model of IIR as support for interaction with information, combined with an empirically based classification of such interactions, can provide such principles.

Of course, there are limitations to the conclusions which we can draw from our experiment, and issues concerning our conclusions which need to be investigated further. As always in user studies of this type, we were constrained by a limited, and to some extent rather homogeneous, group of subjects, and by a limited number of search topics. The only realistic way to address this issue is to do more studies, which we intend to perform. Furthermore, since this was an experimental study, the subjects were assigned topics to search, rather than searching on topics of their own interest, and searched in somewhat limited databases. We attempted to address this problem by using scenario-based topic descriptions (cf.
[5]), and by use of a TREC collection, but the only way really to deal with it is to move from a strictly experimental environment to a quasi-experimental environment in which the integrated system is embedded in a real-life context. Such a study awaits a more robust and complete system than the one that we have tested, and one that is not so specifically tailored to particular types of information problems. Finally, the integrated IIR system which we tested responded to only a small number of different ISSs. The identification of good support techniques for other ISSs, their implementation in a more general integrated IIR system, and the evaluation of such a system are an obvious next step.

Despite the limitations and unanswered questions associated with this study, we believe that it is a convincing step on the road toward the reality of integrated IIR systems, which will adapt to provide support for a variety of ISSs within a single information-seeking episode.

9. REFERENCES
[1] Allan, J. (2005). HARD Track overview in TREC 2004: High accuracy retrieval from documents. In E.M. Voorhees & L.P. Buckland (Eds.), TREC 2004, Proceedings of the Thirteenth Text Retrieval Conference. Washington, DC: GPO.
[2] Belkin, N. J. (1996). Intelligent information retrieval: Whose intelligence? In Proceedings of the Fifth International Symposium for Information Science (ISI-96). Konstanz: Universitätsverlag Konstanz, 25-31.
[3] Belkin, N. J., Cool, C., Stein, A. & Thiel, U. (1995). Cases, scripts, and information-seeking strategies: On the design of interactive information retrieval systems. Expert Systems With Applications, 9(3), 379-395.
[4] Belkin, N. J., Marchetti, P. G., & Cool, C. (1993). BRAQUE: Design of an interface to support user interaction in information retrieval. Information Processing & Management, 29(3), 325-344.
[5] Borlund, P. (2000). Experimental components for the evaluation of interactive information retrieval systems. Journal of Documentation, 56(1), 71-90.
[6] Cool, C. & Belkin, N. J. (2002). A classification of interactions with information. In Proceedings of the Fourth International Conference on Conceptions of Library and Information Science, 1-15.
[7] Dumais, S. & Belkin, N.J. (2005). The TREC interactive tracks: Putting the user into search. In E.M. Voorhees & D.K. Harman (Eds.), TREC: Experiment and Evaluation in Information Retrieval (pp. 123-152). Cambridge, MA: MIT Press.
[8] Frisse, M. & Cousins, S.B. (1989). Information retrieval from hypertext: Update on the dynamic medical handbook project. In Hypertext '89 Proceedings. New York: ACM, 199-212.
[9] Hearst, M., Pedersen, J., Pirolli, P., Schütze, H., Grefenstette, G., & Hull, D. (1996). Xerox site report: Four TREC-4 tracks. In D. Harman (Ed.), TREC-4, Proceedings of the Fourth Text Retrieval Conference. Washington, DC: GPO.
[10] Muresan, G. (2002). Using Document Clustering and Language Modelling in Mediated Information Retrieval. Ph.D. thesis, School of Computing, Robert Gordon University, Aberdeen, Scotland, United Kingdom.
[11] Olston, C. & Chi, E. H. (2003). ScentTrails: Integrating browsing and searching on the Web. ACM Transactions on Computer-Human Interaction, 10(3), 177-197.
[12] Croft, W.B. & Thompson, R.H. (1987). I3R: A new approach to the design of document retrieval systems. Journal of the American Society for Information Science, 38(6), 389-404.
[13] Yuan, X.-J. (2007). Supporting Multiple Information-Seeking Strategies in a Single System Framework. Unpublished Ph.D.
dissertation, Rutgers University, New Brunswick, NJ.