SIGIR 2007 Proceedings, Demonstration

DiscoverInfo: A Tool for Discovering Information with Relevance and Novelty

Chirag Shah
School of Information and Library Science
University of North Carolina
Chapel Hill NC 27599, USA
chirag@unc.edu

Gary Marchionini
School of Information and Library Science
University of North Carolina
Chapel Hill NC 27599, USA
march@ils.unc.edu

Categories and Subject Descriptors: H.3.2 [Information Interfaces and Presentation]: User Interfaces [Interaction styles]

General Terms: Design, Human Factors

Keywords: Relevance, Novelty, Term-cloud

Copyright is held by the author/owner(s). SIGIR'07, July 23-27, 2007, Amsterdam, The Netherlands. ACM 978-1-59593-597-7/07/0007.

A user of an IR system is often left in the dark with a query box. It may be useful in many cases to present some kind of representation of the underlying collection to the user. Obviously, constructing and providing a representation of a massive collection such as the Web is very difficult and probably not very useful. However, such functionality could be feasible and useful for a small or specialized collection.

The other issue that keeps the user in the dark is the presentation of the retrieved information. A typical search engine presents the retrieved results ordered by their relevance. While working with specialized collections in which most of the documents are about a specific domain or topic, relevance is not enough to derive and order the retrieved set. A user may also want to know how much the retrieved documents differ from each other. In other words, in addition to relevance, it becomes useful to present novelty among those documents.

We demonstrate this concept using DiscoverInfo, a system that allows the user to explore a collection of documents by their relevance and novelty. The system facilitates discovery through three functions.

1. Search: The system provides a simple searching interface for doing full text search and retrieval. Using Indri [http://www.lemurproject.org/indri/], DiscoverInfo indexes text, HTML, XML, and PDF documents.

2. Browse: The DiscoverInfo system prepares a term cloud based on the term occurrences in the collection as well as across the documents. These clouds can provide a good overview of the underlying collection. The user can browse through the clickable term clouds (see Figure 1) and find associated documents.

Figure 1: DiscoverInfo: portion of the term cloud

3. Discover: This system not only retrieves relevant information from the indexed collection, but can also evaluate novelty across documents. This can help the user to discover not only the relevant, but also novel information. At present, novelty for document $d_j$ with respect to $d_i$ is implemented as

$$\mathrm{Novelty}(d_i, d_j) = 1 - \text{fraction of word overlap between } d_i \text{ and } d_j \tag{1}$$

The system evaluates novelty among the top 10 retrieved documents and presents these pairwise relationships in a matrix format (see Figure 2). In addition to the numbers, the system also uses color-coding to indicate the degree of novelty.

Figure 2: DiscoverInfo: displaying novelty evaluation matrix for top 10 relevant documents

At present we have implemented our system on The North Carolina Election of 1898 [http://www.lib.unc.edu/ncc/1898/] collection available from UNC Chapel Hill Library. The collection has nearly 500 historical documents containing about 8 million terms.

DiscoverInfo is at present a preliminary, yet fully functional system. The effectiveness of this system has not yet been measured with user studies, but it does demonstrate two key ideas: (1) providing a representation of the underlying collection, and (2) aiding in discovering relevant as well as novel information. We are working on extending its functionality to include other collections as well as defining user studies that assess usefulness.
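
The novelty measure of Equation (1) and the pairwise matrix over the top retrieved documents could be sketched roughly as follows. This is only an illustrative sketch, not the DiscoverInfo implementation: the text does not specify how the "fraction of word overlap" is normalized, so the code assumes it is the share of distinct words in $d_j$ that also occur in $d_i$. The function names `tokenize`, `novelty`, and `novelty_matrix`, as well as the simple lowercase tokenization, are hypothetical choices made for this example.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of distinct word tokens.

    Assumption: a simple alphanumeric tokenizer; the paper does not say
    how words are extracted.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def novelty(d_i: str, d_j: str) -> float:
    """Novelty of d_j with respect to d_i, following Equation (1).

    Assumption: overlap fraction = |words(d_i) & words(d_j)| / |words(d_j)|.
    """
    words_i, words_j = tokenize(d_i), tokenize(d_j)
    if not words_j:
        return 0.0  # an empty document contributes nothing new
    overlap = len(words_i & words_j) / len(words_j)
    return 1.0 - overlap


def novelty_matrix(docs: List[str]) -> List[List[float]]:
    """Pairwise matrix: entry [i][j] is the novelty of docs[j] w.r.t. docs[i]."""
    return [[novelty(d_i, d_j) for d_j in docs] for d_i in docs]


if __name__ == "__main__":
    # Hypothetical stand-ins for the top retrieved documents.
    top_docs = [
        "the election of 1898 in North Carolina",
        "North Carolina newspapers covered the election",
        "a speech on railroads and taxation",
    ]
    for row in novelty_matrix(top_docs):
        print(" ".join(f"{value:.2f}" for value in row))
```

Under this assumed normalization the measure is asymmetric: a short document fully contained in a longer one has zero novelty with respect to it, while the longer one still scores as novel with respect to the short one. A symmetric variant (for example, dividing by the size of the union of the two word sets) would be an equally plausible reading of Equation (1).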