Nexus: A Real Time QA System [Demo Overview]

Kisuh Ahn, ICCS, University of Edinburgh, Edinburgh, United Kingdom (k.ahn@sms.ed.ac.uk)
Bonnie Webber, ICCS, University of Edinburgh, Edinburgh, United Kingdom (bonnie@inf.ed.ac.uk)

Categories and Subject Descriptors: H.3 [Information Search and Retrieval]: Search Process

General Terms: Performance

Keywords: Question Answering, Information Retrieval

1. INTRODUCTION
Factoid Question Answering has attracted much research interest in recent years. As the TREC QA exercises show [2], the answer correctness of state-of-the-art factoid QA systems is approaching a level that makes Question Answering nearly viable for practical use. However, one important issue that has attracted less attention is whether a QA system can be efficient enough to scale up to intensive usage of the kind a web search engine is subjected to. Unlike web search engines such as Google, which produce results in real time while handling tens of thousands of queries per second, a typical QA system requires several seconds or even minutes to produce the answer to a single question. This, we believe, is because a typical QA system performs most of its work on-line, after a question has been posed. In this demonstration, we present our system, Nexus, which shifts a significant proportion of the work off-line by pre-processing the corpus and constructing a specialised index of potential answers: currently, expressions bearing named entities such as people, organisations and locations. This substantially increases the system's speed and efficiency.

2. DESCRIPTION OF THE METHOD
Our method shifts most of the work off-line into preprocessing and indexes the possible answers directly via a specially constructed term-term matrix. The method uses a text corpus of substantive size as its main source of knowledge. The answer index is constructed as follows (illustrative sketches of this pipeline and of retrieval appear after the list):

· All potential answer-bearing expressions (according to some pre-defined criteria) are identified in the source corpus.
· The sentential context of each potential answer-bearing expression is collected and collated for every such entity.
· The collated context for an answer is then used to construct a specialised term-term matrix, which represents the relation between an answer term (Aterm) and its associated context terms (Qterms) via weighted values.
· Finally, this matrix is turned into an inverted index with Qterms as the keys.

Once the index has been produced, answers are retrieved using a conventional IR method.
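As a minimal sketch of the indexing pipeline, the following Python fragment renders the four steps above in miniature. It is illustrative only: the entity criterion (a toy capitalisation test standing in for the pre-defined criteria), the tokenizer, and the raw co-occurrence weights are simplifying assumptions, since neither the criteria nor the weighting scheme is fully specified here.

import re
from collections import defaultdict

def tokens(text):
    # Crude tokenizer: alphanumeric runs only.
    return re.findall(r"[A-Za-z0-9]+", text)

def find_entities(sentence):
    # Toy stand-in for the "pre-defined criteria": treat any capitalised
    # token as a potential answer-bearing expression. A real system would
    # run a named-entity tagger for persons, organisations and locations.
    return [t for t in tokens(sentence) if t[0].isupper()]

def build_answer_index(corpus):
    # Steps 1-3: collect the sentential context of every entity and
    # collate it into a sparse term-term matrix,
    # matrix[aterm][qterm] = weight (raw co-occurrence counts here).
    matrix = defaultdict(lambda: defaultdict(float))
    for sentence in corpus:
        qterms = [t.lower() for t in tokens(sentence)]
        for aterm in find_entities(sentence):
            for qterm in qterms:
                matrix[aterm][qterm] += 1.0
    # Step 4: invert the matrix so that Qterms become the index keys.
    index = defaultdict(list)
    for aterm, context in matrix.items():
        for qterm, weight in context.items():
            index[qterm].append((aterm, weight))
    return index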
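Given the inverted index, retrieval reduces to scoring Aterms against the terms of the question. Nexus itself delegates this step to Lemur's structured query retrieval (see Section 3); the fragment below, which continues the sketch above, substitutes a simple weighted-sum scorer purely to make the data flow concrete.

def answer(question, index, k=5):
    # Score each Aterm by summing the weights of the question's Qterms,
    # then return the top k candidates (cf. the a@5 measure below).
    scores = defaultdict(float)
    for qterm in (t.lower() for t in tokens(question)):
        for aterm, weight in index.get(qterm, []):
            scores[aterm] += weight
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy usage:
docs = ["Edinburgh is the capital of Scotland.",
        "The castle sits in the centre of Edinburgh."]
index = build_answer_index(docs)
print(answer("What is the capital of Scotland?", index))
# -> ['Edinburgh', 'Scotland', 'The']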
3. THE NEXUS SYSTEM
We have implemented a fully working system, NEXUS, based on the method described above. The system uses the AQUAINT corpus as its source data, from which we have generated an answer index containing the following numbers of entities:

KIND            NUM
ALL             243641
PERSON          117370
ORGANISATION    67559
LOCATION        48194

Table 1: Number of Aterms Indexed

To retrieve an answer for a question, we use an off-the-shelf IR system, Lemur [1], which accesses the answer index created from the aforementioned term-term matrix. We found that the structured query retrieval method (based on the InQuery retrieval system) offered by Lemur performs reasonably well for our purpose. NEXUS produced answers to the 162 person/organisation/location questions from TREC 2003 QA in less than 1 minute on a single-processor machine with an Intel Pentium IV 2.4 GHz CPU and 512 MB of RAM. Its a@5 score (the proportion of questions answered within the first 5 hits) was 0.40.

4. REFERENCES
[1] P. Ogilvie and J. Callan. Experiments using the Lemur toolkit. In Proceedings of the 2001 Text Retrieval Conference (TREC 2001), pages 103-108, 2002.
[2] E. M. Voorhees. Overview of the TREC 2005 question answering track. In TREC, 2005.