SIGIR 2007 Proceedings, Demonstration

A Full-Text Retrieval Toolkit for Mobile Desktop Search

Wei Chen, Jiajun Bu*, Kangmiao Liu, Chun Chen, Chen Zhang
College of Computer Science, Zhejiang University
Hangzhou 310027, P.R. China
*Corresponding Author, +86 571 87952148
{chenw,bjj,lkm,chenc}@zju.edu.cn, lanweizc24@hotmail.com

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Search process
General Terms: Algorithms, Design, Experimentation
Keywords: Full-Text Retrieval, Mobile Desktop Search

1. INTRODUCTION

Smart handheld devices such as smart phones and personal digital assistants are already in widespread use today. Typically equipped with a 200 MHz ARM9 or similar CPU, 64 MB of ROM and 32 MB of RAM, they are becoming increasingly powerful. In particular, with gigabyte-level storage expansion through Reduced-Size MultiMediaCards (RS-MMC) or Secure Digital (SD) flash cards, people can install a lot of software and store plenty of data such as MP3s, eBooks, email and Wikipedia on them. This in turn creates the need to search inside handhelds. However, smart handhelds are always power-constrained and have limited interaction capabilities. In particular, the asymmetric read/write and wear characteristics of flash storage cards make it difficult to offer high-performance indexing. Very few handhelds currently support even basic search functionality.

To enrich handhelds, we developed a full-text retrieval toolkit named Titan-Lite, designed especially for them. It can be embedded easily into various handheld applications to implement search functionality. The first edition is written in Symbian C++ and is designed as a research system to run under Symbian OS.

Titan-Lite has four main components: storage manager, indexer, analyzer and searcher. Most of them are specially designed with the characteristics of handhelds in mind.

NAND flash is the most widely used storage medium in handhelds.
Reading from NAND flash can be performed at any granularity and is very fast. However, data can only be deleted at block granularity (i.e., 8 KB to 64 KB), and data can only be written at page granularity (i.e., 256 B to 512 B) after the respective page (and its enclosing 8 KB to 64 KB block) has been deleted. Moreover, each page can only be written a limited number of times (typically several hundred thousand).

The storage manager handles all read/write operations in the system. All write operations are page-based: each write starts at an offset that is a multiple of the page size, and the free space left at the end of the page after writing is right-padded.

The indexer uses the single-pass inversion index construction method (S. Heinz et al 2003) with geometric partitioning (N. Lester et al 2005) to build the full-text inverted index. The single-pass inversion method can operate within limited resources, and does not sacrifice speed with temporary storage requirements. Geometric partitioning uses the temporary storage produced by single-pass inversion and decreases the number of write operations. It also supports on-line index construction efficiently. We chose B-trees to organize the vocabulary. This structure fits flash memory best because of its page-based organization.

Most data in handhelds are in rich media formats, so the analyzer uses different parsers for different types of documents. The built-in parser splits the text into single words as index terms. Programmers can re-implement the parser to support various formats. At present, Titan-Lite supports Boolean queries and ranked queries.

Copyright is held by the author/owner(s). SIGIR'07, July 23-27, 2007, Amsterdam, The Netherlands. ACM 978-1-59593-597-7/07/0007.

2. MP3 SEARCH

Figure 1: MP3 Search

We implemented an MP3 search system based on Titan-Lite in the S60 Platform SDK for Symbian OS. Figure 1 shows six screenshots of the system in the S60 Platform device emulator.
People can index, search and configure the system using the options menu in No.1. No.2 shows the query dialog where users can enter queries. By indexing the text in the Song Title, Artist, Album, Year and Comment tags of MP3 files, people can search their music easily. The search results for Beatles (Artist), sea (Song Title), yellow submarine (Album) and eagles California (Boolean Query, Artist and Song Title) are shown in No.3 to No.6. Users can click the corresponding title to play the song.
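
The tag-based search described above can be illustrated with a minimal Python sketch. This is not Titan-Lite code (the toolkit itself is written in Symbian C++). The class name `TagIndex`, the field keys, the simple tokenizer, the in-memory posting sets, and the reading of a multi-term Boolean query as a conjunction (AND) are all assumptions made for illustration only.

```python
from collections import defaultdict

# The five tag fields named in the text; the key spellings are illustrative.
FIELDS = ("title", "artist", "album", "year", "comment")


def tokenize(text):
    """Split text into lowercase single-word terms (stand-in for the built-in parser)."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


class TagIndex:
    """Hypothetical in-memory inverted index over MP3 tag text."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of song ids

    def add(self, song_id, tags):
        """Index every term found in the known tag fields of one song."""
        for field in FIELDS:
            for term in tokenize(str(tags.get(field, ""))):
                self.postings[term].add(song_id)

    def search_and(self, query):
        """Return sorted ids of songs that contain every query term (assumed AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        # Intersect the shortest posting sets first to keep intermediate results small.
        lists = sorted((self.postings.get(t, set()) for t in terms), key=len)
        result = set(lists[0])
        for posting in lists[1:]:
            result &= posting
        return sorted(result)


index = TagIndex()
index.add(1, {"title": "Hotel California", "artist": "Eagles"})
index.add(2, {"title": "Yellow Submarine", "artist": "The Beatles",
              "album": "Yellow Submarine"})
index.add(3, {"title": "California Girls", "artist": "The Beach Boys"})
print(index.search_and("eagles California"))  # [1]
```

In this sketch the query "eagles California" matches only the song whose Artist tag contains "Eagles" and whose Song Title tag contains "California", which mirrors the Boolean query example in Figure 1. A flash-resident index such as the one the paper describes would keep postings on page-aligned storage rather than in Python sets.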