WWW 2007 / Poster Paper
Topic: User Interfaces and Accessibility

System for Reminding a User of Information Obtained through a Web Browsing Experience

Tetsushi Morita
NTT Corporation, NTT Cyber Solutions Laboratories
3-9-11 Midori-Cho, Musashino-shi, Tokyo, 180-8585 Japan
81-422-59-4840
morita.t@lab.ntt.co.jp

Tetsuo Hidaka
NTT Corporation, NTT Cyber Solutions Laboratories
3-9-11 Midori-Cho, Musashino-shi, Tokyo, 180-8585 Japan
81-422-59-7150
hidaka.tetsuo@lab.ntt.co.jp

Akimichi Tanaka
NTT Corporation, NTT Cyber Solutions Laboratories
3-9-11 Midori-Cho, Musashino-shi, Tokyo, 180-8585 Japan
81-422-59-4483
tanaka.akimichi@lab.ntt.co.jp

Yasuhisa Kato
NTT Corporation, NTT Cyber Solutions Laboratories
3-9-11 Midori-Cho, Musashino-shi, Tokyo, 180-8585 Japan
81-422-59-4420
kato.yasuhisa@lab.ntt.co.jp

ABSTRACT
We propose a system for reminding a user of information obtained through a web browsing experience. The system extracts keywords from the content of the web page currently being viewed and retrieves the context of past web browsing related to the keywords. We define the context as a sequence of web browsing when many web pages related to the keyword were viewed intensively because we assume that a lot of information connected to the current content was obtained in the sequence. The information is not only what pages you viewed but also how you found those pages and what knowledge you acquired from them. Specifically, when you browse web pages, this system automatically displays a list of the contexts judged to be important in relation to the current web page. If you select the context, details of the context are shown graphically with marks indicating characteristic activities.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Retrieval models, Search process

General Terms: Algorithms, Management, Design.

Keywords: Context, Information Retrieval, User's Behavior, History.
Copyright is held by the author/owner(s).
WWW 2007, May 8-12, 2007, Banff, Alberta, Canada.
ACM 978-1-59593-654-7/07/0005.

1. INTRODUCTION
Have you ever been frustrated at failing to rediscover a useful web page that you viewed in the past? We forget, and so waste, much of the information that we obtain through our own web browsing. Most of us have retrieved the same web page more than once. One report says that 81% of all the pages that people view are pages they have seen before [3].

The information that we obtain through an experience such as web browsing is not limited to the content of web pages. We also seem to remember which web pages we viewed in a session, how the pages were found, and what knowledge we acquired from them. We define this information as "obtained information". In this paper, we describe a system that aims to remind a user of previously obtained information efficiently.

The bookmark function of a web browser and desktop search systems are popular ways to retrieve previously seen web pages by keyword matching [2]. Several semantic desktop search systems have been proposed. One of them helps a user to retrieve web pages and e-mails according to closely associated information by adding metadata such as the URL of the web page that is visited subsequently and the destination address of an e-mail [1]. A previous version of our method helps users to retrieve web pages viewed in the past by calculating their personal importance from log data collected on personal computers [4]. These desktop search systems help a user to find a web page viewed in the past efficiently. However, they do not seem to be good at reminding us of the various kinds of information that we acquired together in a past web browsing experience, because we need to choose and visit many independently retrieved web pages.

2. PROPOSED SYSTEM
We focus on the context in the past. The context is a sequence of web browsing in which many web pages related to the content of the web page currently being viewed were viewed intensively and many actions were performed. We call the time of this sequence an "intensive period". We assume that a lot of obtained information is contained in the context. For example, if a user is researching a product, he first finds the context related to the current web page and then recovers a lot of obtained information, such as the URLs of multiple web pages that were visited at that time. By chronologically tracing his activities, such as which web pages he viewed in the context, he learns how he found the pages and what knowledge he got from them.

2.1 Collecting action logs
It is difficult to force a user to perform the actions required to create history data, such as recording when and how he or she viewed a web page. A logging module therefore collects information about mouse, keyboard, copying, and printing events and window conditions, together with the URLs of visited pages, source files, thumbnails, HTTP headers, text selected by the user, and so on. It has an encryption function to protect the user's privacy.

2.2 Extracting keywords of current web page
Our system analyzes the content of the current web page to extract keywords that represent the web page. The technique is very simple. An analyzer in a browser component obtains the current content $C$ and characterizes it by extracting the most frequent terms. A score $S_i$ is then determined for each term $t_i \in C$, where $S_i = (1 + \alpha_0 + \dots + \alpha_n)\,R(t_i)$ and each $\alpha_k$ ($k = 0, \dots, n$) is a weighting coefficient that varies heuristically with the locality of $t_i$. That is, extracted terms instanced in anchor text are assigned a higher weight than those in
text, but a lower one than those in