WWW 2008 / Poster Paper

April 21-25, 2008 · Beijing, China

PivotBrowser: A Tag-Space Image Searching Prototype
Xiaoyan Li, Lidan Shou, Gang Chen, Xiaolong Zhang, Tianlei Hu, and Jinxiang Dong
College of Computer Science, Zhejiang University Hangzhou, P.R.China 310027

kricel_lee@yahoo.com.cn, {should,cg,xiaolongzhang,htl,djx}@zju.edu.cn

ABSTRACT
We propose a novel iterative searching and refining prototype for tagged images. This prototype, named PivotBrowser, captures semantically similar tag sets in a structure called pivot. By constructing a pivot for a textual query, PivotBrowser first selects candidate images possibly relevant to the query. The tags contained in these candidate images are then selected in terms of their tag relevances to the pivot. The shortlisted tags are clustered and one of the tag clusters is used to select the results from the candidate images. Ranking of the images in each partition is based on their relevance to the tag cluster. With the guidance of the tag clusters presented, a user is able to perform searching and iterative query refinement. Categories and Sub ject Descriptors: H.4 [Information Systems Applications]: Miscellaneous General Terms: Algorithms, Design Keywords: Tag, Inconsistency, Ambiguity, Relevance

the spelling variations (plural, abbreviation, etc.), and the other highly relevant terms ("film" vs. "movie"). A good example of tag thesaurus is the one used in the WordNet[3]. b A tag atom A is a set of tags that satisfy the following reb quirements: (1) If a tag atom A contains a tag t, it must also contain all lexically relevant tags of t as defined in the b thesaurus; (2) For any two tags in A, t1 and t2 , they must be lexically relevant to each other. It is important to note that one tag may possibly appear in multiple tag atoms as it can have more than one lexical meaning in the thesaurus. Therefore, given a universe of tags {ti }, we can precompute an inverted list for all possible tag atoms based on the tag thesaurus. Each entry in the inverted list is like following b b < ti , id of Ai,1 , id of Ai,2 , · · · >, b where each tag atom Ai,j contains tag ti . We refer to this inverted list as the Tag Atom Inverted List (TAIL). A pivot atom of tag ti , denoted as P A(ti ), is defined as the union of all tag atoms which contain ti (or those in the same entry of ti in TAIL). A pivot of n tags, P (t1 , t2 , . . . , tn ), is defined as the set containing all pivot atoms of its tags, P (t1 , t2 , . . . , tn ) = {P A(tm )|m = 1, . . . , n}. An n-tag set {tj1 , tj2 , . . . , tjn } is said to be supported by pivot P (t1 , t2 , . . . , tn ), if tjm  P A(tm ) (m = 1, 2, . . . , n). Based on these definitions, we can enumerate all tag sets supported by a pivot. These tag sets are supposed to be semantically similar to each other.

1. INTRODUCTION
Tagging based search systems are known to be prone to semantic errors or limitations [2]. To name a few: Different users may use different tags (maybe synonyms) to describe the same ob ject, causing inconsistency in tagging; The existence of polysemy (single term having multiple meanings) in a query causes ambiguity, and the query is often hard to refine; The distribution of the tags being used is usually skewed and has the long-tail characteristic. Therefore, on one hand, images with rare tags cannot be easily found. On the other hand, queries with rare tags may need to be expanded to larger scopes. We propose PivotBrowser, an iterative searching and refining prototype for tagged images. PivotBrowser employs a novel tag-based structure called pivot to address the above problems. Our approach is different from a previous work on social tag clustering [1] as we can handle both synonymy and ambiguity.

2.1

The precomputation

2. THE PIVOT BROWSING SCHEME
To introduce the concept of pivot, we first give the definition of tag atom based on the availability of a tag thesaurus. The tag thesaurus contains lexical relevance information for all tags, such as the synonyms ("flower, bloom, blossom"),
Copyright is held by the author/owner(s). WWW 2008, April 21­25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04.

Before any user interaction, we need to precompute some data structures required during pivot browsing. (1) First, given the universe of all tags in the image database, we generate the TAIL based on the thesaurus. (2) Second, we generate an inverted index for the image database, where each entry contains an image ID list for a tag. If each image is regarded as a document, for each key (tag) of the inverted index, we check the length of its image ID list and compute the inverse document frequency (IDF) for each tag. (3) Third, we compute a tag-to-tag affinity matrix for all tags in the database. The tag affinity metric that we use is the well-known Jaccard coefficient over the entire image database: T |I mag eS et(t1 ) I mag eS et(t2 )| S , aff(t1 , t2 ) = |I mag eS et(t1 ) I mag eS et(t2 )| where I mag eS et(t) indicates the set of images having tag t.

1111


WWW 2008 / Poster Paper

April 21-25, 2008 · Beijing, China

2.2 Pivot browsing
The interactive pivot browsing is an iterative process consisting of the following three phases: (1) First, when a user issues a query Q containing tags {q1 , . . . , qn }, the system looks up the TAIL to find the tag atoms for each query tag qi . By merging the tag atoms for each query tag, we obtain a pivot P (Q) = {P A(q1 ), . . . , P A(qn )}. For each tag set Q supported by P (Q), we look up the inverted index of the image database to find images where all tags in Q co-occur. These images are saved as a candidate image set Ican , and all tags associated with them are saved (except for the query tags in Q) as a candidate tag set Tcan for further consideration in the subsequent phases. (2) Second, all candidate tags in Tcan will undergo a selection pass, and the top-K candidates relevant to P (Q) will be obtained. Meanwhile, the relevance value of each tag is saved as its weight. The tag selection method will be discussed in section 2.3. (3) Third, the K tags in the output of the previous phase will be clustered on the fly using a graph-partitioning algorithm as proposed in [4]. The affinity metric for clustering is based on the precomputed tag-to-tag affinity values. These K tags, grouped in their clusters, will then be presented to the user for a new round of tag selection/refinement. Meanwhile, one of the tag clusters (by default the most compact one) will be used to select the images in Ican ­ only candidates with tags which appear in the cluster are selected. Ranking of the output images is based on the relevance between the tag vector of each image and the weighted vector of the cluster. The latter can be obtained from the results of phase 2. A user can certainly choose another tag cluster for image selection and browsing. If a user subsequently adds a new tag to or removes an old one from the query, the pivot browsing process enters the next iteration (goto phase 1).

Figure 1: A Search Result Page for query "window" We perform 200 unique queries on the prototype. Each query is executed for 100 times. Table 1 presents the average CPU time for selecting the candidate image set on the inverted index of the image database (SelectI), generating the top-K relevant tags (SelectT), clustering the K tags (ClusterT), and ranking the results (Rank). The time for creating a pivot is negligible. The results in the table reveal that the query cost is dominated by the selection on the inverted index of the image database. SelectI 535.3 SelectT 15.5 ClusterT 97.5 Rank 78.1

CPU Time (ms)

Table 1: Run-time CPU Cost in Each Phase In conclusion, the pivot browsing scheme realizes effective query expansion and image searching in the tag-space at a low expense of computation and storage. Therefore, it can help users to find the intended results more effectively compared to conventional methods. We believe that pivot browsing can potentially become a general tag-space search paradigm not only limited to images. For future work, we would conduct a usability study on PivotBrowser. We would also consider a comprehensive study on incorporating visual feature comparison, and other tag selection and clustering strategies into it.

2.3 Selecting pivot-relevant tags
The top-K tags are selected from the candidate tag set Tcan based on their relevance to the pivot. The relevance between a candidate tag t and the pivot P (Q) is computed using the tagging statistics as following rel(t, P (Q)) = co(t, P (Q)) · I DF (t) where co(t, P (Q)) is the number of co-occurrences of t and any tag set Q supported by P (Q), on the entire image database. As any image associated with Q must appear in Ican , we can expedite the co-occurrence computation by computing the number of images having tag t in Ican . The IDF values can be obtained from the precomputation.

4.

REFERENCES

3. RESULTS AND CONCLUSION
We implement the PivotBrowser prototype, and evaluate its query performance on a dataset containing 523746 tagged images randomly downloaded from Flickr. The tag universe of the dataset contains 427482 unique tags. Figure 1 shows the PivotBrowser interface with the search results for query of tag "window". The top-right region presents the six clusters of the K tags relevant to the pivot. To give a few examples, cluster1 = {v iew, airplane, condo . . .}, cluster2 = {store, f ashion, display , shopping . . .}, and cluster3 = {white, g reen, red, nikon, canon, blue . . .}. The images shown are the topmost results of cluster1 , which confirm the implied semantics of "sight-view from window" in the cluster.

[1] G. Begelman, P. Keller, and F. Smadja. Automated Tag Clustering: Improving search and exploration in the tag space. In WWW Col laborative Web Tagging Workshop, 2006. [2] S. A. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. J. Inf. Sci., pages 198-208, 2006. [3] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. Introduction to WordNet: an on-line lexical database. International Journal of Lexicography, 3(4):235-244, 1990. [4] S. White and P. Smyth. A spectral clustering approach to finding communities in graphs. In SDM, 2005.

1112