## Submodular Reranking with Multiple Feature Modalities for Image Retrieval

Fan Yang1, Zhuolin Jiang2 and Larry S. Davis1

1 Department of Computer Science, University of Maryland, College Park, United States
2 Noah's Ark Lab, Huawei Technologies, Hong Kong SAR, China

### Abstract

We propose a submodular reranking algorithm to boost image retrieval performance based on multiple ranked lists obtained from multiple modalities in an unsupervised manner. We formulate the reranking problem as maximizing a submodular and non-decreasing objective function that consists of an information gain term and a relative ranking consistency term. The information gain term exploits relationships of initially retrieved images based on a random walk model on a graph, then images similar to the query can be found through their neighboring images. The relative ranking consistency term takes relative relationships of initial ranks between retrieved images into account. It captures both images with similar ranks in the initial ranked lists, and images that are similar to the query but highly ranked by only a small number of modalities. Due to its diminishing returns property, the objective function can be efficiently optimized by a greedy algorithm. Experiments show that our submodular reranking algorithm is effective and efficient in reranking images initially retrieved by multiple modalities. Our submodular reranking framework can be easily generalized to any generic reranking problems for real-time search engines.

### Approach

We aim to address the problem: Given a query image represented by multiple feature modalities, how to improve retrieval quality by fusing these modalities?

We present a submodular objective function for reranking images retrieved by multiple feature modalities, which is very efficient and fully unsupervised. It consists of two terms.

• An information gain term which models pairwise relationships of retrieved images represented by graphs.
• Convert ranked lists to graphs

Denote $$\mathcal{V} = \mathcal{V}_1 \cup \mathcal{V}_2 \cup \cdots \cup \mathcal{V}_M$$, we aim to select a subset of nodes $$\mathcal{S}$$ from $$\mathcal{V}$$ which are the most similar to the query image. Let $$\mathcal{U}$$ denote the set of images which are not selected.

• Information gain term for each graph is defined as $$F_m (\mathcal{S}) = H(\mathcal{V}_m \backslash \mathcal{S}) - H(\mathcal{V}_m \backslash \mathcal{S}|\mathcal{S})$$

which can be represented by a random walk model. The final information gain term of all graphs is $$R(\mathcal{S}) = -\sum_m (\sum_{v \in \mathcal{V}\backslash \mathcal{S}} p_m(v) \log p_m(v) - \sum_{v \in \mathcal{V} \backslash \mathcal{S}, s \in \mathcal{S}} p_m(v,s) \log p_m(v|s))$$

which is a submodular and monotone function.

• A relative ranking consistency term which exploits the inter-relationships among multiple ranked lists obtained by different feature modalities.
• The relative ranking consistency (RRC) is motivated by two criterion: relationships of relative ranks between retrieved images should be maintained, and an image that is similar to the query but highly ranked by only a smaller number of modalities should also be captured.

We define the relative ranking between two images as $$rr_m (v_i,v_j) = | r_{m,v_i} - r_{m,v_j} |, v_i, v_j \in \mathcal{V}$$, where $$r_{m,v_i}$$ is is the position of image $$I_i$$ in the $$m$$-th ranked list. Then the relative ranking consistency measure across multiple ranked lists is $$\mathcal{C} (v_i, v_j) = \frac{1}{Z} \sum_{m,m' \in M, m \neq m'} 1 - \frac{\min (rr_m, rr_{m'})} {K}$$

where $$Z = \frac{M(M-1)}{2}$$ is a normalization factor corresponding to the number of all possible modality pairs. If two images have similar ranks, they have high RRC score and are similarly ranked after reranking. If an image is only highly ranked by a small number of modalities, it still has relatively high RRC score. The complete RRC term is $$T(\mathcal{S}) = (1-q) \sum\nolimits_{s=1}^{|\mathcal{S}|} q^s \cdot \frac{1}{s}\sum\nolimits_{v_i, v_j \in \mathcal{S}, r_{v_i} < r_{v_j} = s} \mathcal{C} (v_i, v_j)$$

$$q$$ is a pre-defined decay weight to give higher ranked images more importance. $$T(\mathcal{S})$$ is also submodular and non-decreasing.

• Combining the information gain and relative ranking consistency terms, we obtain the final objective function $$Q(\mathcal{S})= R(\mathcal{S}) + \lambda T(\mathcal{S})$$ for the reranking problem. The solution is obtained by maximizing the objective function $$\max_{\mathcal{S}} \quad R(\mathcal{S}) + \lambda T(\mathcal{S}) \\ s.t. \quad \mathcal{S} \subseteq \mathcal{V}, |\mathcal{S}| \leq K_s$$

$$K_s$$ is the largest number of selected images, which means we only select and rerank at most $$K_s$$ images. It can be approximately optimized by a greedy algorithm.

### Experimental Results

• Datasets and features

Holidays: 1491 image from 500 categories, where the first image in each category is used as a query.
Ukbench: 10200 images from 2550 objects or scenes.
Oxford and Paris: 5062 and 6412 photos of famous landmarks in Oxford and Paris, respectively. Both datasets have 55 queries, where multiple queries are from the same landmark.

BoW vectors from Hessian affine + SIFT descriptor using single assignment and approximate k-means (AKM). Standard tf-idf weighting is used.
1192-dimension GIST feature.
4000-dimension HSV color feature with 40 bins for H and 10 bins for S and V components.
• Results

Comparisons with state-of-the-art approaches. We N-S score (average top 4 hits) on UKbench and mAP (in %) on other datasets.

Comparisons with other rank aggregation approaches. Runtime (in second) of reranking 1000 images for a single query using direct greedy optimization and lazy evaluation is shown in the right-most columns.

### Code and Datasets

The MATLAB implementation (version 1.0) can be downloaded from here. Due to large size of the pre-computed data (similarity matrices), only an example on Holidays dataset is included. If you would like to use the pre-computed data of the other three datasets, please send me an email. I would be happy to share the data via some other ways.

### Publications

• Submodular Reranking with Multiple Feature Modalities for Image Retrieval
Fan Yang, Zhuolin Jiang and Larry S. Davis
12th Asian Conference on Computer Vision (ACCV), Accepted, 2014. (Oral) [paper][supp. mat.]