SIGIR 2007 Proceedings
Doctoral Consortium

Automatic Query-time Generation of Retrieval Expert Coefficients for Multimedia Retrieval

Peter Wilkins
Centre for Digital Video Processing & Adaptive Information Cluster
Dublin City University
Dublin, Ireland
peter.wilkins@computing.dcu.ie

ABSTRACT
Content-based Multimedia Information Retrieval can be defined as the task of matching a multi-modal information need against various components of a multimedia corpus and retrieving relevant elements. Generally the matching and retrieval takes place across multiple 'features' which can either be visual or audio, or can be high-level or low-level, and each of which can be seen to be an independent retrieval expert. The task of answering a query can thus be formulated as a combination of experts data fusion problem. Depending on the query, each expert may perform differently and so retrieval coefficients can be used to weight each expert to increase overall performance. Previous approaches to expert coefficient generation have included query-independent coefficients, identification of query-classes and machine learning methods, to name a few [3]. The approach I propose is different, as it seeks to dynamically create expert coefficients which are query-dependent. This approach is based upon earlier experiments [5] where an initial correlation was observed between the score distribution of a retrieval expert, and its relative performance when compared against other experts for that query. I have created a basic method which leverages these observations to create query-time coefficients which achieve comparable performance to oracle-determined query-independent weights, for the experts and collections used in the aforementioned experiment. Previous research which examined score distribution [1, 4] did so with respect to relevance, whereas this work seeks to compare expert scores for a given query to determine relative performance.
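The weighted combination of experts described above can be illustrated with a minimal sketch. It is not the method from [5]: the weighting heuristic (one minus the mean normalised score, so that an expert whose scores fall away quickly below its top result receives more weight) is an assumption chosen only for illustration, as are the function names `min_max_normalize`, `distribution_weight` and `fuse` and the example expert names and scores.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one expert's non-empty {doc: score} map into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # A flat score list carries no ranking information.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def distribution_weight(normalized):
    """Hypothetical heuristic (an assumption, not the author's method):
    one minus the mean normalised score of the expert's result list."""
    if max(normalized.values()) == 0.0:
        return 0.0  # uninformative expert gets no weight
    return 1.0 - sum(normalized.values()) / len(normalized)


def fuse(expert_scores):
    """Weighted linear fusion with coefficients computed at query time.

    expert_scores maps an expert name to its {doc: score} results.
    Returns (ranking, weights).
    """
    normalized = {e: min_max_normalize(s) for e, s in expert_scores.items()}
    raw = {e: distribution_weight(n) for e, n in normalized.items()}
    total = sum(raw.values())
    if total == 0.0:
        # Fall back to uniform coefficients when no expert is informative.
        weights = {e: 1.0 / len(raw) for e in raw}
    else:
        weights = {e: w / total for e, w in raw.items()}

    fused = defaultdict(float)
    for expert, docs in normalized.items():
        for doc, score in docs.items():
            fused[doc] += weights[expert] * score
    ranking = sorted(fused, key=fused.get, reverse=True)
    return ranking, weights


# Illustrative scores for a single query from two made-up experts.
ranking, weights = fuse({
    "colour": {"a": 0.9, "b": 0.2, "c": 0.1},
    "text": {"b": 5.0, "c": 4.8, "a": 4.0},
})
print(ranking, weights)
```

Because the coefficients are recomputed from each query's own score lists, no training queries or query classes are needed; a query-independent scheme would instead fix `weights` once for all queries.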
In my work I aim to explore this correlation by eliminating potential bias from the data collections, the retrieval experts and the queries used in experiments to obtain more robust observations. Using and extending previous investigations into data fusion [2], I will explore where data fusion succeeds in multimedia retrieval, and where it does not. I then aim to refine and extend my existing techniques for automatic coefficient generation to incorporate the new observations, so as to improve performance. Finally I will combine this approach with existing data fusion methods, such as query-class coefficients, with each approach complementing the other to achieve further performance improvements.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: Retrieval Models; D.2.8 [Software Engineering]: Metrics--complexity measures, performance measures

General Terms
Experimentation, Measurement

Keywords
Multimedia information retrieval, data fusion

Acknowledgments
This research was partly supported by the European Commission, contract FP6-027026 (K-Space) and Science Foundation Ireland, grant 03/IN.3/I361.

1. REFERENCES
[1] A. Arampatzis and A. van Hameren. The score-distributional threshold optimization for adaptive binary classification tasks. In SIGIR '01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 285–293, New York, NY, USA, 2001. ACM Press.
[2] S. M. Beitzel, E. C. Jensen, A. Chowdhury, D. Grossman, O. Frieder, and N. Goharian. Fusion of effective retrieval strategies in the same information retrieval system. J. Am. Soc. Inf. Sci. Technol., 55(10):859–868, 2004.
[3] W. B. Croft. Combining approaches to information retrieval. Advances in Information Retrieval, pages 1–36, 2000.
[4] R. Manmatha, T. Rath, and F. Feng. Modeling score distributions for combining the outputs of search engines.
In SIGIR '01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 267–275, New York, NY, USA, 2001. ACM Press.
[5] P. Wilkins, P. Ferguson, and A. F. Smeaton. Using score distributions for query-time fusion in multimedia retrieval. In MIR 2006 - 8th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2006.

Copyright is held by the author/owner(s).
SIGIR'07, July 23–27, 2007, Amsterdam, The Netherlands.
ACM 978-1-59593-597-7/07/0007.