Exploration of the Tradeoff between Effectiveness and Efficiency for Results Merging in Federated Search

Suleyman Cetintas, Luo Si
Department of Computer Science, Purdue University
{scetinta,lsi}@cs.purdue.edu

ABSTRACT
Federated search is the task of retrieving relevant documents from different information sources. One of the main research problems in federated search is to combine the results from different sources into a single ranked list. Recent work proposed a regression-based method that downloads some documents from the ranked list of each source, calculates comparable scores for those documents, and estimates mapping functions that transform source-specific scores into comparable scores. Experiments have shown that downloading more documents improves the accuracy of results merging, but it also increases computation and communication costs. This paper proposes a utility-based optimization method that enables the system to automatically decide on the desired number of training documents to download according to the user's need for effectiveness and efficiency.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]

General Terms
Experimentation, Performance, Theory.

Keywords
Effectiveness and Efficiency Tradeoff, Federated Search, Results Merging.

1. INTRODUCTION
The proliferation of online searchable text information sources on local area networks and the Internet creates the problem of finding information that may be distributed among many information sources (federated search) [1].

One of the main research topics in federated search is to merge the search results returned from different information sources into a single ranked list; this task is called results merging. Results merging is difficult because each source may use a different ranking algorithm and may base its ranking on heterogeneous corpus statistics. Merging based on unnormalized ("raw") document scores or ranks works well when the retrieval algorithms and corpora of the different sources are very similar, but can be very inaccurate when they differ. Merging based on weighted document scores or ranks [1] is more effective, but it is heuristic and prone to unexpected failure. An alternative solution is to download the contents of all retrieved documents and recalculate their scores with a centralized retrieval algorithm, which produces a consistent ranking but can be very time-consuming.

Prior research has proposed an effective regression-based approach for the results-merging task [4]. This method obtains the contents of a small number of documents from the ranked list of each information source, either by downloading some of the retrieved documents or by reusing documents collected through query-based sampling [1] (when they are available). An effective centralized retrieval algorithm is then applied to calculate comparable scores for these documents, so each of them has both a centralized comparable score and a source-specific score. For each information source, a source-specific logistic transformation model is estimated from these training documents to map source-specific scores to centralized comparable scores, and all documents are then sorted by their estimated centralized scores into a final ranked list. Experiments have shown that effectiveness increases as the number of downloaded training documents grows; however, downloading a large number of training documents incurs high online communication and computation costs.
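As a rough illustration of the merging step just described, the sketch below fits one logistic transformation per source on the downloaded training documents and merges all ranked lists by the estimated centralized scores. It is a minimal sketch, not the implementation of [4]: the function names, the use of SciPy's curve_fit, and the input data layout are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(score, a, b):
    """Transform a source-specific score into an estimated centralized score."""
    return 1.0 / (1.0 + np.exp(-(a * score + b)))

def fit_source_mapping(source_scores, centralized_scores):
    """Fit one per-source mapping from the downloaded training documents,
    which carry both a source-specific and a centralized comparable score."""
    params, _ = curve_fit(logistic,
                          np.asarray(source_scores, dtype=float),
                          np.asarray(centralized_scores, dtype=float),
                          p0=[1.0, 0.0], maxfev=10000)
    return params  # (a, b)

def merge(ranked_lists, mappings):
    """ranked_lists: {source: [(doc_id, source_score), ...]} (hypothetical layout).
    mappings: {source: (a, b)} fitted per source.
    Returns a single list of (doc_id, estimated centralized score), best first."""
    merged = []
    for source, docs in ranked_lists.items():
        a, b = mappings[source]
        merged.extend((doc_id, float(logistic(s, a, b))) for doc_id, s in docs)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)
```

Downloading more training documents per source gives each fit more (source score, centralized score) pairs, which is one way to see why effectiveness grows with the number of downloaded documents.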
This paper proposes an automated approach that decides on the desired number of documents to download so as to balance effectiveness and efficiency. The effectiveness factor is measured by the mean average precision of the merged results: with a set of training queries with relevance judgments, a logistic regression model is built to estimate the mean average precision obtained when different numbers of training documents are downloaded. The efficiency factor is measured by the number of documents to download. For a test query, a utility-based optimization method then decides on the number of documents to download depending on a user-defined tradeoff parameter between effectiveness and efficiency.

2. METHOD
The proposed method is composed of two parts. First, a logistic regression model is constructed to estimate the mean average precision of the merged results when different numbers of training documents are downloaded for building the merging models. The logistic model for estimating the mean average precision is:

MAP(y) = e^{a y + b} / (1 + e^{a y + b})    (1)

where y is the number of downloaded training documents and a, b are the logistic model parameters. The model is trained on the training queries and their (y, MAP) pairs.

Second, a utility-based optimization problem is formed to explore the tradeoff between effectiveness and efficiency:

y* = arg max_y U(y),  where  U(y) = λ α MAP(y) - (1 - λ) y    (2)

Here U(y) is the utility of downloading y documents for results merging, α is a constant set to 250 to adjust the range of the MAP value, and λ is the user-defined tradeoff parameter between effectiveness and efficiency. The user informs the system of his/her preference between effectiveness and efficiency, and the system finds the desired number of training documents to download by computing the y value that maximizes the utility, thereby deciding whether it is worth downloading more training documents, which increases effectiveness but sacrifices efficiency, depending on λ.
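A minimal sketch of both steps follows. The logistic form of Equation (1), the constant α = 250, and the utility of Equation (2) are taken from the text above; the function names, the candidate set of download sizes, and the toy (y, MAP) pairs in the usage lines are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def map_model(y, a, b):
    """Equation (1): estimated MAP as a logistic function of the number of
    downloaded training documents y."""
    return 1.0 / (1.0 + np.exp(-(a * y + b)))

def fit_map_model(train_y, train_map):
    """Fit a and b from (number of downloaded documents, observed MAP) pairs
    collected on the training queries."""
    (a, b), _ = curve_fit(map_model,
                          np.asarray(train_y, dtype=float),
                          np.asarray(train_map, dtype=float),
                          p0=[0.01, -1.0], maxfev=10000)
    return a, b

def choose_num_docs(a, b, lam, alpha=250.0, candidates=(3, 5, 10, 15, 30, 100)):
    """Equation (2): return y* = argmax_y  lam * alpha * MAP(y) - (1 - lam) * y,
    searched over a candidate grid of download sizes."""
    utilities = {y: lam * alpha * map_model(y, a, b) - (1.0 - lam) * y
                 for y in candidates}
    best_y = max(utilities, key=utilities.get)
    return best_y, utilities

# Illustrative usage with made-up MAP observations (not the paper's data):
a, b = fit_map_model([3, 5, 10, 15, 30, 100], [0.20, 0.24, 0.28, 0.30, 0.32, 0.33])
for lam in (0.1, 0.5, 0.9):
    best_y, _ = choose_num_docs(a, b, lam)
    print(lam, best_y)  # larger lambda weights effectiveness more, so y* tends to grow
```

Because MAP(y) is bounded while the download cost grows linearly in y, small λ values let the cost term dominate quickly and the optimizer settles on a few documents, while large λ values justify downloading many more; this is the behavior compared against the fixed T_X strategies in Section 3.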
3. EXPERIMENTS
In this section we present experimental results for the results-merging task, comparing the proposed technique, which chooses the number of documents to download by maximizing utility, against techniques that always download a fixed number of documents. The experiments were conducted on two testbeds with eight multilingual information sources: the UniNE testbed (eight ranked lists from the UniNE system [3]) and the Hum testbed (eight ranked lists from the Hummingbird system [2]). We used 20 training queries (queries 141-160) with relevance judgments to train the logistic regression model in Equation 1 and 40 test queries (queries 161-200) to test the system. The experiments only used downloaded documents as training data for building the regression models. The results-merging task is to merge eight ranked lists in different languages into a single multilingual ranked list. More detailed information about the testbeds can be found in [4].

T_X denotes the merging method that always downloads the top X documents for the training process; we ran T_X for X in {3, 5, 10, 15, 30, 100} with λ values in {0.1, 0.25, 0.5, 0.75, 0.9}. T_Opt denotes the merging method that downloads the desired number of top-ranked documents chosen by the proposed optimization. λ = 0.9 corresponds to the case in which effectiveness is of higher importance, and λ = 0.1 corresponds to the case where the main focus is efficiency. The higher the λ value set by the user, the more training documents are downloaded by the system and the higher the utility value of Equation 2, as shown in Tables 1 and 2. The utility values for all T_X methods are calculated directly from human relevance judgments when the system strictly downloads a fixed number of training documents; the utility values of the T_Opt model are also calculated from human relevance judgments, while the system chooses a desirable number of training documents to download for each λ value. The results in Tables 1 and 2 show that no single T_X method obtains consistently good utility values across different λ values, whereas T_Opt always achieves good utility values by automatically choosing a different desired number of documents to download. This demonstrates the power of the T_Opt method for exploring the tradeoff between effectiveness and efficiency.

Table 1. Utility values for the UniNE Testbed.

  M        λ=0.1      λ=0.25     λ=0.5      λ=0.75     λ=0.9
  T_3      5.969      19.422     41.845     64.267     77.721
  T_5      4.7618     19.404     43.809     68.213     82.856
  T_10     0.6507     16.627     43.254     69.881     85.857
  T_15     -3.6982    13.254     41.509     69.763     86.716
  T_30     -17.288    1.7794     33.559     65.338     84.406
  T_100    -80.215    -50.536    -1.0725    48.391     78.069
  T_Opt    6.6885     19.721     43.626     69.763     85.531

Table 2. Utility values for the Hum Testbed.

  M        λ=0.1      λ=0.25     λ=0.5      λ=0.75     λ=0.9
  T_3      3.0278     12.069     27.139     42.208     51.25
  T_5      1.3577     10.894     26.789     42.683     52.22
  T_10     -2.9638    7.5906     25.181     42.772     53.326
  T_15     -7.4175    3.9562     22.912     41.869     53.243
  T_30     -20.833    -7.0819    15.836     38.754     52.505
  T_100    -83.789    -59.473    -18.946    21.581     45.897
  T_Opt    4.1248     13.312     28.624     42.956     52.72

4. CONCLUSION
Results merging is an important task in federated search, and regression-based merging has been shown to be an effective approach. However, no prior research has explored the tradeoff between the effectiveness of regression-based merging and its efficiency, measured by the number of documents to download. This paper proposes an automated approach that decides on a desired number of training documents to download for the results-merging task by considering both effectiveness and efficiency. Experimental results demonstrate the advantage of the proposed method in achieving good utility values across different configurations.

There are several possibilities for extending this research. The current method uses the same model to estimate mean average precision as a function of the number of downloaded documents for all queries. A more sophisticated query-specific estimation model may provide more accurate estimates that reflect the characteristics of individual queries.

5. REFERENCES
[1] Callan, J. 2000. Distributed information retrieval. In W. B. Croft (Ed.), Advances in Information Retrieval. Kluwer Academic Publishers.
[2] Hummingbird SearchServer. http://www.hummingbird.com/products/searchserver/
[3] Savoy, J. 2003. Report on CLEF-2003 experiments. In C. Peters (Ed.), Results of the CLEF 2003 Cross-Language Evaluation Forum.
"CLEF2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists". In C. Peters (Ed.), Results of the CLEF2005 crosslanguage evaluation forum. M T_3 T_5 T_10 T_15 T_30 T_100 T_Opt 0.1 3.0278 1.3577 -2.9638 -7.4175 -20.833 -83.789 4.1248 0.25 12.069 10.894 7.5906 3.9562 -7.0819 -59.473 13.312 708