Theoretical Benchmarks of XML Retrieval

Tobias Blanke
Queen Mary College, University of London
London, United Kingdom
tobias@dcs.qmul.ac.uk

Mounia Lalmas
Queen Mary College, University of London
London, United Kingdom
mounia@dcs.qmul.ac.uk

ABSTRACT

This poster investigates the use of theoretical benchmarks to describe the matching functions of XML retrieval systems and the properties of specificity and exhaustivity in XML retrieval. Theoretical benchmarks concern the formal representation of qualitative properties of IR models. To this end, a Situation Theory framework for the meta-evaluation of XML retrieval is presented.

Categories and Subject Descriptors: H.3.3 Information Search and Retrieval
General Terms: Theory, Measurement
Keywords: Meta-evaluation, XML retrieval

1. INTRODUCTION

The aim of XML retrieval is to retrieve not only relevant document components, but those at the right level of granularity, i.e. document components that specifically answer a query. To evaluate how effective XML retrieval approaches are, it is therefore necessary to consider whether the 'right' level is correctly identified. In 2004, INEX, the evaluation initiative for XML retrieval, used two four-graded dimensions of relevance: (1) exhaustivity reflects the extent to which the document component satisfies the information need, and (2) specificity refers to the extent to which all the information in the document component is about the information need. A scale from 0 to 3 is used to measure how specific or exhaustive a document component is in relation to an information need, where 0 means that the effect is not measurable and 3 means that the component is highly specific/exhaustive. E.g., a (3,3) result designates a highly exhaustive and specific answer.
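As a minimal sketch of the two-dimensional assessment scale just described, an (exhaustivity, specificity) pair could be represented as below. The `Assessment` type and its method name are hypothetical illustrations and are not part of INEX or of this poster.

```python
from dataclasses import dataclass

# Grades of the four-point scale: 0 (not measurable) up to 3 (highly).
SCALE = range(4)


@dataclass(frozen=True)
class Assessment:
    """Hypothetical container for one (exhaustivity, specificity) judgement."""

    exhaustivity: int
    specificity: int

    def __post_init__(self) -> None:
        # Reject grades outside the 0..3 scale.
        for name in ("exhaustivity", "specificity"):
            value = getattr(self, name)
            if value not in SCALE:
                raise ValueError(f"{name} must be in 0..3, got {value!r}")

    def is_highly_exhaustive_and_specific(self) -> bool:
        """True only for a (3,3) result."""
        return (self.exhaustivity, self.specificity) == (3, 3)


print(Assessment(3, 3).is_highly_exhaustive_and_specific())  # True
print(Assessment(1, 3).is_highly_exhaustive_and_specific())  # False
```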
Van Rijsbergen [6] suggested that, given the increasing complexity of the retrieval task due to more complex information units like XML elements, an experimental approach to information retrieval (IR) should be complemented with a theoretical evaluation technique. Therefore, this poster investigates the use of theoretical benchmarks to describe the matching functions of XML retrieval systems and the properties of specificity and exhaustivity.

2. THEORETICAL EVALUATION IN IR

Theoretical evaluation can be complementary to an experimental evaluation if it helps to clarify the assumptions of retrieval models and if it can identify the characteristics leading to a particular experimental behaviour. INEX has used various relevance scales; for demonstration purposes we use the INEX 2004 scale [5].

A theoretical evaluation can be done through the use of a meta-theory, as proposed in previous work based on the logical approach to IR [4]. In 1971, Cooper coined the term 'logical relevance' for an objective view on relevance [3]. Van Rijsbergen and others have expressed logical relevance in terms of the implication $d \rightarrow q$ [6]. Following Huibers' formalism and approach [4], we call such an implication between query and document 'aboutness'. With aboutness, we aim to theoretically capture the benchmarks of an IR model in general and an XML retrieval model in particular. Theoretical benchmarks [7] concern the formal representation of qualitative properties of IR models. The properties are described in terms of supported logical axioms and postulates.

We use Situation Theory (ST), developed by Barwise and Perry [1], as our logic-based model for XML retrieval. ST offers a logic of information rather than truth assignments and is therefore closer to real-world applications. ST is a mathematical theory of meaning and information with situations as primitives. Situations are partial descriptions of the world and are composed of information items formalised as infons [4]. For IR modelling, queries and documents are modelled as situations, while infons represent a model's information items like keywords or phrases.

Theoretical benchmarks need formalisms powerful enough to mark the fundamental properties of retrieval models. In this poster we only present the fundamentals of the formalism of an ST-based aboutness language for XML retrieval. If we consider XML elements to be XML situations, then $S \rightsquigarrow T$ means that the XML situation $S$ is about $T$. In order to describe an XML retrieval model, this could express that document component situation $S$ is about the information need in query $T$. Likewise, $S \not\rightsquigarrow T$ symbolises that $S$ is not about $T$. With $\otimes$ we formalise the composition of situations. Preclusion, symbolised by $\perp$, expresses that information clashes and leads to anti-aboutness, which is denoted by a separate symbol. $\equiv$ states that two situations are equivalent, e.g. two document components containing the same information.

Copyright is held by the author/owner(s). SIGIR'06, August 6-11, 2006, Seattle, Washington, USA. ACM 1-59593-369-7/06/0008.

3. THEORETICAL BENCHMARKS OF XML RETRIEVAL

This section describes how we can apply a theoretical benchmark, based on ST, to show how XML approaches provide exhaustive or specific answers to queries. Firstly, we look at the properties of XML aboutness systems that result in either higher exhaustivity or specificity. Secondly, we describe with our formalism the two dimensions of the XML retrieval relevance assessment: exhaustivity and specificity.

Some logical properties of an XML retrieval model support exhaustivity, while others support specificity. The monotonicity postulates are an example of logical properties. They claim that aboutness is preserved under composition. E.g., Left Monotonic Union (LMU) states that if situation $S$ is about $T$, then $S \otimes U$ is also about $T$:

$$\frac{S \rightsquigarrow T}{S \otimes U \rightsquigarrow T}$$

In an aboutness model unconditionally supporting LMU, a query situation containing 'house' does not only lead to document component situations having 'house'; components with 'house' and 'garden' are equally valid answers. However, the right level of granularity is missed. A naive vector space model based on simple overlap supports both left and right monotonic union [4] and cannot lead to the retrieval of highly specific answers. Providing specific answers is only possible with models that support LMU only conditionally, as for example vector space models with trained parameters or probabilistic models do [7]. Such models regularly achieve the best results at INEX. Yet models that do not support LMU at all neglect exhaustivity.

Using our ST-based framework, we can formally represent the two relevance dimensions used in INEX and their scale. In earlier work, Chiaramella [2] demonstrated that, to capture the relevance of structured documents, two implications like the one above from Van Rijsbergen should be used: $d \rightarrow q$ modelling exhaustivity and $q \rightarrow d$ modelling specificity. Table 1 shows how we can express these implications with our ST framework. $D$ stands for the document component situation, and $Q$ for the query situation. Then, $D \rightsquigarrow Q$ models exhaustivity and $Q \rightsquigarrow D$ models specificity.

Table 1: INEX exhaustivity and specificity situations

| Scale | | Exhaustivity ($D \rightsquigarrow Q$) | Specificity ($Q \rightsquigarrow D$) |
|---|---|---|---|
| 0 | subsituations | $D_1 \not\rightsquigarrow Q, \ldots, D_n \not\rightsquigarrow Q$ | $Q_1 \not\rightsquigarrow D, \ldots, Q_m \not\rightsquigarrow D$ |
| | conclusion | $D \not\rightsquigarrow Q$ | $Q \not\rightsquigarrow D$ |
| 1 | subsituations | $D_1 \not\rightsquigarrow Q, \ldots, D_i \rightsquigarrow Q, \ldots, D_n \not\rightsquigarrow Q$ | $Q_1 \not\rightsquigarrow D, \ldots, Q_i \rightsquigarrow D, \ldots, Q_m \not\rightsquigarrow D$ |
| | conclusion | $D \not\rightsquigarrow Q, \ldots, D \rightsquigarrow Q, \ldots, D \not\rightsquigarrow Q$ | $Q \not\rightsquigarrow D, \ldots, Q \rightsquigarrow D, \ldots, Q \not\rightsquigarrow D$ |
| 2 | subsituations | $D_1 \not\rightsquigarrow Q, \ldots, D_i \rightsquigarrow Q, \ldots, D_n \not\rightsquigarrow Q$ | $Q_1 \not\rightsquigarrow D, \ldots, Q_i \rightsquigarrow D, \ldots, Q_m \not\rightsquigarrow D$ |
| | conclusion | $D \rightsquigarrow Q$ | $Q \rightsquigarrow D$ |
| 3 | subsituations | $D_1 \rightsquigarrow Q, \ldots, D_n \rightsquigarrow Q$ | $Q_1 \rightsquigarrow D, \ldots, Q_m \rightsquigarrow D$ |
| | conclusion | $D \rightsquigarrow Q$ | $Q \rightsquigarrow D$ |

So far, we have discussed how a theoretical evaluation could help to understand experimental results for XML retrieval. We continue by modelling how a user perceives how exhaustive and specific a component is. We argue that a user will assess the relevance of a component according to the information contained in both $Q$ and $D$. In Table 1, the document component and query situations are divided into subsituations, with $D \equiv D_1 \otimes \ldots \otimes D_n$ and $Q \equiv Q_1 \otimes \ldots \otimes Q_m$. Scale 1 states that only some subsituations are about the query but none involves anti-aboutness. With this, we can e.g. formalise what is meant by a marginally exhaustive document component (1): the topic is only mentioned in passing, leading to very small indications about the document component's relevance. Table 1 shows for scale 1 multiple conclusions demonstrating undecidedness about the component's relevance. For scale 2 the overall conclusion can be derived that $D \rightsquigarrow Q$ or $Q \rightsquigarrow D$. For specificity, the topic is a major theme of the document component and $Q \rightsquigarrow D$ can be concluded. Scale 0 indicates that no $D_i$ is about the query, making the whole document component not about the query. The highest satisfaction is achieved with scale 3. For exhaustivity, all subsituations of the component are about the query, while for specificity all subsituations of the query are about the document component.

Looking at combined exhaustivity and specificity assessments in Table 1, users focussed on specificity want results like (1,3), (2,3) or (3,3). They want to be able to conclude $Q \rightsquigarrow D$, but care less about how many document component subsituations are about the query situation. A (3,3) result is achieved if all subsituations of the query are about the document component situation and vice versa. Therefore, (3,3) describes a perfect match.
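The ideas discussed in this section can be illustrated with a small sketch, under simplifying assumptions that are ours and not the formalism of the poster: situations are modelled as sets of keyword infons, composition as plain set union, and aboutness as the naive "simple overlap" criterion. All function names are hypothetical. The sketch checks Left Monotonic Union on the 'house'/'garden' example and shows which Table 1 scales are compatible with a given set of subsituation judgements. Scales 1 and 2 share the same subsituation pattern in Table 1, so they are returned together.

```python
from typing import FrozenSet, Iterable, Set

# Assumption: a situation is a set of keyword infons.
Situation = FrozenSet[str]


def about(s: Situation, t: Situation) -> bool:
    """Naive overlap aboutness: s is about t if they share at least one infon."""
    return bool(s & t)


def compose(s: Situation, u: Situation) -> Situation:
    """Assumption: composition of situations modelled as set union."""
    return s | u


def lmu_holds(s: Situation, t: Situation, u: Situation) -> bool:
    """Check Left Monotonic Union on one instance:
    if s is about t, then s composed with u must also be about t."""
    return (not about(s, t)) or about(compose(s, u), t)


def possible_scales(parts: Iterable[Situation], target: Situation) -> Set[int]:
    """Scales compatible with the subsituation judgements.
    Exhaustivity: parts are the D_i and target is Q.
    Specificity: parts are the Q_i and target is D."""
    judgements = [about(p, target) for p in parts]
    if not any(judgements):
        return {0}      # no subsituation is about the target
    if all(judgements):
        return {3}      # every subsituation is about the target
    return {1, 2}       # mixed judgements: premises alone do not decide


query = frozenset({"house"})
house = frozenset({"house"})
garden = frozenset({"garden"})

print(about(compose(house, garden), query))     # True
print(lmu_holds(house, query, garden))          # True
print(possible_scales([house, garden], query))  # {1, 2}
```

Because overlap aboutness accepts the composed component just as readily as the component containing only 'house', a model of this kind has no way of preferring the more specific answer, which is the behaviour the text attributes to unconditional support of LMU.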
4. CONCLUSIONS AND FUTURE WORK

An ST-based meta-evaluation of XML retrieval can firstly demonstrate benchmarks of XML retrieval models that make them appropriate to particular tasks, and secondly formally represent the exhaustivity and specificity of document components. In the future, we would like to continue our work by elaborating our benchmarks and going into more detail regarding the different INEX metrics. A theoretical evaluation could assist in the difficult INEX discussion about the correct metric [5] and help to better understand the implications of the two measures of exhaustivity and specificity.

5. REFERENCES

[1] J. Barwise and J. Perry. Situations and Attitudes. MIT Press, Cambridge, MA, 1982.
[2] Y. Chiaramella. Information retrieval and structured documents. In Lectures on information retrieval, pages 286-309. Springer-Verlag, New York, 2001.
[3] W. Cooper. A definition of relevance for information retrieval. Information Storage and Retrieval, 7:19-37, 1971.
[4] T. W. Huibers. An Axiomatic Theory for Information Retrieval. PhD-Thesis, Universiteit Utrecht, Utrecht, 1996.
[5] G. Kazai and M. Lalmas. Notes on what to measure in INEX. In INEX 2005 Workshop on Element Retrieval Methodology, Glasgow, July 2005.
[6] C. J. v. Rijsbergen. Towards an information logic, 1989. 77-86.
[7] D. Song, K.-F. Wong, P. Bruza, and C.-H. Cheng. Application of aboutness to functional benchmarking in information retrieval. ACM Trans. Inf. Syst., 19(4):337-370, 2001.