SIGIR 2007 Proceedings Doctoral Consortium A Summarisation Logic for Structured Documents Jan Frederik Forst Queen Mary, University of London London, E1 4NS UK frederik@dcs.qmul.ac.uk ABSTRACT The logical approach to Information Retrieval tries to model the relevance of a document given a query as the logical implication between documents and queries. In early work, van Rijsbergen states that the retrieval status value of a document given a query is proportional to the degree of implication between a document and a query. Based on this, several probabilistic logics for information retrieval have been conceived, which add an additional layer of abstraction to the information retrieval task: probabilistic models for the retrieval of documents are expressed in those logics, rather than implemented directly. The aim of the research presented here is to develop a logic for the IR task of document summarisation. Such a summarisation logic adds an abstraction layer to the summarisation task, similarly to the way a logic for document retrieval adds a layer to the retrieval task. Probabilistic models for document summarisation will thus be expressed as logical formulae, with the actual implementation hidden by the logical expressions. The "extract-worthyness" of textual components in this logic is measured as the degree of implication from those textual components to their surrounding contexts, providing a measure of how much said components are "about" the context within which they are situated. Categories and Subject Descriptors H.3 [Information Storage and Retrieval]: Information Search and Retrieval General Terms Theory Keywords Information retrieval logic, summarisation, structured document retrieval Copyright is held by the author/owner(s). SIGIR'07, July 23­27, 2007, Amsterdam, The Netherlands. ACM 978-1-59593-597-7/07/0007. 919