Extractive Summarization using Inter- and Intra-Event Relevance

Wenjie Li, Mingli Wu and Qin Lu
Department of Computing, The Hong Kong Polytechnic University
{cswjli,csmlwu,csluqin}@comp.polyu.edu.hk

Wei Xu and Chunfa Yuan
Department of Computer Science and Technology, Tsinghua University
{vivian00,cfyuan}@mail.tsinghua.edu.cn

Abstract

Event-based summarization attempts to select and organize the sentences of a summary with respect to the events or sub-events that the sentences describe. Each event has its own internal structure and, at the same time, often relates to other events semantically, temporally, spatially, causally or conditionally. In this paper, we define an event as one or more event terms together with the named entities associated with them, and present a novel approach that derives intra- and inter-event relevance from internal association, semantic relatedness, distributional similarity and named entity clustering. We then apply the PageRank ranking algorithm to the derived event relevance in order to estimate the significance of an event for inclusion in a summary. Experiments on the DUC 2001 test data show that the relevance of the named entities involved in events achieves a better result when it is derived from the event terms they are associated with. The experiments also reveal that the topic-specific relevance obtained from the documents themselves outperforms the semantic relevance obtained from a general-purpose knowledge base such as WordNet.

1. Introduction

Extractive summarization selects the sentences which contain the most salient concepts in documents.
Two important issues are how the concepts are defined and what criteria should be used to judge their salience. Existing work has typically been based on techniques that extract key textual elements, such as keywords (also known as significant terms) weighted by their tf*idf scores, or concepts (such as events or entities) identified with linguistic and/or statistical analysis. Sentences are then selected according to either the important textual units they contain or certain types of inter-sentence relations they hold.

Event-based summarization, which has emerged recently, attempts to select and organize sentences in a summary with respect to the events or sub-events that the sentences describe. With regard to the concept of events, people do not share a single definition when introducing it in different domains. While traditional linguistics works on the semantic theory of events and the semantic structures of verbs, studies in information retrieval (IR) within the topic detection and tracking framework view events as narrowly defined topics that can be categorized or clustered as sets of related documents (TDT). IR events are broader (or, so to say, complex) events in the sense that they may include happenings together with their causes, consequences or even more extended effects. In the information extraction (IE) community, events are defined as pre-specified and structured templates that relate an action to its participants, times, locations and other entities involved (MUC-7). IE defines what people call atomic events. Regardless of these distinct perspectives, people all agree that events are collections of activities together with their associated entities.

To apply the concept of events in the context of text summarization, we believe it is more appropriate to consider events at the sentence level rather than at the document level. To avoid the complexity of deep semantic and syntactic processing, we combine the advantages of the statistical techniques from the IR community and the structured information provided by the IE community. We propose to extract semi-structured events with shallow natural language processing (NLP) techniques and to estimate their importance for inclusion in a summary with IR techniques.

Although documents most likely narrate more than one similar or related event, most event-based summarization techniques reported so far explore the importance of the events independently. Motivated by this observation, this paper addresses the task of event-relevance-based summarization and explores what sorts of relevance make a contribution. To this end, we investigate intra-event relevance, that is, action-entity relevance, and inter-event relevance, that is, event-event relevance. While intra-event relevance is measured directly with the frequencies of the associated events and entities, inter-event relevance is derived indirectly from a general WordNet similarity utility, distributional similarity in the documents to be summarized, named entity clustering, and so on. The PageRank ranking algorithm is then applied to estimate the event importance for inclusion in a summary using the aforesaid relevance.

The remainder of this paper is organized as follows. Section 2 introduces related work. Section 3 introduces the proposed event-based summarization approaches, which make use of intra- and inter-event relevance. Section 4 presents experiments and evaluates the different approaches. Finally, Section 5 concludes the paper.
2. Related Work

Event-based summarization has been investigated in recent research. It was first presented in (Daniel, Radev and Allison, 2003), who treated a news topic in multi-document summarization as a series of sub-events according to human understanding of the topic. They determined the degree of sentence relevance to each sub-event through human judgment and evaluated six extractive approaches. Their paper concluded that recognizing the sub-events that comprise a single news event is essential for producing better summaries. However, it is difficult to break a news topic into sub-events automatically.

Later, atomic events were defined as the relationships between the important named entities (Filatova and Hatzivassiloglou, 2004), such as participants, locations and times (which are called relations), through the verbs or action nouns labeling the events themselves (which are called connectors). They evaluated sentences based on co-occurrence statistics of the named entity relations and the event connectors involved. The proposed approach was claimed to outperform the conventional tf*idf approach. Apparently, named entities are key elements in their model. However, the constraints defining events seem quite stringent.

The application of dependency parsing, anaphora and co-reference resolution to recognizing events has been explored with NLP and IE techniques to varying degrees (Yoshioka and Haraguchi, 2004; Vanderwende, Banko and Menezes, 2004; Leskovec, Grobelnik and Milic-Frayling, 2004). Rather than pre-specifying events, these efforts extracted (verb)-(dependent relation)-(noun) triples as events and took the triples to form a graph merged by relations.

As a matter of fact, the events in documents are related in some way. Judging whether sentences are salient or not, and organizing them into a coherent summary, can take advantage of event relevance. Unfortunately, this was neglected in most previous work. Barzilay and Lapata (2005) exploited the distributional and referential information of discourse entities to improve summary coherence. While they captured text relatedness with entity transition sequences, i.e. entity-based summarization, we are particularly interested in the relevance between events in event-based summarization.

Extractive summarization requires ranking sentences with respect to their importance. Successfully used in Web-link analysis and more recently in text summarization, Google's PageRank (Brin and Page, 1998) is one of the most popular ranking algorithms. It is a graph-based ranking algorithm that decides on the importance of a node within a graph by taking into account global information recursively computed from the entire graph, rather than relying only on local node-specific information. A graph can be constructed by adding a node for each sentence, phrase or word. Edges between nodes are established using inter-sentence similarity relations, as a function of content overlap or grammatical relations between words or phrases. The application of PageRank to sentence extraction was first reported in (Erkan and Radev, 2004), where the similarity between two sentence nodes, computed from their term vectors, was used to generate links and define link strength. The same idea was followed and investigated extensively in (Mihalcea, 2005). Yoshioka and Haraguchi (2004) went one step further toward event-based summarization: two sentences were linked if they shared similar events. When tested on TSC-3, the approach favoured longer summaries. In contrast, the importance of the verbs and nouns constituting events was evaluated with PageRank over individual nodes aligned by their dependency relations (Vanderwende et al., 2004; Leskovec et al., 2004). Although we agree that the fabric of event constituents constructed by their syntactic relations can help dig out the important events, we have two comments. First, not all verbs denote event happenings. Second, semantic similarity or relatedness between action words should be taken into account.
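As an illustration of the graph construction reviewed above (sentences as nodes, term-vector similarity as edge strength), the following sketch builds a LexRank-style sentence similarity graph. This is not part of the approach proposed in this paper; the threshold value and the toy sentences are hypothetical.

```python
# Illustrative sketch of the LexRank-style graph described above (Erkan and
# Radev, 2004): sentences become nodes and edge strength is the cosine
# similarity of their term vectors.
from collections import Counter
from itertools import combinations
import math


def cosine(c1: Counter, c2: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(c1[t] * c2[t] for t in set(c1) & set(c2))
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


def build_sentence_graph(sentences, threshold=0.1):
    """Return {(i, j): weight} for sentence pairs whose similarity passes the threshold."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    graph = {}
    for i, j in combinations(range(len(sentences)), 2):
        w = cosine(vectors[i], vectors[j])
        if w >= threshold:
            graph[(i, j)] = w
    return graph


if __name__ == "__main__":
    toy = [
        "America Online agreed to buy Netscape",
        "The Netscape deal gives America Online new technology",
        "Sun formed a partnership with America Online",
    ]
    print(build_sentence_graph(toy))
```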
3. Event-based Summarization

3.1 Event Definition and Event Map

Events can be broadly defined as "Who did What to Whom, When and Where". Both linguistic and empirical studies acknowledge that event arguments help characterize the effects of a verb's event structure, even though the verbs or other words denoting the event determine its semantics. In this paper, we choose verbs (such as "elect") and action nouns (such as "supervision") as event terms that can characterize or partially characterize actions or incident occurrences. They roughly relate to "did What". One or more associated named entities are considered as what linguists call event arguments. Four types of named entities are currently under consideration: <Person>, <Organization>, <Date> and <Location>. They convey the information of "Who", "Whom", "When" and "Where". A verb or an action noun is deemed an event term only when it presents itself at least once between two named entities.

Events are commonly related to one another semantically, temporally, spatially, causally or conditionally, especially when the documents to be summarized are about the same or very similar topics. Therefore, all the event terms and named entities involved can be explicitly connected or implicitly related, and weave a document or a set of documents into an event fabric, i.e. an event graphical representation (see Figure 1). The nodes in the graph are of two types: event terms (ET), indicated by rectangles, and named entities (NE), indicated by ellipses. They represent concepts rather than instances. Words in either their original form or their morphological variations are represented with a single node in the graph, regardless of how many times they appear in the documents. We call this representation an event map, from which the most important concepts can be picked out for the summary.

America Online was to buy Netscape and forge a partnership with Sun, benefiting all three and giving technological independence from Microsoft.

Figure 1 Sample sentences and their graphical representation

The advantage of representing events with separate action and entity nodes, rather than combining them into a single event or sentence node, is that it provides a convenient way to analyze the relevance among event terms and named entities by either their semantic or their distributional similarity. More importantly, it favors the extraction of concepts and makes conceptual compression possible. We then integrate the strength of the connections between nodes into this graphical model in terms of relevance defined from different perspectives. The relevance is indicated by r(node_i, node_j), where node_i and node_j represent two nodes, each of which is either an event term (et_i) or a named entity (ne_j). The significance of each node, indicated by w(node_i), is then calculated with the PageRank ranking algorithm. Sections 3.2 and 3.3 address the issues of deriving r(node_i, node_j) from intra- and inter-event relevance and of calculating w(node_i) in detail.
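Before turning to the specific relevance measures, the following is a minimal sketch of how such an event map might be assembled, assuming event terms and named entities have already been extracted for each sentence (for example with a POS tagger and an NE tagger). The class and field names are our own, and the edge weights simply count within-sentence co-occurrences.

```python
# A minimal sketch of the event map: event-term and named-entity nodes with
# connection strengths counted from within-sentence co-occurrence.
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EventMap:
    event_terms: set = field(default_factory=set)
    named_entities: set = field(default_factory=set)
    # relevance[(node_a, node_b)] holds the strength of the connection between two nodes
    relevance: dict = field(default_factory=lambda: defaultdict(float))

    def add_sentence(self, terms, entities):
        """Register one sentence's event terms and named entities and count their associations."""
        self.event_terms.update(terms)
        self.named_entities.update(entities)
        for et in terms:
            for ne in entities:
                # intra-event relevance: how often an event term and an entity co-occur
                self.relevance[(et, ne)] += 1.0
                self.relevance[(ne, et)] += 1.0


if __name__ == "__main__":
    emap = EventMap()
    emap.add_sentence(terms={"buy", "forge"},
                      entities={"America Online", "Netscape", "Sun"})
    print(emap.relevance[("buy", "Netscape")])
```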
3.2 Intra- and Inter-Event Relevance

We consider both intra-event and inter-event relevance for summarization. Intra-event relevance measures how an action is associated with its arguments. It is indicated by R(ET, NE) and R(NE, ET) in Table 1 below. This is a kind of direct relevance, as the connections between actions and arguments are established directly from the text surface; no inference or background knowledge is required. Since the connection between an event term et_i and a named entity ne_j is symmetric, R(NE, ET) = R(ET, NE)^T.

Events are related to one another, as explained in Section 2. By means of inter-event relevance, we consider how an event term (or a named entity involved in an event) associates with another event term (or another named entity involved in the same or a different event) syntactically, semantically and distributionally. It is indicated by R(ET, ET) or R(NE, NE) in Table 1 and measures an indirect connection, which is not explicit in the event map and needs to be derived from external resources or from the overall event distribution.

                    Event Term (ET)   Named Entity (NE)
Event Term (ET)     R(ET, ET)         R(ET, NE)
Named Entity (NE)   R(NE, ET)         R(NE, NE)

Table 1 Relevance Matrix

The complete relevance matrix is:

R = | R(ET, ET)   R(ET, NE) |
    | R(NE, ET)   R(NE, NE) |

The intra-event relevance R(ET, NE) can be established simply by counting how many times et_i and ne_j are associated, i.e.

r_Document(et_i, ne_j) = freq(et_i, ne_j)                                   (E1)

One way to measure term relevance is to make use of a general language knowledge base, such as WordNet (Fellbaum, 1998). WordNet::Similarity is a freely available software package that makes it possible to measure the semantic relatedness between a pair of concepts, or in our case event terms, based on WordNet (Pedersen, Patwardhan and Michelizzi, 2004). It supports three measures; the one we choose is the lesk function:

r_WordNet(et_i, et_j) = similarity(et_i, et_j) = lesk(et_i, et_j)           (E2)

Alternatively, term relevance can be measured according to the terms' distributions in the specified documents. We believe that if two events concern the same participants, or occur at the same location or at the same time, then these two events are interrelated in some way. This observation motivates us to derive event term relevance from the number of named entities the terms share:

r_Document(et_i, et_j) = |NE(et_i) ∩ NE(et_j)|                              (E3)

where NE(et_i) is the set of named entities with which et_i is associated, and |·| indicates the number of elements in the set. The relevance of named entities can be derived in a similar way:

r_Document(ne_i, ne_j) = |ET(ne_i) ∩ ET(ne_j)|                              (E4)

The relevance derived with (E3) and (E4) is indirect relevance.

In previous work, a clustering algorithm, shown in Figure 2, was proposed (Xu et al., 2006) to merge the named entities that refer to the same person (such as Ranariddh, Prince Norodom Ranariddh and President Prince Norodom Ranariddh). It is used for co-reference resolution and aims at joining the same concept into a single node in the event map. The experimental results suggest that merging named entities improves performance to some extent, but not markedly. When applying the same algorithm to cluster all four types of named entities in the DUC data, we observe that the named entities in the same cluster do not always refer to the same objects, even though they are indeed related in some way. For example, "Mississippi" is a state in the southeast United States, while "Mississippi River" is the second-longest river in the United States and flows through "Mississippi".

Step 1: Each named entity is represented by ne_i = w_i1 w_i2 ... w_ik, where w_ij is the jth word in it. The cluster it belongs to, indicated by C(ne_i), is initialized to w_i1 w_i2 ... w_ik itself.
Step 2: For each named entity ne_i = w_i1 w_i2 ... w_ik and each named entity ne_j = w_j1 w_j2 ... w_jl, if C(ne_i) is a sub-string of C(ne_j), then C(ne_i) = C(ne_j).
Continue Step 2 until no change occurs.

Figure 2 The algorithm proposed to merge the named entities

Location:      Mississippi; Mississippi River
Person:        Professor Sir Richard Southwood; Sir Richard Southwood; Richard Southwood
Date:          first six months of last year; last year
Organization:  Long Beach City Council; San Jose City Council; City Council

Table 2 Some results of the named entity merging
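The merging procedure of Figure 2 can be sketched as follows. Whole-string containment stands in for the word-sequence test, and all names are illustrative.

```python
# A sketch of the substring-based merging procedure in Figure 2.
def cluster_named_entities(entities):
    """Map each named entity to a cluster label, merging an entity into the
    longer name that contains it (e.g. 'Ranariddh' -> 'President Prince Norodom Ranariddh')."""
    labels = {ne: ne for ne in entities}          # Step 1: each entity starts as its own cluster
    changed = True
    while changed:                                # Step 2: repeat until no label changes
        changed = False
        for ne_i in entities:
            for ne_j in entities:
                if ne_i == ne_j:
                    continue
                if labels[ne_i] != labels[ne_j] and labels[ne_i] in labels[ne_j]:
                    labels[ne_i] = labels[ne_j]   # adopt the more specific (longer) label
                    changed = True
    return labels


if __name__ == "__main__":
    names = ["Ranariddh", "Prince Norodom Ranariddh", "President Prince Norodom Ranariddh"]
    print(cluster_named_entities(names))
    # all three map to 'President Prince Norodom Ranariddh'
```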
The clusters found therefore provide a second way to measure named entity relevance; it is actually a kind of lexical similarity measure:

r_Cluster(ne_i, ne_j) = 1, if ne_i and ne_j are in the same cluster; 0, otherwise      (E5)

In addition, the relevance of named entities can sometimes be revealed by the sentence context. Take the following frequently used sentence patterns as examples:

<Person>, a position name of <Organization>, does something.
<Person> and another <Person> do something.

Figure 3 The example patterns

Considering that two neighbouring named entities in a sentence are usually relevant, the following window-based relevance is also experimented with:

r_Pattern(ne_i, ne_j) = 1, if ne_i and ne_j are within a pre-specified window; 0, otherwise      (E6)

3.3 Significance of Concepts

The significance score, i.e. the weight w(node_i) of each node_i, is then estimated recursively with the PageRank ranking algorithm, which assigns the significance score to each node according to the number of nodes connecting to it as well as the strength of their connections. The equation calculating w(node_i) for a certain node_i with PageRank is:

w(node_i) = (1 - d) + d · ( w(node_1)·r(node_i, node_1) + ... + w(node_j)·r(node_i, node_j) + ... + w(node_t)·r(node_i, node_t) )      (E7)

In (E7), node_j (j = 1, 2, ..., t, j ≠ i) are the nodes linking to node_i, and d is the damping factor used to avoid being trapped in loops in the map structure; it is set to 0.85 experimentally. The significance of each sentence to be included in the summary is then obtained from the significance of the events it contains. The sentences with higher significance are picked into the summary, as long as they are not exactly the same sentences.

We are aware of the important roles of information fusion and sentence compression in summary generation. However, the focus of this paper is to evaluate event-based approaches for extracting the most important sentences. Conceptual extraction based on event relevance is our future direction.
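Below is a minimal sketch of the significance computation and the sentence selection step, in the spirit of (E7): each node's weight is repeatedly recomputed from its neighbours' weights, scaled by the strength r of the connecting edges. Normalizing by each neighbour's total connection strength and the 200-word budget are assumptions added here so the iteration converges and the summary respects a length limit; they are not spelled out in the text above.

```python
# Node significance (in the spirit of E7) and greedy sentence selection.
def rank_nodes(relevance, nodes, d=0.85, iterations=100, tol=1e-6):
    """relevance: {(node_a, node_b): strength}, assumed symmetric; returns {node: w(node)}."""
    if not nodes:
        return {}
    # total connection strength of each node, used to normalize its contribution (assumption)
    out_strength = {m: sum(relevance.get((m, k), 0.0) for k in nodes) for m in nodes}
    w = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_w = {}
        for n in nodes:
            incoming = sum(
                w[m] * relevance.get((n, m), 0.0) / out_strength[m]
                for m in nodes
                if m != n and relevance.get((n, m), 0.0) > 0.0 and out_strength[m] > 0.0
            )
            new_w[n] = (1.0 - d) + d * incoming
        converged = max(abs(new_w[n] - w[n]) for n in nodes) < tol
        w = new_w
        if converged:
            break
    return w


def select_sentences(sentences, sentence_concepts, node_weights, word_limit=200):
    """Score sentences by the significance of the concepts they contain and pick
    the highest-scoring ones, skipping exact duplicates and respecting the budget."""
    ranked = sorted(
        sentences,
        key=lambda s: sum(node_weights.get(c, 0.0) for c in sentence_concepts.get(s, ())),
        reverse=True,
    )
    summary, seen, used = [], set(), 0
    for s in ranked:
        n_words = len(s.split())
        if s in seen or used + n_words > word_limit:
            continue
        summary.append(s)
        seen.add(s)
        used += n_words
    return summary
```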
4. Experiments and Discussions

To evaluate the proposed event-based summarization approaches, we conduct a set of experiments on the 30 English document sets provided by the DUC 2001 multi-document summarization task. The documents are pre-processed with GATE to recognize the four types of named entities mentioned previously. On average, each set contains 10.3 documents, 602 sentences, 216 event terms and 148.5 named entities.

To evaluate the quality of the generated summaries, we choose the automatic summary evaluation metric ROUGE, which has been used in the DUC evaluations. ROUGE is a recall-based metric for fixed-length summaries. It is based on N-gram co-occurrence and compares system-generated summaries to human-written ones (Lin and Hovy, 2003). For each DUC document set, the system creates a summary of 200 words, and we report three of the ROUGE metrics in the following experiments and evaluations: ROUGE-1 (unigram-based), ROUGE-2 (bigram-based) and ROUGE-W (based on the longest common subsequence, weighted by its length).

We first evaluate the summaries generated based on R(ET, NE) by itself. In pre-evaluation experiments, we observed that some frequently occurring nouns, such as "doctors" and "hospitals", are not marked by general NE taggers even though they indicate persons, organizations or locations. Table 3 compares the ROUGE scores with and without adding such frequent nouns to the set of named entities; a noun is considered frequent when its frequency is larger than 10. Roughly a 5% improvement is achieved when the high-frequency nouns are taken into consideration. Hereafter, whenever we mention NE in the later experiments, the high-frequency nouns are included.

           NE without high-frequency nouns   NE with high-frequency nouns
ROUGE-1    0.33320                           0.34859
ROUGE-2    0.06260                           0.07157
ROUGE-W    0.12965                           0.13471

Table 3 ROUGE scores using R(ET, NE) by itself

Table 4 then presents the summarization results using R(ET, ET) by itself. It compares two relevance derivation approaches, R_WordNet and R_Document. The topic-specific relevance derived from the documents to be summarized outperforms the general-purpose WordNet relevance by about 4%. This result is reasonable, as WordNet may introduce word relatedness that is not relevant to the topic-specific documents. When we examine the event term pairs with the highest relevance in the relevance matrix, we find that pairs like "abort" and "confirm", or "vote" and "confirm", do reflect semantic (antonymous) and associative (causal) relations to some degree.

           Semantic relevance from WordNet   Topic-specific relevance from documents
ROUGE-1    0.32917                           0.34178
ROUGE-2    0.05737                           0.06852
ROUGE-W    0.11959                           0.13262

Table 4 ROUGE scores using R(ET, ET) by itself

Surprisingly, the best individual result comes from the document distributional similarity R_Document(NE, NE) in Table 5. Looking more closely, we conclude that, compared to event terms, named entities are more representative of the documents in which they appear. In other words, event terms are more likely to be distributed across all the document sets, whereas named entities are more topic-specific and therefore cluster within a particular document set. Examples of highly related named entities in the relevance matrix are "Andrew" and "Florida", and "Louisiana" and "Florida". Although their relevance is not as explicit as that of the event terms (their relevance is more contextual than semantic), we can still deduce that some events may happen in both Louisiana and Florida, or concern Andrew in Florida. In addition, this also shows that the relevance we would have expected to derive from patterns and clustering can be discovered by R_Document(NE, NE) as well. The window size is set to 5 experimentally in the window-based setting.

           Relevance from documents   Relevance from clustering   Relevance from window-based context
ROUGE-1    0.35212                    0.33561                     0.34466
ROUGE-2    0.07107                    0.07286                     0.07508
ROUGE-W    0.13603                    0.13109                     0.13523

Table 5 ROUGE scores using R(NE, NE) by itself

Next, we evaluate the integration of R(ET, NE), R(ET, ET) and R(NE, NE). As DUC 2001 provides four different summary sizes for evaluation, it allows us to test the sensitivity of the proposed event-based summarization techniques to summary length. While the results presented above are evaluated on 200-word summaries, we now examine the results for the four sizes, i.e. 50, 100, 200 and 400 words. The experimental results show that the event-based approaches do indeed prefer longer summaries, which coincides with what we hypothesized. For this set of experiments, we integrate the best method from each individual evaluation presented above. Using the named entity relevance derived from the event terms gives the best ROUGE scores for almost all the summary sizes. Compared with the results reported in (Filatova and Hatzivassiloglou, 2004), whose average ROUGE-1 score is below 0.3 on the same data set, a significant improvement is revealed. Of course, we need to test on more data in the future.
R(NE, NE)        50        100       200       400
ROUGE-1          0.22383   0.28584   0.35212   0.41612
ROUGE-2          0.03376   0.05489   0.07107   0.10275
ROUGE-W          0.10203   0.11610   0.13603   0.13877

R(ET, NE)        50        100       200       400
ROUGE-1          0.22224   0.27947   0.34859   0.41644
ROUGE-2          0.03310   0.05073   0.07157   0.10369
ROUGE-W          0.10229   0.11497   0.13471   0.13850

R(ET, ET)        50        100       200       400
ROUGE-1          0.20616   0.26923   0.34178   0.41201
ROUGE-2          0.02347   0.04575   0.06852   0.10263
ROUGE-W          0.09212   0.11081   0.13262   0.13742

R(ET, NE) + R(ET, ET) + R(NE, NE)
                 50        100       200       400
ROUGE-1          0.21311   0.27939   0.34630   0.41639
ROUGE-2          0.03068   0.05127   0.07057   0.10579
ROUGE-W          0.09532   0.11371   0.13416   0.13913

Table 6 ROUGE scores using the complete R matrix with different summary lengths

As discussed in Section 3.2, the named entities in the same cluster may often be relevant but are not always co-referent. In the last set of experiments, we evaluate two ways of using the clustering results. One is to consider two entities as related if they are in the same cluster and to derive the NE-NE relevance with (E5). The other is to merge the entities in one cluster into one representative named entity and then to use it in ET-NE with (E1). The rationality of the former approach is validated.

           Clustering used to derive NE-NE   Clustering used to merge entities and then derive ET-NE
ROUGE-1    0.34072                           0.33006
ROUGE-2    0.06727                           0.06154
ROUGE-W    0.13229                           0.12845

Table 7 ROUGE scores with regard to how the clustering information is used

5. Conclusion

In this paper, we propose to integrate event-based approaches into extractive summarization.
Both inter-event and intra-event relevance are investigated, and the PageRank algorithm is used to evaluate the significance of each concept (including both event terms and named entities). The sentences containing more concepts with the highest significance scores are chosen for the summary, as long as they are not the same sentences. To derive event relevance, we consider associations at the syntactic, semantic and contextual levels. An important finding on the DUC 2001 data set is that using the named entity relevance derived from the event terms the entities are associated with achieves the best result. The result of 0.35212 significantly outperforms the closely related work, whose reported average is below 0.3. We are interested in the issue of how to improve the event representation in order to build a more powerful event-based summarization system; this is one of our future directions. We also want to see how concepts, rather than sentences, can be selected into the summary in order to develop a more flexible compression technique, and to learn what characteristics of a document set make it appropriate for applying event-based summarization techniques.

Acknowledgements

The work presented in this paper is supported partially by the Research Grants Council of Hong Kong (reference number: CERG PolyU5181/03E) and partially by the National Natural Science Foundation of China (reference number: NSFC 60573186).

References

Chin-Yew Lin and Eduard Hovy. 2003. Automatic Evaluation of Summaries using N-gram Co-occurrence Statistics. In Proceedings of HLT-NAACL 2003, pp. 71-78.

Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. MIT Press.

Elena Filatova and Vasileios Hatzivassiloglou. 2004. Event-based Extractive Summarization. In Proceedings of the ACL 2004 Workshop on Summarization, pp. 104-111.

Gunes Erkan and Dragomir Radev. 2004. LexRank: Graph-based Centrality as Salience in Text Summarization. Journal of Artificial Intelligence Research.

Jure Leskovec, Marko Grobelnik and Natasa Milic-Frayling. 2004. Learning Sub-structures of Document Semantic Graphs for Document Summarization. In LinkKDD 2004.

Lucy Vanderwende, Michele Banko and Arul Menezes. 2004. Event-Centric Summary Generation. In Working Notes of DUC 2004.

Masaharu Yoshioka and Makoto Haraguchi. 2004. Multiple News Articles Summarization based on Event Reference Information. In Working Notes of NTCIR-4, Tokyo.

MUC-7. http://www-nlpir.nist.gov/related_projects/muc/proceedings/muc_7_toc.html

Naomi Daniel, Dragomir Radev and Timothy Allison. 2003. Sub-event based Multi-document Summarization. In Proceedings of the HLT-NAACL 2003 Workshop on Text Summarization, pp. 9-16.

Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd. 1998. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report, Stanford University.

Rada Mihalcea. 2005. Language Independent Extractive Summarization. ACL 2005 poster.

Regina Barzilay and Mirella Lapata. 2005. Modeling Local Coherence: An Entity-based Approach. In Proceedings of ACL 2005, pp. 141-148.

TDT. http://projects.ldc.upenn.edu/TDT.

Ted Pedersen, Siddharth Patwardhan and Jason Michelizzi. 2004. WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of AAAI 2004, pp. 25-29.

Wei Xu, Wenjie Li, Mingli Wu, Wei Li and Chunfa Yuan. 2006. Deriving Event Relevance from the Ontology Constructed with Formal Concept Analysis. In Proceedings of CICLing 2006, pp. 480-489.