SIGIR 2007 Proceedings Poster

Opinion Holder Extraction from Author and Authority Viewpoints

Yohei Seki
Toyohashi University of Technology
Aichi, 441-8580, Japan
seki@ics.tut.ac.jp

ABSTRACT
Opinion holder extraction research is important for discriminating between opinions that are viewed from different perspectives. In this paper, we describe our experience of participation in the NTCIR-6 Opinion Analysis Pilot Task by focusing on opinion holder extraction results in Japanese and English. Our approach to opinion holder extraction was based on the discrimination between author and authority viewpoints in opinionated sentences, and the evaluation results were fair with respect to the Japanese documents.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval--Information Filtering, Selection Process

General Terms
Experimentation

Keywords
Opinion Holder, Opinion Extraction, NTCIR

1. INTRODUCTION
Recently, opinion research has received much attention from the information retrieval community. Opinion extraction research is divided into three subcategories. The first subcategory is opinion detection, involving opinionated document detection and opinionated sentence/phrase extraction. The second subcategory is polarity detection, involving positive or negative document classification and positive, neutral, or negative sentence/phrase detection. The third subcategory is opinion holder extraction [1]. Opinion holder extraction research is important for discriminating between opinions that are viewed from different perspectives.
The first Opinion Analysis Pilot Task was conducted at the NTCIR-6 workshop in 2006-2007 (footnote 1). The author served as both an organizer and a participant. In this paper, we describe our experience of participation by focusing on the opinion holder extraction results in Japanese and English.

Footnote 1: http://research.nii.ac.jp/ntcir/ntcir-ws6/opinion/indexen.html

2. NTCIR-6 OPINION ANALYSIS PILOT TASK

2.1 Task & annotation overview
The opinion extraction subtask was conducted in Japanese, Chinese, and English. For opinion extraction, the participants submitted two mandatory results: opinionated sentence extraction and opinion holder extraction. Five, three, and six teams (14 teams in total) submitted 21 runs. The test collections for Japanese, Chinese, and English contain 15,279, 11,907, and 8,379 sentences in 490, 843, and 439 documents for 30, 32, and 28 shared topics, respectively. Opinionated sentences and opinion holders in sentences with three holder types were annotated by three annotators in Japanese and English. One sample topic was used for the intercoder session to improve the agreement between the assessors.

2.2 Evaluation methodology
The evaluation was based on precision, recall, and F-measure values computed from the number of correctly identified opinionated sentences and opinion holders. Correctness was defined using two standards. The lenient standard was based on the agreement between two out of three assessors. The strict standard was based on the agreement between three out of three assessors. The population parameters for the precision and recall values were computed from the total number of sentences assessed. We applied a sentence-based evaluation to evaluate the opinion holders. If multiple holders existed in one sentence, and the system detected one of them, then we regarded the system's extraction as valid.

3. OPINION HOLDER EXTRACTION FROM AUTHOR AND AUTHORITY VIEWPOINTS

3.1 Opinionated sentence classification from author and authority viewpoints
Automatic opinionated sentence classification from author and authority viewpoints was implemented as two types of opinionated sentence estimation using an SVM (footnote 2). As training data for author and authority viewpoints, we utilized the annotation information for opinion holder types [2] in Japanese.
If an opinionated sentence contains a type 3 opinion holder (an agent expressing expressive subjective elements), we regard it as having the opinion-from-author viewpoint. If it contains another opinion holder type, namely type 1 (a person, nation, or organization expressing private states explicitly) or type 2 (an agent of speaking/writing events), we regard it as having the opinion-from-authority viewpoint.

Footnote 2: SVMlight from .
Copyright is held by the author/owner(s). SIGIR'07, July 23-27, 2007, Amsterdam, The Netherlands. ACM 978-1-59593-597-7/07/0007.

For English training data, we utilized the MPQA corpus (footnote 3). For author and authority viewpoints, we discriminated opinionated sentence types using "nested source" attributes. If the value is "w" (writer), we regard those sentences as having the opinion-from-author viewpoint. Otherwise, we regard them as having the opinion-from-authority viewpoint.

The effective feature set was as follows.
· 155 (author) / 569 (authority) syntactic pairs of grammatical subjects and predicates were used in Japanese.
  - Subjects were categorized using named entities, semantic primitives, or key terms such as pronouns.
  - Predicates were categorized using semantic primitives from the thesaurus Bunrui-Goi-Hyou.
· 565 (author) / 376 (authority) syntactic pairs following five syntactic patterns such as nouns and adjectives/verbs were used in English.
  - Terms were categorized using named entities, semantic hypernyms from a thesaurus, key terms such as pronouns, and polarity term types.
  - Polarity term types were determined using adjective entries and the General Inquirer in English.
Syntactic dependency was checked using Cabocha in Japanese and Minipar in English.
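The derivation of training labels described in Section 3.1 can be sketched as follows. This is a minimal illustration, not the system's actual code: the function names and label strings are hypothetical, and the sketch assumes that a sentence carrying holders of several types receives both viewpoint labels, which fits the use of two separate estimations but is not stated explicitly in the text.

```python
from typing import Iterable, Set

# Hypothetical label names used only for this illustration.
AUTHOR = "author"
AUTHORITY = "authority"


def viewpoints_from_holder_types(holder_types: Iterable[int]) -> Set[str]:
    """Japanese side: map annotated opinion holder types to viewpoint labels.

    Type 3 is treated as the opinion-from-author viewpoint; types 1 and 2
    are treated as the opinion-from-authority viewpoint.
    """
    labels: Set[str] = set()
    for holder_type in holder_types:
        if holder_type == 3:
            labels.add(AUTHOR)
        elif holder_type in (1, 2):
            labels.add(AUTHORITY)
        else:
            raise ValueError(f"unknown holder type: {holder_type}")
    return labels


def viewpoint_from_nested_source(nested_source: str) -> str:
    """English side: "w" (writer) means author; any other value means authority."""
    return AUTHOR if nested_source.strip() == "w" else AUTHORITY
```

For example, a Japanese sentence annotated with holder types 1 and 3 would serve as a positive example for both estimations, whereas an English sentence whose nested source is anything other than "w" would count only toward the authority side.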
3.2 Opinion holder extraction based on opinionated sentence types
The opinion holder was extracted using a named entity extraction approach. Basically, the author holder and other authority holders were discriminated using the opinionated sentence types in 3.1. In the Japanese case, the author's name was identified and extracted from the signature. To determine authority holder elements, we set four-grade priority rules using three named entity elements as follows:
(1) bracketed elements of person, organization, and location in the sentence (prioritized in this order, as also in (2), (3), and (4));
(2) grammatical subject elements of person, organization, and location in the sentence;
(3) grammatical subject elements of person, organization, and location in the previous sentences;
(4) person, organization, and location elements other than (1)-(2) in the sentence.

4. EVALUATION AND DISCUSSION
Table 1 and Table 2 list the evaluation results of the Japanese and English opinion analysis based on the lenient (L) and strict (S) standards. Our group is represented as "TUT". The results of opinion holder extraction were fair for Japanese, but not as good for English.
We checked our author/authority classification results by holder types. In Japanese, although the author and authority opinionated sentence estimation did not correspond straightforwardly to the estimation of holder types, it attained 0.8-0.9 precision based on the lenient standard. In English, author opinionated sentences estimated from the MPQA corpus (because of the lack of training data in the English side of the NTCIR-6 opinion corpus) were matched not with holder type 3, but rather with holder type 1. This is partly due to the difference between Japanese and English in the style of referring to authors.

Table 1: Japanese opinion extraction results
                  Opinionated              Holder
Group      L/S    P      R      F          P      R      F
EHBN-1     L      0.531  0.453  0.489      0.138  0.085  0.105
           S      0.414  0.479  0.444      0.079  0.094  0.086
EHBN-2     L      0.531  0.453  0.489      0.314  0.097  0.149
           S      0.414  0.479  0.444      0.183  0.110  0.137
NICT-1,2   L      0.671  0.315  0.429      0.238  0.102  0.143
           S      0.546  0.348  0.425      0.133  0.110  0.120
TUT        L      0.552  0.609  0.579      0.226  0.224  0.225
           S      0.414  0.620  0.497      0.131  0.251  0.172

Table 2: English opinion extraction results
                  Opinionated              Holder
Group      L/S    P      R      F          P      R      F
IIT-1      L      0.325  0.588  0.419      0.198  0.409  0.266
           S      0.070  0.578  0.125      0.054  0.461  0.097
TUT-1      L      0.310  0.575  0.403      0.117  0.218  0.153
           S      0.065  0.553  0.117      0.029  0.241  0.051
Cornell    L      0.317  0.651  0.427      0.163  0.346  0.222
           S      0.069  0.662  0.125      0.041  0.392  0.074
NII        L      0.325  0.624  0.427      0.066  0.166  0.094
           S      0.073  0.642  0.131      0.018  0.169  0.032
GATE-1     L      0.324  0.905  0.477      0.121  0.349  0.180
           S      0.070  0.940  0.130      0.029  0.398  0.055
ICU-KR     L      0.396  0.524  0.451      0.303  0.404  0.346
           S      0.102  0.616  0.175      0.085  0.515  0.146

Footnote 3: http://www.cs.pitt.edu/mpqa/databaserelease/

5. CONCLUSIONS
We proposed an opinion and opinion holder extraction system based on author and authority viewpoints in Japanese and English. We participated in the NTCIR-6 Opinion Analysis Pilot Task and evaluated the effectiveness of our system. The results show that our system performed fairly well with respect to Japanese documents, and our post-submission analysis showed that improvements could be made with respect to English documents.

Acknowledgments
This work was partially supported by the Grants-in-Aid for Young Scientists (B) (#18700241) from the Ministry of Education, Culture, Sports, Science and Technology, Japan.

6. REFERENCES
[1] Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan. Identifying sources of opinions with conditional random fields and extraction patterns. In Proc. of the 2005 Human Language Technology Conf. and Conf. on Empirical Methods in Natural Language Processing (HLT/EMNLP 2005), Vancouver, B.C., 2005.
[2] J. Wiebe, T. Wilson, and C. Cardie. Annotating Expressions of Opinions and Emotions in Language. Language Resources and Evaluation, 39(2-3):165-210, 2005.
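As a supplementary illustration, the four-grade priority rules for authority holders in Section 3.2 can be sketched as below. This is a minimal sketch under stated assumptions, not the submitted system: the `Entity` record, its field names, and `select_authority_holder` are hypothetical, named entity recognition and dependency parsing are assumed to have been run already, and the entities of the previous sentences are assumed to arrive as one flat list in whatever order the caller prefers.

```python
from dataclasses import dataclass
from typing import List, Optional

# Within every rule, person is preferred over organization over location.
NE_ORDER = ("person", "organization", "location")


@dataclass
class Entity:
    """Hypothetical record for one named entity mention."""
    text: str
    ne_type: str              # expected to be one of NE_ORDER
    bracketed: bool = False   # the mention appears inside brackets
    is_subject: bool = False  # the mention is a grammatical subject


def first_by_ne_order(entities: List[Entity]) -> Optional[Entity]:
    """Return the first entity of the most preferred type, or None."""
    for ne_type in NE_ORDER:
        for entity in entities:
            if entity.ne_type == ne_type:
                return entity
    return None


def select_authority_holder(current: List[Entity],
                            previous: List[Entity]) -> Optional[Entity]:
    """Apply rules (1) to (4) in order and return the first match."""
    tiers = [
        [e for e in current if e.bracketed],             # rule (1)
        [e for e in current if e.is_subject],            # rule (2)
        [e for e in previous if e.is_subject],           # rule (3)
        [e for e in current
         if not e.bracketed and not e.is_subject],       # rule (4)
    ]
    for tier in tiers:
        holder = first_by_ne_order(tier)
        if holder is not None:
            return holder
    return None
```

Under these assumptions, a sentence whose only entities are a location in subject position and a person in non-subject position yields the location, because rule (2) is consulted before rule (4).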