A Method of Rating the Credibility of News Documents on the Web

Ryosuke Nagura, Yohei Seki
Toyohashi University of Technology, Aichi, 441-8580, Japan
nagu@kde.ics.tut.ac.jp, seki@ics.tut.ac.jp

Noriko Kando
National Institute of Informatics, Tokyo, 101-8430, Japan
kando@nii.ac.jp

Masaki Aono
Toyohashi University of Technology, Aichi, 441-8580, Japan
aono@ics.tut.ac.jp

ABSTRACT
We propose a method to rate the credibility of news articles using three clues: (1) commonality of the contents of articles among different news publishers; (2) agreement versus contradiction of the numerical values reported in the articles; and (3) objectivity based on subjective speculative phrases and news sources. We tested this method on news stories taken from seven different news sites on the Web. The average agreement between the system-produced "credibility" and the manual judgments of three human assessors on the 52 sample articles was 69.1%. The limitations of the current approach and future directions are discussed.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Information Filtering, Selection Process

General Terms: Experimentation

Keywords: Web Document Credibility and Information Filtering

1. INTRODUCTION
When a user collects information from the Web, the information is not always correct. Many lies are posted on electronic bulletin boards and in blogs by malicious persons, and even online news articles may include incorrect information. To select credible information, Web users must filter out wrong information by themselves. The purpose of this study is to propose a method to rate the credibility of the information in news articles on the Web.

Abdulla et al. [1] manually analyzed the credibility of online news and found that it was mainly measured along three dimensions: trustworthiness, currency, and bias. Danielson [2] focused on the credibility of Web sites. In Google News [3], news items are ranked according to the reliability of the news publishers. Rubin et al. [4] proposed a four-dimensional analytical framework for certainty identification at the sentence level. "Certainty" is related to "credibility", but it is defined from the writer's viewpoint, whereas "credibility" concerns the reader's judgment.

We propose a method to rate the credibility of Web documents, restricting our analysis to news stories on the Web.

2. METRICS
We defined three metrics to rate the credibility of news articles on the Web. The first two are combined into a "credibility score" used to select candidate articles, and only the third assesses whether a candidate article is finally considered credible.

Commonality. The more news publishers deliver articles with content similar to the target article being assessed, the higher its credibility is rated.

Numerical Agreement. Numerical expressions such as "100 passengers" or "three tracks" occur in news reports. When numerical expressions contradict those in articles from other news publishers, the credibility is rated lower.

Objectivity. The credibility of articles containing subjective speculation is rated differently from that of articles citing objective news sources.

2.1 Commonality
We defined the commonality among the contents of articles delivered by different news publishers to rate the credibility. To compute commonality, articles were divided into sentences; the heading of an article was treated as a sentence. We collected the news articles published within an x-hour period (x = 2 in this paper) and computed the cosine similarities between sentences in all the articles from different news publishers. The dimensionality of the sentence term vectors was reduced to one third of its original value using LSI. If the similarity between two sentences exceeded a threshold, we regarded them as "similar". If two articles from different news publishers contained similar sentences, we regarded those articles as having higher commonality. To calculate the degree of commonality, we computed, for each sentence, the ratio of other news publishers whose articles contained similar content. The commonality (trustworthiness) of an article was defined as the average of these ratios over all its sentences:

    f(n, t) = (1/n) Σ_{k=1}^{n} S_k / (t - 1)                                (1)

Here, n is the number of sentences in the article, S_k is the number of news publishers whose articles contain a sentence similar to sentence k, and t is the number of news publishers that published articles within the x-hour period.
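As an illustration of Eq. (1), the following is a minimal sketch of the commonality computation in Python. It assumes scikit-learn's TfidfVectorizer and TruncatedSVD as stand-ins for the sentence term vectors and the LSI reduction, assumes the sentences have already been tokenized into whitespace-separated words (Japanese text first requires a morphological analyzer), and uses a hypothetical similarity threshold; the function and variable names are illustrative only.

# Minimal sketch of the commonality score f(n, t) of Eq. (1).
# Assumptions: scikit-learn TF-IDF vectors + TruncatedSVD as the LSI step;
# whitespace-tokenized sentences; SIM_THRESHOLD is a hypothetical value.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

SIM_THRESHOLD = 0.5   # hypothetical; the paper does not report this value

def commonality(target_sents, other_articles):
    """target_sents: sentences of the target article (heading included).
    other_articles: {publisher: [sentences]} for articles from the other
    publishers within the same x-hour window (x = 2 in the paper)."""
    t = len(other_articles) + 1                 # publishers incl. the target's
    corpus, spans = list(target_sents), {}
    for pub, sents in other_articles.items():
        spans[pub] = (len(corpus), len(corpus) + len(sents))
        corpus.extend(sents)
    tfidf = TfidfVectorizer().fit_transform(corpus)
    n_comp = max(2, tfidf.shape[1] // 3)        # reduce dims to one third (LSI)
    lsi = TruncatedSVD(n_components=n_comp).fit_transform(tfidf)
    sims = cosine_similarity(lsi[:len(target_sents)], lsi)
    total = 0.0
    for k in range(len(target_sents)):
        # S_k: publishers with at least one sentence similar to sentence k
        s_k = sum(1 for a, b in spans.values()
                  if b > a and sims[k, a:b].max() >= SIM_THRESHOLD)
        total += s_k / (t - 1)
    return total / len(target_sents)            # Eq. (1)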
2.2 Numerical Agreement
To rate the credibility, we also focused on the agreement of numerical expressions such as "100 passengers" that appear in the news articles. Combinations of an attribute (e.g., "passengers") and its numerical value were extracted using the Japanese syntactic dependency analyzer CaboCha (http://chasen.org/~taku/software/cabocha). The extracted numerical expressions were then compared among articles from different news publishers within the same x-hour period. When numerical expressions agreed, a positive score was added; when they contradicted, a negative score was added. We defined the numerical agreement as:

    g(i, c, n) = (i - 2c) / n                                                (2)

Here, i is the number of agreeing numerical expressions, c is the number of contradictory numerical expressions, and n is as in Eq. (1). The weight of contradictions was set to 2, optimized by varying this parameter. When the sum of the scores (1) and (2) exceeded a threshold (= 0.5), we categorized the article as a candidate credible article.
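The numerical-agreement score of Eq. (2) and the candidate decision can be sketched as follows. The extraction of attribute-value pairs with CaboCha is omitted, and the data structures and names are illustrative only.

# Sketch of the numerical-agreement score g(i, c, n) of Eq. (2) and of the
# candidate decision.  Numerical expressions are assumed to be already
# extracted as {attribute: value} pairs (the paper uses CaboCha for this).

CONTRADICTION_WEIGHT = 2    # optimized weight of contradictions (Sec. 2.2)
CANDIDATE_THRESHOLD = 0.5   # threshold on the sum of Eqs. (1) and (2)

def numerical_agreement(target_nums, other_nums, n):
    """target_nums: {attribute: value} for the target article, e.g.
    {'passengers': 100}; other_nums: one such dict per other publisher;
    n: as in Eq. (1)."""
    i = c = 0
    for attr, value in target_nums.items():
        for other in other_nums:
            if attr in other:
                if other[attr] == value:
                    i += 1          # agreeing numerical expression
                else:
                    c += 1          # contradictory numerical expression
    return (i - CONTRADICTION_WEIGHT * c) / n       # Eq. (2)

def is_candidate(f_score, g_score):
    # Articles whose combined score exceeds the threshold become candidates
    # for credible articles; objectivity (Sec. 2.3) is then applied to them.
    return f_score + g_score > CANDIDATE_THRESHOLD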
2.3 Objectivity
For the candidate credible articles, the credibility score was rated using a list of speculative clue phrases and the indication of news sources in the article. We defined a list of speculative clue phrases with four grade scores, where the score "4" was set as "credible" and "1" as "not credible" (see Table 1). All the sentences in the article were rated using the scores of the speculative clue phrases they contained, and the heading was also rated. When an interrogative expression appeared in the heading, the score of the heading was degraded, according to the context.

Table 1. Clue Phrases (Originally Japanese)
    Scores: 4 (objective), 3 (somewhat speculative), 2 (very speculative)
    Terms:  [expressing-policy] [guarantee-with] [tell] [say] [report]
            [isn't it?] [seem] [plan] [convincing] [expect] [prospect]
            [policy] [become] [look-like] [idea] [attitude] [information]
            [strongly-possible] [highly-possible] [motivation] [prospect]
            [outlook] [assume] [affirmation] [objective] [unclarity]
            [subtlety] [hope] [possibility] [predict] [plan] [aim] [maybe]

For the news sources, we raised the objectivity score if news sources were indicated in the article. News sources were extracted using surface-level clues such as "from" or "according to" in the sentences. The objectivity score was raised according to the news source types and their frequencies. We categorized the news source types into four: (a) news agencies and publishers, (b) government agencies, (c) police, and (d) TV/radio. The appearance of a news agency such as "Associated Press" raised the objectivity more than the appearance of "TV/radio".
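A rough sketch of the objectivity rating is given below. The assignment of individual clue phrases to grade scores, the rule for combining clues within a sentence, and the per-source weights are illustrative assumptions: Table 1 lists the phrases, and the text states only that news agencies raise objectivity more than TV/radio.

# Sketch of the objectivity rating of Sec. 2.3.  Phrase-to-score assignments,
# the combination rule, and the source weights below are illustrative only.

CLUE_SCORES = {                     # illustrative subset of Table 1 phrases
    "report": 4, "tell": 4, "say": 4,
    "expect": 3, "prospect": 3, "policy": 3,
    "maybe": 2, "possibility": 2, "hope": 2,
}
SOURCE_WEIGHTS = {                  # hypothetical per-mention bonuses
    "news_agency": 1.0, "government": 0.8, "police": 0.8, "tv_radio": 0.5,
}

def objectivity_score(heading, sentences, source_types):
    """heading, sentences: article text units; source_types: categories of the
    news sources found via surface clues such as "from" / "according to"."""
    scores = []
    for sent in [heading] + sentences:
        hits = [score for phrase, score in CLUE_SCORES.items() if phrase in sent]
        # One possible combination rule: the most speculative clue decides;
        # sentences without clue phrases are treated as objective (score 4).
        scores.append(min(hits) if hits else 4)
    if heading.rstrip().endswith("?"):
        scores[0] = max(1, scores[0] - 1)       # interrogative heading degraded
    base = sum(scores) / len(scores)
    bonus = sum(SOURCE_WEIGHTS.get(t, 0.0) for t in source_types)
    return base + bonus                         # higher means more objective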
3. EVALUATION
For the experiment, 55,994 news stories on the Web (published from October to November 2005) were collected every 30 minutes from seven news sites: four newspaper sites, one TV news site, one overseas news agency site, and one evening daily site. All stories were written in Japanese. From the collected articles, 52 were selected manually for human assessment. We manually categorized the 52 articles into the eight topic groups shown in Table 2, divided them into articles with high (4, 3) and low (2, 1) credibility scores calculated by our proposed method, and combined them into 26 pairs that were published in the same time period. We then asked three assessors to compare the credibility of the articles in each pair. The agreement rates between our proposed method and the assessments of the three human assessors are shown in Table 2. Agreement rates were defined as:

    Agreement rate = # of agreements / # of article pairs                   (3)

Table 2. Agreement Rates between System and Assessors
    Topic (# of article pairs)      Agreement rate averaged over the three assessors (%)
    Airplane Accident (2)           100.0
    Car Accident (2)                100.0
    Fujimori to Chile (5)            76.6
    Azerbaijan Election (4)          58.3
    Flood in China (2)                8.3
    Six-party Talks (5)              73.3
    Cabinet Shuffle (5)              70.0
    Injury of Photographer (1)       66.6
    Total: 26 pairs                  69.1 (macro average)

"Flood in China" showed the lowest agreement rate because the news documents were not published within x hours (x = 2), so commonality and numerical agreement could not be taken into account.

4. CONCLUSION
We have proposed a method to rate the credibility of news articles on the Web. In an experiment with three assessors, the average agreement between our proposed method and the human assessments was 69.1%. Parameter tuning and the use of affirmation, such as positive/negative nuances among different news publishers, will be part of future work.

Because the proposed method uses commonality and agreement with news stories published prior to the target article, it successfully rates the credibility of "ordinary" reliable news as high and identifies unreliable news containing wrong information. However, it tends to rate a "scoop" rather low when applied to developing news stories. Investigating such currency [1] in the online environment is also future work. Moreover, the proposed method may retrospectively identify reliable "scoops" by finding the first news item that shows high commonality and agreement with the news articles published after it. This could be useful for purposes such as news site rating.

REFERENCES
[1] Abdulla, R. A., Garrison, B., Salwen, M., Driscoll, P., and Casey, D. The Credibility of Newspapers, Television News, and Online News. In Proc. of the Association for Education in Journalism and Mass Communication Annual Convention, Miami Beach, FL, 2002.
[2] Danielson, D. R. Web Credibility. In Encyclopedia of Human Computer Interaction, pages 713-721. Idea Group Reference, 2005.
[3] Ord, R. Google News Patent Application - Full Text [online], 2005 [cited 2005-12-26].
[4] Rubin, V. L., Liddy, E. D., and Kando, N. Certainty Identification in Texts: Categorization Model and Manual Tagging Results. In Computing Attitude and Affect in Text: Theories and Applications, pages 61-74. Springer, Dordrecht, The Netherlands, 2005.