Structural, Transitive and Latent Models for Biographic Fact Extraction

Nikesh Garera and David Yarowsky
Department of Computer Science, Johns Hopkins University
Human Language Technology Center of Excellence
Baltimore MD, USA
{ngarera,yarowsky}@cs.jhu.edu

Abstract

This paper presents six novel approaches to biographic fact extraction that model structural, transitive and latent properties of biographical data. The ensemble of these proposed models substantially outperforms standard pattern-based biographic fact extraction methods, and performance is further improved by modeling inter-attribute correlations and distributions over functions of attributes, achieving an average extraction accuracy of 80% over seven types of biographic attributes.

1 Introduction

Extracting biographic facts such as "Birthdate", "Occupation" and "Nationality" is a critical step for advancing the state of the art in information processing and retrieval. An important aspect of web search is the ability to narrow down search results by distinguishing among people with the same name, which has led to multiple efforts on web person-name disambiguation in the literature (Mann and Yarowsky, 2003; Artiles et al., 2007; Cucerzan, 2007). While biographic facts are certainly useful for disambiguating person names, they also allow for automatic extraction of encyclopedic knowledge of the kind that has so far been limited to manual efforts such as Britannica and Wikipedia. Such encyclopedic knowledge can also advance vertical search engines such as http://www.spock.com, which focus on people search and offer enhanced interfaces for querying by various biographic attributes. Biographic facts further enable powerful query mechanisms, such as finding which attributes are common between two people (Auer and Lehmann, 2007).

Figure 1: Goal: extracting attribute-value biographic fact pairs from biographic free-text

While a large quantity of biographic text is available online, there are only a few biographic fact databases¹, and most of them have been created manually, are incomplete, and are available primarily in English. This work presents multiple novel approaches for automatically extracting biographic facts such as "Birthdate", "Occupation", "Nationality" and "Religion", making use of diverse sources of information present in biographies. In particular, we propose and evaluate the following six distinct original approaches to this task, with large collective empirical gains:

1 E.g., http://www.nndb.com, http://www.biography.com, and infoboxes in Wikipedia.

1. An improvement to the Ravichandran and Hovy (2002) algorithm based on partially untethered contextual pattern models.

2. Learning a position-based model using absolute and relative positions and the sequential order of hypotheses that satisfy the domain model. For example, "Deathdate" very often appears after "Birthdate" in a biography.

3. Using transitive models over attributes via co-occurring entities. For example, other people mentioned in a person's biography page tend to have similar attributes, such as occupation (see Figure 4).

4. Using latent wide-document-context models to detect attributes that may not be mentioned directly in the article (e.g., the words "song", "hits", "album", "recorded", ... all collectively indicate the occupation of singer or musician).
5. Using inter-attribute correlations to filter unlikely biographic attribute combinations. For example, a tuple consisting of <"Nationality" = India, "Religion" = Hindu> has a higher probability than a tuple consisting of <"Nationality" = France, "Religion" = Hindu>.

6. Learning distributions over functions of attributes, for example using an age distribution to filter tuples implying improbable lifespan values.

We propose and evaluate techniques for exploiting all of the above classes of information in the following sections.

2 Related Work

The literature on biography extraction falls into two major classes. The first deals with identifying and extracting biographical sentences, treating the problem as a summarization task (Cowie et al., 2000; Schiffman et al., 2001; Zhou et al., 2004). The second, and more closely related, class deals with extracting specific facts such as "birthplace" and "occupation". For this task, the primary theme in the literature has been to treat the task as a general semantic-class learning problem, in which one starts with a few seeds of the semantic relationship of interest and learns contextual patterns such as "<name> was born in <birthplace>" or "<name> (born <birthdate>)" (Hearst, 1992; Riloff, 1996; Thelen and Riloff, 2002; Agichtein and Gravano, 2000; Ravichandran and Hovy, 2002; Mann and Yarowsky, 2003; Jijkoun et al., 2004; Mann and Yarowsky, 2005; Alfonseca et al., 2006; Pasca et al., 2006).

There has also been some work on extracting biographic facts directly from Wikipedia pages. Culotta et al. (2006) learn contextual patterns for extracting family relationships from Wikipedia. Ruiz-Casado et al. (2006) learn contextual patterns for biographic facts and apply them to Wikipedia pages. While the pattern-learning approach extends well to a few biography classes, some biographic facts such as "Gender" and "Religion" do not have consistent contextual patterns, and only a few of the explicit biographic attributes, such as "Birthdate", "Deathdate", "Birthplace" and "Occupation", have been shown to work well in the pattern-learning framework (Mann and Yarowsky, 2005; Alfonseca et al., 2006; Pasca et al., 2006). Secondly, there is a general lack of work utilizing the typical information sequencing within biographic texts for fact extraction, and we show how the information structure of biographies can be used to improve upon pattern-based models. Furthermore, we present additional novel models of attribute correlation and age distribution that aid the extraction process.

3 Approach

We first implement the standard pattern-based approach for extracting biographic facts from the raw prose in Wikipedia people pages. We then present an array of novel techniques exploiting different classes of information, including partially-tethered contextual patterns, relative attribute position and sequence, transitive attributes of co-occurring entities, broad-context topical profiles, inter-attribute correlations and likely human age distributions. For illustrative purposes, we motivate each technique using one or two attributes, but in practice they can be applied to a wide range of attributes, and the empirical results in Table 4 show that they give consistent performance gains across multiple attributes.

4 Contextual Pattern-Based Model

A standard model for extracting biographic facts is to learn templatic contextual patterns such as "<name> was born in <birthplace>". Such templatic patterns can be learned using seed examples of the attribute in question, and a plethora of work in the seed-based bootstrapping literature addresses this problem (Ravichandran and Hovy, 2002; Thelen and Riloff, 2002; Mann and Yarowsky, 2005; Alfonseca et al., 2006; Pasca et al., 2006). Thus, for our baseline we implemented a standard Ravichandran and Hovy (2002) pattern-learning model using 100 seed² examples from an online biographic database called NNDB (http://www.nndb.com) for each of the biographic attributes "Birthdate", "Birthplace", "Deathdate", "Gender", "Nationality", "Occupation" and "Religion". Given the seed pairs, patterns for each attribute were learned by searching for the seed pairs in the Wikipedia page and extracting the left, middle and right contexts as contextual patterns³.

While the biographic text was obtained from Wikipedia articles, not all of the 7 attribute values for the seed and test person names could be obtained from Wikipedia, due to incomplete infoboxes and unnormalized attribute-value formats. Hence, the values for training and evaluation were extracted from NNDB, which provides a cleaner set of gold truth and is similar to an approach utilizing trained annotators for marking up and extracting the factual information in a standard format. For consistency, only people whose articles occur in Wikipedia were selected for the seed and test sets.

Given the attribute values of the seed names and their text articles, the probability of a relationship r(p, q), given the surrounding context "A1 p A2 q A3", where p and q are the <person name> and candidate <attribute value> respectively, is given by the rote-extractor model probability as in Ravichandran and Hovy (2002) and Mann and Yarowsky (2005):

P(r(p,q) \mid A_1 p A_2 q A_3) = \frac{\sum_{(x,y) \in r} c(A_1 x A_2 y A_3)}{\sum_{(x,z)} c(A_1 x A_2 z A_3)}

Thus, the probability of each contextual pattern reflects how often it correctly predicts the relationship in the seed set, and each attribute value q extracted by a given pattern can be ranked according to this probability.

2 The seed examples were chosen randomly, with a bias against duplicate attribute values to increase training diversity. Both the seed and test names and data will be made available online to the research community for replication and extension.

3 We implemented a noisy model of coreference resolution by resolving any gender-correct pronoun used in the Wikipedia page to the title person name of the article. Gender is itself extracted automatically as a biographic attribute.
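To make the rote-extractor scoring concrete, the following is a minimal sketch over a seed set. It is an illustration, not the paper's implementation: the `articles` and `seeds` dictionaries, the fixed character window, and restricting to middle contexts are all simplifying assumptions (Ravichandran and Hovy (2002) additionally generalize and prune the learned patterns).

```python
from collections import defaultdict

def middle_contexts(text, hook, target, max_gap=50):
    """Yield the middle contexts (A2) found between a hook occurrence
    (the person name) and a nearby target (the known attribute value)."""
    start = 0
    while (i := text.find(hook, start)) != -1:
        window = text[i + len(hook): i + len(hook) + max_gap]
        j = window.find(target)
        if j >= 0:
            yield window[:j]
        start = i + 1

def pattern_precisions(articles, seeds):
    """articles: {person: article text}; seeds: {person: gold value}.
    Score each middle-context pattern by how often 'hook + pattern' is
    immediately followed by the gold value across the seed set."""
    patterns = {pat for name, text in articles.items()
                for pat in middle_contexts(text, name, seeds[name])}
    correct, total = defaultdict(int), defaultdict(int)
    for name, text in articles.items():
        for pat in patterns:
            i = text.find(name + pat)
            if i == -1:
                continue  # this pattern does not fire for this person
            total[pat] += 1
            correct[pat] += text[i + len(name + pat):].startswith(seeds[name])
    return {pat: correct[pat] / total[pat] for pat in total}
```

For example, given the seed pair ("Mozart", "Salzburg") and the sentence "Mozart was born in Salzburg", this learns the pattern " was born in " and scores it by how often it predicts correctly elsewhere in the seed set.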
We tested this approach for extracting values for each of the seven attributes on a held-out test set of 100 names, and report Precision, Pseudo-recall and F-score for each attribute, computed in the standard way, e.g., for the attribute "Birthplace" (bplace):

\mathrm{Precision}_{bplace} = \frac{\#\text{ people with bplace correctly extracted}}{\#\text{ people with bplace extracted}}

\text{Pseudo-rec}_{bplace} = \frac{\#\text{ people with bplace correctly extracted}}{\#\text{ people with bplace in the test set}}

\text{F-score}_{bplace} = \frac{2 \cdot \mathrm{Precision}_{bplace} \cdot \text{Pseudo-rec}_{bplace}}{\mathrm{Precision}_{bplace} + \text{Pseudo-rec}_{bplace}}

Since the true values of each attribute are obtained from a cleaner and normalized person database (NNDB), not all attribute values may be present in the Wikipedia article for a given name. Thus, we also compute accuracy on the subset of names for which the value of a given attribute is explicitly stated in the article, denoted:

\mathrm{Acc}_{truth\,pres} = \frac{\#\text{ people with bplace correctly extracted}}{\#\text{ people with the true bplace stated in the article}}
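These three metrics reduce to a few lines of code; the sketch below assumes hypothetical per-person dictionaries in which a missing extraction is encoded as None.

```python
def extraction_scores(predicted, gold):
    """predicted: {person: extracted value, or None if nothing extracted}.
    gold: {person: true value} over all test people who have a gold
    value for this attribute (the pseudo-recall denominator)."""
    correct = sum(1 for p, v in predicted.items()
                  if v is not None and gold.get(p) == v)
    attempted = sum(1 for v in predicted.values() if v is not None)
    precision = correct / attempted if attempted else 0.0
    pseudo_recall = correct / len(gold) if gold else 0.0
    denom = precision + pseudo_recall
    f_score = 2 * precision * pseudo_recall / denom if denom else 0.0
    return precision, pseudo_recall, f_score
```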
We further applied a domain model for each attribute to filter noisy targets extracted by the lexical patterns. Our domain models of attributes include lists of acceptable values (such as lists of places, occupations and religions) and structural constraints such as possible date formats for "Birthdate" and "Deathdate". The rows with subscript "RH02" in Table 4 show the performance of this Ravichandran and Hovy (2002) model with the additional attribute domain modeling for each attribute, and Table 3 shows the average performance across all attributes.

5 Partially Untethered Templatic Contextual Patterns

The pattern-learning literature for fact extraction typically uses patterns with a "hook" and a "target" (Mann and Yarowsky, 2005). For example, in the pattern "<name> was born in <birthplace>", "<name>" is the hook and "<birthplace>" is the target. The disadvantage of this approach is that the intervening dually-tethered patterns can be quite long and highly variable, such as "<name> was highly influential in his role as <occupation>". We overcome this problem by modeling partially untethered variable-length n-gram patterns adjacent to only the target, with the only constraint being that the hook entity appear somewhere in the sentence⁴. Examples of these new contextual n-gram features include "his role as <occupation>" and "role as <occupation>". The pattern probability model is essentially the same as in Ravichandran and Hovy (2002); only the pattern representation is changed. The rows with subscript "RH02imp" in Tables 4 and 3 show the performance gains of this improved templatic-pattern-based model, yielding an absolute 21% gain in accuracy.

4 This constraint is particularly viable in biographic text, which tends to focus on the properties of a single individual.

6 Document-Position-Based Model

One of the properties of the biographic genre is that primary biographic attributes⁵ tend to appear in characteristic positions, often toward the beginning of the article. Thus, the absolute position (as a ratio of document length) can be modeled explicitly using a Gaussian parametric model, choosing the best candidate value v for a given attribute A as:

v^* = \arg\max_{v \in domain(A)} f(posn_v \mid A)

where

f(posn_v \mid A) = N(posn_v; \hat{\mu}_A, \hat{\sigma}_A^2) = \frac{1}{\hat{\sigma}_A \sqrt{2\pi}} \, e^{-(posn_v - \hat{\mu}_A)^2 / 2\hat{\sigma}_A^2}

In the above equation, posn_v is the absolute position ratio (position/length), and \hat{\mu}_A, \hat{\sigma}_A^2 are the sample mean and variance of the position ratios of correct attribute values in biographies with attribute A. Figure 2, for example, shows the positional distribution of the seed attribute values for deathdate, nationality and religion in Wikipedia articles, fit to a Gaussian distribution. Combining this empirically derived position model with a domain model⁶ of acceptable attribute values is effective enough to serve as a stand-alone model.

Figure 2: Distribution of the observed document mentions of Deathdate, Nationality and Religion.

5 We use the hyperlinked phrases as potential values for all attributes except "Gender", for which we use pronouns as potential values, ranked according to their distance from the beginning of the page.

6 The domain model is the same as used in Section 4 and remains constant across all the models developed in this paper.
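A sketch of this absolute-position model under the definitions above: fit the Gaussian to the seed position ratios, then score domain-satisfying candidates by density. All function and variable names here are illustrative, not drawn from the paper's code.

```python
import math

def fit_position_gaussian(seed_ratios):
    """seed_ratios: position/length ratios at which the correct value of
    one attribute occurred across the seed biographies."""
    n = len(seed_ratios)
    mu = sum(seed_ratios) / n
    var = sum((x - mu) ** 2 for x in seed_ratios) / n
    return mu, var  # assumes at least two distinct positions, so var > 0

def best_candidate_by_position(candidates, mu, var):
    """candidates: (value, position_ratio) pairs that already satisfy the
    attribute's domain model; returns the value maximizing the Gaussian
    density N(posn_v; mu, var)."""
    def density(x):
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    value, _ = max(candidates, key=lambda c: density(c[1]))
    return value
```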
For "Gender" we used pronouns as potential values ranked according to the their distance from the beginning of the page. In practice, for attributes such as birthdate, the first text pattern satisfying the domain model is often the correct answer for biographical articles. Deathdate also tends to occur near the beginning of the article, but almost always some point after the birthdate. This motivates a second, sequence-based position model based on the rank of the attribute values among other values in the domain of the attribute, as follows: v = argmaxvdomain(A) P (rankv |A) where P (rankv |A) is the fraction of biographies having attribute a with the correct value occuring at rank rankv , where rank is measured according to the relative order in which the values belonging to the attribute domain occur from the beginning 6 The domain model is the same as used in Section 4 and remains constant across all the models developed in this paper 303 of the article. We use the seed set to learn the relative positions between attributes, that is, in the Wikipedia pages of seed names what is the rank of the correct attribute. Table 1 shows the most frequent rank of the correct attribute value and Figure 3 shows the distribution of the correct ranks for a sample of attributes. We can see that 61% of the time the first location mentioned in a biography is the individuals's birthplace, while 58% of the time the 2nd date in the article is the deathdate. Thus, "Deathdate" often appears as the second date in a Wikipedia page as expected. These empirical distributions for the correct rank provide a direct vehicle for scoring hypotheses, and the rows with "rel. posn" as the subscript in Table 4 shows the improvement in performance using the learned relative ordering. Averaging across different attributes, table 3 shows an absolute 11% average gain in accuracy of the position-sequence-based models relative to the improved Ravichandran and Hovy results achieved here. 7 Implicit Models Some of the biographic attributes such as "Nationality", "Occupation" and "Religion" can be extracted successfully even when the answer is not directly mentioned in the biographic article. We present two such models for doing so in the following subsections: 7.1 Extracting Attributes Transitively using Neighboring Person-Names Attributes such as "Occupation" are transitive in nature, that is, the people names appearing close to the target name will tend to have the same occupation as the target name. Based on this intution, we implemented a transitive model that predicts occupation based on consensus voting via the extracted occupations of neighboring names7 as follows: v = argmaxvdomain(A) P (v|A, Sneighbors ) where, P (v|A, Sneighbors ) = # neighboring names with attrib value v # of neighboring names in the article The set of neighboring names is represented as Sneighbors and the best candidate value for an attribute A is chosen based on the the fraction of neighboring names having the same value for the respective attribute. We rank candidates according to this probability and the row labeled "trans" in Table 4 shows that this model helps in subsantially improving the recall of "Occupation" and "Religion", yielding a 7% and 3% average improvement in F-measure respectively, on top of the position model described in Section 6. 
7.2 Latent Model based on Document-Wide Context Profiles

In addition to modeling cross-entity attributes transitively, attributes such as "Occupation" can also be modeled successfully using a document-wide context or topic model. For example, the distribution of words occurring in the biography of a politician differs from that in the biography of a scientist. Thus, even if the occupation is not explicitly mentioned in the article, one can infer it using a bag-of-words topic profile learned from the seed examples. Given a value v of an attribute A (for example, v = "Politician" and A = "Occupation"), we learn a centroid weight vector

C_v = [w_{1,v}, w_{2,v}, \ldots, w_{n,v}]

where

w_{t,v} = \frac{1}{N} \cdot tf_{t,v} \cdot \log\frac{|A|}{|t \in A|}

Here, tf_{t,v} is the frequency of word t in the articles of people having attribute A = v; |A| is the total number of values of attribute A; |t \in A| is the number of values of attribute A such that the articles of people having one of those values contain the term t; and N is the total number of people in the seed set.

Given the biography article of a test name and an attribute in question, we compute a similar word weight vector C = [w_1, w_2, \ldots, w_n] for the test name and measure its cosine similarity to the centroid vector of each value of the given attribute. Thus, the best value v* is chosen as:

v^* = \arg\max_v \frac{w_1 w_{1,v} + w_2 w_{2,v} + \cdots + w_n w_{n,v}}{\sqrt{w_1^2 + w_2^2 + \cdots + w_n^2} \, \sqrt{w_{1,v}^2 + w_{2,v}^2 + \cdots + w_{n,v}^2}}

Tables 3 and 4 show the performance of this latent document-wide-context model. This model by itself gives the top performance on "Occupation", outperforming the best alternative model by 9% absolute accuracy, indicating the usefulness of implicit attribute modeling via broad-context word frequencies.

This latent model can be further extended using the multilingual nature of Wikipedia. We take the corresponding German pages of the training names and model the German word distributions characterizing each seed occupation. Table 4 shows that English attribute classification can be successful using only the words in a parallel German article. For some attributes, the performance of the latent model applied cross-lingually (denoted latentCL) is close to that of English, suggesting potential future work exploiting this multilingual dimension.

Table 2: Sample of occupation weight vectors in English and German learned using the latent model, for the occupations Physicist, Singer, Politician, Painter and Auto racing.

It is interesting to note that while both the transitive model and the latent wide-context model do not rely on the actual "Occupation" being explicitly mentioned in the article, they still outperform the explicit pattern-based and position-based models. This implicit modeling also helps in improving the recall of less often directly mentioned attributes such as a person's "Religion".
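A sketch of the centroid construction and cosine decision rule follows; the whitespace tokenizer, the grouping of seed articles by value, and the use of raw term frequencies for the test vector are simplifying assumptions rather than the paper's exact implementation.

```python
import math
from collections import Counter

def learn_centroids(articles_by_value, n_seed_people):
    """articles_by_value: {value v: list of seed articles of people with
    A = v}. Builds C_v with w_{t,v} = (1/N) * tf_{t,v} * log(|A|/|t in A|)."""
    tfs = {v: Counter(tok for doc in docs for tok in doc.lower().split())
           for v, docs in articles_by_value.items()}
    n_values = len(tfs)                                       # |A|
    doc_freq = Counter(t for tf in tfs.values() for t in tf)  # |t in A|
    return {v: {t: tf[t] / n_seed_people * math.log(n_values / doc_freq[t])
                for t in tf}
            for v, tf in tfs.items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def classify(test_article, centroids):
    """Weight the test article (raw term frequencies here) and pick the
    attribute value whose centroid is most cosine-similar."""
    test_vec = Counter(test_article.lower().split())
    return max(centroids, key=lambda v: cosine(test_vec, centroids[v]))
```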
8 Model Combination

While the pattern-based, position-based, transitive and latent models are all stand-alone models, they complement each other in combination, as they provide relatively orthogonal sources of information. To combine these models, we perform a simple backoff-based combination for each attribute based on stand-alone model performance, and the rows with subscript "combined" in Tables 3 and 4 show an average 14% absolute performance gain of the combined model relative to the improved Ravichandran and Hovy (2002) model.

Model                                       F-score       Acc_truth pres
Ravichandran and Hovy, 2002                 0.37          0.43
Improved RH02 Model                         0.54          0.64
Position-Based Model                        0.53          0.75
Combined (above 3 + trans + latent + cl)    0.59          0.78
Combined + Age Dist + Corr                  0.62 (+24%)   0.80 (+37%)

Table 3: Average performance of the different models across all biographic attributes

9 Further Extensions: Reducing False Positives

Since the position-and-domain-based models will almost always posit an answer, one of their problems is the high number of false positives. The following subsections introduce further extensions that use interesting properties of biographic attributes to reduce the effect of false positives.

9.1 Using Inter-Attribute Correlations

One way to filter false positives is to filter empirically incompatible inter-attribute pairings. The motivation here is that attributes are not independent of each other when modeled for the same individual. For example, P(Religion=Hindu | Nationality=India) is higher than P(Religion=Hindu | Nationality=France), and similarly we can find positive and negative correlations among other attribute pairings. For implementation, we consider all possible 3-tuples of ("Nationality", "Birthplace", "Religion")⁸ and search NNDB for the presence of each candidate tuple for any individual in the database (excluding, of course, the test data). As an aggressive but effective filter, we discard the candidate 3-tuples for which no individual in NNDB was found with that combination. The rows labeled "combined+corr" in Tables 4 and 3 show substantial performance gains using inter-attribute correlations, such as the 7% absolute average gain for Birthplace over the Section 8 combined models, and a 3% absolute gain for Nationality and Religion.

8 Joint presence was tested among these three attributes because they are strongly correlated.
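This tuple filter amounts to a set-membership test against attested combinations; in the sketch below, `attested_tuples` stands in for the set of ("Nationality", "Birthplace", "Religion") 3-tuples found for any individual in NNDB, with the test names excluded.

```python
def correlation_filter(candidate_tuples, attested_tuples):
    """Keep only (nationality, birthplace, religion) hypotheses whose
    combination is attested for at least one individual in the fact
    database (test names excluded)."""
    return [t for t in candidate_tuples if t in attested_tuples]

# Toy example: the (France, Paris, Hindu) hypothesis is dropped because
# no database individual exhibits that combination.
attested = {("India", "Mumbai", "Hindu"), ("France", "Paris", "Catholic")}
print(correlation_filter([("France", "Paris", "Hindu"),
                          ("India", "Mumbai", "Hindu")], attested))
# -> [('India', 'Mumbai', 'Hindu')]
```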
9.2 Using Age Distribution

Another way to filter out false positives is to consider distributions over meta-attributes: while age is not explicitly extracted, it is a function of two extracted attributes (deathdate minus birthdate), and the age distribution can be used to filter out improbable birthdate/deathdate combinations. Based on the age distribution for famous people⁹ on the web shown in Figure 5, we can bias against unusual candidate lifespans and filter out completely those outside the range 25-100, as most of the probability mass is concentrated in this range. The rows with subscript "comb + age dist" in Table 4 show the performance gains using this feature, yielding an average 5% absolute accuracy gain for Birthdate.

Figure 5: Age distribution of famous people on the web (from www.spock.com)

9 Since all the seed and test examples were drawn from nndb.com, we use the age distribution of famous people on the web: http://blog.spock.com/2008/02/08/age-distribution-of-people-on-the-web/
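The lifespan filter reduces to a range check on the implied age; working with years rather than full dates is a simplification here, and the 25-100 bounds follow where the observed distribution concentrates its mass.

```python
def plausible_lifespan(birth_year, death_year, lo=25, hi=100):
    """Accept a (birthdate, deathdate) hypothesis pair only if the
    implied age at death falls within the plausible range."""
    return lo <= death_year - birth_year <= hi

assert plausible_lifespan(1756, 1791)       # a 35-year lifespan passes
assert not plausible_lifespan(1642, 1892)   # a 250-year lifespan is filtered
```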
Attribute (model)             Prec   P-Rec   F-score   Acc_truth pres
Birthdate RH02                0.86   0.38    0.53      0.88
Birthdate RH02imp             0.52   0.52    0.52      0.67
Birthdate rel. posn           0.42   0.40    0.41      0.93
Birthdate combined            0.58   0.58    0.58      0.95
Birthdate comb+age dist       0.63   0.60    0.61      1.00
Deathdate RH02                0.80   0.19    0.30      0.36
Deathdate RH02imp             0.50   0.49    0.49      0.59
Deathdate rel. posn           0.46   0.44    0.45      0.86
Deathdate combined            0.49   0.49    0.49      0.86
Deathdate comb+age dist       0.51   0.49    0.50      0.86
Birthplace RH02               0.42   0.38    0.40      0.42
Birthplace RH02imp            0.41   0.41    0.41      0.45
Birthplace rel. posn          0.47   0.41    0.44      0.48
Birthplace combined           0.44   0.44    0.44      0.48
Birthplace combined+corr      0.53   0.50    0.51      0.55
Occupation RH02               0.54   0.18    0.27      0.26
Occupation RH02imp            0.38   0.34    0.36      0.48
Occupation rel. posn          0.48   0.35    0.40      0.50
Occupation trans              0.49   0.46    0.47      0.50
Occupation latent             0.48   0.48    0.48      0.59
Occupation latentCL           0.48   0.48    0.48      0.54
Occupation combined           0.48   0.48    0.48      0.59
Nationality RH02              0.40   0.25    0.31      0.27
Nationality RH02imp           0.75   0.75    0.75      0.81
Nationality rel. posn         0.73   0.72    0.71      0.78
Nationality trans             0.51   0.48    0.49      0.49
Nationality latent            0.56   0.56    0.56      0.56
Nationality latentCL          0.55   0.48    0.51      0.48
Nationality combined          0.75   0.75    0.75      0.81
Nationality comb+corr         0.77   0.77    0.77      0.84
Gender RH02                   0.76   0.76    0.76      0.76
Gender RH02imp                0.99   0.99    0.99      0.99
Gender rel. posn              1.00   1.00    1.00      1.00
Gender trans                  0.79   0.75    0.77      0.75
Gender latent                 0.82   0.82    0.82      0.82
Gender latentCL               0.83   0.72    0.77      0.72
Gender combined               1.00   1.00    1.00      1.00
Religion RH02                 0.02   0.02    0.04      0.06
Religion RH02imp              0.55   0.18    0.27      0.45
Religion rel. posn            0.49   0.24    0.32      0.73
Religion trans                0.38   0.33    0.35      0.48
Religion latent               0.36   0.36    0.36      0.45
Religion latentCL             0.30   0.26    0.28      0.22
Religion combined             0.41   0.41    0.41      0.76
Religion combined+corr        0.44   0.44    0.44      0.79

Table 4: Attribute-wise performance comparison of all the models across several biographic attributes.

10 Conclusion

This paper has presented six successful novel approaches to biographic fact extraction using structural, transitive and latent properties of biographic data. We first showed an improvement to the standard Ravichandran and Hovy (2002) model utilizing untethered contextual pattern models, followed by a document-position- and sequence-based approach to attribute modeling. Next we showed transitive models exploiting the tendency of individuals occurring together in an article to have related attribute values. We also showed how latent models of wide document context, both monolingually and translingually, can capture facts that are not stated directly in a text. Each of these models provides substantial performance gains, and further gains are achieved via classifier combination. We also showed how inter-attribute correlations can be modeled to filter unlikely attribute combinations, and how models of functions over attributes, such as deathdate-birthdate distributions, can further constrain the candidate space. These approaches collectively achieve 80% average accuracy on a test set of seven biographic attribute types, yielding a 37% absolute accuracy gain relative to a standard algorithm on the same data.

References

E. Agichtein and L. Gravano. 2000. Snowball: extracting relations from large plain-text collections. In Proceedings of ICDL, pages 85–94.

E. Alfonseca, P. Castells, M. Okumura, and M. Ruiz-Casado. 2006. A rote extractor with edit distance-based generalisation and multi-corpora precision calculation. In Proceedings of COLING-ACL, pages 9–16.

J. Artiles, J. Gonzalo, and S. Sekine. 2007. The SemEval-2007 WePS evaluation: establishing a benchmark for the web people search task. In Proceedings of SemEval, pages 64–69.

S. Auer and J. Lehmann. 2007. What have Innsbruck and Leipzig in common? Extracting semantics from wiki content. In Proceedings of ESWC, pages 503–517.

A. Bagga and B. Baldwin. 1998. Entity-based cross-document coreferencing using the vector space model. In Proceedings of COLING-ACL, pages 79–85.

R. Bunescu and M. Pasca. 2006. Using encyclopedic knowledge for named entity disambiguation. In Proceedings of EACL, pages 3–7.

J. Cowie, S. Nirenburg, and H. Molina-Salgado. 2000. Generating personal profiles. In The International Conference on MT and Multilingual NLP.

S. Cucerzan. 2007. Large-scale named entity disambiguation based on Wikipedia data. In Proceedings of EMNLP-CoNLL, pages 708–716.

A. Culotta, A. McCallum, and J. Betz. 2006. Integrating probabilistic extraction models and data mining to discover relations and patterns in text. In Proceedings of HLT-NAACL, pages 296–303.

E. Filatova and J. Prager. 2005. Tell me what you do and I'll tell you what you are: learning occupation-related activities for biographies. In Proceedings of HLT-EMNLP, pages 113–120.

M. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of COLING, pages 539–545.

V. Jijkoun, M. de Rijke, and J. Mur. 2004. Information extraction for question answering: improving recall through syntactic patterns. In Proceedings of COLING, page 1284.

G.S. Mann and D. Yarowsky. 2003. Unsupervised personal name disambiguation. In Proceedings of CoNLL, pages 33–40.

G.S. Mann and D. Yarowsky. 2005. Multi-field information extraction and cross-document fusion. In Proceedings of ACL, pages 483–490.

A. Nenkova and K. McKeown. 2003. References to named entities: a corpus study. In Proceedings of HLT-NAACL (companion volume), pages 70–72.

M. Pasca, D. Lin, J. Bigham, A. Lifchits, and A. Jain. 2006. Organizing and searching the world wide web of facts, step one: the one-million fact extraction challenge. In Proceedings of AAAI, pages 1400–1405.

D. Ravichandran and E. Hovy. 2002. Learning surface text patterns for a question answering system. In Proceedings of ACL, pages 41–47.

Y. Ravin and Z. Kazi. 1999. Is Hillary Rodham Clinton the President? Disambiguating names across documents. In Proceedings of ACL.

M. Remy. 2002. Wikipedia: The Free Encyclopedia. Online Information Review, 26(6).

E. Riloff. 1996. Automatically generating extraction patterns from untagged text. In Proceedings of AAAI, pages 1044–1049.

M. Ruiz-Casado, E. Alfonseca, and P. Castells. 2005. Automatic extraction of semantic relationships for WordNet by means of pattern learning from Wikipedia. In Proceedings of NLDB.

M. Ruiz-Casado, E. Alfonseca, and P. Castells. 2006. From Wikipedia to semantic relationships: a semi-automated annotation approach. In Proceedings of ESWC.

B. Schiffman, I. Mani, and K.J. Concepcion. 2001. Producing biographical summaries: combining linguistic knowledge with corpus statistics. In Proceedings of ACL, pages 458–465.

M. Thelen and E. Riloff. 2002. A bootstrapping method for learning semantic lexicons using extraction pattern contexts. In Proceedings of EMNLP, pages 14–21.

N. Wacholder, Y. Ravin, and M. Choi. 1997. Disambiguation of proper names in text. In Proceedings of ANLP, pages 202–208.

C. Walker, S. Strassel, J. Medero, and K. Maeda. 2006. ACE 2005 multilingual training corpus. Linguistic Data Consortium.

R. Weischedel, J. Xu, and A. Licuanan. 2004. A hybrid approach to answering biographical questions. In New Directions in Question Answering, pages 59–70.
M. Wick, A. Culotta, and A. McCallum. 2006. Learning field compatibilities to extract database records from unstructured text. In Proceedings of EMNLP, pages 603–611.

L. Zhou, M. Ticrea, and E. Hovy. 2004. Multi-document biography summarization. In Proceedings of EMNLP, pages 434–441.