Typological Implications

In our paper, "A Bayesian Model for Discovering Typological Implications," we present a Bayesian method for finding linguistic "universals." There are 3000+ of them, which won't fit in 8 pages, so the paper gives only a sample. Here are all the outputs of the different versions of our system. In the files, each line corresponds to a universal; the first ten lines are given below as an example to explain each of the columns (I've truncated some numbers so that they fit better):

p(true) | Feature 1 | Feature 2 | Value 1 | Value 2 | C11 | C12 | C21 | C22 | p(true) | prob
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
0.943 | Indefinite Articles | Definite Articles | No definite or indefinite article | No definite or indefinite article | 226 | 0 | 0 | 159 | 0.943 | 0.835
0.941 | Definite Articles | Indefinite Articles | No definite or indefinite article | No definite or indefinite article | 226 | 0 | 0 | 159 | 0.941 | 0.837
0.926 | Coding of Evidentiality | Semantic Distinctions of Evidentiality | No grammatical evidentials | No grammatical evidentials | 181 | 0 | 0 | 143 | 0.926 | 0.841
0.925 | Semantic Distinctions of Evidentiality | Coding of Evidentiality | No grammatical evidentials | No grammatical evidentials | 181 | 0 | 0 | 143 | 0.925 | 0.847
0.922 | Relationship between the Order of Object and Verb and the Order of Adposition and Noun Phrase | Order of Adposition and Noun Phrase | OV and Postpositions | Postpositions | 470 | 55 | 0 | 337 | 0.922 | 0.780
0.920 | Order of Subject, Object and Verb | Order of Object and Verb | SOV | OV | 620 | 30 | 0 | 392 | 0.920 | 0.793
0.919 | Relationship between the Order of Object and Verb and the Order of Adposition and Noun Phrase | Order of Object and Verb | OV and Postpositions | OV | 524 | 26 | 0 | 337 | 0.919 | 0.778
0.907 | Order of Adposition and Noun Phrase | Order of Genitive and Noun | Postpositions | Genitive-Noun | 350 | 84 | 23 | 310 | 0.907 | 0.720
0.903 | Order of Object and Verb | Relationship between the Order of Object and Verb and the Order of Adposition and Noun Phrase | OV | OV and Postpositions | 524 | 0 | 26 | 337 | 0.903 | 0.730
0.903 | Order of Subject, Object and Verb | Relationship between the Order of Object and Verb and the Order of Adposition and Noun Phrase | SOV | OV and Postpositions | 508 | 21 | 19 | 256 | 0.903 | 0.738

The columns are as follows. The first is the sorting column: the probability that the "m" variable is one in the model. The next four columns show the two features, and their values, that give rise to this universal. For instance, the first two lines are what we referred to as "tautological": they simply reflect the fact that the same feature occasionally occurs multiple times in the data set. The next four columns are the raw counts. In the last line, for example, there are 508 languages that are both SOV and OV-Post; 21 that are SOV but not OV-Post; 19 that are not SOV but are OV-Post; and 256 that are neither SOV nor OV-Post. The p(true) is then repeated, and the last column is the probability of the implication under a slightly different calculation (you can ignore it).
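To make the column layout concrete, here is a minimal Python sketch that reads one line into named fields. It assumes the files are tab-separated with exactly the eleven columns described above; the delimiter is a guess, so adjust `sep` if the files differ. The names `Universal`, `parse_line` and `empirical_conditional` are made up for this illustration. The ratio it computes from the raw counts is only a quick sanity check and is not the model's p(true).

```python
from typing import NamedTuple


class Universal(NamedTuple):
    """One line of an output file (field names are illustrative)."""
    p_true: float   # probability that the "m" variable is one
    feature1: str
    feature2: str
    value1: str
    value2: str
    c11: int        # languages with both value 1 and value 2
    c12: int        # value 1 but not value 2
    c21: int        # value 2 but not value 1
    c22: int        # neither value
    prob: float     # alternative calculation (can be ignored)


def parse_line(line: str, sep: str = "\t") -> Universal:
    """Split one line into its eleven columns (tab-separated is an assumption)."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}")
    p_true, f1, f2, v1, v2, c11, c12, c21, c22, _p_repeat, prob = fields
    return Universal(float(p_true), f1, f2, v1, v2,
                     int(c11), int(c12), int(c21), int(c22), float(prob))


def empirical_conditional(u: Universal) -> float:
    """Fraction of languages with value 1 that also have value 2."""
    denom = u.c11 + u.c12
    return u.c11 / denom if denom else float("nan")


# The last example line from the table, rebuilt as a tab-separated string.
example = "\t".join([
    "0.903",
    "Order of Subject, Object and Verb",
    "Relationship between the Order of Object and Verb and the Order of "
    "Adposition and Noun Phrase",
    "SOV",
    "OV and Postpositions",
    "508", "21", "19", "256",
    "0.903", "0.738",
])

u = parse_line(example)
print(u.value1, "->", u.value2, "p(true) =", u.p_true)
print("raw count ratio:", round(empirical_conditional(u), 3))
```

For the SOV line, the ratio is 508 / (508 + 21), the share of SOV languages that are also OV-Post.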

There are three versions of the data available, depending on which of our three models you use: flat, hierarchical or distance-based. We also have a similar file for conditional implications. These are a bit less clean, but you can download them as well. The format is roughly the same, but now there are three features and three values listed; the interpretation is that the first two together imply the third.

If you use this data for research purposes, we'd appreciate it if you would cite the ACL07 paper until a more complete version is available.