Non-Parametric Modeling of Partially Ranked Data

Abstract

Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational considerations. We explore the use of non-parametric models for partially ranked data and derive efficient procedures for their use for large n. The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. In particular, we demonstrate for the first time a non-parametric coherent and consistent model capable of efficiently aggregating partially ranked data of different types.

1 Introduction

Rankers such as humans, search engines, and classifiers output full or partial rankings representing preference relations over n items. The absence of numeric scores, or the lack of calibration between existing numeric scores output by the rankers, necessitates modeling rankings rather than numeric scores. To effectively analyze modern data, a statistical ranking model should satisfy the following desiderata.

(1) Handle efficiently a very large number of items n by reverting to partial rather than full rankings.
(2) Probability assignment to full and partial rankings should be coherent and contradiction-free.
(3) Conduct inference based on training data consisting of partial rankings of different types.
(4) Correct retrieval of the underlying process as training data increases (statistical consistency).
(5) In the case of large n, convergence of the estimator to the underlying process can be extremely slow for fully ranked data but should be much faster when restricted to simpler partial rankings.

In this paper, we present a model achieving the above requirements without any parametric assumptions on the underlying generative process. The model is based on the non-parametric Parzen window estimator with a Mallows kernel on permutations. By considering partial rankings as censored data we are able to define the model on both full and partial rankings in a coherent and contradiction-free manner. Furthermore, we are able to estimate the underlying structure based on data containing partial rankings of different types. We demonstrate computational efficiency for partial rankings, even in the case of a very large n, by exploiting the combinatorial and algebraic structure of the lattice of partial rankings.

We start below by reviewing basic concepts concerning partially ranked data (see [1] for further details) and the Mallows model and then proceed to define our non-parametric estimator. We conclude by demonstrating computational efficiency and some experiments.

2 Permutations and Cosets

A permutation is a bijective function π : {1, . . . , n} → {1, . . . , n} associating with each item i ∈ {1, . . . , n} a rank π(i) ∈ {1, . . . , n}. In other words, π(i) denotes the rank given to item i and π^{-1}(i) denotes the item assigned to rank i. We denote a permutation using the following vertical bar notation: π^{-1}(1)|π^{-1}(2)| · · · |π^{-1}(n). For example, the permutation π(1) = 2, π(2) = 3, π(3) = 1 would be denoted as 3|1|2. In this notation the numbers correspond to items and the locations of the items in their corresponding compartments correspond to their ranks. The collection of all permutations of n items forms the non-Abelian symmetric group of order n, denoted by S_n, with function composition σπ = σ ∘ π as the group operation. We denote the identity permutation by e. The concept of inversions and the result below, taken from [7], will be of great use later on.
Definition 1. The inversion set of a permutation π is the set of pairs

U(π) = {(i, j) : i < j, π(i) > π(j)} ⊆ {1, . . . , n} × {1, . . . , n}

whose cardinality is denoted by i(π) = |U(π)|.

For example, i(e) = |∅| = 0, and i(3|2|1|4) = |{(1, 2), (1, 3), (2, 3)}| = 3.

Proposition 1 (e.g., [7]). The map π ↦ U(π) is a bijection.

When n is large, the enormous number of permutations raises difficulties in using the symmetric group for modeling rankings. A reasonable solution is achieved by considering partial rankings, which correspond to cosets of the symmetric group. For example, the subgroup of S_n consisting of all permutations that fix the top k positions is denoted

S_{1,...,1,n−k} = {σ ∈ S_n : σ(i) = i, i = 1, . . . , k}.

The right coset S_{1,...,1,n−k}π = {σπ : σ ∈ S_{1,...,1,n−k}} is the set of permutations consistent with the ordering of π on the k top-ranked items. It may thus be interpreted as a partial ranking of the top k items that does not contain any information concerning the relative ranking of the bottom n − k items. The set of all such partial rankings forms the quotient space S_n/S_{1,...,1,n−k}. Figure 1 (left) displays the set of permutations that correspond to a partial ranking of the top 2 out of 4 items.

We generalize the concept above to arbitrary partial rankings using the concept of composition.

Definition 2. A composition of n is a sequence γ = (γ_1, . . . , γ_r) of positive integers whose sum is n.

Note that in contrast to a partition, in a composition the order of the integers matters. A composition γ = (γ_1, . . . , γ_r) corresponds to a partial ranking with γ_1 items in the first position, γ_2 items in the second position and so on. For such a partial ranking it is known that the first set of γ_1 items is ranked before the second set of γ_2 items etc., but no further information is conveyed about the orderings within each set. The partial ranking S_{1,...,1,n−k}π of the top k items introduced earlier is a special case corresponding to γ = (1, . . . , 1, n − k). More formally, let

N_1 = {1, . . . , γ_1},  N_2 = {γ_1 + 1, . . . , γ_1 + γ_2},  · · · ,  N_r = {γ_1 + · · · + γ_{r−1} + 1, . . . , n}.

Then the subgroup S_γ contains all permutations σ ∈ S_n for which the set equality σ(N_i) = N_i holds for each i; that is, all permutations that only permute within each N_i. A partial ranking of type γ is equivalent to a coset S_γπ = {σπ : σ ∈ S_γ} and the set of such partial rankings forms the quotient space S_n/S_γ.

The permutation notation described above is particularly convenient for denoting partial rankings. We list the items 1, . . . , n separated by vertical bars, indicating that items on the left side of each vertical bar are preferred to (ranked higher than) items on the right side of the bar. On the other hand, there is no knowledge concerning the preference of items that are not separated by one or more vertical bars. For example, the partial ranking displayed in Figure 1 (left) is denoted by 3|1|2, 4. In this notation, the ordering of items not separated by a vertical line is meaningless, and for consistency we use the conventional ordering, e.g., 1|2, 3|4 rather than 1|3, 2|4.

The set of all partial rankings

W_n = {S_γπ : π ∈ S_n, γ a composition of n},  (1)

which includes all full rankings S_n, is a subset of the set of all possible partial orders on {1, . . . , n}. While the formalism of partial rankings in W_n cannot realize all partial orderings, it is sufficiently powerful to include many useful naturally occurring orderings as special cases. Furthermore, as demonstrated in later sections, it enables simplification of the otherwise overwhelming computational difficulty.
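To make the notation concrete, the following Python sketch (our illustration, not code from the paper; all function names are ours) prints a permutation in vertical bar notation, counts its inversions i(π), and collapses a full ranking into a partial ranking of a given composition γ.

```python
def to_bar_notation(pi_inv):
    """pi_inv[j] is the item placed at rank j+1, e.g. [3, 1, 2] prints as 3|1|2."""
    return "|".join(str(item) for item in pi_inv)

def inversions(pi_inv):
    """i(pi): number of item pairs (i, j) with i < j whose ranks are reversed."""
    n = len(pi_inv)
    rank = {item: r for r, item in enumerate(pi_inv)}      # rank[i] = pi(i) - 1
    return sum(1 for i in range(1, n + 1) for j in range(i + 1, n + 1)
               if rank[i] > rank[j])

def partial_ranking(pi_inv, gamma):
    """Collapse a full ranking into compartments of sizes gamma = (gamma_1, ..., gamma_r)."""
    blocks, start = [], 0
    for g in gamma:
        blocks.append(sorted(pi_inv[start:start + g]))     # within-compartment order is meaningless
        start += g
    return "|".join(",".join(map(str, b)) for b in blocks)

print(to_bar_notation([3, 2, 1, 4]), inversions([3, 2, 1, 4]))   # 3|2|1|4 3
print(partial_ranking([3, 1, 2, 4], (1, 1, 2)))                  # 3|1|2,4
```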
Special cases include the following partial rankings.

· π ∈ S_n corresponds to a permutation or a full ordering, e.g., 3|2|4|1.
· S_{1,n−1}π where π ∈ S_n, e.g., 3|1, 2, 4, corresponds to selection of the top alternative, such as in multiclass classification.
· S_{1,...,1,n−k}π where π ∈ S_n, e.g., 1|3|2, 4, corresponds to a top-k ordering, such as the ranked list of top webpages output by search engines.
· S_{k,n−k}π where π ∈ S_n, e.g., 1, 2, 4|3, 5, corresponds to a more-preferred versus less-preferred dichotomy, such as in multilabel classification.

In the cases above, we often have a situation where n is large (or even approaching infinity as in the third example above) but k is of manageable size. Traditionally, data from each one of the special cases above was modeled using different tools and was considered fundamentally different. That problem was aggravated as different special cases were usually handled by different communities such as statistics, computer science, and information retrieval. We make an additional step by presenting a computationally efficient and coherent non-parametric estimator for partially ranked data.

In constructing a statistical model on permutations or cosets, it is essential to relate one permutation to another. We do this using a distance function on permutations d : S_n × S_n → ℝ that satisfies the usual metric function properties, and in addition is invariant under item relabeling or right action of the symmetric group [1]:

d(π, σ) = d(πτ, στ)  ∀π, σ, τ ∈ S_n.

There have been many propositions for such right-invariant distance functions, the most popular of them being Kendall's tau [3]

d(π, σ) = Σ_{i=1}^{n−1} Σ_{l>i} I(πσ^{−1}(i) − πσ^{−1}(l))  (2)

where I(x) = 1 for x > 0 and I(x) = 0 otherwise. Kendall's tau d(π, σ) can be interpreted as the number of pairs of items for which π and σ have opposing orderings (called discordant pairs) or the minimum number of adjacent transpositions needed to bring π^{−1} to σ^{−1} (an adjacent transposition flips a pair of items having adjacent ranks). By right invariance, d(π, σ) = d(πσ^{−1}, e) which, for Kendall's tau, equals the number of inversions i(πσ^{−1}). This is an important observation that will allow us to simplify many expressions concerning Kendall's tau using results on permutation inversions from the combinatorics literature.

3 The Mallows Model and its Extension to Partial Rankings

The Mallows model [5] is a simple model on permutations based on Kendall's tau distance, using a location parameter κ ∈ S_n and a scale parameter c (which we often treat as a constant):

p_κ(π) = exp(−c d(π, κ) − log ψ(c)),  π ∈ S_n, c ∈ ℝ_+.  (3)

The normalization term does not depend on κ and has the closed form

ψ(c) = Σ_{π∈S_n} e^{−c d(π,κ)} = (1 + e^{−c})(1 + e^{−c} + e^{−2c}) · · · (1 + e^{−c} + · · · + e^{−(n−1)c})  (4)

as shown by the fact that d(π, κ) = i(πκ^{−1}) and the following proposition.

Proposition 2 (e.g., [7]). For q > 0,  Σ_{π∈S_n} q^{i(π)} = Π_{j=1}^{n−1} Σ_{k=0}^{j} q^k.

Model (3) has been motivated on axiomatic grounds by Mallows and has been a major focus of statistical modeling on permutations. A natural extension to partially ranked data is to consider a partial ranking as censored data equivalent to the set of permutations in its related coset:

p_κ(S_γπ) = Σ_{σ∈S_γπ} p_κ(σ) = ψ^{−1}(c) Σ_{σ∈S_γπ} exp(−c d(σ, κ)).  (5)

Fligner and Verducci [2] have shown that in the case of γ = (1, . . . , 1, n − k) the above summation has a closed form expression. However, the apparent absence of a closed form formula for more general partial rankings prevented the widespread use of the above model for large n and encouraged more ad-hoc and heuristic models [1, 6].
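The quantities defined above are straightforward to compute directly. The following Python sketch (ours, not from the paper) evaluates Kendall's tau as a count of discordant pairs and checks the closed-form normalization (4) against a brute-force sum over S_n for a small n.

```python
import math
from itertools import permutations

def kendall_tau(pi, sigma):
    """d(pi, sigma): number of discordant item pairs; pi, sigma are tuples of item ranks."""
    n = len(pi)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0)

def psi(c, n):
    """Closed-form Mallows normalization (4): prod_{j=1}^{n-1} sum_{k=0}^{j} e^{-kc}."""
    return math.prod(sum(math.exp(-k * c) for k in range(j + 1)) for j in range(1, n))

n, c = 4, 0.7
kappa = tuple(range(1, n + 1))                      # identity location parameter
brute = sum(math.exp(-c * kendall_tau(pi, kappa))
            for pi in permutations(range(1, n + 1)))
print(abs(brute - psi(c, n)) < 1e-10)               # True: (4) matches the brute-force sum
```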
This has become especially noticeable due to a new surge of interest, especially in the computer science community, in partial ranking models for large n. The ranking lattice presented next enables extending Fligner and Verducci's closed form to a more general setting, which is critical to the practicality of our non-parametric estimator.

4 The Ranking Lattice

Partial rankings S_γπ relate to each other in a natural way by expressing more general, more specific, or inconsistent orderings. We define below the concepts of partially ordered sets and lattices and then relate them to partial ranking by considering the set of partial rankings W_n as a lattice. Some of the definitions below are taken from [7], where a thorough introduction to posets can be found.

Definition 3. A partially ordered set or poset (Q, ⪯) is a set Q endowed with a binary relation ⪯ satisfying, for all x, y, z ∈ Q: (i) reflexivity: x ⪯ x, (ii) anti-symmetry: x ⪯ y and y ⪯ x ⇒ x = y, and (iii) transitivity: x ⪯ y and y ⪯ z ⇒ x ⪯ z.

We write x ≺ y when x ⪯ y and x ≠ y. We say that y covers x when x ≺ y and there is no z ∈ Q such that x ≺ z ≺ y. A finite poset is completely described by its covering relation.

Figure 1: A partial ranking corresponds to a coset or a set of permutations (left). The Hasse diagram of W_3; some lines are dotted for 3D visualization purposes (right).

The planar Hasse diagram of (Q, ⪯) is the graph connecting the elements of Q as nodes using edges that correspond to the covering relation. An additional requirement is that if y covers x then y is drawn higher than x. Two elements x, y are comparable if x ⪯ y or y ⪯ x, and otherwise are incomparable. The set of partial rankings W_n defined in (1) is naturally endowed with the partial order of ranking refinement, i.e., π ⪯ σ if π refines σ, or alternatively, if we can get from π to σ by dropping vertical lines [4]. Figure 1 (right) shows the Hasse diagram of W_3. A lower bound z of two elements x, y in a poset satisfies z ⪯ x and z ⪯ y. The greatest lower bound of x, y, or infimum, is a lower bound of x, y that is greater than or equal to any other lower bound of x, y. The infimum and the analogous concept of supremum are denoted by x ∧ y and x ∨ y, or ∧{x_1, . . . , x_k} and ∨{x_1, . . . , x_k}, respectively. Two elements x, y ∈ W_n are consistent if there exists a lower bound of them in W_n. Note that consistency is a weaker relation than comparability. For example, 1|2, 3|4 and 1, 2|3, 4 are consistent but incomparable, while 1|2, 3|4 and 2|1, 3|4 are both inconsistent and incomparable. Using the vertical bar notation, two elements are inconsistent iff there exist two items i, j that appear on opposing sides of a vertical bar in x, y, i.e., x = · · · i|j · · · while y = · · · j|i · · · . A poset for which x ∧ y and x ∨ y always exist is called a lattice. Lattices satisfy many useful combinatorial properties, one of which is that they are completely described by the ∧ and ∨ operations. While the ranking poset is not a lattice, it may be turned into one by augmenting it with a minimum element 0̂.

Proposition 3. The union W̃_n = W_n ∪ {0̂} of the ranking poset and a minimum element 0̂ is a lattice.

Proof. Since W̃_n is finite, it is enough to show existence of x ∧ y and x ∨ y for pairs of elements [7]. We begin by showing existence of x ∧ y. If x, y are inconsistent, there is no lower bound in W_n and therefore the unique lower bound 0̂ is also the infimum x ∧ y. If x, y are consistent, their infimum may be obtained as follows. Since x and y are consistent, we do not have a pair of items i, j appearing as i|j in x and j|i in y. As a result, we can form a lower bound z to x, y by starting with a list of the items and adding the vertical bars that are in either x or y; for example, for x = 3|1, 2, 5|4 and y = 3|2|1, 4, 5 we have z = 3|2|1, 5|4. The resulting z ∈ W_n is smaller than x and y since by construction it contains all preferences (encoded by vertical bars) in x and y. It remains to show that for every other lower bound z′ of x and y we have z′ ⪯ z. If z′ is comparable to z, then z′ ⪯ z since removing any vertical bar from z results in an element that is not a lower bound. If z′ is not comparable to z, then both z and z′ contain the vertical bars in x and the vertical bars in y, possibly with some additional ones. By construction z contains only the vertical bars essential to make it a lower bound, and hence z′ ⪯ z, contradicting the assumption that z, z′ are non-comparable. By Proposition 3.3.1 of [7], a poset for which an infimum is always defined and that has a supremum element is necessarily a lattice. Since we just proved that x ∧ y always exists for W̃_n and 1, . . . , n = ∨W̃_n, the proof is complete.
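The infimum construction used in the proof is easy to implement. In the Python sketch below (our illustration; a partial ranking is stored as a list of sets of items, one set per compartment), consistency is checked pairwise and x ∧ y is obtained by intersecting compartments, which is equivalent to adding the vertical bars of both arguments.

```python
def block_index(ranking, item):
    """Index of the compartment containing item; ranking is a list of sets, e.g. [{3}, {1, 2, 5}, {4}]."""
    for idx, block in enumerate(ranking):
        if item in block:
            return idx
    raise ValueError(f"item {item} not ranked")

def consistent(x, y):
    """x and y are consistent iff no item pair is ordered one way in x and the other way in y."""
    items = sorted(set().union(*x))
    return not any((block_index(x, a) - block_index(x, b)) *
                   (block_index(y, a) - block_index(y, b)) < 0
                   for a in items for b in items if a < b)

def infimum(x, y):
    """Greatest lower bound x ∧ y in the ranking lattice: intersect compartments, keep induced order."""
    if not consistent(x, y):
        return None                      # inconsistent: the infimum is the added minimum element 0^
    cells = [(i, j, x[i] & y[j]) for i in range(len(x)) for j in range(len(y)) if x[i] & y[j]]
    return [cell for _, _, cell in sorted(cells, key=lambda c: (c[0], c[1]))]

x = [{3}, {1, 2, 5}, {4}]    # 3|1,2,5|4
y = [{3}, {2}, {1, 4, 5}]    # 3|2|1,4,5
print(infimum(x, y))         # [{3}, {2}, {1, 5}, {4}], i.e. 3|2|1,5|4 as in the proof
```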
5 Non-Parametric Models on the Ranking Lattice

The censored data approach to partial ranking described by Equation (5) may be generalized to arbitrary probability models p on S_n. We extend a probability model p on S_n to W̃_n by defining it to be zero on W̃_n \ S_n and consider the partial ranking model

g(S_γπ) = Σ_{σ⪯S_γπ} p(σ) = Σ_{σ∈S_n : σ∈S_γπ} p(σ),  S_γπ ∈ W̃_n.  (6)

Figure 2: Censored data in the Hasse diagram of W̃_n corresponding to two partial rankings with the same (left) and different (right) number of vertical bars. The two big triangles correspond to the Hasse diagram of Figure 1 (right) with permutations occupying the bottom level.

The function g, when restricted to partial rankings of the same type G_γ = {S_γπ : π ∈ S_n}, constitutes a distribution over G_γ. The relationship between p and g may be more elegantly described through Möbius inversion on lattices: for the functions p, g : W̃_n → [0, 1] defined above we have

g(τ) = Σ_{π⪯τ} p(π)  iff  p(τ) = Σ_{π⪯τ} g(π)μ(π, τ),  ∀τ ∈ W̃_n  (7)

where μ : W̃_n × W̃_n → ℝ is the Möbius function of the lattice W̃_n [7].

For large n, modeling partial, rather than full, rankings is a computational necessity. It is tempting to construct a statistical model on partial rankings directly, without reference to an underlying permutation model, e.g., [1, 6]. However, doing so may lead to contradicting probabilities at the permutation level, i.e., there exists no distribution p on S_n consistent with the specified values of g at g(S_λσ) and g(S_γπ), S_λσ ≠ S_γπ. Figure 2 illustrates this problem for partial rankings with the same (left) and different (right) number of vertical bars. Verifying that no contradictions exist involves solving a lengthy and complicated set of equations. The alternative presented in this paper, of starting with a permutation model p : S_n → ℝ and extending it to g via Möbius inversion, is a simple way of avoiding such lack of coherence.
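As a brute-force illustration of (6) and of the coherence it guarantees (feasible only for small n; the code and its function names are ours, not the paper's), the following Python sketch extends a Mallows model on S_4 to partial rankings of type γ = (1, 1, 2) and verifies that the restriction of g to that type is a probability distribution over the cosets.

```python
import math
from itertools import permutations

def kendall_tau(pi, sigma):
    """Number of discordant item pairs; pi and sigma are tuples of item ranks."""
    n = len(pi)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0)

def mallows(n, c, kappa):
    """Mallows model p_kappa over S_n, returned as a dict permutation -> probability."""
    w = {pi: math.exp(-c * kendall_tau(pi, kappa)) for pi in permutations(range(1, n + 1))}
    z = sum(w.values())
    return {pi: v / z for pi, v in w.items()}

def coset_of(pi, gamma):
    """Identify the partial ranking of type gamma containing pi: the compartment of each item's rank."""
    bounds = [sum(gamma[:k + 1]) for k in range(len(gamma))]
    return tuple(next(k for k, b in enumerate(bounds) if r <= b) for r in pi)

n, c, gamma = 4, 1.0, (1, 1, 2)
p = mallows(n, c, kappa=tuple(range(1, n + 1)))

# Equation (6): g of a partial ranking is the sum of p over the permutations in its coset.
g = {}
for sigma, prob in p.items():
    key = coset_of(sigma, gamma)
    g[key] = g.get(key, 0.0) + prob

print(len(g), abs(sum(g.values()) - 1.0) < 1e-12)    # 12 cosets of type (1,1,2); g sums to 1 over them
```

Since the cosets of a fixed type partition S_n, the same coherence check holds for any underlying permutation model p, not only the Mallows model used here.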
Identifying partially ranked training data D = {S_{γ_i}π_i : i = 1, . . . , m} as censored data, a non-parametric Parzen window estimator based on the Mallows kernel is

p̂(π) = (1/(m ψ(c))) Σ_{i=1}^{m} (1/|S_{γ_i}|) Σ_{σ∈S_{γ_i}π_i} exp(−c d(π, σ)),  π ∈ S_n  (8)

where we used the fact that |S_{γ_i}π_i| = |S_{γ_i}e| = |S_{γ_i}|. Its censored data extension is

ĝ(S_λπ) = (1/(m ψ(c))) Σ_{i=1}^{m} (1/|S_{γ_i}|) Σ_{σ∈S_λπ} Σ_{τ∈S_{γ_i}π_i} exp(−c d(σ, τ)),  S_λπ ∈ W̃_n.  (9)

Model (8) and its partial ranking extension (9) satisfy requirement 3 in Section 1 since D contains partial rankings of possibly different types. Similarly, by the censored data interpretation of partial rankings, they satisfy requirement 2. Requirement 4 holds as m, c → ∞ by standard properties of the Parzen window estimator. Requirement 5 holds since ĝ in (9) restricted to G_λ = {S_λπ : π ∈ S_n} becomes a consistent model on a much smaller probability space. Requirement 1 is demonstrated in the next section by deriving an efficient computation of (9). In the case of a very large number of items, reverting to partial rankings of type λ is a crucial element. The coherence between p̂, ĝ and the nature of D are important factors in modeling partially ranked data. In the next section we show that even for n → ∞ (as is nearly the case for web search), the required computation is feasible as it depends only on the complexity of the compositions characterizing the data D and the partial rankings on which ĝ is evaluated.

6 Efficient Computation and Inversion Combinatorics

In order to apply the estimators p̂, ĝ in practice, it is crucial that the inner summations in Equations (8)-(9) are efficiently computed. By considering how the pairs constituting i(π) decompose with respect to certain cosets we can obtain efficient computational schemes for (8), (9).

Proposition 4. The following decomposition of i(π) with respect to a composition γ = (γ_1, . . . , γ_r) holds for all π ∈ S_n:

i(π) = Σ_{k=1}^{r} a_k^γ(π) + Σ_{k=1}^{r} Σ_{l=k+1}^{r} b_{kl}^γ(π)  (10)

where

a_k^γ(π) = |{(s, t) : s < t,  Σ_{j=1}^{k−1} γ_j < π(t) < π(s) ≤ Σ_{j=1}^{k} γ_j}|  (11)

b_{kl}^γ(π) = |{(s, t) : s < t,  Σ_{j=1}^{k−1} γ_j < π(t) ≤ Σ_{j=1}^{k} γ_j,  Σ_{j=1}^{l−1} γ_j < π(s) ≤ Σ_{j=1}^{l} γ_j}|.  (12)

Proof. First note that by the right invariance of Kendall's tau, d(σ, π) = i(σπ^{−1}), we have i(π) = i(π^{−1}) and we may count the inversion pairs of π^{−1} instead of those of π. The set appearing in the definition of a_k^γ(π) contains all label pairs (s, t) that are inversions of π^{−1} and that appear in the k-th compartment of the decomposition γ. The set appearing in the definition of b_{kl}^γ(π) contains label pairs (s, t) that are inversions of π^{−1} and for which t and s appear in the k-th and l-th compartments of γ, respectively. Since any inversion pair appears in either one or two compartments, the decomposition holds.

Decomposition (10) is actually a family of decompositions as it holds for all possible compositions γ. For example, i(π) = 4 for π = 4|1|3|2 and γ = (2, 2), so that S_γπ = 1, 4|2, 3, with inversions (4, 1), (4, 3), (4, 2), (3, 2) for π^{−1}. The first compartment 1, 4 contains the inversion (4, 1) and so a_1^γ(π) = 1. The second compartment 2, 3 contains the inversion (3, 2) and so a_2^γ(π) = 1. The cross-compartment inversions are (4, 3), (4, 2), making b_{12}^γ(π) = 2. The significance of (10) is that as we sum over all representatives of the coset S_γπ the cross-compartmental inversions b_{kl}^γ(π) remain constant while the within-compartmental inversions a_k^γ(π) vary over all possible combinations. This leads to powerful extensions of Proposition 2 which in turn lead to efficient computation of (8), (9).
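Decomposition (10) can be checked by explicitly counting within- and cross-compartment discordant pairs. The Python sketch below (ours; a permutation is encoded as the tuple of item ranks) reproduces the worked example above.

```python
def compartment(rank, gamma):
    """Index of the gamma-compartment that a rank falls into (0-based)."""
    bound = 0
    for k, g in enumerate(gamma):
        bound += g
        if rank <= bound:
            return k
    raise ValueError("rank out of range")

def decompose_inversions(pi, gamma):
    """Return (a, b): a[k] within-compartment and b[(k, l)] cross-compartment discordant pairs.
    pi[i-1] is the rank of item i; a discordant pair is s < t with pi(s) > pi(t)."""
    n, r = len(pi), len(gamma)
    a = [0] * r
    b = {(k, l): 0 for k in range(r) for l in range(k + 1, r)}
    for s in range(1, n + 1):
        for t in range(s + 1, n + 1):
            if pi[s - 1] > pi[t - 1]:                       # item pair (s, t) is discordant
                ks, kt = compartment(pi[s - 1], gamma), compartment(pi[t - 1], gamma)
                if ks == kt:
                    a[ks] += 1
                else:
                    b[(min(ks, kt), max(ks, kt))] += 1
    return a, b

# pi = 4|1|3|2 in vertical bar notation, i.e. item ranks pi(1)=2, pi(2)=4, pi(3)=3, pi(4)=1
pi, gamma = (2, 4, 3, 1), (2, 2)
a, b = decompose_inversions(pi, gamma)
print(a, b, sum(a) + sum(b.values()))    # [1, 1] {(0, 1): 2} 4  -- matches i(pi) = 4
```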
Proposition 5. For π ∈ S_n, q > 0, and a composition γ = (γ_1, . . . , γ_r) we have

Σ_{σ∈S_γπ} q^{i(σ)} = q^{Σ_{k=1}^{r}Σ_{l=k+1}^{r} b_{kl}^γ(π)} Π_{s=1}^{r} Π_{j=1}^{γ_s−1} Σ_{k=0}^{j} q^k.  (13)

Proof.

Σ_{σ∈S_γπ} q^{i(σ)} = Σ_{σ∈S_γπ} q^{Σ_{k=1}^{r} a_k^γ(σ) + Σ_{k=1}^{r}Σ_{l=k+1}^{r} b_{kl}^γ(σ)} = q^{Σ_{k=1}^{r}Σ_{l=k+1}^{r} b_{kl}^γ(π)} Σ_{σ∈S_γπ} q^{Σ_{k=1}^{r} a_k^γ(σ)}
= q^{Σ_{k=1}^{r}Σ_{l=k+1}^{r} b_{kl}^γ(π)} Π_{s=1}^{r} Σ_{σ∈S_{γ_s}} q^{i(σ)} = q^{Σ_{k=1}^{r}Σ_{l=k+1}^{r} b_{kl}^γ(π)} Π_{s=1}^{r} Π_{j=1}^{γ_s−1} Σ_{k=0}^{j} q^k.

Above, we used two ideas: (i) discordant pairs between two different compartments of the coset S_γπ are invariant under change of the coset representative, and (ii) the number of discordant pairs within a compartment varies over all possible choices, enabling the replacement of the summation by a sum over a lower order symmetric group. An important feature of (13) is that only the first and relatively simple term q^{Σ_{k=1}^{r}Σ_{l=k+1}^{r} b_{kl}^γ(π)} depends on π. The remaining terms depend only on the type γ of the partial ranking and thus may be pre-computed and tabulated for efficient computation. The following two corollaries generalize the well known Proposition 2 to arbitrary cosets, enabling efficient computation of (8), (9).

Corollary 1. For π ∈ S_n,

Σ_{σ∈πS_γ} q^{i(σ)} = q^{Σ_{k=1}^{r}Σ_{l=k+1}^{r} b_{kl}^γ(π^{−1})} Π_{s=1}^{r} Π_{j=1}^{γ_s−1} Σ_{k=0}^{j} q^k.

Proof. Using group theory, it can be shown that the set equality (πS_γ)^{−1} = S_γ(π^{−1}) holds. As a result, Σ_{σ∈πS_γ} q^{i(σ)} = Σ_{σ∈S_γπ^{−1}} q^{i(σ)}. Proposition 5 completes the proof.

Corollary 2. The partial ranking extension g_κ corresponding to a Mallows model p_κ is

p_κ(S_γπ) = e^{−c Σ_{k=1}^{r}Σ_{l=k+1}^{r} b_{kl}^γ(πκ^{−1})} (Π_{s=1}^{r} Π_{j=1}^{γ_s−1} Σ_{k=0}^{j} e^{−kc}) / (Π_{j=1}^{n−1} Σ_{k=0}^{j} e^{−kc}).

Proof. Using Corollary 1 we have

g_κ(S_γπ) = Σ_{σ∈S_γπ} p_κ(σ) = Σ_{σ∈S_γπ} exp(−c d(σ, κ)) / Σ_{σ∈S_n} exp(−c d(σ, κ)) = Σ_{σ∈S_γπ} exp(−c i(κσ^{−1})) / Π_{j=1}^{n−1} Σ_{k=0}^{j} e^{−kc}
= Σ_{σ∈κπ^{−1}S_γ} (exp(−c))^{i(σ)} / Π_{j=1}^{n−1} Σ_{k=0}^{j} e^{−kc} = e^{−c Σ_{k=1}^{r}Σ_{l=k+1}^{r} b_{kl}^γ(πκ^{−1})} Π_{s=1}^{r} Π_{j=1}^{γ_s−1} Σ_{k=0}^{j} e^{−kc} / Π_{j=1}^{n−1} Σ_{k=0}^{j} e^{−kc}.

Despite its daunting appearance, the expression in Corollary 2 can be computed relatively easily. The fraction does not depend on π or κ and in fact may be considered a normalization constant that may be easily pre-computed and tabulated. The remaining term is relatively simple and depends on the location parameter κ and the coset representative π. Corollary 2 and Proposition 6 below (whose proof is omitted due to lack of space) provide efficient computation for the estimators (8), (9). The complexity of computing (14) and (8), (9) for some popular partial ranking types appears in Table 1.

Proposition 6. For π_1, π_2 ∈ S_n and compositions λ and γ = (γ_1, . . . , γ_r),

Σ_{σ∈S_λπ_1} Σ_{τ∈S_γπ_2} e^{−c d(σ,τ)} = (Π_{s=1}^{r} Π_{j=1}^{γ_s−1} Σ_{k=0}^{j} e^{−kc}) Σ_{σ∈π_2π_1^{−1}S_λ} e^{−c Σ_{k=1}^{r}Σ_{l=k+1}^{r} b_{kl}^γ(σ)}.  (14)

Table 1: Computational complexity of computing Equation (9) for each training example. Note the absence of any dependence on n.

                      (1, n−1)    (1, · · · , 1, n−k)    (k, n−k)
(1, n−1)              O(1)        O(k)                   O(k)
(1, · · · , 1, n−t)   O(1)        O(k + t)               O(k + t)
(t, n−t)              O(1)        O(k + t)               O(k + t)
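For small n, the closed form (13), and hence Corollary 2, can be sanity-checked by brute force. The following Python sketch (ours; it assumes the same rank-tuple encoding as the previous sketches) compares the right-hand side of (13) with a direct sum over the coset S_γπ.

```python
import math
from itertools import permutations

def inversions(pi):
    """i(pi) for pi given as a tuple of item ranks: pi[i-1] = rank of item i."""
    n = len(pi)
    return sum(1 for s in range(n) for t in range(s + 1, n) if pi[s] > pi[t])

def compartments(pi, gamma):
    """Map each item to the gamma-compartment its rank falls into."""
    bounds = [sum(gamma[:k + 1]) for k in range(len(gamma))]
    return tuple(next(k for k, b in enumerate(bounds) if r <= b) for r in pi)

def b_cross(pi, gamma):
    """Sum over k < l of b_kl: discordant item pairs whose ranks lie in different compartments."""
    comp = compartments(pi, gamma)
    n = len(pi)
    return sum(1 for s in range(n) for t in range(s + 1, n)
               if pi[s] > pi[t] and comp[s] != comp[t])

def closed_form(pi, gamma, q):
    """Right-hand side of (13)."""
    prod = math.prod(sum(q ** k for k in range(j + 1))
                     for g in gamma for j in range(1, g))
    return (q ** b_cross(pi, gamma)) * prod

n, q, gamma = 5, math.exp(-0.8), (2, 3)
pi = (2, 5, 1, 4, 3)                          # an arbitrary permutation of 5 items
coset = [s for s in permutations(range(1, n + 1))
         if compartments(s, gamma) == compartments(pi, gamma)]   # the coset S_gamma pi
brute = sum(q ** inversions(s) for s in coset)
print(abs(brute - closed_form(pi, gamma, q)) < 1e-10)            # True
```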
7 Applications

Figure 3 (top left) compares the average test log-likelihood of the Mallows model and the non-parametric model with different values of c as a function of training size, averaged over 10 cross validations. We use fully ranked APA election data (rankings are ballots for five APA presidential candidates), and during each iteration 30% of the examples are randomly selected for testing. The parameters of the Mallows model are estimated by maximum likelihood. The figure illustrates the advantage of using a non-parametric estimator over the parametric Mallows model given enough training data. Also note how, when c increases, the non-parametric model approaches the empirical histogram, thus performing worse for small datasets and better for large datasets.

To visualize the advantage of the non-parametric model over the Mallows model we display in Figure 3 (bottom row) their estimated probabilities by scaling the vertices of the permutation polytope proportionally. The displayed polytope has vertices corresponding to rankings of 4 items and edges corresponding to adjacent transpositions (Kendall's tau distance is the shortest path between two vertices). In this case the four ranked items are movies no. 357, 1356, 440, and 25 from the EachMovie dataset, which contains rankings of 1628 movies. Note how the probabilities assigned by the Mallows model (left) form a unimodal function centered at 2|1|3|4, while the non-parametric estimator (right) discovers the true modes 2|3|1|4 and 4|1|2|3 that were undetected by the Mallows model.

Figure 3 (top right) demonstrates modeling partial rankings for a much larger n. We used 10043 rankings from the Jester dataset, which contains users' rankings of n = 100 jokes. We kept the partial ranking type of the testing data fixed at (5, n − 5) and experimented with different censoring of the training data. The figure illustrates the slower consistency rate for fully ranked training data and the statistical benefit of censoring full rankings in the training data. This striking statistical advantage demonstrates the achievement of property 5 in Section 1 and is independent of the computational advantage obtained from censoring the training data.

Figure 3: Top row: Average test log-likelihood as a function of the training size: Mallows model vs. non-parametric model for APA election data (left) and non-parametric model with different partial ranking types for Jester data (right). Bottom row: Visualizing estimated probabilities for EachMovie data by permutation polytopes: Mallows model (left) and non-parametric model for c = 2 (right).

8 Discussion

In this paper, we demonstrate for the first time a non-trivial effective modeling framework satisfying properties 1-5 in Section 1. The key component is our ability to efficiently compute (14) for simple partial ranking types and large n. Table 1 indicates that the resulting complexity scales with the complexity of the compositions but is independent of n, which is critical for modeling practical situations of k ≪ n partial rankings. Experiments show the statistical advantage of the non-parametric partial ranking model in addition to its computational feasibility.

References

[1] D. E. Critchlow. Metric Methods for Analyzing Partially Ranked Data. Lecture Notes in Statistics, volume 34, Springer, 1985.
[2] M. A. Fligner and J. S. Verducci. Distance based ranking models. Journal of the Royal Statistical Society B, 43:359–369, 1986.
[3] M. G. Kendall. A new measure of rank correlation. Biometrika, 30, 1938.
[4] G. Lebanon and J. Lafferty. Conditional models on the ranking poset. In Advances in Neural Information Processing Systems, 15, 2003.
[5] C. L. Mallows. Non-null ranking models. Biometrika, 44:114–130, 1957.
[6] J. I. Marden.
Analyzing and Modeling Rank Data. CRC Press, 1996.
[7] R. P. Stanley. Enumerative Combinatorics, volume 1. Cambridge University Press, 2000.