More Efficient Parallel Computation of PageRank

John R. Wicks (jwicks@cs.brown.edu) and Amy Greenwald (amy@cs.brown.edu)
Department of Computer Science, Brown University, Box 1910, Providence, RI 02912

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Information filtering
General Terms: Algorithms, Experimentation, Performance, Theory.
Keywords: Web graph, Power iteration, PageRank.

1. INTRODUCTION

The first-order linear recurrence $w_{n+1} = A w_n + b$ occurs in various settings. When $\|A\|_1 < 1$, it is well known that $w_n = A^n w_0 + \sum_{j=0}^{n-1} A^j b$, so that $w_n \to \sum_{j=0}^{\infty} A^j b$, independent of $w_0$. This recurrence arises naturally when computing PageRank [4] via power iteration. Specifically, given a web-graph matrix $M \ge 0$ with "normalized" columns (i.e., each column sums to 1), a (normalized) personalization vector $v \ge 0$, and a teleportation probability $\epsilon$, define the perturbed Markov matrix $M_{v,\epsilon} = (1-\epsilon)M + \epsilon v J$, where $J$ is a row of 1's. Power iteration takes an arbitrary, normalized initial vector $v_0 \ge 0$, computes $r_{n+1} = M_{v,\epsilon} r_n$ with $r_0 = v_0$, and terminates when $\|r_n - r_{n-1}\|_1 < \delta$, for some tolerance $\delta > 0$. Since $M_{v,\epsilon}$ and $v_0$ are normalized, so is $r_n$ for all $n$. This implies that $r_{n+1} = (1-\epsilon)M r_n + \epsilon v$, which is just the linear recurrence above with $A = (1-\epsilon)M$ and $b = \epsilon v$. Therefore, $r_n$ converges to $r_\infty = \epsilon \sum_{j=0}^{\infty} [(1-\epsilon)M]^j v \equiv \epsilon M_* v$ for any $v_0$. In particular, $r_\infty$ is the unique positive, normalized eigenvector of $M_{v,\epsilon}$ with eigenvalue 1, which is the usual definition of PageRank.

If we instead take the unnormalized initial vector $v_0 = \epsilon v$, the partial sums $\bar{r}_n = \epsilon \sum_{j=0}^{n} [(1-\epsilon)M]^j v$ give another sequence converging to $r_\infty$. This sequence can be computed by the pair of recurrences $t_{n+1} = (1-\epsilon)M t_n$ and $\bar{r}_{n+1} = \bar{r}_n + t_{n+1}$, with $t_0 = \bar{r}_0 = \epsilon v$. Since $M, v \ge 0$, the termination condition becomes simply $\delta > \|\bar{r}_n - \bar{r}_{n-1}\|_1 = \|t_n\|_1 = \epsilon J [(1-\epsilon)M]^n v = \epsilon (1-\epsilon)^n$. We refer to this modified algorithm as GeoRank. The computationally intensive step in both GeoRank and power iteration is the matrix-vector multiplication, $Aw$, of the recurrence.

Kamvar et al. [2] observed that $M$ is sparse, and that when pages are grouped by top-level domain (TLD) name, the matrix is almost block diagonal, where the blocks correspond to TLDs. Kohlschütter et al. [3] represent the web graph as a block-structured matrix with relatively few large blocks, obtained by merging groups of TLD blocks together. They exploit this block structure to distribute the computationally intensive multiplication in each step of power iteration among a small number of processors. To illustrate, suppose for simplicity that $w$ is partitioned into three segments, $w_j$, and $A$ is partitioned into 9 corresponding blocks, $A_{i,j}$, $i, j = 0, \ldots, 2$. Three processors may then compute $\hat{w} = A w$ as follows. Processor $j$ stores $A_{i,j}$ and $w_j$, computes $\hat{w}_{i,j} = A_{i,j} w_j$ for $i = 0, \ldots, 2$, sends $\hat{w}_{i,j}$, $i \neq j$, to processor $i$, and accumulates the results, $\hat{w}_j = \hat{w}_{j,j} + \sum_{i \neq j} \hat{w}_{j,i}$. Kohlschütter et al. combine this technique with power iteration to obtain a "parallel" computation of PageRank. Their key contribution was to observe that, since the off-diagonal blocks are sparse, the segments transmitted among the processors are sparse vectors.
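To make the GeoRank recurrence above concrete, the following is a minimal single-machine sketch (not the authors' implementation). It assumes $M$ is a column-normalized SciPy sparse matrix; the function name georank and the default values of eps and delta are illustrative.

```python
import numpy as np
import scipy.sparse as sp

def georank(M, v, eps=0.15, delta=1e-8, max_iter=1000):
    """Sketch of the GeoRank partial-sum iteration.

    M     : column-normalized sparse web-graph matrix (columns sum to 1)
    v     : normalized personalization vector (entries sum to 1)
    eps   : teleportation probability
    delta : termination tolerance on ||t_n||_1
    """
    t = eps * v.copy()              # t_0 = eps * v
    r = t.copy()                    # r_0 = eps * v
    for _ in range(max_iter):
        if t.sum() < delta:         # t_n >= 0, so its sum is its 1-norm
            break
        t = (1.0 - eps) * (M @ t)   # t_{n+1} = (1 - eps) M t_n
        r = r + t                   # partial sum r_{n+1} = r_n + t_{n+1}
    return r

# Toy usage: a 3-page graph whose columns sum to 1.
M = sp.csc_matrix(np.array([[0.0, 0.5, 1.0],
                            [0.5, 0.0, 0.0],
                            [0.5, 0.5, 0.0]]))
v = np.ones(3) / 3.0
print(georank(M, v))
```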
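Likewise, the three-processor block multiplication can be sketched as a toy simulation: the compute/send/accumulate pattern is mimicked with dense NumPy blocks and ordinary Python lists in place of real message passing. The name block_multiply and the random test data are illustrative; an actual implementation would use sparse blocks and interprocessor communication.

```python
import numpy as np

def block_multiply(A_blocks, w_segments):
    """Simulate the block-distributed product w_hat = A w.

    A_blocks[i][j] is the block A_{i,j}; 'processor' j holds block column j
    and the segment w_j, computes w_hat_{i,j} = A_{i,j} w_j, 'sends' the
    off-diagonal results to processor i, and processor i accumulates
    w_hat_i = sum_j w_hat_{i,j}.
    """
    k = len(w_segments)
    # Local computation on processor j: one partial product per destination i.
    partials = [[A_blocks[i][j] @ w_segments[j] for i in range(k)]
                for j in range(k)]
    # Accumulation on processor i (off-diagonal partials arrive over the network).
    return [sum(partials[j][i] for j in range(k)) for i in range(k)]

# Toy usage with k = 3 segments.
rng = np.random.default_rng(0)
segments = [rng.random(2) for _ in range(3)]
blocks = [[rng.random((2, 2)) for _ in range(3)] for _ in range(3)]
w_hat = block_multiply(blocks, segments)
# Check against the unblocked product.
assert np.allclose(np.concatenate(w_hat),
                   np.block(blocks) @ np.concatenate(segments))
```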
Since they are computationally equivalent, differing only in their initial conditions, we use the corresponding distributed version of GeoRank as a proxy for Kohlschütter et al.'s algorithm in our experiments. As we will see, while these algorithms are distributed, they are not truly parallel, in that they do not scale particularly well as the number of processors increases. We present a modified algorithm, FastRank, which scales more efficiently.

2. THE FASTRANK ALGORITHM

Assume that $M$ is partitioned into blocks, as described above, where each partition corresponds to a union of TLDs. Define $M_0$ to be the block-diagonal matrix consisting of the diagonal blocks of $M$, and let $M_1 = M - M_0$. That is, $M_0$ consists (primarily) of links within any given TLD (intralinks), while $M_1$ consists entirely of links between TLDs (interlinks). Multiplication by $M = M_0 + M_1$ is effectively multiplication by $M_0$ plus multiplication by $M_1$. The former can be performed in parallel, since $M_0$ is block-diagonal. While the latter can be distributed using the technique described above, this computation is not truly parallel. In particular, runtime does not decrease as 1/(# of processors): although the time to perform each block multiplication decreases, the amount of data sent and received by each processor actually increases! Hence, we arrive at the main idea of this poster: by reducing the number of $M_1$ multiplications relative to $M_0$ multiplications, we can increase the amount of computation done in parallel, thus obtaining a more efficient algorithm to compute the PageRank vector, which we call FastRank.

Expanding powers of $[(1-\epsilon)M]^j = [(1-\epsilon)(M_0 + M_1)]^j$, we may express $r_\infty = \epsilon \sum_{j=0}^{\infty} [(1-\epsilon)M]^j v = \epsilon M_* v$ as the product of $\epsilon$, $v$, and a sum over words in $\bar{M}_i \equiv (1-\epsilon)M_i$. In other words, $r_\infty = \epsilon \sum_{j=0}^{\infty} \sum_{d \in \{0,1\}^j} \prod_{i=0}^{j-1} \bar{M}_{d_i}\, v$. Since $M_0$ dominates $M_1$, the terms with fewer $\bar{M}_1$ factors dominate the sum. Now we group terms according to the number of $\bar{M}_1$ factors, using the fact that $\frac{1}{\epsilon}\hat{M}_0 \equiv \sum_{j=0}^{\infty} \bar{M}_0^j$ is the sum over arbitrary-length words in $\bar{M}_0$ only:

  $r_\infty = \epsilon\Big(\tfrac{1}{\epsilon}\hat{M}_0 + \tfrac{1}{\epsilon}\hat{M}_0\,\bar{M}_1\,\tfrac{1}{\epsilon}\hat{M}_0 + \cdots\Big)v = \hat{M}_0\Big[I + \tfrac{1-\epsilon}{\epsilon}M_1\hat{M}_0 + \cdots\Big]v = \hat{M}_0 \sum_{j=0}^{\infty}\Big[\tfrac{1-\epsilon}{\epsilon}M_1\hat{M}_0\Big]^j v.    (1)

Thus, $s_0 = v$, $t_n = \hat{M}_0 s_n$, $s_{n+1} = \frac{1-\epsilon}{\epsilon} M_1 t_n$, and $\hat{r}_n = \sum_{j \le n} t_j$ defines another sequence, $\hat{r}_n$, which converges to $r_\infty$. Since $\hat{r}_0$ is precisely Kamvar's BlockRank [2], Equation (1) illustrates nicely how BlockRank approximates PageRank.

Like GeoRank, FastRank halts when $\|t_n\|_1 < \delta$. However, we can compute $t_{n+1} = \hat{M}_0 s_{n+1} = \frac{1-\epsilon}{\epsilon} \hat{M}_0 M_1 t_n$ only to a desired tolerance. Since $\frac{1-\epsilon}{\epsilon} \hat{M}_0 M_1$ magnifies errors in $t_n$ by at most a factor of $\frac{1-\epsilon}{\epsilon}$, at each step we apply GeoRank with a tolerance of $\frac{\epsilon}{1-\epsilon}\delta$. More precisely, in our distributed implementation, we perform GeoRank in parallel to within $\frac{\dim(M_{j,j})}{\dim(M)} \frac{\epsilon}{1-\epsilon}\delta$ on the $j$th processor. Since the actual error magnification factor is most likely much smaller, this is probably an unnecessarily stringent termination condition. We have yet to obtain theoretical error estimates, but in experiments these stopping conditions achieved the desired accuracy. We expect that better error analysis will lead to more appropriate stopping conditions, fewer multiplications, and even faster performance.

Table 1: FastRank vs. GeoRank

              FastRank              GeoRank
# of Slaves   Time    M0    M1      Time    M       Ratio
12            481     4.5   37      784     25      1.63
16            339     3.0   31      659     21      1.94
20            265     2.2   29      641     20      2.42
24            217     1.8   23      601     19      2.77
28            202     1.5   24      596     19      2.94
32            181     1.3   23      604     19      3.34
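As an illustration of the iteration defined by Equation (1), here is a minimal single-machine sketch (not the authors' distributed implementation). The inner routine m0_hat sums the geometric series in $M_0$ to a given tolerance, standing in for the per-block parallel GeoRank step, and fastrank runs the outer recurrence $s_0 = v$, $t_n = \hat{M}_0 s_n$, $s_{n+1} = \frac{1-\epsilon}{\epsilon} M_1 t_n$. The function names, default parameters, and toy matrices are all illustrative.

```python
import numpy as np
import scipy.sparse as sp

def m0_hat(M0, x, eps, tol):
    """Apply hat(M0) = eps * sum_k [(1 - eps) M0]^k to x by summing the
    geometric series (the inner GeoRank step; in the distributed setting
    each diagonal block of M0 is handled by its own processor)."""
    t = eps * x
    acc = t.copy()
    while np.abs(t).sum() >= tol:
        t = (1.0 - eps) * (M0 @ t)
        acc += t
    return acc

def fastrank(M0, M1, v, eps=0.15, delta=1e-8, inner_tol=None, max_iter=100):
    """Sketch of the FastRank outer iteration:
    s_0 = v, t_n = hat(M0) s_n, s_{n+1} = ((1-eps)/eps) M1 t_n,
    r_n = sum_{j <= n} t_j, halting when ||t_n||_1 < delta."""
    if inner_tol is None:
        inner_tol = eps * delta / (1.0 - eps)   # stringent inner tolerance
    s = v.copy()
    r = np.zeros_like(v)
    for _ in range(max_iter):
        t = m0_hat(M0, s, eps, inner_tol)       # t_n = hat(M0) s_n
        r += t                                  # accumulate partial sum
        if np.abs(t).sum() < delta:
            break
        s = ((1.0 - eps) / eps) * (M1 @ t)      # s_{n+1}
    return r

# Toy usage: 4 pages in two TLD blocks of size 2; columns of M0 + M1 sum to 1.
M0 = sp.csc_matrix(np.array([[0.0, 0.8, 0.0, 0.0],
                             [0.8, 0.0, 0.0, 0.0],
                             [0.0, 0.0, 0.0, 0.9],
                             [0.0, 0.0, 0.9, 0.0]]))
M1 = sp.csc_matrix(np.array([[0.0, 0.0, 0.0, 0.1],
                             [0.0, 0.0, 0.1, 0.0],
                             [0.2, 0.0, 0.0, 0.0],
                             [0.0, 0.2, 0.0, 0.0]]))
v = np.ones(4) / 4.0
print(fastrank(M0, M1, v))
```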
3. EXPERIMENTS

To see how FastRank and GeoRank compare in practice, we used the (decompressed) version of Stanford's web graph (http://webgraph.dsi.unimi.it/), from a 2001 crawl performed as part of its WebBase project. This graph has on the order of $10^8$ nodes and $10^9$ links. We re-indexed the pages so that those within common TLDs were contiguous. We implemented both algorithms as distributed systems with one master and $k$ slaves. The (normalized) web-graph matrix and ranking vector were partitioned to respect TLDs, in the manner of the example in Section 1, so that the number of links assigned to the $j$th slave,
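The re-indexing procedure itself is not spelled out in the excerpt above. Purely as an illustration of grouping pages so that each TLD occupies a contiguous index range, here is a hypothetical sketch: tld_of and reindex_by_tld are invented names, the TLD extraction is deliberately crude, and the link-balancing assignment of TLD blocks to slaves (truncated above) is not shown.

```python
from itertools import groupby

def tld_of(host):
    """Crude TLD extraction for illustration (e.g. 'cs.brown.edu' -> 'edu')."""
    return host.rsplit(".", 1)[-1]

def reindex_by_tld(hosts):
    """Return an old->new index map that makes pages within a common TLD
    contiguous, plus the (start, end) index range of each TLD block."""
    order = sorted(range(len(hosts)), key=lambda i: (tld_of(hosts[i]), i))
    new_index = {old: new for new, old in enumerate(order)}
    blocks, start = {}, 0
    for tld, grp in groupby(order, key=lambda i: tld_of(hosts[i])):
        n = len(list(grp))
        blocks[tld] = (start, start + n)
        start += n
    return new_index, blocks

# Toy usage.
hosts = ["cs.brown.edu", "example.com", "brown.edu", "example.org", "acm.org"]
new_index, blocks = reindex_by_tld(hosts)
print(blocks)   # {'com': (0, 1), 'edu': (1, 3), 'org': (3, 5)}
```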