More Efficient Parallel Computation of PageRank

John R. Wicks (jwicks@cs.brown.edu) and Amy Greenwald (amy@cs.brown.edu)
Department of Computer Science, Brown University, Box 1910, Providence, RI 02912

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Information filtering
General Terms: Algorithms, Experimentation, Performance, Theory.
Keywords: Web graph, Power iteration, PageRank.

1. INTRODUCTION

The first-order linear recurrence $w_{n+1} = A w_n + b$ occurs in various settings. When $\|A\|_1 < 1$, it is well known that $w_n = A^n w_0 + \sum_{j=0}^{n-1} A^j b$, so that $w_n \to \sum_{j=0}^{\infty} A^j b$, independent of $w_0$. This recurrence arises naturally when computing PageRank [4] via power iteration. Specifically, given a web-graph matrix $M \ge 0$ with "normalized" columns (i.e., each column sums to 1), a (normalized) personalization vector $v \ge 0$, and a teleportation probability $\epsilon$, define the perturbed Markov matrix $M_{v,\epsilon} = (1-\epsilon)M + \epsilon v J$, where $J$ is a row of 1's. Power iteration takes an arbitrary, normalized initial vector $v_0 \ge 0$, computes $r_{n+1} = M_{v,\epsilon} r_n$ with $r_0 = v_0$, and terminates when $\|r_n - r_{n-1}\|_1 < \delta$, for some tolerance $\delta > 0$. Since $M_{v,\epsilon}$ and $v_0$ are normalized, so is $r_n$ for all $n$. This implies that $r_{n+1} = (1-\epsilon)M r_n + \epsilon v$, which is just the linear recurrence above with $A = (1-\epsilon)M$ and $b = \epsilon v$. Therefore, $r_n$ converges to $r_\infty = \epsilon \sum_{j=0}^{\infty} [(1-\epsilon)M]^j v \equiv \epsilon M_* v$ for any $v_0$. In particular, $r_\infty$ is the unique positive, normalized eigenvector of $M_{v,\epsilon}$ with eigenvalue 1, which is the usual definition of PageRank.

If we instead take the unnormalized initial vector $v_0 = \epsilon v$, the partial sums $\bar{r}_n = \epsilon \sum_{j=0}^{n} [(1-\epsilon)M]^j v$ give another sequence converging to $r_\infty$. This sequence can be computed by the pair of recurrences $t_{n+1} = (1-\epsilon)M t_n$ and $\bar{r}_{n+1} = \bar{r}_n + t_{n+1}$, with $t_0 = \bar{r}_0 = \epsilon v$. Since $M, v \ge 0$, the termination condition becomes simply $\delta > \|\bar{r}_n - \bar{r}_{n-1}\|_1 = \|t_n\|_1 = \epsilon J [(1-\epsilon)M]^n v = \epsilon (1-\epsilon)^n$. We refer to this modified algorithm as GeoRank. The computationally intensive step in both GeoRank and power iteration is the matrix-vector multiplication, $Aw$, of the recurrence.

Kamvar et al. [2] observed that $M$ is sparse, and that when pages are grouped by top-level domain (TLD) name, the matrix is almost block diagonal, where the blocks correspond to TLDs. Kohlschütter et al. [3] represent the web graph as a block-structured matrix with relatively few large blocks, obtained by merging groups of TLD blocks together. They exploit this block structure to distribute the computationally intensive multiplication in each step of power iteration among a small number of processors. To illustrate, suppose for simplicity that $w$ is partitioned into three segments, $w_j$, and $A$ is partitioned into 9 corresponding blocks, $A_{i,j}$, $i, j = 0, \ldots, 2$. Three processors may then compute $\hat{w} = A w$ as follows. Processor $j$ stores $A_{i,j}$ and $w_j$, computes $\hat{w}_{i,j} = A_{i,j} w_j$ for $i = 0, \ldots, 2$, sends $\hat{w}_{i,j}$, $i \neq j$, to processor $i$, and accumulates the results, $\hat{w}_j = \hat{w}_{j,j} + \sum_{i \neq j} \hat{w}_{j,i}$. Kohlschütter et al. combine this technique with power iteration to obtain a "parallel" computation of PageRank. Their key contribution was to observe that, since the off-diagonal blocks are sparse, the segments transmitted among the processors are sparse vectors.
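To make the GeoRank recurrence above concrete, the following is a minimal single-machine sketch (not the authors' implementation). It assumes $M$ is a column-normalized SciPy sparse matrix; the function name georank and the default values of eps and delta are illustrative.

```python
import numpy as np
import scipy.sparse as sp

def georank(M, v, eps=0.15, delta=1e-8, max_iter=1000):
    """Sketch of the GeoRank partial-sum iteration.

    M     : column-normalized sparse web-graph matrix (columns sum to 1)
    v     : normalized personalization vector (entries sum to 1)
    eps   : teleportation probability
    delta : termination tolerance on ||t_n||_1
    """
    t = eps * v.copy()              # t_0 = eps * v
    r = t.copy()                    # r_0 = eps * v
    for _ in range(max_iter):
        if t.sum() < delta:         # t_n >= 0, so its sum is its 1-norm
            break
        t = (1.0 - eps) * (M @ t)   # t_{n+1} = (1 - eps) M t_n
        r = r + t                   # partial sum r_{n+1} = r_n + t_{n+1}
    return r

# Toy usage: a 3-page graph whose columns sum to 1.
M = sp.csc_matrix(np.array([[0.0, 0.5, 1.0],
                            [0.5, 0.0, 0.0],
                            [0.5, 0.5, 0.0]]))
v = np.ones(3) / 3.0
print(georank(M, v))
```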
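Likewise, the three-processor block multiplication can be sketched as a toy simulation: the compute/send/accumulate pattern is mimicked with dense NumPy blocks and ordinary Python lists in place of real message passing. The name block_multiply and the random test data are illustrative; an actual implementation would use sparse blocks and interprocessor communication.

```python
import numpy as np

def block_multiply(A_blocks, w_segments):
    """Simulate the block-distributed product w_hat = A w.

    A_blocks[i][j] is the block A_{i,j}; 'processor' j holds block column j
    and the segment w_j, computes w_hat_{i,j} = A_{i,j} w_j, 'sends' the
    off-diagonal results to processor i, and processor i accumulates
    w_hat_i = sum_j w_hat_{i,j}.
    """
    k = len(w_segments)
    # Local computation on processor j: one partial product per destination i.
    partials = [[A_blocks[i][j] @ w_segments[j] for i in range(k)]
                for j in range(k)]
    # Accumulation on processor i (off-diagonal partials arrive over the network).
    return [sum(partials[j][i] for j in range(k)) for i in range(k)]

# Toy usage with k = 3 segments.
rng = np.random.default_rng(0)
segments = [rng.random(2) for _ in range(3)]
blocks = [[rng.random((2, 2)) for _ in range(3)] for _ in range(3)]
w_hat = block_multiply(blocks, segments)
# Check against the unblocked product.
assert np.allclose(np.concatenate(w_hat),
                   np.block(blocks) @ np.concatenate(segments))
```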
Since they are computationally equivalent, differing only in their initial conditions, we use the corresponding distributed version of GeoRank as a proxy for Kohlschütter et al.'s algorithm in our experiments. As we will see, while these algorithms are distributed, they are not truly parallel, in that they do not scale particularly well as the number of processors increases. We present a modified algorithm, FastRank, which scales more efficiently.

2. THE FASTRANK ALGORITHM

Assume that $M$ is partitioned into blocks, as described above, where each partition corresponds to a union of TLDs. Define $M_0$ to be the block-diagonal matrix consisting of the diagonal blocks of $M$, and let $M_1 = M - M_0$. That is, $M_0$ consists (primarily) of links within any given TLD (intralinks), while $M_1$ consists entirely of links between TLDs (interlinks). Multiplication by $M = M_0 + M_1$ is effectively multiplication by $M_0$ plus multiplication by $M_1$. The former can be performed in parallel, since $M_0$ is block-diagonal. While the latter can be distributed using the technique described above, this computation is not truly parallel. In particular, runtime does not decrease as 1/(# of processors): although the time to perform each block multiplication decreases, the amount of data sent and received by each processor actually increases! Hence, we arrive at the main idea of this poster: by reducing the number of $M_1$ multiplications relative to $M_0$ multiplications, we can increase the amount of computation done in parallel, thus obtaining a more efficient algorithm to compute the PageRank vector, which we call FastRank.

Expanding powers of $[(1-\epsilon)M]^j = [(1-\epsilon)(M_0 + M_1)]^j$, we may express $r_\infty = \epsilon \sum_{j=0}^{\infty} [(1-\epsilon)M]^j v = \epsilon M_* v$ as the product of $\epsilon$, $v$, and a sum over words in $\bar{M}_i \equiv (1-\epsilon)M_i$. In other words, $r_\infty = \epsilon \sum_{j=0}^{\infty} \sum_{d \in \{0,1\}^j} \prod_{i=0}^{j-1} \bar{M}_{d_i}\, v$. Since $M_0$ dominates $M_1$, the terms with fewer $\bar{M}_1$ factors dominate the sum. Now we group terms according to the number of $\bar{M}_1$ factors, using the fact that $\frac{1}{\epsilon}\hat{M}_0 \equiv \sum_{j=0}^{\infty} \bar{M}_0^j$ is the sum over arbitrary-length words in $\bar{M}_0$ only:

  $r_\infty = \epsilon\Big(\tfrac{1}{\epsilon}\hat{M}_0 + \tfrac{1}{\epsilon}\hat{M}_0\,\bar{M}_1\,\tfrac{1}{\epsilon}\hat{M}_0 + \cdots\Big)v = \hat{M}_0\Big[I + \tfrac{1-\epsilon}{\epsilon}M_1\hat{M}_0 + \cdots\Big]v = \hat{M}_0 \sum_{j=0}^{\infty}\Big[\tfrac{1-\epsilon}{\epsilon}M_1\hat{M}_0\Big]^j v.    (1)

Thus, $s_0 = v$, $t_n = \hat{M}_0 s_n$, $s_{n+1} = \frac{1-\epsilon}{\epsilon} M_1 t_n$, and $\hat{r}_n = \sum_{j \le n} t_j$ defines another sequence, $\hat{r}_n$, which converges to $r_\infty$. Since $\hat{r}_0$ is precisely Kamvar's BlockRank [2], Equation (1) illustrates nicely how BlockRank approximates PageRank.

Like GeoRank, FastRank halts when $\|t_n\|_1 < \delta$. However, we can compute $t_{n+1} = \hat{M}_0 s_{n+1} = \frac{1-\epsilon}{\epsilon} \hat{M}_0 M_1 t_n$ only to a desired tolerance. Since $\frac{1-\epsilon}{\epsilon} \hat{M}_0 M_1$ magnifies errors in $t_n$ by at most a factor of $\frac{1-\epsilon}{\epsilon}$, at each step we apply GeoRank with a tolerance of $\frac{\epsilon}{1-\epsilon}\delta$. More precisely, in our distributed implementation, we perform GeoRank in parallel to within $\frac{\dim(M_{j,j})}{\dim(M)} \frac{\epsilon}{1-\epsilon}\delta$ on the $j$th processor. Since the actual error magnification factor is most likely much smaller, this is probably an unnecessarily stringent termination condition. We have yet to obtain theoretical error estimates, but in experiments these stopping conditions achieved the desired accuracy. We expect that better error analysis will lead to more appropriate stopping conditions, fewer multiplications, and even faster performance.

Table 1: FastRank vs. GeoRank

              FastRank              GeoRank
# of Slaves   Time    M0    M1      Time    M       Ratio
12            481     4.5   37      784     25      1.63
16            339     3.0   31      659     21      1.94
20            265     2.2   29      641     20      2.42
24            217     1.8   23      601     19      2.77
28            202     1.5   24      596     19      2.94
32            181     1.3   23      604     19      3.34
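As an illustration of the iteration defined by Equation (1), here is a minimal single-machine sketch (not the authors' distributed implementation). The inner routine m0_hat sums the geometric series in $M_0$ to a given tolerance, standing in for the per-block parallel GeoRank step, and fastrank runs the outer recurrence $s_0 = v$, $t_n = \hat{M}_0 s_n$, $s_{n+1} = \frac{1-\epsilon}{\epsilon} M_1 t_n$. The function names, default parameters, and toy matrices are all illustrative.

```python
import numpy as np
import scipy.sparse as sp

def m0_hat(M0, x, eps, tol):
    """Apply hat(M0) = eps * sum_k [(1 - eps) M0]^k to x by summing the
    geometric series (the inner GeoRank step; in the distributed setting
    each diagonal block of M0 is handled by its own processor)."""
    t = eps * x
    acc = t.copy()
    while np.abs(t).sum() >= tol:
        t = (1.0 - eps) * (M0 @ t)
        acc += t
    return acc

def fastrank(M0, M1, v, eps=0.15, delta=1e-8, inner_tol=None, max_iter=100):
    """Sketch of the FastRank outer iteration:
    s_0 = v, t_n = hat(M0) s_n, s_{n+1} = ((1-eps)/eps) M1 t_n,
    r_n = sum_{j <= n} t_j, halting when ||t_n||_1 < delta."""
    if inner_tol is None:
        inner_tol = eps * delta / (1.0 - eps)   # stringent inner tolerance
    s = v.copy()
    r = np.zeros_like(v)
    for _ in range(max_iter):
        t = m0_hat(M0, s, eps, inner_tol)       # t_n = hat(M0) s_n
        r += t                                  # accumulate partial sum
        if np.abs(t).sum() < delta:
            break
        s = ((1.0 - eps) / eps) * (M1 @ t)      # s_{n+1}
    return r

# Toy usage: 4 pages in two TLD blocks of size 2; columns of M0 + M1 sum to 1.
M0 = sp.csc_matrix(np.array([[0.0, 0.8, 0.0, 0.0],
                             [0.8, 0.0, 0.0, 0.0],
                             [0.0, 0.0, 0.0, 0.9],
                             [0.0, 0.0, 0.9, 0.0]]))
M1 = sp.csc_matrix(np.array([[0.0, 0.0, 0.0, 0.1],
                             [0.0, 0.0, 0.1, 0.0],
                             [0.2, 0.0, 0.0, 0.0],
                             [0.0, 0.2, 0.0, 0.0]]))
v = np.ones(4) / 4.0
print(fastrank(M0, M1, v))
```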
3. EXPERIMENTS

To see how FastRank and GeoRank compare in practice, we used the (decompressed) version of Stanford's web graph (http://webgraph.dsi.unimi.it/), from a 2001 crawl performed as part of its WebBase project. This graph has on the order of $10^8$ nodes and $10^9$ links. We re-indexed the pages so that those within common TLDs were contiguous. We implemented both algorithms as distributed systems with one master and $k$ slaves. The (normalized) web-graph matrix and ranking vector were partitioned to respect TLDs, in the manner of the example in Section 1, so that the number of links assigned to the $j$th slave,
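The re-indexing procedure itself is not spelled out in the excerpt above. Purely as an illustration of grouping pages so that each TLD occupies a contiguous index range, here is a hypothetical sketch: tld_of and reindex_by_tld are invented names, the TLD extraction is deliberately crude, and the link-balancing assignment of TLD blocks to slaves (truncated above) is not shown.

```python
from itertools import groupby

def tld_of(host):
    """Crude TLD extraction for illustration (e.g. 'cs.brown.edu' -> 'edu')."""
    return host.rsplit(".", 1)[-1]

def reindex_by_tld(hosts):
    """Return an old->new index map that makes pages within a common TLD
    contiguous, plus the (start, end) index range of each TLD block."""
    order = sorted(range(len(hosts)), key=lambda i: (tld_of(hosts[i]), i))
    new_index = {old: new for new, old in enumerate(order)}
    blocks, start = {}, 0
    for tld, grp in groupby(order, key=lambda i: tld_of(hosts[i])):
        n = len(list(grp))
        blocks[tld] = (start, start + n)
        start += n
    return new_index, blocks

# Toy usage.
hosts = ["cs.brown.edu", "example.com", "brown.edu", "example.org", "acm.org"]
new_index, blocks = reindex_by_tld(hosts)
print(blocks)   # {'com': (0, 1), 'edu': (1, 3), 'org': (3, 5)}
```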