~ Approximate Distance Oracles for Unweighted Graphs in O(n2 ) time Surender Baswana Abstract Let G(V , E ) be an undirected weighted graph with |V | = n, |E | = m. Recently Thorup and Zwick introduced a remarkable data-structure that stores allpairs approximate distance information implicitly in o(n2 ) space, and yet answers any approximate distance query in constant time. They named this data-structure approximate distance oracle because of this feature. Given an integer k > 1, a (2k - 1)-approximate distance oracle requires O(k n1+1/k ) space and answers a (2k - 1)approximate distance query in O(k ) time. Thorup and Zwick showed that a (2k - 1)-approximate distance oracle can be computed in O(k mn1/k ) time, and posed the following question : Can (2k - 1)-approximate ~ distance oracle be computed in O(n2 ) time ? In this paper, we answer their question in affirmative for unweighted graphs. We present an algorithm that computes (2k - 1)-approximate distance oracle for ~ a given unweighted graph in O(n2 ) time. One of the new ideas used in the improved algorithm also leads to the first linear time algorithm for computing an optimal size (2, 1)-spanner of an unweighted graph. 1 Intro duction Computing all-pairs shortest paths (APSP) in a graph is one of the most fundamental algorithmic graph problem. ~ There exist classical algorithms that require O(mn) time for solving this problem, where m and n are respectively the number of edges and vertices in the given graph. There also exist algorithms based on fast matrix multiplication that achieve sub-cubic time. In particular, using the fastest known matrix multiplication algorithms [3]), the best bound for computing all-pairs shortest paths is O(n2.575 ) [11]. However, there is still no combinatorial algorithm that could achieve O(n3- ) running time for APSP problem. In recent past, many simple combinatorial algorithms have been designed that compute all-pairs approximate shortest paths (APASP) for undirected Dept. of Comp. Sc. and Engg., I.I.T. Delhi, Hauz Khas, New Delhi. Work was supported in part by a fellowship from Infosys Technologies Ltd., Bangalore. Email : sbaswana@cse.iitd.ernet.in Dept. of Comp. Sc. and Engg., I.I.T. Delhi, Hauz Khas, New Delhi. Email : ssen@cse.iitd.ernet.in Sandeep Sen graphs. These algorithms achieve significant improvement in the running time compared to those designed for APSP, but the distance reported has some additive or/and multiplicative error. An algorithm is said to compute all pairs -approximate shortest paths with additive error , if for each pair of vertices u, v V , the distance reported is bounded by (u, v ) + , where (u, v ) denotes the actual distance between u and v . For unweighted graphs, Dor et al. [4] developed an algo1 1 ~ rithm that requires O(k n2- k m k ) time to compute allpairs shortest paths with additive error 2(k - 1). They also showed that all-pairs 3-approximate shortest paths ~ can be computed in O(n2 ) time. Cohen and Zwick [2] later extended this result to weighted graphs. The output of all these algorithms is an n × n matrix that stores pairwise approximate distance for all the vertices explicitly to answer each distance query in constant time. An equally important measure of efficiency of an algorithm is its space requirement. In the context of APASP problem, therefore, the following question arises : Is there an implicit way of storing al l-pairs approximate distance information that takes sub-quadratic space and stil l answers any query in O(1) time ? Thorup and Zwick [9] gave a novel algorithm that achieves simultaneous improvement in running time (sub-cubic) as well as space (sub-quadratic), and still answers any approximate distance query in constant time. For any k 1, their algorithm runs in O(k mn1/k ) time to output a data-structure of size O(k n1+1/k ) that would answer any (2k - 1)-approximate distance query in O(k ) time. They named the data-structure approximate distance oracle. The space requirements of these approximate distance oracles are optimal for k = 1, 2, 3, 5, and are conjectured to be optimal for any value of k . Therefore, there are two efficient algorithms for computing all-pairs 3-approximate shortest paths. The ~ first algorithm [2] runs in O(n2 ) time and outputs a matrix of size (n2 ) that stores pairwise 3-approximate distances. The second algorithm [9] runs in O(m n) time and computes a 3-approximate distance oracle of size O(n3/2 ). Thorup and Zwick posed the following question : Can a 3-approximate distance oracle of size ~ O(n3/2 ) be computed in O(n2 ) time? Copyright © 2004 by the Association for Computing Machinery, Inc. and the Society for industrial and Applied Mathematics. All Rights reserved. Printed in The United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Association for Computing Machinery, 1515 Broadway, New York, NY 10036 and the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688 271 1.1 Our contribution This paper answers the question of Thorup and Zwick in affirmative for unweighted graphs. In fact, for any k 1, we design an algorithm that takes O(min(n2 ln n, k mn1/k ) time to compute a (2k - 1)-approximate distance oracle of size O(k n1+1/k ) for a given unweighted graph. Another result of this paper is the first linear time algorithm for computing a (2, 1)-spanner of (optimal) size O(n3/2 ). For an unweighted graph G(V , E ), a sub-graph G(V , E ), E E is said to be an (, )spanner if for each pair of vertices u, v V , the distance (u, v ) between u and v in the sub-graph is at-most ( · (u, v ) + ). Previously existing linear time algorithms [1, 8] compute spanners of stretch three or more. Elkin and Peleg [6] presented algorithm for computing (1 + , ) spanner, but with O(n2+µ ) running time. 1.2 An overview of the previous algorithm and new techniques To give an intuition of the existing algorithm, and an exposition to the new techniques used, we first address 3-approximate distance oracle. For a given undirected graph, storing distance information from each vertex to all the vertices requires (n2 ) space. To achieve sub-quadratic space, the following simple idea comes to mind. From each vertex, if we store distance information I : to a small number of vertices, can we stil l be able to report distance between any pair of vertices ? Thorup and Zwick [9] showed that the above idea can indeed be realized but at the expense of reporting approximate, instead of exact, distance as an answer to a distance query. Based on the idea I , the scheme for computing 3-approximate distance oracle is outlined as follows. 1. Let S be a small set of o(n) vertices. For each vertex u V , store the distances to all the vertices of the sample set S . 2. For each vertex u V , let nu be the vertex nearest to u among all the vertices of set S , and let N (u) be the set of all the vertices of the graph G that lie nearer to u than the vertex nu . Store the vertices of set N (u) along with their distance from u. Let u, v V be any two vertices whose intermediate distance is to be determined approximately. If either u or v belong to set S , we can report exact distance between the two. Otherwise also exact distance (u, v ) will be reported if v lies in N (u) or vice versa. The only case, that is left, is when neither v N (u) nor u N (v ). In this case, we report (u, nu ) + (v , nu ) as approximate distance between u and v . This distance is nu Sampled vertices u v N(u) Figure 1: v is farther to u than nu , bounding (nu , v ) using triangle inequality bounded by 3 (u, v ) using triangle inequality as shown below (see Figure 1). (u, nu ) + (v , nu ) (u, nu ) + ( (v , u) + (u, nu )) = 2 (u, nu ) + (u, v ) {since graph is undirected } 2 (u, v ) + (u, v ) = 3 (u, v ) {since v is farther to u than nu } The problem is to ensure a sub-quadratic bound on the size of this oracle. In particular, we have to show that the number of vertices in set N (u) would be o(n). Thorup and Zwick [9] show that if set S is a set of n vertices chosen unifo ly from the graph, we can rm indeed get a bound of O( n) on expected size of N (u), and thus get a 3-approximate oracle of expected O(n3/2 ) size. The computation of 3-approximate distance oracle thus involves two tasks. The first task is to compute the vertices that form the set N (u) and their distances from u, for each u. The second task is to compute distance from each vertex of set S to all the vertices in the graph. Thorup and Zwick [9] present O(m n) time algorithms for each of the two tasks. It can be seen that the running ~ time is not O(n2 ) for dense graphs. We show that in addition to a uniformly chosen n vertices, if the set S also contains dominating set for a the vertices with degree more than n, the first task can ~ be performed in O(n2 ) time provided that the graph is unweighted. The second task involves computation of shortest paths from each vertex of S to all the vertices. Note that the computation of shortest paths from a vertex to all the vertices takes (m) time, therefore it appears that the second task would require at-least (m|S |) time, which could be (n2.5 ) for dense graphs. To perform the second task in O(n2 ) time for dense graphs, one approach could be to first compute a sparse spanner of 272 the graph with at-most n3/2 edges, and then compute distance from each u S to all the vertices in this sparse spanner (instead of the original graph). This approach has been used by Elkin in his work [5]. However, from the description given above, to get a bound of 3 (u, v ) on the distance reported between u and v , it is crucial that the distance between u and the vertex nu must be preserved in the spanner. But sparseness of a spanner is achieved at the expense of stretching the distance between vertices. Therefore, in order to improve the running time of the second task so as to compute ~ (2k - 1)-approximate distance oracle in O(n2 ) time if we employ a typical sparse spanner, it would invariably increase the stretch of the oracle beyond (2k - 1). Parameterized Spanner : We present a linear time algorithm that takes an unweighted graph G as input and a parameter R V to compute a sparse spanner of the graph G. The characteristic of this spanner is that each edge of the graph whose at-least one endpoint is not adjacent to any vertex from set R is surely present in the spanner too. We use this property of the parameterized spanner cleverly to ensure that for the 3-approximate distance oracle, the distance between u and its nearest vertex from set S is preserved for each u V in the spanner. We also ensure sparseness of the spanner by using random sampling to choose the parameter R V . For computing (2k - 1)-approximate distance oracle, we use a parameterized (2, 1)-spanner of 2k-1 size O(n k ) computed with a set of O(n1/k ) vertices as parameter. 2 clusters as follows. Each cluster will have exactly one vertex x R, called center of the cluster. Note that each vertex v VR - R has at-least one neighbor from set R, so assign the vertex v to the cluster centered at that neighbor. (Break the tie arbitrarily in case v has multiple neighbors from R). Add all the edges incident on vertices of R to the set ER . Finally remove all the intra-cluster edges from set ER , and pass the clustering of set VR and the remaining edges, to the second step. 2. Adding edges between vertices and clusters : For each vertex v VR , we group all its neighbors into their respective clusters. There will be atmost |R| neighboring clusters of v . For each cluster adjacent to vertex v , we select an edge from the set of edges between (the vertices of ) the cluster and v , and add it to the set ER . It is easy to verify that implementation of the two steps of the algorithm merely requires two traversals of the adjacency list of each vertex. Thus the running time of the algorithm is O(m). Lemma 2.1. The number of edges of the spanner G(VR , ER ) is bounded by O(|R| · |VR |). Lemma 2.2. Let e(u, v ) ER be an edge not present in the spanner-edges ER . There is a one-edge or a two edge path in the spanner (VR , ER ) between the vertex u and the center of the cluster containing vertex v , and vice versa. Proof. There are two cases depending upon whether u A linear time algorithm for computing a and v belong to same cluster or different clusters. parameterized (2, 1)-spanner Let G(V , E ) be the given undirected unweighted graph. For a sample R V of vertices, let VR be the set of x all those vertices that either belong to the set R or v u are adjacent to at-least one vertex from the set R. Case 1 Let G(VR , ER ) denote the sub-graph induced by the vertices VR , thus ER consists of all those edges both w w' of whose end-points belong to VR . Let ER denote the remaining edges E - ER . We first present an algorithm x A(R) that computes a sparse (2, 1)-spanner for the y u v sub-graph G(VR , ER ). Case 2 Algorithm A(R) for (2, 1)-spanner of G(VR , ER ) Input : sub-graph G(VR , ER ) for a sample R V Figure 2: The two cases depending upon whether u, v Output : G(VR , ER ) as a (2, 1)-spanner of G(VR , ER ) belong to same/different clusters Initially ER = , and the algorithm adds edges to ER in two steps. Case 1 : (u and v belong to same cluster) 1. Forming clusters : Let the cluster centered at x R contains both the Partition the set VR into subsets of vertices called vertices u and v . Therefore, the edges e(u, x) and 273 e(v, x) are present in the spanner (refer to step 1.(a) of algorithm A(R). So there is one edge path from v to the center of the cluster containing u, and vice versa. Case 2 : (u and v belong to different clusters) Let x and y be the centers of the clusters containing u and v respectively. From step 2 of the algorithm, it follows that since u is adjacent to cluster centered at y , therefore, an edge between u and some vertex, say w of this cluster has been added to ER . From step 1 of the algorithm, it follows that the edge e(w, v ) is also present in the spanner. So there is a two-edge path u - w - y between u and y . In a similar way, there is a two-edge path v - w - x between v and the vertex x. On the basis of the lemma given above, we shall now prove the following Lemma to show that the spanner G(VR , ER ) is indeed a (2, 1)-spanner of G(VR , ER ). Lemma 2.3. For each pair of vertices u, v VR , if they are connected by some path in the sub-graph G(VR , ER ), the distance (u, v ) between the two in the span ner G(VR , ER ) is related to their actual distance in G(VR , ER ) by the fol lowing inequality : (u, v ) 2 (u, v ) + 1 Proof. For a vertex x VR , let C (x) denote the center of the cluster to which the vertex x belongs. Let u, w VR be any two vertices separated by a shortest path of length j in the sub-graph G(VR , ER ). Let suw : {u = v0 , v1 , · · · , vj = w} be the sequence of vertices (as they appear) on the shortest path between u and w. Consider another sequence cuw : {v0 , C (v1 ), v2 , C (v3 ), · · ·} formed by replacing each vertex v2i+1 of the sequence suw by C (v2i+1 ), for i j /2. It follows from Lemma 2.2 that for each edge e(y , z ) ER , there is a path between C (y ) and z of length at-most two in the spanner, and there is also a path between y and C (z ) of length at-most two in the spanner. Hence each pair of consecutive vertices in the sequence cuw is connected by a path of length at-most two in the spanner. So there is a path of length at-most 2j between u and the last vertex of the sequence cuw in the spanner. Note that the last vertex of the sequence cuw is w or C (w) depending on whether the path length j is even or odd. If the path length is even, it implies that (u, w) 2j . Otherwise (j is odd) note that either C (w) is the vertex w itself or there is an edge between C (w) and w in the spanner G(VR , ER ). Thus (u, v ) 2j + 1. Combining the two cases (even and odd j ), we can conclude that (u, v ) 2 (u, v ) + 1. remaining vertices V - VR along with all their edges ER (recall that ER = E - ER ). Using arguments similar to Lemma 2.3, the following lemma shows that this subgraph is indeed a (2, 1)-spanner of the original graph G(V , E ). Lemma 2.4. For a given R V , if G(VR , ER ) is the (2, 1)-spanner of sub-graph G(VR , ER ) reported by the algorithm A(R), then the sub-graph G(V , ER ER ) is also a (2, 1)-spanner of the original graph G(V , E ). Proof. The proof is similar to that of Lemma 2.3. Let C (x) denotes the vertex x if x VR , otherwise C (x) / denotes the center of the cluster containing vertex x. Consider an edge e(x, y ) E . Note that the set ER has all the edges whose at-least one end-point belongs to V - VR . So if e(x, y ) is not present in set ER ER , it must be in the sub-graph G(VR , ER ), for whom G(VR , ER ) is the spanner. Therefore the statement of Lemma 2.2 can be restated as follows : For each edge e(x, y ) E , there is a path of length at-most two between C (x) and y in the subgraph G(V , ER ER ), and vice versa. To ensure that (u, w) 2 (u, w) + 1 in the subgraph G(V , ER ER ), consider the sequence suw : {u = v0 , v1 , · · · , vj = w} of vertices of the shortest path between u and v in the order they appear on it. Now consider another sequence cuw : {v0 , C (v1 ), v2 , C (v3 ), · · ·} formed by replacing each vertex v2i+1 of the sequence suw by C (v2i+1 ), for i j /2; and proceed exactly as in case of Lemma 2.3 to complete the proof. For a given graph G(V , E ) and parameter R, henceforth we shall use GR to denote the parameterized (2, 1)-spanner G(V , ER ER ). 3 Preliminaries In this section, we give definitions, notations and lemmas to be used in the context of our description of approximate distance oracles. 3.1 Ball around a vertex An important construct of the approximate distance oracles is B all(·). Definition : For a vertex u, and subsets X, Y of vertices, B all(u, X, Y ) is the set consisting of al l those vertices of the set X that lie closer to u than any vertex from set Y . (see Figure 3) It follows from the definition given above that B all(u, X, ) is the set X itself, whereas B all(u, X, X ) = . It can also be seen that the 3-approximate distance oracle outlined previously Consider the sub-graph G(V , ER ER ) formed by adding stores distance from each u V to all the vertices of to the spanner computed by algorithm A(R), all the B all(u, V , S ) and B all(u, S, ). 274 u Ball(u,X,Y) vertex of set X vertex of set Y vertex of set V-X-Y Figure 3: The vertices pointed by solid-arrows constitute B all(u, X, Y ) that a vertex with degree ni/k ln n or more has a neighi i bor from set Sk+1 , i.e., the set Sk+1 is a dominating set for the set of vertices with degree ni/k ln n or more. As a i i result, B all(u, Sk , Sk+1 ) would be just the singleton set {u}, and so for each vertex u V with degree ni/k ln n or more, the contribution in above equation would be reduced to deg(u) from n1/k deg(u). Hence, the overi i all time required for computing B all(u, Sk , Sk+1 ) for all u V would become d d deg(u) + n1/k deg(u) eg(u)>ni/k ln n eg(u) 1, let {Ri |i k } denote a hierarchy -1j 1 denote the vertex from Sk that is nearest B all(u, Ri , Ri+1 ) for each u V , 1 i < k . Asymp1 k k to vertex u, and p (u) denote the vertex u itself. totically the time required in their computation is 3.3 Preserving the distance from u to all the u u k i+1 i 1/k vertices of B all(u, V , Sk ) in the parameterized |B all(u, Rk , Rk )| deg(u) = n deg(u) (2, 1)-spanner GS k For a given unweighted graph V V k G(V , E ), consider the parameterized (2, 1)-spanner {by lemma 3.1} k GS k built using the set Sk as the parameter. Let k We show that the running time of the algorithm can u = v0 , v1 , v2 , · · · , vj -1 , vj = v be the shortest path k be improved for unweighted graphs by using a slightly between u and v B all(u, V , Sk ) in the graph G(V , E ). i different hierarchy of vertex-sets {Sk |1 i k } as de- Note that none of the vertices vi , i < j have any neighk k scribed below. The new hierarchy of vertex-sets ensures bor from the set Sk -1 (otherwise v B all(u, V , Sk ), / Lemma 3.1. Given a graph G(V , E ), let the set Y be formed by picking each vertex of a set X V independently with probability . The expected size of B all(u, X, Y ) is bounded by 1/ . 275 a contradiction !). Since the parameterized spanner would keep all those edges whose one of the end-point k has no neighbor in the parameter vertex-set Sk , so all the edges incident on the vertices vi , i < j are present in the parameterized (2, 1)-spanner GS k . Hence the shortk k est path from u to each v B all(u, V , Sk ) is present in the (2, 1)-spanner GS k too. Also note that all the edges k k incident on each x Sk are also kept in the parameterized spanner. Therefore, the following equality holds. { k (3.1) (u, v ) = (u, v ) v B all(u, V , Sk ) pk (u)} 4 u 1 vertex of set Rk 2 vertex of set Rk vertex of set Rk3 . . . (i) Ball (u) 3 Ball 4(u) u (2k - 1)-approximate distance oracle for unweighted graph Ball 1(u) Ball 2 (u) In this section we shall give the construction of a (ii) (2k - 1)-approximate distance oracle which is also based on the idea I , and can be viewed as a generalization of Figure 4: (i) Close description of B alli (u), i < k , (ii) 3-approximate distance oracle. The (2k - 1)-approximate distance oracle is obtained as hierarchy of balls around u follows. 1 2 k 1. Let Sk Sk · · · Sk be the hierarchy of subsets of vertices as defined in Theorem 3.1. 4.1 Rep orting distance with stretch at-most 2. For each vertex u V , store the distance from u (2k - 1) Given any two vertices u, v V whose interk to all the vertices of Sk in a hash table, denoted by mediate distance has to be determined approximately, we shall now present the procedure to find approximate k B all (u) henceforth. distance between two vertices using the k -level data3. For each u V and each i < k , store the vertices structure described above. i i of B all(u, Sk , Sk+1 ) along with their distance from Recall that p1 (u) = u and pi (u) for i > 1 is the i u in a hash table. vertex from the set Sk that is nearest to u. Since i i For sake of conciseness and without causing any p (u) B all (u) for each u V , so distance from each i ambiguity in notations, henceforth we shall use u to p (u) is known for each i k . i i The query answering process performs at-most k B alli (u) to denote B all(u, Sk , Sk+1 ) or the correi+1 i search steps. In the first step, we search B all1 (u) for sponding hash-table storing B all(u, Sk , Sk ) for the vertex p1 (v ). If p1 (v ) is not present in B all1 (u), i < k. we move to the next level and in the second step we The collection of the hash-tables B alli (u) : u search B all2 (v ) for vertex p2 (u). We proceed in this V , i k constitute the data-structure that will facil- way querying balls of u and v alternatively : In ith itate answering of any approximate distance query in step, we search B alli (x) for pi (y ), where (x = u, y = v ) constant time. Note that Instead of using typical hash if i is odd, and (x = v , y = u) otherwise. The search table that achieves expected O(1) query time, we im- ends at ith step if pi (y ) belongs to B alli (x), and then plement each B alli (u) as a 2-level hash table given by we report (x, pi (y )) + (y , pi (y ) as an approximate Fredman, Komlos and Szemeredi [7] of optimal size, so distance between u and v . that it takes O(1) worst case time to determine whether Distance Rep ort(u, v ) a vertex v V belongs to the B alli (u) or not, and rel 1, x p u,y v port the distance (u, v ) in case v belongs to B alli (u). d While l (y ) B alll (x) o / To provide a better insight into the data-structure, swap(x, y ), l l + 1 Figure 4 depicts the set of vertices constituting i If l < k , return (y , pl (y )) + (x, pl (y )) {B all (u)|i k }. k k ( To ensure the desired stretch-bound in the distance Else return min of ( (y , p k(y )) + x, p k(y ))) and ( (x, p (x)) + (y , p (x))) reported by (2k - 1)-approximate distance oracle, Thek k Note that p (y ) Sk , and we store the distance orem 4.2 (main theorem of this paper) would play a k from x to all the vertices of set Sk in B allk (x). Therecentral role. 276 fore, the 'while loop' of the distance reporting algorithm Case 1 (y B allk (x)) : In this case, we bound will execute at-most k - 1 iterations, spending O(1) time (y , pk (y )) + (x, pk (y )) as follows. querying a hash table in each iteration. (y , pk (y )) + (x, pk (y )) In order to ensure that the above algorithm reports (2k - 1)-approximate distance between u, v V , we first = (y , pk (y )) + (x, pk (y )) {using Equation 3.1} show that the following assertion holds : 2 (y , pk (y )) + (x, y ) {using triangle inequality} 2(k - 1) (u, v ) + (u, v ){using assertion Ak-1 } Ai : At the end of ith iteration of the 'while loop', (y , pi+1 (y )) i (u, v ). = (2k - 1) (u, v ) The assertion Ai can be proved using induction on i as follows. First note that the variables x and y take Case 2 (y B allk (x)) : Theorem 4.2 ensures that the / the values u and v alternatively during the 'while-loop'. distance reported is no more than 3 (x, y ) = 3 (u, v ) So (x, y ) = (u, v ) always. (2k - 1) (u, v ), k > 1. For the base case (i = 0), p1 (y ) is same as y , and k The size of the set Sk is O(n1/k ), and the expected y is v . So (y , p1 (y )) = 0. Hence A0 is true. For the i i size of each B all(u, Sk , Sk+1 ) is n1/k using Theorem 3.1 rest of the inductive proof, it suffices to show that if Aj is true, then after (j + 1)th iteration Aj +1 is also true. and Lemma 3.1. So the expected size of the distance oracle is O(n1/k · n + (k - 1) · n · n1/k ) = O(k n1+1/k ). The proof is as follows. We consider the case of 'even j ', the arguments for the case of 'odd j ' are similar. For even j , at the end of 4.2 Main Theorem j th iteration, {x = u, y = v }, Thus Aj implies that Theorem 4.2. For the (2, 1)-spanner GS k of unk at the end of j th iteration (v , pj +1 (v )) j (u, v ). k / Consider the (j + 1)th iteration. For the execution weighted graph G(V , E ), if v B all (u), then ei k k k of (j + 1)th iteration, the condition in the 'while-loop' ther ( (u, p (u)) + (v , p (u))) or ( (v , p (v )) + k must have been true. Thus pj +1 (v ) does not belong (u, p (v ))) is bounded by 3 (u, v ). to B all(u, Rj +1 , Rj +2 ). Hence by Definition of B all(·), k k Proof. If either u is adjacent to pk (u) or v is adjacent the vertex pj +2 (u) must be lying closer to u than the to pk (v ), we can prove the theorem as follows. Let v be vertex pj +1 (v ). So at the end of (j + 1)th iteration, adjacent to pk (v ), and v = v0 , v1 , v2 · · · , vj +1 = u be the (y , pj +2 (y )) can be bounded as follows shortest path of length (j + 1) between v and u in the j +2 j +2 original graph. It follows from Lemma 2.2 that there is (y , p (y )) = (u, p (u)) a path from pk (v ) to v1 consisting of at-most two edges (u, pj +1 (v )) in the spanner. By definition of (2, 1)-spanner, there (u, v ) + (v , pj +1 (v )) is a path from v1 to vj +1 in the spanner consisting of {using triangle inequality} at-most 2j + [j > 0] edges. (Note that [j > 0] is one if j > 0, and zero otherwise). Hence distance (v , pk (v )) (u, v ) + j (u, v ) {using Aj } between v and pk (v ) in the spanner is no more than = (j + 1) (u, v ) 2(j + 1) + [j > 0]. Also the edge e(v , pk (v )) is present in the spanner. Hence, (v , pk (v )) + (u, pk (v )) Thus the assertion Aj +1 holds. We employ Theorem 4.2 and the observations of 1 + 2(j + 1) + [j > 0] 3(j + 1) = 3 (v , u). If neither u is adjacent to pk (u) nor v is adjacent the parameterized (2, 1)-spanner from previous section to pk (v ), let us divide the shortest path Puv between to prove the following theorem. u and v in the graph G(V , E ) into three parts : the Theorem 4.1. The algorithm Distance Report(u, v ) part P k uw lying in B all (u), the part Pw v lying inside reports (2k - 1)-approximate distance between u and v B allk (v ), and the remaining part P ww of the path that does not lie in either of the two balls. Let the subProof. Note that that the algorithm DistanceReport(u, v ) would execute (l - 1) iterations of paths Puw , Pww , Pw v have lengths , , respectively. u the 'while loop'. If l < k , the distance reported is Note that both and are greater or eq al to 1 (see Figure 5). Now we bound the distance (u, pk (u)) + l l (y , p (y )) + (x, p (y )), which by triangle inequality k is no more than 2 (y , pl (y )) + (x, y ). Note that (v , p (u)) as follows. (x, y ) = (u, v ), and (y , pl (y )) (l - 1) (u, v ) as follows from assertion Al . Therefore, the distance reported is no more than (2l - 1) (u, v ) < (2k - 1) (u, v ). If l = k , there are two cases. (u, pk (u)) + (v , pk (u)) { (u, pk (u)) + (v , u) + (u, pk (u)) by triangle inequality} 277 pk (u) v u w w' v (u,v) w u p i+1 (w) Ballk (u) Ballk (v) Figure 5: dividing the path u - v into three parts. Figure 6: if w does not lie in C (v , Ri+1 ), then pi+1 (w) k would lie closer to u than v . 5.2 Computing B alli (u) efficiently In order to i i compute B alli (u) = B all(u, Sk , Sk+1 ) efficiently for all i i the vertices, we first compute sets {C (v , Sk+1 )|v Sk } which are defined as follows : Definition : For a graph G(V , E ), and a set X V , the set C (v , X ) consists of al l those vertices of the graph for whom v lies closer than any vertex of set X . i It follows from the definition above that u C (v , Sk+1 ) 5 Computing (2k - 1)-approximate distance if and only if v B alli (u). We shall make use of the ~ following equality which is based on this observation. oracle in O(n2 ) time v u uv It follows from the description of the data-structure deg(u) = deg(u) associated with approximate distance oracle that after (5.2) i i V B alli (u) Sk C (v ,Sk+1 ) i forming the hierarchy of sets Sk , that takes O(m) time, all that is required is the computation of pi (u) and i i Given {C (v , Sk+1 )|v Sk }, we can compute B alli (u) for each u and i k . {B alli (u)|u V } as follows. 5.1 Computing pi (u) for each u V Recall from i definition itself that pi (u) is the vertex of the set Sk i that is nearest to u. Hence, computing p (u) for each u V requires solving the following problem with i X = Sk , Y = V - X . Problem : Given X, Y V in a graph G = (V , E ), with X Y = , compute the nearest vertex of set X for each vertex y Y . The above problem can be solved by running a single source shortest path algorithm on a modified graph as follows. Modify the original graph G by adding a dummy vertex s to the set V , and joining it to each vertex of the set X by an edge of zero weight. Let G be the modified graph. It can be seen that the distance from a vertex y to its nearest vertex from set X is (s, y ) - 1. Moreover while Running the shortest path algorithm from the vertex s, if e(s, x), x X is the edge leading to the shortest path from s to y , then x is the vertex from the set X that lies nearest to y . For an unweighted graph, single source shortest paths can be computed in O(m) time by a variant of breadth-first search procedure starting from the source. we can thus state the following lemma. i For each v Sk do i For each u C (v , Sk+1 ) do i B all (u) - B alli (u) {v } Hence we can state the following Lemma. i Lemma 5.2. Given the family of sets {C (v , Sk+1 )|v i i Sk }, we can compute {{B all (u)|u V )} in time of the u i order of V |B all (u)|. i The following property of the set C (v , Sk+1 ) will be used in its efficient computation. i Lemma 5.3. If u C (v , Sk+1 ), then al l the vertices on the shortest path from v to u also belong to the set i C (v , Sk+1 ). = 2 (u, p (u)) + ( (u, w) + (w, w + (w v )) = 2 (u, pk ) + (u, w) + (w, w ) + (w , v ) {by Equation 3.1 } = 2( (u, w) + 1) + (u, w) + (w, w ) + (w , v ) = 2( + 1) + + (2 + [ > 0]) + {by defn. of (2, 1)-spanner } = 3 + 2 + + (2 + [ > 0]) 3( + + ) = 3 (u, v ) k ) , Lemma 5.1. Given X, Y V in an unweighted graph G(V , E ), with X Y = , it takes O(m) time to compute the nearest vertex of set X for each vertex y Y . Proof. We give a proof by contradiction. Given that i u C (v , Sk+1 ), let w be any vertex on the shortest path i from v to u. If w C (v , Sk+1 ), the vertex v doesn't lie / closer to w than the vertex pi+1 (w). See Figure 6. In other words (w, v ) (w, pi+1 (w). Hence (u, v ) = (u, w) + (w, v ) (u, w) + (w, pi+1 (w)) (u, pi+1 (w)) 278 i Thus v does not lie closer to u than pi+1 (w) which is a Lemma 5.4. The set C (v , Sk+1 ) can be computed in u i+1 i+1 vertex of set Sk . Hence by definition, u C (v , Sk ), O(deg(v ) + / C (v ,Ri+1 ) deg(u)) time. k thus a contradiction. Hence the total time Ti required for computing i+1 i In the shortest path tree rooted at v , it follows from {C (v , Sk |v Sk } for i < k is i Lemma 5.3 that the entire cluster C (v , Sk+1 ) would apu v pear as a sub-tree rooted at v . Also each vertex x deg(v ) + Ti = deg(u) belonging to this subtree (cluster) satisfies (v , x) < i i+1 Sk C (v ,Rk ) (x, pi+1 (x)). Based on these two observations, here v v u follows an efficient algorithm that computes the set = deg(v ) + deg(u) i+1 C (v , Sk ). The algorithm performs a restricted shorti i Sk Sk C (v ,Ri+1 ) k est path computation from the vertex v , wherein we uv m+ deg(u) {using Equation 5.2 } don't proceed along any vertex that does not belong to i V B alli (u) the set C (v , Sk+1 ). u N | Restricted shortest path algorithm : Note that the = m+ B alli (u)| · deg(u) shortest path algorithm (Dijkstra's algorithm) starts V with singleton tree {v } and performs n - 1 steps to grow ow using Theorem 3.1, it follows that B alli (u) = {u} the complete shortest path tree. Each vertex x V \{v } i/k is assigned a label L(x), which is infinity in the begin- if deg(u) is more than n ln n. So contribution of all such vertices in the above expression is no more than ning, but eventually becomes the distance from v to x. Let Vi denotes the set of i nearest vertices from v . The m. Hence the inequality can be written as | u algorithm maintains the following invariant at the end Ti 2m + B alli (u)| · min(ni/k ln n, deg(u)) of lth step : I (l) : For all the vertices of the set Vl , the label L(x) = (v , x), and for every other vertex y V \Vl , the label L(y ) is equal to the length of the shortest path from v to y that passes through vertices of Vl only. During the (j + 1)th step, we select the vertex, say w from set V - Vj with least value of L(·). Since all edge-weights are positive, it follows from the invariant I (j ) that L(w) = (w, v ). Thus we add w to set Vj to get the set Vj +1 . Now in order to satisfy the invariant I (j + 1), we relax each edges e(w, y ) incident from w to a vertex y V - Vj +1 as follows : L(y ) min{L(y ), L(w) + weig ht(w, y )}. It is easy to observe that this ensures the validity of the invariant I (j + 1). In the restricted shortest path algorithm, we will put the following restriction on relaxation of an edge e(w, y ) : we relax the edge e(w, y ) only if L(w) + weig ht(w, y ) is less than (y , pi (y )). This will ensure that a vertex y C (v , Ri+1 ) will never be visited during / k the algorithm. The fact that the vertices of the cluster C (v , Ri+1 ) form a sub-tree of the shortest path tree k rooted at v ensures that the above restricted shortest path algorithm indeed finds all the vertices that form the cluster C (v , Ri+1 ) along with their distance from v . k The running time of the above algorithm is dominated by the total number of edges relaxed, and each edge relaxation requires O(1) time for unweighted graphs. (For unweighted graphs, Dijkstra's algorithm is just a variant of breadth-first search). So, the abuve algorithm runs in time of the order of deg(v ) + o C (v ,Ri+1 ) deg(u). k V 2m + u{ V n 1/k · min(ni/k ln n, deg(u)) = using Lemma 3.1 } u ni/k ln n , V k 2m + n1/k min = 2m + min O(min(n n k+i+1 u V deg(u) ln n, n1/k m k+i+1 k ln n, mn1/k )) {for all 1 i < k } Combining the upper bound of Ti mentioned above and Lemma 5.2, we can state the following theorem. Theorem 5.1. Given an unweighted graph G(V , E ), j and the family of sets {Sk |j k }, it takes k+i+1 1/k O(min(n k ln n, mn )) time to compute {B alli (u)|u V } for i < k . The total time required to compute the family of sets {B alli (u)|u V , i < k } is no more than O(min(n2 ln n, k mn1/k ) as follows from Theorem 5.1. k Computing distances from a vertex x Sk to all the vertices in the (2, 1)-spanner GS k requires merely a k BFS traversal of the spanner with x as root, which can be performed in time of the order of the number of edges in the spanner. Since the spanner GS k has O(min(m, n2-1/k ln n)) edges and there are k k O(n1/k ) vertices in the set Sk , the total time to com k pute { (x, v )|x Sk , v V } in (2, 1)-spanner is 279 O(min(mn1/k , n2 ln n). Hence the total time required to compute (2k - 1)-approximate distance oracle is O(min(n2 ln n, k mn1/k )). We repeat the algorithm if the size of the approximate distance oracle exceeds O(k n1+1/k ). The expected number of repetitions is O(1). Combining these facts with Theorem 4.1, we can state the following theorem. in O(m) expected time. In order to minimize the size of the spanner, we balance the two terms n/p and n2 p. Choosing p = n-1/2 , we get a bound of O(n3/2 ) on the size of (2, 1)-spanner. We can thus state the following Theorem. Theorem 6.2. Given an unweighted undirected graph G(V , E ), a (2, 1)-spanner of size O(n3/2 ) can be computed in expected O(m) time. Theorem 5.2. An unweighted graph G(V , E ) can be processed in O(min(n2 ln n, k mn1/k ) time to compute Thorup and Zwick [9] show that (n3/2 ) bound on a data-structure of size O(k n1+1/k ) that can report ce (2k - 1)-approximate distance between any two vertices the size of 3-spanner is indeed optimal. Sin3/2 a (2, 1)spanner is also a 3-spanner, therefore, (n ) bound in O(k ) time for any integer k > 1. is optimal for (2, 1)-spanner too. There does not exist any previous algorithm for computing (2, 1)-spanner. 6 An optimal size (2, 1)-spanner However there exist linear time algorithms [1, 8] for Consider the (2, 1)-spanner G(V , ER ER ) from Lemma computing (2k - 1)-spanner for k 3. Elkin and Peleg 2.4. Let the sample set R be formed by picking each [6] presented algorithm for computing (1 + , ) spanner, vertex independently with some fixed probability p. The but with O(n2+µ ) running time. expected size of the set R will be np. Therefore it follows from Lemma 2.1 that the expected size of the set ER 2 will be n p. The following lemma bounds the expected References size of the set ER . Lemma 6.1. Given a graph G(V , E ), and a sample set R formed by picking each vertex independently with probability p, the expected number of edges ER is O(n/p). Proof. Note that the edges adjacent to a vertex u V - R appear in the set ER if neither u nor any of its neighbor is adjacent to any vertex of the sample R. Since each vertex is selected in the sample R independently with probability p, so the expected number of edges contributed by vertex v to ER is deg(v ) · (1 - p)deg(v)+1 . Using elementary calculus and the fact that p 1, it can be verified that vertex v contributes maximum number of edges to ER if deg(v ) is close to (1/p), and the maximum number is (1/p). Hence, using linearity of expectation, the expected number of edges in the set ER is bounded by (1/p). Combining Lemmas 6.1 and 2.4, we can state the following theorem. Theorem 6.1. For an unweighted graph G(V , E ), and a sample R V formed by selecting each vertex independently with probability p, the subgraph G(V , ER ER ) is a (2, 1)-spanner with expected size O(n/p + n2 p) As mentioned earlier, the running time of our algorithm is O(m). We would repeat the algorithm if the size of the spanner is more than O(n/p + n2 p). The expected number of iterations is O(1). Thus a (2, 1) spanner of size O(n/p + n2 p) can be computed [1] S. Baswana and S. Sen. A simple linear time algorithm for computing a (2k - 1)-spanner of O(n1+1/k ) size in weighted graphs. In Proceedings of the 30th International Col loquium on Automata, Languages and Programming (ICALP), pages 384­396, 2003. [2] E. Cohen and U. Zwick. All-pairs small stretch paths. Journal of Algorithms, 38:335­353, 2001. [3] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation, 9:251­280, 1990. [4] D. Dor, S. Halperin, and U. Zwick. All pairs almost shortest paths. Siam Journal on Computing, 29:1740­ 1759, 2000. [5] M. Elkin. Computing almost shortest paths. In Proceedings of the 20th Annual ACM Symposium on Principles of Distributed Computing, pages 53­62, 2001. [6] M. Elkin and D. Peleg. (1 + , )-spanner construction for general graphs. In Proceedings of 33rd ACM Symposium on Theory of Computing (STOC), pages 173­182, 2001. [7] M. L. Fredman, J. Koml´s, and E. Szemer´di. Storing o e a sparse table with o(1) worst case time. Journal of ACM, 31:538­544, 1984. [8] S. Halperin and U. Zwick. Unpublished result. 1996. [9] M. Thorup and U. Zwick. Approximate distance oracle. In Proceedings of 33rd ACM Symposium on Theory of Computing (STOC), pages 183­192, 2001. [10] V. V. Vazirani. In Approximation Algorithms. Springer Verlag, 2001. [11] U. Zwick. All-pairs shortest paths in weighted directed graphs - exact and almost exact algorithms. In Proceedings of the 39th Annual Symposium on Foundations of Computer Science (FOCS), pages 310­319, 1998. 280