Experimental Analysis of Dynamic All Pairs Shortest Path Algorithms

Camil Demetrescu    Stefano Emiliozzi    Giuseppe F. Italiano

Abstract

We present the results of an extensive computational study on dynamic algorithms for all pairs shortest path problems. We describe our implementations of the recent dynamic algorithms of King [18] and of Demetrescu and Italiano [7], and compare them to the dynamic algorithm of Ramalingam and Reps [25] and to static algorithms on random, real-world and hard instances. Our experimental data suggest that some of the dynamic algorithms and their algorithmic techniques can be really of practical value in many situations.

This work has been partially supported by the IST Programme of the EU under contract n. IST-1999-14.186 (ALCOM-FT) and by the Italian Ministry of University and Research (Project "ALINWEB: Algorithmics for Internet and the Web").

Camil Demetrescu: Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Roma, Italy. Email: demetres@dis.uniroma1.it. URL: http://www.dis.uniroma1.it/~demetres.
Stefano Emiliozzi: Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Roma, Italy. Email: thepomy@tin.it.
§ Giuseppe F. Italiano: Dipartimento di Informatica, Sistemi e Produzione, Università di Roma "Tor Vergata", Roma, Italy. Email: italiano@disp.uniroma2.it. URL: http://www.info.uniroma2.it/~italiano.

1 Introduction

In this paper we consider fully dynamic algorithms for all pairs shortest path problems. Namely, we would like to maintain information about shortest paths in a weighted directed graph subject to edge insertions, edge deletions and edge weight updates. This seems an important problem on its own, and it finds applications in many areas (see, e.g., [24]), including transportation networks, where weights are associated with traffic/distance; database systems, where one is often interested in maintaining distance relationships between objects; data flow analysis and compilers; document formatting; and network routing [10, 22, 23].

This problem was first studied in 1967 [20, 21], but the first fully dynamic algorithms that in the worst case were provably faster than recomputing the solution from scratch were proposed only twenty years later. We recall here that the all pairs shortest path problem can be solved in O(mn + n^2 log n) worst-case time with Dijkstra's algorithm and Fibonacci heaps [8, 11], where m is the number of edges and n is the number of nodes. Since m can be as high as Θ(n^2), this is Θ(n^3) in the worst case. In 1999 King [18] presented a fully dynamic algorithm for maintaining all pairs shortest paths in directed graphs with positive integer weights less than C: the running time of her algorithm is O(n^2.5 sqrt(C log n)) per update and O(1) per query. Demetrescu and Italiano [6] showed how to maintain all pairs shortest paths on directed graphs with real-valued edge weights that can take at most S different values: their time bound is O(n^2.5 sqrt(S log^3 n)) per update and O(1) per query. Very recently, the same authors [7] presented a fully dynamic shortest paths algorithm that requires Õ(n^2)^1 time per update and constant time per query.

Our Results. The objective of this paper is to advance our knowledge on dynamic shortest path algorithms by following up the recent theoretical progress of King [18] and of Demetrescu and Italiano [7] with a thorough empirical study. In particular, we present and experiment with efficient implementations of dynamic shortest path algorithms. Our empirical analysis shows that some of the dynamic algorithms and their techniques can be really of practical value in many situations. Indeed, we observed that in practice their speedup is much higher than the one predicted by the theoretical bounds: they can be even two to four orders of magnitude faster than repeatedly computing a solution from scratch with a static algorithm.
Furthermore, our work may shed light on the relative behavior of some implementations on different test sets and on different computing platforms: this might be helpful in identifying the most suitable dynamic shortest path code for different application scenarios. As a side result of our experimental work, we also propose a new static algorithm, which in practice reduces substantially the total number of edges scanned and thus can run faster than Dijkstra's algorithm on dense graphs.

Related Work. Besides the extensive computational studies on static shortest path algorithms (see, e.g., [4, 14, 27]), many researchers have been complementing the wealth of theoretical results on dynamic graphs with thorough empirical studies. In particular, Frigioni et al. [13] proposed efficient implementations of dynamic transitive closure and shortest path algorithms, while Frigioni et al. [12] and later Demetrescu et al. [5] conducted an empirical study of dynamic single-source shortest path algorithms. Many of these shortest path implementations refer either to partially dynamic algorithms or to special classes of graphs. Other dynamic graph implementations include the work of Alberts et al. [1] and Iyer et al. [16] (dynamic connectivity), and the work of Amato et al. [2] and Cattaneo et al. [3] (dynamic minimum spanning tree).

^1 Throughout the paper, we use Õ(f(n)) to denote O(f(n) polylog(n)).

Copyright © 2004 by the Association for Computing Machinery, Inc. and the Society for Industrial and Applied Mathematics. All rights reserved.

2 Experimental Setup

2.1 Test Sets.
In our experiments we considered three kinds of test sets: random inputs, real-world inputs, and synthetic inputs.

Random inputs. We considered random graphs with n nodes, m edges, and random integer edge weights. To generate the update sequence, we select at random one operation among edge insertion, edge deletion, and edge weight update. If the operation is an edge insertion, we select at random a pair of nodes x, y such that edge (x, y) is not in the graph, and we insert edge (x, y) with random weight. If the operation is a deletion, we pick at random an edge in the graph, and delete it. If the operation is an edge weight update, we randomly pick an edge in the graph, and change its weight to a new random value.

Real-world inputs. We considered two kinds of real-world inputs: US road networks and Internet networks. The US road networks were obtained from ftp://edcftp.cr.usgs.gov, and consist of graphs having 148 to 952 nodes and 434 to 2,896 edges. The edge weights can be as high as 200,000, and represent physical distances. As Internet networks, we considered snapshots of the connections between Autonomous Systems (AS) taken from the University of Oregon Route Views Archive Project (http://www.routeviews.org). The resulting graphs (AS 500, ..., AS 3000) have 500 to 3,000 nodes and 2,406 to 13,734 edges, with edge weights as high as 20,000. The update sequences we considered in real-world graphs were random edge weight updates (within the same weight range as the original graph).

Synthetic inputs. We considered bottleneck graphs formed by two bipartite components (X1, Y1) and (X2, Y2), with |X1| = |Y1| = |X2| = |Y2| and with edges directed from Xi to Yi, i = 1, 2. Nodes in Y1 reach nodes in X2 through a single edge (u, v), i.e., there is an edge (y, u) for each y ∈ Y1 and there is an edge (v, x) for each x ∈ X2. This way, most shortest paths in the graph go through edge (u, v).
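The bottleneck construction just described can be sketched in a few lines. The following is our own illustration, not the paper's generator; the function names (bottleneck_graph, random_bottleneck_updates) and the edge-dictionary representation are assumptions:

```python
import random

def bottleneck_graph(k, weight_range=(1, 1000)):
    """Sketch of a bottleneck graph: two complete bipartite components
    (X1, Y1) and (X2, Y2), each group of size k, joined through the
    single edge (u, v)."""
    X1 = list(range(0, k))
    Y1 = list(range(k, 2 * k))
    X2 = list(range(2 * k, 3 * k))
    Y2 = list(range(3 * k, 4 * k))
    u, v = 4 * k, 4 * k + 1               # endpoints of the bottleneck edge
    w = lambda: random.randint(*weight_range)
    edges = {}
    # bipartite components, edges directed from Xi to Yi
    for x in X1:
        for y in Y1:
            edges[(x, y)] = w()
    for x in X2:
        for y in Y2:
            edges[(x, y)] = w()
    # every node in Y1 reaches u, and v reaches every node in X2,
    # so most shortest paths must cross the single edge (u, v)
    for y in Y1:
        edges[(y, u)] = w()
    for x in X2:
        edges[(v, x)] = w()
    edges[(u, v)] = w()
    return edges, (u, v)

def random_bottleneck_updates(edges, uv, length, weight_range=(1, 1000)):
    """Update sequence used on bottleneck inputs: repeatedly assign a
    fresh random weight to the bottleneck edge (u, v)."""
    return [(uv, random.randint(*weight_range)) for _ in range(length)]
```

Since every path from the first component to the second crosses (u, v), each weight update on that edge forces many shortest path changes.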
Bottleneck inputs are obtained by applying a sequence of random weight updates on the edge (u, v). This is done to force hard instances, which cause many changes in the solution.

2.2 Computing Platforms. To assess the experimental performance of our implementations on different architectures, we experimented on a variety of computing platforms:

· AMD Athlon - 1.8 GHz, 256KB L2 cache, 512MB RAM.
· AMD Athlon - 2.1 GHz, 256KB L2 cache, 768MB RAM.
· Intel Xeon - 500 MHz, 512KB L2 cache, 512MB RAM.
· Intel Pentium 4 - 2.0 GHz, 512KB L2 cache, 2GB RAM.
· Sun UltraSPARC IIi - 440 MHz, 2MB L2 cache, 256MB RAM.
· IBM Power 4 - 1.1 GHz, 1.4MB L2 cache, 32MB L3 cache, 64GB RAM.

On these platforms, we experimented with different operating systems (Linux kernel 2.2.18, Solaris 8, Windows XP Professional, AIX 5.2) and compilers (GNU gcc 2.95, IBM xlc 6.0, Microsoft Visual C++ 7, Metrowerks CodeWarrior 6). We also used different systems for monitoring memory accesses and for simulating cache effects (Valgrind, Cachegrind [26]). In our experiments we noticed that most of the relative behaviors of the implementations did not depend heavily on the architecture/operating system/compiler. Whenever this dependence was significant, we point it out explicitly.

3 Algorithms Implemented

In this section we briefly describe the algorithms considered in our experimental analysis, addressing implementation and performance tuning issues. Table 1 lists the algorithms under investigation and their theoretical asymptotic bounds. The algorithms were implemented in C following a uniform programming style and using exactly the same data structures as basic building blocks (heaps, dynamic arrays, hash tables, graphs). Our goal was more to provide a unified experimental framework for comparing the relative efficiency of different algorithmic techniques than to produce heavily engineered codes.
We remark that the absolute performances of our codes might be further improved by reducing the library overhead due to, e.g., runtime checking and exception handling, and by using more engineered implementations of data structures (e.g., better heaps). In the code development process, we used the gprof profiling tool to identify hot spots in the code and the Cachegrind system to analyze cache effects, tuning the code accordingly. We also used the experimental results to identify a good setting of the relevant parameters of each implementation.

  Algorithm | Reference  | Update Time                        | Query Time | Space
  S-DIJ     | [8, 11]    | O(mn + n^2 log n)                  | O(1)       | O(n^2)
  S-LSP     | This paper | O(|LSP| + n^2 log n), |LSP| <= mn  | O(1)       | O(n^2)
  D-RRL     | [25]       | O(mn + n^2 log n)                  | O(1)       | O(n^2)
  D-KIN     | [18]       | Õ(n^2.5 sqrt(C)), C = max weight   | O(1)       | Õ(n^2.5 sqrt(C))
  D-LHP     | [7]        | Õ(n^2)                             | O(1)       | Õ(mn)

Table 1: Algorithms under investigation.

The source code of our implementations is distributed under the terms of the GNU Lesser General Public License and is available at the URL http://www.dis.uniroma1.it/~demetres/experim/dsp/. We used Dijkstra's single-source algorithm [8, 11] repeated from every node as a static reference code (S-DIJ) to evaluate the performances of the other algorithms in a dynamic setting. Algorithms D-RRL, D-KIN, D-LHP, and S-LSP are described below.

3.1 The Algorithm by Ramalingam and Reps (D-RRL). The algorithm by Ramalingam and Reps [25] works on a directed graph G with strictly positive real-valued edge weights and maintains information about the shortest paths from a given node s. In particular, it maintains a directed acyclic subgraph of G containing all the edges that belong to at least one shortest path from s. To support an edge weight update, the algorithm first computes the subset of nodes whose distance from s is affected by the update. Distances are then updated by running a Dijkstra-like procedure on those nodes.
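The flavor of such an update can be illustrated with a small sketch. The code below is our own simplified rendition (not the authors' C implementation; all names are ours), closer in spirit to the lighter tree-based variant discussed next: on a weight decrease it propagates improvements forward from v, and on a weight increase of a tree edge it recomputes distances only for the affected subtree:

```python
import heapq

INF = float('inf')

def dijkstra(adj, s):
    """adj[x] is a dict mapping y to the weight of edge (x, y)."""
    n = len(adj)
    dist, parent = [INF] * n, [None] * n
    dist[s] = 0
    pq = [(0, s)]
    while pq:
        d, x = heapq.heappop(pq)
        if d > dist[x]:
            continue
        for y, w in adj[x].items():
            if d + w < dist[y]:
                dist[y], parent[y] = d + w, x
                heapq.heappush(pq, (dist[y], y))
    return dist, parent

def _relax_from(adj, dist, parent, starts):
    """Dijkstra-like relaxation seeded with the given nodes."""
    pq = [(dist[x], x) for x in starts if dist[x] < INF]
    heapq.heapify(pq)
    while pq:
        d, x = heapq.heappop(pq)
        if d > dist[x]:
            continue
        for y, w in adj[x].items():
            if d + w < dist[y]:
                dist[y], parent[y] = d + w, x
                heapq.heappush(pq, (dist[y], y))

def update_edge(adj, dist, parent, u, v, new_w):
    """Set w(u, v) = new_w and repair the shortest paths tree encoded by
    dist/parent, running the Dijkstra-like phase only on the affected region."""
    old_w = adj[u].get(v, INF)
    adj[u][v] = new_w
    if new_w <= old_w:
        # decrease (or insertion): propagate improvements forward from v
        if dist[u] + new_w < dist[v]:
            dist[v], parent[v] = dist[u] + new_w, u
            _relax_from(adj, dist, parent, [v])
    elif parent[v] == u:
        # increase on a tree edge: only the subtree rooted at v is affected
        affected = {v}
        grew = True
        while grew:
            grew = False
            for x in range(len(adj)):
                if x not in affected and parent[x] in affected:
                    affected.add(x)
                    grew = True
        for x in affected:
            dist[x], parent[x] = INF, None
        # re-seed affected nodes from their best unaffected in-neighbors
        for p in range(len(adj)):
            if p in affected:
                continue
            for x, w in adj[p].items():
                if x in affected and dist[p] + w < dist[x]:
                    dist[x], parent[x] = dist[p] + w, p
        _relax_from(adj, dist, parent, affected)
```

A fresh Dijkstra run after each update can serve as a correctness check. The sketch omits, e.g., edge deletions, which can be modeled here as increases to an infinite weight.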
Maintaining single-source shortest paths from s requires O(m_a + n_a log n_a) time per update, where n_a is the number of nodes affected by the update and m_a is the number of edges having at least one affected endpoint. This yields an O(mn + n^2 log n) time bound in the worst case for dynamic all pairs shortest paths.

Our implementation. The algorithm by Ramalingam and Reps is known to be very fast in practice (see, e.g., [5, 10, 12]). In our implementation, we considered a further simplified version of the original algorithm, which was previously described in [5]. This lighter version, which we refer to as D-RRL, maintains a shortest paths tree instead of a directed acyclic subgraph, and does not spend time in identifying nodes that do not change distance from the source after an update. If the graph has only a few different paths having the same weight (as in the real-world inputs we considered), this variant can be much faster (see [5]) than the original algorithm in [25].

3.2 The Algorithm by King (D-KIN). The dynamic shortest paths algorithm by King [18] works on directed graphs with small integer weights. The main idea behind the algorithm is to maintain dynamically all pairs shortest paths up to a distance d, and to recompute longer shortest paths from scratch at each update by stitching together shortest paths of length at most d. To maintain shortest paths up to distance d, the algorithm keeps a pair of in/out shortest paths trees IN(v) and OUT(v) of depth d rooted at each node v. Trees IN(v) and OUT(v) are maintained with a variant of the decremental data structure by Even and Shiloach [9]. It is easy to prove that, if the distance d_xy between any pair of nodes x and y is at most d, then d_xy is equal to the minimum of d_xv + d_vy over all nodes v such that x ∈ IN(v) and y ∈ OUT(v).
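This property is easy to check empirically on small graphs. The sketch below (our own illustration, with assumed names, not code from [18]) computes the trees OUT(v) and IN(v) as depth-d truncated Dijkstra searches on the graph and its reverse, computes reference distances with Floyd-Warshall, and verifies the identity for all pairs at distance at most d:

```python
import heapq
import itertools
import random

INF = float('inf')

def truncated_dijkstra(adj, s, d):
    """Single-source distances from s, discarding anything beyond d."""
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        k, x = heapq.heappop(pq)
        if k > dist[x]:
            continue
        for y, w in adj[x].items():
            if k + w <= d and k + w < dist.get(y, INF):
                dist[y] = k + w
                heapq.heappush(pq, (dist[y], y))
    return dist

# small random digraph with integer weights in [1, 5]
random.seed(7)
n, d = 40, 15
adj = [dict() for _ in range(n)]
radj = [dict() for _ in range(n)]
for _ in range(160):
    x, y = random.randrange(n), random.randrange(n)
    if x != y:
        w = random.randint(1, 5)
        adj[x][y] = w
        radj[y][x] = w

OUT = [truncated_dijkstra(adj, v, d) for v in range(n)]    # within distance d from v
IN = [truncated_dijkstra(radj, v, d) for v in range(n)]    # within distance d to v

# reference all pairs distances via Floyd-Warshall
D = [[INF] * n for _ in range(n)]
for x in range(n):
    D[x][x] = 0
for x in range(n):
    for y, w in adj[x].items():
        D[x][y] = min(D[x][y], w)
for k, x, y in itertools.product(range(n), repeat=3):
    D[x][y] = min(D[x][y], D[x][k] + D[k][y])

# the stitching identity for pairs at distance at most d
ok = all(
    D[x][y] == min(IN[v].get(x, INF) + OUT[v].get(y, INF) for v in range(n))
    for x in range(n) for y in range(n) if D[x][y] <= d
)
```

Note that taking v = x already attains the minimum, since x ∈ IN(x) trivially and y ∈ OUT(x) whenever d_xy <= d; the point of the data structure is that maintaining the Θ(n) truncated trees is cheaper than maintaining full trees.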
To support updates, insertions/decreases of edges around a node v are handled by rebuilding only IN(v) and OUT(v), while edge deletions/increases are performed via operations on any trees that contain them. The amortized cost of such updates is O(n^2 d) per operation. To maintain shortest paths longer than d, the algorithm exploits the following property (see, e.g., [15]): if H is a random subset of Θ((C n log n)/d) nodes in the graph, then the probability of finding more than d consecutive nodes in a path, none of which are from H, is very small. Thus, if we look at nodes in H as "hubs", then any shortest path from x to y of length greater than d can be obtained by stitching together shortest subpaths of length at most d that first go from x to a node in H, then jump between nodes in H, and eventually reach y from a node in H. This can be done by first computing shortest paths only between nodes in H using any static all pairs shortest paths algorithm, and then by extending them at both endpoints with shortest paths of length at most d to reach all other nodes. This stitching operation requires O(n^2 |H|) = O((C n^3 log n)/d) time. Choosing d = sqrt(C n log n) yields an O(n^2.5 sqrt(C log n)) amortized update time. Since H can also be computed deterministically, the algorithm can be derandomized. While the original algorithm in [18] requires O(n^3) space, in [19] King and Thorup showed how to reduce the space to O(n^2.5 sqrt(C)). The interested reader can find the low-level details of the algorithm in [18, 19].

Our implementation.
Following the techniques for space reduction [19], we implemented a simpler randomized version of the algorithm by King, which we refer to as D-KIN. The code is divided into three modular building blocks: (i) an increase-only data structure for maintaining single-source shortest paths up to distance d; (ii) a forest of in/out trees for maintaining all pairs shortest paths up to distance d under fully dynamic update sequences; (iii) a data structure that maintains a forest of in/out trees and performs stitching to rebuild shortest paths longer than d.

Figure 1: Algorithm D-KIN: dependence of the time per update and space on the maximum edge weight C. The experiment was done on an AMD Athlon, 2.1 GHz, 256KB L2 cache, 768MB RAM. The update sequence contained 100 evenly mixed operations on a random graph with 500 nodes and 1500 edges.

Since the maximum edge weight C is involved in the theoretical bounds, we conducted some experiments aimed at evaluating the effect of increasing C on the update time and space. For instance, the experiment considered in Figure 1 shows that going from a random graph with 500 nodes, 1500 edges, and maximum weight C = 10 to a graph with the same size and C = 100 can degrade performance by a factor of 10, while requiring three times as much space. Note that, while the space is growing as predicted by the theoretical analysis, the time bound appears to grow linearly in the given range of C: this is most probably due to the O(n^2.5 sqrt(C log n)) bound not being tight for small values of C. Since C can be as high as 20,000 in Internet graphs and as high as 200,000 in US road networks, we could not run D-KIN on these real-world inputs.

Profiling information revealed that with the standard setting |H| = d = sqrt(C n log n), typically more than 75% of the update time is required by the path stitching procedure. We thus expected a reduction of |H| to yield substantial benefits in the running time, and this was fully confirmed by our experiments. As an example, Figure 2 (a) shows the result of an experiment on a random graph with 500 nodes, 1500 edges, and weights in [0, 5], where we kept |H| · d = C n log n and set d = (1 - α) · sqrt(C n log n) + α · C n, for α ∈ [0, 0.05]:

  (a) alpha       0.00   0.01   0.02   0.03   0.04   0.05
      time (sec)  2.59   1.41   1.11   1.01   0.89   0.85
      space (MB)  135    181    227    274    320    367
      avg. error  0      0      0      0      0      0

  (b) beta        1.0    0.9    0.8    0.7    0.6    0.5    0.4    0.3
      time (sec)  2.59   2.06   1.58   1.51   1.15   0.98   0.94   0.57
      space (MB)  135    123    112    101    90     79     67     54
      avg. error  0      0      0      0      0      3.431  4.468  6.904

Figure 2: Algorithm D-KIN: parameter tuning with α and β. The experiment was done on an AMD Athlon, 1.8 GHz, 256KB L2 cache, 512MB RAM. The update sequence contained 100 evenly mixed operations.

The experiment showed that going from α = 0 to α = 0.05, which corresponds to increasing d and decreasing |H| by 2.5×, can yield a speedup of 3× at the price of a space increase of 2.7×. For larger values of α the time improvements were negligible, while space kept on increasing substantially. As predicted by the long paths property of [15], with this setting of the parameters we found no errors in the maintained distances. Another interesting question was how far we can decrease both |H| and d without incurring errors. To this aim, we ran an experiment with |H| = d = β · sqrt(C n log n), for β ∈ (0, 1], on a random graph with 500 nodes, 1500 edges, and weights in [0, 5].
We found out that we can nearly halve both |H| and d (i.e., β = 0.6) without incurring distance errors, with substantial time and space improvements (see Figure 2 (b)). For smaller values of β, we started getting errors. In general, we could see that the larger the number of nodes or edges in the graph, the smaller were the values of β for which errors started to appear. Since space was crucial to experiment with reasonably large graphs, we preferred to tune β rather than α in the D-KIN code used in our experiments.

Figure 3: Algorithm D-LHP: effects of varying the degree of smoothing on the Utah road network initialized from an empty graph via edge insertions in order to start with many historical shortest paths. The experiment was done on an Intel Xeon 500MHz, 512KB L2 cache, 512MB RAM. The update sequence contained 1,000 evenly mixed operations (after the initialization).

3.3 The Algorithm by Demetrescu and Italiano (D-LHP). The dynamic shortest path algorithm in [7] works on directed graphs with nonnegative real-valued edge weights and hinges on the notion of locally shortest paths (LSP): we say that a path π is locally shortest if every proper subpath of π is a shortest path (note that π is not necessarily a shortest path). A historical shortest path is a path that has been a shortest path at some point during the sequence of updates, and none of its edges has been updated since then. We further say that a path π in a graph is locally historical if every proper subpath of π is a historical shortest path.
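These definitions can be made concrete with a small checker. The sketch below is our own illustration (not code from [7]); it tests local shortness against a precomputed distance matrix, relying on the standard observation that it suffices to check the two maximal proper subpaths, since every other proper subpath is contained in one of them and subpaths of shortest paths are themselves shortest:

```python
INF = float('inf')

def path_weight(adj, path):
    """Total weight of a node sequence; adj[x][y] is the weight of (x, y)."""
    return sum(adj[x][y] for x, y in zip(path, path[1:]))

def is_shortest(adj, dist, path):
    """A path is shortest iff its weight equals the distance between its endpoints."""
    return path_weight(adj, path) == dist[path[0]][path[-1]]

def is_locally_shortest(adj, dist, path):
    """Locally shortest: every proper subpath is a shortest path.  It is
    enough to check the path minus its last node and minus its first node."""
    if len(path) <= 2:
        return True   # single nodes and single edges are trivially locally shortest
    return is_shortest(adj, dist, path[:-1]) and is_shortest(adj, dist, path[1:])

def all_pairs(adj):
    """Floyd-Warshall, adequate for tiny examples."""
    n = len(adj)
    D = [[INF] * n for _ in range(n)]
    for x in range(n):
        D[x][x] = 0
        for y, w in adj[x].items():
            D[x][y] = min(D[x][y], w)
    for k in range(n):
        for x in range(n):
            for y in range(n):
                D[x][y] = min(D[x][y], D[x][k] + D[k][y])
    return D

# example: [0, 1, 2] is locally shortest (both of its edges are shortest
# paths) but not shortest, because the direct edge (0, 2) is cheaper
adj = [{1: 1, 2: 1}, {2: 1}, {}]
dist = all_pairs(adj)
```

The example graph shows the key point of the definition: a path can be locally shortest without being shortest.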
The main idea behind the algorithm is to maintain dynamically the set of locally historical paths (LHP), which include locally shortest paths and shortest paths as special cases. The following theorem from [7] bounds the number of paths that become locally historical after each update:

Theorem 3.1. Let G be a graph subject to a sequence of update operations. If at any time throughout the sequence of updates there are at most z historical shortest paths between each pair of nodes, then the amortized number of paths that become locally historical at each update is O(z n^2).

To keep changes in locally historical paths small, it is then desirable to have as few historical shortest paths as possible throughout the sequence of updates. To do so, the algorithm transforms on the fly the input update sequence into a slightly longer equivalent sequence that generates only a few historical shortest paths. In particular, it uses a simple smoothing strategy that, given any update sequence Σ of length k, produces an operationally equivalent sequence F(Σ) of length O(k log k) that yields only O(log k) historical shortest paths between each pair of nodes in the graph (see [7]). This technique implies that only O(n^2 log k) paths become locally historical at each update in the smoothed sequence F(Σ).

To support an edge weight update operation, the algorithm works in two phases. It first removes all maintained paths that contain the updated edge. Then it runs a dynamic modification of Dijkstra's algorithm [8] in parallel from all nodes: at each step a shortest path with minimum weight is extracted from a priority queue and is combined with existing historical shortest paths to form new locally historical paths. The update algorithm spends O(log n) time for each of the O(z n^2) new locally historical paths.
Since the smoothing strategy lets z = O(log n) and increases the length of the sequence of updates by an additional O(log n) factor, this yields O(n^2 log^3 n) amortized time per update. Even with smoothing, there can be as many as O(mn log n) locally historical paths in a graph: this implies that the space required by the algorithm is O(mn log n) in the worst case. We refer the interested reader to [7] for the low-level details of the method.

Our implementation. Our implementation (D-LHP) followed closely the theoretical algorithm in [7], with one additional heuristic: we stopped performing smoothing whenever the ratio between the number of created LHPs and deleted LHPs exceeded a certain smoothing threshold. In particular, when the smoothing threshold is 0, no smoothing is in place, while a smoothing threshold equal to 1 corresponds to full smoothing. We expected that the main objective of smoothing was to ensure the theoretical worst-case bounds, but we were not fully convinced of its practical significance. This was confirmed by our experiments: indeed, smoothing did not appear to be of any benefit on random and real-world inputs, as they seemed to contain no pathological instances.

Figure 4: Counting locally shortest paths in random and real-world graphs.

To assess the effects of smoothing, we performed another experiment by changing the way the data structure initialization was done. Namely, rather than starting from an initial data structure having only one (historical) shortest path per pair, we built the graph via edge insertions, in order to force artificially the data structure to contain initially a very high number of historical shortest paths. In this scenario, our experiments showed that the main positive effect of smoothing was to reduce the space usage, but this always came at the price of a running time overhead. For example, Figure 3 illustrates the effects of smoothing on the Utah road network (269 nodes and 742 edges) initialized from an empty graph via edge insertions. This suggests that a certain degree of smoothing can be useful only when there can be a high number of historical shortest paths in the data structure. Since this possibility did not appear frequently in our experiments, we used a D-LHP code with smoothing threshold set to 0 (i.e., no smoothing).

3.4 A New Static Algorithm Based on Locally Shortest Paths (S-LSP). Another interesting issue to investigate experimentally was related to the number of locally shortest paths in a graph. In the worst case, we know that they can be as many as O(mn). But how many of them can we have in "typical" instances? Our experiments showed that in both random and real-world graphs they tend to be very close to n^2, i.e., there is typically only one locally shortest path between any pair of nodes, which is also a shortest path (see Figure 4). This suggested that locally shortest paths could be exploited also for static all pairs shortest path algorithms.

Figure 5: S-LSP versus S-DIJ on a random graph with 500 nodes, increasing densities, and edge weights in the range [1, 1000]. The experiment was done on an Intel Xeon 500MHz, 512KB L2 cache, 512MB RAM. The update sequence contained 1,000 evenly mixed operations.

In particular, we investigated how to deploy locally shortest paths in Dijkstra-like algorithms in order to reduce substantially the total number of edge scans, which is known to be the performance bottleneck for shortest path implementations on dense graphs. We designed a static algorithm, which we refer to as S-LSP, that essentially runs Dijkstra's algorithm in parallel from all nodes and scans only edges in locally shortest paths to reduce the overall work. The running time of S-LSP is O(|LSP| + n^2 log n), where |LSP| is the number of locally shortest paths in the graph. This can yield substantial time savings whenever |LSP| << mn.

Figure 6: Overall comparison on random graphs with 500 nodes, increasing densities, and edge weights in the range [1, 1000]. D-KIN was run with β = 0.6, and integer weights in the smaller range [1, 5]. The experiment was done on an Intel Xeon 500MHz, 512KB L2 cache, 512MB RAM. The update sequence contained 1,000 evenly mixed operations and running times are reported on a logarithmic scale.
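As an illustration of how |LSP| can be measured, the sketch below (our own code, not the S-LSP implementation) counts locally shortest paths under the assumption of unique shortest paths, e.g., with random real-valued weights: such a path is then determined by its endpoints x, y and the tail b of its last edge, since its prefix must be the unique shortest path from x to b and its suffix (dropping x) must be a shortest path to y. A brute-force enumerator is included as a cross-check:

```python
import heapq

INF = float('inf')

def sssp(adj, s):
    """Dijkstra with parent pointers; adj[x][y] is the weight of (x, y)."""
    n = len(adj)
    dist, parent = [INF] * n, [None] * n
    dist[s] = 0.0
    pq = [(0.0, s)]
    while pq:
        d, x = heapq.heappop(pq)
        if d > dist[x]:
            continue
        for y, w in adj[x].items():
            if d + w < dist[y]:
                dist[y], parent[y] = d + w, x
                heapq.heappush(pq, (dist[y], y))
    return dist, parent

def count_lsp(adj):
    """Count paths with >= 1 edge that are locally shortest, assuming
    unique shortest paths (positive weights keep the paths simple)."""
    n = len(adj)
    dist, parent = zip(*(sssp(adj, s) for s in range(n)))
    count = 0
    for x in range(n):
        for b in range(n):
            if dist[x][b] == INF:
                continue
            for y, w in adj[b].items():
                if y == x:
                    continue            # would close a cycle, not a path
                if b == x:
                    count += 1          # a single edge is always locally shortest
                    continue
                x2 = b                  # second node on the shortest path x -> b
                while parent[x][x2] != x:
                    x2 = parent[x][x2]
                suffix_w = (dist[x][b] - dist[x][x2]) + w
                if abs(suffix_w - dist[x2][y]) < 1e-9:
                    count += 1
    return count

def count_lsp_bruteforce(adj):
    """Reference count: grow simple paths, extending only locally shortest
    ones (every prefix of a locally shortest path is itself one)."""
    n = len(adj)
    dist = [sssp(adj, s)[0] for s in range(n)]
    def weight(p):
        return sum(adj[a][b] for a, b in zip(p, p[1:]))
    def shortest(p):
        return abs(weight(p) - dist[p[0]][p[-1]]) < 1e-9
    def grow(p, seen):
        c = 0
        for y in adj[p[-1]]:
            if y in seen:
                continue
            q = p + [y]
            if len(q) == 2 or (shortest(q[:-1]) and shortest(q[1:])):
                c += 1 + grow(q, seen | {y})
        return c
    return sum(grow([x], {x}) for x in range(n))
```

On random graphs with real weights, counts produced this way can be compared against n(n - 1) to reproduce the kind of measurement summarized in Figure 4.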
For instance, in Figure 5 we report the results of an experiment comparing S-LSP and Dijkstra's algorithm (S-DIJ) on random graphs with 500 nodes and increasing edge density: as can be seen, while on sparse graphs the data structure overhead in the two algorithms is significant, as the graph becomes denser and edge scanning becomes more relevant in the running times of the algorithms, S-LSP can be substantially faster than S-DIJ. Note that S-LSP follows a similar approach to the Hidden Paths Algorithm by Karger et al. [17]: their algorithm runs in O(m* n + n^2 log n) time, where m* is the number of edges that participate in shortest paths. By the optimal-substructure property of locally shortest paths, it is easy to see that |LSP| <= m* n <= mn. We remark that, in the case of unweighted graphs, m* = m, while |LSP| might be much smaller than m* n.

4 Discussion

Random inputs. Our experiments with random graphs pointed out that D-LHP and D-RRL are the fastest implementations: in this scenario they can be faster than the static algorithms (S-DIJ and S-LSP) by two to four orders of magnitude, depending on the inputs. Algorithm D-KIN was only at most one order of magnitude faster than the static algorithms: this seems to be mainly due to the high overhead caused by maintaining the forest of in/out trees and by stitching paths in the data structure. For sparse random graphs, the running times of D-RRL and D-LHP are very close, and the underlying computing platform plays a role in deciding which one is to be used (see later for a discussion on this). As the graph density increases, and consequently the computational savings of locally shortest paths become more significant (as illustrated also in Figure 4), D-LHP becomes the most efficient choice on all the platforms we considered. This can be seen in Figure 6, which depicts the result of an experiment on random graphs with 500 nodes, increasing edge densities, and edge weights in the range [1, 1000].
Only the experiment on D-KIN was done with integer weights in the smaller range [1, 5], to avoid the performance degradation of this algorithm for large values of C. Figure 6 also reports the space usage of the dynamic implementations: D-RRL uses simple data structures, retains little information throughout the sequence of updates, and thus can save space with respect to D-KIN and D-LHP. Note that, as correctly predicted by the theory, the amount of memory used by D-RRL (O(n^2)) and by D-KIN (Õ(n^2.5 sqrt(C))) does not depend substantially upon the graph density, while the space requirement of D-LHP (O(|LHP| + n^2)) depends on the number of locally historical paths and thus varies with the graph and with its density.

Real-world inputs. The same general trend can be observed also in our experiments with real-world graphs: in particular, D-RRL and D-LHP are much faster than the other implementations, and their running times may get very close, so that their relative performance depends on the underlying computing platform. In particular, Figure 7 (a) shows the results of an experiment on road networks run on an IBM Power 4 (1.4MB L2 cache, 32MB L3 cache, 64GB RAM), while Figure 7 (b) shows the results of the same experiment run on an UltraSPARC IIi (2MB L2 cache, 512MB RAM).

Figure 7: D-LHP versus D-RRL in real-world graphs under sequences of 1,000 evenly mixed updates. (a), (b) Update times of D-LHP and D-RRL on the US road networks. (c) Space usage of D-LHP and D-RRL on the US road networks. (d) Update times of D-LHP and D-RRL on the Internet networks.

In both cases, we have that the larger the graph, the more likely D-RRL is to become faster than D-LHP. However, with a better memory system (e.g., on the IBM Power 4) D-LHP tends to remain competitive on larger instances. Our experiments on Internet graphs (see, e.g., Figure 7 (d)) confirmed this trend. Figure 7 (c) shows the space usage of the two implementations on the road networks.

Figure 8 (a) plots the relative time performance of the two implementations (i.e., running time of D-RRL over running time of D-LHP) on the US road networks measured on platforms with different cache sizes: as can be clearly seen from this figure, the speed-up of D-LHP over D-RRL tends to increase with increasing cache sizes. To investigate this phenomenon further, we performed an extensive simulation of cache miss ratios with the Cachegrind tool [26]; the simulation results on the Colorado road network are reported in Figure 8 (b). The time ratio measured on different platforms seems to follow the same general trend as the simulated cache miss ratio, which indeed suggests that the relative performance of D-RRL and D-LHP on different machines depends on their different cache usage. As a possible explanation for this, we observe that D-RRL, besides requiring less space than D-LHP, tends to have a more regular pattern of access to its data, since it recomputes single-source shortest paths starting repeatedly from every node in the graph, as opposed to D-LHP, which updates all shortest paths in parallel, and thus needs to access more global data structures. Consequently, D-RRL seems to suffer less from reduced cache sizes.

Bottleneck inputs.
Bottleneck inputs force a pathological sequence of updates, changing the weight of many shortest paths and causing the algorithm to rescan a constant fraction of the graph during each update.

Figure 8: Studying cache effects. (a) Performance ratio D-RRL/D-LHP on the US road networks using architectures with different cache sizes. (b) Simulated cache miss ratio D-RRL/D-LHP on the Colorado road network.

Figure 9 shows the result of an experiment with bottleneck graphs of different densities. As can be seen from this figure, D-RRL is hit pretty badly by those inputs, and becomes even slower than the static implementation of Dijkstra (S-DIJ). This reflects the fact that the worst-case update time of the algorithm by Ramalingam and Reps is asymptotically the same as the static algorithm, and thus the performance of D-RRL can be quite bad on hard instances. On these inputs, on the other hand, D-LHP is faster than both D-RRL and S-DIJ, but becomes comparable to its static version S-LSP. This can be explained by noting that for those inputs most of the computation savings of D-LHP come from locally shortest paths, which are O(n^2) in this case.
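Recall that D-LHP is built on the notion of locally shortest paths: a path is locally shortest if its two maximal proper subpaths are themselves shortest paths [7]. The following brute-force check on a small toy digraph (an illustrative sketch of the definition only, not of the D-LHP data structures; the graph is made up for the example) shows that every shortest path is locally shortest, while the converse need not hold:

```python
# Toy weighted digraph: (u, v) -> weight. Chosen for illustration only.
EDGES = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 2): 3, (1, 3): 3}
N = 4

def floyd_warshall(n, edges):
    """All-pairs shortest distances by Floyd-Warshall."""
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        d[u][v] = min(d[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def simple_paths(n, edges):
    """All simple directed paths with at least one edge."""
    out = []
    def dfs(path):
        if len(path) > 1:
            out.append(list(path))
        for v in range(n):
            if (path[-1], v) in edges and v not in path:
                dfs(path + [v])
    for u in range(n):
        dfs([u])
    return out

def weight(path, edges):
    return sum(edges[(path[i], path[i + 1])] for i in range(len(path) - 1))

def is_locally_shortest(path, edges, d):
    """Both maximal proper subpaths (drop last node / drop first node)
    must be shortest paths between their endpoints."""
    if len(path) == 2:
        return True  # a single edge: proper subpaths are single vertices
    return (weight(path[:-1], edges) == d[path[0]][path[-2]]
            and weight(path[1:], edges) == d[path[1]][path[-1]])

d = floyd_warshall(N, EDGES)
paths = simple_paths(N, EDGES)
shortest = [p for p in paths if weight(p, EDGES) == d[p[0]][p[-1]]]
locally_shortest = [p for p in paths if is_locally_shortest(p, EDGES, d)]
# Every shortest path is locally shortest; the converse fails, e.g. for [0, 2].
assert all(p in locally_shortest for p in shortest)
print(len(paths), len(shortest), len(locally_shortest))  # 10 6 8
```

On this 4-node graph, 8 of the 10 simple paths are locally shortest but only 6 are shortest; on bottleneck inputs the gap between the two sets shrinks, which is why D-LHP loses its edge over its static counterpart S-LSP there.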
Since bottleneck inputs force scanning of a great portion of the graph, D-LHP and S-LSP end up performing similar tasks on those test sets.

5 Concluding Remarks

In this paper we have implemented, engineered and experimentally evaluated three different dynamic shortest path algorithms: D-RRL [25], D-KIN [18] and D-LHP [7]. In our experiments, implementing a dynamic algorithm seemed really worth the effort, as all the dynamic implementations can be much faster than recomputing a solution from scratch with a static algorithm: D-RRL and D-LHP can be up to 10,000 times faster than a repeated application of a static algorithm, while D-KIN can be around 10 times faster. The algorithm by Ramalingam and Reps (D-RRL) is basically a variant of Dijkstra's algorithm, and works only on the portion of the graph that changes throughout the updates. Since it uses simple data structures, it seems very hard to beat in situations where the updates produce a very small change in the solution. However, its worst-case running time is asymptotically the same as Dijkstra's algorithm, and thus it can be quite bad on pathological worst-case instances. The algorithm of Demetrescu and Italiano (D-LHP) uses more sophisticated data structures than D-RRL, but it still works only on the portion of the graph that is modified by the updates. It can be as fast as D-RRL on sparse graphs, and it becomes substantially faster than D-RRL on dense graphs and on worst-case inputs, where locally shortest paths seem to yield better payoffs.
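The output-bounded behavior just described for D-RRL — a Dijkstra-like scan that touches only vertices whose distances actually change — can be sketched for the simplest case, a single-source tree under an edge weight decrease. This is a simplified illustration under that assumption, not the full algorithm of [25], which also handles deletions and weight increases:

```python
import heapq

def dijkstra(adj, s):
    """Plain Dijkstra from source s; adj maps node -> {neighbor: weight}."""
    dist = {v: float("inf") for v in adj}
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj[x].items():
            if d + w < dist[y]:
                dist[y] = d + w
                heapq.heappush(heap, (dist[y], y))
    return dist

def decrease_edge(adj, dist, u, v, w_new):
    """Repair single-source distances after lowering edge (u, v) to w_new,
    scanning only the region of the graph whose distances improve.
    Returns the number of distance improvements performed."""
    adj[u][v] = w_new
    if dist[u] + w_new >= dist[v]:
        return 0  # no shortest path changes; nothing to rescan
    dist[v] = dist[u] + w_new
    heap, touched = [(dist[v], v)], 1
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj[x].items():
            if d + w < dist[y]:
                dist[y] = d + w
                touched += 1
                heapq.heappush(heap, (dist[y], y))
    return touched

# Toy demo: decrease edge (0, 2) from 5 to 1 and repair distances from node 0.
adj = {0: {1: 1, 2: 5}, 1: {2: 1}, 2: {3: 1}, 3: {}}
dist = dijkstra(adj, 0)                   # {0: 0, 1: 1, 2: 2, 3: 3}
changed = decrease_edge(adj, dist, 0, 2, 1)
assert dist == dijkstra(adj, 0)           # agrees with full recomputation
```

The work of such an update is proportional to the size of the change in the output rather than to the size of the graph, which is consistent with D-RRL excelling when updates barely affect the solution and degrading on bottleneck inputs that change many shortest paths at once.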
The main difference in performance with respect to D-RRL on sparse graphs seems related to memory issues, i.e., to the algorithms' space usage and pattern of access to data: in particular, (1) D-RRL is likely to become faster than D-LHP as the number of nodes increases; and (2) in our experiments, D-LHP ran faster on platforms with a good memory hierarchy (i.e., large cache and/or memory bandwidth), while D-RRL seemed preferable on platforms with small cache and/or small memory bandwidth. Overall, D-LHP proved to be the most robust implementation on different inputs among the ones we tested.

Figure 9: Experiment on bottleneck graphs of different sizes. This experiment was done on an IBM Power 4 (1.4MB L2 cache, 32MB L3 cache, 64GB RAM). The update sequence contained 1,000 evenly mixed operations.

As a side effect, we derived from D-LHP a new static algorithm, S-LSP, which can run faster than Dijkstra's algorithm on dense graphs, since in practice it substantially reduces the total number of edges scanned. One issue that seems to deserve further theoretical and empirical study is memory usage. To answer queries fast, all the algorithms maintain the all pairs shortest paths matrix explicitly. This makes them hit a "memory wall" very soon on today's computing platforms, thus substantially limiting the maximum problem size that can be solved in practice. To give an example, if an implementation requires around 100 bytes per pair of nodes in the graph (for instance, the space usage of D-LHP is roughly 80 bytes per locally historical path, plus 24 bytes per pair of nodes), then on a memory system with 10 GB of RAM we can only solve instances of up to 10,000 nodes without incurring memory swap problems. These were indeed the largest graphs that we were able to consider in our computational study.

Acknowledgments. We are indebted to the anonymous referees for many useful comments.
We deeply acknowledge the generous hospitality of Federico Massaioli and CASPUR (Consorzio interuniversitario per le Applicazioni di Supercalcolo Per Università e Ricerca), which let us run experiments on some of their machines.

References

[1] D. Alberts, G. Cattaneo, and G.F. Italiano. An empirical study of dynamic graph algorithms. ACM Journal on Experimental Algorithmics, 2(5), 1997.
[2] G. Amato, G. Cattaneo, and G.F. Italiano. Experimental analysis of dynamic minimum spanning tree algorithms. In Proc. SODA'97, pages 314–323, 1997.
[3] G. Cattaneo, P. Faruolo, U. Ferraro-Petrillo, and G.F. Italiano. Maintaining dynamic minimum spanning trees: An experimental study. In Proc. ALENEX'02, 2002.
[4] B.V. Cherkassky, A.V. Goldberg, and T. Radzik. Shortest paths algorithms: Theory and experimental evaluation. Math. Programming, 73:129–174, 1996.
[5] C. Demetrescu, D. Frigioni, A. Marchetti-Spaccamela, and U. Nanni. Maintaining shortest paths in digraphs with arbitrary arc weights: An experimental study. In Proc. WAE'00, 2000.
[6] C. Demetrescu and G.F. Italiano. Fully dynamic all pairs shortest paths with real edge weights. In Proc. FOCS'01, pages 260–267, 2001.
[7] C. Demetrescu and G.F. Italiano. A new approach to dynamic all pairs shortest paths. In Proc. STOC'03, pages 159–166, 2003.
[8] E.W. Dijkstra. A note on two problems in connexion with graphs. Numerische Math., 1:269–271, 1959.
[9] S. Even and Y. Shiloach. An on-line edge-deletion problem. Journal of the ACM, 28:1–4, 1981.
[10] B. Fortz and M. Thorup. Internet traffic engineering by optimizing OSPF weights. In Proc. INFOCOM'00, pages 519–528, 2000.
[11] M.L. Fredman and R.E. Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM, 34:596–615, 1987.
[12] D. Frigioni, M. Ioffreda, U. Nanni, and G. Pasqualone. Analysis of dynamic algorithms for the single source shortest path problem. ACM JEA, 3, 1998.
[13] D. Frigioni, T. Miller, U. Nanni, G. Pasqualone, G. Shaefer, and C.D. Zaroliagis. An experimental study of dynamic algorithms for directed graphs. In Proc. ESA'98, pages 320–331, 1998.
[14] A.V. Goldberg. Shortest path algorithms: Engineering aspects. In Proc. ISAAC'01, 2001.
[15] D.H. Greene and D.E. Knuth. Mathematics for the analysis of algorithms. Birkhäuser, 1982.
[16] R. Iyer, D.R. Karger, H.S. Rahul, and M. Thorup. An experimental study of poly-logarithmic fully-dynamic connectivity algorithms. In B.E. Moret and A.V. Goldberg, editors, Proc. ALENEX'00, 2000.
[17] D. Karger, D. Koller, and S.J. Phillips. Finding the hidden path: Time bounds for all-pairs shortest paths. SIAM Journal on Computing, 22(6):1199–1217, 1993.
[18] V. King. Fully dynamic algorithms for maintaining all-pairs shortest paths and transitive closure in digraphs. In Proc. FOCS'99, pages 81–99, 1999.
[19] V. King and M. Thorup. A space saving trick for directed dynamic transitive closure and shortest path algorithms. In Proc. COCOON'01, pages 268–277, 2001.
[20] P. Loubal. A network evaluation procedure. Highway Research Record 205, pages 96–109, 1967.
[21] J. Murchland. The effect of increasing or decreasing the length of a single arc on all shortest distances in a graph. Technical Report LBS-TNT-26, London Business School, Transport Network Theory Unit, 1967.
[22] P. Narvaez, K.-Y. Siu, and H.-Y. Tzeng. New dynamic algorithms for shortest path tree computation. IEEE/ACM Transactions on Networking, 8:734–746, 2000.
[23] P. Narvaez, K.-Y. Siu, and H.-Y. Tzeng. New dynamic SPT algorithm based on a ball-and-string model. IEEE/ACM Transactions on Networking, 9:706–718, 2001.
[24] G. Ramalingam. Bounded incremental computation. Lecture Notes in Computer Science 1089, 1996.
[25] G. Ramalingam and T. Reps. An incremental algorithm for a generalization of the shortest path problem. Journal of Algorithms, 21:267–305, 1996.
[26] J. Seward and N. Nethercote. Valgrind, an open-source memory debugger for x86-GNU/Linux. URL: http://developer.kde.org/~sewardj/.
[27] F. Zhan and C. Noon. Shortest path algorithms: An evaluation using real road networks. Transportation Science, 32(1):65–73, 1998.