Meldable RAM priority queues and minimum directed spanning trees

Ran Mendelson*   Mikkel Thorup†   Uri Zwick*

* School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel. E-mail: {ranm,zwick}@cs.tau.ac.il
† AT&T Labs - Research, 180 Park Avenue, Florham Park, NJ 07932, USA. E-mail: mthorup@research.att.com

Abstract

We consider the implementation of meldable priority queues with integer keys in the RAM model. We present two new general techniques for transforming non-meldable priority queues into meldable ones. These transformations can be described symbolically as:

    non-meldable priority queue + union-find  →  meldable priority queue

    non-meldable priority queue + slow meldable priority queue  →  faster meldable priority queue

Using the first transformation to combine a recent non-meldable RAM priority queue of Thorup with the classical union-find data structure we obtain a meldable RAM priority queue with an amortized cost of O(log log n · α(n)) per operation, where α(n) = α(n, n) is the inverse Ackermann function. Using instead a randomized priority queue of Han and Thorup we obtain an expected amortized cost of O(√(log log n) · α(n)) per operation. The second transformation yields slower meldable priority queues, but the obtained queues can support the insert, find-min and decrease-key operations in constant time. In particular, by combining a randomized "atomic-heap" of Thorup with, e.g., the classical Fibonacci heaps of Fredman and Tarjan, we obtain, for every fixed ε > 0, a meldable priority queue with an expected amortized cost of O(1) for each insert, find-min and decrease-key operation, and an expected amortized cost of O((log n)^{1/2+ε}) for each delete or meld operation.

Using the meldable priority queues of the first type, we obtain improved algorithms for finding minimum directed spanning trees in graphs with integer edge weights: a deterministic O(m · log log n · α(n)) time algorithm and a randomized O(m · √(log log n) · α(n)) expected time algorithm. These bounds improve, for very sparse graphs, on the O(m + n log n) running time of an algorithm by Gabow, Galil, Spencer and Tarjan that works for arbitrary edge weights.

1 Introduction

The problem of finding a minimum (or maximum) spanning tree in an undirected graph is an extremely well studied problem. Chazelle [Cha00] obtained a deterministic O(m α(m, n)) time algorithm for the problem. Karger et al. [KKT95] obtained a randomized algorithm that runs, with very high probability, in O(m + n) time. Both these algorithms are comparison based and can handle arbitrary weights. Fredman and Willard [FW94] obtained a deterministic O(m + n) time algorithm for the problem in the word RAM model.

The directed version of the minimum spanning tree problem received much less attention in recent years. This version comes in two, or in fact three, variants. We are either given a root r and asked to find a directed spanning tree of minimum weight rooted at r, or we are asked to find a directed spanning tree of minimum weight rooted at an arbitrary vertex. (It is assumed, in both cases, that the desired directed spanning trees do exist.) A very closely related problem is the problem of finding an optimum branching, i.e., a branching of maximum total weight. A branching B in a directed graph is a collection of edges that satisfies the following two properties: (i) B does not contain a cycle; (ii) no two edges of B are directed into the same vertex. It is not difficult to show that these three versions are essentially equivalent. We refer to the three of them collectively as the minimum directed spanning tree (MDST) problem. All results stated in this paper apply to all three versions.

Chu and Liu [CL65], Edmonds [Edm67] and Bock [Boc71] independently obtained an essentially identical polynomial time algorithm for the MDST problem.
(In the sequel we refer to this algorithm, unfairly, as Edmonds' algorithm.) Karp [Kar71] gave a simple formulation of the algorithm and a direct combinatorial proof of its correctness. All subsequent results, including ours, are just more efficient implementations of variants of this algorithm using improved priority queue data structures.

Priority queues are very basic data structures that are used by many algorithms. The most basic operations, supported by all priority queues, are insert, which inserts an element with an associated key into the priority queue, and extract-min, which returns the element with the smallest key currently contained in the queue and deletes it. These two operations can be used, for example, to sort n elements, by performing n insert operations followed by n extract-min operations. Most priority queues also support a delete operation, which deletes a given element, not necessarily one with the minimum key, from the queue, and find-min, which finds, but does not delete, an element with minimum key. Using the insert and delete operations we can easily implement a decrease-key operation, or more generally a change-key operation, that decreases, or arbitrarily changes, the key of a queue element. (We simply delete the element from the queue and re-insert it with its new key.) As the decrease-key operation is the bottleneck operation in efficient implementations of Dijkstra's single-source shortest paths algorithm [Dij59] and Prim's algorithm [Pri57] for finding a minimum spanning tree, many priority queues support this operation directly, sometimes in constant time.

An efficient implementation of Edmonds' [Edm67] algorithm for the MDST problem requires the maintenance of a collection of priority queues. In addition to the standard operations performed on individual priority queues in this collection, we also need, quite often, to meld, or unite, two priority queues from this collection. (We also need other priority queue operations to be mentioned later.) This provides a strong motivation for studying meldable priority queues.

Fibonacci heaps, developed by Fredman and Tarjan [FT87], are very elegant and efficient meldable priority queues. They support delete operations in O(log n) amortized time, and all other operations in O(1) amortized time, where n is the size of the priority queue from which an element is deleted. (For a general discussion of amortized time bounds, see [Tar85].) Brodal [Bro96] obtained a much more complicated data structure that supports delete operations in O(log n) worst-case time, and all other operations in O(1) worst-case time.
Both these data structures are comparison-based and can handle elements with arbitrary real keys. In this setting they are optimal.

Tarjan [Tar77], helped by Camerini et al. [CFM79], obtained an O(min{m log n, n²}) time implementation of Edmonds' algorithm using simple meldable priority queues that support all operations in O(log n) time. For reasons that will be explained in Section 5, using Fibonacci heaps to obtain improved algorithms for the MDST problem is a non-trivial task. Gabow et al. [GGS89] devised a complicated algorithm with a running time of O(m log log log_{m/n+2} n + n log n). Gabow et al. [GGST86] simplified and improved this result by obtaining a simpler, though still quite intricate, algorithm with a running time of O(m + n log n).

In this paper we obtain improved algorithms for the MDST problem with integer edge weights in the RAM model. Our improvements are obtained by constructing substantially more efficient meldable priority queues. The techniques that we use to obtain these meldable priority queues may be even more interesting than the concrete results that we obtain. We develop generic transformations that add a meld operation, at a very small extra cost, to priority queues that initially do not support this operation.

The rest of this paper is organized as follows. In the next section we briefly discuss the RAM model of computation and the priority queues currently available under it. In Section 3 we describe our first transformation. In Section 4 we describe our second transformation. In Section 5 we present our improved algorithms for the MDST problem. We end in Section 6 with some concluding remarks and open problems.

2 Priority queues in the RAM model

Our two transformations can be applied in almost any computational model. Their usefulness, however, manifests itself most clearly in the word RAM model. The word RAM is a fairly natural model of computation. It is the standard RAM model of computation (see, e.g., [AHU74]), which allows random memory access, together with the explicit assumption that each word of memory contains w bits. The only assumption made on w is that w ≥ log n, where n is the input size. The set of basic word operations that can be performed in constant time consists of the standard word operations available in typical programming languages (e.g., C): addition, multiplication, bit-wise and/or operations, shifts, and their like. Each number appearing in the input problem, or encountered during the solution of the problem, e.g., the key associated with an element of a priority queue or the weight of an edge in a graph, is assumed to be an integer that fits into a single word of memory. This is precisely the model considered by Fredman and Willard [FW93, FW94].

Many of our time bounds depend only on the size of the problem, and not on the word size w. Some of them do depend, however, on w, or equivalently on C = 2^w, the largest integer fitting into a single word. van Emde Boas et al. [van77, vKZ77] obtained a randomized priority queue that supports the basic operations in O(log log C) time. This data structure requires, however, C words of memory. An alternative is the Y-fast tries of Willard [Wil83] in conjunction with the dynamic perfect hashing methods of Dietzfelbinger et al. [DKM+94], which brings the space down to linear in n while supporting priority queue operations in amortized randomized time of O(log log C) per operation. Raman [Ram97], improving results of Ahuja et al. [AMOT90] and Cherkassky et al.
[CGS99], obtained a randomized priority queue that supports insert, find-min and decrease-key operations in O(1) time and delete operations in O((log C)^{1/4+ε}) time, for any fixed ε > 0. Thorup [Tho03] improved all these results by giving a deterministic, linear space, priority queue that supports insert, find-min and decrease-key operations in O(1) time and delete operations in O(log log C) time. None of these priority queues, however, supports the meld operation.

Fredman and Willard [FW94] constructed a deterministic data structure, called AF-heaps, that supports insert, find-min and decrease-key operations in O(1) time and delete operations in O(log n/log log n) time. Raman [Ram97] reduced the delete time to O(√(log n log log n)), or to O((log n)^{1/3+ε}) for any fixed ε > 0, using randomization. Thorup [Tho03] again improved all these results by giving a deterministic, linear space, priority queue that supports insert, find-min and decrease-key operations in O(1) time and delete operations in O(log log n) time. Again, these priority queues do not support the meld operation.

Thorup [Tho00, Tho02] obtained a general equivalence between priority queues and sorting. More specifically, he showed that if n elements can be sorted in O(n f(n)) time, then the basic priority queue operations can be implemented in O(f(n)) time. Using the recent O(n log log n) sorting algorithm of Han [Han02], this gives priority queues that support all operations in O(log log n) time. (This is weaker than Thorup's [Tho03] result cited above, as the insert, find-min and decrease-key operations are not supported in constant time.) Using a recent O(n √(log log n)) randomized sorting algorithm of Han and Thorup [HT02], this reduction gives randomized priority queues with O(√(log log n)) expected time per operation. The insert, find-min and decrease-key operations are again not supported directly by these priority queues. The meld operation is not supported at all by these two data structures.

We obtain the first non-trivial meldable priority queues in the RAM model. Our priority queues are obtained by applying general transformations to existing data structures. Our first transformation is especially simple as it relies only on the classical union-find data structure. For example, by combining the priority queues of van Emde Boas et al. [van77, vKZ77] with the union-find data structure [Tar75], two data structures that have been available for almost 30 years, we obtain meldable priority queues that support all operations in O(α(n) log log C) amortized time. (Purists may not like this time bound as it involves both n and C. Better combinations will actually appear in the full version of the paper. This simple bound is brought here to show that our transformations yield interesting and non-trivial results even when applied to relatively simple data structures.) By applying our first transformation to the O(log log n) time priority queues of Thorup [Tho03] (or the ones derived from the sorting algorithm of Han [Han02]), we obtain meldable priority queues that support all operations in O(log log n · α(n)) amortized time. By applying this transformation to the O(√(log log n)) expected time priority queues derived from the randomized sorting algorithm of Han and Thorup [HT02], we obtain meldable priority queues that support all operations in O(√(log log n) · α(n)) expected amortized time. These priority queues are used by us in Section 5 to obtain our improved algorithms for the MDST problem.
Using our second transformation to combine a randomized "atomic-heap" of Thorup [Tho00] with, e.g., the classical Fibonacci heaps of Fredman and Tarjan [FT87], we obtain, for every fixed ε > 0, a meldable priority queue with an expected amortized cost of O(1) for each insert, find-min and decrease-key operation, and an expected amortized cost of O((log n)^{1/2+ε}) for each delete or meld operation.

3 The first transformation

In this section we describe our first transformation, which combines a non-meldable priority queue with the classical union-find data structure (see [Tar75], [Tar83], or [CLRS01]) to produce a meldable priority queue with only a slight increase in the operation cost. This transformation, which we denote by T1, satisfies:

Theorem 3.1. If P is a priority queue data structure that supports all operations, excluding the meld operation, in O(f(n)) (expected) amortized time, then T1(P) is a priority queue data structure that supports all operations, including the meld operation, in O(f(n) · α(n)) (expected) amortized time, where α(n) = α(n, n) is the inverse Ackermann function.

The rest of this section contains a description of transformation T1 and a proof of Theorem 3.1. We begin with a concise description of the classical union-find data structure, as this description is needed for the description of our transformation. A union-find data structure is a data structure that supports the following operations:

make-set(x) – Create a set that contains the single element x.

union(x, y) – Unite the sets containing the elements x and y.

find(x) – Return a representative element of the set containing the element x.

A classical, simple, and extremely efficient implementation of a union-find data structure is given in Figure 1.

    make-set(x):   p[x] ← x
                   rank[x] ← 0

    union(x, y):   link(find(x), find(y))

    link(x, y):    if rank[x] > rank[y] then hang(y, x) else hang(x, y)
                   if rank[x] = rank[y] then rank[y] ← rank[y] + 1

    find(x):       if p[x] ≠ x then hang(x, find(p[x]))
                   return p[x]

    hang(x, y):    p[x] ← y

    Figure 1: The classical union-find data structure.

Each element x has a parent pointer p[x] and a rank rank[x] associated with it. These parent pointers define trees that correspond to the sets of elements maintained by the data structure. The representative element of each set is taken to be the root of the tree containing the elements of the set. To find the representative element of a set, we simply follow the parent pointers until we get to a root. To speed up future find operations, we employ the path compression heuristic, which makes all the vertices encountered on the way to the root direct children of the root. Unions are implemented using the union by rank heuristic. The rank rank[x] associated with each element x is an upper bound on the depth of its subtree. For reasons that will become clear shortly, parent pointers are changed in the implementation given in Figure 1 only via a call to a separate procedure, called hang. In a seminal paper, Tarjan [Tar75] showed that the time taken by the algorithm of Figure 1 to process an intermixed sequence of m make-set, union and find operations, out of which n are make-set operations, is O(m α(m, n)), where α(m, n) is the extremely slowly growing inverse of Ackermann's function.
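For concreteness, here is a direct Python rendering of Figure 1; the class and method names are ours. Note that, exactly as in the figure, every parent-pointer change goes through hang, which is the hook that the transformation described below overrides.

    class UnionFind:
        """Union by rank with path compression; all parent-pointer
        changes go through hang(), as in Figure 1."""

        def __init__(self):
            self.parent = {}   # p[x]
            self.rank = {}     # rank[x]

        def hang(self, x, y):              # p[x] <- y
            self.parent[x] = y

        def make_set(self, x):
            self.parent[x] = x
            self.rank[x] = 0

        def find(self, x):
            if self.parent[x] != x:
                # path compression: hang x directly below the root
                self.hang(x, self.find(self.parent[x]))
            return self.parent[x]

        def link(self, x, y):              # x and y are tree roots
            if self.rank[x] > self.rank[y]:
                self.hang(y, x)
            else:
                self.hang(x, y)
                if self.rank[x] == self.rank[y]:
                    self.rank[y] += 1

        def union(self, x, y):
            self.link(self.find(x), self.find(y))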
Suppose that we have at our disposal a non-meldable priority queue that supports the following operations:

make-pq(x) – Create a priority queue that contains the single element x.

insert(PQ, x) – Insert the element x into the priority queue PQ.

delete(PQ, x) – Delete the element x from the priority queue PQ.

find-min(PQ) – Find an element with the smallest key contained in the priority queue PQ.

It is assumed, of course, that each element x has a key key[x] associated with it. We can easily add the following operation to the repertoire of the operations supported by this priority queue:

change-key(PQ, x, k) – Change the key associated with the element x in the priority queue PQ to k.

This is done by deleting the element x from the priority queue PQ, changing its key by setting key[x] ← k, and then reinserting it into the priority queue. (As mentioned, some priority queues directly support operations like decrease-key. We shall not assume such capabilities in this section.)

We combine this non-meldable priority queue with the union-find data structure to obtain a meldable priority queue that supports the following operations:

MAKE-PQ(x) – Create a priority queue that contains the single element x.

INSERT(x, y) – Insert the element x into the priority queue containing the element y.

DELETE(x) – Delete the element x from the priority queue containing it.

FIND-MIN(x) – Find an element with the smallest key in the priority queue containing x.

MELD(x, y) – Meld the priority queues containing the elements x and y.

CHANGE-KEY(x, k) – Change the key associated with the element x to k.

Note that the operations above only take priority queue elements as arguments. They are not told explicitly to which priority queue these elements belong. As explained in Kaplan et al. [KST02a], a meldable priority queue data structure that supports such operations must include, at least implicitly, an implementation of a union-find data structure. We construct our meldable priority queues by explicitly augmenting the classical union-find data structure with the ability to find elements with minimal keys.

    MAKE-PQ(x):        make-set(x)
                       PQ[x] ← make-pq(x)

    INSERT(x, y):      MAKE-PQ(x)
                       MELD(x, y)

    DELETE(x):         CHANGE-KEY(x, +∞)

    FIND-MIN(x):       find-min(PQ[find(x)])

    MELD(x, y):        union(x, y)

    CHANGE-KEY(x, k):  change-key(PQ[x], x, k)
                       find(x)

    hang(x, y):        z ← find-min(PQ[x])
                       if p[x] ≠ x then delete(PQ[p[x]], z)
                       insert(PQ[y], z)
                       p[x] ← y

    Figure 2: A meldable priority queue obtained by combining the union-find data structure with a non-meldable priority queue.

A collection of meldable priority queues is now maintained as follows. Each priority queue of the collection is maintained as a tree in the union-find data structure. Each element x contained in such a tree thus has a parent pointer p[x] assigned to it by the union-find data structure. In addition, each element x has a 'local' priority queue PQ[x] associated with it. This priority queue contains the element x itself, and the minimal element of each subtree hanging from x. (Thus, if x has d children, PQ[x] contains d + 1 elements.) To find the minimal element in the priority queue containing x, we simply need to get to the root y of the tree containing x (a find(x) operation), and then find the minimal element in the local priority queue of y (a find-min(PQ[y]) operation). But how do we keep the local priority queues up to date?

Maintaining the local priority queues is, in fact, quite easy. For each pointer changed in the union-find data structure, we need to perform at most two operations on local priority queues! Let us see why. When first considering an element x, we initialize its local priority queue to contain x, and no other element.
When the union-find data structure hangs an element x on an element y, by setting p[x] ← y, we simply need to insert the minimal element of PQ[x] into PQ[y], an insert(PQ[y], find-min(PQ[x])) operation.

Consider now the effect of a find operation. Let x = x1, x2, ..., xk be the sequence of elements encountered on the path from x to the root of its tree. As path compression is applied, the elements x1, x2, ..., x(k-1) are cut from their current parents and are made direct children of the root xk. For i = k-1, k-2, ..., 1 we thus need to delete the minimal element of the subtree of xi from PQ[x(i+1)]. Finally, we need to insert these minimal elements into the local priority queue PQ[xk] of the root xk. It is easy to check that all local priority queues would now be up to date.

Changing the key of an element x is also not difficult. We first perform a find(x) operation. This makes x either a root or a child of the root. Changing the key of x then affects at most two priority queues and the required changes are easily made. (A moment's reflection shows that it is, in fact, enough just to change the key of x in PQ[x], and then perform a find(x) operation. The element x may temporarily be contained in some priority queues with a wrong key, but this does not cause any problems as it would immediately be deleted.)

We still need to explain how we handle deletions. The simplest solution, which is sufficient for many applications, is simply to change the key of deleted elements to +∞. As we are only getting amortized bounds here in any case, we can physically remove the deleted elements from the data structure when, say, at least half of the elements currently in the data structure are deleted. This affects the amortized cost of each operation by only a constant factor. (For more details see Kaplan et al. [KST02b].)

A simple implementation of all these operations is given in Figure 2. Procedure hang(x, y) there replaces the original version of that procedure found in Figure 1. This procedure is used by the union-find data structure to hang x on y. We augment it with instructions to remove the minimal element of x's subtree, i.e., the minimal element of PQ[x], from the local priority queue of p[x], and then insert it into the priority queue of y, which is x's new parent. Conveniently, this is the only change that needs to be made to the union-find data structure! As for each pointer changed in the union-find data structure we perform at most two operations on local priority queues, the total amortized cost of performing an operation on a 'global' priority queue is at most O(f(n) · α(n)), proving Theorem 3.1.
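To make the transformation concrete, here is a minimal Python sketch that builds on the UnionFind class shown earlier. It uses a binary heap with lazy deletion as a stand-in for the non-meldable priority queue P, and it keeps one record (a token) per tree edge so that the delete performed inside hang always finds its target. The sketch supports make-pq, insert, meld, find-min and decrease-key; deletions (change-key to +∞ followed by periodic rebuilding, as described above) are omitted to keep it short. All names are ours.

    import heapq, itertools

    class Token:
        """A record stored in a local queue: an element together with the
        key it had when the record was made."""
        _count = itertools.count()
        def __init__(self, elem, key):
            self.elem, self.key, self.seq = elem, key, next(Token._count)

    class LocalPQ:
        """Stand-in for the non-meldable priority queue P: a binary heap
        over tokens, with lazy deletion."""
        def __init__(self):
            self.heap = []            # (key, seq, token); may hold dead tokens
            self.live = set()
        def insert(self, token):
            self.live.add(token)
            heapq.heappush(self.heap, (token.key, token.seq, token))
        def delete(self, token):
            self.live.discard(token)
        def find_min(self):
            while self.heap and self.heap[0][2] not in self.live:
                heapq.heappop(self.heap)          # drop deleted tokens lazily
            return self.heap[0][2] if self.heap else None

    class MeldablePQ(UnionFind):
        """Sketch of T1: each union-find node x carries a local queue PQ[x]
        holding a record for x itself and one record per child, namely the
        minimum of that child's subtree at the time it was hung."""
        def __init__(self):
            super().__init__()
            self.key, self.PQ, self.self_tok, self.rec = {}, {}, {}, {}

        def hang(self, x, y):                     # the only change to union-find
            t = self.PQ[x].find_min()             # minimum record of x's subtree
            if self.parent[x] != x:               # x is being re-hung: remove its
                self.PQ[self.parent[x]].delete(self.rec[x])   # record at the old parent
            self.rec[x] = Token(t.elem, t.key)
            self.PQ[y].insert(self.rec[x])        # record x's subtree minimum at y
            self.parent[x] = y

        def make_pq(self, x, k):
            self.make_set(x)
            self.key[x] = k
            self.PQ[x] = LocalPQ()
            self.self_tok[x] = Token(x, k)
            self.PQ[x].insert(self.self_tok[x])

        def meld(self, x, y):
            if self.find(x) != self.find(y):      # melding a queue with itself is a no-op
                self.union(x, y)

        def insert(self, x, k, y):
            self.make_pq(x, k)
            self.meld(x, y)

        def find_min(self, x):
            return self.PQ[self.find(x)].find_min().elem

        def decrease_key(self, x, k):
            r = self.find(x)                      # x is now the root or a child of it
            self.key[x] = k
            self.PQ[x].delete(self.self_tok[x])   # refresh x's own record ...
            self.self_tok[x] = Token(x, k)
            self.PQ[x].insert(self.self_tok[x])
            if x != r:                            # ... and the record of x's subtree at the root
                self.PQ[r].delete(self.rec[x])
                t = self.PQ[x].find_min()
                self.rec[x] = Token(t.elem, t.key)
                self.PQ[r].insert(self.rec[x])

As in the analysis above, each parent-pointer change performed by the union-find structure triggers at most two local-queue operations, which is the source of the O(f(n) · α(n)) bound.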
4 The second transformation

Our second transformation, which we denote by T2, receives two priority queue data structures P and Q as arguments. It also receives a parameter B = B(n). Here, P is assumed to be a non-meldable 'atomic' priority queue data structure, i.e., a priority queue data structure that can handle all operations on priority queues of size at most B in constant time. On the other hand, Q is a meldable, but typically much slower, priority queue data structure. It is still supposed to support the insert, find-min and decrease-key operations in constant time, but may take much longer to perform delete and meld operations. The obtained data structure T2(P, Q, B) supports insert, find-min and decrease-key operations in constant time and performs delete and meld operations much quicker than Q. More precisely:

Theorem 4.1. If P is a priority queue data structure that supports all operations, excluding the meld operation, on priority queues of size at most B in O(1) (expected) amortized time, Q is a meldable priority queue that supports the insert, find-min and decrease-key operations in O(1) (expected) amortized time, the delete operation in f_del(n) (expected) amortized time, and the meld operation in f_meld(n) (expected) amortized time, and B ≥ log n, then T2(P, Q, B) is a priority queue data structure that supports the insert, find-min and decrease-key operations in O(1) (expected) amortized time, the delete operation in O(log n/log B + f_del(B)) (expected) amortized time, and the meld operation in O((log n/log B) · f_meld(B)) (expected) amortized time.

The construction of T2(P, Q, B) is modelled after the construction of AF-heaps by Fredman and Willard [FW94], with a few technical changes. The obtained priority queue supports the following operations:

MAKE-PQ(x) – Create a priority queue that contains the single element x.

INSERT(H, x) – Insert the element x into the priority queue H.

DELETE(H, x) – Delete the element x from the priority queue H.

FIND-MIN(H) – Find an element with the smallest key in the priority queue H.

MELD(H1, H2) – Meld the priority queues H1 and H2, generating a new priority queue (and destroying H1 and H2).

DEC-KEY(H, x, Δ) – Decrease the key of the element x of H by Δ.

Note that, as opposed to the previous section, the priority queue operations here are assumed to be given the "name" of the priority queue acted upon.

We give here a general description of the T2(P, Q, B) data structure. The amortized analysis of its complexity is fairly similar to the analysis of Fredman and Willard [FW94]. It is omitted due to lack of space. A more detailed description of the data structure, and its complexity analysis, will appear in the full version of the paper.

Each priority queue H of the T2(P, Q, B) data structure consists of a collection of trees. Each node v in one of these trees holds an element x = x[v] contained in the priority queue. Each such element x has a key key[x] associated with it. The keys of the elements contained in each tree satisfy the heap order condition: if u is the parent of v, then key[x[u]] ≤ key[x[v]]. The individual trees are structured so that each internal node has between B/2 and B children, where B = B(n) is the parameter supplied to the construction. The leaves of each tree are all at the same depth. Each node also keeps the height of its subtree and the number of children it has. A tree with t nodes has height O(log t/log B), which is O(log n/log B), as t ≤ n. The collection of trees contains at most B - 1 trees of each given height. The trees of each height are placed in a common bucket, and the number of trees in each bucket is maintained.

Each non-leaf node v has an 'atomic' priority queue PQ[v] containing its children. These priority queues are implemented using the P data structure. For each bucket we maintain a meldable priority queue containing the roots of the trees contained in the bucket. These priority queues are implemented using the Q data structure. We shall refer to them as the bucket priority queues. Finally, the minimum elements of the non-empty bucket priority queues are kept in another meldable priority queue, which we refer to as the global priority queue. The global priority queue is again implemented using the Q data structure.
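To fix ideas, the following Python skeleton shows one possible way of laying out these components; the class and field names are ours, and the two priority queue types P and Q are deliberately left abstract (any non-meldable 'atomic' queue and any meldable queue with the stated guarantees would do).

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        elem: object                 # the element x[v] held at this node
        child_pq: object = None      # 'atomic' priority queue PQ[v] over the children (type P)
        height: int = 0              # height of the subtree rooted at this node
        degree: int = 0              # number of children, kept between B/2 and B for internal nodes

    @dataclass
    class Bucket:
        root_pq: object = None       # meldable queue (type Q) over the <= B-1 roots of this height
        count: int = 0               # number of trees currently in this bucket

    @dataclass
    class T2Heap:
        buckets: dict = field(default_factory=dict)   # height -> Bucket
        global_pq: object = None     # meldable queue (type Q) over the bucket minima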
In addition to that, we have a queue QU = QU[H] that holds trees that are still to be inserted into the forest. After the completion of a priority queue operation, this queue is empty. (The exact role of this queue will become clear later.)

To implement the priority queue operations listed above we introduce, following Fredman and Willard [FW94], the following auxiliary operations:

RIPPLE(H, v) – Fill a 'hole' created by the removal of the element associated with v in H.

CONSOLIDATE(H, j) – Merge B trees from the j-th bucket of H.

PRUNE(H, v) – Cut v and the subtrees hanging from it in H and move them to QU[H].

EMPTY-QUEUE(H) – Incorporate the trees found in QU[H] into the main collection of trees.

Note that the second argument of the RIPPLE and PRUNE operations is a node v, and not an element x of the priority queue.

The RIPPLE operation moves certain elements from one node to another. A DELETE(H, x) operation, as we shall see below, is implemented by removing x from the node v = v[x] that currently holds it. This leaves a 'hole' in our data structure, as the node v has no element associated with it now. The task of RIPPLE(H, v) is to fill this hole. If v is a leaf, we simply remove it from the tree. (This may cause a problem if the parent of v now has fewer than B/2 children; this will be solved later using PRUNE.) Otherwise, a smallest element associated with one of the children of v is moved to v. (This element is found using a find-min(PQ[v]) operation.) The heap order is clearly preserved. RIPPLE is then called recursively on the corresponding child. If v is the root of a tree, then the corresponding bucket priority queue, and possibly the global priority queue, are also updated. Since operations on atomic priority queues take constant (randomized) amortized time, and a delete operation on a bucket priority queue takes O(f_del(B)) amortized time, the cost of a RIPPLE operation is at most O(log n/log B + f_del(B)).

A FIND-MIN(H) operation is implemented by performing a find-min operation on the global priority queue.

An INSERT(H, x) operation is performed by creating a single-node tree containing the element x and then inserting this tree into the forest. We then update the bucket priority queue by inserting the new root into it. If the minimum of this bucket priority queue changes, we also update the global priority queue. This takes O(1) amortized time.

A DELETE(H, x) operation is performed, as explained above, by performing a RIPPLE(H, v) at the appropriate node v.

A DEC-KEY(H, x, Δ) operation is performed by pruning away the subtree of the node containing the element whose key is being decreased and then inserting this tree into the appropriate bucket (having decreased the key stored in its root).

A MELD(H1, H2) operation is performed by melding all pairs of bucket priority queues of the same height.

When a bucket exceeds its capacity of B - 1 tree roots, as a result of an INSERT or a MELD operation, a CONSOLIDATE operation is performed as follows: a new node is created which becomes the root of a new tree. The subtrees of this node consist of exactly B trees from the overflowing bucket, of which there are between B and 2B - 2. A RIPPLE operation is then performed on this new root. The new tree is then inserted into the forest. (This, in turn, may trigger further consolidations.) In addition, we need to rebuild the appropriate bucket priority queue so that it contains the remaining tree roots, and possibly to update the global priority queue.
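The RIPPLE operation that the consolidation step relies on can be illustrated with the following self-contained toy version, which keeps the children of a node in a plain Python list instead of an atomic priority queue and ignores the bucket and global queues; the node class and the key function are ours.

    class TNode:
        """Toy tree node: 'elem' is the element held at the node (None marks a hole)."""
        def __init__(self, elem, parent=None):
            self.elem, self.parent, self.children = elem, parent, []
            if parent is not None:
                parent.children.append(self)

    def ripple(v, key):
        """Fill the hole at node v by repeatedly pulling up a smallest child element."""
        while v.children:
            c = min(v.children, key=lambda u: key(u.elem))   # find-min over v's children
            v.elem = c.elem                                   # move it up; the hole descends to c
            v = c
        v.elem = None
        if v.parent is not None:                              # v is now an empty leaf: detach it
            v.parent.children.remove(v)                       # (this may leave the parent light,
                                                              #  which is what PRUNE deals with)

In the real structure the find-min over the children is a single atomic-queue operation, and when v is a root the bucket and global priority queues are updated as described above.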
When a node v remains with fewer than B/2 children, we perform a PRUNE(H, v) operation. (A node with fewer than B/2 children is said to be light.) The PRUNE(H, v) operation cuts v and its remaining subtrees and moves them to the queue QU. This may in turn trigger further prunings, further consolidations of tree roots, and so on. Finally, the EMPTY-QUEUE(H) operation incorporates the trees found in the queue QU[H] into the main collection of trees.

By following the amortized analysis of Fredman and Willard [FW94], it is not difficult to show that the (expected) amortized cost of each DELETE(H, x) operation is O(log n/log B + f_del(B)), and that the (expected) amortized cost of each MELD(H1, H2) operation is O((log n/log B) · f_meld(B)), while the (expected) amortized cost of each other operation is O(1). The complete analysis will appear in the full version of the paper.

Thorup [Tho00] obtained a randomized data structure that can perform, for any ε > 0, all the priority queue operations, excluding the meld operation, on priority queues of size at most 2^{(log n)^{1/2-ε}} in O(1) expected time. By applying the transformation of this section to these atomic priority queues and the Fibonacci heaps of Fredman and Tarjan [FT87], with B = 2^{(log n)^{1/2-ε}}, we obtain meldable priority queues with an expected amortized cost of O(1) for each insert, find-min and decrease-key operation, and an expected amortized cost of O((log n)^{1/2+ε}) for each delete or meld operation.

5 Minimum directed spanning trees

We now discuss the implementation of Edmonds' MDST algorithm [Boc71, CL65, Edm67] using the new priority queues derived in this paper. We begin with a brief sketch of the algorithm. (We consider, for concreteness, the unrooted version of the problem.) To simplify matters, we shall assume that the input graph G is strongly connected. Otherwise, we choose a vertex r such that every vertex of G is reachable from r. Then we add dummy edges of suitably large weight from each vertex to r.

The algorithm is composed of two stages. The first is a contraction stage that works as follows. If the input graph G = (V, E) happens to be a graph with a single vertex, we are done. Otherwise, given G, construct a subset H = H(G) of the edges obtained by choosing for each vertex v ∈ V the lightest edge entering it, if any. Then contract the (disjoint) directed cycles formed in H and update the weights of the edges entering contracted vertices. More specifically, if C is a cycle in H and e = (u, v), where u ∉ C and v ∈ C, is an edge entering C, then the new weight w'(e) of the edge e in the contracted graph will be w(e) - w(e'), where e' is the edge of C entering v, and w(e) and w(e') are the current weights of these edges. Self-loops are discarded. This process is repeated until the graph contains a single vertex. We let H* be the union of all the sets H constructed during the contraction phase. Note that H* contains O(n) edges. Edmonds [Edm67] (see also Karp [Kar71]) has shown that H* must contain a minimum directed spanning tree. Tarjan [Tar77] and Camerini et al. [CFM79] gave a linear time algorithm for extracting a minimum directed spanning tree from H*. We can therefore concentrate on the implementation of the contraction phase. Although the contraction process is conceptually simple, its efficient implementation is far from being trivial.
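To make the contraction stage concrete, here is a direct, unoptimized Python rendering of a single round, with edges represented as (u, v, w) triples; the function and variable names are ours, and the bookkeeping needed to later expand the contracted cycles back into a spanning tree (the second stage) is omitted. Efficient implementations replace the explicit scans below with the priority queue operations discussed next.

    def contract_round(vertices, edges):
        """One round of the contraction stage: pick the lightest edge entering each
        vertex, contract the cycles these edges form, and re-weight edges that
        enter a contracted cycle.  Returns (H, new_vertices, new_edges)."""
        # lightest edge entering each vertex
        enter = {}
        for u, v, w in edges:
            if u != v and (v not in enter or w < enter[v][2]):
                enter[v] = (u, v, w)

        # detect the (disjoint) cycles formed by the chosen edges
        label, in_cycle = {}, set()          # label: vertex -> super-vertex
        for start in vertices:
            path, index, v = [], {}, start
            while v not in label and v not in index and v in enter:
                index[v] = len(path); path.append(v); v = enter[v][0]
            if v in index:                   # the walk closed a new cycle
                for x in path[index[v]:]:
                    in_cycle.add(x); label[x] = v        # v names the super-vertex
            for x in path:                   # vertices leading into the cycle (or into an
                label.setdefault(x, x)       # already processed part) keep their identity
        for x in vertices:
            label.setdefault(x, x)

        # re-weight edges entering contracted cycles and drop self-loops
        new_edges = []
        for u, v, w in edges:
            nu, nv = label[u], label[v]
            if nu == nv:
                continue                     # self-loop after contraction
            if v in in_cycle:                # subtract the weight of the cycle edge entering v
                w -= enter[v][2]
            new_edges.append((nu, nv, w))

        return list(enter.values()), set(label.values()), new_edges

Repeating contract_round until a single vertex remains, and collecting the returned edge sets, yields the set H* referred to above.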
Tarjan [Tar77] showed that it can be implemented, using a meldable priority queue data structure, by performing a sequence of at most O(m) insert operations, O(m) extract-min operations, O(n) meld operations, and O(n) add operations. An add operation adds a constant to the keys of all elements contained in a certain priority queue. It is required for efficiently updating the weights of the edges. (Note that each contraction changes the weights of all the edges entering a given vertex by the same amount.) Adding an add operation to existing priority queue data structures is not a difficult task. In particular, the priority queues obtained using our two transformations can be easily extended to support this operation in constant time. Using simple priority queues, Tarjan [Tar77] obtains an O(m log n) algorithm for the problem.

Gabow et al. [GGST86] give a more sophisticated algorithm that implements the contraction process using at most O(n) insert operations, O(n) extract-min operations, O(n) delete operations, O(n) add operations, and finally O(m) move operations. A move is a non-standard priority queue operation that moves an element from one priority queue to another. There are no known constructions that support such an operation in constant time without severely deteriorating the times required for the other operations. Although Fibonacci heaps do not support a general move operation in constant time, Gabow et al. [GGST86] show that in the special context of the contraction algorithm the required move operations may be implemented in constant time. As a result, they obtain an O(m + n log n) algorithm for the MDST problem.

Our deterministic O(m · log log n · α(n)) time algorithm and our randomized O(m · √(log log n) · α(n)) expected time algorithm for the MDST problem with integer edge weights are obtained using Tarjan's [Tar77] approach. We were not able to augment our data structures with the ability to perform the move operations required by the approach of Gabow et al. [GGST86] in constant time. Trying to implement these move operations in constant time is an interesting open problem.

6 Concluding remarks

The main results of this paper are the two general techniques for transforming priority queues that do not support the meld operation into priority queues that do support it, at a very small degradation of their performance. Our first transformation is especially simple and uses only classical techniques, namely the union-find data structure. As a byproduct, we also obtained improved algorithms for the minimum directed spanning tree problem with integer edge weights. Obtaining further improved results for this problem is a challenging open problem. Of particular interest would be the derivation of new algorithms that are not just more efficient implementations of Edmonds' [Edm67] algorithm.

References

[AHU74] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The design and analysis of computer algorithms. Addison-Wesley, 1974.

[AMOT90] R.K. Ahuja, K. Mehlhorn, J.B. Orlin, and R.E. Tarjan. Faster algorithms for the shortest path problem. Journal of the ACM, 37:213-223, 1990.

[Boc71] F. Bock. An algorithm to construct a minimum directed spanning tree in a directed network. In: Developments in Operations Research, Gordon and Breach, New York, pages 29-44, 1971.

[Bro96] G.S. Brodal. Worst-case efficient priority queues. In Proc. of 7th SODA, pages 52-58, 1996.

[CFM79] P.M. Camerini, L. Fratta, and F. Maffioli. A note on finding optimum branchings. Networks, 9:309-312, 1979.

[CGS99] B.V.
Cherkassky, A.V. Goldberg, and C. Silverstein. Buckets, heaps, lists, and monotone priority queues. SIAM Journal on Computing, 28(4):1326-1346, 1999.

[Cha00] B. Chazelle. A minimum spanning tree algorithm with inverse-Ackermann type complexity. Journal of the ACM, 47(6):1028-1047, 2000.

[CL65] Y.J. Chu and T.H. Liu. On the shortest arborescence of a directed graph. Sci. Sinica, 14:1396-1400, 1965.

[CLRS01] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. Introduction to algorithms. The MIT Press, second edition, 2001.

[Dij59] E.W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269-271, 1959.

[DKM+94] M. Dietzfelbinger, A. Karlin, K. Mehlhorn, F. Meyer auf der Heide, H. Rohnert, and R.E. Tarjan. Dynamic perfect hashing: upper and lower bounds. SIAM Journal on Computing, 23(4):738-761, 1994.

[Edm67] J. Edmonds. Optimum branchings. J. Res. Nat. Bur. Standards, 71B:233-240, 1967.

[FT87] M.L. Fredman and R.E. Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM, 34:596-615, 1987.

[FW93] M.L. Fredman and D.E. Willard. Surpassing the information-theoretic bound with fusion trees. Journal of Computer and System Sciences, 47(3):424-436, 1993.

[FW94] M.L. Fredman and D.E. Willard. Trans-dichotomous algorithms for minimum spanning trees and shortest paths. Journal of Computer and System Sciences, 48:533-551, 1994.

[GGS89] H.N. Gabow, Z. Galil, and T.H. Spencer. Efficient implementation of graph algorithms using contraction. Journal of the ACM, 36(3):540-572, July 1989.

[GGST86] H.N. Gabow, Z. Galil, T.H. Spencer, and R.E. Tarjan. Efficient algorithms for finding minimum spanning trees in undirected and directed graphs. Combinatorica, 6:109-122, 1986.

[Han02] Y. Han. Deterministic sorting in O(n log log n) time and linear space. In Proc. of 34th STOC, pages 602-608, 2002.

[HT02] Y. Han and M. Thorup. Integer sorting in O(n √(log log n)) expected time and linear space. In Proc. of 43rd FOCS, pages 135-144, 2002.

[Kar71] R.M. Karp. A simple derivation of Edmonds' algorithm for optimum branchings. Networks, 1:265-272, 1971.

[KKT95] D.R. Karger, P.N. Klein, and R.E. Tarjan. A randomized linear-time algorithm to find minimum spanning trees. Journal of the ACM, 42:321-328, 1995.

[KST02a] H. Kaplan, N. Shafrir, and R.E. Tarjan. Meldable heaps and boolean union-find. In Proc. of 34th STOC, pages 573-582, 2002.

[KST02b] H. Kaplan, N. Shafrir, and R.E. Tarjan. Union-find with deletions. In Proc. of 13th SODA, pages 19-28, 2002.

[Pri57] R.C. Prim. Shortest connection networks and some generalizations. Bell System Technical Journal, 36:1389-1401, 1957.

[Ram97] R. Raman. Recent results on the single-source shortest paths problem. SIGACT News, 28:81-87, 1997.

[Tar75] R.E. Tarjan. Efficiency of a good but not linear set union algorithm. Journal of the ACM, 22:215-225, 1975.

[Tar77] R.E. Tarjan. Finding optimum branchings. Networks, 7:25-35, 1977.

[Tar83] R.E. Tarjan. Data structures and network algorithms. CBMS 44, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1983.

[Tar85] R.E. Tarjan. Amortized computational complexity. SIAM Journal on Algebraic and Discrete Methods, 6:306-318, 1985.

[Tho00] M. Thorup. On RAM priority queues. SIAM Journal on Computing, 30(1):86-109, 2000.

[Tho02] M. Thorup. Equivalence between priority queues and sorting. In Proc. of 43rd FOCS, pages 125-134, 2002.

[Tho03] M. Thorup.
Integer priority queues with decrease key in constant time and the single source shortest paths problem. In Proc. of 35th STOC, pages 149-158, 2003.

[van77] P. van Emde Boas. Preserving order in a forest in less than logarithmic time and linear space. Information Processing Letters, 6(3):80-82, 1977.

[vKZ77] P. van Emde Boas, R. Kaas, and E. Zijlstra. Design and implementation of an efficient priority queue. Mathematical Systems Theory, 10:99-127, 1977.

[Wil83] D. Willard. Log-logarithmic worst case range queries are possible in space O(N). Information Processing Letters, pages 81-84, 1983.