Optimal Routing in Chord Prasanna Ganesan Abstract We propose optimal routing algorithms for Chord [1], a popular topology for routing in peer-to-peer networks. Chord is an undirected graph on 2b nodes arranged in a circle, with edges connecting pairs of nodes that are 2k positions apart for any k 0. The standard Chord routing algorithm uses edges in only one direction. Our algorithms exploit the bidirectionality of edges for optimality. At the heart of the new protocols lie algorithms for writing a positive integer d as the difference of two non-negative integers d and d such that the total number of 1-bits in the binary representation of d and d is minimized. Given that Chord is a variant of the hypercube, the optimal routes possess a surprising combinatorial structure. Gurmeet Singh Manku and 2 in that order, thus converting the leftmost 1 in the remaining distance to a 0 at each step. The longest path has length b and the average path length is b/2. Clockwise greedy routing is non-optimal because it fails to exploit the bidirectionality of edges. Consequently, the average path length for this routing protocol is no better than that for a hypercube on as many nodes, even though Chord has roughly twice as many edges. We develop new routing protocols for Chord that exploit the bidirectionality of edges. It turns out that optimal routes in Chord have a strong connection with the Binary Subtraction Problem: Given a positive s integer d, find a pair of non-negative integers d , d uch that the number of 1-bits in d and d is minimal, sub ject to the constraint d = d - d . For example, the shortest route to cover a clockwise distance 14 (1110 in binary) is to use a clockwise step of 16 in combination with an anti-clockwise step of length 2, which can be seen as an optimal way of expressing 14 as the difference of two numbers. One may wonder why the Chord topology and, consequently, these routing protocols, are relevant in a practical peer-to-peer system where the number of participant computers may not be a power of two. It turns out that it is possible to create small, roughly equi-sized groups of computers such that the number of groups is a power of two. The Chord topology is then constructed over these groups and used for routing from one group to another. Road map: In §2, we study a simple but nonoptimal variant of the standard Chord routing algorithm: each node chooses between the shorter of clockwise-greedy and anti-clockwise-greedy routes, on a per-destination basis. We show that the average path b length drops to b/2 - /(2 ) + (1). In §3, we expose the interplay between optimal routing in Chord and binary subtraction. In §4, we solve the Binary Subtraction Problem. In general, there is no unique solution. We present a non-deterministic procedure that generates all the optimal solutions. In §5, we provide optimal routing algorithms for Chord. We show that Chord's diameter is b/2 . However, the average all-pairs shortest-path length is only b/3 + (1). Interestingly, two simple algorithms that discover optimal routes can be encoded compactly by finite-state 1 Intro duction Consider an undirected graph on 2b nodes arranged in a circle. Nodes are labeled with b-bit identifiers from 0 through 2b - 1 going clockwise. An edge (x, y ) exists iff x and y are 2k positions apart on the circle for some k 0, i.e., |x - y | equals either 2k or 2b - 2k for some 0 k < b. The resulting graph forms the basis of Chord [1], a topology used for routing in peer-to-peer networks. Nodes in the topology represent computers and edges represent overlay-network connections. Since overlay connections span wide-area networks, a message sent along a connection might incur a potentially large delay. Therefore, it is important to minimize the path length of routes on this overlay network. In the standard Chord routing algorithm, messages are forwarded along only those edges that diminish the clockwise distance by some power of two. Routing is clockwise and greedy, never overshooting the destination. Thus if a message is destined for a node that is clockwise distance d away, routing is equivalent to performing left-to-right bit-fixing to convert the 1s in the binary representation of d to zero. For example, if d is 14 (1110 in binary), greedy routing uses steps of 8, 4 Department of Computer Science, Stanford University, USA, prasannag@cs.stanford.edu. Supp orted in part by a Stanford Graduate Fellowship. Department of Computer Science, Stanford University, USA, manku@cs.stanford.edu. Supp orted in part by NSF Grant EIA0137761 and grants from SNRC and Veritas. Copyright © 2004 by the Association for Computing Machinery, Inc. and the Society for industrial and Applied Mathematics. All Rights reserved. Printed in The United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Association for Computing Machinery, 1515 Broadway, New York, NY 10036 and the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688 176 automata. The average shortest-path lengths are then computed by treating the automata as Markov Chains. In §6, we study link congestion for all the routing algorithms we describe. In §7, we extend our results to higher-base versions of Chord. In §8, we list open problems pertaining to the Hyperskewbe, a rather mysterious topology obtained by subtracting the hypercube edges from Chord but retaining the diameter edges. Theorem 2.1. The average path length for the Choice b /2 + (1) while the of two algorithm is b/2 - longest path has length b/2 . Proof. The average path length is at most S/2b + 1. If b is odd, S = Sodd , otherwise S = Seven , where b S (b-1)/2 b + b Sodd = i=0 ii i=(b+1)/2 (b - i) i b T b/2 b + b ev en = i=0 i i i=b/2+1 (b - i) i 2 Choice of Two hen T b b A simple variant of the standard Chord routing algoT - Sodd = i=(b+1)/2 (2i - b) i rithm is as follows: When node x wishes to send a mes- S b b - Seven = i=(b+2)/2 (2i - b) i sage to y , it chooses the shorter of ubstituting j = b - i, we get (a) the clockwise greedy route to y , and T b (b-1)/2 T - Sodd = j =0 (b - 2j ) j b (b) the anti-clockwise greedy route to y 's clockwise U (b-2)/2 - Seven = j =0 (b - 2j ) j successor (namely (y + 1) mod 2b ), followed by an anti-clockwise step to y . sing Lemma 2.1, we get b T T - Sodd = (b+1) (b+1)/2 When both (a) and (b) have equal path lengths, we 2 W b arbitrarily choose (b). The clockwise-greedy route in (a) - Seven = (b/2) b/2 considers only those edges that diminish the clockwise e use Stirling's approximation, which is distance by some power of two. Similarly, the anti clockwise-greedy route in (b) considers only those edges 2 x(x/e)x e1/(12x+1) < x! < 2 x(x/e)x e1/12x that diminish the anti-clockwise distance by some power of two. If we replace (b) by the more natural "anti- to deduce 2 ( clockwise greedy route to y ", Theorem 2.1 (see below) b + 1)/2 + (1) b T - Sodd = 2 b remains largely unchanged (the longest path is b/2 T - Seven = /2 + (1) b instead of b/2 ). However, congestion analysis in §6 becomes fairly involved. b /2 + (1). We now compute the average path length for Thus, the average path length is b/2 - b b The longest path length is b/2 corresponding to those Choice of Two. Let d = (y - x + 2 ) mod 2 , the clockwise distance from x to y . Then the length of the values of d with b/2 ones and b/2 zeros. clockwise-greedy route is H (d) where H (x) denotes the Hamming norm of x, i.e., the number of 1-bits in x. The 3 Optimal Routing and Binary Subtraction length of the anti-clockwise greedy route, as described Consider a route from node x to node y in Chord. Let above, is b - H (d) + 1. The shorter of the two has length us label each edge in the route by either a positive or min{H (d), b - H (d) + 1}. Let a negative power of two as follows: Let c denote the d clockwise distance traversed by following some edge. If S= :H (d)b min{H (d), b - H (d)} c = 2b-1 , we label the edge arbitrarily as -2b-1 or b = b-1 b T = i=0 i i b2 +2b-1 . If c = 2k for some 0 k < b - 1, then the label is +2k . Otherwise, c = 2b - 2k for some 0 k < b - 1 and The average shortest-path length for clockwise the label is -2k . In other words, we mark clockwise and greedy routes is T /2b = b/2. The average for Choice anti-clockwise usage of edges of length 2k with positive of Two is S/2b + (1). The quantity S/2b can be and negative signs respectively. viewed in terms of a bins-and-balls problem: if each of b Chord is symmetric with respect to every node. balls is thrown at random into one of two identical bins, Therefore, a route corresponding to one of the shortest b then S/2 is the expected size of the smaller bin. paths between x and y enjoys two properties: (a) no two labels along the path sum to zero, and (b) no label Lemma 2.1. For 0 m < b appears twice. Let d be the sum of the positive labels, b= b P m (b - 2j ) j (m + 1) m+1 and d the sum of the absolute values of the negative j =0 labels. Recall that H (x) is the Hamming norm of x, i.e., roof. By induction. the number of ones in the binary representation of x. 177 Since each 1-bit in the binary representation of d nd d corresponds to some edge along the route, the length of the route is H (d ) + H (d ). Moreover, either d = d - d or 2b - d = d - d . To explain, when d is greater than d , their difference is the clockwise distance covered, and when d is greater than d , their difference is the anti-clockwise distance covered. We are now ready for a formal definition of the Chord routing problem. Chord Routing Problem: Given b 1 and 0 < d < 2b , identify d , d such that H (d ) + H (d ) is minimal, subject to two constraints: (i) either d = d (ii) both d , d - a of procedure Opt Route. We will discuss procedure Opt Subtract in detail in §4. Lemma 3.1. If d , d = Opt Subtract(d) and 1 d 2b-1 , then d , d is a valid solution for the Chord Routing Problem with parameters b and d. Proof. The proof is in two parts: First, we show that when 1 d 2b-1 , then constraint (i) in the definition of the Chord Routing Problem can be simplified by dropping the latter condition. Second, we show that constraint (ii) can be dropped altogether. The resulting problem is simply the Binary Subtraction Problem with input d. s d o (a) Let S (b, x) denote the set of all tuples x , x uch that x = x - x and both x , x [0, 2b ). b-1 t if t In general, for fixed b and d, there are several We will now show tha2b-1 d , 2 - , b-hen for any , b dd S (b, 2 - d), +d d 21 S (b, d). optimal solutions for this problem. For example, if We will also that H (d ) + H (d ) = H (2b-1 + d ) + b = 4 and d = 0110, then 0110, 0000 , 1000, 0010 a - b-1 2 ) Together, the two claims will prove that nd 0000, 1010 are three different optimal solutions. H (d , a for 1 d 2b-1 , we can simplify constraint (i) in Moreover, a specific d d ctually encodes a set of optimal routes in Chord because any permutation of the Chord Routing Problem by dropping the second edges corresponding to 1s in d , d covers clockwise condition. Consider d , d S (b, 2b - d). By definition, distance d. b - . Rr g It turns out that the Chord Routing Problem can be 2 - db-= d ) d - e-a1 ranging the terms, we b-et d = (2 1 + d - (d 2b- ). By assumption, d 2 1 . reduced to the following Binary Subtraction Problem: Therefore, 2b - d 2b-1 , in which case, d 2b-1 Binary Subtraction Problem : Given positive inteand d < 2b-1 . In other words, the "diagonal" edge s ger d, find a pair of non-negative integers d , d is necessarily a part of 2 and definitely not in d . d . uch that H (d ) + H (d ) is minimized, subject to Now, consider the tuple b-1 + d , d - 2b-1 Loosely - . the constraint d = d d speaking, this tuple can be obtained from d , d by Both Binary Subtraction and Chord Routing pro- "flipping" the direction of the diagonal. Since d 2b-1 duce a tuple d , d as output. However, the two prob- and d < 2b-1 , it follows that both members of the tuple lems differ in two respects, both involving the number 2b-1 + d , d - 2b-1 lie in the interval [0, 2b ). Thus of bits b. First, Binary Subtraction has only one con- 2b-1 + d , d - 2b-1 S (b, d). It is easy to see that straint d = d - d whereas Chord Routing has a choice: H (d ) + H (d ) = H (d - 2b-1 ) + H (2b-1 + d ). either d = d - d or 2b - d = d - d . Second, in (b) Since d 2b-1 , if d , d is a solution of Chord Routing, both d and d are restricted to lie in b the range [0, 2 ). There is no such restriction in Binary Opt Subtract(d), then both d , d [0, 2b ). Therefore, constraint (ii) in the Chord Routing Problem can Subtraction. Procedure Opt Route(b, d) below shows how the be dropped. Chord Routing Problem can be solved by using proce, i dure Opt Subtract(d), which solves the Binary Sub- Lemma 3.2. If d d s a solution for the Chord Routing Problem with parameters b and d, then d , d is a traction Problem. solution for the Chord Routing Problem with parameters procedure Opt Route(b, d) b and 2b - d. if (d < 2b-1 ) Proof. Follows from the problem definition. d , d Opt Subtract(d) e output d , d Next, we present the solution to the Binary Sublse traction Problem in §4. In §5, we will derive two simple, d , d Opt Subtract(2b - d) T intuitive algorithms to directly discover optimal routes output d , d in Chord. In fact, these algorithms are simple enough he next two Lemmas establish the correctness to be described as finite-state automata. [0, 2b ). r 2b - d = d - d. 178 4 Solving the Binary Subtraction Problem Figure 1 shows pseudo-code for Opt Subtract(d), which solves the Binary Subtraction Problem. The output tuple d , d is produced bit-by-bit, right to left. In fact, Opt Subtract is non-deterministic and can produce al l optimal pairs for a given value of input d. procedure Opt Subtract (d) while (d > 0) { while (d mod 2 = 0) { d d/2 } output 0, 0 duces optimal solutions for the Binary Subtraction Problem. The reader interested in the algorithms for optimal routing in Chord, rather than in proofs of their optimality, may wish to proceed directly to §5. Lemma 4.1. If x mod 2 = 1 then exactly one of the fol lowing four predicates is true: (a) Suffix(x, 0(01) 01) (b) Suffix(x, 11(01) 01) (c) Suffix(x, 00(10) 11) (d) Suffix(x, 1(10) 11) t := 1, 0 0, 1 1, 0 or 0, 1 ¢ if Suffix(d, 0(01) 01) if Suffix(d, 1(10) 11) otherwise utput t } Figure 1: Opt Subtract(d) is a non-deterministic procedure that solves the Binary Subtraction Problem. It outputs d , d where d = d - d and H (d ) + H (d ) is minimal. Procedure Opt Subtract effectively scans the binary representation of d, right to left, producing the corresponding bits of d , d along the way. As long as the current bit is 0, the output is 0, 0 since both bits at that position ought to be 0 in d , d for optimality. The current bit becomes 1 when d mod 2 = 1. Then the output must be either 1, 0 or 0, 1 . Depending upon whether we choose 1, 0 or 0, 1 , the algorithm continues scanning the remainder, i.e., (d - 1)/2 or (d + 1)/2 respectively. The crux of the matter lies in deciding which of the two possible outputs to produce when d mod 2 = 1. Opt Subtract provides a precise characterization of how the decision should be made. The decision involves regular expressions [2]. Suffix(x, R) denotes a predicate that takes a positive integer x and regular expression R as input. Let x denote the binary representation of x in the form of a string with a leading 1. Then Suffix(x, R) evaluates to true iff string 00x satisfies regular expression (0 + 1) R. In other words, Suffix checks whether some suffix of 00x satisfies R. Prepending a pair of zeros to x might seem unnatural. However, it simplifies proofs by reducing the number of cases that we have to consider. We now proceed to prove that this algorithm pro- The non-determinism of Opt Subtract stems from the fact that we assign t := 0, 1 or t := 1, 0 arbitrarily if the suffix of d is neither 0 nor 0(01) 01 nor 1(10) 11. In other words, whenever d satisfies conditions (b) or (c) in the above lemma, the algorithm has a choice in producing its output. For example, any of the following five outputs can be produced for input d = 1110011010: d -d d - 0000011010 00010000000 1110011010 1 10000100010 - 00010001000 1110011010 10000000000 - 00001100110 1110011010 10000100000 - 00010000110 1110011010 10000000010 - 00001101000 1110011010 4.1 Pro of of Optimality We begin by defining functions and G over the set of positive integers. : For d 1, (d) = min[H (d ) + H (d )] over all possible d , d 0 such that d = d - d . G: We first define G for strings that satisfy the regular expression 1(1 + 0) . Later, we will extend the definition to include positive integers. For string x that satisfies regular expression (1+ 0)+ 1, let g denote the number of "groups" of 1s in x; if all groups are singleton ones, G(x) = g , otherwise, G(x) = g + 1. For example, G(110111) = 2 + 1 = 3, G(101) = 2 and G(10111) = 2 + 1 = 3. For any x that satisfies (0 + 1) 1(0 + 1) , we claim that it is possible to write x = (0 )1 (000 )2 (000 ) . . . (0 ) uniquely, where each sub-string 1 , 2 , . . . , satisfies the regular expression (1+ 0)+ 1. We are essentially breaking up the 179 £ d := o (d - 1)/2 d + 1)/2 if t = 1, 0 if t = 0, 1 Proof. Suffix(x, R) is true iff string 00x satisfies R. If x mod 2 = 1, then exactly one of the following two mutually exclusive conditions holds: 00x ends in 01: Either (a) or (b) is true, depending upon whether a pair of zeros or a pair of ones is encountered when scanning 00x from right to left. 00x ends in 11: Either (c) or (d) is true, depending upon whether a pair of zeros or a pair of ones is encountered first when scanning 00x from right to left, and ignoring the right-most bit. ¡ string x whenever we encounter consecutive zeros, and we are ignoring lei ding and trailing strings of zeros. a Now, G(x) = For positive integer x, =1 G(i ). G(x) = G(x) where x denotes the string representing x in binary. i The correctness of Opt Subtract is clear if we rewrite the assignment to t as follows: 1 t := ,0 0, 1 1, 0 or 0, 1 if G((d - 1)/2) < G((d + 1)/2) if G((d - 1)/2) > G((d + 1)/2) if G((d - 1)/2) = G((d + 1)/2) Proof. Proof by induction on integer d. Base case: The claim is true for d = 1. Induction step: Consider d > 1. We assume that the claim is true for all integers less than d. There are two cases: · d mod 2 = 0 Clearly, (d) = (d/2) and G(d/2) = G(d). By induction hypothesis, (d/2) = G(d/2). Thus (d) = G(d). Opt Subtract indeed outputs 0, 0 when d mod 2 = 0. a 5 Optimal Routing Algorithms for Chord We present two deterministic algorithms for solving the Chord Routing Problem that we defined in §3. For i fixed values of b and d, exactly one output tuple d , d s produced. Both algorithms run the automaton in Figure 2 for exactly b steps. · d mod 2 = 1 0 S0 0 S3 If d = d - d , then exactly one of d in the last bit. Therefore, for d > 1, nd d h as 1 1 0 0 1 (d) = 1 + min[((d - 1)/2), ((d + 1)/2)] By induction hypothesis, ((d - 1)/2) = G((d - 1)/2) ((d + 1)/2) = G((d + 1)/2) S1 1 S2 1 Therefore, (4.1) (d) = 1 + min[G((d - 1)/2), G((d + 1)/2)] The following identities are easily seen to hold: £ Figure 2: The state machine used by automata-based algorithms that solve the Chord Routing Problem. a) Suffix(x, 0(01) 01) G((x - 1)/2) = G(x) - 1 G((x + 1)/2) = G(x) G((x - 1)/2) = G(x) - 1 G((x + 1)/2) = G(x) - 1 £ b) Suffix(x, 11(01) 01) c) Suffix(x, 1(10) 11) £ G((x - 1)/2) = G(x) G((x + 1)/2) = G(x) - 1 G((x - 1)/2) = G(x) - 1 G((x + 1)/2) = G(x) - 1 £ d) Suffix(x, 00(10) 11) From Lemma 4.1, exactly one of the above four Suffix predicates must be true for d. Therefore, (4.2) min[G((d + 1)/2), G(d - 1)/2)] = G(d) - 1 Eq (4.1) and Eq (4.2) together yield (d) = G(d). 5.1 Right-to-Left Chaining: The binary representation of distance d, possibly padded with leading 0s to make it exactly b bits long, is fed as input right-toleft. The output tuple d , d is also generated right-toleft. The start state is S0 . Each transition produces a pair of bits as output, the first bit for d and the second for d . All thin edges produce 0, 0 as output. Thick 1 0 edges (S0 S1 and S2 S3 ) require a lookahead: If the next input bit is 0, the output is 1, 0 . If the next input bit is 1, the output is 0, 1 . If there is no next input bit, i.e., the input string just terminated, the output is arbitrarily chosen as 0, 1 or 1, 0 . Observe that the traversal of a thick edge corresponds to exactly one 1-bit in either d or d . Further intuition into Right-to-Left Chaining is gained by understanding the algorithm in terms of two ideas: chain fixing and chain coupling. Chain fixing is simple: a distance corresponding to a chain of at least two 1s can be covered using a 180 Theorem 4.1. For d 1, (d) = G(d). If d , d s produced by Opt Subtract(d) as output, then d = d - d and H (d ) + H (d ) = G(d). In fact, it can be shown that Opt Subtract(d) a produces al l possible tuples d , d such that d = d - d nd H (d ) + H (d ) = G(d). .2 Left-to-Right Bidirectional Greedy : The idea is simple: Node x chooses the edge that takes the message the closest, in terms of absolute distance on the circle, to y . This algorithm can also be encoded using 1 Thanks to Qixiang Sun for suggesting this algorithm. £ 5 d := £ t := 1, 0 0, 1 if Suffix(d, 01) if Suffix(d, 11) if t = 1, 0 if t = 0, 1 (d - 1)/2 d + 1)/2 1 Proof. For d = 2b-1 , the Lemma is true. For d 1 and d = 2b-1 , the b-bit strings representing d and 2b - d satisfy three properties: (a) the position of the rightmost 1 is the same in both strings, (b) all digits to the left of the right-most 1 are complements of each other in the two strings, and (c) there is at least one digit to the left of the right-most 1. Lemma 5.2. For 1 d < 2b , 2 d b /3 G(d) = G(2b - d) - 1 2b < 2b+1 /3 d /3 G(d) = G(2b - d) 2b+1 d> /3 G(d) = G(2b - d) + 1 181 combination of only two steps: a short backward step and a long forward step. For example, a distance of 7 (000111 in binary) in a 64-node graph can be covered with a forward step of 8 combined with a backward step of 1. Chain fixing applies only to chains of length at least two. A chain consisting of a solitary 1 is handled by taking a single forward step. Chain coupling comes into play when two chains are separated by a solitary zero. We explain the idea with an example. Let us say we had to cover clockwise distance 238(011101110). We know that the last five bits can be fixed by a backward step of 2 and a forward step of 16. However, we observe that taking the backward step of 2 leads to a distance 240(011110000), in which the bit corresponding to the intended forward step of 16 becomes part of a longer chain of 1s. Thus, instead of a forward step of 16, we can instead use a backward step of 16 followed by a forward step of 256 to fix this entire chain of 1s. The behavior of the finite-state automaton in Figure 2 can be understood in terms of chain-fixing and chain-coupling. The automaton starts in State S0 , moving between S0 and S1 to fix singleton ones. Transition 1 S1 S2 marks the onset of chain-fixing because two consecutive ones were just encountered in the input. 1 0 Transition S2 S3 , followed by S3 S2 marks the 0 onset of chain-coupling. Transition S3 S1 signals the end of chain-coupling since two consecutive zeroes were just encountered, taking us to the start state S0 . If the input string happens to terminate in state S1 or S3 , the output can be chosen arbitrarily as 1, 0 or 0, 1 . The choice is immaterial because 1s in the most significant bit-positions of d and d correspond to clockwise distances 2b-1 and 2b - 2b-1 respectively, which are equivalent. Observe that the automaton solves Opt Subtract(d) when d < 2b-1 . The deterministic algorithm is equivalent to replacing the two lines in Figure 1 that assign values to t and d , as follows: the automaton in Figure 2. The input string d[1..b] is treated as a bit-array of length b and is processed from left to right. The start state is S0 . The output is stored in bit-arrays d [1..b] and d [1..b], both of which are initialized to all-zeros. The output arrays get altered only when a thick edge is traversed. Let us assume that the thick edge was traversed due to the ith input bit, where 1 i b. S0 S 1 : S2 S 3 : 0 1 if (i = b) or (d[i + 1] = 0) then set d [i] 1 else set d [i - 1] 1 if (i = b) or (d[i + 1] = 0) then set d [i - 1] 1 else set d [i] 1 5.3 Pro of of Optimality Lemma 5.1. Let d denote the binary representation of integer d 1. The automaton traverses exactly G(d) thick edges if string 0d is fed right-to-left. Proof. Let 0d = 01 (00 0)2 (00 0)3 . . . (0 ) where each i satisfies the regular expression (1+ 0)+ 1. We claim that the number of thick edges traversed i when processing 0d is G(d) = It =1 G(i ). is easier to prove the claim if we rewrite 0d = (01 0)0 (02 0)0 (03 0)0 . . . (0 )0 . The automaton traverses no thick edges while processing the sub-strings corresponding to 0 . For each 0i , the automaton performs one state transition from S0 to S1 for each of the initial singleton 1s, one state transition from S0 to S1 for the first of a non-singleton group of 1s, and one state transition from S2 to S3 at the end of every group of 1s thereafter. Observe that if the input string does not have a leading zero, i.e., the last input bit fed to the automaton is a 1, the automaton could potentially terminate in state S2 without traversing a thick edge for the last group of 1s. F igure 3: Opt Route(d) is a non-deterministic procedure that solves the Chord Routing Problem. Proof. The path length of a route produced by Rightto-Left Chaining for distance d is exactly equal to the number of thick edges traversed when scanning the binary representation of d right-to-left. As outlined in the proof of Lemma 5.1, Right-to-left Chaining traverses. thick edges exactly G(d) times iff G(d) 2b+1 /3 Otherwise, it traverses thick edges G(d) - 1 times. From Theorem 5.1, we see that this is optimal. One of the strings that requires the maximum number of steps is d = (01)b/2 when b is even and d = (01)(b-1)/2 0 when b is odd. For both cases, G(d) = b/2 . The proof of Theorem 5.2 sheds light on the reason for the automata for both Right-to-Left Chaining and Left-to-Right Bidirectional Greedy being identical in structure: the automaton is essentially a computation of G over the input string. The automaton remains the same irrespective of whether strings are scanned right-to-left or left-to-right. 5.4 Average Path Length Theorem 5.1 characterizes the path length of optimal solutions for the Chord Routing Problem. Combining this theorem with Lemma 5.2, we see that the path length for 2b-1 < d < 2b is simply G(2b - d). Therefore, the average path length over all distances 0 d < 2b is d=2b -1 [1 + 2 d=1 G(d)]/2b . (The additive constant 1 appears because the path length for d = 0 is zero, while it is 1 for d = 2b-1 .) It turns out that the computation of this average is greatly simplified if we analyze the automaton in Figure 2. We have already seen that the path length for a given distance d is just the number of thick edges traversed by the automaton when processing the b-bit binary representation of d. Computing the average path length simply requires us to run this automaton Theorem 5.1. An optimal solution to the Chord Routing Problem has path length 0 G G(d) (d) - 1 if d = 0 2 if 1 d b+1 /3 otherwise Proof. Follows from Lemma 5.2 and the definition of procedure Opt Route. In fact, Lemma 5.2 suggests an improvement to Opt Route, as shown in Figure 3. This procedure is a complete characterization of the Chord Routing Problem and produces al l possible optimal solutions for inputs b and d. Theorem 5.2. Right-to-Left Chaining solves the Chord Routing Problem optimal ly. The diameter of Chord is b/2 . 182 ¡ ¢ ¡ ¡ Consider two copies of the automaton in Figure 2, one scanning the b-bit string representing d and the other scanning the b-bit string representing 2b - d, both right-to-left. Both automata reach state S1 upon consuming the right-most 1. Thereafter, for every digit (0 or 1) processed as input by the first automaton, its complement is processed by the second. It is easy to see that the two automata are in diagonally opposite states at all times. Therefore, both traverse exactly the same number of thick edges when processing b bits. From Lemma 5.1, we conclude that if we were to run both automata for one more step, with an additional 0 as input, the number of thick edges traversed would be exactly the function G over the respective inputs. The only thick transition upon receiving 0 as input 0 corresponds to S2 S3 . Therefore, G(d) = G(2b - d) iff the automaton terminates in S1 or S3 , and G(d) = G(2b - d) - 1 iff the automaton terminates in S0 . Finally, G(d) = G(2b - d) + 1 iff the automaton terminates in S2 . We are now left with the job of characterizing strings that terminate in the four states. The automaton is in state S2 when scanning a string right-to-left iff there exists a prefix of the string that satisfies the regular expression 1(01) 1. For 1 d < 2b , a prefix of the string represent2ng d sa.tisfies the regular i b+1 expression 1(01) 1 iff d > /3 Therefore, the 2 . automaton is in state S2 iff d > b+1 /3 Similarly, it can be2 shown that the automaton is in state S0 . iff d b /3 The automaton is in state S1 or S3 otherwise. 2 , Thus G(d) = G(2b - d) + 1 iff 2 > . b+1 /3 and d b G(d) = G(2b - d) - 1 iff d /3 Otherwise, G(d) = G(2b - d). procedure Opt Route(b, d) if (d 2b /3 ) d , d Opt Subtract(d) e output d , d lse if (d > 2b+1 /3 ) d , d Opt Subtract(2b - d) e output d , d lse Opt Subtract(2b - d) , dd or Opt Subtract(d) output d , d on all possible b-bit binary strings and compute the average number of thick edges traversed. This suggests an interesting approach for analysis. We visualize the automaton as describing a 4-state Markov chain (each transition has probability 1/2) and we count the expected number of times a thick edge 1 0 (S0 S1 or S2 S3 ) is traversed, in b steps. 1 in either o , d r d thereby contributing to the sum H (d ) + H (d ). The stationary distribution for the 4-state Markov chain would be [ 1/3 1/6 1/3 1/6 ]. Assuming that the probability of being in a particular state converges to the stationary distribution very quickly (after a small number of bits have been processed by the automaton), the expected number of thick edges taken is roughly 1 1 1 ( 2 · 3 + 1 · 3 ) · b = b/3. Therefore, average path 2 length should be b/3 for large values of b. Our formal analysis validates this intuition by computing the exact probability distribution after bits are seen by the automaton, and computing the expected number of thick edges taken using exact probabilities. For 1, let A ,s denote the fraction of binary strings of length exactly that cause the automaton to stop in state Ss on reading them. From Figure 2, it follows that for 1 < b, A = A0 P , where matrix P and vectors A and A0 are defined as follows: 1/2 1/2 0 0 1/2 0 1/2 0 P= 0 0 1/2 1/2 1/2 0 1/2 0 A A0 = 6 Link Congestion We now study the fraction of times each edge in the graph is used, in each direction, in order to perform routing from every node to every other node. We visualize an undirected edge between x and y as consisting of two links, (x, y ) and (y , x), in order to model the connection of real computers on the Internet. The discussion so far did not freeze the exact sequence of steps used in order to route a given distance d. In the following discussion, we will still not freeze this sequence, but will require that all nodes use the same sequence of steps in routing for each distance d. Let us define the fraction of times a link is followed from node x to node y in the routing paths from every node to every other node to be the congestion of link (x, y ), C (x, y ). For notational convenience, let us assume that all arithmetic on node identifiers is modulo 2b . Let N (a, (x, y )) represent the number of times a link (x, y ) is used by all the routes from a to every other node. Consider a link (x, y ) where y - x is 2k . The congestion C (x, y ) is given by 22b C (x, y ) 2b -1 2b -1 = a=0 N (a, (x, y )) = a=0 N (0, (x - a, y - a)) 2b -1 = p=0 N (0, (p, p + 2k )). This quantity is simply the number of times a clockwise link of length 2k is used in routing from node 0 to every other node. (Also note that this implies that for a given k , the congestion on link (x, x + 2k ) is the same for all x.) By a similar argument, we see that all anti-clockwise links of the same length are also used an equal number of times. Theorem 6.1. For Choice of Two routing, and for -b odd b2 , link congestion on al l links is C = 2 4 (1 - 1 /8 b + o(1)), except on diameter links where it is 2C , and on anti-clockwise links of length 1 where it is C + 2-(b+1) . Proof. Observe that the average number ob links used f on a clockwise route from node 0 is b/4 - /8 + (1) for odd b. Since links of all lengths are equally likely to be used, link congestion of all clockwise links is 1 2-b /8 b + o(1)). The same argument applies to 4 (1 - anti-clockwise links. Since there are only half as many diameter links as there are links of other lengths, the congestion on them is twice that on other links. Observe that a backward link of length 1 is used on every anticlockwise route, leading to an additional congestion of 2-(b+1) on them. Lemma 6.1. Link congestion is symmetric for Rightto-Left Chaining, i.e., C (x, y ) = C (y , x). 2 For even b, the congestion on clo ckwise links is C = 2-(b+2) while that on anti-clo ckwise links is 2C - C . The same exceptions apply for unit-length backward links and diameter links. [ A ,0 A ,1 A = [1 0 0 0] ,2 A ,3 ] roof. By induction. The base case ( = 1) is true because A1 = [ 1/2 1/2 0 0 ]. The induction step follows from the fact that for 1 < b, A A +1,0 +1,2 Lemma 5.3. For 1 < b, 2 / A A A ,0 = 2 /3 /2 P / A 32 ,2 = ,1 ,3 2 = 2 /6 / = 6 / /2 2 = [A = [A ,0 +A ,1 + A ,1 +A ,2 + A ,3 ]/2 ,3 ]/2 A A +1,1 +1,3 =A =A ,0 /2 ,2 /2 Theorem 5.3. The average path length for both Leftto-Right Chaining and Right-to-Left Bidirectional Greedy is b/3 + (1). Proof. The average number of thibck lines encountered -1 for strings of length b is given by =0 [A ,0 /2 + A ,2 /2]. Plugging 2 roba+ lities from Lemma 5.3 and using the p bi 2 = +1 / / identity 3 3 2 /3 + (-1) /3 for 0, 1 b the sum simplifies to 3 + 9 (1 - ( -1 )b ). 2 183 Proof. This follows from a symmetry argument; link (x, y ) is used by a route from p to q iff link (y , x) is used by the route from (x + y - p) to (x + y - q ). Summing over all such p and q leads us to conclude that C (x, y ) = C (y , x). Theorem 6.2. For Right-to-Left Chaining, link -b -k congestion on a link of length 2k is C = 2 6 (1 + (2k1)1 ) + except on diameter links where it is 2C . The input string d comprises of digits from the set {0, 1, . . . k - 1} and is scanned from right to left. Symbol m is a short-hand for the "middle digits", i.e., the set {1, 2, . . . k - 2}. Each transition also produces a pair of digits as output, the first digit for d and the second for d . The start state is S0 . All thin edges produce 0, 0 as output. All thick edges produce nonzero output and require lookahead: S0 S 1 : S 2 k-1 Proof. The fraction of times edges of length 2k are used in routing from a single node is simply the probability of making a state transition along a thick edge, when seeing the (k + 1)th input bit. This probability is given by 1 (Ak [0] + Ak [2]). Dividing this quantity by the 2 number of links of length 2k , we get the desired result. Observe that there are 2b+1 links of each length 2k , except for the case k = b - 1 when there are only 2b w links. 7 Chord in Base-k In this section, we study a natural generalization of Chord, similar to higher-base hypercube topologies. Consider k b nodes arranged clockwise in a circle with labels going from 0 to k b - 1. An edge connects a pair of nodes iff they are separated by a clockwise or anti-clockwise distance of k for some 0 < < b and 0 < b. Chord in base-k is of considerable practical interest because it offers routing in fewer steps than base-2 Chord when k > 2. S3 : S4 : m 0 X if (next digit is k - 1) ehen output 0, 1 t lse output k - 1, 0 if (next digit is k - 1) then output 0, k - 1 e lse output 1, 0 if (next digit is k - 1) ehen output 0, m - 1 t lse output m, 0 here X denotes any of the five states. Theorem 7.1. The average length of shortest paths for k-1 base-k Chord on k b nodes is k+1 b + (1). Proof. The stationary distribution for the Markov chain associated with the automaton in Figure 4 is 1 1 k -1 2 [ k+1 k(k1 1) k+1 k(k+1) k-1 ]. A cost of 1 hop + k+ is paid every time a thick edge is traversed. Continuing in the same fashion as in the proof of Theorem 5.3, it can be shown that the average number of thick edge k-1 traversals is k+1 b + (1). 8 The Hyp erSkewb e 0 S0 0 m k-1 0 m m 0 S3 m k-1 0 k-1 m k-1 We define the HyperSkewbe, a skew-symmetric variant of the Hypercube, as follows: · The HyperSkewbe of dimension 1, S1 is a graph consisting of two nodes, labeled 0 and 1, with an edge between the two nodes. · (Recurrence) Consider one copy of Sk , and suffix all node identifiers with a 0 to create a graph Sk,0 . Consider another copy of Sk and suffix all node identifiers with a 1 to create Sk,1 . Now, create an edge between node x in Sk,0 and node (x - 1) mod 2k+1 in Sk,1 . Call the resulting graph Sk+1 . We observe that the above recurrence is almost identical to the standard hypercube definition. The only difference is that node x in Sk,0 is attached to (x - 1) mod 2k+1 to create the HyperSkewbe, instead of to x + 1, which would have resulted in the standard Hypercube. This small variation in the definition results in an intriguing structure which appears to have S4 S1 k-1 S2 Figure 4: State machine diagram for algorithm Rightto-Left Chaining for Base-k . Right-to-Left Chaining for Base-k : The algorithm uses the automaton in Figure 4, which generalizes the one in Figure 2. An extra state has been introduced to accommodate the middle digits, i.e., the digits other than 0 and k - 1. State S4 can actually be merged with state S3 to obtain a smaller automaton. However, we keep the two states separate for clarity. 184 properties rather different from that of the Hypercube. For example, the diameter of the k-dimensional HyperSkewbe is smaller than k for all k > 2. 5 6 2 7 3 1 0 4 Figure 5: 3-dimensional Hyperskewbe on 8 nodes. Figure 5 shows a three-dimensional HyperSkewbe. The Chord network is simply the union of the hypercube of dimension k with the HyperSkewbe of dimension k , and it appears that it is the presence of this HyperSkewbe that makes the routing properties of Chord so different from that of the standard hypercube. Let us now consider routing on the HyperSkewbe. One algorithm for routing is to perform right-to-left bitfixing, which results in an average path length of b/2 on the b-dimensional HyperSkewbe. One improvement to this routing algorithm is the following: Suppose we wish to route from node x to node y . We find the length of the route obtained by right-to-left bit-fixing to convert x to y . We also find the length of the route obtained by right-to-left bit-fixing converting y to x. In general, these two routes, and their lengths, are not identical. We can then choose the shorter of these two routes as the route from x to y . Experiments indicate that this idea improves upon simple right-toleft bit-fixing considerably. However, this algorithm is not optimal either. The characterization of optimal routes on the HyperSkewbe appears to be an interesting open problem. References [1] I. Stoica, R. Morris, D. Liben-Lowell, D. R. Karger, M. F. Kaashoek, F. Dabek and H. Balakrishnan, Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications, IEEE/ACM Transactions on Networking, Vol. 11, No. 1 (2003), pp. 17­32. [2] J. E. Hopcroft, R. Motwani and J. D. Ullman, Introduction to Automata Theory, Languages and Computation (2nd Ed.), Pearson Addison Wesley, (2000), pp. 1-521. 185