Quasiconvex Analysis of Backtracking Algorithms

David Eppstein

Abstract

We consider a class of multivariate recurrences frequently arising in the worst case analysis of Davis-Putnam-style exponential time backtracking algorithms for NP-hard problems. We describe a technique for proving asymptotic upper bounds on these recurrences, by using a suitable weight function to reduce the problem to that of solving univariate linear recurrences; show how to use quasiconvex programming to determine the weight function yielding the smallest upper bound; and prove that the resulting upper bounds are within a polynomial factor of the true asymptotics of the recurrence. We develop and implement a multiple-gradient descent algorithm for the resulting quasiconvex programs, using a real-number arithmetic package for guaranteed accuracy of the computed worst case time bounds.

1 Introduction

The topic of exponential-time exact algorithms for hard problems has led to much research in recent years [3, 7, 8, 10-12, 14, 16, 19]. In contrast to the situation with polynomial time algorithms, one can not significantly increase the size of a solvable instance by waiting for Moore's law to provide faster computers, so algorithmic improvement is especially important in this area. Several design principles are known for exponential-time algorithms, including dynamic programming [17] and randomized hill climbing [19], but the most common approach is a simple form of branch-and-bound, in which one repeatedly performs some case analysis to find an appropriate structure in the problem instance, and then uses that structure to split the problem into several smaller subproblems which are solved by recursive calls to the algorithm. An example of this branch-and-bound approach is a graph coloring algorithm of the author [11], which uses a subroutine for listing each maximal independent set of at most k vertices (k-MIS) in an n-vertex graph.
The subroutine repeatedly selects and applies one of the following cases, at least one of which is guaranteed to exist in any problem instance:

· If the input graph G contains a vertex v of degree zero, recursively list each (k-1)-MIS in G \ {v} and append v to each listed set.

· If the input graph G contains a vertex v of degree one, with neighbor u, recursively list each (k-1)-MIS in G \ N(u) and append u to each listed set. Then, recursively list each (k-1)-MIS in G \ {u, v} and append v to each listed set.

· If the input graph G contains a path v1-v2-v3 of degree-two vertices, then, first, recursively list each (k-1)-MIS in G \ N(v1) and append v1 to each listed set. Second, list each (k-1)-MIS in G \ N(v2) and append v2 to each listed set. Finally, list each (k-1)-MIS in G \ ({v1} ∪ N(v3)) and append v3 to each listed set. Note that, in the last recursive call, v1 may belong to N(v3), in which case the number of vertices is only reduced by three.

· If the input graph G contains a vertex v of degree three or more, recursively list each k-MIS in G \ {v}. Then, recursively list each (k-1)-MIS in G \ N(v) and append v to each listed set.

We can bound the worst-case number of output sets produced by this example, as the solution to the following recurrence in the variables n and k:

T(n, k) = max { T(n-1, k-1),
                2 T(n-2, k-1),
                3 T(n-3, k-1),
                T(n-1, k) + T(n-4, k-1) }.

As base cases, T(0, 0) = 1, T(n, -1) = 0, and T(n, k) = 0 for k > n. Each term in the overall maximization of the recurrence comes from a case in the case analysis; the recurrence uses the maximum of these terms because, in a worst-case analysis, the algorithm has no control over which case will arise. Each summand in each term comes from a recursive subproblem called for that case. It turns out that, for the range of parameters of interest n/4 ≤ k ≤ n/3, the recurrence above is dominated by its last two terms, and has the solution T(n, k) = (4/3)^n (3^4/4^3)^k.
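As a check on this analysis, the recurrence itself can be evaluated directly by memoized expansion. The following is a small illustrative sketch (not part of the algorithm of [11]); its values agree with the closed form 3^(4k-n) 4^(n-3k) = (4/3)^n (3^4/4^3)^k at sample points in the range n/4 ≤ k ≤ n/3:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n, k):
    # Base cases: one (empty) set once both counters reach zero;
    # zero sets once the parameters can no longer reach (0, 0).
    if k < 0 or n < 0 or k > n:
        return 0
    if n == 0:
        return 1  # here k == 0, since k < 0 and k > n are excluded
    # One term per case of the analysis; the worst case dominates.
    return max(T(n - 1, k - 1),                 # degree-0 vertex
               2 * T(n - 2, k - 1),             # degree-1 vertex
               3 * T(n - 3, k - 1),             # path of degree-2 vertices
               T(n - 1, k) + T(n - 4, k - 1))   # vertex of degree >= 3
```

For example, T(7, 2) evaluates to 12 = 3^(4·2-7) 4^(7-3·2), matching the closed form at that point.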
We can also find graphs having this many k-MISs, so the analysis given by the recurrence is tight.

Similar but somewhat more complicated multivariate recurrences have arisen in our algorithm for 3-coloring [10], with variables counting 3- and 4-value variables in a constraint satisfaction instance, and in our algorithm for the traveling salesman problem in cubic graphs [12], with variables counting vertices, unforced edges, forced edges, and 4-cycles of unforced edges. These examples of recurrences have all had few enough terms that they can be solved by hand, but Table 1 depicts a recurrence, arising from unpublished work with J. Byskov on graph coloring algorithms, that is complex enough that hand solution seems unlikely. This recurrence was derived through an iterative process, starting from a simple case analysis of the problem, in which the worst cases for the algorithm were repeatedly identified and replaced by a larger number of better cases.

Our interest in this paper is in performing this type of analysis algorithmically: if we are given as input a recurrence such as the ones discussed above, can we efficiently determine its asymptotic solution, and determine which of the cases in the analysis are the critical ones for the performance of the backtracking algorithm that generated the recurrence?

School of Information and Computer Science, University of California, Irvine, CA 92697-3425, USA, eppstein@ics.uci.edu

Copyright © 2004 by the Association for Computing Machinery, Inc. and the Society for Industrial and Applied Mathematics. All rights reserved. Printed in The United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Association for Computing Machinery, 1515 Broadway, New York, NY 10036 and the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

T(n, h) = max {
T(n+3, h-2) + T(n+3, h-1) + T(n+4, h-2) + T(n+5, h-2),
T(n, h+1) + T(n+1, h+2),
2 T(n+2, h) + 2 T(n+3, h),
2 T(n+2, h) + 2 T(n+3, h),
T(n+3, h-2) + T(n+3, h-1) + T(n+5, h-3) + T(n+5, h-2),
T(n+1, h) + T(n+3, h-1) + 3 T(n+3, h+3),
T(n+3, h-2) + 2 T(n+3, h-1) + T(n+7, h-2),
T(n+1, h) + 2 T(n+4, h-2),
3 T(n+1, h+2) + 2 T(n+1, h+5),
2 T(n+2, h) + T(n+3, h+1) + T(n+4, h) + T(n+4, h+1),
T(n+1, h-1) + T(n+4, h-1),
T(n+1, h+3) + 2 T(n+2, h) + T(n+3, h),
2 T(n+2, h-1),
T(n, h+3) + T(n+1, h+2) + T(n+2, h),
T(n+1, h-1) + T(n+4, h-1),
2 T(n+1, h+1) + T(n+2, h+1),
9 T(n+2, h+3),
T(n+1, h) + T(n+1, h+1),
9 T(n+9, h-5) + 9 T(n+9, h-4),
T(n+3, h-2) + T(n+3, h-1) + T(n+5, h-2) + 2 T(n+6, h-3),
T(n+1, h-1) + T(n+4, h) + T(n+4, h+1),
2 T(n+2, h) + T(n+3, h) + T(n+4, h) + T(n+5, h),
T(n+1, h) + 2 T(n+2, h+1),
T(n+1, h-1),
2 T(n+2, h+1) + T(n+3, h-2) + T(n+3, h),
T(n+1, h+1) + T(n+1, h+2) + T(n+2, h),
2 T(n+2, h) + 2 T(n+3, h),
T(n+1, h+2) + T(n+2, h-1) + T(n+2, h+1),
T(n+1, h),
T(n+2, h+1) + T(n+3, h-2) + T(n+4, h-3),
T(n-1, h+2),
3 T(n+4, h) + 7 T(n+4, h+1),
T(n+2, h-1) + 2 T(n+3, h-1),
T(n+2, h-1) + T(n+2, h) + T(n+2, h+1),
T(n+3, h-2) + T(n+3, h) + 2 T(n+4, h-2),
T(n+1, h) + T(n+3, h-1) + T(n+3, h+3) + T(n+5, h) + T(n+6, h-1),
2 T(n+1, h+4) + 3 T(n+3, h+1) + 3 T(n+3, h+2),
3 T(n+3, h+1) + T(n+3, h+2) + 3 T(n+3, h+3) + 3 T(n+4, h),
T(n+2, h-1) + T(n+3, h-1) + T(n+4, h-2),
T(n, h+1),
T(n+1, h+2) + T(n+3, h-2) + T(n+3, h-1),
2 T(n+3, h-1) + T(n+3, h+2) + T(n+5, h-2) + T(n+5, h-1) + T(n+5, h),
T(n+2, h+2) + 2 T(n+3, h) + 3 T(n+3, h+1) + T(n+4, h),
T(n+3, h-2) + T(n+3, h-1) + T(n+5, h-3) + T(n+6, h-3) + T(n+7, h-3),
T(n+1, h-1),
T(n+1, h) + 2 T(n+3, h),
4 T(n+3, h+1) + 5 T(n+3, h+2),
4 T(n+2, h+3) + 3 T(n+4, h) + 3 T(n+4, h+1),
T(n+3, h-2) + 2 T(n+3, h-1) + T(n+6, h-3),
4 T(n+2, h+3) + 6 T(n+3, h+2),
T(n, h+1) + T(n+4, h-3),
T(n+1, h-1) + 2 T(n+3, h+2),
2 T(n+2, h+1) + 3 T(n+2, h+3) + 2 T(n+2, h+4),
2 T(n+2, h) + 2 T(n+2, h+3),
2 T(n+2, h) + T(n+2, h+3) + T(n+3, h+2) + T(n+4, h) + T(n+4, h+1),
2 T(n, h+2),
T(n+2, h) + T(n+3, h-2) + T(n+3, h-1),
T(n+3, h-2) + 2 T(n+4, h-2) + T(n+5, h-3),
T(n+1, h) + T(n+5, h-4) + T(n+5, h-3),
T(n+1, h+2) + T(n+2, h-1) + T(n+3, h-1),
T(n+2, h-1) + T(n+2, h) + T(n+4, h-1),
10 T(n+3, h+2),
6 T(n+2, h+2),
T(n+2, h) + T(n+3, h),
2 T(n+3, h-1) + T(n+3, h+2) + T(n+5, h-2) + T(n+5, h-1) + T(n+5, h) + T(n+6, h-2) + T(n+7, h-2),
6 T(n+3, h+1),
3 T(n, h+3),
T(n+2, h-1) + T(n+2, h) + T(n+4, h-2),
2 T(n+5, h-3) + 5 T(n+5, h-2),
2 T(n+2, h) + T(n+2, h+1) + T(n+4, h-1),
8 T(n+1, h+4),
T(n+3, h-2) + T(n+3, h-1) + T(n+5, h-3) + T(n+5, h-2) + T(n+7, h-4),
T(n+1, h-1) + T(n+2, h+2),
5 T(n+2, h+2) + 2 T(n+2, h+3) + 2 T(n+7, h-3) }

Table 1: A recurrence arising from unpublished work with J. Byskov on graph coloring algorithms.
We show that the answer is yes, by expressing the problem as a quasiconvex program, a type of generalized linear program studied previously by the author and others, with applications including finite element mesh smoothing [2]; brain flat mapping, hyperbolic graph drawing, and conformal meshing [4]; and multi-projector tiled display color gamut equalization [5]. This quasiconvex programming formulation allows us to solve an ℓ-term recurrence in O(ℓ) steps, each of which involves the solution of a constant number of algebraic equations; alternatively, we can apply any of several numerical hill-climbing techniques, which are guaranteed to converge to the global optimum of a quasiconvex program. We describe one such technique and provide two proof-of-concept implementations of it, one for exploratory analysis using floating point, and another using the XR exact real-number computation package for Python [6, 18]. The algorithms we describe are able to analyze recurrences such as the one in Table 1, producing asymptotic analyses of their behavior (here, we are interested in the asymptotics of T(n, 0)) and identifying the cases that are bottlenecks for that analysis, typically within one or two seconds for a floating point evaluation sufficiently accurate for exploratory analysis of algorithms.

We first began using a weighting technique for upper bounding recurrences of this type in our previous work on graph coloring [10], without much regard to its completeness or generality until this paper. Despite some searching, we have been unable to find relevant prior research on asymptotic analysis of similar multidimensional recurrences.

smallmis = {
    "deg0": [(1,1)],
    "deg1": 2*[(2,1)],
    "deg2": 3*[(3,1)],
    "deg3": [(4,1), (1,0)],
}

Table 2: Python representation of recurrence input.

2 Formalization and Statement of Results

We assume that the input to our problem consists of the following items:

· An integer dimension d.
· A recurrence F(x) = max_i Σ_j F(x - δ_{i,j}), where x and the δ_{i,j} are vectors in Z^d, and i and j are indices ranging over the cases and subproblems of the algorithm to be analyzed. We assume that, as base cases for the recurrence, F(0) = 1, and F(y) = 0 when y can not reach the zero vector by any expansion of the recurrence.

· A target vector t in Z^d.

The desired output is a description of the asymptotic behavior of the function f(n) = F(n t). We call the expressions Σ_j F(x - δ_{i,j}) terms of the recurrence, and their subexpressions F(x - δ_{i,j}) summands of the term. Much of what we describe here would generalize without difficulty to non-integer values of x and δ_{i,j}, and to non-integer multipliers on the summands of each term.

For our Python implementation, we represent the recurrence as a dictionary mapping case names to terms, with each term represented as a list of summands and each summand represented as a d-tuple of integers. For instance, the cardinality-bounded maximal independent set recurrence discussed in the introduction has the representation shown in Table 2. This Python representation also allows the inclusion of comments in the file containing the recurrence, describing in further detail the case analysis from which the recurrence arises.

We obtain the following results:

· We show that our recurrences can be upper bounded by linear univariate recurrences formed by weighting the variables, and that the optimal set of weights for this upper bound technique can be found efficiently using quasiconvex programming. The optimal basis for the quasiconvex program consists of the terms forming the worst cases for the recurrence.
· We describe a numerical improvement technique for quasiconvex programming, based on multi-gradient descent, and provide two implementations of this technique: one based on floating point arithmetic and capable of running at interactive speeds for exploratory algorithm analysis, and one using an exact arithmetic package for publishable guaranteed worst case bounds.

· We prove lower bounds showing that the upper bounds from our optimal weighting technique are tight to within a polynomial factor.

3 Upper Bounds

In this section we describe a weighting technique that can be used to obtain asymptotic upper bounds for our recurrences, and formulate the problem of optimizing the weights in order to obtain the best such upper bound.

Fix a vector w ∈ R^d, such that, for each summand F(x - δ_{i,j}) of the input recurrence, w·δ_{i,j} is positive, let y be a real variable, and define

F_w(y) = max_{w·x ≤ y} F(x).

Then it is not hard to see, by using the recurrence for F(x) to expand the right hand side of this definition, and interchanging the order in the maximization, that

F_w(y) ≤ max_i Σ_j F_w(y - w·δ_{i,j}).

As base cases, set F_w(y) = 1 for 0 ≤ y < min_{i,j} w·δ_{i,j} and F_w(y) = 0 for y < 0. This resembles the recurrence defining F, but with a real- instead of vector-valued argument. For linear univariate recurrences similar to this one, standard solution techniques such as generating functions and characteristic polynomials are commonly taught in freshman combinatorics courses. The recurrence for F_w is nonlinear because of the maximization, and involves non-integer variables, but the same techniques apply:

Lemma 3.1. F_w(y) = O(c_w^y), where c_w is the unique positive root of the monotonic function

r_w(c) = 1 - max_i Σ_j c^{-w·δ_{i,j}}.

It will be convenient later to separate out this analysis into the different terms of the recurrence:

Lemma 3.2. Let c_{w,i} denote the unique positive root of the monotonic function r_{w,i}(c) = 1 - Σ_j c^{-w·δ_{i,j}}. Then c_w = max_i c_{w,i}.

Since F_w was defined as a maximum of values of F, this technique immediately leads to upper bounds for the target function f(n) = F(n t):

Lemma 3.3. Let w ∈ R^d be such that, for each summand F(x - δ_{i,j}) of the input recurrence, w·δ_{i,j} is positive, and let w·t = 1. Then f(n) ≤ F_w(n) = O(c_w^n).

We call w a weight vector, because if one interprets the coordinates of w as weights of the different backtracking algorithm instance features counted by x, then w·x is the total weight of the instance. The recurrence for F_w and its solution O(c_w^n) describe the time for the backtracking algorithm as a function of instance weight. However, different choices of the weight vector will give different upper bounds O(c_w^n) on the time used by the backtracking algorithm. Our task now is to select the best weight vector, that is, the one yielding the tightest upper bound. For convenience, we define c_w = c_{w,i} = +∞ when w·δ_{i,j} is non-positive for some j.

4 Quasiconvex Programming

In this section we find efficient algorithmic solutions for the optimal weighting problem formulated in the previous section.

A function q(w): R^d → R is called quasiconvex when its level sets q^{≤λ} = {w ∈ R^d | q(w) ≤ λ} are all convex. In particular, the points w where q achieves its minimum value (if a minimum exists) form a convex set, and an approximation to a value achieving the global minimum can be found numerically by local improvement techniques. If the functions q_i for i in some finite index set S are all quasiconvex, then the function q_S(w) = max_{i ∈ S} q_i(w) is also quasiconvex, and it becomes of interest to find a point where q_S achieves its minimum value. Amenta et al. [2] define quasiconvex programming as a formalization of this search for the minimum of q_S. Linear programming can be seen as a special case of quasiconvex programming in which all the functions q_i are linear.

More formally, Amenta et al. define a nested convex family to be a map Λ(λ) from the nonnegative real numbers to compact convex sets in R^d such that if λ_1 < λ_2 then Λ(λ_1) ⊆ Λ(λ_2), and such that for all λ, Λ(λ) = ∩_{λ′ > λ} Λ(λ′). Any nested convex family determines a quasiconvex function q(w) = inf {λ | w ∈ Λ(λ)} on R^d, with level sets q^{≤λ} ⊇ Λ(λ). Conversely, when q is quasiconvex, the closures of the level sets q^{≤λ}, restricted to a compact convex subdomain of R^d, form a nested convex family.

Amenta et al. define a quasiconvex program to be formed by a set of nested convex families S = {Λ_1, Λ_2, . . ., Λ_n}; the task to be solved is finding the value

λ(S) = inf {(λ, w) | w ∈ ∩_{Λ_i ∈ S} Λ_i(λ)},

where the infimum is taken in the lexicographic ordering, first by λ and then by the coordinates of w.

Quasiconvex programs can themselves be seen as a special case of generalized linear programs. These are optimization problems based on an objective function that maps sets to totally ordered values and that satisfies certain axioms [1, 13, 15]; quasiconvex programs are generalized linear programs because the function λ(S) defined above satisfies these axioms [2]. One consequence of this generalized linear programming formulation is that the value of any quasiconvex program is determined by a basis, a subset of the nested convex families with cardinality O(d). Another consequence is that, when the dimension d is bounded, quasiconvex programs can be solved by dual-simplex based randomized algorithms that perform a number of steps linear in the number of nested convex families in the input, where each step consists of solving a constant-sized subproblem.

As we now show, our problem of finding an optimal weight vector for our input recurrence can be expressed in this quasiconvex programming framework.

Lemma 4.1. Let c_{w,i} be as defined in the previous section. Then the function q_i(w) = c_{w,i} is quasiconvex.

Proof. We must show that q_i^{≤λ} = {w | c_{w,i} ≤ λ} is convex. Equivalently, expanding the definition of c_{w,i} and using the monotonicity of the function r_{w,i}, we must show convexity of the set

q_i^{≤λ} = {w | r_{w,i}(λ) ≥ 0} = {w | Σ_j λ^{-w·δ_{i,j}} ≤ 1}.

But each summand of the sum in the right expression is an exponential of a linear function of w, hence convex. A sum of convex functions is convex, and its level set is a convex set. □

Corollary 4.1. We can find a pair (c, w), where c = inf_w c_w, w·t = 1, and w is a limit point of a sequence w′ with c_{w′} converging to c, as a solution to a quasiconvex program.

Proof. We form nested convex families from the closures of the level sets of the functions q_i(w) = c_{w,i}. The result follows from the definition of c_w = max_i c_{w,i}. □

Theorem 4.1. If (c, w) is found as the optimal solution to the quasiconvex program defined as above for a recurrence for F(x) with target vector t, then f(n) = F(n t) = O(c^n).

Proof. If c_{w,i} = c < +∞ for all terms i, the result follows from Lemma 3.3. And if some term has w·δ_{i,j} < 0, or has more than one summand, then c_{w,i} = +∞, but any sequence w′ converging to w has c_{w′} converging to +∞, so the upper bound is infinite and the result is vacuously true. The only remaining case is that some terms have a single summand δ_i with w·δ_i = 0. Such terms can not contribute to the asymptotic growth of F(n t), and the result follows by applying Lemma 3.3 to the recurrence formed by omitting them from the definition of F. □

Therefore, we can use quasiconvex programming to find the weight vector w yielding the tightest possible upper bound O(c^n) on the asymptotic behavior of our recurrences. Further, by finding a basis for the quasiconvex program, we can determine the recurrence terms that are critical for its asymptotics. The restriction w·t = 1 does not affect quasiconvexity, and in fact aids in the solution of our problem by reducing the effective dimension of the quasiconvex program to d - 1.

5 Implementation

In this section we discuss two implementations of our optimization algorithms. Our implementations use a numerical improvement technique which we call smooth quasiconvex programming, and which will also feature in our later lower bound proof.

As stated earlier, generalized linear programming techniques can be used to solve our quasiconvex programs in O(ℓ) steps, where ℓ denotes the number of terms and each step consists of solving a constant sized subproblem. Such a subproblem has only a constant number of potential bases, and we can test a basis by solving an algebraic system of equations in the variables c, w[k], where w[k] denotes the kth coordinate of w. However, this approach appears likely to be cumbersome in practice due to the difficulty of implementing its pivoting steps. Instead, we implemented a hill-climbing scheme for finding numerically the optimal value (c, w) for our quasiconvex program and for certain other quasiconvex programs.

We have in mind two different purposes for an implementation of our analysis algorithm. First, such an implementation could be used for exploratory analysis: computing a rough estimate of the running time of an algorithm, with enough accuracy to determine whether it improves other similar algorithms for the same problem, and determining the worst cases in an algorithm's case analysis, in order to refine that analysis to produce a better algorithm. For this sort of application, it is important that the running time be fast enough to be usable at interactive or near-interactive speeds; an implementation using floating point arithmetic is appropriate. Second, we would like to be able to publish guaranteed worst-case bounds on algorithms, for which approximate numeric schemes such as floating point that lack error bounds are inappropriate; in this setting the longer running times associated with exact or interval arithmetic may be acceptable. With these considerations in mind, we implemented our algorithm twice, once using floating point and a second time using XR [6], an exact real-number computation package for the Python programming language [18]. The results of the second implementation are guaranteed to be valid upper bounds for the recurrence in question.

Figure 1: Example showing the difficulty of applying standard gradient descent methods to quasiconvex programming. The function to be minimized is the maximum distance to any point; only points within the narrow shaded intersection of circles have function values smaller than the value at point w.

In XR, a real number x is represented as a data structure that is capable of constructing a multiprecision integer representing the rounded value of 2^b x, for any integer b; this rounded value is required to be within unit distance of the true value. We extended this package so that it could evaluate as exact real numbers the values c_w, given an input vector w the coordinates of which are also real numbers, by performing an appropriate binary search using the values of the function r_w(c). We then implemented a quasiconvex programming algorithm to search for a sequence of vectors w converging to the infimum (c, w) of the quasiconvex program. We describe in more detail below our exact arithmetic implementation; our floating point implementation is similar.

If all functions q_i(w) are quasiconvex, the function q(w) = max_i q_i(w) is itself quasiconvex, so we can apply hill-climbing procedures to find its infimum. However, although in our application the individual functions q_i are smooth, their maximum q may not be smooth, so it is difficult to apply standard gradient descent techniques.
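To make the binary search for c_w concrete, the following floating point sketch (a stand-in for the XR-based exact version; the function name and bisection parameters are our own illustrative choices) computes c_w for a recurrence given in the dictionary format of Table 2:

```python
def growth_rate(recurrence, w):
    """c_w: the largest over terms i of the positive root c_{w,i} of
    r_{w,i}(c) = 1 - sum_j c**-(w . delta_ij).  Returns inf when some
    summand has non-positive weighted size, matching the convention
    c_w = +inf used in the text."""
    def dot(d):
        return sum(wk * dk for wk, dk in zip(w, d))
    c_max = 1.0
    for deltas in recurrence.values():        # one term per case name
        sizes = [dot(d) for d in deltas]      # weighted sizes w . delta_ij
        if min(sizes) <= 0:
            return float("inf")
        def S(c):                             # sum_j c**-(w . delta_ij)
            return sum(c ** -s for s in sizes)
        hi = 2.0
        while S(hi) > 1.0:                    # bracket the root c_{w,i}
            hi *= 2.0
        lo = 1.0
        for _ in range(100):                  # bisection on the monotone S
            mid = (lo + hi) / 2.0
            if S(mid) > 1.0:
                lo = mid
            else:
                hi = mid
        c_max = max(c_max, hi)                # c_w = max_i c_{w,i}
    return c_max
```

For the smallmis recurrence of Table 2, for instance, growth_rate(smallmis, (1.0, 0.0)) returns approximately 3^{1/3} ≈ 1.4422.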
The difficulty may be seen, for instance, in a simpler quasiconvex programming problem: determining the circumradius of a planar point set (Figure 1). A basis for the circumradius problem may consist of either two or three points; the quasiconvex functions used to solve the problem are simply the distances (or squared distances) from each input point, and the function q(w) measures the distance from w to the farthest point. But if a point set has only two points in its basis, and our hill climbing procedure for the circumradius has reached a point w equidistant from these two points and near but not on their midpoint, then improvements to the function value q(w) may be found only by moving w in a narrow range of directions towards the midpoint. Standard gradient descent algorithms may have a difficult time finding such an improvement direction. Similar behavior arises naturally in our recurrence analysis problem. For instance, in a recurrence in [12] for the time bound of a TSP algorithm on cubic graphs, there are only two critical terms despite the problem being defined over a three-dimensional vector space, and these two terms lead to behavior very similar to the circumradius example.

To avoid these difficulties, we use the following algorithm, which we call smooth quasiconvex programming. We assume we are given as input a set of quasiconvex real functions q_i(w). Further, we assume that for each i we can also compute a vector-valued function q_i*(w), satisfying the following properties:

1. If q_i(x) < q_i(w), then (x - w) · q_i*(w) > 0, and

2. If q_i*(w) · v > 0, then for all sufficiently small ε > 0, q_i(w + εv) < q_i(w).

For the circumradius example, for instance, q_i*(w) should be a vector pointing from w towards the ith input point. The requirements on q_i* can be described geometrically, as follows: we assume that each level set of q_i is a smooth convex set, one that has at each of its boundary points a unique tangent plane.
The vector qi (x) is then an inward-pointing normal vector to the tangent q (x) plane to qi at x (essentially it is just the negation of the gradient of q ). The functions qi (w) arising in our recurrence analysis problem have this smoothness property, and (as we discuss in more detail in the next section) the vectors qi (w) can be constructed by evaluating the partial derivatives for each of the coordinates of x in the expression rx,i (cw ) at x = w. Our smooth quasiconvex programming algorithm then consists of selecting an initial value for w, and a desired output tolerance, and repeating the following steps: puter (800 MHz PowerPC G4) to solve moderately sized 3-variable recurrences to 64 bits of precision. However our floating point implementation is able to run at interactive speeds, taking roughly one or two seconds on the same laptop to solve recurrences such as the one in Table 1. We believe that significant improvements in runtime of our exact arithmetic implementation would be possible both by tuning the implementation and by using a faster software base. However it is encouraging that, in the trials we attempted, the algorithm appears to exhibit linear convergence to the correct function value as well as to the optimal weight vector coordinates: the number of iterations of the algorithm appears to be proportional to the number of bits of precision desired. 1. Compute the set of vectors qi (w), for each i 6 Lower Bounds such that qi (w) is within the desired tolerance of In this section we prove that the upper bounds found maxi qi (w). by our optimal weighting technique are tight to within 2. Find a vector v such that v · qi (w) > 0 for each a polynomial factor. In order to find lower bounds for the asymptotic vector qi (w) in the computed set. If no such vector exists, q (w) is within the tolerance of its optimal behavior of our recurrences, it is useful to have the following combinatorial interpretation of jtheir values. 
value and the algorithm terminates. For any x Zd , let i (x) = arg maxi F (x - i,j ). 3. Search for a value for which q (w + v) q (w), That is, i is the index of the term in the recurrence and replace w by w + v. that determines the value of F (x). If µ(x) : Zd Z Our actual implementation augments this proce- is any function mapping vectors to recurrence term dure by an outer scaling loop that gradually decreases indices, define an infinite graph Gµ , the vertices of which d the tolerance, so that multiple terms of the recurrence are vectors in Z , with an edge from x to x - i,j can influence the computation in step 1 even when the whenever i = µ(x) and i,j is a summand in term i current value of w is only a rough approximation to the of the recurrence. Let µ (x) denote the number of true optimum. We also terminate the loop when the paths from x to zero in Gµ . It is not hard to show by improvement to q (w) becomes much smaller than the induction that i (x) = F (x), so F can be interpreted tolerance, even when the termination condition of step as counting paths in a graph. Moreover, for any µ, 2 is not met, in order to handle situations in which the µ (x) F (x). An example of the graph Gi , for the recurrence discussed in the introduction, is shown in optimal basis is less than full-dimensional. The search for a vector v in step 2 can be expressed Figure 2; each vertex x in the figure is labeled with the as a linear program. However, when the dimension of number i (x) = F (x). We will find a lower bound for the quasiconvex program is at most two (equivalently, F (x) by counting paths in a graph Gµ , where µ(x) will the number of variables in the recurrence to be solved not necessarily equal i (x). 
Let (S ) = (c, w) be the optimal solution to the is at most three) it can be solved more simply by quasiconvex program formulated earlier for the upper sorting the vectors qi (w) radially around the origin and bounds for our recurrence, and let B be a basis of choosing v to be the average of two extreme vectors. In our implementation of step 3, we perform a the quasiconvex program; that is, a minimal subset doubling search for the largest leading to a smaller of terms of the recurrence such that the quasiconvex value of q (w + v), and then reduce the resulting by a program formed from the subset has the same value factor of two before replacing w, to attempt to control (B ) = (S ) as that for the whole recurrence. For situations where the value w oscillates around the true any term i in B , and any summand j in term i, let p pi,j = c-w·i,j , and let Di = i,j i,j . optimal value. Both due to our use of exact real arithmetic, and Lemma 6.1. For any term i in B , j p = 1. i,j due to the implementation in Python, a relatively slow interpreted language, our exact arithmetic implementa- Proof. If term i has only one summand, then it belongs tion is not fast, taking several hours on a laptop com- to B exactly when w · i,j = 0 and pi,j = 1. Otherwise, 794 1 2 4 8 16 1 2 4 8 12 18 1 2 4 6 9 12 16 1 2 3 4 5 6 7 8 1 1 1 1 1 1 1 1 1 Figure 2: A portion of the infinite graph Gi for the recurrence T (n, k ) for listing maximal independent sets of cardinality k in an n-vertex graph, described in the introduction. The horizontal position of a vertex indicates its n coordinate and the vertical position indicates its k coordinate. To simplify the drawing, edges are shown in a confluent style [9] in which multiple edges are allowed to merge before reaching their destination. Each vertex is labeled with the number of paths in the graph from that vertex to the origin. qi (w). The pair (c, w) used to define bi and Di is assumed optimal, so can not be improved. 
Lemma 6.2. The vector D_i defined above is a scalar multiple of the vector ∇q_i defined for the smooth quasiconvex programming algorithm of the previous section.

Proof. The vector ∇q_i(w) is the vector such that q_i(w + εv) < q_i(w) (for sufficiently small ε) if and only if v·∇q_i > 0. Expanding the definition of q_i, as the value c such that r_{w,i}(c) = 0, we see that ∇q_i(w) can equivalently be defined as a vector such that r_{w+εv,i}(c) > 0 (for sufficiently small ε) if and only if v·∇q_i > 0. Therefore, ∇q_i can be computed as the gradient of the function ρ(x) = r_{x,i}(c), evaluated at x = w. Expanding this gradient by computing partial derivatives for each component of its vector argument, we arrive at the definition of D_i. □

Lemma 6.3. There exist values b_i, 0 ≤ b_i ≤ 1, such that Σ_i b_i = 1 and such that Σ_i b_i D_i is a scalar multiple of the target vector t.

Proof. The smooth quasiconvex programming algorithm of the previous section will find an improvement to the solution (c, w) whenever there exists a vector v, perpendicular to t, having positive dot product with all the vectors ∇q_i(w). The pair (c, w) used to define the b_i and D_i is assumed optimal, so it can not be improved. Therefore, no such v exists, and the origin must be contained in the convex hull of the projections perpendicular to t of the vectors ∇q_i. Any vector in the convex hull of a set of vectors can be expressed as a convex combination of those vectors, and the same convex combination (viewed in R^d rather than in its projection perpendicular to t) has the property stated in the lemma. □

We are now ready to describe the graph G_µ used in our lower bound construction. For each x ∈ Z^d, we choose µ(x) to be one of the terms in the basis B, independently at random, with probability b_i of choosing term i. We then let G_µ be the infinite graph formed from µ as described at the start of the section. For each choice of µ, the number of paths #_µ(⌈nt⌉) from ⌈nt⌉ to the origin in G_µ forms a valid lower bound for the quantity f(n) = F(⌈nt⌉) that we are trying to estimate, so the expected number of paths (averaged over the choice of µ) also forms a valid lower bound.

To count the expected number of paths in G_µ, we use the following random walk process. Start from the vertex ⌈nt⌉, and then from any vertex x choose randomly among the edges leading away from x, independently for each x. The set of edges at x is in one-to-one correspondence with the summands of term µ(x), and we choose summand j with probability p_{µ(x),j}. As shown in Lemma 6.1, these probabilities add to one at each vertex. We continue this random walk process until we reach a vertex x with w·x = 0.

Lemma 6.4. If a path from ⌈nt⌉ to the origin can be formed by the random walk described above, it has probability c^{−n} of being chosen.

Proof. More generally, it is easy to show by induction on the length of the path that a path from x to y has probability c^{w·(y−x)} of being chosen. The result follows from the choice of starting point x = ⌈nt⌉ and the constraint that w·t = 1. □

Lemma 6.5. The random walk described above reaches the origin with probability Ω(n^{(1−d)/2}).

Proof. By Lemma 6.3, the projections perpendicular to t of the vertices of the path form an unbiased random walk in R^{d−1}, with O(n) steps of constant size, and the result follows from the standard theory of such walks. □

Theorem 6.1. f(n) = F(⌈nt⌉) = Ω(c^n n^{(1−d)/2}).

Proof. By Lemmas 6.4 and 6.5, there is the given expected number of paths in G_µ from ⌈nt⌉ to the origin. The result follows from the facts that this number of paths is less than the number of paths between the same two vertices in G_{i*} (since i* is defined to maximize the number of paths) and that the number of paths in G_{i*} is exactly F(⌈nt⌉). □

7 Conclusions and Open Problems

We have shown that the solution to the recurrence F(x) = max_i Σ_j F(x − δ_{i,j}), for x = ⌈nt⌉, may be upper and lower bounded within a polynomial factor of c^n, where the number c can be computed as the solution to a quasiconvex program defined from the recurrence and the target vector t. It would be of interest to tighten these bounds: under what conditions can we determine the correct polynomial adjustment to the bound c^n? It is consistent with our observations so far that F(⌈nt⌉) = Θ(c^n n^{(|B|−d)/2}), where |B| is the cardinality of a basis for the quasiconvex program. For instance, this would fit the central binomial coefficients, with |B| = 1 and d = 2, as well as the recurrence used as an example at the start of this paper, with |B| = d = 2. However, there is too little evidence yet to state such a formula as a conjecture.

More generally, the work here is only a first step towards the automation of backtracking algorithm design and analysis.
Perhaps it would also be possible to automatically perform some of the case analysis used to design backtracking algorithms, and to determine the appropriate variables to use in setting up the recurrences used to analyze those algorithms, before automatically solving those recurrences, at least for simple constraint-satisfaction-type problems. It would also be of interest to find ways of specifying algorithms of this type in such a way that their correctness can be proven automatically, especially in situations where repeated refinement based on our analysis tools has led to a highly complex case analysis such as that appearing in Table 1. Also, while we can find tight worst-case bounds on the solution of the recurrence derived from an algorithm, it may not always be possible to construct an instance causing the algorithm itself to have that worst-case time bound; it would be useful to determine conditions under which this recurrence-based analysis is tight. In another direction, the proof of Theorem 6.1 hints at a theory of duality for quasiconvex programs that it would be of interest to explore.

Acknowledgements

This research was supported in part by NSF grant CCR-9912338. I would like to thank Jesper Byskov and George Lueker for helpful discussions and comments on drafts of this paper, and Keith Briggs for help with programming using XR.

References

[1] N. Amenta. Helly-type theorems and generalized linear programming. Discrete Comput. Geom. 12:241–261, 1994.

[2] N. Amenta, M. W. Bern, and D. Eppstein. Optimal point placement for mesh smoothing. J. Algorithms 30(2):302–322, February 1999, arXiv:cs.CG/9809081.

[3] R. Beigel. Finding maximum independent sets in sparse and general graphs. Proc. 10th ACM-SIAM Symp. Discrete Algorithms, pp. S856–S857, January 1999, http://www.eecs.uic.edu/beigel/papers/mis-soda.PS.gz.

[4] M. W. Bern and D. Eppstein. Optimal Möbius transformations for information visualization and meshing. Proc. 7th Worksh.
Algorithms and Data Structures, pp. 14–25. Springer-Verlag, Lecture Notes in Computer Science 2125, August 2001, arXiv:cs.CG/0101006.

[5] M. W. Bern and D. Eppstein. Optimized color gamuts for tiled displays. Proc. 19th Symp. Computational Geometry. ACM, 2003, arXiv:cs.CG/0212007. To appear.

[6] K. M. Briggs. XR - exact real arithmetic. Open source Python and C++ software, January 2002, http://members.lycos.co.uk/keithmbriggs/XR.html.

[7] J. M. Byskov. Algorithms for k-colouring and finding maximal independent sets. Proc. 14th Symp. Discrete Algorithms, pp. 456–457. ACM and SIAM, January 2003.

[8] E. Dantsin and E. A. Hirsch. Algorithms for k-SAT based on covering codes. Preprint 1/2000, Steklov Inst. of Mathematics, 2000, ftp://ftp.pdmi.ras.ru/pub/publicat/preprint/2000/01-00.ps.gz.

[9] M. Dickerson, D. Eppstein, M. T. Goodrich, and J. Meng. Confluent drawings: visualizing non-planar diagrams in a planar way. ACM Computing Research Repository, December 2002, arXiv:cs.CG/0212046.

[10] D. Eppstein.
Improved algorithms for 3-coloring, 3-edge-coloring, and constraint satisfaction. Proc. 12th Symp. Discrete Algorithms, pp. 329–337. ACM and SIAM, January 2001, arXiv:cs.DS/0009006.

[11] D. Eppstein. Small maximal independent sets and faster exact graph coloring. Proc. 7th Worksh. Algorithms and Data Structures, pp. 462–470. Springer-Verlag, Lecture Notes in Computer Science 2125, August 2001, arXiv:cs.DS/0011009.

[12] D. Eppstein. The traveling salesman problem for cubic graphs. ACM Computing Research Repository, February 2003, arXiv:cs.DS/0302030.

[13] B. Gärtner. A subexponential algorithm for abstract optimization problems. SIAM J. Computing 24:1018–1035, 1995.

[14] J. Gramm, E. A. Hirsch, R. Niedermeier, and P. Rossmanith. Better worst-case upper bounds for MAX-2-SAT. 3rd Worksh. on the Satisfiability Problem, 2000, http://ssor.twi.tudelft.nl/warners/SAT2000abstr/hirsch.html.

[15] J. Matoušek, M. Sharir, and E. Welzl. A subexponential bound for linear programming. Tech. Rep. B 92-17, Freie Univ. Berlin, Fachb. Mathematik, August 1992.

[16] R. Paturi, P. Pudlák, M. E. Saks, and F. Zane. An improved exponential-time algorithm for k-SAT. Proc. 39th Symp. Foundations of Computer Science, pp. 628–637. IEEE, 1998, http://www.math.cas.cz/pudlak/ppsz.ps.

[17] J. Robson. Algorithms for maximum independent sets. J. Algorithms 7:425–440, 1986.

[18] G. van Rossum et al. Python Language Website. http://www.python.org/.

[19] U. Schöning. A probabilistic algorithm for k-SAT and constraint satisfaction problems. Proc. 40th Symp. Foundations of Computer Science, pp. 410–414. IEEE, October 1999.