Probabilistic Analysis of Knapsack Core Algorithms

Rene Beier (rbeier@mpi-sb.mpg.de), Max-Planck-Institut für Informatik, Saarbrücken, Germany
Berthold Vöcking (voecking@cs.uni-dortmund.de), Fachbereich Informatik, Universität Dortmund, Germany

(Supported by the graduated studies program on "Quality Guarantees for Computer Systems" funded by the German Science Foundation (DFG), and in part by DFG grant Vo889/1-1.)

Abstract

We study the average-case performance of algorithms for the binary knapsack problem. Our focus lies on the analysis of so-called core algorithms, the predominant algorithmic concept used in practice. These algorithms start with the computation of an optimal fractional solution that has only one fractional item and then exchange items until an optimal integral solution is found. The idea is that in many cases the optimal integral solution is close to the fractional one, so that only a few items need to be exchanged.

Despite the well-known hardness of the knapsack problem on worst-case instances, practical studies show that knapsack core algorithms can solve large-scale instances very efficiently. For example, they exhibit almost linear running time on purely random inputs. In this paper, we present the first theoretical result on the running time of core algorithms that comes close to the results observed in practical experiments. We prove an upper bound of O(n polylog(n)) on the expected running time of a core algorithm on instances with n items whose profits and weights are drawn independently, uniformly at random. A previous analysis of the average-case complexity of the knapsack problem proves a running time of O(n⁴), but for a different kind of algorithm. The previously best known upper bound on the running time of core algorithms is polynomial as well; the degree of this polynomial, however, is at least a large three-digit number. In addition to uniformly random instances, we investigate harder instances in which profits and weights are pairwise correlated. For this kind of instance, we prove a tradeoff describing how the degree of correlation influences the running time.

1 Introduction

This paper is concerned with a probabilistic analysis of the 0/1 knapsack problem. A subset of n given items has to be packed into a knapsack of capacity b. Each item has a profit p_i and a weight w_i, for i ∈ [n] = {1, 2, ..., n}. The problem is to select a subset of the items whose total weight does not exceed the capacity bound b and whose total profit is as large as possible. In terms of an integer linear program (ILP), the problem is

    maximize    Σ_{i∈[n]} p_i x_i
    subject to  Σ_{i∈[n]} w_i x_i ≤ b
    and         x_i ∈ {0, 1}, for i ∈ [n].

We assume that weights and profits are drawn independently, uniformly at random from [0, 1]. Following the conventions in previous analyses [11, 9], the value of n is assumed to be chosen according to the Poisson distribution with parameter N. Furthermore, b = βN, for some constant β ∈ (0, 1/2).
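For concreteness, the input model can be sampled as follows. This is a minimal sketch assuming NumPy; the function and parameter names are ours and do not appear in the paper, and the default β = 0.4 simply matches the value used in the experiments reported in Section 6.

```python
import numpy as np

def random_instance(N, beta=0.4, rng=None):
    """Sample an instance from the model above:
    n ~ Poisson(N), profits and weights i.i.d. uniform on [0, 1], b = beta * N."""
    rng = np.random.default_rng() if rng is None else rng
    n = rng.poisson(N)
    profits = rng.uniform(0.0, 1.0, size=n)
    weights = rng.uniform(0.0, 1.0, size=n)
    return profits, weights, beta * N
```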
Our focus lies on the analysis of so-called core algorithms, which have proven to be the most efficient algorithms in numerous practical studies [7, 8, 14]. This algorithmic concept was suggested by Balas and Zemel [1]. The idea is to start with the computation of an optimal fractional solution with at most one fractional item and then to exchange some of the items until an optimal integral solution is found. The set of items that are candidates to be exchanged is called the core, and the hope is that the size of the core for "typical instances" is relatively small. As a first step towards analyzing core algorithms, Lueker proved an upper bound on the expected gap between the optimal integral and the optimal fractional solution [11]. Based on this result, Goldberg and Marchetti-Spaccamela [9] were able to prove structural properties of the core, resulting in the following bound on the running time of a Las Vegas type core algorithm. For every fixed k > 0, with probability at least 1 − 1/k, the running time of their algorithm does not exceed a specified upper bound that is polynomial in the number of items. However, the degree of this polynomial is quite large: the constant in the exponent is at least a large three-digit number. Even more dramatically, the degree of the polynomial grows with the reciprocal of the failure probability like k log(k). Observe that this kind of tail bound does not allow one to conclude any sub-exponential upper bound on the expected running time. This work was later extended to the multidimensional knapsack problem by Dyer and Frieze [4].

Better bounds on the running time are only known for an algorithm by Nemhauser and Ullmann [13]. This algorithm, however, does not follow the core concept but instead applies a dominance criterion to reduce the search space. A subset S ⊆ [n] with weight w(S) = Σ_{i∈S} w_i and profit p(S) = Σ_{i∈S} p_i dominates another subset T ⊆ [n] if w(S) ≤ w(T) and p(S) ≥ p(T). For simplicity, let us assume that all sets have different profits; the considered random instances satisfy this assumption with probability 1. Under this assumption, no subset dominated by another subset can be an optimal solution to the knapsack problem, regardless of the knapsack capacity. Consequently, it suffices to consider only those sets that are not dominated by other sets, the so-called dominating sets. In a recent study [2], we showed that, for uniformly random instances, the expected number of dominating sets is bounded by O(n³), even if the weights are chosen by an adversary. This result implies an upper bound of O(n⁴) on the expected running time of the Nemhauser/Ullmann algorithm. In fact, experiments show that the running time of this algorithm behaves approximately like n³. Core algorithms, however, show a much better performance in experiments [7, 8, 14]: their running time on uniformly random instances is almost linear. Interestingly, the most successful implementations of core algorithms [7, 14] additionally exploit domination in order to decrease the number of subsets generated from the core.

1.1 New results

The motivation for our study is to understand and explain the efficiency of knapsack core algorithms on random instances. We present the first theoretical results on the running time of a core algorithm coming close to the results observed in practice. In particular, we prove an upper bound of n polylog(n) on the expected running time of a core algorithm on uniformly random instances.
In addition, following a recent trend in practical studies [7, 15], we also investigate harder instances in which profits and weights are correlated. Here we prove a tradeoff describing how the running time increases with the correlation. As in the most efficient implementations, the algorithms underlying our analysis combine the core and the domination concept. Certain details of this combination, however, are quite different. In particular, we use Goldberg and Marchetti-Spaccamela's definition of the core [9] and combine it with the enumeration method for dominating sets by Nemhauser and Ullmann [13]. Somewhat surprisingly, this combination of theoretical concepts not only enables a rigorous mathematical analysis, but also yields a new implementation of a core algorithm that outperforms the best previous implementations by orders of magnitude on well-studied benchmark instances.

1.2 Outline

In Section 2 we start with a short overview of the techniques and results from previous work that we apply in our analysis. In Section 3, we present an algorithm with almost linear expected running time on uniformly random instances. This algorithm, however, has some small failure probability, that is, it might compute a sub-optimal solution with polynomially small probability. In Section 4, we show how failures can be handled without increasing the expected running time by more than a constant factor. In Section 5, we describe an average-case model for so-called weakly correlated instances and generalize our analysis towards this model. Finally, we briefly present a few preliminary experimental results.

2 Tools and techniques

2.1 Core algorithms

Core algorithms start by computing an optimal solution for the relaxed or fractional knapsack problem. In this problem, the constraints x_i ∈ {0, 1} are replaced by 0 ≤ x_i ≤ 1. An optimal solution to the fractional problem can be found by the following greedy algorithm [1]. Starting with the empty knapsack, we add items one by one in order of non-increasing profit-to-weight ratio (in previous studies this ratio is also called the item's density; in this paper, the term density refers solely to the density function describing the probability distributions of the profits). The algorithm stops when coming to the first item whose insertion would exceed the capacity bound b. This item is called the break item, and the computed knapsack filling not including the break item is called the break solution. It has been shown that the break solution, and hence also the fractional solution, can be found in linear time by solving a weighted median problem [1]. For a geometric interpretation, let us map each item to the point (w_i, p_i) ∈ [0, 1]². The greedy algorithm described above can then be pictured as rotating a ray clockwise around the origin, starting from the vertical axis, and inserting all swept items until the insertion of the current item would exceed the capacity. The ray defined by the break item is called the Dantzig ray (see Figure 1).

There is a strong motivation to start with the computation of the break solution. It has been observed in practical studies that the break solution is quite similar to the optimal integral solution in the sense that they differ in only a few variables. So only a few items need to be exchanged to transform the break solution into an optimal solution. The problem is, of course, that we do not know in advance which items to exchange. Therefore, one uses an appropriate subset of candidate items, called the core. If the core contains all items that need to be exchanged to obtain an optimal solution, then the items outside the core can be fixed to the value given by the break solution.
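The greedy computation of the break solution described above can be written down directly. The following is a minimal Python sketch (all names are ours); it sorts by profit-to-weight ratio in O(n log n) time rather than using the linear-time weighted-median approach of [1].

```python
def break_solution(profits, weights, b):
    """Fill the knapsack greedily in order of non-increasing profit-to-weight ratio.
    Returns (break item index, break solution as a list of items, profit of the
    optimal fractional solution)."""
    order = sorted(range(len(profits)),
                   key=lambda i: profits[i] / weights[i], reverse=True)
    cap, items, frac_profit = b, [], 0.0
    for i in order:
        if weights[i] > cap:
            # i is the break item; the fractional optimum packs it partially.
            frac_profit += profits[i] * cap / weights[i]
            return i, items, frac_profit
        items.append(i)
        cap -= weights[i]
        frac_profit += profits[i]
    return None, items, frac_profit   # all items fit; there is no break item
```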
There are various ways to define the core. Goldberg and Marchetti-Spaccamela [9] define it as follows. Assume the items are given in order of non-increasing profit-to-weight ratio and let ℓ denote the index of the break item. The solution vector for the fractional problem has the form X̄ = (1, ..., 1, f, 0, ..., 0), where f ∈ (0, 1] is the entry at position ℓ. For any feasible integer solution X = (x_1, ..., x_n), define ch(X) = {i ∈ [n] : x_i ≠ x̄_i}, i.e., ch(X) is the set of items on which X and X̄ do not agree. When removing an item from the break solution, the freed capacity can be used to include items that have profit-to-weight ratio at most r := p_ℓ/w_ℓ, corresponding to the slope of the Dantzig ray. Obviously, replacing items by other items with the same weight but smaller profit-to-weight ratio leads to some loss in profit. This loss can only be compensated by filling the knapsack with more weight, that is, by decreasing the wasted capacity. One can quantify the loss that is incurred by replacing a "valuable" item (p_i/w_i ≥ r) by a "cheap" item (p_i/w_i ≤ r) as follows. Based on the slope r of the Dantzig ray, define loss(i) = |p_i − r·w_i|; that is, loss(i) corresponds to the vertical distance of item i to the Dantzig ray. Goldberg and Marchetti-Spaccamela proved that the difference in profit between any feasible integer solution X and the optimal fractional solution X̄ can be expressed as

    Σ_{i∈[n]} x̄_i p_i − Σ_{i∈[n]} x_i p_i = r · (b − Σ_{i∈[n]} x_i w_i) + Σ_{i∈ch(X)} loss(i).

The first term on the right-hand side corresponds to the unused capacity of the integer solution X, whereas the second term is due to the accumulated loss of all items in ch(X). Define P(X) = Σ_{i∈[n]} x_i p_i, and let X be an optimal integral solution. The integrality gap Γ = P(X̄) − P(X) gives an upper bound on the accumulated loss of X and therefore an upper bound on the individual loss of each item in ch(X). Thus, all items in ch(X) have vertical distance at most Γ from the Dantzig ray. Hence, the core can be defined to consist of all items with loss at most Γ. The value of Γ can be obtained, e.g., by guessing, or by adding the items in order of increasing loss to the core until the loss of the next item is not smaller than the difference in profit between the optimal fractional solution and the best integer solution computed so far.

Figure 1: Dantzig ray and break item. The core stripe has vertical width 2Γ, with Γ denoting the integrality gap. The loss of an item is its vertical distance to the Dantzig ray.
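Given the break item and a candidate stripe height Γ, the losses and the resulting core can be computed in one pass. A minimal sketch (names are ours), reusing the representation from the previous snippet:

```python
def core_items(profits, weights, break_item, gamma):
    """Return the indices of all items whose vertical distance (loss) to the
    Dantzig ray is at most gamma.  The slope r of the ray is the break item's
    profit-to-weight ratio; loss(i) = |p_i - r * w_i|."""
    r = profits[break_item] / weights[break_item]
    return [i for i in range(len(profits))
            if abs(profits[i] - r * weights[i]) <= gamma]
```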
2.2 Properties of the core

Lueker's analysis [11] bounds the expected integrality gap for uniformly random instances. In particular, he shows E[Γ] = O(log²N / N). Goldberg and Marchetti-Spaccamela [9] observe that this analysis can easily be generalized to obtain exponential tail bounds on Γ.

LEMMA 2.1. There is a constant c_0 such that, for every α ≤ log⁴N, Pr[Γ ≥ c_0 α log²N / N] ≤ 2^{−α}.

In words, the integrality gap does not exceed O(log²N / N), with high probability. Let us remark that the value of c_0 depends on β, the constant determining the knapsack capacity. The constraint α ≤ log⁴N satisfies our requirements, but the tail bound can in general be extended to hold for every α ≤ N^δ, for every fixed δ < 1/2. We will use this bound to obtain a tail bound on the number of items in the core.

Goldberg and Marchetti-Spaccamela [9] use this tail bound for their probabilistic analysis of a core algorithm. Let us briefly sketch their analysis and explain why it fails to bound the expected running time; it is also the basis for our modifications to the core algorithm. Let X denote the number of core items, i.e., items with vertical distance at most Γ from the Dantzig ray. Basically, the expected value of X corresponds to N times the area covered by the Γ-region around the Dantzig ray, that is, E[X] ≈ 2ΓN. (There are some slight dependencies that we neglect here; in fact, the Dantzig ray has some tendency to fall into dense regions.) If Γ is fixed, then this number is sharply concentrated. Furthermore, because of Lemma 2.1, the random variable Γ is sharply concentrated around O(log²N / N). Combining these results, it follows that X = O(log²N), with high probability. Consequently, the number of sets generated over the core items is 2^X = N^{O(log N)}. Obviously, enumerating all these sets cannot be done in polynomial time. Therefore, Goldberg and Marchetti-Spaccamela use a further trick to significantly reduce the number of sets enumerated. They use a filtering mechanism that we call the loss filter: it generates only those sets with loss at most Γ. For every fixed δ > 0, they show that the number of sets generated is 2^{O(√X)} = N^{O(1)}, with probability 1 − δ. Unfortunately, the degree of this polynomial grows rapidly with the reciprocal of the failure probability δ. Roughly speaking, this is because the random variable X has moved to the exponent, so that small deviations in X can cause large deviations in the running time. For this reason, the analysis of Goldberg and Marchetti-Spaccamela fails to bound the expected running time. Moreover, constant factors lost in the analysis of X and Γ go directly into the exponent of the polynomial running time bound. Instead, we will replace the loss filter by a better filtering mechanism, reducing the number of enumerated sets from 2^{O(√X)} to N · X^{O(1)}. This way, we will be able to bound the expected running time by N polylog N. Our filtering mechanism is based on the following dominance criterion.

2.3 The Nemhauser/Ullmann algorithm

Fix a knapsack instance with n items. Recall that a subset S ⊆ [n] with weight w(S) = Σ_{i∈S} w_i and profit p(S) = Σ_{i∈S} p_i dominates another subset T ⊆ [n] if w(S) ≤ w(T) and p(S) ≥ p(T). For simplicity, let us assume that all sets have different profits; the considered random instances satisfy this assumption with probability 1. Sets that are dominated by other sets cannot be optimal solutions to the knapsack problem, regardless of the specified knapsack capacity. Consequently, it suffices to consider those sets that are not dominated by other sets, the so-called dominating sets. In other terminology, dominating sets are Pareto-optimal solutions, i.e., solutions that cannot be simultaneously improved in weight and profit by another solution.

Nemhauser and Ullmann [13] introduced the following elegant algorithm that computes the sequence of all dominating sets in a very efficient way. For i ∈ [n], let S_i be the sequence of dominating subsets over the items 1, ..., i. The sets in S_i are assumed to be listed in increasing order of their weights. Given S_i, the sequence S_{i+1} can be computed as follows. First duplicate all subsets in S_i and add item i+1 to each of the duplicated sets. In this way we obtain two ordered sequences of sets. Now we merge the two sequences, removing those subsets that are dominated by another set in the union of the two sequences. The result is the ordered sequence S_{i+1} of dominating sets over the items 1, ..., i+1. The sequence S_{i+1} can be calculated from S_i in time linear in the length of S_i, that is, linear in the number of dominating subsets over the items 1, ..., i. Since the optimal knapsack filling is described by one of the subsets in the list S_n, namely the subset with the largest weight not exceeding the capacity b, generating S_n solves the knapsack problem. This yields the following lemma.

LEMMA 2.2. For every i ∈ [n], let q(i) denote the number of dominating sets over the items 1, ..., i, and assume E[q(i+1)] ≥ E[q(i)]. The Nemhauser/Ullmann algorithm computes an optimal knapsack filling in expected time O(Σ_{i=1}^{n−1} E[q(i)]) = O(n · E[q(n)]).
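The list-merging scheme just described can be sketched as follows. This is a minimal Python illustration; the entry format and function names are ours rather than from [13], and ties in weight, which occur with probability 0 in the random model, are ignored.

```python
def nemhauser_ullmann(profits, weights):
    """Compute the list of dominating (Pareto-optimal) sets, ordered by weight.
    Each entry is (weight, profit, frozenset of item indices).  The merge step
    that produces S_{i+1} from S_i runs in time linear in |S_i|."""
    pareto = [(0.0, 0.0, frozenset())]
    for i in range(len(profits)):
        shifted = [(w + weights[i], p + profits[i], s | {i}) for (w, p, s) in pareto]
        merged, best, a, b = [], -1.0, 0, 0
        while a < len(pareto) or b < len(shifted):
            # both lists are sorted by weight, so pick the lighter candidate next
            if b == len(shifted) or (a < len(pareto) and pareto[a][0] <= shifted[b][0]):
                cand = pareto[a]; a += 1
            else:
                cand = shifted[b]; b += 1
            if cand[1] > best:          # keep only sets that improve the profit
                merged.append(cand)
                best = cand[1]
        pareto = merged
    return pareto

def best_feasible(pareto, b):
    """Most profitable dominating set with weight at most b."""
    feasible = [entry for entry in pareto if entry[0] <= b]
    return max(feasible, key=lambda e: e[1]) if feasible else None
```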
In [2] it is shown that E[q(n)] = O(n³) for uniformly random instances. Hence, the expected running time of the Nemhauser/Ullmann algorithm for these instances is O(n⁴). Furthermore, an analysis is presented which allows adversarial weights and randomly drawn profits that may follow different probability distributions.

LEMMA 2.3. Suppose profit p_i is drawn according to a continuous, non-negative probability distribution with density function f_i : R_{≥0} → R_{≥0}. Suppose µ ≥ max_{i∈[n]} E[p_i] and φ ≥ max_{i∈[n]} sup_{x≥0} f_i(x). Then there is a constant c_1 > 0 such that the expected number of dominating sets is E[q] ≤ c_1 φ µ n⁴ + 1.

Combining the two lemmas, it follows that the expected running time of the Nemhauser/Ullmann algorithm is O(φ µ n⁵). Observe that profits can be rescaled such that µ = 1, unless there is an item whose expected profit is unbounded. The interesting parameter is the maximum density φ: the bound on the expected number of dominating sets increases linearly with the maximum density, and the same holds for the expected running time. Put the other way around, the less randomness is available, the larger the expected running time.

3 Algorithm FastCore 1: filtering dominated solutions

Our first algorithm has almost linear running time but fails with a small probability. We use a static core consisting of all items with loss at most ε_d = c_d N^{−1} log³N, for a suitable constant c_d. In Figure 2a, the core items correspond to those items falling into the regions A or B. We use the Nemhauser/Ullmann algorithm to generate all dominating sets over the core items. The items in region C, i.e., the items outside the core and above the ray, are virtually added to these sets. Among all dominating sets satisfying the capacity bound, the most profitable one is selected. If the profit of this set differs from the profit of the optimal fractional solution by at most ε_d, then we have a proof of optimality and the algorithm outputs this set; otherwise the algorithm reports failure. The correctness of this algorithm follows from the discussion in the previous sections. The algorithm fails only if the chosen height of the core stripe is smaller than the integrality gap, that is, if ε_d < Γ. Notice that the algorithm might nevertheless have found the optimal solution; only the proof of optimality is missing. From Lemma 2.1, it follows that the failure probability is Pr[Γ > ε_d] ≤ 2^{−c_d log N / c_0} = N^{−c_d/c_0}.

In the following probabilistic analysis, we will show that the expected running time of our algorithm is O(N polylog N). This analysis is quite tricky and complicated because of vast dependencies between different random variables. The central ideas, however, are rather simple and elegant. Thus, before going into the details, let us try to give some intuition about the main ideas behind the analysis. As the area covered by the core stripe is O(N^{−1} log³N), the number of core items is only O(log³N), with high probability. The running time of the Nemhauser/Ullmann algorithm depends polynomially on the number of core items and is linear in φ, i.e., the maximum density of the probability distributions of the core items' profits. When calculating φ, we have to condition on the fact that these items have weights and profits that fall into the core stripe. The height of the core stripe is Θ(log³N / N). Thus, intuitively, the profit of a "typical" core item follows a uniform distribution over an interval of length Θ(log³N / N). More formally, we will show that the conditional probability distributions of the profits of almost all core items have density at most N. Hence, φ ≤ N. By Lemma 2.3, the expected number of dominating sets is at most φ · polylog N = N polylog N. Thus, by Lemma 2.2, the running time can be upper-bounded by N polylog N as well. Interestingly, the linear term in the upper bound on the running time is hence not due to the number of core items but due to the density of their profit distributions.
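Putting the pieces together, the control flow of FastCore 1 can be sketched as below. This reuses the hypothetical helpers break_solution, core_items, nemhauser_ullmann and best_feasible from the earlier snippets; it is only an illustration under those assumptions, not the authors' implementation, and the constant c_d is left as a parameter.

```python
import math

def fast_core_1(profits, weights, b, c_d=1.0):
    """Enumerate dominating sets over the core only and certify optimality
    against the fractional optimum; return a solution or None on failure."""
    n = len(profits)
    eps_d = c_d * (math.log(n) ** 3) / n           # height of the core stripe
    brk, taken, frac_opt = break_solution(profits, weights, b)
    if brk is None:                                # all items fit: break solution is optimal
        return (sum(weights), sum(profits), frozenset(range(n)))
    r = profits[brk] / weights[brk]
    core = core_items(profits, weights, brk, eps_d)
    core_set = set(core)
    # Region C (above the ray, outside the core) is fixed to 1,
    # region D (below the ray, outside the core) is fixed to 0.
    fixed = [i for i in range(n) if i not in core_set and profits[i] - r * weights[i] > 0]
    base_w = sum(weights[i] for i in fixed)
    base_p = sum(profits[i] for i in fixed)
    pareto = nemhauser_ullmann([profits[i] for i in core], [weights[i] for i in core])
    # Indices inside the returned sets refer to positions within `core`.
    best = best_feasible([(w + base_w, p + base_p, s) for (w, p, s) in pareto], b)
    if best is not None and frac_opt - best[1] <= eps_d:
        return best                                # certified optimal
    return None                                    # failure: the core stripe was too small
```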
Figure 2: The core stripe consists of the regions A and B, which are defined by the Dantzig ray. Items in region G have insufficient variation in profit.

THEOREM 3.1. For uniformly random instances, algorithm FastCore 1 computes an optimal solution with probability at least 1 − N^{−c_d/c_0}. The expected running time of this algorithm is O(N polylog N).

Proof. The bound on the failure probability follows from the previous discussion. It remains to prove the upper bound on the expected running time. Let A and B denote the sets of points above and below the Dantzig ray r with vertical distance at most ε_d = c_d N^{−1} log³N from r, respectively. In the following, we identify the ray with its slope, i.e., r = p_ℓ/w_ℓ, with ℓ denoting the index of the break item. Define

    G = {(w, p) ∈ A : rw > 1 − 1/N} ∪ {(w, p) ∈ B : rw < 1/N} ∪ {(w, p) ∈ B : rw − ε_d > 1 − 1/N}

and F = {(x, y) ∈ [0,1]² : y ∈ [0, 1/N] ∪ [1 − 1/N, 1]}. Figure 2b depicts these regions; the motivation for these definitions will become clear soon. For a moment, let us assume that r and the numbers of items in A and B, as well as their individual weights, are fixed arbitrarily. Consider an item i with weight w_i in region A. The profit of this item corresponds to a point on the line segment L_i = {p ∈ [0,1] : (w_i, p) ∈ A}. Observe that the Dantzig ray depends on the weight w_i of this item but not on its profit. In particular, moving the point corresponding to item i arbitrarily on the line segment L_i does not affect the position of the Dantzig ray. Under these assumptions, the random variable p_i is chosen uniformly from the interval L_i. The same holds for the items in region B. Observe that the profit intervals for the items in (A∪B)\G have length at least 1/N, except for the break item, which will be handled separately. As a consequence, the density of the profit distributions of these items is upper-bounded by N. Hence, applying Lemma 2.3 yields the following bound on the expected number of dominating sets for these items. For any given region R ⊆ [0,1]², let X_R denote the number of items in R and q_R the number of dominating sets over these items.

LEMMA 3.1. For every j ≥ 0, E[q_{(A∪B)\G} | X_{(A∪B)\G} = j] ≤ c_1 N j⁴ + 1.

Every additional item can increase the number of dominating sets by at most a factor of two. For this reason, we can assume that the break item is covered by the bound in Lemma 3.1 as well. Furthermore, we can apply this fact to the items in G and obtain that the number of dominating sets over the core items satisfies q_{A∪B} ≤ 2^{X_G} · q_{(A∪B)\G}. The running time of the Nemhauser/Ullmann algorithm is roughly E[q_{A∪B}] times the number of core items. In the following, we will (implicitly) show that E[q_{(A∪B)\G}] = N polylog N and E[2^{X_G}] = O(1). Unfortunately, however, the random variables X_G and q_{(A∪B)\G} are not completely independent, as both of them depend on the position of the Dantzig ray. The major difficulty in the remaining analysis is to formally prove that this kind of dependence is insignificant.
We assume that the algorithm first computes the dominating sets over the items in region (A∪B)\G, adding the items in some order that is independent of the profits, e.g., in order of increasing weight. Afterwards, the items from region G are added. Let T denote the running time of this algorithm. Lemma 2.2 combined with Lemma 3.1 yields

    E[T | X_{A∪B} = k ∧ X_G = g] ≤ c_2 ((N(k−g)⁴ + 1)(k−g) + (N(k−g)⁴ + 1) 2^g),

for every k ≥ g ≥ 0 and a suitable constant c_2. Define f(k, g) = c_2 N (k−g+7)⁵ 2^g. Notice that f is defined in such a way that it is monotonically increasing in g, for every 0 ≤ g ≤ k. Rewriting the above bound in terms of this function, we obtain the following slightly weaker bound:

    E[T | X_{A∪B} = k ∧ X_G = g] ≤ f(k, g).

Applying first G ⊆ A∪B and then G ⊆ (A∪B)∩F, combined with the monotonicity property of f, yields

    E[T] ≤ Σ_{k≥0} Σ_{g=0}^{k} Pr[X_{A∪B} = k ∧ X_G = g] · f(k, g)
         ≤ Σ_{k≥0} Σ_{g=0}^{k} Pr[X_{A∪B} = k ∧ X_{(A∪B)∩F} = g] · f(k, g).

Next, replacing f(k, g) by c_2 N (k−g+7)⁵ 2^g and rearranging the sums yields

    E[T] ≤ c_2 N Σ_{g≥0} Pr[X_{(A∪B)∩F} = g] 2^g Σ_{k≥g} Pr[X_{A∪B} = k | X_{(A∪B)∩F} = g] (k−g+7)⁵
         = c_2 N Σ_{g≥0} Pr[X_{(A∪B)∩F} = g] 2^g Σ_{k≥0} Pr[X_{(A∪B)\F} = k | X_{(A∪B)∩F} = g] (k+7)⁵.

Let us switch to a more compact notation and define X = X_{(A∪B)\F} and Y = X_{(A∪B)∩F}. This way,

(3.1)    E[T] ≤ c_2 N Σ_{g≥0} Pr[Y = g] 2^g Σ_{k≥0} Pr[X = k | Y = g] (k+7)⁵.

In the following, we upper-bound the two probability terms occurring in this bound one after the other. We start by proving Σ_{k≥0} Pr[X = k | Y = g] (k+7)⁵ = O(polylog N), for any choice of g ≥ 0. Afterwards, we show that Σ_{g≥0} Pr[Y = g] 2^g = O(1). Hence, E[T] = O(N polylog N), so that the theorem follows.

First, let us study the term Pr[X = k | Y = g]. Observe that the variables X and Y describe the numbers of points in the disjoint regions (A∪B)\F and (A∪B)∩F, respectively. If these regions were fixed, then these two variables would be independent, as the points are generated by a Poisson distribution. Unfortunately, however, both regions are defined by the Dantzig ray r, and this ray depends on the points found in these regions. In fact, both of the considered regions are adjacent to r, and r has some tendency to fall into a dense region. Therefore, we have to take the dependency of the variables X and Y on the random variable r into account. We deal with this dependency by placing worst-case assumptions on r. In particular, we assume that an adversary knowing all points in the unit square chooses the ray r to be any ray through the origin, instead of assuming that r is the ray through the break item. The regions (A∪B)\F and (A∪B)∩F are now defined with respect to this adversarial ray r. Let µ = 2ε_d N. Observe that the value of E[X] is roughly equal to µ.

LEMMA 3.2. For every adversarial choice of the ray r and every α ≥ 1, Pr[X ≥ 2αµ] ≤ N (e^{α−1}/α^α)^µ.

Proof. The idea is to consider only a few essential positions of the ray r and to sum the probability over all these positions. For this purpose, we define a set of overlapping parallelograms R_i having the property that any given core stripe A∪B is completely covered by two of these parallelograms. The first l = ⌈1/(2ε_d)⌉ parallelograms cover the lower right triangle of the unit square U: parallelogram R_i (i = 1, ..., l) is the quadrangle with corner coordinates (0, ε_d), (0, −ε_d), (1, i·2ε_d − 2ε_d), and (1, i·2ε_d). Another l parallelograms cover the upper left triangle of U: parallelogram R_{l+i} (i = 1, ..., l) is the quadrangle with corner coordinates (−ε_d, 0), (ε_d, 0), (i·2ε_d, 1), and (i·2ε_d − 2ε_d, 1). Every parallelogram covers an area of size 2ε_d. It is easy to verify that this set of regions has the required property. Let X_i denote the number of items in region R_i. Then E[X_i] = area(U ∩ R_i) · N < 2ε_d N = µ. Using Chernoff bounds, it holds for all α ≥ 1 and i ∈ [2l] that

    Pr[X_i ≥ αµ] ≤ (e^{α−1}/α^α)^µ.

Hence, Pr[∃i : X_i ≥ αµ] ≤ Σ_{i=1}^{2l} Pr[X_i ≥ αµ] = 2l (e^{α−1}/α^α)^µ, and so Pr[X ≥ 2αµ] ≤ 2l (e^{α−1}/α^α)^µ ≤ N (e^{α−1}/α^α)^µ.
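For completeness, the Chernoff bound invoked for the Poisson variables X_i is the following standard estimate; this routine derivation is our addition and is not spelled out in the text above.

```latex
% Chernoff bound for a Poisson variable X_i with mean at most \mu and any t > 0:
\[
  \Pr[X_i \ge \alpha\mu] \;\le\; e^{-t\alpha\mu}\, E\!\left[e^{tX_i}\right]
  \;\le\; \exp\!\bigl(\mu(e^t - 1) - t\alpha\mu\bigr).
\]
% The choice t = \ln\alpha (valid for \alpha \ge 1) minimizes the exponent and gives
\[
  \Pr[X_i \ge \alpha\mu] \;\le\; \exp\!\bigl(\mu(\alpha - 1) - \alpha\mu\ln\alpha\bigr)
  \;=\; \left(\frac{e^{\alpha-1}}{\alpha^{\alpha}}\right)^{\!\mu}.
\]
```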
The above tail bound on X holds for any adversarial choice of a ray r. Consequently, when letting r denote the Dantzig ray, the bound holds for any outcome of this ray as well. Furthermore, it holds for any choice of the variable Y, too, as the dependence between the variables X and Y is only via the position of the Dantzig ray. As a consequence, for every g ≥ 0,

    Σ_{k≥0} Pr[X = k | Y = g] (k+7)⁵
        ≤ Pr[X ≤ 4µ | Y = g] (4µ+7)⁵ + Σ_{α≥2} Pr[X ≥ 2αµ | Y = g] (2(α+1)µ+7)⁵
        ≤ (4µ+7)⁵ + Σ_{α≥2} N (e^{α−1}/α^α)^µ (2(α+1)µ+7)⁵.

Using µ = 2ε_d N = 2c_d log³N yields

    Σ_{α≥2} N (e^{α−1}/α^α)^µ (2(α+1)µ+7)⁵ = µ⁵ N Σ_{α≥2} (e^{α−1}/α^α)^{2c_d log³N} (2α + 2 + 7/µ)⁵.

For any fixed c_d > 0, this term is bounded by O(µ⁵). Consequently, there exists a constant c_3 > 0 such that

    Σ_{k≥0} Pr[X = k | Y = g] (k+7)⁵ ≤ c_3 log^{15} N.

Applying this upper bound to Equation (3.1) gives

(3.2)    E[T] ≤ c_2 c_3 N log^{15} N · Σ_{g≥0} Pr[Y = g] 2^g.

We further simplify:

(3.3)    Σ_{g≥0} Pr[Y = g] 2^g = E[2^Y] = E[2^{X_{(A∪B)∩F}}] ≤ E[2^{X_F}].

The latter term can be bounded as follows.

LEMMA 3.3. E[2^{X_F}] ≤ e².

Proof. The number of points in F is a Poisson random variable whose mean corresponds to the area covered by F times N, because the number of points in the unit square is a Poisson variable with mean N and points are placed uniformly at random. Obviously, the area of F is 2/N. Hence, the number of points in F follows the Poisson distribution with parameter 2. Consequently,

    E[2^{X_F}] = Σ_{f≥0} Pr[X_F = f] 2^f = Σ_{f≥0} (e^{−2} 2^f / f!) 2^f = e^{−2} Σ_{f≥0} 4^f / f! = e^{−2} e⁴ = e².

Finally, combining Lemma 3.3 with Equations (3.2) and (3.3) yields

    E[T] ≤ c_2 c_3 N log^{15} N · E[2^{X_F}] ≤ c_2 c_3 e² N log^{15} N.

Thus Theorem 3.1 is shown.
4 Algorithm FastCore 2: two lists of dominating sets

In order to obtain an algorithm that always computes an optimal solution, one can dynamically expand the core until algorithm FastCore 1 is successful. A somewhat extreme variant of this approach is to start with a core stripe that immediately gives a high success probability, say 1 − N^{−3}, and to use a single backup routine that computes all dominating sets with the Nemhauser/Ullmann algorithm if the core algorithm fails. The analysis in [2] shows that the expected running time for computing all dominating sets under the uniform distribution is only O(N⁴). Thus, neglecting dependencies, this approach promises an expected running time of N^{−3} · O(N⁴) = O(N) for the backup routine.

Let us have a closer look at this approach. The running time is mainly determined by the number of dominating sets. Let q_all, q_in, and q_out denote the number of dominating sets over all items, over the items inside, and over the items outside the core stripe, respectively. Let E denote the event that the core computation is successful. Then the expected number of dominating sets is E[q_in] + E[q_all | ¬E] · Pr[¬E]. The difficulty lies in estimating E[q_all | ¬E]. Observe that the event ¬E is, by definition, very unlikely. Thus the value of q_all might be extremely biased by ¬E, and hence the running time of the Nemhauser/Ullmann algorithm conditioned on ¬E might be much larger than O(N⁴).

In order to bypass the difficulties caused by these dependencies, we use a different approach utilizing two lists of dominating sets. First, we compute a list with all dominating sets over the items in the core. If this computation does not find an optimal solution, we compute a second list with all dominating sets over the items outside the core. Observe that combining these lists to obtain all dominating sets would yield a combined list of maximum length q_in · q_out. This way the expected number of dominating sets is upper-bounded by E[q_in | E] · Pr[E] + E[q_in · q_out | ¬E] · Pr[¬E]. In the following analysis, we will be able to give a good estimate for E[q_out | ¬E], as the random variable q_out is only slightly biased by ¬E. Still, this does not immediately solve the problem, since the variable q_in heavily depends on the event ¬E, so that we are not able to upper-bound E[q_in | ¬E] appropriately. Now the key observation is that we do not need to compute all dominating sets; we only need to find the set among them that is maximal with respect to the given capacity bound b. Given the two sorted lists of dominating sets, this can be done in time O(q_in + q_out) instead of O(q_in · q_out) by scanning the two sorted lists, as described by Horowitz and Sahni in [10]. They use this technique to reduce the worst-case running time of the Nemhauser/Ullmann algorithm from O(2^n) to O(2^{n/2}); we use their idea to deal with the dependencies in our average-case analysis. Using this technique, the expected number of dominating sets generated by our algorithm can be estimated by E[q_in] + E[q_out | ¬E]. We have shown E[q_in] = O(N polylog N) in the proof of Theorem 3.1. The following analysis mainly deals with bounding E[q_out | ¬E].
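The linear-time combination of the two sorted lists can be sketched as follows. The entry format matches the nemhauser_ullmann sketch above; this is our illustration of a scan in the spirit of [10], not code from that paper.

```python
def combine_pareto_lists(list_a, list_b, b):
    """Given two dominating-set lists sorted by increasing weight (and hence
    increasing profit), return the most profitable pair with total weight <= b
    in time O(len(list_a) + len(list_b)).  Entries are (weight, profit, items)."""
    best = None
    j = len(list_b) - 1
    for w_a, p_a, s_a in list_a:             # weights increase along list_a ...
        while j >= 0 and list_b[j][0] > b - w_a:
            j -= 1                           # ... so the partner index only moves left
        if j < 0:
            break                            # no partner fits any heavier entry either
        w_b, p_b, s_b = list_b[j]            # heaviest feasible partner = most profitable
        if best is None or p_a + p_b > best[1]:
            best = (w_a + w_b, p_a + p_b, s_a | s_b)
    return best
```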
THEOREM 4.1. Algorithm FastCore 2 always computes an optimal solution. Its expected running time is O(N polylog N).

Proof. We begin the proof by specifying some details of the algorithm that are missing in the description above. The algorithm uses a fixed core stripe with ε_d = 5c_0 (log N)³/N, where c_0 is the constant given in Lemma 2.1. Let us adopt the notation from the previous section for the regions defined by the core stripe, as depicted in Figure 2. Let C and D denote the regions above and below the core stripe, respectively. For the purpose of a simple analysis, we add the region H (see Figure 2c) to the core stripe. The analysis of the running time of algorithm FastCore 1 for the items inside the core is not affected by this change, as we upper-bounded X_G by X_F and F covers G and H. Including H in the core stripe ensures that all items in C∪D follow a distribution with density at most N.

Let T, T_ABH, and T_CD denote the running times of algorithm FastCore 2, of algorithm FastCore 1 applied to the items in A∪B∪H, and of the Nemhauser/Ullmann algorithm applied to the items in C∪D, respectively. Furthermore, let E be the event that the core stripe was chosen sufficiently large, i.e., that the integrality gap is at most ε_d. We account the time for combining the two lists towards the time needed for their computation. This way,

    E[T] = E[T_ABH] + Pr[¬E] · E[T_CD | ¬E].

The analysis of algorithm FastCore 1 yields E[T_ABH] = O(N polylog N). In the following, we will show

    Pr[¬E] · E[T_CD | ¬E] ≤ N polylog N,

which yields the theorem. First, let us verify that every item in C∪D follows a distribution with density at most N. For a moment, assume that the Dantzig ray is fixed. Suppose an adversary specifies the weight w_i of an item i in region C (or, analogously, in region D). Then the item can be moved up and down on the line segment L_i = {p ∈ [0,1] : (w_i, p) ∈ C} without affecting the event ¬E. The length of this line segment is at least 1/N, as we added the region H to the core. Consequently, independent of the outcome of ¬E, the profit of each item follows a uniform distribution with density at most N. Thus, for all k,

    E[T_CD | ¬E ∧ X_CD = k] = O(k⁵ N).

Let us switch to a more compact notation and define X = X_CD. It remains to show E[X⁵ | ¬E] = O(polylog N / Pr[¬E]). Observe that Lemma 2.1 with ε_d = 5c_0 (log N)³/N yields Pr[¬E] ≤ 2^{−5 log N} = N^{−5}. Hence, it suffices to show that E[X⁵ | ¬E] = O(max{N⁵, Pr[¬E]^{−1}}). The idea is that the random variable X is very sharply concentrated around its mean E[X] < N, so that conditioning on ¬E does not significantly change the expected value of X. For every λ ≥ 1,

    E[X⁵ | ¬E] ≤ λ + Σ_{i≥λ} Pr[X⁵ ≥ i | ¬E] ≤ λ + Σ_{i≥λ} Pr[X⁵ ≥ i] / Pr[¬E].

As X follows a Poisson distribution with mean at most N, we have Pr[X⁵ ≥ α(2N)⁵] ≤ exp(−αN/2), for every α ≥ 1. Hence, for i ≥ (2N)⁵, Pr[X⁵ ≥ i] ≤ exp(−i^{1/5}/4). Now setting λ = (2N)⁵ gives

    E[X⁵ | ¬E] ≤ (2N)⁵ + Pr[¬E]^{−1} Σ_{i≥λ} exp(−i^{1/5}/4) ≤ (2N)⁵ + Pr[¬E]^{−1} · O(1).

This completes the proof of Theorem 4.1.

5 Correlated instances

Several experimental studies [7, 8, 14, 15] do not only investigate uniformly random instances but also other, harder classes of random inputs, e.g., so-called "weakly correlated" instances. We define ε-correlated instances, 0 < ε ≤ 1, a parameterized version of weakly correlated instances (the latter correspond to ε-correlated instances with ε = 10%), as follows. All weights are randomly drawn from [0, 1]. Profits are set to a random perturbation of the corresponding weight value, i.e., for all i ∈ [n], p_i := w_i + r_i, where r_i is a random variable uniformly distributed in [−ε/2, ε/2]. This can be seen as choosing n points independently at random from the quadrangle Q defined by the points (0, ε/2), (0, −ε/2), (1, 1 − ε/2), (1, 1 + ε/2) (see Figure 3). For our analysis, we again assume that the value of n is chosen according to the Poisson distribution with parameter N, and b = βN, for some constant β ∈ (0, 1/2).

Figure 3: For ε-correlated instances, items are uniformly sampled from region Q. The right figure (a magnified rectangle from the left figure) shows an example of core stripes A and B with height ε_d. Items in region G ⊆ F have insufficient randomness.
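For reference, ε-correlated instances can be sampled directly from this definition. A minimal sketch assuming NumPy, with names of our own choosing:

```python
import numpy as np

def eps_correlated_instance(N, eps, beta=0.4, rng=None):
    """Sample an eps-correlated instance: weights uniform on [0,1] and
    p_i = w_i + r_i with r_i uniform on [-eps/2, eps/2]; n ~ Poisson(N), b = beta*N."""
    rng = np.random.default_rng() if rng is None else rng
    n = rng.poisson(N)
    weights = rng.uniform(0.0, 1.0, size=n)
    profits = weights + rng.uniform(-eps / 2.0, eps / 2.0, size=n)
    return profits, weights, beta * N
```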
5.1 Integrality gap for ε-correlated instances

In this section we prove an upper bound on the integrality gap for ε-correlated instances.

LEMMA 5.1. There is a constant c_0 such that, for every α ≤ log⁴N, Pr[Γ ≥ c_0 α ε log²N / N] ≤ 2^{−α}.

Proof. (Sketch.) We adapt the proof from Lueker [11] for uniform instances; please refer to [11] for details. In particular, we compare the solution of an approximation algorithm with the optimal fractional solution; the difference of the objective values is an upper bound on the integrality gap. The approximation works as follows. Define k := log₄(N/ε) and δ = O(k·4^{−k}). Assume that the items are ordered according to non-increasing profit-to-weight ratio. Starting with the empty knapsack, we insert items in this order until the remaining knapsack capacity is about vk, where v is the expected weight of the next item (the average weight of points in the region swept by the density ray when rotating it slightly further). Let cap be the remaining capacity of the knapsack. We then repeatedly consider successive sets S_1, S_2, ... of about 2k items, trying to find a subset S ⊆ S_i with weight in [cap − 2δ, cap]. The probability that we find such a subset is at least 1/2 for every S_i. This can be shown with the help of the following lemma, which essentially shows that subsets of random numbers lie exponentially dense.

LEMMA 5.2. (Lueker [11]) Let f be a piecewise continuous density function with domain [−a, a]. Suppose f is bounded and has mean 0 and variance 1. Let x_k be a real sequence with x_k = o(√k). Suppose we draw 2k variables X_1, ..., X_{2k} according to f. Then, for large enough k, the probability that there exists some k-item subset S ⊆ [2k] with Σ_{i∈S} X_i ∈ [x_k − δ, x_k + δ] is at least 1/2, provided δ = 7k·4^{−k}.

The profit gap between the approximate and the optimal fractional solution has two contributions: the residual capacity and the accumulated loss of the packed items. When the approximation algorithm finds a suitable subset S, the residual capacity of the knapsack is at most 2δ, causing a loss in profit of at most 2δr = O(ε r log N / N), where r = p_ℓ/w_ℓ is the slope of the Dantzig ray. With high probability, the slope r is upper-bounded by a constant term depending on β. The accumulated loss can be estimated by cap·(r_0 − r_l) ≤ vk·(r_0 − r_l), where l denotes the number of iterations performed by the algorithm and r_0 and r_l are the slopes of the density ray before the first and after the last iteration, respectively. In each iteration the density ray advances O(εk/N) with high probability, so the accumulated loss of the items in S_1 ∪ ... ∪ S_l is O(ε l k² / N). In each iteration the success probability is at least 1/2; therefore Pr[l > λ] ≤ 2^{−λ}, for all λ.

5.2 Running time

We adapt algorithm FastCore 2 to the new situation by using a smaller core stripe; that is, we set ε_d = c_d ε N^{−1} log³N instead of ε_d = c_d N^{−1} log³N. This way, we obtain the following result.

THEOREM 5.1. The expected running time of FastCore 2 on ε-correlated instances is O(ε^{−1} N polylog N).

Proof. (Sketch.) Let Q be the region from which we sample the items (see Figure 3). The area of Q has size ε. Compared to the uniform model, the concentration of items is therefore larger by a factor 1/ε. Choosing core height ε_d = c_d ε N^{−1} log³N, we expect about 2c_d log³N core items, matching the 2c_d log³N core items of the uniform case. Let A and B denote the core regions above and below the Dantzig ray, respectively. Consider the case that the slope of the Dantzig ray is larger than 1 (the other case is similar). Define regions F and G with G ⊆ F (see Figure 3):

    F = {(x, y) ∈ Q : y − x ∈ [ε_d − ε/N, ε_d]},
    G = {(w, p) ∈ A : (w, rw) ∈ F} ∪ {(w, p) ∈ B : (w, rw − ε_d) ∈ F}.

For items in (A∪B)\G, the maximum density of the profit distribution is N/ε. Therefore, for any j, E[q_{(A∪B)\G} | X_{(A∪B)\G} = j] ≤ c_1 (N/ε) j⁴ + 1. Items in G may have larger densities, so we again pessimistically assume that each of these items doubles the number of dominating sets. We have chosen the size of F so that on average there is only one item in F. This way our analysis, which also accounts for the items in G, applies here as well.
6 A few notes on experimental results

We implemented a core algorithm that combines the ideas presented in this and other theoretical studies. Our implementation uses the core concept from Goldberg and Marchetti-Spaccamela, considering only items in a stripe around the Dantzig ray. In contrast to our theoretical analysis, our experimental study uses a dynamically growing core; this way the algorithm automatically finds the optimal core size. The reason why we immediately switch to a dynamically growing core is that any algorithm with a static core cannot seriously compete with this dynamic approach. In the theoretical analysis we only assumed a static, fixed-size core because a dynamically growing core introduces additional dependencies that we cannot analyze. Our experimental study shows that if we combine all the concepts presented in the preceding sections, we obtain an implementation that can outperform the best known previous implementations. In particular, we apply the loss filter from Goldberg and Marchetti-Spaccamela as well as the dominance filter based on the Nemhauser/Ullmann algorithm. In addition, we use two lists, which are finally merged using the linear-time algorithm of Horowitz and Sahni. Our implementation beats the best previous implementation by Pisinger (the combo code with 64-bit integer arithmetic [7]) on ε-correlated instances by several orders of magnitude, but only if we add all the features listed above. In fact, for uniformly random instances the running time is dominated by the time to find the optimal fractional solution, and there are no significant differences between the implementations. The experimental results for ε-correlated instances are much more interesting. Table 1 presents first measurements obtained on a Sun Fire 15K. As our theoretical results suggest, the running time increases slightly super-linearly in 1/ε for most of the implementations. The implementation using all the features even shows a slightly sub-linear behavior. We want to point out that these experimental results are only preliminary, and we plan to continue with a more thorough study.

    1/ε     DomF     DomF LossF   DomF LossF 2-lists   combo
    2       0.34     0.07         -                    0.02
    4       1.01     0.16         -                    0.04
    8       2.88     0.32         0.01                 0.07
    16      7.47     0.77         0.01                 0.13
    32      17.83    2.54         0.02                 0.27
    64      41.28    6.83         0.02                 0.54
    128     109.31   16.92        0.04                 1.23
    256     -        36.13        0.06                 3.05
    512     -        86.51        0.11                 8.42
    1024    -        202.01       0.18                 21.47

Table 1: Average running time (in seconds) for ε-correlated instances of size n = 10000 without preprocessing. Each entry gives the average over 1000 test instances. Columns 2-4 give the numbers for three variants of our algorithm implementing a specified subset of features: DomF = dominating-set filter; LossF = loss filter; combo = knapsack solver by Pisinger. We fixed β = 0.4 for all experiments.

7 Acknowledgments

We want to thank David Pisinger for providing us with the code of the combo knapsack implementation.

References
[1] E. Balas and E. Zemel. An algorithm for large zero-one knapsack problems. Operations Research, 28:1130-1154, 1980.
[2] R. Beier and B. Vöcking. Random knapsack in expected polynomial time. Proceedings of the 35th ACM Symposium on Theory of Computing (STOC), 232-241, 2003.
[3] G. B. Dantzig. Discrete variable extremum problems. Operations Research, 5:266-277, 1957.
[4] M. E. Dyer and A. M. Frieze. Probabilistic analysis of the multidimensional knapsack problem. Mathematics of Operations Research, 14(1):162-176, 1989.
[5] A. M. Frieze. On the Lagarias-Odlyzko algorithm for the subset-sum problem. SIAM Journal on Computing, 15(2):536-539, 1986.
[6] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.
[7] S. Martello, D. Pisinger, and P. Toth. New trends in exact algorithms for the 0/1 knapsack problem. European Journal of Operational Research, 123:325-332, 2000.
[8] S. Martello and P. Toth. Knapsack Problems: Algorithms and Computer Implementations. Wiley, 1990.
[9] A. Goldberg and A. Marchetti-Spaccamela. On finding the exact solution to a zero-one knapsack problem. Proceedings of the 16th ACM Symposium on Theory of Computing (STOC), 359-368, 1984.
[10] E. Horowitz and S. Sahni. Computing partitions with applications to the knapsack problem. Journal of the ACM, 21:277-292, 1974.
[11] G. S. Lueker. On the average difference between the solutions to linear and integer knapsack problems. Applied Probability - Computer Science, The Interface, Vol. 1, Birkhäuser, 1982.
[12] G. S. Lueker. Exponentially small bounds on the expected optimum of the partition and subset sum problems. Random Structures and Algorithms, 12:51-62, 1998.
[13] G. Nemhauser and Z. Ullmann. Discrete dynamic programming and capital allocation. Management Science, 15(9):494-505, 1969.
[14] D. Pisinger. Algorithms for Knapsack Problems. Ph.D. thesis, DIKU, University of Copenhagen, 1995.
[15] D. Pisinger and P. Toth. Knapsack problems. In D.-Z. Du and P. Pardalos (eds.), Handbook of Combinatorial Optimization, Vol. 1, Kluwer Academic Publishers, 299-428, 1998.