On-line sequential bin packing

András György¹, Gábor Lugosi², and György Ottucsák³

¹ Machine Learning Research Group, Computer and Automation Research Institute, Budapest, Hungary. gya@szit.bme.hu
² ICREA and Department of Economics, Universitat Pompeu Fabra, Barcelona, Spain. gabor.lugosi@gmail.com
³ GusGus AB, Budapest, Hungary. ottucsak@gmail.com

The first author acknowledges support by the Hungarian Scientific Research Fund (OTKA F60787) and the Mobile Innovation Center of Hungary. The second author acknowledges support by the Spanish Ministry of Science and Technology grant MTM2006-05650 and by the PASCAL Network of Excellence under EC grant no. 506778.

Abstract

We consider a sequential version of the classical bin packing problem in which items are received one by one. Before the size of the next item is revealed, the decision maker needs to decide whether the next item is packed in the currently open bin or the bin is closed and a new bin is opened. If the new item does not fit, it is lost. If a bin is closed, the remaining free space in the bin accounts for a loss. The goal of the decision maker is to minimize the loss accumulated over $n$ periods. The main result of the paper is an algorithm that has a cumulative loss not much larger than any strategy that uses a fixed threshold at each step to decide whether a new bin is opened.

1 Introduction

In the classical off-line bin packing problem, an algorithm receives items (also called pieces) $x_1, x_2, \ldots, x_n \in (0,1]$. We have an infinite number of bins, each with capacity 1, and every item is to be assigned to a bin. Further, the sum of the sizes of the items assigned to any bin cannot exceed its capacity. A bin is empty if no item is assigned to it; otherwise, it is used. The goal of the algorithm is to minimize the number of used bins. This is one of the classical NP-hard problems, and heuristic and approximation algorithms have been investigated thoroughly; see, e.g., Coffman, Garey, and Johnson [3].

Another well-studied version of the problem is the so-called on-line bin packing problem. Here items arrive one by one and each item $x_t$ must be assigned to a bin (with free space at least $x_t$) immediately, without any knowledge of the next pieces. In this setting the goal is the same as in the off-line problem, that is, to minimize the number of used bins; see, e.g., Seiden [8], He and Dósa [6].

In both the off-line and on-line problems the algorithm has access to the bins in arbitrary order. In this paper we abandon this assumption and introduce a more restricted version that we call sequential bin packing. In this setting items arrive one by one (just like in the on-line problem), but in each round the algorithm has only two possible choices: assign the given item to the (only) open bin or to the "next" empty bin (in which case this becomes the new open bin); items can no longer be assigned to closed bins. An algorithm thus determines a sequence of binary decisions $i_1, \ldots, i_n$, where $i_t = 0$ means that the next item is assigned to the open bin and $i_t = 1$ means that a new bin is opened and the next item is assigned to that bin. Of course, if $i_t = 0$, then it may happen that the item $x_t$ does not fit in the open bin. In that case the item is "lost." If the decision is $i_t = 1$, then the remaining empty space in the last closed bin is counted as a loss. The measure of performance we use is the total sum of all lost items and wasted empty space.
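To make these dynamics concrete, here is a minimal Python sketch (our illustration, not from the paper; the function name `sequential_loss` is ours). It computes the cumulative loss of a fixed decision sequence, charging lost items, the wasted space of closed bins, and, as in the off-line formulation of Section 3, the free space left in the final open bin.

```python
def sequential_loss(items, decisions):
    """Cumulative loss of a fixed decision sequence in sequential bin packing.

    items[t] lies in (0, 1].  decisions[t] == 1 closes the open bin (its free
    space is wasted) and starts a new bin with the item; decisions[t] == 0
    keeps the open bin (the item is lost if it does not fit).  The free space
    of the final open bin is charged at the end.
    """
    free = 1.0                 # free space of the open bin (the "state")
    loss = 0.0
    for x, d in zip(items, decisions):
        if d == 1:             # close the bin: its free space is wasted
            loss += free
            free = 1.0 - x     # the item opens the new bin
        elif x <= free:        # the item fits in the open bin
            free -= x
        else:                  # the item does not fit and is lost
            loss += x
    return loss + free         # account for the final open bin
```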
Just as in the original bin packing problem, we may distinguish off-line and on-line versions of the sequential bin packing problem. In the off-line sequential bin packing problem the entire sequence $x_1, \ldots, x_n$ is known to the algorithm at the outset. Note that, unlike in the classical bin packing problem, the order of the items is relevant. This problem turns out to be computationally significantly easier than its non-sequential counterpart. In Section 3 we present a simple algorithm with running time $O(n^2)$ that minimizes the total loss in the off-line sequential bin packing problem.

Much more interesting is the on-line variant of the sequential bin packing problem. Here the items $x_t$ are revealed one by one, after the corresponding decision $i_t$ has been made. In other words, each decision has to be made without any knowledge of the size of the item. Formulated this way, the problem is reminiscent of an on-line prediction problem; see [2]. However, unlike in standard formulations of on-line prediction, here the loss the predictor suffers depends not only on the outcome $x_t$ and the decision $i_t$ but also on the "state" defined by the fullness of the open bin.

Our goal is to extend the usual bin packing problems to situations in which one can handle only one bin at a time, and items must be processed immediately, so they cannot wait for bin changes. To motivate the on-line sequential model, one may imagine a simple revenue management problem in which a decision maker has a unit storage capacity at his disposal. A certain product arrives in packages of different sizes and, after each arrival, it has to be decided whether the stored packages are shipped or not. (Storing the product is costly.) If the stored goods are shipped, the entire storage capacity becomes available again. If they are not shipped, one waits for the arrival of the next package. However, if the next package is too large to fit in the remaining open space, it is lost. In another example of application, a sensor collects measurements that can be compressed to variable size (these are the items). The sensor communicates its measurements by sending frames of some fixed size (bins). Since it has limited memory, it cannot store more data than one frame. To save energy, the sensor must maximize its throughput (the proportion of useful data in each frame) and at the same time minimize data loss (this trade-off is reflected in the definition of the loss function).

Just like in on-line prediction, we compare the performance of an algorithm with the best in a pool of reference algorithms (experts). Arguably the most natural comparison class contains all algorithms that use a fixed threshold to decide whether a new bin is opened. In other words, reference predictors are parameterized by a real number $p \in (0,1]$. An expert with parameter $p$ simply decides to open a new bin whenever the remaining free space in the open bin is less than $p$. We call such an expert a constant-threshold strategy.

The main result of this paper is the construction of a randomized algorithm for the sequential on-line bin packing problem that achieves a cumulative loss (measured as the sum of the total wasted capacity and lost items) that is not larger than the total loss of the best constant-threshold strategy (determined in hindsight) plus a quantity of the order of $n^{2/3}\log^{1/3} n$.
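A constant-threshold strategy is straightforward to state in code. The following sketch (again ours; `threshold_decisions` is a hypothetical helper) produces the decision sequence of the expert with parameter $p$, using the same state dynamics as above.

```python
def threshold_decisions(items, p):
    """Decisions of the constant-threshold expert with parameter p in (0, 1].

    Before seeing item x_t, the expert opens a new bin (decision 1) exactly
    when the free space of its open bin is below p; otherwise it keeps the
    open bin (decision 0).
    """
    free = 1.0
    decisions = []
    for x in items:
        d = 1 if free < p else 0
        decisions.append(d)
        if d == 1:
            free = 1.0 - x     # the item goes into the fresh bin
        elif x <= free:
            free -= x          # the item fits in the open bin
        # otherwise the item is lost and the state is unchanged
    return decisions
```

Combined with the loss computation above, `sequential_loss(items, threshold_decisions(items, p))` gives the cumulative loss of the expert with parameter $p$.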
The principal difficulty of the problem lies in the fact that each action of the decision maker takes the problem to a new "state" (determined by the remaining empty space in the open bin), which has an effect on future losses. Moreover, the state of the algorithm is typically different from the states of the experts, which makes the comparison difficult. In related work, Merhav, Ordentlich, Seroussi, and Weinberger [7] considered a similar setup in which the loss function has a "memory," that is, the loss of a predictor depends on the losses of past actions. Furthermore, Even-Dar, Kakade, and Mansour [4] considered the MDP case where the adversarial reward function changes according to some fixed stochastic dynamics. However, there are several additional difficulties in the present case. First, unlike in [7], but similarly to [4], the loss function has an unbounded memory, as the state may depend on an arbitrarily long sequence of past predictions. Second, the state space is infinite (the interval $[0,1)$) and the class of experts we compare to is also infinite, in contrast to both of the above papers. However, the special properties of the bin packing problem make it possible to design a prediction strategy with small regret. Note that the MDP setting of [4] would be too pessimistic an approach to our problem: in our case there is a strong connection between the rewards in different states, so a fully adversarial reward function results in an overestimated worst case. Also, in the present case state transitions are deterministically given by the outcome, the previous state, and the action of the decision maker, while in the setup of [4] transitions are stochastic and depend only on the state and the decision of the algorithm, but not on the reward (or on the underlying individual sequence generating the reward).

We also mention here the similar on-line bin packing with rejection problem, where the algorithm has the opportunity to reject some items and the loss function is the sum of the number of used bins and the "costs" of the rejected items (see He and Dósa [6]).¹ However, instead of the number of used bins, we use the sum of idle capacities (missed or free spaces) in the used bins to measure the loss. The following example may help explain the difference between the various versions of the problem.

¹ In sequential bin packing we assume that the cost of an item coincides with its size. In this case the optimal solution of bin packing with rejection is to reject all items.

Example 1 Let the sequence of items be 0.4, 0.5, 0.2, 0.5, 0.5, 0.3, 0.5, 0.1. Then the cumulative loss of the optimal off-line bin packing is 0, and it is 0.4 in the case of sequential off-line bin packing (see Figure 1). In the sequential case the third item (0.2) has been rejected.

[Figure 1: The difference between the optimal solutions for the (a) off-line and (b) sequential off-line problems.]
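The sequential loss in Example 1 can be verified with the `sequential_loss` sketch from the introduction; the decision sequence below is one optimal choice (derived by hand, not given in the paper):

```python
items = [0.4, 0.5, 0.2, 0.5, 0.5, 0.3, 0.5, 0.1]
# Bins: [0.4, 0.5] closed with 0.1 wasted, 0.2 lost, [0.5, 0.5] closed full,
# and the final open bin [0.3, 0.5, 0.1] is left with 0.1 free space.
print(round(sequential_loss(items, [0, 0, 0, 1, 0, 1, 0, 0]), 6))  # -> 0.4
```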
The rest of the paper is organized as follows. In Section 2 the problem is defined formally. In Section 3 the complexity of the off-line sequential bin packing problem is analyzed. The main results of the paper are presented in Section 4.

2 Setup

We use terminology borrowed from the theory of on-line prediction with expert advice. Thus, we call the sequential decisions of the on-line algorithm predictions and we use forecaster as a synonym for algorithm. We denote by $I_t \in \{0,1\}$ the action of the forecaster at time $t$ (that is, when $t-1$ items have been received). Action 0 means that the next item will be assigned to the open bin, and action 1 means that a new bin is opened and the next item is assigned to that empty bin. Note that we assume that we start with an open empty bin; thus, for any reasonable algorithm, $I_1 = 0$, and we restrict our attention to such algorithms. The sequence of decisions up to time $t$ is denoted by $I^t \in \{0,1\}^t$.

Denote by $s_t \in [0,1)$ the free space in the open (last) bin at time $t \ge 1$, that is, after having placed the items $x_1, x_2, \ldots, x_t$ according to the sequence $I^t$ of actions. This is the state of the forecaster. More precisely, the state of the forecaster is defined recursively as follows. As at the beginning we have an empty bin, $s_0 = 1$. For $t = 1, 2, \ldots, n$,

- $s_t = 1 - x_t$ when the algorithm assigns the item to the next empty bin (i.e., $I_t = 1$);
- $s_t = s_{t-1}$ when the assigned item does not fit in the open bin (i.e., $I_t = 0$ and $s_{t-1} < x_t$);
- $s_t = s_{t-1} - x_t$ when the assigned item fits in the open bin (i.e., $I_t = 0$ and $s_{t-1} \ge x_t$).

This may be written in the more compact form

$$s_t = s(I_t, x_t, s_{t-1}) = I_t (1 - x_t) + (1 - I_t)\left(s_{t-1} - \mathbb{I}_{\{s_{t-1} \ge x_t\}}\, x_t\right), \tag{1}$$

where $\mathbb{I}_{\{\cdot\}}$ denotes the indicator function of the event in brackets, that is, it equals 1 if the event is true and 0 otherwise.

The loss suffered by the forecaster at round $t$ is $\ell(I_t, x_t \mid s_{t-1})$, where the loss function is defined by

$$\ell(0, x \mid s) = \begin{cases} 0, & \text{if } s \ge x; \\ x, & \text{otherwise} \end{cases} \tag{2}$$

and

$$\ell(1, x \mid s) = s. \tag{3}$$

The goal of the forecaster is to minimize its cumulative loss, defined by

$$\widehat{L}_t = L_{I^t,t} = \sum_{s=1}^{t} \ell(I_s, x_s \mid s_{s-1}).$$
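In code, the loss function (2)–(3) and the compact state update (1) read as follows (a direct transcription, with our function names `loss` and `next_state`); note that, unlike the simulator in Section 1, the cumulative loss $\widehat{L}_t$ does not charge the free space of the final open bin.

```python
def loss(i, x, s):
    """ell(i, x | s): loss of action i on item x in state s, eqs. (2)-(3)."""
    if i == 1:
        return s                     # wasted free space of the closed bin, (3)
    return 0.0 if s >= x else x      # the item is lost iff it does not fit, (2)

def next_state(i, x, s):
    """State update (1): s_t = I_t(1 - x_t) + (1 - I_t)(s_{t-1} - 1{s_{t-1} >= x_t} x_t)."""
    if i == 1:
        return 1.0 - x
    return s - x if s >= x else s
```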
In the off-line version of the problem, the entire sequence $x_1, \ldots, x_n$ is given, and the solution is the optimal sequence of actions

$$I^*_n = \arg\min_{I^n \in \{0,1\}^n} L_{I^n,n}.$$

In the on-line version of the problem the forecaster does not know the sizes of the next items, and the sequence of items can be completely arbitrary. We allow the forecaster to randomize its decisions, that is, at each time instance $t$, $I_t$ is allowed to depend on a random variable $U_t$, where $U_1, \ldots, U_n$ are i.i.d. random variables uniformly distributed in $[0,1]$. Since we allow the forecaster to randomize, it is important to clarify that the entire sequence of items is determined before the forecaster starts making decisions, that is, $x_1, \ldots, x_n \in (0,1]$ are fixed and cannot depend on the randomizing variables. (This is the so-called oblivious adversary model known in the theory of sequential prediction; see, e.g., [2].)

The performance of a sequential on-line algorithm is measured by its cumulative loss. It is natural to compare it to the cumulative loss of the off-line solution $I^*_n$. However, it is easy to see that in general it is impossible to achieve an on-line performance comparable to the optimal solution. (This is in contrast with the non-sequential counterpart of the bin packing problem, in which there exist on-line algorithms for which the number of used bins is within a constant factor of that of the optimal solution.) So, in order to measure the performance of a sequential on-line algorithm in a meaningful way, we adopt an approach extensively used in on-line prediction (the so-called "experts" framework): we define a set of reference forecasters, the so-called experts, the performance of the algorithm is evaluated relative to this set of experts, and the goal is to perform asymptotically as well as the best expert from the reference class.

Formally, let $f_{E,t} \in \{0,1\}$ be the decision of an expert $E$ at round $t$, where $E \in \mathcal{E}$ and $\mathcal{E}$ is the set of experts. This set may be finite or infinite; we consider both cases below. Similarly, we denote by $s_{E,t}$ the state of expert $E$ after the $t$-th item has been revealed. Then the loss of expert $E$ at round $t$ is $\ell(f_{E,t}, x_t \mid s_{E,t-1})$ and the cumulative loss of expert $E$ is

$$L_{E,n} = \sum_{t=1}^{n} \ell(f_{E,t}, x_t \mid s_{E,t-1}).$$

The goal of the algorithm is to perform almost as well as the best expert from the reference class $\mathcal{E}$. Ideally, the normalized difference of the cumulative losses (the so-called regret) should vanish as $n$ grows, that is, one wishes to achieve

$$\limsup_{n \to \infty} \frac{1}{n}\left(\widehat{L}_n - \inf_{E \in \mathcal{E}} L_{E,n}\right) \le 0$$

with probability one, regardless of the sequence of items. This property is called Hannan consistency; see [5]. The model of sequential on-line bin packing with expert advice is given in Figure 2.

SEQUENTIAL ON-LINE BIN PACKING PROBLEM WITH EXPERT ADVICE

Parameters: set $\mathcal{E}$ of experts, state space $\mathcal{S} = [0,1)$, action space $\mathcal{A} = \{0,1\}$, nonnegative loss function $\ell : \mathcal{A} \times (0,1] \times \mathcal{S} \to [0,1]$, number $n$ of items.

Initialization: $s_0 = 1$ and $s_{E,0} = 1$ for all $E \in \mathcal{E}$.

For each round $t = 1, \ldots, n$,
(a) each expert forms its action $f_{E,t} \in \mathcal{A}$;
(b) the forecaster observes the actions of the experts and forms its own decision $I_t \in \mathcal{A}$;
(c) the next item $x_t \in (0,1]$ is revealed;
(d) the algorithm incurs loss $\ell(I_t, x_t \mid s_{t-1})$ and each expert incurs loss $\ell(f_{E,t}, x_t \mid s_{E,t-1})$. The states of the experts and the algorithm are updated.

Figure 2: Sequential on-line bin packing problem with expert advice.

In Section 4 we design sequential on-line bin packing algorithms for two cases. In the first (and simpler) case we assume that the class $\mathcal{E}$ of experts is finite. In the second case we consider the (infinite) class of experts defined by constant-threshold strategies. But before turning to the on-line problem, we show how the off-line problem can be solved by a simple quadratic-time algorithm.

3 Sequential off-line bin packing

As is well known, most variants of the bin packing problem are NP-hard, including bin packing with rejection [6] and maximum resource bin packing [1]. In this section we show that the sequential bin packing problem is significantly easier. Indeed, we offer an algorithm that finds the optimal sequential strategy with time complexity $O(n^2)$, where $n$ is the number of items. The key property is that after the $t$-th item has been received, the $2^t$ possible sequences of decisions cannot lead to more than $t$ different states.

Lemma 1 For any fixed sequence of items $x_1, x_2, \ldots, x_n$ and for every $1 \le t \le n$, $|\mathcal{S}_t| \le t$, where

$$\mathcal{S}_t = \{s : s = s_{I^t,t},\ I^t \in \{0,1\}^t\}$$

and $s_{I^t,t}$ is the state reached after the sequence $I^t$ of decisions.

Proof: The proof goes by induction. Note that since $I_1 = 0$, we always have $s_{I^1,1} = 1 - x_1$, and therefore $|\mathcal{S}_1| = 1$. Now assume that $|\mathcal{S}_{t-1}| \le t - 1$. At time $t$, the state of every sequence of decisions with $I_t = 0$ belongs to the set

$$\mathcal{S}'_t = \{s' : s' = s - \mathbb{I}_{\{s \ge x_t\}}\, x_t,\ s \in \mathcal{S}_{t-1}\},$$

and the state of those with $I_t = 1$ becomes $1 - x_t$. Therefore,

$$|\mathcal{S}_t| \le |\mathcal{S}'_t| + 1 \le |\mathcal{S}_{t-1}| + 1 \le t,$$

as desired.
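The proof of Lemma 1 translates directly into a procedure that enumerates the reachable state sets (a sketch with our helper name `state_sets`; exact floating-point arithmetic stands in for real numbers):

```python
def state_sets(items):
    """Reachable state sets S_1, ..., S_n; by Lemma 1, len(sets[t-1]) <= t."""
    states = {1.0 - items[0]}       # I_1 = 0: the first item enters the open bin
    sets = [set(states)]
    for x in items[1:]:
        nxt = {s - x if s >= x else s for s in states}   # all decisions with I_t = 0
        nxt.add(1.0 - x)                                 # the single state with I_t = 1
        states = nxt
        sets.append(set(states))
    return sets
```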
To describe a computationally efficient algorithm for computing $I^*_n$, we set up a graph with the set of possible states as a vertex set (there are $O(n^2)$ of them by Lemma 1) and show that a shortest path on this graph yields the optimal solution of the sequential off-line bin packing problem.

To formalize the problem, consider a finite directed acyclic graph with a set of vertices $V = \{v_1, \ldots, v_{|V|}\}$ and a set of edges $E = \{e_1, \ldots, e_{|E|}\}$. Each vertex $v_k = v(s_k, t_k)$ of the graph is defined by a time index $t_k$ and a state $s_k \in \mathcal{S}_{t_k}$ and corresponds to state $s_k$ being reachable after $t_k$ steps. To show the latter dependence, we write $v_k \in \mathcal{S}_{t_k}$. Two vertices $(v_i, v_j)$ are connected by an edge if and only if $v_i \in \mathcal{S}_{t-1}$, $v_j \in \mathcal{S}_t$, and state $v_j$ is reachable from state $v_i$; that is, by choosing either action 0 or action 1 in state $v_i$, the new state becomes $v_j$ after item $x_t$ has been placed. Each edge has a label and a weight: the label corresponds to the action (zero or one) and the weight equals the loss, depending on the initial state, the action, and the size of the item. Moreover, a sink vertex $v_{|V|}$ is introduced that is connected with all vertices in $\mathcal{S}_n$. These edges have weights equal to the loss of the final states; such a weight depends only on the initial vertex of the edge, and more precisely, for $(v_i, v_{|V|})$ the loss is $1 - v_i$, where $v_i \in \mathcal{S}_n$. Figure 3 shows the proposed graph.

[Figure 3: The graph corresponding to the off-line sequential bin packing problem. Each edge is labeled by an action and its loss, e.g., $0/\ell(0, x_t \mid s)$ or $1/\ell(1, x_t \mid s)$.]

Notice that there is a one-to-one correspondence between paths from $v_1$ to $v_{|V|}$ and possible sequences of actions of length $n$. Furthermore, the total weight of each path (calculated as the sum of the weights on the edges of the path) is equal to the loss of the corresponding sequence of actions. Thus, if we find a path with minimal total weight from $v_1$ to $v_{|V|}$, we also find the optimal sequence of actions for the off-line bin packing problem. It is well known that this can be done in $O(|V| + |E|)$ time. Now, by Lemma 1, $|V| \le n(n+1)/2 + 1$, where the additional vertex accounts for the sink. Moreover, it is easy to see that $|E| \le n(n-1) + n = n^2$. Hence the total time complexity of finding the off-line solution is $O(n^2)$.
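Equivalently, the shortest-path computation can be carried out as a dynamic program over the reachable states without materializing the graph. The sketch below (our function `offline_optimum`; floating-point equality issues aside) keeps, for each reachable free space, the smallest loss accumulated so far, and runs in $O(n^2)$ by Lemma 1:

```python
def offline_optimum(items):
    """Minimal total loss of sequential off-line bin packing (cf. Section 3)."""
    best = {1.0: 0.0}   # best[s]: smallest loss so far, open bin with free space s
    for x in items:
        nxt = {}
        for s, c in best.items():
            # action 0: keep the open bin; the item is lost if it does not fit
            s0, c0 = (s - x, c) if s >= x else (s, c + x)
            if c0 < nxt.get(s0, float("inf")):
                nxt[s0] = c0
            # action 1: close the bin (waste s) and start a new bin with the item
            s1, c1 = 1.0 - x, c + s
            if c1 < nxt.get(s1, float("inf")):
                nxt[s1] = c1
        best = nxt
    # charge the free space of the final open bin, as for the edges to the sink
    return min(c + s for s, c in best.items())
```

For instance, `round(offline_optimum([0.4, 0.5, 0.2, 0.5, 0.5, 0.3, 0.5, 0.1]), 6)` returns 0.4, the value of Example 1. (Recovering the optimal action sequence itself only requires storing backpointers.)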
4 Sequential on-line bin packing

In this section we study the sequential on-line bin packing problem with expert advice, as described in Section 2. We deal with two special cases. First we consider finite classes of experts (i.e., reference algorithms) without any assumption on the form or structure of the experts. We construct a randomized algorithm that, with large probability, achieves a cumulative loss not larger than that of the best expert plus $O(n^{2/3}\ln^{1/3} N)$, where $N = |\mathcal{E}|$ is the number of experts. Then we consider the class of all constant-threshold experts and show that a regret of order $O(n^{2/3}\ln^{1/3} n)$ may be achieved with high probability.

The following simple lemma is a key ingredient of the results of this section. It shows that in sequential on-line bin packing the cumulative loss is not sensitive to the initial state, in the sense that the cumulative loss depends on the initial state only in a minor way.

Lemma 2 Let $i_1, \ldots, i_m \in \{0,1\}$ be a fixed sequence of decisions and let $x_1, \ldots, x_m \in (0,1]$ be a sequence of items. Let $s_0, s'_0 \in [0,1)$ be two different initial states. Finally, let $s_0, \ldots, s_m$ and $s'_0, \ldots, s'_m$ denote the sequences of states generated by $i_1, \ldots, i_m$ and $x_1, \ldots, x_m$ starting from the initial states $s_0$ and $s'_0$, respectively. Then

$$\left|\sum_{t=1}^{m} \ell(i_t, x_t \mid s_{t-1}) - \sum_{t=1}^{m} \ell(i_t, x_t \mid s'_{t-1})\right| \le s_0 + s'_0 \le 2.$$

Proof: Let $m'$ denote the smallest index for which $i_{m'} = 1$. Note that $s_t = s'_t$ for all $t \ge m'$, since both runs place the item $x_{m'}$ into a fresh bin; hence the two runs incur identical losses after round $m'$, and it suffices to compare the losses accumulated in rounds $1, \ldots, m'$. Consider the run started from $s_0$. During rounds $1, \ldots, m'-1$ every decision is 0, so each item is either placed into the open bin or lost. Denote by $P$ the total size of the placed items and by $X$ the total size of the items $x_1, \ldots, x_{m'-1}$; then $s_{m'-1} = s_0 - P$ and, by (2) and (3), the loss accumulated in rounds $1, \ldots, m'$ equals $(X - P) + s_{m'-1} = X + s_0 - 2P$, where $0 \le P \le s_0$. Similarly, the loss of the run started from $s'_0$ is $X + s'_0 - 2P'$ with $0 \le P' \le s'_0$. Hence the difference of the two cumulative losses is $(s_0 - s'_0) - 2(P - P')$, which lies between $-(s_0 + s'_0)$ and $s_0 + s'_0$. (If no decision equals 1, the same computation without the terms of type (3) gives the difference $P' - P$, which also lies in this range.) The statement follows.

The following example shows that the upper bound of the lemma is tight.

Example 2 Let $x_1 = s_0$, $s'_0 < s_0$, $i_1 = 0$, $i_2 = 1$, and $m = 2$. Then

$$\sum_{t=1}^{m} \ell(i_t, x_t \mid s'_{t-1}) - \sum_{t=1}^{m} \ell(i_t, x_t \mid s_{t-1}) = \bigl(\ell(0, s_0 \mid s'_0) + \ell(1, x_2 \mid s'_0)\bigr) - \bigl(\ell(0, s_0 \mid s_0) + \ell(1, x_2 \mid 0)\bigr) = s_0 + s'_0 - (0 + 0),$$

since the first item does not fit into the open bin when starting from $s'_0$, while it fits exactly when starting from $s_0$.

4.1 Finite sets of experts

First we consider the on-line sequential bin packing problem when the goal of the algorithm is to keep its cumulative loss close to that of the best of a finite set of experts. In other words, we assume that the class of experts is finite, say $|\mathcal{E}| = N$, but we do not assume any additional structure of the experts. The ideas presented here will be used below when we consider the infinite class of constant-threshold experts.

The proposed algorithm partitions the time period $t = 1, \ldots, n$ into segments of length $m$, where $m < n$ is a positive integer whose value will be specified later. This way we obtain $n' = \lfloor n/m \rfloor$ segments of length $m$ and, if $m$ does not divide $n$, an extra segment of length less than $m$. At the beginning of each segment, the algorithm selects an expert randomly, according to an exponentially weighted average distribution. During the entire segment, the algorithm follows the advice of the selected expert. By changing actions so rarely, the algorithm achieves a certain synchronization with the chosen expert, since the effect of the difference in the initial states is minor, by Lemma 2. (A similar idea was used in [7] in a different context.) The algorithm is described in Figure 4.

Recall that each expert $E \in \mathcal{E}$ recommends an action $f_{E,t} \in \{0,1\}$ at every time instance $t = 1, \ldots, n$. Since we have $N$ experts, we may identify $\mathcal{E}$ with the set $\{1, \ldots, N\}$; thus, experts will be indexed by the positive integers $i \in \{1, \ldots, N\}$. At the beginning of each segment, the algorithm chooses expert $i$ randomly, with probability $p_{i,t}$, where the distribution $p_t = (p_{1,t}, \ldots, p_{N,t})$ is specified below. The random selection is made independently for each segment.

SEQUENTIAL ON-LINE BIN PACKING ALGORITHM

Parameters: real number $\eta > 0$ and segment length $m \in \mathbb{N}^+$.

Initialization: $w_{i,0} = 1$ and $s_{i,0} = 1$ for $i = 1, \ldots, N$, and $s_0 = 1$.

For each round $t = 1, \ldots, n$,

(a) If $((t-1) \bmod m) = 0$, then
  - calculate the updated probability distribution $p_{i,t} = w_{i,t-1} / \sum_{j=1}^{N} w_{j,t-1}$ for $i = 1, \ldots, N$;
  - randomly select an expert $J_t \in \{1, \ldots, N\}$ according to the probability distribution $p_t = (p_{1,t}, \ldots, p_{N,t})$;
  otherwise, let $J_t = J_{t-1}$.
(b) Follow the chosen expert: $I_t = f_{J_t,t}$.
(c) The size of the next item, $x_t \in (0,1]$, is revealed.
(d) The algorithm incurs loss $\ell(I_t, x_t \mid s_{t-1})$ and each expert $i$ incurs loss $\ell(f_{i,t}, x_t \mid s_{i,t-1})$. The states of the experts and the algorithm are updated.
(e) Update the weights: $w_{i,t} = w_{i,t-1}\, e^{-\eta\, \ell(f_{i,t}, x_t \mid s_{i,t-1})}$ for all $i \in \{1, \ldots, N\}$.

Figure 4: Sequential on-line bin packing algorithm.
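For concreteness, here is a compact rendering of the algorithm of Figure 4 (our code, a sketch rather than the authors' implementation; it reuses `loss` and `next_state` from Section 2 and represents each expert as a function mapping its own state to a decision):

```python
import math
import random

def segment_ew(items, experts, eta, m, rng=random):
    """Sequential on-line bin packing with a finite expert set (Figure 4).

    A new expert J is drawn at the start of every segment of m rounds from
    the exponentially weighted distribution and is followed for the whole
    segment; all experts are simulated in parallel to update the weights.
    """
    N = len(experts)
    w = [1.0] * N                      # weights w_{i,t}
    estates = [1.0] * N                # expert states s_{i,t}
    s, total, J = 1.0, 0.0, 0
    for t, x in enumerate(items):
        if t % m == 0:                 # step (a): redraw the expert
            J = rng.choices(range(N), weights=w)[0]
        decision = experts[J](estates[J])          # step (b): I_t = f_{J_t,t}
        total += loss(decision, x, s)              # step (d): true loss
        s = next_state(decision, x, s)
        for i in range(N):                         # steps (d)-(e): experts evolve
            f = experts[i](estates[i])
            w[i] *= math.exp(-eta * loss(f, x, estates[i]))
            estates[i] = next_state(f, x, estates[i])
    return total
```

For example, a finite grid of constant-threshold experts is obtained as `experts = [lambda s, p=p: 1 if s < p else 0 for p in [k / 10 for k in range(1, 11)]]`.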
The following theorem establishes a performance bound for the algorithm. Recall that $\widehat{L}_n$ denotes the cumulative loss of the algorithm, while $L_{i,n}$ is that of expert $i$. (The constants in the bounds below are reconstructed from a corrupted source and should be read with that caveat.)

Theorem 3 Let $\eta > 0$, $1 \le m \le n$, and $\delta \in (0,1)$. For any sequence $x_1, \ldots, x_n \in (0,1]$ of items, the cumulative loss $\widehat{L}_n$ of the randomized strategy defined above satisfies, with probability at least $1 - \delta$,

$$\widehat{L}_n - \min_{i=1,\ldots,N} L_{i,n} \le \frac{m \ln N}{\eta} + \frac{\eta n}{8} + \sqrt{\frac{nm}{2} \ln\frac{1}{\delta}} + \frac{2n}{m} + 2m.$$

In particular, choosing $m = \lceil (16n/\ln(N/\delta))^{1/3} \rceil$ and $\eta = \sqrt{8m \ln N / n}$, one has

$$\widehat{L}_n - \min_{i=1,\ldots,N} L_{i,n} \le 3\, n^{2/3} \ln^{1/3}\frac{N}{\delta} + 4\left(\frac{2n}{\ln(N/\delta)}\right)^{1/3}.$$

Proof: We introduce an auxiliary quantity, the so-called hypothetical loss, defined as the loss the algorithm would suffer if it were in the same state as the selected expert. This hypothetical loss does not depend on the previous decisions of the algorithm. More precisely, the true loss of the algorithm at time instance $t$ is $\ell(I_t, x_t \mid s_{t-1})$, while its hypothetical loss is $\ell(I_t, x_t \mid s_{J_t,t-1})$. Introducing the notation

$$\ell_{i,t} = \ell(f_{i,t}, x_t \mid s_{i,t-1}),$$

the hypothetical loss of the algorithm is just $\ell(I_t, x_t \mid s_{J_t,t-1}) = \ell(f_{J_t,t}, x_t \mid s_{J_t,t-1}) = \ell_{J_t,t}$. It follows from a well-known result of randomized on-line prediction (see, e.g., [2, Corollary 4.2]), applied to the $n' = \lfloor n/m \rfloor$ complete segments, on each of which the accumulated loss lies in $[0, m]$, that the hypothetical loss of the sequential on-line bin packing algorithm satisfies, with probability at least $1 - \delta$,

$$\sum_{t=1}^{n} \ell_{J_t,t} - \min_{i=1,\ldots,N} \sum_{t=1}^{n} \ell_{i,t} \le \frac{m \ln N}{\eta} + \frac{\eta n}{8} + \sqrt{\frac{nm}{2} \ln\frac{1}{\delta}} + m, \tag{4}$$

where the last term $m$ comes from bounding the difference on the last, not necessarily complete, segment.

Now we may decompose the regret as follows:

$$\widehat{L}_n - \min_{i=1,\ldots,N} L_{i,n} = \left(\widehat{L}_n - \sum_{t=1}^{n} \ell_{J_t,t}\right) + \left(\sum_{t=1}^{n} \ell_{J_t,t} - \min_{i=1,\ldots,N} L_{i,n}\right).$$

The second term on the right-hand side is bounded by (4). To bound the first term, observe that, within each segment, the algorithm follows the decisions of the selected expert, and only the states at the beginning of the segment differ; hence, by Lemma 2, the true and hypothetical losses accumulated on any segment differ by at most 2. Therefore,

$$\widehat{L}_n - \sum_{t=1}^{n} \ell_{J_t,t} \le m + \sum_{s'=0}^{n'-1} \sum_{t=1}^{m} \Bigl(\ell(I_{s'm+t}, x_{s'm+t} \mid s_{s'm+t-1}) - \ell(I_{s'm+t}, x_{s'm+t} \mid s_{J_{s'm+t},\, s'm+t-1})\Bigr) \le m + \frac{2n}{m},$$

where in the first inequality we bounded the difference on the last segment separately. Combining the two bounds proves the theorem.

4.2 Constant-threshold experts

Now we are prepared to address the sequential on-line bin packing problem when the goal is to perform almost as well as the best in the class of all constant-threshold strategies. Recall that a constant-threshold strategy is parameterized by a number $p \in (0,1]$, and it opens a new bin if and only if the remaining empty space in the open bin is less than $p$. More precisely, if the state of the algorithm defined by the expert with parameter $p$ is $s_{p,t-1}$, then at time $t$ the expert's advice is $f_{p,t} = \mathbb{I}_{\{s_{p,t-1} < p\}}$.

Although this class of experts is uncountably infinite, at any time instance $t$ it can be partitioned into finitely many sets of experts whose behavior coincides on the first $t$ items. We say that two experts $p$ and $r$ are $t$-indistinguishable if $f_{p,u} = f_{r,u}$ for all $u = 1, \ldots, t$.

Lemma 4 Let $p > r > 0$ be such that expert $p$ and expert $r$ are $t$-indistinguishable. Then, for any $q$ with $p > q > r$, expert $q$ is $t$-indistinguishable from both expert $p$ and expert $r$. Thus, the maximal sets of $t$-indistinguishable experts form subintervals of $(0,1]$.

Proof: By the assumption of the lemma, the decision sequences of experts $p$ and $r$ coincide, that is, $f_{p,u} = f_{r,u}$ and $s_{p,u} = s_{r,u}$ for all $u = 1, 2, \ldots, t$. Let $t_1, t_2, \ldots$ denote the time instances when expert $p$ (or expert $r$) assigns the next item to the next empty bin (i.e., $f_{p,u} = 1$ for $u = t_1, t_2, \ldots$). If expert $q$ also decides 1 at time $t_k$ for some $k$, then it will decide 0 for $t = t_k + 1, \ldots, t_{k+1} - 1$, since so does expert $p$ and $p > q$, and it will decide 1 at time $t_{k+1}$, as $q > r$.
Thus the decision sequence of expert $q$ coincides with that of experts $p$ and $r$ for the time instances $t_k + 1, \ldots, t_{k+1}$ in this case. Since all experts start with the empty bin at time 0, the statement of the lemma follows by induction.

Based on the lemma, we can identify the $t$-indistinguishable sets by their end points. Let $Q_t = \{q_{1,t}, \ldots, q_{N_t,t}\}$ denote the set of the end points after receiving $t$ items, where $N_t = |Q_t|$ is the number of maximal $t$-indistinguishable sets and $q_{0,t} = 0 < q_{1,t} < q_{2,t} < \cdots < q_{N_t,t} = 1$. Then the $t$-indistinguishable sets are the intervals $(q_{k-1,t}, q_{k,t}]$ for $k = 1, \ldots, N_t$. The next result shows that the number of maximal $t$-indistinguishable sets cannot grow too fast.

Lemma 5 The number of maximal $t$-indistinguishable sets is at most quadratic in the number of items $t$. More precisely, $N_t \le 1 + (t-1)t/2$ for any $1 \le t \le n$.

Proof: The proof is by induction. First, $N_1 = 1$ (and $Q_1 = \{1\}$), since the first decision of every expert is the same. Now assume that $N_{t-1} \le 1 + (t-2)(t-1)/2$ for some $1 < t \le n$. When the next item $x_t$ arrives, an expert $p$ with state $s$ decides 1 in the next step if and only if $0 \le s - x_t < p$. Therefore, as all experts belonging to the same indistinguishable set have the same state, the $k$-th maximal $(t-1)$-indistinguishable interval with state $s$ is split into two subintervals if and only if $q_{k-1,t-1} < s - x_t \le q_{k,t-1}$ (experts in this interval with parameters larger than $s - x_t$ form one subset, and the ones with parameters at most $s - x_t$ form the other). As the number of possible states at time $t-1$ is at most $t-1$ by Lemma 1, at most $t-1$ intervals can be split, and so

$$N_t \le N_{t-1} + t - 1 \le 1 + (t-1)t/2,$$

where the second inequality holds by the induction hypothesis.

This lemma makes it possible to apply our earlier algorithm for finite expert classes. However, note that the number of "distinguishable" experts, that is, the number of maximal indistinguishable sets, keeps growing with time, and each indistinguishable set contains a continuum of experts. Therefore, we need to redefine the algorithm carefully. This may be done by a two-level random choice of the expert: first we choose an indistinguishable expert set, then we pick one expert from this set at random. The resulting algorithm is given in Figure 5. Up to step (e), the algorithm is essentially the same as in the case of finitely many experts.

SEQUENTIAL ON-LINE BIN PACKING ALGORITHM WITH CONSTANT-THRESHOLD EXPERTS

Parameters: $\eta > 0$ and $m \in \mathbb{N}^+$.

Initialization: $w_{1,0} = 1$, $N_1 = 1$, $Q_1 = \{1\}$, $s_{1,0} = 1$, and $s_0 = 1$.

For each round $t = 1, \ldots, n$,

(a) If $((t-1) \bmod m) = 0$, then
  - for $i = 1, \ldots, N_t$, compute the probabilities $p_{i,t} = w_{i,t-1} / \sum_{j=1}^{N_t} w_{j,t-1}$;
  - randomly select an interval $J_t \in \{1, \ldots, N_t\}$ according to the probability distribution $p_t = (p_{1,t}, \ldots, p_{N_t,t})$;
  - choose an expert $p_t$ uniformly from the interval $(q_{J_t-1,t}, q_{J_t,t}]$;
  otherwise, let $p_t = p_{t-1}$.
(b) Follow the decision of expert $p_t$: $I_t = f_{p_t,t}$.
(c) The size of the next item, $x_t \in (0,1]$, is revealed.
(d) The algorithm incurs loss $\ell(I_t, x_t \mid s_{t-1})$ and each expert $p \in (0,1]$ incurs loss $\ell(f_{p,t}, x_t \mid s_{p,t-1})$.
(e) Compute the state $s_t$ of the algorithm by (1), and calculate the auxiliary weights and states of the expert sets for all $i = 1, \ldots, N_t$ by

$$\tilde{w}_{i,t} = w_{i,t-1}\, e^{-\eta\, \ell(f_{i,t}, x_t \mid s_{i,t-1})}, \qquad \tilde{s}_{i,t} = f_{i,t}(1 - x_t) + (1 - f_{i,t})\left(s_{i,t-1} - \mathbb{I}_{\{s_{i,t-1} \ge x_t\}}\, x_t\right).$$

(f) Update the end points of the intervals:

$$Q_{t+1} = Q_t \cup \bigcup_{i=1}^{N_t} \{\tilde{s}_{i,t} : q_{i-1,t} < \tilde{s}_{i,t} \le q_{i,t}\}$$

and $N_{t+1} = |Q_{t+1}|$.

(g) Assign the new states and weights to the $(t+1)$-indistinguishable sets:

$$s_{i,t} = \tilde{s}_{j,t} \quad\text{and}\quad w_{i,t} = \tilde{w}_{j,t}$$

for all $i = 1, \ldots, N_{t+1}$ and $j = 1, \ldots, N_t$ such that $q_{j-1,t} < q_{i,t+1} \le q_{j,t}$.

Figure 5: Sequential on-line bin packing algorithm with constant-threshold experts.
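Step (f) is where the partition grows: an updated state becomes a new end point only if it falls strictly inside one of the current intervals. A minimal sketch of this update (our helper `split_endpoints`; exact float comparisons stand in for the real-valued statement):

```python
import bisect

def split_endpoints(q, new_states):
    """One step-(f) update of the sorted end point list q = [q_{1,t}, ..., q_{N_t,t}].

    Each updated interval state in new_states becomes a new end point if it
    lies strictly inside some interval (q_{i-1,t}, q_{i,t}] with q_{0,t} = 0;
    by Lemma 5, at most t - 1 intervals can split in round t.
    """
    out = sorted(set(q))
    for s in new_states:
        j = bisect.bisect_left(out, s)
        # s splits an interval iff s > 0 and s is not already an end point
        if s > 0.0 and (j == len(out) or out[j] != s):
            out.insert(j, s)
    return out
```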
The two-level random choice of the expert is performed in step (a). In step (f) we update the $t$-indistinguishable sets and usually introduce new indistinguishable expert sets. Because of these new expert sets, the update of the weights $w_{i,t}$ and the states $s_{i,t}$ is performed in two steps, (e) and (g): the actual update is made in step (e), and the reordering of these quantities according to the new indistinguishable sets is performed in step (g), together with the introduction of the weights and states for the newly formed expert sets. The performance and complexity of the algorithm are given in the next theorem.

Theorem 6 Let $N = 1 + n(n-1)/2$, $m = \lceil (16n/\ln(n^2/\delta))^{1/3} \rceil$, $\eta = 4\sqrt{m \ln n / n}$, and $\delta \in (0,1)$. Then the regret of the algorithm defined above is bounded, with probability at least $1 - \delta$, by

$$\widehat{L}_n - \inf_{p \in (0,1]} L_{p,n} \le 3\, n^{2/3} \ln^{1/3}\frac{n^2}{\delta} + 4\left(\frac{2n}{\ln(n^2/\delta)}\right)^{1/3}.$$

Moreover, the algorithm can be implemented with time complexity $O(n^3)$ and space complexity $O(n^2)$.

Proof: It is easy to see that the two-level choice of the expert $p_t$ ensures that the algorithm behaves exactly as the algorithm for finite expert classes, run with the experts defined by $Q_n$. Thus, Theorem 3 can be used to bound the regret, where the number of experts is $N_t$; by Lemma 5, the latter is bounded by $N < n^2$, which finishes the proof of the first statement. For the second part, note that the algorithm has to store the states, the intervals, the weights, and the probabilities, each requiring space of the order of $O(n^2)$ by Lemma 5. Concerning time complexity, the algorithm has to update the weights and states in each round (requiring $O(n^2)$ computations per round) and has to compute the probabilities in every $m$-th step, which requires $O(n^3/m)$ computations overall. Thus the time complexity of the algorithm is $O(n^3)$.

The next example shows that the loss of the best expert can be arbitrarily far from that of the optimal sequential off-line packing.

Example 3 Let the sequence of items be

$$\underbrace{\varepsilon,\, 1-\varepsilon,\, \varepsilon,\, 1-\varepsilon,\, \ldots,\, \varepsilon,\, 1-\varepsilon}_{2k},\ \varepsilon,\ \underbrace{1,\, 1,\, \ldots,\, 1}_{k},$$

where the number of items is $n = 3k + 1$ and $0 < \varepsilon < 1/2$. An optimal sequential off-line packing is achieved if we drop any one of the $\varepsilon$ items; then the total loss is $\varepsilon$. In contrast, the loss of every constant-threshold expert is $1 - \varepsilon + k$, independently of the choice of the parameter $p$. Namely, if $p \le 1 - \varepsilon$, then the loss is 0 for the first $2k$ items, but afterwards the algorithm is stuck and suffers loss $k + 1 - \varepsilon$: every item of size 1 is lost, and the final bin remains open with free space $1 - \varepsilon$. If $p > 1 - \varepsilon$, then the expert closes a bin at every step from the second one onwards, accumulating loss $k - \varepsilon$ on the first $2k$ items and loss 1 on the rest of the sequence.
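Example 3 can be checked numerically with the earlier sketches (`sequential_loss`, `threshold_decisions`, and `offline_optimum`): every threshold parameter incurs loss $k + 1 - \varepsilon$, while dropping a single $\varepsilon$ item achieves loss $\varepsilon$.

```python
k, eps = 5, 0.1
items = [eps, 1 - eps] * k + [eps] + [1.0] * k      # n = 3k + 1 items
for p in (0.5, 1 - eps, 0.95, 1.0):
    print(p, round(sequential_loss(items, threshold_decisions(items, p)), 6))
# every parameter prints k + 1 - eps = 5.9
print(round(offline_optimum(items), 6))             # -> 0.1, i.e., eps
```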
5 Conclusions

In this paper we provide an extension of the classical bin packing problems to an on-line sequential scenario. In this setting items are received one by one, and before the size of the next item is revealed, the decision maker needs to decide whether the next item is packed in the currently open bin or the bin is closed and a new bin is opened. If the new item does not fit, it is lost. If a bin is closed, the remaining free space in the bin accounts for a loss. The goal of the decision maker is to minimize the loss accumulated over $n$ periods. As the main result of the paper, we give an algorithm that has a cumulative loss not much larger than that of the best algorithm in any finite set of reference algorithms and, more importantly, another algorithm that has a cumulative loss not much larger than that of any strategy that uses a fixed threshold at each step to decide whether a new bin is opened.

An interesting aspect of the problem is that the loss function has an (unbounded) memory. The presented solutions rely on the fact that one can "synchronize" the loss function, in the sense that no matter in what state an algorithm is started, its loss may change only by a small additive constant. The second result is obtained by a covering of the uncountable set of constant-threshold experts such that the cardinality of the chosen finite set of experts grows only quadratically with the sequence length. The approach of the paper can easily be extended to any control problem where the loss function has such a synchronizability property.

References

[1] J. Boyar, L. Epstein, L.M. Favrholdt, J.S. Kohrt, K.S. Larsen, M.M. Pedersen, and S. Wøhlk. The maximum resource bin packing problem. Theoretical Computer Science, 362:127–139, 2006.

[2] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

[3] E.G. Coffman, M.R. Garey, and D.S. Johnson. Approximation algorithms for bin packing: a survey. In Approximation Algorithms for NP-hard Problems, pp. 46–93. PWS Publishing Co., Boston, MA, 1997.

[4] E. Even-Dar, S.M. Kakade, and Y. Mansour. Experts in a Markov decision process. In L.K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pp. 401–408. MIT Press, Cambridge, MA, 2005.

[5] J. Hannan. Approximation to Bayes risk in repeated plays. In M. Dresher, A. Tucker, and P. Wolfe, editors, Contributions to the Theory of Games, volume 3, pp. 97–139. Princeton University Press, 1957.

[6] Y. He and Gy. Dósa. Bin packing and covering problems with rejection. Lecture Notes in Computer Science 3595, pp. 885–894, 2005.

[7] N. Merhav, E. Ordentlich, G. Seroussi, and M.J. Weinberger. On sequential strategies for loss functions with memory. IEEE Transactions on Information Theory, 48:1947–1958, 2002.

[8] S.S. Seiden. On the online bin packing problem. In Proceedings of the 28th International Colloquium on Automata, Languages and Programming, pp. 237–248, 2001.