On minimizing the total flow time on multiple machines

Nikhil Bansal
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Email: nikhil@cs.cmu.edu. This work was supported in part by NSF ITR grant CCR-0122581.

Abstract

We consider the problem of minimizing the total flow time on multiple machines with preemption, where the flow time of a job is the time that elapses from its arrival until it finishes. Our main result is a quasi-polynomial time approximation scheme for a constant number of machines (m). The result also extends to total weighted flow time when either the job weights or the job sizes are polynomially bounded in the number of jobs (n). We also show that the dependence on m cannot be substantially improved. In particular, obtaining an O(1) approximation for the weighted case (even when all weights and sizes are polynomially bounded in n) by an algorithm with running time n^{polylog(n,m)} would imply that NP ⊆ DTIME(n^{polylog(n)}).

1 Introduction

Flow time, defined as the time a user has to wait for his request to be completed, i.e. the time elapsed from when the request is submitted until it finishes, is one of the most relevant measures of the quality of service received. A basic question then is to minimize the total flow time, given a collection of n jobs (requests) with arbitrary arrival times and arbitrary processing requirements.

It is well known that the Shortest Remaining Processing Time (SRPT) algorithm is optimal for m = 1, and that the problem is NP-hard for m ≥ 2 machines. The first algorithm for multiple machines was an O(log n)-approximation [3]. The algorithm of [3] involves migration of jobs (i.e. a job can be interrupted on one machine and moved to another). An O(log n) approximation algorithm without migration is also known [1]. While these algorithms are online and work for an arbitrary number of machines, no algorithm with a guarantee of o(log n) is known even for the case of m = 2 machines. Moreover, it is not known whether the problem is APX-hard. While elegant approximation schemes are known for the related (but strictly easier) problems of minimizing the total completion time on multiple machines and minimizing makespan, not many results are known for flow time. One difficulty in obtaining efficient approximation schemes is that flow time is very sensitive to small changes in the schedule. For example, the usual techniques of rounding the job sizes or release dates to powers of 1 + ε do not work.

The first recent breakthrough in this direction was an elegant QPTAS for minimizing the weighted flow time on a single machine [2]. However, their results do not carry over to m > 1. The main difficulty is that if we consider a particular machine, an arbitrary subset of jobs could be assigned to be processed on it. As we need to keep track of the state (remaining processing requirements, number of jobs, etc.) in sufficient detail, it is not clear how to encode this information using few bits. Our main contribution is a technique which allows us to store the approximate state under SRPT for any subset of jobs using O(log^2 n) bits of information. The technique extends to minimizing the total weighted flow time in the special case when either the job weights or the job sizes are polynomially bounded in n. Our results suggest that a PTAS is likely to exist for these problems. Finally, while the exponential dependence on m seems excessive, we show that it is unlikely that it can be substantially improved for the weighted case.

2 Total Flow Time

Let I be a problem instance with largest job size P. Without loss of generality we can assume that all release dates are at most nP and that all jobs finish execution by time 2nP. Next, the release dates and sizes can be rounded without losing too much (proof deferred).

Lemma 2.1. Given I, rounding up the job sizes and release dates to a multiple of εP/n^2 increases the optimal cost by a factor of at most (1 + 2ε).

Let I' denote this rounded instance. By Lemma 2.1 we can assume that all job sizes are integers in the range [0, n^2/ε], and that all events (release dates, job departures and preemptions) occur at times that are integers in [0, 2n^3/ε]. Moreover, it easily follows that a schedule for I' gives a schedule for I that is only possibly better. We now describe how to compute an approximately optimal schedule (i.e. up to a factor of 1 + O(ε)) for I'. We say that a job with size p_j lies in class i iff (1 + ε)^{i-1} ≤ p_j < (1 + ε)^i. This divides the jobs into O((1/ε) log n) classes. Note that the class of a job depends only on its initial processing time.
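As a small illustration of this preprocessing step (not part of the paper), the Python sketch below rounds sizes and release dates as in Lemma 2.1 and computes the class index of a job. The job representation, the function names and the fixed value of ε are assumptions made only for this sketch.

```python
import math

EPS = 0.1  # the accuracy parameter epsilon; fixed here only for illustration


def round_instance(jobs, n):
    """Round release dates and sizes up to multiples of eps*P/n^2 (as in Lemma 2.1).

    `jobs` is assumed to be a list of (release_date, size) pairs; this
    representation is a choice made for the sketch, not taken from the paper.
    """
    P = max(size for _, size in jobs)          # largest job size
    unit = EPS * P / (n * n)                   # rounding granularity eps*P/n^2
    rounded = []
    for release, size in jobs:
        r = math.ceil(release / unit) * unit   # round the release date up
        p = math.ceil(size / unit) * unit      # round the size up
        rounded.append((r, p))
    return rounded, unit


def job_class(size, unit):
    """Class i of a job: (1+eps)^(i-1) <= p < (1+eps)^i, with p the integral
    size measured in rounding units."""
    p = max(1, round(size / unit))
    return math.floor(math.log(p, 1 + EPS)) + 1
```

After this rounding, a class depends only on the job's initial (integral) size, which is what allows the per-class bookkeeping described next.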
Consider the optimum algorithm. Let S_k(i, t) denote the jobs of class i on machine k which are alive at time t. Since each machine in the optimum algorithm can be assumed to process the jobs assigned to it in SRPT order, we have that (proof deferred):

Lemma 2.2. For all k = 1, ..., m and at all times t, at most one job in S_k(i, t) has a remaining processing time that is not between (1 + ε)^{i-1} and (1 + ε)^i.

We now define the state of the algorithm Q(t) at time t. For each class i and each machine j, we store 1/ε + 1 numbers: the first 1/ε are the remaining processing times of the 1/ε jobs in S_j(i, t) with the largest remaining processing times; the last entry is the sum of the remaining processing times of the rest of the jobs in S_j(i, t). Notice that as each job has size at most n^2/ε, there are at most (n^2/ε)^{1/ε} = O(n^{2/ε}) possible choices for the first 1/ε entries, and n^3/ε possible choices for the sum of the remaining sizes. Finally, since there are at most O(log n/ε) classes and m machines, the total number of distinct states at any step is at most n^{O(m log n/ε^2)}.

The following lemma shows how this information helps us in estimating the number of unfinished jobs at any time. As the total flow time is the sum over time of the number of unfinished jobs, this will allow us to estimate the total flow time.

Lemma 2.3. We can estimate the number of jobs at time t within a factor of (1 + 2ε) using Q(t).

Proof. If there are fewer than 1/ε jobs in level i, we know their number precisely. Suppose there are more than 1/ε jobs in some level i. Clearly, if the remaining processing times of all these jobs were between (1 + ε)^{i-1} and (1 + ε)^i, then using the total remaining processing time we could estimate the correct number up to a factor of 1 + ε. Now, as at most one job in every level i has remaining time outside the range (1 + ε)^{i-1} to (1 + ε)^i (by Lemma 2.2), our estimate could be off by another job. As there are at least 1/ε unfinished jobs, we get an estimate within a factor of 1 + 2ε.
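To make the state Q(t) concrete, here is a hedged sketch (again not from the paper) of the bookkeeping for a single (machine, class) pair: the 1/ε largest remaining processing times plus the tail sum, together with a count estimate in the spirit of Lemma 2.3. The class `ClassState`, its method names, and the choice of ε = 0.1 are all assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List

K = 10  # 1/eps; with eps = 0.1 we track the 10 largest remaining times per class


@dataclass
class ClassState:
    """Approximate state of one (machine, class) pair, mirroring the definition of
    Q(t): the K = 1/eps largest remaining processing times, plus the total
    remaining processing time of all other alive jobs of that class."""
    top: List[int] = field(default_factory=lambda: [0] * K)  # kept sorted, largest first
    tail_sum: int = 0

    def add_job(self, size: int) -> None:
        # A newly arriving job either enters the top-K list (pushing the smallest
        # entry into the tail sum) or is accounted for only in the tail sum.
        smallest = min(self.top)
        if size > smallest:
            self.top[self.top.index(smallest)] = size
            self.tail_sum += smallest
        else:
            self.tail_sum += size
        self.top.sort(reverse=True)

    def estimate_count(self, i: int, eps: float) -> float:
        """Estimate of |S_j(i, t)| in the spirit of Lemma 2.3: count the nonzero
        top entries exactly, and divide the tail sum by a representative size of
        class i (any value in [(1+eps)^(i-1), (1+eps)^i) is accurate up to 1+eps)."""
        exact = sum(1 for p in self.top if p > 0)
        representative = (1 + eps) ** (i - 1)
        return exact + self.tail_sum / representative
```

The dynamic program described below guesses how such states evolve; the sketch only illustrates why a state of this size suffices to recover the job count approximately.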
Thus, the algorithm to compute the approximately optimal schedule is a dynamic program which has a state for each time t and each possible configuration Q(t). Hence there are at most O(n^3/ε) × n^{O(m log n/ε^2)} states. With each state we store the value of the least total flow time incurred to reach that state.

We now show how to update Q(t) with time. When a new job arrives, we have m choices for the machine to which it can be assigned. Once a machine is decided, the size of the job determines its class. Now, either the job is added to the first 1/ε jobs in S_j(i, t), or else, if it is smaller than the 1/ε largest jobs in its class, its size is added to the (1/ε + 1)-th entry. When there are no arrivals, at any time t the algorithm works on some class s(j) on each machine j. We do not know s(j), but we can try out all the (O(log n/ε))^m choices of a class s(j) for each machine j. For a fixed choice of s(j), we either decrement the (1/ε + 1)-th entry of the corresponding state by 1, or, if there are no more than 1/ε jobs, the smallest nonzero entry is decremented by 1. Thus,

Theorem 2.1. The above algorithm gives a (1 + ε) approximation for minimizing total flow time on m identical machines and runs in time n^{O(m log n/ε^2)}.

Similarly, we can give a (1 + ε) approximation algorithm with running time n^{O(m log^2 n/ε^2)} (proof deferred to the full version) for minimizing total weighted flow time on m machines, where either the job weights or the job sizes are polynomially bounded in n.

3 Dependence on the number of machines

Consider an instance of 3-Partition: given a set A of 3m elements and an integer B > 0, with an integer size s(x) for each x ∈ A such that B/4 < s(x) < B/2 and Σ_{x∈A} s(x) = mB, the question is whether A can be partitioned into m disjoint sets A_1, A_2, ..., A_m such that Σ_{a∈A_i} s(a) = B for 1 ≤ i ≤ m. 3-Partition is known to be strongly NP-complete (in particular, even when B = O(m^4)).

We transform this instance of 3-Partition as follows. There are m machines. Each element x ∈ A corresponds to a job with size s(x) and weight m, released at time t = 0. Next, at each time instant t = B + i/m^2, for i = 0, 1, 2, ..., Bm^5, m jobs each with size 1/m^2 and weight 1/m are released. It can be verified that if a solution to the 3-Partition instance exists then the total weighted flow time is O(Bm^3), and it is at least Ω(Bm^4) otherwise. As the weights, processing times and the number of jobs are poly(m), we have that:

Theorem 3.1. An O(1) approximation with running time 2^{polylog(n,m)} for weighted flow time (even with weights and sizes m^{O(1)}) implies that NP ⊆ DTIME(n^{polylog(n)}).

References

[1] B. Awerbuch, Y. Azar, S. Leonardi, and O. Regev, Minimizing the flow time without migration, in ACM Symposium on Theory of Computing (STOC), 1999, pp. 198–205.

[2] C. Chekuri and S. Khanna, Approximation schemes for preemptive weighted flow time, in ACM Symposium on Theory of Computing (STOC), 2002, pp. 297–305.

[3] S. Leonardi and D. Raz, Approximating total flow time on parallel machines, in ACM Symposium on Theory of Computing (STOC), 1997, pp. 110–119.