SRPT Optimally Utilizes Faster Machines to Minimize Flow Time

Jason McCullough* and Eric Torng†
Department of Computer Science and Engineering
Michigan State University
East Lansing, MI 48824
torng@msu.edu

* Supported in part by NSF grant CCR-9701679, an MSU professorial assistantship, and an MSU summer research internship.
† Supported in part by NSF grants CCR-9701679 and CCR-0105283.

Abstract

We analyze the shortest remaining processing time (SRPT) algorithm with respect to the problem of scheduling n jobs with release times on m identical machines to minimize total flow time. It is known that SRPT is optimal if m = 1 but that SRPT has a worst-case approximation ratio of Θ(min(log n/m, log Δ)) for this problem, where Δ is the ratio of the length of the longest job to the length of the shortest job. It has previously been shown that SRPT is able to use faster machines to produce a schedule as good as an optimal algorithm using slower machines. We now show that SRPT optimally uses these faster machines with respect to the worst-case approximation ratio. That is, if SRPT is given machines that are s ≥ 2 − 1/m times as fast as those used by an optimal algorithm, SRPT's flow time is at least s times smaller than the flow time incurred by the optimal algorithm. Clearly no algorithm can offer a better worst-case guarantee, and we show that existing algorithms with performance guarantees similar to SRPT's without resource augmentation do not optimally use extra resources.

1 Introduction

In this paper, we consider the problem of scheduling m identical machines to minimize total flow time. In more detail, we are given m identical machines and an input instance I, which is a collection of n independent jobs {J_1, J_2, ..., J_n}. Each job J_j has a release date r_j and a size or length p_j. Note that size is commonly referred to as processing time, but since we will consider machines that run at different speeds, length or size is more appropriate. A job can run on only one machine at a time, and a machine can process only one job at a time. We denote the completion time of job J_j in schedule S by C_j^S and drop the superscript when the schedule is understood from context. The flow time of a job in schedule S is F_j^S = C_j^S − r_j. The idle time or delay of a job in schedule S is D_j^S = F_j^S − p_j = C_j^S − r_j − p_j. The total flow time of schedule S is ∑_j F_j^S. We consider a preemptive and migratory scheduling model where a job may be interrupted and subsequently resumed on any machine with no penalty. In the classic notation of Lenstra et al., this is the P | pmtn | ∑ F_j problem. Instead of focusing on total flow time, we could equivalently consider the minimization of average flow time, (1/n) ∑_j F_j. Furthermore, total flow time is minimized exactly when we minimize total completion time, ∑_j C_j^S, or total idle time, ∑_j D_j^S.

We focus our attention on the Shortest Remaining Processing Time (SRPT) algorithm that, at any time, schedules the m jobs with shortest remaining length (processing time), breaking ties arbitrarily. SRPT is an online scheduling algorithm. An online scheduling algorithm must construct the schedule up to time t without any prior knowledge of jobs that become available at time t or later. When a job arrives, however, we assume that all other relevant information about the job is known. In contrast, an offline scheduling algorithm has full knowledge of the input instance when constructing a schedule. When the number of machines m = 1, SRPT is known to be an optimal algorithm for this problem.
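For readers who want to experiment with these definitions, the following is a minimal sketch of a preemptive, migratory SRPT simulator. It is our illustration, not code from the paper; the function name and representation are our own, and the speed parameter s models the resource augmentation discussed below.

```python
def srpt_total_flow_time(jobs, m, s=1.0):
    """Simulate preemptive, migratory SRPT on m speed-s machines.
    jobs: list of (release_time, length) pairs.
    Returns the total flow time sum_j (C_j - r_j)."""
    jobs = sorted(jobs)                              # by release time
    release = {j: r for j, (r, p) in enumerate(jobs)}
    remaining, completion = {}, {}
    t, next_job, n = 0.0, 0, len(jobs)
    while len(completion) < n:
        # Admit all jobs released by time t.
        while next_job < n and jobs[next_job][0] <= t:
            remaining[next_job] = jobs[next_job][1]
            next_job += 1
        # SRPT runs the m jobs with shortest remaining length.
        active = sorted((p, j) for j, p in remaining.items())[:m]
        # Next decision point: a new release or the earliest completion.
        t_next = jobs[next_job][0] if next_job < n else float("inf")
        if active:
            t_next = min(t_next, t + active[0][0] / s)
        for p, j in active:
            remaining[j] -= (t_next - t) * s
            if remaining[j] <= 1e-9:
                completion[j] = t_next
                del remaining[j]
        t = t_next
    return sum(completion[j] - release[j] for j in completion)

# Example: jobs of length 1, 1, 2 released at time 0 on two machines.
print(srpt_total_flow_time([(0, 1), (0, 1), (0, 2)], m=2))   # 5.0
```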
When m ≥ 2, this problem is known to be NP-complete. Leonardi and Raz showed that the SRPT algorithm has a worst-case approximation ratio of Θ(min(log n/m, log Δ)) for this problem, where Δ is the ratio of the length of the longest job to the length of the shortest job [10]. They also showed that Θ(min(log n/m, log Δ)) is the best possible approximation ratio for any deterministic or randomized online algorithm. A simpler analysis of SRPT is given in [9]. No offline algorithm with a superior approximation ratio is known for this problem.

Kalyanasundaram and Pruhs popularized the usage of resource augmentation as a method for analyzing online algorithms, in particular online scheduling algorithms [8]. Using this technique, we compare the performance of an online algorithm to an offline algorithm when the online algorithm is given extra resources in the form of faster machines or extra machines. In this paper, we ignore extra machines and consider only faster machines as well as stretched input instances, a concept related to faster machines introduced in Phillips et al. [11]. A job with length p_j takes p_j/s time to complete if run on a speed-s machine. We use the terminology introduced by Phillips et al. Let I be an instance of an m-machine minimization scheduling problem with optimal solution value V. An s-speed c-competitive algorithm finds a solution of value at most cV using m speed-s machines. For any input instance I, define I^s to be the s-stretched input instance where job J_j has release time s·r_j instead of r_j. An s-stretch c-competitive algorithm finds a solution to I^s of value at most cV using m speed-1 machines. The relationship between faster machines and stretched input instances is captured by the following speed-stretch theorem from Phillips et al.

Theorem 1.1. Any s-speed c-competitive algorithm is also an s-stretch sc-competitive algorithm when working with the flow time metric.

Note, the reverse holds as well. That is, any s-stretch c-competitive algorithm is also an s-speed c/s-competitive algorithm.
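The scaling argument behind Theorem 1.1 is short; the following derivation is our paraphrase of it, writing F for total flow time, A_s for algorithm A run on speed-s machines, and V for the optimal flow time of I on speed-1 machines.

```latex
% Our paraphrase of the scaling argument behind Theorem 1.1.
% A speed-s schedule for I that touches job J_j at time t corresponds to
% a speed-1 schedule for I^s touching J_j at time st, and vice versa.
% If A is s-speed c-competitive, the induced schedule for I^s satisfies
\begin{align*}
F\bigl(A(I^s)\bigr)
  = \sum_j \bigl(s\,C_j - s\,r_j\bigr)
  = s \sum_j \bigl(C_j - r_j\bigr)
  = s\,F\bigl(A_s(I)\bigr)
  \le s\,c\,V,
\end{align*}
% where C_j is the completion time of J_j under A_s on I. Hence A is
% s-stretch (sc)-competitive; reading the chain in reverse shows that an
% s-stretch c-competitive algorithm is s-speed (c/s)-competitive.
```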
1.1 Our contributions and related work

Our primary result is that SRPT optimally uses faster machines. That is, if SRPT is given speed-s machines where s ≥ 2 − 1/m, then SRPT incurs a total flow time that is at least s times smaller than that incurred by the optimal algorithm using speed-1 machines. More formally, SRPT is an s-speed 1/s-competitive algorithm for minimizing total flow time when s ≥ 2 − 1/m. No algorithm can use faster machines to get a better worst-case guarantee, as can be seen by considering an input instance consisting of a single job. This improves upon the result of Phillips et al., who proved that SRPT is an s-speed 1-competitive algorithm for this problem when s ≥ 2 − 1/m [11]. In contrast, we also show that existing algorithms with performance guarantees similar to SRPT's without extra resources are not s-speed 1/s-competitive algorithms for any s. This includes the non-migratory algorithms developed by Awerbuch, Azar, Leonardi, and Regev [3], Chekuri, Khanna, and Zhu [5], and Avrahami and Azar [2]. This offers some evidence in favor of SRPT for this problem. In addition, we hope that several of the concepts and techniques used in this paper, including profiles, SRPT charging, and stretched input instances, may be helpful in proving other results concerning flow time and weighted flow time. Note that Anderson and Potts [1] used the concept of a double problem to help prove a result regarding minimizing total completion time in a nonpreemptive uniprocessor environment. In their double problem, they multiply not only release times but also processing times by a factor of 2 to create a related input instance.

Resource augmentation has been used to study the problem of minimizing flow time in a nonclairvoyant uniprocessor environment where the algorithm has no knowledge of p_j until job J_j completes [8, 6, 7]. Edmonds also shows that the Round Robin algorithm is an s-speed O(1)-competitive algorithm for the parallel machine problem for s ≥ 2, but RR is not s-speed 1-competitive for s < 4 and is at best s-speed 2/s-competitive for s ≥ 4 for a more general problem setting [7]. More recently, Chekuri, Khanna, and Kumar have shown that the algorithm of Avrahami and Azar is a (1 + ε)-speed O(1/ε)-competitive algorithm for this problem (and others) [4]. This is an important result as it shows that with modest resource augmentation, a constant competitive ratio is achievable. However, as noted earlier, we can show that this algorithm does not optimally use extra resources as SRPT does. It is likely that SRPT is also a (1 + ε)-speed O(1/ε)-competitive algorithm for minimizing total flow time, but its analysis is likely to be more complex.

The rest of this paper is organized as follows. In Section 2, we show that several algorithms are not s-speed 1/s-competitive algorithms for any s > 1. In Section 3, we give an intuitive overview of the proof. In Section 4, we introduce some building blocks for the proof such as profiles, partial schedules, canonical schedules, and an idle time accounting scheme we name SRPT charging. In Section 5, we introduce, for analysis purposes only, an algorithm we name Relaxed SRPT (RSRPT). We then show that RSRPT incurs no more idle time than the optimal algorithm and that SRPT on a stretched input instance incurs no more idle time than RSRPT on the original input instance. We conclude with a brief discussion of open problems.

2 Bounds on other algorithms

The key idea in algorithms that eliminate migration is the idea of classifying jobs by size [3, 5, 2]. In [3], jobs are classified as follows: a job whose remaining processing time at time t is in [2^k, 2^{k+1}) is in class k, for −∞ < k < ∞, at time t. Note that jobs change classes as they execute. In [5] and [2], a job whose initial processing time is in [2^k, 2^{k+1}) is in class k, for −∞ < k < ∞, for all times after its release up until its completion. Note, 2 is used here to determine classes, but 2 could be replaced by c for any constant c > 1. In [5], they optimize their algorithm by identifying the best possible constant c.

The algorithms of [3, 5] use the following data structures to organize available jobs at any time. Jobs not yet assigned to any machine are stored in a central pool. Jobs assigned to machine k are stored on a stack for machine k. The algorithms of [3, 5] schedule jobs as follows. Each machine processes the job at the top of its stack.
When a new job arrives, if there is any machine k that is idle or currently processing a job of a higher class than the new job, the new job is pushed onto machine k's stack and machine k begins processing the new job. If multiple machines satisfy the above criteria, the job is assigned to any one of them. Otherwise, the job enters the central pool. When a job is completed on any machine k, machine k compares the class of the job on top of its stack (if such a job exists) with the minimum class of any job in the central pool. If the minimum in the pool is smaller than the class of the job on top of its stack, then any job in the pool of that minimum class is pushed onto machine k's stack. Machine k then begins processing the job on top of its stack. In [5], they also define an algorithm where migration is allowed so that when a job completes on machine k, machine k looks for the smallest class job on other machines' stacks in addition to the central pool.

We first show that the algorithms of [3, 5] cannot be s-speed 1/s-competitive algorithms for this problem. We use A to denote any implementation of the non-migratory algorithms of [3, 5]; a schematic of A's dispatching rules appears below.
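The following is our schematic reconstruction of those dispatching rules, simplified in two ways: jobs are represented by their lengths only, and classes are computed from that fixed length as in [5, 2] (the algorithm of [3] classifies by remaining length, which this sketch does not track). All names are ours.

```python
import math

def job_class(length):
    # Class k means length in [2^k, 2^(k+1)); any base c > 1 would do.
    return math.floor(math.log2(length))

class Machine:
    def __init__(self):
        self.stack = []          # top of stack = currently running job

def on_arrival(job, machines, pool):
    """New job: push onto an idle machine or one running a higher class."""
    for mach in machines:
        if not mach.stack or job_class(mach.stack[-1]) > job_class(job):
            mach.stack.append(job)      # machine switches to the new job
            return
    pool.append(job)                    # otherwise park it in the pool

def on_completion(mach, pool):
    """Machine finished its top job: maybe pull a smaller-class pool job."""
    mach.stack.pop()
    top_class = job_class(mach.stack[-1]) if mach.stack else math.inf
    if pool and min(job_class(j) for j in pool) < top_class:
        best = min(pool, key=job_class)
        pool.remove(best)
        mach.stack.append(best)         # run it ahead of the old stack top
```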
Theorem 2.1. As m → ∞, A is at best an s-speed ((3 + √13)/(1 + √13))·(1/s)-competitive algorithm for any speed s ≥ 1 and for any constant c ≥ 1 used to define the classes of jobs in A.

We prove this by considering two different example input instances, one that is more effective for small c, and one that is more effective for large c.

Lemma 2.1. A is at best an s-speed ((2c + 1)/(c + 2))·(1/s)-competitive algorithm for any c ≥ 1 and any s ≥ 1.

Proof. Consider the following input instance. Suppose m jobs of size c − ε where ε > 0 arrive at time 0, and m jobs of size 1 arrive at time δ > 0. All 2m jobs belong to the same class 0 at time δ since their initial and remaining sizes at time δ lie in the range [c^0, c^1). A will process the jobs of size c − ε first on each machine before processing the jobs of size 1 on each machine, resulting in a total flow time of (m/s)(2c − 2ε + 1). The optimal strategy is to preempt the jobs of size c − ε for the jobs of size 1, resulting in a total flow time of m(c − ε + 2). Since ε and δ can be made arbitrarily small, the result follows. This result actually holds for migratory versions of the above algorithms as well.

Lemma 2.2. A is at best an s-speed (c/(c − 1) − m/(c^m − 1))·(1/s)-competitive algorithm for any c > 1, m ≥ 2, and s ≥ 1. This asymptotically approaches (c/(c − 1))·(1/s) as m → ∞.

Proof. Consider the following input instance. We release m − 1 jobs of size ε at time 0. A will assign each of these jobs to its own machine. Then, we release a sequence of m jobs at unique times in the interval (0, δ). The jobs released are of size c^k for 0 ≤ k ≤ m − 1, and the jobs are released in decreasing order of size. A will assign each of these jobs to the machine that did not schedule any job of size ε. Thus, A will incur a total flow time of (1/s) ∑_{k=0}^{m−1} (m − k)c^k + (m − 1)ε/s = (1/s)[(c^{m+1} − c)/(c − 1)^2 − m/(c − 1) + (m − 1)ε]. On the other hand, the m larger jobs could be assigned to individual machines for a total flow time of ∑_{k=0}^{m−1} c^k + 2(m − 1)ε = (c^m − 1)/(c − 1) + 2(m − 1)ε. As ε can be made arbitrarily small, the result follows.

The proof of Theorem 2.1 is completed by finding the value of c where c/(c − 1) = (2c + 1)/(c + 2); this occurs when c = (3 + √13)/2. The lower bound on the competitive ratio then evaluates to (3 + √13)/(1 + √13) ≈ 1.43. Similar arguments can also be developed to show that the algorithms of [2] and migratory versions of [3, 5] are not s-speed 1/s-competitive.
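Numerically, the two bounds trade off exactly as the proof of Theorem 2.1 describes; this small check (ours) evaluates both ratio functions at the crossover point c* = (3 + √13)/2.

```python
from math import sqrt

def ratio_small_c(c):   # Lemma 2.1 bound at s = 1
    return (2 * c + 1) / (c + 2)

def ratio_large_c(c):   # Lemma 2.2 bound at s = 1, as m -> infinity
    return c / (c - 1)

c_star = (3 + sqrt(13)) / 2
print(c_star)                                 # ~3.3028
print(ratio_small_c(c_star))                  # ~1.4343
print(ratio_large_c(c_star))                  # ~1.4343
print((3 + sqrt(13)) / (1 + sqrt(13)))        # ~1.4343, the Theorem 2.1 bound
```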
3 Proof overview

3.1 Bad example for SRPT

Before we describe the proof, it is helpful to review an example input instance for two machines that causes problems for SRPT without resource augmentation. Suppose 3 jobs are released at time 0 with lengths 1, 1, and 2. An optimal offline algorithm (Opt) will execute the job of length 2 on one machine from time 0 to time 2 while executing one of the jobs of size 1 on the second machine from time 0 to time 1 and the other job of size 1 on the second machine from time 1 to time 2. Thus, all jobs released at time 0 are completed by time 2. SRPT, on the other hand, will schedule the two jobs of size 1 on the two machines from time 0 to time 1 and the job of size 2 on either machine from time 1 to time 2. Thus, at time 2, SRPT has not "kept up with" Opt; in particular, SRPT completes less work in interval [0, 2] than Opt does. (We will define a formal notion of "keeping up" in the building blocks section.) This pattern can now be repeated with new jobs released at time 2 and so on to cause SRPT to have an arbitrarily larger flow time than Opt.

3.2 Proof outline

SRPT with resource augmentation overcomes this issue in the following manner. The first step in our proof is to focus on SRPT with stretched input instances rather than faster machines. Based on the speed-stretch theorem cited earlier, we can prove our result if we show that SRPT is an s-stretch 1-competitive algorithm for s ≥ 2 − 1/m.

The next step is to break the input instance into intervals as defined by the release times of input I. Let I_i be the interval defined by the ith and (i+1)st release times. We can show that SRPT at the end of stretched interval I_i^s is at least "keeping up with" Opt at the end of interval I_i. If we could show that the flow time incurred by SRPT on the stretched interval I_i^s is at most the flow time incurred by Opt on the interval I_i, we would be done. However, this often is not true because the stretched interval I_i^s is s times longer than interval I_i.

Instead, we define, for analysis purposes only, a Relaxed SRPT (RSRPT) algorithm. We then show that RSRPT "keeps up with" Opt at the end of each interval I_i and that the flow time incurred by RSRPT on each interval I_i is at most the flow time incurred by Opt on I_i. The key idea behind RSRPT is that it can produce illegal schedules where some machines may run for more than the total time in an interval. This will be compensated for by some machines running for less total time in the interval. To illustrate, consider the bad example for SRPT cited earlier. The RSRPT "schedule" for the interval from time 0 to time 2 will be jagged. Machine one will run for three units of time, processing one of the jobs of size 1 from time 0 to time 1 and the job of size 2 from time 1 to time 3. The other machine will run for only one unit of time, processing the other job of size 1 from time 0 to time 1. It may be possible in extreme cases for RSRPT to allocate all of its processing in some interval to one machine and one job.

Given that RSRPT may produce illegal jagged schedules for a given interval of time, we need to define a way to compute a flow time cost for RSRPT during this interval. We develop a charging scheme that we name SRPT charging that computes the idle time incurred by each job in the jagged schedule assuming we processed all jobs to completion using SRPT and that no new jobs arrived. For example, the job of size 1 on machine one would be charged an idle time of 1 since the job of size 2 was idle for one unit of time waiting for the first job to finish. The other job of size 1 would be charged an idle time of 0 since no jobs would be scheduled after it on its machine. Likewise, the job of size 2 is charged an idle time of 0. To estimate Opt's flow time (or idle time) for each interval I_i, we develop a notion of a canonical schedule for each such interval and argue that Opt's idle time cost cannot be smaller than the idle time cost of the canonical schedule as computed by SRPT charging. Putting this together, we conclude that the total flow time of RSRPT(I) is no larger than the total flow time of Opt(I).

We now need to show that the total flow time incurred by RSRPT for I is at least the total flow time incurred by SRPT for I^s. To prove this, we define a notion of containment and then ensure that at each release time, SRPT(I^s) is always contained within RSRPT(I). Essentially, this containment property means that the number of jobs with remaining length larger than the remaining length of a given job J_j in RSRPT(I) at time t is at least as large as the number of jobs with remaining length larger than the remaining length of the equivalent job in SRPT(I^s) at time st. This containment property then allows us to argue on a job by job basis that the idle time incurred by each job in RSRPT(I) is at least as large as the idle time incurred by the equivalent job in SRPT(I^s) when we apply SRPT charging to both schedules. Thus, by transitivity, we are able to conclude that the idle time, and thus total flow time, of SRPT(I^s) is no larger than the idle time and total flow time of Opt(I). We now proceed to a more detailed description of this proof.

4 Building blocks

4.1 Interval and partial schedule notation

For any input instance I, let r(I) denote the number of distinct release times of jobs in I, and let r_i(I) denote the ith release time in I for 1 ≤ i ≤ r(I). When the input instance I is unambiguous, we use the notation r_i for r_i(I). We use these release times to define time intervals as follows. Time interval I_i is defined to be [r_i(I), r_{i+1}(I)) for 1 ≤ i ≤ r(I) − 1. Time interval I_{r(I)} is defined to be [r_{r(I)}, ∞). We use I_{i−} to denote the time interval [r_1(I), r_i(I)) for 1 ≤ i ≤ r(I). For 1 ≤ i ≤ r(I) − 1, we use |I_i| to denote the length of interval I_i. For any schedule S(I) and 1 ≤ i ≤ r(I), we define S(I_i) and S(I_{i−}) to be the partial schedules of S(I) for intervals I_i and I_{i−}, respectively.

For example, consider the input instance on two machines where two jobs of size 1 and one job of size 2 are released at time 0 and two jobs of size 3 are released at time 2. Then r(I) = 2, r_1(I) = 0, r_2(I) = 2, I_1 = [0, 2), I_{1−} = [0, 0) which is empty, I_2 = [2, ∞), and I_{2−} = I_1 = [0, 2). The partial schedule SRPT(I_1) (which is also partial schedule SRPT(I_{2−})) is to run the two jobs of size 1 from time 0 to time 1 and the job of size 2 from time 1 to time 2 on one of the machines. The partial schedule SRPT(I_2) is to run the job of size 2 from time 2 to time 3, one of the jobs of size 3 on the same machine from time 3 to time 6, and the other job of size 3 on the other machine from time 2 to time 5.
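This bookkeeping is straightforward to mechanize; the following helper (ours, with hypothetical names) computes the intervals I_i and prefix intervals I_{i−} from a list of release times.

```python
def intervals(release_times):
    """Distinct release times r_1 < ... < r_{r(I)} define the intervals
    I_i = [r_i, r_{i+1}) (with I_{r(I)} = [r_{r(I)}, inf)) and the
    prefix intervals I_{i-} = [r_1, r_i)."""
    r = sorted(set(release_times))
    full = [(r[i], r[i + 1]) for i in range(len(r) - 1)]
    full.append((r[-1], float("inf")))
    prefixes = [(r[0], r[i]) for i in range(len(r))]  # empty when i = 1
    return full, prefixes

# The example above: releases at 0, 0, 0, 2, 2 give
# I_1 = [0, 2), I_2 = [2, inf), I_{1-} = [0, 0), I_{2-} = [0, 2).
print(intervals([0, 0, 0, 2, 2]))
```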
4.2 Profiles

We will compare different schedules for an input instance I at the r(I) different release times of I. To facilitate this comparison, we introduce the notion of a profile of a schedule and the notions of one profile being smaller than another and of one profile being contained in another. These are the formalizations of a schedule "keeping up with" another schedule.

Definition 4.1. Let I be any input instance, and let S(I) be a legal schedule for I. We define the profile of schedule S(I) at release time r_i for 1 ≤ i ≤ r(I), denoted S(I, i), to be the non-decreasing vector of remaining lengths of all jobs that were released up to but not including time r_i. We use S[I, i] to denote the profile that includes jobs released at time r_i. We define |S(I, i)| and |S[I, i]| to be the number of elements in profiles S(I, i) and S[I, i], respectively.

Definition 4.2. We define S(I, i)[j] and S[I, i][j] to be the jobs with the jth smallest remaining length in profiles S(I, i) and S[I, i], respectively. If two jobs tie for the jth smallest remaining length, S(I, i)[j] (or S[I, i][j]) is the one that received more processing time in partial schedule S(I_{i−1}). If there is still a tie in the amount of processing time received in S(I_{i−1}), ties are broken arbitrarily. We overload notation and also use S(I, i)[j] and S[I, i][j] to denote that job's remaining length at release time r_i.

We can define profiles at any time, not just release times, but we will only use profiles at release times in this paper. When working with general profiles independent of a specific schedule or input instance, we will use the notation P or P_i to denote a profile and the notation |P| or |P_i| to denote that profile's number of elements.

Definition 4.3. We say that a profile P_1 is smaller than another profile P_2 if the following conditions hold:

1. |P_1| ≤ |P_2|.
2. For 1 ≤ i ≤ |P_1|, the sum of the first i elements of profile P_1 is no larger than the sum of the first i elements of profile P_2.

We denote this by writing P_1 ⪯ P_2.

Definition 4.4. We say that a profile P_1 is contained in another profile P_2 if the following conditions hold:

1. |P_1| ≤ |P_2|.
2. For 1 ≤ i ≤ |P_1|, the ith element of profile P_1 is no larger than the ith element of profile P_2.

We denote this by writing P_1 ⊑ P_2.

Concepts similar to a profile have been used in many other papers analyzing SRPT and other algorithms for minimizing total flow time and other objective functions. One key point about our definition of profiles is that we include the jobs with remaining length 0 in the vector. For example, consider a uniprocessor environment and an input instance I where jobs of size 1, 1, and 2 are released at time 0 and a job of size 1 is released at time 2. Suppose schedule S(I) schedules the job of size 2 on the single machine from time 0 to time 2. Then SRPT(I, 2) = ⟨0, 0, 2⟩ ⪯ S(I, 2) = ⟨0, 1, 1⟩. However, it is not true that SRPT(I, 2) ⊑ S(I, 2). Furthermore, SRPT(I, 2)[2] = 0 while S(I, 2)[2] = 1.

We will sometimes use a profile P as an input instance to this problem with a single release time. Specifically, profile P is an input instance with |P| jobs, the ith smallest job of the instance has the size of the ith smallest element of P, and all jobs are assumed to be released simultaneously. Sometimes, for convenience, we will assume that the jobs of size 0 do not exist and remove them from the instance. Typically, though, this is not necessary.
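To make Definitions 4.3 and 4.4 concrete, the following sketch (ours) implements both relations; the size condition |P_1| ≤ |P_2| is our reading of the definitions, chosen to be consistent with the comparison conditions ranging over the elements of P_1.

```python
from itertools import accumulate

def smaller(p1, p2):
    """P1 'smaller than' P2 (Definition 4.3): |P1| <= |P2| and every
    prefix sum of P1 is at most the corresponding prefix sum of P2."""
    return len(p1) <= len(p2) and all(
        a <= b for a, b in zip(accumulate(p1), accumulate(p2)))

def contained(p1, p2):
    """P1 'contained in' P2 (Definition 4.4): |P1| <= |P2| and P1 is
    at most P2 elementwise. Profiles are non-decreasing, zeros kept."""
    return len(p1) <= len(p2) and all(a <= b for a, b in zip(p1, p2))

# The uniprocessor example: SRPT(I,2) = [0,0,2] and S(I,2) = [0,1,1].
print(smaller([0, 0, 2], [0, 1, 1]))     # True: prefix sums 0,0,2 vs 0,1,2
print(contained([0, 0, 2], [0, 1, 1]))   # False: 2 > 1 in the last slot
```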
We now list a few easy to prove observations about profiles.

Fact 4.1. Let P_1 and P_2 be two arbitrary profiles such that P_1 ⪯ P_2. Then the sum of all remaining lengths in P_1 is no larger than the sum of all remaining lengths in P_2.

Proof. This follows from the definition of P_1 ⪯ P_2 and the fact that remaining lengths are nonnegative.

Fact 4.2. Let P_1 and P_2 be two profiles such that P_1 ⪯ P_2 (P_1 ⊑ P_2, respectively). If we add a job of size x to both P_1 and P_2 to create P_1′ and P_2′, then P_1′ ⪯ P_2′ (P_1′ ⊑ P_2′, respectively).

Corollary 4.1. Let I be any input instance. For any 1 ≤ i ≤ r(I), any s_1, s_2 ≥ 1, and any schedules S_1 and S_2 such that S_1(I^{s_1}, i) ⪯ S_2(I^{s_2}, i) (S_1(I^{s_1}, i) ⊑ S_2(I^{s_2}, i), respectively), we have S_1[I^{s_1}, i] ⪯ S_2[I^{s_2}, i] (S_1[I^{s_1}, i] ⊑ S_2[I^{s_2}, i], respectively).

Continuing the example started above, this implies that SRPT[I, 2] = ⟨0, 0, 1, 2⟩ ⪯ S[I, 2] = ⟨0, 1, 1, 1⟩.

4.3 Building blocks from previous work

There are two critical ideas we need to borrow from the paper of Phillips et al. The first is the speed-stretch theorem cited earlier. Based on this theorem, we can prove our result if we show that SRPT is an s-stretch 1-competitive algorithm for s ≥ 2 − 1/m. Thus, we consider SRPT with stretched input instances rather than faster machines throughout most of this paper. Also, as noted earlier, total flow time is minimized exactly when total idle time is minimized. Thus, our goal is to show that SRPT with stretched input instances incurs no more idle time than the optimal algorithm does with the unstretched input instance.

The second critical idea is a generalization of the proof that SRPT is a (2 − 1/m)-speed 1-competitive algorithm. That proof worked in two steps. In the first step, Phillips et al. showed that any busy scheduling algorithm, when given speed-(2 − 1/m) machines, performs at least as much work by any given time as any other schedule on any input instance. They extended this to also conclude that SRPT, given speed-(2 − 1/m) machines, completes at least as many jobs by any given time as any other schedule on any input instance. We generalize Phillips et al.'s result to show that SRPT on I^s is at least "keeping up with" Opt on I.

Corollary 4.2. Consider any input instance I, any legal speed-1 schedule S(I), and any i, 1 ≤ i ≤ r(I). If s ≥ 2 − 1/m, then SRPT(I^s, i) ⪯ S(I, i).

Proof. The proof is essentially identical to that of Phillips et al. It also holds for any time t, not just release times r_i.

4.4 Canonical schedules

We now define the notion of a canonical partial schedule S′(I_i).

Definition 4.5. For any input instance I and 1 ≤ i ≤ r(I), partial schedule S(I_i) is canonical if it has the following properties:

1. No job that is finished in S(I_i) had a remaining length at time r_i larger than that of any job that is not completed in S(I_i).
2. Each machine processes at most one job that is not completed in S(I_i).
3. Suppose jobs J_j and J_k are processed but not completed in S(I_i) and that job J_j receives more processing time in S(I_i) than J_k. Then job J_j had a smaller remaining length at time r_i than job J_k did.
4. On any machine, no unfinished job is processed prior to any finished job in S(I_i).

Lemma 4.1. Any legal partial schedule S(I_i) for interval I_i can be converted into a canonical schedule S′(I_i) for interval I_i such that S′(I_i) ⪯ S(I_i) and S′(I_i) incurs no more idle time than S(I_i) does.

Proof sketch. The proof is to use job swapping arguments. We first swap jobs to ensure that there are no finished jobs in interval I_i that are longer (at time r_i) than any unfinished jobs in interval I_i. We then swap jobs to ensure that the unfinished jobs with smallest length are processed whenever there is a choice of unfinished jobs to work on and that each such unfinished job is processed on a single machine. Finally, we swap jobs to ensure that finished jobs are processed before unfinished jobs on the same machine.
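The first three properties of Definition 4.5 are easy to machine-check from a per-job summary of a partial schedule; the checker below is our formalization (field names ours). Property 4 concerns the order of processing within the interval and is not captured by this per-job summary.

```python
def is_canonical(jobs, m):
    """jobs: list of dicts with keys 'rem' (remaining length at r_i),
    'proc' (time processed in I_i), 'done' (completed in I_i?), and
    'machine'. Checks properties 1-3 of Definition 4.5."""
    done = [j for j in jobs if j['done']]
    open_ = [j for j in jobs if not j['done'] and j['proc'] > 0]
    # 1. No finished job was longer at r_i than an unfinished one.
    if done and open_ and max(j['rem'] for j in done) > min(j['rem'] for j in open_):
        return False
    # 2. Each machine touches at most one unfinished job.
    machines = [j['machine'] for j in open_]
    if len(machines) != len(set(machines)):
        return False
    # 3. Among unfinished jobs, more processing implies a strictly
    #    smaller remaining length at r_i.
    for a in open_:
        for b in open_:
            if a['proc'] > b['proc'] and a['rem'] >= b['rem']:
                return False
    # 4. (Finished-before-unfinished ordering needs the time dimension
    #    and is omitted from this summary-level check.)
    return True
```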
4.5 SRPT charging

We now define a method for computing a lower bound on the amount of idle time incurred by a canonical partial schedule. We will also use this method to determine the amount of idle time incurred by RSRPT (still to be defined) on a partial schedule. We term this method SRPT charging as we charge the canonical partial schedule as if it ran its jobs as SRPT would. This may produce an illegal partial schedule in some cases. For example, suppose the canonical two-machine partial schedule for jobs of size 1, 1, 2, 3, 4, 5 in the interval [0, 2] is to run the two jobs of size 1 on one machine and the job of size 2 on the second machine. If we assume that SRPT runs the exact same jobs, we get instead that the two jobs of size 1 run simultaneously from time 0 to time 1 and the job of size 2 is processed from time 1 to time 3. This leaves a jagged pattern with one machine finishing at time 1 and the other finishing at time 3 rather than both finishing at time 2.

In order to charge idle times, we use the following ordering on jobs. We number the jobs in non-decreasing order of remaining length at the beginning of the interval, breaking ties arbitrarily for completed jobs. For jobs that are not completed in this partial schedule, we break ties by the amount each job is processed. If one job is processed more in the partial schedule, it receives a smaller number. Otherwise, ties are broken arbitrarily. Let k be the total number of jobs with nonzero remaining lengths at the beginning of the interval. We then charge idle time as follows. Suppose the job numbered j is processed for e_j time units in this interval. We then say that each of the jobs j + m, j + 2m, ... (up through job k) incurs e_j units of idle time, and we charge these incurred idle times to job j.

For our example, the first job of size 1 is charged 1 unit of idle time each for the jobs of size 2 and 4. The second job of size 1 is charged 1 unit of idle time each for the jobs of size 3 and 5. The job of size 2 is charged 2 units of idle time for the job of size 4. Thus, a total of 6 units of idle time are incurred. In the actual canonical schedule, the second job of size 1 incurs one unit of idle time, and the jobs of size 3, 4, and 5 each incur two units of idle time, for a total of 7 units of idle time. Thus, the idle time incurred by applying SRPT charging is at most (in this case smaller than) the actual idle time incurred.
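The charging rule is mechanical; here is a direct implementation (ours) that reproduces the numbers in the example above.

```python
def srpt_charge(processed, m):
    """processed[j-1] = time units job j ran in the interval, with jobs
    numbered 1..k in non-decreasing remaining length at the start of the
    interval. Job j's running time e_j is charged as idle time to each
    of jobs j+m, j+2m, ... <= k. Returns the total charged idle time."""
    k = len(processed)
    total = 0
    for j, e in enumerate(processed, start=1):
        delayed = len(range(j + m, k + 1, m))   # jobs j+m, j+2m, ..., <= k
        total += delayed * e
    return total

# Two-machine example: jobs of size 1, 1, 2, 3, 4, 5 in [0, 2]; the
# canonical schedule processes them for (1, 1, 2, 0, 0, 0) time units.
print(srpt_charge([1, 1, 2, 0, 0, 0], m=2))     # 6, vs. 7 actually incurred
```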
We now prove that this is always the case for any canonical schedule.

Lemma 4.2. The actual amount of idle time incurred by any canonical partial schedule is at least the amount of idle time incurred by applying SRPT charging to that canonical schedule.

Proof sketch. Let J denote the set of jobs at the beginning of interval I_i. Let C(I_i) be the canonical partial schedule under consideration. Let us ignore all jobs that arrive after time r_i; that is, we assume the single release time case where our goal is to schedule the jobs in J in order to minimize total idle time. It is known that SRPT minimizes total flow time and total idle time in cases where there is a single release time. Thus SRPT(J) incurs no more idle time than that incurred by C(I_i) plus any legal schedule that completes all the jobs in J after interval I_i ends. We can apply SRPT charging to SRPT(J) to exactly compute the total idle time of SRPT(J). We can decompose this cost into two components: SRPT charging of canonical partial schedule C(I_i) and SRPT charging of the remainder of SRPT(J). Since SRPT(J) is optimal, this total idle time cost must be no more than the true idle time cost of C(I_i) plus the minimum idle time cost for completing the remaining jobs after the end of the interval. By the property of being a canonical schedule, the order of remaining processing times of jobs not completed in partial schedule C(I_i) is identical to their order at the beginning of the interval. Thus, a minimum idle time cost for completing the remaining jobs after the end of the interval is achieved by applying SRPT to the remaining jobs, and this minimum cost is identical to the cost of SRPT charging the remaining jobs after the end of the interval. This means that the SRPT charging cost of C(I_i) must be no larger than the actual idle time cost incurred by C(I_i).

5 Proof of main result

We now prove the main result. Let I be any input instance. Consider any optimal schedule Opt(I) for input instance I. For each interval I_i, we lower bound the idle time incurred by partial schedule Opt(I_i) by working with the idle time incurred by the corresponding canonical partial schedule C(I_i). We would like to show that the total idle time incurred by each canonical partial schedule C(I_i) is at least the idle time incurred by SRPT(I_i^s) for s ≥ 2 − 1/m. However, this may not be true as I_i^s is s times longer than I_i.

5.1 Relaxed SRPT (RSRPT)

In order to be able to implement such a comparison, we define, for analysis purposes only, the Relaxed Shortest Remaining Processing Time (RSRPT) schedule. We will show that the idle time incurred by canonical partial schedule C(I_i) is at least the idle time incurred by RSRPT(I_i) and that the total idle time incurred by RSRPT(I) is at least the total idle time incurred by SRPT(I^s). We construct RSRPT(I) interval by interval. One factor driving the formation of RSRPT(I_i) will be the schedule Opt(I_{i−}) concatenated with schedule C(I_i). We will use the notation OptC(I_i) to denote this concatenated schedule. The other factor we will use to construct RSRPT(I_i) is SRPT(I^s_{(i+1)−}). Any of the partial schedules RSRPT(I_i) may be illegal, but we still concatenate them together to form RSRPT(I). To determine the idle time incurred by these potentially illegal partial schedules, we use SRPT charging. It is important to note that RSRPT is dependent on the stretch factor s. Technically, we should include s as a parameter in the definition of RSRPT(I), but we choose to omit it to simplify notation.

Before we construct RSRPT(I_i), we define the following quantities based on OptC(I_i). Let OptC(I, i+1) denote the profile at the end of OptC(I_i), not including jobs released at time r_{i+1}. Let k be the number of jobs with zero remaining length (some of these jobs may have been completed in Opt(I_{i−})). Finally, let jobs OptC(I, i+1)[k+1] through OptC(I, i+1)[k+l] denote the jobs that received some processing in C(I_i) but are not complete by time r_{i+1}, and let e_j denote the processing time received by job OptC(I, i+1)[k+j].
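In symbols (our notation, not the paper's), the quantities just defined are:

```latex
% Bookkeeping for interval I_i, written compactly (our notation):
%   k = number of zero entries in the profile OptC(I, i+1),
%   l = number of jobs processed in C(I_i) but unfinished at r_{i+1},
\[
  e_j \;=\; \mathrm{proc}_{C(I_i)}\!\bigl(\mathrm{OptC}(I, i+1)[k+j]\bigr),
  \qquad 1 \le j \le l .
\]
% Property 3 of canonical schedules orders these amounts,
% e_1 >= e_2 >= ... >= e_l, consistent with the ordering used in the
% proof of Lemma 5.1 below.
```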
Now consider profile RSRPT[I, i], the vector of remaining lengths of jobs to be processed by RSRPT in interval I_i. We first give enough processing time to complete jobs RSRPT[I, i][j] for 1 ≤ j ≤ k. That is, RSRPT must complete at least k jobs by the end of RSRPT(I_i). Note, some of these jobs may start with zero remaining length, meaning they were completed in RSRPT(I_{i−}). Now for 1 ≤ j ≤ l, we assign the minimum of e_j and the time required to complete job RSRPT[I, i][k+j] to this job in RSRPT(I_i). That is, either job RSRPT[I, i][k+j] is completed or it receives e_j processing time. Let q be the largest value of j for 0 ≤ j ≤ l such that RSRPT[I, i][k+j] is completed by the above assignment. Let X denote the total amount of processing C(I_i) devoted to jobs OptC(I, i+1)[j] for 1 ≤ j ≤ k+q minus the total amount of processing RSRPT(I_i) devoted to jobs RSRPT[I, i][j] for 1 ≤ j ≤ k+q. We will prove later that X must be nonnegative. The remainder of the construction of RSRPT(I_i) is characterized by the following pseudocode (an executable rendering appears below). Intuitively, we apply the excess processing time to the remaining jobs with priority given to the shortest remaining jobs. The limits on the application of processing to a job are either the amount needed to complete it or the amount received by the corresponding job in the profile SRPT(I^s, i+1). The second limit is given to ensure that SRPT(I^s, i+1) is contained in RSRPT(I, i+1).

• Let y = k + q + 1.
• Define X_y = X.
• While X_y > 0 {
  – Add processing time to RSRPT[I, i][y] until RSRPT[I, i][y] is complete, we have added X_y processing time, or RSRPT(I, i+1)[y] = SRPT(I^s, i+1)[y].
  – Let a_y be the amount of processing time we added to this job.
  – Let X_{y+1} = X_y − a_y.
  – Increment y by 1.
• }
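The following is our rendering of this construction in executable form, with 0-indexed arrays and names of our own choosing. It assumes the containment invariant SRPT(I^s, i+1) ⊑ RSRPT(I, i+1) holds, so that the per-job cap in the loop is well-defined, and it guards against running off the profile, which the paper's invariants preclude.

```python
def build_rsrpt_interval(rsrpt, srpt_next, k, e, X):
    """rsrpt: profile RSRPT[I,i] (non-decreasing remaining lengths);
    srpt_next: profile SRPT(I^s, i+1); e: the amounts e_1..e_l;
    X: the excess processing defined above. Returns the processing
    time that RSRPT(I_i) assigns to each job."""
    work = [0.0] * len(rsrpt)
    for j in range(k):                       # complete the first k jobs
        work[j] = rsrpt[j]
    q = 0
    for j, ej in enumerate(e):               # jobs k+1..k+l: min(e_j, need)
        work[k + j] = min(ej, rsrpt[k + j])
        if work[k + j] == rsrpt[k + j]:
            q = j + 1                        # largest index completed here
    y = k + q                                # 0-indexed; y = k + q + 1 in text
    while X > 0 and y < len(rsrpt):          # distribute the excess X
        # Run job y until it is complete (srpt_next[y] = 0 allows this),
        # X runs out, or its remaining length drops to SRPT's level.
        a = min(X, max(rsrpt[y] - work[y] - srpt_next[y], 0.0))
        work[y] += a
        X -= a
        y += 1
    return work
```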
5.2 RSRPT incurs at least as much idle time as stretched SRPT

The key to proving that RSRPT incurs at least as much idle time as SRPT does on a stretched input instance is showing that the profiles created by SRPT on I^s are always contained within the profiles created by RSRPT on I. To prove this containment property, we need the following result.

Lemma 5.1. Consider job SRPT[I^s, i][k+j] for 1 ≤ j ≤ l. Either this job has 0 remaining processing time in SRPT(I^s, i+1) or SRPT applied at least e_j processing time to this job in interval I_i^s.

Proof sketch. Consider input instance I. Suppose at release time r_i, we release l additional jobs of size e_1 through e_l. Let this modified input instance be denoted by I′ and its stretched variant by I′^s. We can reorganize OptC(I′) to finish these l new jobs instead of processing jobs OptC(I, i+1)[j] for k+1 ≤ j ≤ k+l in C(I_i). Thus, OptC(I′, i+1) will have k+l complete jobs. By Corollary 4.2, we know that SRPT(I′^s, i+1) ⪯ OptC(I′, i+1), which means that SRPT(I′^s, i+1) must have at least k+l complete jobs. Suppose z ≤ l of these k+l complete jobs are the newly injected jobs. If so, these z jobs are the z smallest ones with sizes e_l, e_{l−1}, ..., e_{l−z+1}. It then follows that the k+l−z shortest jobs in SRPT[I^s, i] must be complete in SRPT(I′^s, i+1). This implies that these same k+l−z jobs must also be complete in SRPT(I^s, i+1) since removing the extra jobs will not cause these jobs to receive any less processing time in interval I_i^s. Furthermore, there must be holes of size at least e_{l−z+1} through e_l into which jobs SRPT[I^s, i][j] for k+l−z+1 ≤ j ≤ k+l can be slotted, with the slots being assigned to jobs in inverse order of size; that is, the smallest job gets the largest slot. Thus, the result holds.

Corollary 5.1. SRPT(I^s, i) ⊑ RSRPT(I, i) for 1 ≤ i ≤ r(I).

Proof. This is by induction on i. The base case with i = 1 is trivially true as the profile RSRPT(I, 1) is identical to that of SRPT(I^s, 1) since no jobs have been processed yet. Let us now assume the result holds for 1 ≤ i < r(I), and we wish to show it holds for i+1. We first observe by Corollary 4.1 that SRPT(I^s, i) ⊑ RSRPT(I, i) implies that SRPT[I^s, i] ⊑ RSRPT[I, i]. Looking at how we construct RSRPT(I_i), the result follows as we always strive to maintain the containment property. The only place where containment might be violated is the assignment of e_j units to jobs RSRPT[I, i][k+j] for 1 ≤ j ≤ l. However, Lemma 5.1 shows that SRPT(I_i^s) either finishes job SRPT[I^s, i][k+j] or assigns it at least e_j units of processing.

We now argue on a job by job basis that each job in SRPT(I^s) delays other jobs by less time than the corresponding job does in RSRPT(I). Thus, the total idle time incurred by SRPT on I^s is no more than the total idle time incurred by RSRPT on I.

Theorem 5.1. For any input instance I, the total idle time of SRPT(I^s) is no larger than the total idle time incurred by RSRPT(I).

Proof sketch. Consider an arbitrary job J_j in I and I^s. Our claim is that the amount of idle time incurred by other jobs that is charged to J_j in SRPT(I^s) by the SRPT charging scheme is no more than the idle time incurred by other jobs that is charged to the corresponding job in RSRPT(I) (with some possible swapping due to tie-breaking). This follows from the fact that for 1 ≤ i ≤ r(I), SRPT(I^s, i) ⊑ RSRPT(I, i). Thus, whenever job J_j is executed in SRPT(I^s), the number of jobs larger than it is at most the number of jobs larger than its corresponding job in RSRPT(I). The result follows.

5.3 Opt incurs at least as much idle time as RSRPT
SIAM Journal on Computing, 31:1370­1382, the prefix sum of the first k jobs in RS RP T [I , i], but 2001. this is not possible because RS RP T [I , i] Opt[I , i]. [4] Chandra Chekuri, Sanjeev Khanna, and Amit Kumar. Multi-processor scheduling to minimize Lp norms of This also proves that the value X in the construction of flow and stretch. Manuscript, 2003. RS RP T (Ii ) must be non-negative. [5] Chandra Chekuri, Sanjeev Khanna, and An Zhu. AlSuppose that RS RP T (I , i + 1) is not less than gorithms for weighted flow time. In Proceedings of the OptC (I , i + 1). Then there must be a smallest integer j 31st Annual ACM Symposium on Theory of Computsuch that the sum of the first j jobs of RS RP T (I , i + 1) ing, pages 84­93. ACM, 2001. is strictly larger than the sum of the first j jobs of [6] C. Coulston and P. Berman. Speed is more powerful OptC (I , i + 1). A similar argument shows that no such than clairvoyance. Nordic Journal of Computing, j exists. 6:181­193, 1999. This leads to the following result. Corollary 5.2. The id le time incurred by partial schedule RS RP T (Ii ) must be at most the id le time incurred by partial schedule C (Ii ) within OptC (Ii ) as determined by SRPT charging. This corollary clearly implies that the the total idle time incurred by RS RP T (I ) is no larger than than the idle time incurred by Opt(I ), and our result is proven. 6 Op en problems We have shown that SRPT optimally uses sufficiently faster machines with respect to minimizing total flow time. Some interesting open problems include the following. Do any non-migratory algorithms also have this property? We have shown that existing nonmigratory algorithms are not s-speed 1/s-competitive algorithms for any s 1. Another interesting question is to analyze how well SRPT performs using machines of speed 1 < s < 2 - 1/m? In particular, can we show that SRPT is a (1 + )-speed O(1)-competitive algorithm for minimizing total flow time (as well as other metrics)? Also, what is the smallest s such that SRPT (or any other online algorithm) is an s-speed [7] J. Edmonds. Scheduling in the dark. Theoretical Computer Science, 235:109­141, 2000. [8] B. Kalyanasundaram and K. Pruhs. Speed is as powerful as clairvoyance. Journal of the ACM (JACM), 47:617­643, 2000. [9] S. Leonardi. A simpler proof of preemptive flow-time approximation. In Approximation and On-line Algorithms, Lecture Notes in Computer Science. Springer, 2003. [10] S. Leonardi and D. Raz. Approximating total flow time on parallel machines. In Proceedings of the 27th Annual ACM Symposium on Theory of Computing, pages 110­ 119, 1997. [11] C. Phillips, C. Stein, E. Torng, and J. Wein. Optimal time-critical scheduling via resource augmentation. Algorithmica, 32:163­200, 2002. 358