Facility Location with Service Installation Costs (Extended Abstract) David B. Shmoys Abstract We consider a generalization of the uncapacitated facility location problem which we call Facility Location with Service Instal lation Costs. We are given a set of facilities, F , a set of demands or clients D, and a set of services S . Each facility i has a facility opening cost fi , and we have a service instal lation cost of fil for every facility-service pair (i, l). Each client j in D requests a sp ecific service g (j ) S and the cost of assigning a client j to facility i is given by cij . We want to op en a set of facilities, install services at the op en facilities, and assign each client j to an op en facility at which service g (j ) is installed, so as to minimize the sum of the facility op ening costs, the service installation costs and the client assignment costs. Our main result is a primal-dual 6-approximation algorithm under the assumption that there is an ordering on the facilities such that if i comes b efore i in this ordering then for every service typ e l, fil fil . This includes (as sp ecial cases) the settings where the service installation cost fil dep ends only on the service typ e l, or dep ends only on the location i. With arbitrary service installation costs, the problem b ecomes as hard as the set-cover problem. Our algorithm extends the algorithm of Jain & Vazirani [9] in a novel way. If the service installation cost dep ends only on the service typ e and not on the location, we give an LP rounding algorithm that attains an improved approximation ratio of 2.391. The algorithm combines b oth clustered randomized rounding [6] and the filtering based technique of [10, 14]. We also consider the k-median version of the problem where there is an additional requirement that at most k facilities may b e op ened. We use our primal-dual algorithm to give a constant-factor approximation for this problem when the service installation cost dep ends only on the service typ e. Chaitanya Swamy Retsef Levi 1 Intro duction Facility location problems have been widely studied in the Operations Research community (see for e.g. [12]). In its simplest version, uncapacitated facility lo cation (UFL), we are given a set of facilities, F , and a set of demands or clients D. Each facility i has a facility opening cost fi and the cost of assigning a client j to facility i is given by cij . We want to open some facilities {shmoys,swamy}@cs.cornell.edu. Dept. of Computer Science, Cornell University, Ithaca, NY 14853. Research supported partially by NSF grant CCR-9912422. rl227@cornell.edu. School of ORIE, Cornell University, Ithaca, NY 14853. Research supported partially by a grant from Motorola and NSF grant CCR-9912422. from the set F and assign each demand to an open facility. The goal is to minimize the sum of the facility opening costs and the client assignment costs. This problem has a wide range of applications. For example, a company might want to open its warehouses at some lo cations so that its total cost of opening warehouses and servicing customers is minimized. In various applications, the clients are differentiated according to the kind of service they require and to satisfy the service requirement of a client we have to assign it to a facility that can provide the service required by the client. For example, in the warehouse lo cation problem above, the customers may be retail stores that request different kinds of supplies. A warehouse may store different kinds of supplies. To satisfy a customer we have to assign it to a warehouse that holds inventory of the type requested by the customer. We mo del such a setting by saying that in addition to facilities and clients, we have a set of services S . Each client j in D requests a specific service g (j ) S . To satisfy client j we have to assign it to an open facility on which service g (j ) is installed. Further, if we install service l on an open facility i we incur a service instal lation cost of fil . We want to open a set of facilities, install services at the open facilities, and assign each client j to an open facility i such that service g (j ) is installed at i. The cost of a solution is the sum of the facility opening costs, the service installation costs and the client assignment costs, and the goal is to find a solution with minimum total cost. We call this problem, Facility Location with Service Instal lation Costs. In the warehouse lo cation problem, the service installation cost corresponds to the initial cost of setting up the warehouse to store the particular kind of inventory. The notion of service-dependent fixed costs is also used in inventory problems where one incurs a joint setup cost to start a new order and an item-dependent fixed cost to order a specific item, so one needs to coordinate the placement of item orders; see [1] for a survey. We assume throughout that the assignment costs cij form a metric. Copyright © 2004 by the Association for Computing Machinery, Inc. and the Society for industrial and Applied Mathematics. All Rights reserved. Printed in The United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Association for Computing Machinery, 1515 Broadway, New York, NY 10036 and the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688 1088 Applications and Related Work. Facility lo cation with service installation costs can be used to mo del a data management/caching problem. Here we are given a set of lo cations in a network at which caches may be built and a set of pro cesses lo cated at the no des of the network that request data items. Each pro cess requests a specific data item. To satisfy the request we must assign the process to a cache that stores the requested data item, incurring an access cost proportional to the distance between the pro cess site and the cache lo cation. Building a cache at a location incurs a lo cation dependent cost and storing a data item in a cache at a particular lo cation incurs a cost that depends on the data item and the lo cation. The goal is to build caches, store data items in the caches and assign each pro cess to a cache containing the data item requested by the pro cess, so as to minimize the total cost of building caches, storing data items and the access cost of requests. This is exactly the facility lo cation problem with service installation costs where the caches are facilities, the pro cesses are clients and the data items correspond to services. Baev & Ra jaraman [3] considered a closely related problem called the data placement problem and gave a 20.5approximation algorithm. Here caches of fixed capacity are already built at certain locations and the goal is to find a placement of data items to caches respecting the cache capacities that minimizes the sum of the access costs and the cost of storing data items. The ratio has recently been improved by Swamy [16]. Facility location with service installation costs is a generalization of UFL -- if there is just one service type then this is simply the uncapacitated facility lo cation problem. There is a large bo dy of literature that deals with designing approximation algorithms for metric UFL and we sample only a few results below; see [13] for a survey of this and earlier work. Shmoys, Tardos & Aardal [14] gave the first constantfactor approximation algorithm for this problem using the filtering technique of Lin & Vitter [10] to round the optimal solution of a linear program. Chudak & 1 2Shmoys [5, 6] gave an LP rounding based +e approximation algorithm. They combined randomized rounding and the decomposition results of [14] to get a variant that might be called clustered randomized rounding. Sviridenko [15] improved the ratio to 1.58. Jain & Vazirani [9] gave a combinatorial primal-dual 3-approximation algorithm where the LP is used only in the analysis. The current best ratio for UFL is 1.52 [11] obtained by building upon a dual-fitting based greedy algorithm of Jain, Mahdian, Markakis, Saberi & Vazirani [8]. Our Results. Our main result is a primal-dual 6approximation algorithm for the facility lo cation problem with service installation costs under the assumption that there is an ordering on the facilities such that if i comes before i in this ordering then for every service type l, fil fil . This is reasonable in many settings; for example, one expects the inventory setup cost of a warehouse in New York city to be less than the inventory setup cost in a remote town like Ithaca regardless of the kind of inventory. As special cases this includes the cases where the service installation cost fil depends only on the service type l, or depends only on the location i. In the former setting where the service installation cost depends only on the service type we give an algorithm based on LP rounding that attains a much improved approximation ratio of 2.391. We show that with arbitrary service installation costs the problem becomes as hard as the set-cover problem. Combined with the result of Feige [7], this shows that no polynomial-time algorithm with a ratio of (1 - ) ln |D| exists for this problem in the general case unless NP DTIME[nO(log log n) ]. We also consider the k -median version of the problem where there is an additional requirement that at most k facilities may be opened. We use our primal-dual algorithm to give a constant-factor approximation for this problem when the service installation cost depends only on the service type. Our Techniques. Facility lo cation with service installation costs is a generalization of UFL. It differs however from traditional multi-level extensions of facility location where we assume that a demand can be assigned to any facility in any level. In our problem demand j may only be assigned to a level 1 "facility" (i, g (j )) and then to facility i in level 2. Moreover, existing techniques for UFL and multi-level facility lo cation do not readily generalize. If there were no facility opening costs we could decouple the problem into several UFL instances, one for each service type, and solve each one separately. With facility opening costs this approach fares badly since we may end up opening a lot of facilities and spend to o much on the facility opening costs. All known algorithms for UFL rely on the fact, either in the design or the analysis, that a client j can be moved from a facility i to another nearby facility i without increasing its assignment cost by much, and leaving the facility opening cost unchanged. In our problem, reassigning j to i may now require us to install service g (j ) on i causg (j ) ing us to pay the installation cost fi which could b e large. Technically the hard part is to find a way to reassign clients to nearby facilities so that we do not pay to o much to install services at the new lo cations. With arbitrary service installation costs such a reassignment 1089 need not be possible since we can enco de the constraint that a client may only be assigned to a specific set of facilities, making the problem set-cover hard. We build upon the primal-dual algorithm of [9] in a novel way and give a 6-approximation algorithm under the assumption that the facilities are ordered so that if i comes before i then fil fil for every service l. At a high level, the idea is to consider an integer programming formulation of the problem and the dual of its linear programming relaxation, and construct simultaneously an integer primal solution and a dual solution. Each client j has a dual variable j which can be interpreted intuitively as the payment that j is willing to make to get itself assigned to an open facility. The Jain-Vazirani (JV) algorithm [9] for UFL works in two phases. In phase I we grow each dual variable j uniformly and gradually build a primal feasible solution. Once j becomes equal to cij for some facility i, j starts paying toward the facility opening cost of i. When the total contribution to i from the various clients equals fi , we declare i to be tentatively open and assign all the unassigned clients contributing toward i to i. Phase I ends when each client is assigned to a tentatively open facility. At this point a client could be contributing towards multiple tentatively open facilities. We call a set of facilities independent if each client contributes towards at most one facility in the set. In phase II we select a maximal independent subset of tentatively open facilities and open these. The analysis shows that if the facility i to which a client j was assigned in phase I is not opened, then there is a "nearby" open facility i to which j can be reassigned. Our algorithm also pro ceeds in phases. During phase I we tentatively open some facilities, tentatively install service l on some facilities for each service type l, and assign each client j to a tentatively open facility on which service g (j ) is tentatively installed. Phase II is more involved. We have to select a set of facilities to open and install services in such a way that we can, (1) pay for installing the services, and (2) ensure that if a client j has to be reassigned there is a nearby open facility on which service g (j ) is installed. We show that we can achieve properties (1) and (2) if we look at the facilities in a particular order and pick a maximal independent subset greedily. This gives us a 6-approximation algorithm. Our primal-dual algorithm exploits the property that in the JV algorithm any maximal independent set of tentatively open locations may be picked. Although this is a well-known fact, to our knowledge this has been used in only a couple of applications previously. Bartal, Charikar and Raz [4] consider a clustering problem and use the JV algorithm to solve a relaxation of the problem by picking an appropriate maximal independent set. Archer, Ra jagopalan and Shmoys [2] pick a maximum independent set and use this to prove a bound on the integrality gap of the k -median LP. When the service installation cost depends only on the service type, we give an LP-rounding algorithm that combines clustered randomized rounding [6] and the filtering based technique of [10, 14]. A feature of the algorithm is that we bound distances cij using both the j bound due to complementary slackness and the bound obtained by filtering. This gives a better performance guarantee than that obtained by using either of the two bounds separately. Sviridenko [15] also used the two bounds in conjunction to tmprove the i 1 approximation ratio for UFL from + 2 o 1.58. e 2 A Linear Program We can formulate the problem as an integer program and relax the integrality constraints to get a linear program. We use i to index the facilities in F , j to index the clients in D and l to index the services in S . i il ji l min fi yi + fil yi + cij xij (P) s.t. i xij 1 xij yi xij yi l xij , yi , yi g (j ) j i, j i, j i, j, l. 0 l Variable yi indicates if facility i is open, yi indicates if service type l is installed at i, and xij indicates if client j is connected to facility i. The first constraint states that each client must be assigned to a facility, the second and the third constraints say that if client j is assigned to facility i, then service g (j ) must be installed on i and i must be open. An integral solution corresponds exactly to a solution to our problem. Let Gl be the set of clients requesting service l. The dual program is, j j (D) max s.t. j Gl j cij + ij + ij ij j fil i, j i, l i i, j. (2.1) ij fi (2.2) j , ij , ij 0 Intuitively j is the budget that j is willing to spend to get itself assigned to an open facility. Constraint (2.1) says that a part of this go es towards paying for 1090 the assignment cost cij . The rest gets divided into a payment for the service installation cost ij , and a payment for the facility opening cost ij . 3 A Primal-Dual Algorithm case, we tentatively open i. For each demand j , we do not increase ij from now on. If demand j is tight with i and service g (j ) is tentatively installed at i, we freeze j (and no longer increase j ). We only raise the j , ij , ij of unfrozen demands. Frozen demands do not participate in any events. We continue this pro cess until all demands become frozen. Let (, , ) denote the final dual solution obtained by the above pro cess. Observe that if i is the facility that caused j to freeze, then service g (j ) must be tentatively installed at i, and i must be tentatively open. We now specify which facilities to open, how to install services on facilities, and how to assign demands to facilities. Let F be the set of tentatively open facilities, and let Fl F be the set of tentatively open facilities on which service l is tentatively installed. For facility i F , let ti be the time at which i became tentatively open. If i Fl , let til be the time at which service l was tentatively installed at i. Op ening facilities. We open a subset of facilities from F . We say that i, i F are dependent if there is a demand j such that both ij and i j are positive. We consider the facilities in F in the order given by O and pick a maximal independent set of facilities, F F . We open the facilities in F . Installing services. Consider service type l and the set of facilities Fl . We say that facilities i, i Fl are service-l-dependent if there exists some demand j in Gl such that both ij and i j are positive. We pick a maximal independent subset Fl by lo oking at facilities in Fl in a particular order: first we consider facilities in Fl F in increasing order of til , and then facilities in Fl - F in increasing order of ti . We first install service l on all facilities in Fl F . Furthermore, for each i Fl - F , we pick a facility i F such that i and i are dependent and i i (in the ordering given by O), and install service l on facility i . We say that i is the neighbor of i and denote it by nbr(i). Note that nbr(.) depends only on i and not on the service type l: if i F , then we can cho ose a single / facility i F such that i i regardless of the service type l. Assigning demands. We assign each client j to the nearest open facility at which service g (j ) is installed. 3.1 Analysis. We now bound the performance of our algorithm. The following lemma just says what it means for a demand j to get frozen. Lemma 3.1. Let i be the facility that causes a demand We consider instances of the problem where there is an ordering on the facilities in F such that if i comes before i in this ordering then for every service type l, fil fil . Equivalently this means that for any two lo cations i, i , f fT the vectors il l=1...|S | and il T 1...|S | are comparable. l= Let O denote this total ordering on the facilities. We say that i i if i comes before i in the ordering O. The algorithm is strongly motivated by the primaldual algorithm of Jain and Vazirani for the traditional uncapacitated facility lo cation algorithm. In our algorithm, we first construct a feasible dual solution, and then use this dual solution to extract a feasible (integer) primal solution. As has been the norm, our algorithm is a dual ascent algorithm, so all dual variables are only increased throughout the execution of the algorithm. We next describe the algorithm. There is a notion of time around which the algorithm is specified. We start at time t = 0, all dual variables are initialized to 0, each demand j is said to be unfrozen, and all facilities are closed. At first, the variables that are increased are the j s; more precisely, for any unfrozen demand j , j is always equal to the time t. We say that demand j is tight with facility i, or has reached i, if j cij . As time increases, we will freeze demand points, tentatively open facilities, and tentatively instal l services at facilities. We increase the j of each demand j until one of the following events happens: 1. Suppose that demand j becomes tight with facility i. If service g (j ) is not tentatively installed at i, then we start increasing ij at the same rate as j ; that is, if j = t, then ij = t - cij . If service g (j ) is tentatively installed, but i is not tentatively open, we instead increase ij at the same rate as j ; that is, if j = t, then ij remains 0, but ij = t - cij . Finally, if service g (j ) is tentatively installed, and i is tentatively open, we freeze demand point j (and no longer increase j ). 2. Suppose thj t for a facility i and a service type l, we a l get that Gl ij = fi : in this case, we tentatively install service l at i. If i is also tentatively open, then we freeze each demand j Gl that is tight with i (and no longer increase j ). If i is not yet tentatively open, then for each demand j Gl that is tight with i, we no longer increase ij , but instead start increasing ij at the same rate as j . j 3. Suppose that for a facility i, ij = fi : in this 1091 j to freeze. Then, i is tentatively open, service g (j ) is tentatively instal led at i, and j = max(cij , ti , tig(j ) ). We start by bounding the cost incurred in opening facilities, and installing services. Let D be the subset Lemma 3.3. If j D , then the assignment cost inof demands {j : i F s.t. ij > 0}. curred for j is at most 3j ; if j D , then the assignLejmma 3.2. The cost of opening facilities is at most ment cost incurred for j is at most 5j . D j . Furthermore, the cost of instal ling services j Proof. We will show that there always exists some open is at most j . facility with service g (j ) installed that is no further from Proof. For each facility i that is tentatively opened, we j than the claimed bound (and hence the closest one, j , have that fi = to which j is assigned, is no further away). D ij . By the construction of F we know that for each demand j , there is at most one Consider j D with g (j ) = l. Let i be the unique i F such that ij > 0. Summing over all facilities, facility in F for which ij is positive. If i Fl , then and using this fact, we see that we have installed service l at i, and cij j - ij . i i j j i Otherwise, i and i are service-l-dependent for some i ij = i= ij j. n F Fl with ti l til . So, by Claim 3.2, cii < F f F j D D iF D 2 max(til , ti l ) = 2til . Since ij > 0, it follows that By the definition of nbr(i), we know that for any service til j - ij . So by the triangle inequality, ci j 3j . l Now consider a demand j D , and again let / type l, fnbr(i) fil . For each i Fl , we install service l g (j ) = l. Let i be the facility that caused j to freeze, l either at i F or at nbr(i), and so we can upper l F , then boi nd the total cost of installing services of type l by and so j = max(cij , ti , til ). If i F u l service l is installed at i, and cij j . Suppose that Fl fi . Since each service l is tentatively installed j i i Fl - F , and let i = nbr(i) (an open facility at l ij , we have that only when fil = l fi = Gl F which service l is installed, see Fig. 1a). By Claim 3.1, i j Gl Fl ij . The notion of service-l -dep endence cii 2ti = ci j 3j . insures that for each demand j Gl , there is at most Next suppose that i Fl . Since i Fl , there must one facility i Fl for which ij is positive. We obtain exist i Fl such that i was not picked in Fl because that the total cost of installing service l is at most i and i are service-l-dependent due to a client k . If i i j s also in F (Fig. 1b), then service l is installed there. Gl j , which immediately implies the lemma. Applying Claim 3.2, we obtain that both ci k and cik are We next bound the assignment cost incurred by the less than k max(ti , til ) j , and hence ci j 3j . solution computed. The following facts, which follow However, if i F (so i F ), then let i = nbr(i ); / directly from the construction of the algorithm, will be service l is installed at i (Fig. 1c). Since i and i serviceuseful in this analysis. l-dependent, we have that ti ti , ci i 2 min(ti , ti ) (Claim 3.1), and ci j 3j as above, which implies that Fact 3.1. Suppose that ik is positive. Then it fol lows ci j 5j . This completes the pro of. that cik k - ik and k ti . Fact 3.2. Suppose that ik is positive. Then cik k - ik and cik < tig(k) . If ik = 0 then k tig(k) . Theorem 3.1. The above algorithm returns a solution j j 6 · OPT . of cost at most 6 Proof. From the dependence of i and i , it follows that ik and i k are positive. Applying Fact 3.2 for both of these, and using the triangle inequality, we get that cii < 2 max(til , ti l), cik < k and ci k < k . For example, we use these in deriving the following Proof. Follows from Lemma 3.2 and Lemma 3.3. bounds. 4 An LP-Rounding Algorithm Claim 3.1. If i and i are dependent facilities in F , We now give an algorithm for the special case where then cii < 2 min(ti , ti ). fil = f l , i.e., the installation cost depends only on Proof. Let k be a client such that ik and i k are the service l and not on the lo cation at which it is positive. Applying Fact 3.1 for both of these, and installed. Note that this case is handled by the primalapplying the triangle inequality, we get that cii < dual algorithm above. Here we adapt the rounding pro cedure of [6] to give deterministic and randomized 2k 2 min(ti , ti ). approximation algorithms achieving ratios of 6 and Claim 3.2. Let i, i Fl be service-l-dependent due to 2.391 respectively. demand k Gl . Then cii < 2 max(til , ti l ) and both cik Let (x, y ) and (, , ) be the optimal solutions and ci k are less than k . to (P) and (D) respectively. We can ensure that for 1092 (a) (b) (c) j i k i j i F ti ti j j k i k k i i k i , k i F -F l i l i = nbr(i) F , / i , i are service-l-dep endent acility i caused j to freeze, i and i are service-l-dep endent, i = nbr(i ) Figure 1: Bounding the assignment cost of j . (a), (b) Different 3-hop cases, and (c) the 5-hop case. l every i, j and l, xij = 0 or xij = yi and yi = 0 or l yi = yi . We will round the fractional solution (x, y ) to an integer solution losing a factor of at most 6. Let Fj = {i : xij > 0}. We describe the algorithm briefly. g (j ) k j and j was removed from Gl in step A1 because a cluster was created around k ; we assign j to the same facility as k . Note that this is a feasible assignment of demands to facilities. Leimma 4.1. The facility opening cost is at most fi yi . Proof. The clusters corresponding to clients in D are disjoint and we open the cheapest facility in each cluster -- the lemma follows. Leimma 4.2. The cost of instal ling services is at most ll ,l fi yi . Proof. The cost of installing a service is independent of the lo cation at which it is installed, and for any service type l, we install service on at most one (new) lo cation per cluster center in Dl . l Soj the total l co st o f insta lling s er vices is a t mo st Dl f = i ll i lj l Dl f Fj yi ,l fi yi . Lemma 4.3. The assignment cost of client j is at most 5j . Proof. If j D , we assign j to a facility i Fj , and j by complementary slackness. If j D - D nd nbr(j ) = k D , we assign j to the facility i opened from Fk . Since k j and cj k 2j , cij 3j . If j D, j is assigned to the same facility as k where / k Dg(j ) D and j was removed from Gg(j ) in step A1 because a cluster was created around k . So k j and cj k 2j . From above, k is assigned to a facility i with cik 3k , so cij 5j . a cij A1. First, for every service type l, we consider the clients in Gl and cluster the facilities on which service l is installed around some cluster centers: pick j Gl with smallest j value and form a cluster around j consisting of the facilities in Fj . We remove every client k Gl (including j ) that is assigned (fractionally) to some facility in the cluster created, and recurse on the remaining set of clients until no client in Gl is left. Let Dl be the set of cluster centers. l A2. Let D = Dl . We cannot open a facility in every cluster since different clusters could share the same fractional facility weight (yi ) if the cluster centers request different services. Say that j, k D are dependent if Fj Fk = . Note that this can only happen if j and k request different services. We consider clients in D in order of increasing j and pick a maximal independent subset D . For each client j D , we open the facility in Fj with smallest fi and install service g (j ) on it. Further for every k D - D , there is some j D with j k such that j and k are dependent. We pick some such j and install service g (k ) on the facility opened from Fj . Call j the neighbor of k and denote it nbr(k ). A3. We assign demand j to a facility as follows: (i) if j D it is assigned to the facility opened from Fj , Thus we have proved the following theorem. (ii) if j D - D , then nbr(j ) = k D and j is assigned to the facility opened from Fk , and (iii) Theorem 4.1. The cost of the solution returned is at if j D, there is some k Dg(j ) D such that most 6 · OPT . / 1093 4.1 Improvement using randomization. We give a rounding procedure that combines clustered randomized rounding [6] and the filtering based technique of [10, 14]. We define some notation first. Let 0 < < 1 1 be a parameter that we will set later and r = . Sort the facilities in Fj by increasing cij . Let i be the first i facility in this ordering such that Fj :cij ci j xij . Let Nj be the subset of Fj consisting of all facilities (including i ) that come before i in this ordering. Define i ¯ cij xij . To simplify ithings Cj ( ) = ci j and Cj = we assume that each yi and for any j , N j yi is exactly . If some yi > , then we can create at most 1/ copies of i and set yi for each of the c copies so that opies i yi = yi (setting the variables i l xi j , yi accordingly). Similarly if Nj yi > , we can i take the facility i n Nj that comes last in the ordering in Fj and split it into two copies i1 and i2 setting i 1 2 yi2 = Nj yi - , yi = yi - yi (and the other variables accordinigly). We include only i1 in Nj thus ensuring that Nj yi = . The cost of the fractional solution remains unchanged by these transformations and any solution to the mo dified instance gives a solution to the original instance of no greater cost. R1. R7. We assign demand j to the nearest open facility at which service g (j ) is installed. Lemmas 4.1 and 4.2 are mo dified to the following. Lemma 4.4. The expected cost of opening facilities is i fi yi . The expected cost of instal ling services is at r· i ll r most + e1 r ,l fi yi . Proof. Each facility i is opened with probability r · yi . The ciost of installing services in stepl R5 is bounded i Pr[i is opened (in R3 or R4)] :yl >0 fil = r· by i i ll l l l l yi yi = yi . l : y i >0 f i = r · , l f i y i since y i > 0 = Consider client j D - D with g (j ) = l. For every non-central facility i Fj , let Ei be the event that i is opened in step R4 and pi = Pr[Ei ] = r · yi . For every cluster center k D such that Sk = Fj Nk = , let Ek be the event that a facility from Sk is open after i yi step R3. Let pk = Pr[Ek ] = iSk yi = r · Sk y i . È È R2. R3. R4. Let m be the total number of events. All the events Ei are independent. The probability that service l is installed in step R6 due to client j , is the probability that no facility from mj is open after steps R3 and R4, F which is at most n=1 (1 - pn ) e- n pn = e-r . e This is the same as step A1 except that we cho ose So thje cost of installing si rvices in stepi R6 is at most 1 g (j ) ll l ¯ e1 f j Gl with smallest 2j + Cj ( ) + Cj as the cluster er r D -D , l f i y i since Fj yi = 1 and center. This gives us a set of cluster centers Dl for any two clients in Dl have disjoint Fj . each service type l. To bound the assignment cost, we bound the asl We prune the set D = Dl as in step A2 but signment cost incurred under a provably worse way of mo dify the notion of dependency to say that j, k assigning demands to facilities. Demand j is assigned to D are dependent if Nj Nk = , and consider the a facility as follows. If some facility i Fj is open, we ¯ clients in D in increasing order of Cj ( ) + Cj . For assign demand j to the nearest such facility. Otherwise w k D - D e define nbr(k ) as before. We call the if j D - D , j is assigned to its backup facility. If facilities in Nj for clients j D central facilities, j D, there is some client k Dg(j ) D such that j / and the rest as non-central facilities. was removed from Gg(j ) because a cluster was formed around k in step R1. We assign j to the same facility as For every client j D we randomly open exactly k ; so j may be assigned either to a facility in Fk or to its one facility in Nj by cho osing facility i with probai backup facility in Nnbr(k) , if k D and no facility from / bility yi / Nj yi = r · yi . This facility now serves Fk is open. Note that service g (j ) is installed on the as a backup facility for all the clients that would get facility to which j is assigned. We need the following assigned to this facility in step A3 of the determinlemma from [6] (see also [15]). istic algorithm. Independent of step R3, each non-central facility i Lemma 4.5. Let d1 d2 . . . dm and 0 pn 1 for n = 1, . . . , m. Then, is opened independently with probability r · y . iNk È i R5. For any facility i, be it a central or a non-central facility, if i is opened (in R3 or R4), we install on it all services that are installed on it in the fractional l solution, i.e., all l such that yi > 0. p1 d1 + (1 - p1 )p2 d2 + · · · + (1 - p1 ) · · · (1 - pm-1 )pm dm n 1n . m pn dn n - (1 - pn ) m p n m R6. For every client j D - D , if no facility from Fj is Lemma 4.6. Let j be any demand. Let X be the open, we install service g (j ) on the facility opened distance between j and the facility assigned to it and Z be the event that no facility i Fj is open. Then, in R3 from Nnbr(j ) . 1094 X ¯ (i)X f j D , E |Z I / 3j + Cj ( ) + Cj , (ii) ¯ E Cj + e1 (3j + Cj ( )). r X i i Proof. If j D , E = Nj cij xij / Nj xij ¯j since every facility in Fj - Nj is farther from j than C every facility in Nj . For j D , we show (i) and use it / to prove (ii). Suppose j D - D , k = nbr(j ) and A = Nj Nk = . For any facility i A we have cij j and cik Ck ( ). Let B be the distance between j and its backup facility in Nk . Event Z implies that j is assigned to the backup facility in Nk so conditioned on ¯ Z , X = B . If there is some i A such that cik Ck we ¯k + Ck ( ). have a deterministic bound of B j + C If there is no such i in A, since the unconditional distance between k and the backup facility in Nk is at ¯ most Ck , by conditioning on Z we are only removing weight from facilities that are farther than the average distance. So the conditional expected distance between ¯ k and th= backup facility is at most Ck implying that e B X ¯ E X|Z E |Z j + Ck ( ) + Ck . In either case ¯ E |Z j + Cj ( ) + Cj , where the last inequality follows since we lo ok at clients in D in increasing order ¯ of Cj ( ) + Cj and k was picked before j . If j D, there must be a client k Dg(j ) such that / j was removed from Gg(j ) because a cluster was formed around k in step R1. So j, k are assigned to the same ¯ ¯ facility, 2k + Ck ( ) + Ck 2j + Cj ( ) + Cj and j + k . If a facility in Fk is op en then we have cj k a deterministic bound of X j + 2k . Otherwise j, k are assigned to the Xackup facility for k and by b f the above bound on E |Z or k , the conditional expected distance from j to the backup facility is at ¯ ¯ most j + 2k + Ck ( ) + Ck 3j + Cj ( ) + Cj . We now prove part (ii). For a non-central facility i Fj , let pi and Ei be as defined in Lemma 4.4 and let di = cij . For every k D such that Sk = Fj Nk = , 4 letdpk , Ek be as defined in Lemma i .4 and definie dk = = E istance from j to Sk |Ek Sk cij yi / Sk y i . Let the distances be ordered so that d1 d2 . . . dm where there are m events in all. Since them vents Ei are e independent and yi = xij , p = Pr[Z ] = n=1 (1 - pn ) e- n pn = e-r . So, solution of expected cost at most, r max + 1 3 4 ,1 + +r r r e (1 - )e e · OPT where r = 1/ . Taking = 0.67674 we get a solution of cost at most 2.391 · OPT . Proof. From the definition of Cj ( ) and the Markov ¯ C property we have, Cj ( ) 1-j . The pro of now follows from Lemmas 4.4 and 4.6. 5 The k-Median Variant We consider a variant of this problem where we have the additional constraint that at most ki facilities may yi k to the be opened. This adds the constraint linear program (P). The ob jective function of the dual j j - k z and constraint (D) gets mo dified to max j (2.2) changes to ij fi + z . Let (KP) and (KD) be the mo dified primal and dual programs and OPTK be the value of an optimal k -median solution. 5.1 A mo dified primal-dual algorithm. We first mo dify the primal-dual algorithm of Section 3 to obtain the stronger guarantee that we return a solution to (P) of cost (O, I , C ), and a jsolution (, , ) to (D) such j 6 · OPT , where O, I , C that 6O + I + C 6 denote respectively the facility opening cost, the service installation cost and the client assignment cost. We describe the algorithm briefly and sketch the analysis. The dual ascent pro cess is the same as in Section 3 but we mo dify the way in which we open facilities and install services to ensure that a demand j do es not pay for both opening a facility and for installing a service at some other facility. To do this we consider a more detailed notion of dependence between facilities. We classify 4 types of dependence between facilities. Say that the ordered pair (i, i ) is, (1) ff-dependent (f for facility) if there is a demand j such that ij , i j > 0. (2) sf-l dependent (s for service) if there exists j Gl such that ij , i j > 0. (3) ss-l dependent if for some j Gl , both ij , i j > 0. (4) fs-l dependent if for some j Gl , both ij , i j > 0. Recall that F is the set of tentatively open facilities, Fl F the set of tentatively open facilities on which service l is tentatively installed, ti is the time at which facility i F became tentatively open, and for i Fl , til is the time at which service l was tentatively installed at i. Initially for each facility i F , let Si be the set of services that are tentatively installed at i. È E X p1 d1 + (1 - p1 )p2 d2 + · · · X + (1 - p1 ) · · · (1 - pm-1 )pm dm + p · E |Z n m pn dn ¯ n (1 - p) + p(3j + Cj ( ) + Cj ) m p n 1 ¯ Cj + r (3j + Cj ( )). e Theorem 4.2. The randomized algorithm produces a 1095 M1. We first pick a set F F of facilities to open, Lemma 5.2. If j D , the assignment cost of j is at / and for each i F a set Ti Si of services to most 3(j - o(j )j ). If j D the assignment cost install at facility i. Initially F = and Ti = for incurred for j is at most 5j . all i. We consider facilities in F in the order given Combining the above two lemmas and the fact that by O. While F = , for j D , j = co(j )j + o(j )j + o(j )j , we get the 1. Let i F be the currently considered facility. following theorem. Remove i from F , set F F {i}, Ti = Si . Theorem 5.1. The solutionj returned has cost (O, I , C ) 2. For each i F we do the following. j 6 · OPT . such that 6O + I + C 6 ) a) If (i, i is ff-dependent or l Ti s.t. ) (i, i is sf-l dependent or l Si s.t. (i, i ) is fs-l dependent and ti l < ti , set 5.2 An 18-approximation algorithm. Suppose we fix z and run the above primal-dual algorithm with the a F F - {i }. Call i the neighbor of i facility costs mo dified to fi + z . Suppose the algorithm ) nd denote it by nbr(i . Otherwise, returns a primal solution of cost (O, I , C ) that opens ) b) For every l Si , if (i, i is fs-l dependent k facilities and a dual solution (, , ). Here O is the ) (so ti l ti ) or l Ti and (i, i is ss-l facility opening cost with the original costs fi . Then, dependent, set Si Si - {l}. we can show that we have a solution of cost at most i We open the facilities in F and for each i F 6 · OPTK . since (, , , z ) is a feasible solutionjto (KD) and by Theorem 5.1, 6(O + k z ) + I + C + 6 j = nstall all of the services in Ti at i. j j - k z ) 6 · OPTK . So our goal is to 6O + I + C 6( M2. We now install services at some more facilities. find such a z in polynomial time. We do not quite know Consider service type l. Let Al be the facilities in how to do this, instead we find two values z1 > z2 , close F at which service l is installed (i.e., l Ti ). Note together such that the algorithm opens k1 < k facilities that Al F Fl . Let Bl = Fl - F . We remove for z1 and k2 > k facilities for z2 . When z is very from Bl every facility i for which there is some large, e.g., |D|(maxij cij + maxil fil ), the algorithm will facility i F such that (1) (i, i ) is fs-l dependent, open just one facility and at z = 0 the algorithm opens or (2) i Al and (i, i ) is ss-l dependent. We say > k (we assume this -- otherwise the solution at z = 0 that a set of facilities is ss-l independent if no pair costs at most 6 · OPTK since (, , , 0) is a feasible dual (i, i ) of facilities from the set is ss-l dependent. We solution). We perform a bisection search in this range to pick a maximal ss-l independent subset Fl Bl . find z1 and z2 . We combine these two solutions to first Initially Fl = . We consider facilities in Bl in get a fractional solution of cost 6(1 - )-1 · OPTK , for increasing order of ti and add facility i to Fl if some small , that (fractionally) opens k facilities and in Fl {i} remains ss-l independent. For every i Fl which each demand is assigned to at most two facilities. we install service l on facility nbr(i) F . If the service cost depends only on the service type, we can round this solution losing a factor of 3(1 - ) to M3. Each client j is assigned to the nearest open facility get an integer solution that opens exactly k facilities, at which service g (j ) is installed. thus getting an overall approximation ratio of 18. This idea was used in [9] for the k -median variant of UFL. Analysis Sketch. Consider the set of demands D = However our final rounding pro cedure differs from the {j : i F s.t. ij > 0}. By the construction of F , we one in [9] since we also have to pay for installing services s know that for each demand j there is at most one i F and need to ensure that demand j is connected to an uch that ij > 0; for j D let o(j ) denote this unique open facility at which service g (j ) is installed. facility in F . By design we ensure that any demand j Gl contributes ij > 0 for at most one facility Theorem 5.2. If the service instal lation cost depends i Al Fl and further that if j D then i = o(j ) only on the service and not on the location, we can get is the only such facility. This gives the following. an 18-approximation algorithm for the k -median variant Lejmma 5.1. The cost of opening facilities is The cost of instal ling services is at D o ( j ) j . j j most o (j )j + D D j . / of facility location with service instal lation costs. 6 Extensions Arbitrary Demands. Our results carry over to the The bound on the assignment cost incurred is case where instead of unit demands, client j may have a similar to the bound in Lemma 4.3 and is proved demand dj 0. We can reduce this to the unit demand case by making dj copies of client j , but this makes similarly. 1096 the algorithm run in pseudo-polynomial time. We can simulate this reduction however. In the primal-dual algorithm we raise j at rate dj and say that j has reached i if j dj cij . In the LP rounding algorithms of Section 4 the only change is that j gets replaced with j /dj in steps A1, A2 and R1. The analysis in Sections 3, 4 and 5 extends in a straightforward way and we get the same approximation guarantees. Acknowledgments. We thank Robin Roundy for stimulating discussions and helpful suggestions. References [1] N. Aksoy and S. Erenguc. Multi-item inventory models with coordinated replenishments: a survey. International Journal of Operations & Production Management, 8:63­73, 1988. [2] A. Archer, R. Ra jagopalan and D. Shmoys. Lagrangian relaxation for the k-median problem: new insights and continuity prop erties. In Proceedings of 11th ESA, pages 31­42, 2003 [3] I. Baev and R. Ra jaraman. Approximation algorithms for data placement in arbitrary networks. In Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 661­670, 2001. [4] Y. Bartal, M. Charikar and D. Raz. Approximating min-sum k-clustering in metric spaces. In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, pages 11­20, 2001. [5] F. A. Chudak. Improved approximation algorithms for uncapacitated facility location. In Proceedings of 5th IPCO, pages 180­194, 1998. [6] F. Chudak and D. Shmoys. Improved approximation algorithms for the uncapacitated facility location problem. SIAM Journal on Computing. To app ear. [7] U. Feige. A threshold of ln n for approximating set cover. Journal of the ACM, 45:634­652, 1998. [8] K. Jain, M. Mahdian, E. Markakis, A. Sab eri, and V. Vazirani. Greedy facility location algorithms analyzed using dual-fitting with factor-revealing LP. Journal of the ACM. To app ear. [9] K. Jain and V.V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. Journal of the ACM, 48:274­296, 2001. [10] J. H. Lin and J. S. Vitter. -approximations with minimum packing constraint violation. In Proceedings of the 24th Annual ACM Symposium on Theory of Computing, pages 771­782, 1992. [11] M. Mahdian, Y. Ye, and J. Zhang. Improved approximation algorithms for metric facility location. In Proceedings of 5th APPROX, pages 229­242, 2002. [12] P. Mirchandani and R. Francis, eds. Discrete Location Theory. John Wiley and Sons, Inc., New York, 1990. [13] D. B. Shmoys. Approximation algorithms for facility location problems. In Proceedings of 3rd APPROX, pages 27­33, 2000. ´ [14] D. B. Shmoys, E. Tardos, and K. I. Aardal. Approximation algorithms for facility location problems. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing, pages 265­274, 1997. [15] M. Sviridenko. An improved approximation algorithm for the metric uncapacitated facility location problem. In Proceedings of 9th IPCO, pages 240­257, 2002. [16] C. Swamy. A note on the data placement problem. Unpublished manuscript, 2003. 1097