Approximate classification via earthmover metrics
Aaron Archer
A

Jittat Fakcharoenphol Kunal Talwarś

Chris Harrelson ´ Eva Tardos

Robert Krauthgamer§

bstract Given a metric space (X, d), a natural distance measure on probability distributions over X is the earthmover metric. We use randomized rounding of earthmover metrics to devise new approximation algorithms for two well-known classification problems, namely, metric labeling and 0-extension. Our first result is for the 0-extension problem. We show that if the terminal metric is decomposable with parameter  (e.g., planar metrics are decomposable with  = O(1)), then the earthmover based linear program (for 0-extension) can be rounded to within an O() factor. Our second result is an O(log n)-approximation for metric labeling, using probabilistic tree embeddings in a way very different from the O(log k )-approximation of Kleinberg and Tardos. (Here, n is the number of nodes, and k is the number of labels.) The key element is rounding the earthmover based linear program (for metric labeling) without increasing the solution's cost, when the input graph is a tree. This rounding method also provides an alternate proof to a result stated in Chekuri et al., that the earthmover based linear program is integral when the input graph is a tree. Our simple and constructive rounding techniques contribute to the understanding of earthmover metrics and may be of independent interest.
 Op erations Research Department, Cornell University, Ithaca, NY 14853. Email: aarcher@cs.cornell.edu. Supp orted by the Fannie and John Hertz Foundation and NSF grant CCR-0113371.  Kasetsart University, Bangkok, Thailand. Work partially done while the author was a graduate student at UC Berkeley. Email: jtf@ku.ac.th  UC Berkeley. Supp orted in part by a GAANN fellowship and NSF grant CCR-0105533. Email: chrishtr@cs.berkeley.edu § IBM Almaden Research Center, 650 Harry Road, San Jose CA 95120, USA. Part of this work was done while this author was with the International Computer Science Institute and with the Computer Science Division of U.C. Berkeley. Email: robi@almaden.ibm.com ś UC Berkeley. Work partially done while the author was interning at Microsoft Research. Email: kunal@cs.berkeley.edu Computer Science Department, Cornell University, Ithaca, NY 14853. Research supp orted in part by NSF grant CCR0113371. Email: eva@cs.cornell.edu

1 Intro duction The metric labeling problem has been proposed by Kleinberg and Tardos [KT02] to model classification problems from several domains, ranging from categorizing web documents to machine vision, including image recovery and the stereo matching problem. Metric labeling models situations in which we wish to label nodes in a graph given some prior information about their true labels, and also information about whether each pair of nodes should have similar labels. See [KT02] for a motivation from the perspective of Markov random fields. Formally, the input is an n-node graph G = (V , E ) with nonnegative edge weights w(ˇ, ˇ), a set of labels L = [k ],1 assignment costs c(v , i) for all v  V , i  L, and a metric d(ˇ, ˇ) on the set of labels. The goal is to find a labeling f : V  L that minimizes the total sum of two kinds of costs. For each node v  V we incur an assignment cost c(v , f (v )), which penalizes us for assigning v a label f (v ), according to the prior (un)likelihood of v having that label. For each edge uv  E we also incur a separation cost w(u, v ) ˇ d(f (u), f (v )), which penalizes us for assigning dissimilar labels to adjacent nodes. The 0-extension problem is a special case of metric labeling, in which k of the nodes are designated as terminals, whose labels are fixed in advance, and all assignment costs are zero. When, in addition, the metric on the labels is uniform, the problem further reduces to the multiway cut problem, which is NP-hard (and actually APX-hard [DJP+ 94]) even for just three terminals. Thus, 0-extension and metric labeling are also NP-hard. Earthmover metric. In this paper, we consider a linear program (LP) for these problems based on a family of metrics, called earthmover metrics. Given a metric (X, d), the earthmover metric with respect to (X, d) is a natural extension of d to probability distributions over X . Consider mounds of earth sitting on points of X , and suppose that moving an amount of earth from point i to point j takes ˇ d(i, j ) work.
1 Throughout,

[k] denotes the set {1, . . . , k}.

Copyright Š 2004 by the Association for Computing Machinery, Inc. and the Society for industrial and Applied Mathematics. All Rights reserved. Printed in The United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Association for Computing Machinery, 1515 Broadway, New York, NY 10036 and the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688

1079


If x and y are probability distributions over X , define dEM (x, y ) to be the minimum amount of work necessary to move the mounds from the x configuration to the y configuration. There are several important properties of the earthmover metric. First, if d is a metric, then so is dEM (as the reader has already guessed from the name). Second, if xi denotes the probability distribution (over X ) that has all its mass concentrated on point i  X (i.e., xi = 1 and xl = 0 for all l = i), then dEM (xi , xj ) = d(i, j ) for all i, j  X . Hence, dEM is indeed an extension of d. The distance dEM (x, y ) is given by solving a transportation problem. That is, it is the optimal value of the following linear program (LP), where each "flow" variable zij represents the amount of earth to be moved from label i to label j when going from configuration x to configuration y. Minimize i d(i, j )zij

Adding integrality constraints on the variables yields an exact formulation. For the 0-extension problem, we identify the labels with the k terminals, fixing xii = 1 for all i  L. All assignment costs are zero, so the first term in the ob jective function is zero. Hence this LP relaxation also makes sense for 0-extension. 1.1 Preliminaries and related work

Definition 1.2. (Decomposable metric) A metric space (X, d) is cal led -decomposable if for every  > 0, there is a randomized algorithm that partitions X into clusters (subsets) {Xl }l such that2 1. Each Xl has diameter at most  . 2. For every x, y  X , the probability that x and y fal l in different clusters is at most  d(x,y) . 

We say that a graph (or family of graphs) is -decomposable if the shortest path metric on the ,j X graph is -decomposable. For example, it is easy to j sub ject to zij = xi i  X see that a line metric is 1-decomposable. It is wellknown (see [AP90, LR99, LS93]) that any graph on n i zij = yj j  X vertices is O(log n)-decomposable, and that for some graphs (expanders) this bound is the best possible. zij  0 i, j Klein, Plotkin and Rao [KPR93] showed that planar graphs are O(1)-decomposable, and that more generally, Note that in the special case where the metric d is graphs excluding a fixed minor of size r are O(r3 )uniform, the earthmover distance dEM is proportional to decomposable. Fakcharoenphol and Talwar [FT03] the 1 distance between the two distributions. Indeed, improved the latter bound to O(r2 ). Charikar et 1 1 one of our original motivations for this work was to al. [CCG+ 98] showed that p -metrics are dmax{ p ,1- p } d better understand the relationship between 1 metrics decomposable for any p  1, and that this bound is and earthmover metrics. tight. Earthmover based relaxation. The classification The 0-extension problem was posed by Karzanov problems mentioned above require us to map every [Kar98], who gave an LP relaxation that we will call node of G to a label in L. A natural relaxation is the semi-metric LP relaxation. He showed that this to allow the nodes to be mapped into the earthmover linear program is integral when the terminal metric metric that corresponds to the label metric (L, d). This is a tree (actually, a larger class of bipartite graphs relaxation can be expressed by an LP. We write this LP that contains trees; here, a tree metric cannot contain in terms of dEM , since this is how we shall think about Steiner nodes). Calinescu, Karloff and Rabani [CKR01] it. Namely, the configuration assigned to node v  V , gave an O(log k ) approximation for 0-extension based denoted by xv , represents the vector {xvi }iL (whose on this LP relaxation. For the special case of input components are the actual variables in the LP). This graphs that are planar, they gave an O(1) approximafurther suppresses the |L|2 flow variables necessary for tion algorithm. In fact, their algorithm extends to representing dEM on each edge of G. decomposable input graphs, giving an O() approximaDefinition 1.1. (Earthmover LP) The earthmover tion. Fakcharoenphol et al. [FHRT03] improved the general result to O(log k / log log k ). On the other had, the n linear program for metric labeling is as fol lows. semi-metric relaxation is known to have an ( log k ) v u integrality gap [CKR01]. Finally, we note that Lee and Min. c(v , i)x + w d (x , x )
vi uv EM u v

s.t.

i V ,iL
L

v E

xvi = 1

v  V i  L, v  V

xvi  0

2 A stronger definition of decomp osability that is sometimes used (padded decomp osition) requires that the distance from a vertex to its cluster's b oundary is well distributed. Our results do not require this condition.

1080


Naor [LN03] have recently and independently obtained a 0-extension result similar to ours, but using the semimetric relaxation. The semi-metric LP relaxation allows nodes to be mapped into an arbitrary metric containing the terminal metric.3 Since the earthmover metric is a specific metric of this type (containing the terminal metric), the earthmover LP relaxation has an integrality gap no larger than that of the semi-metric relaxation. (Furthermore, all the above approximation algorithms still work.) However, it is still unknown whether the (worst-case) integrality gaps of the two relaxations are different. Interestingly, it is not known if the integrality gap of the earthmover LP relaxation is more than a constant, even for metric labeling. The metric labeling problem has been studied for some time in the computer vision community. A polynomial-time algorithm for line metrics and a constant-factor approximation for uniform metrics were given by Boykov et al. [BVZ98, BVZ01]. Gupta and Tardos [GT00] gave a constant factor approximation on truncated line metrics. Kleinberg and Tardos [KT02] gave a 2-approximation algorithm for the case of a uniform metric on the labels, and combined this with Bartal's probabilistic tree embeddings [Bar98] to obtain an O(log k log log k )-approximation for general metrics. Fakcharoenphol et al. [FRT03] recently improved Bartal's result, leading to an improved bound of O(log k ). The earthmover LP was proposed independently by Charikar [Cha00] and by Chekuri et al. [CKNZ01]. The first successful use of this LP was in [CKNZ01], where it was used to give matching or improved algorithms for certain classes of distance functions d (convex and truncated linear), which had been previously addressed using flow techniques and local search [BVZ98, BVZ01, GT00, IG98]. It is worth noting that this is the first LP relaxation that directly models the general metric labeling problem. Previously, LP relaxations were only known for various simple classes of metrics. Interestingly, the history of the multiway cut problem parallels that of 0-extension in some ways. Dahlhaus et al. [DJP+ 94] introduced the problem, showed it is APX-hard, and gave a 2(1 - 1/k ) approximation using a combinatorial isolation heuristic. It is known that a simple randomized algorithm based on the semi-metric LP relaxation matches this factor, and that the integrality gap for this LP is also 2(1 - 1/k ) [Vaz01, p.155]. Calinescu et al. [CKR00] strengthened the LP
3 Since several no des may b e mapp ed onto the same p oint in the containing metric, the induced distances b etween mapp ed no des form only a semi-metric; hence the name. Our relaxation has the same prop erty.

by embedding the nodes of V in 1, using it to obtain a 1 ( 3 - k )-approximation. Karger et al. [KKS+ 99] further 2 improved these results to a bound approaching 1.38 as k  . The LP used for these algorithms is exactly the same as the earthmover LP when the metric on labels is uniform. It is plausible that the earthmover LP will be similarly used to improve the approximation bound for the general case of the 0-extension and metric labeling problems. We believe that our results make progress in this direction. 1.2 Our contributions. We devise two approximation algorithms that are based on rounding the earthmover LP. Our techniques improve the understanding of earthmover metrics and may be of independent interest. For 0-extension, we show in Section 2 that if the terminal metric is -decomposable, then there is a simple rounding algorithm for the earthmover LP which approximates the LP solution by an integral solution within a factor of O(). Previously, this approximation ratio was known to hold only under the more stringent requirement that the input graph is -decomposable. Technically, we give a randomized rounding procedure such that the expected separation cost of every two vertices of G is no more than O() times larger than their earthmover distance (which is simply their contribution to the LP ob jective). One case in which this is a significant improvement on the previous known approximations is the following. Suppose that there is a very large number of terminals, but they all lie in a low-dimensional normed space. In such cases, we know that  depends only on the dimen sion (e.g.  = d for Euclidean space). Hence we can use our rounding algorithm to give an approximation which depends only on the dimension, and does not depend on the number of terminals (or the size of G) at all. For metric labeling, we give in Section 3 an algorithm that achieves an O(log n)-approximation. In many cases, we would expect n, the number of ob jects to be labeled, to far exceed k , the number of labels, and then our approximation guarantee is worse than the O(log k ) currently known. However, there are applications where k n. For instance, in recognizing the position of a human body from an image (see [FH00]), the ob jects are the handful of rigid moving parts, and each part needs to be labeled with a six-tuple representing position, rotation and scale.4 Our new algorithm for metric labeling uses the earthmover LP as follows. The LP solution defines
4 The mo del used in [FH00] uses a different distance function for each edge, but is otherwise identical to our mo del.

1081


an earthmover metric on V , which we probabilistically approximate with a tree, and then round. The crux is that our rounding method incurs no loss on the tree. Our use of tree approximations contrasts with that of [KT02], which instead approximates the label metric by a tree, then uses a specialized LP for trees. Our rounding method also proves that the earthmover LP has no integrality gap when the input graph is a tree. This phenomenon was previously mentioned by Chekuri et al. [CKNZ01] (a proof appears in their upcoming journal version), but our proof is very different. Interestingly, this implies, for 0-extension on tree graphs G, a distinction between the integrality ratio of the earthmover LP and the semi-metric relaxations. The former is integral as a special case of the result above, while the latter has integrality ratio at least 2 - o(1) (e.g., if G is a star whose leaves are the terminals). 2 0-extension algorithm for decomp osable terminal metrics In this section we show that if the metric d on labels (terminals) is -decomposable, then a solution to the 0-extension earthmover LP relaxation can be rounded, via a randomized algorithm, to an integral solution whose expected cost is no more than O() times the LP cost. Since, as remarked above, any k -point metric is O(log k )-decomposable, we also get an O(log k ) approximation to the 0-extension problem in general.5 For better decomposable metrics, of course, the approximation ratio is better. Let T = [k ] be the set of terminals and d be a metric on T . For each vertex u, let xu = xu1 , xu2 , . . . , xuk be the distribution over terminals that is associated with u in the solution to the earthmover LP, and let dEM be the induced earthmover metric. For a vertex u  V , let Au = miniT {dEM (xu , i)} be the distance from u to its closest terminal. At a high level, the rounding algorithm consists of the following steps, each bringing us closer to an integral solution. 1. Break up V into groups Vs = {u : Au  2s }, containing vertices u with approximately the same value of Au . We will round each group separately. 2. Truncate the distribution xu so that it is concentrated only on terminals "close" to u.

4. Project the distribution xu onto representatives, so now all its mass is concentrated on representative terminals close to u. 5. Round each vertex in Vs to one of the representative terminals using the rounding algorithm of Kleinberg and Tardos [KT02] for metric labeling on uniform metrics. Intuitively, this last step works since to the vertices in Vs , the metric on the representatives looks approximately uniform. 2.1 Formal rounding algorithm. Assume without loss of generality that the smallest distance between terminals is 1, and denote the largest distance by . Let Decomp(T ,  ) be the decomposition algorithm implied by Definition 1.2. 1. r0  0. 2. For s = 1, 2, . . . , log2  : 2.1 Pick rs uniformly at random in [2s-1 , 2s ]. 2.2 (Grouping ) Let Vs = {u  V : rs-1 < Au  rs }. 2.3 Let  = 2s . 2.4 (Truncation ) Pick uniformly at random   [4, 7 ]. For each u  v , zero out xui whenever dEM (xu , i) >  and rescale the remaining xui 's so that their sum is 1. Let xu be the resulting vector. 2.5 Let {Tl }l be Decomp(T ,  ). the clusters output by

2.6 Pick an arbitrary terminal il in each cluster Tl , and make it the cluster's representative. 2.7 (Projection ) For u  Vs , pro ject the distributions xu from the clusters onto the representatives: i u i = il Tl x i xui  0 otherwise 2.8 (Rounding ) Apply on (Vs , x ) the rounding algorithm of [KT02] for metric labeling on the uniform metric.

2.2 Analysis. Let f be the final assignment output 3. Decompose the terminal metric into clusters of by our algorithm, i.e. vertex u is assigned to termidiameter 2s and choose a representative terminal nal f (u). For an edge uv , we will sometimes refer to dEM (u, v ) = dEM (xu , xv ) as the cost of uv ; the associfrom each cluster. ated distribution vectors will be implicit from context. 5 We can in fact do slightly b etter; see the discussion at end of Also, since for terminals i, j , dEM (i, j ) = d(i, j ), we will this Section. when convenient use the latter notation.

1082


Our analysis is edge by edge. We show that for every edge uv , the expected value of d(f (u), f (v )) is no more than O() times dEM (xu , xv ). Notice that the initial step of grouping breaks up the problem into several subproblems which we solve separately. Consequently, every edge can be either an intragroup edge, if its endpoints are in the same group Vs , or an intergroup edge that goes across different groups. Looking at the intragroup edges first, we show that each subsequent step of the algorithm does not increase the expected cost of any intragroup edge by too much. Intergroup edges are easier, and will be handled at the end. Truncation. In the truncation step, we pick  uniformly at random in [4, 7 ] and truncate each xui to be concentrated only on terminals within distance  from u. We then rescale the xui 's to restore their sum to 1. We first show that we never need to rescale by too much. Lemma 2.1. For any vertex u and any terminal i, 3 xui  2 xui . Proof. Let i be the terminal closest to u and consider the iteration in which u  Vs . The grouping phase ensures that dEM (u,ii )  rs   .  By definition, dEM (u, i ) = =i xui dEM (i , i)  i i 1 :dEM (i ,i)>3 3 xui . Thus :dEM (i ,i)>3 xui  3 and i so :dEM (i ,i)3 xui  2/3. Moreover, by the triangle inequality, dEM (u, i)  dEM (u, i ) + dEM (i , i). Thus at least two-thirds of the probability mass in xu is concentrated on terminals within distance 4 from u. Since   4 , the truncation removes no more than a third of the probability mass, and the scaling therefore requires multiplication by a factor no larger than 3/2. W e now show that the transformation does not increase the cost of an edge uv by much.6 Lemma 2.2. For any intragroup edge uv , E[dEM (xu , xv )]  16 ˇ dEM (xu , xv ). i Proof. Recall that dEM (xu , xv ) = j zij d(i, j ), where the zij flow variables transform the distribution xu to xv . We will show how to modify the zij 's slightly so that they become a valid flow to transform xu to xv , and such that the expected cost of this new flow is not too large. Clearly the earthmover distance dE M (xu , xv ), being the cost of the best flow, is no larger.
6 We make no attempts to optimize the constants here or anywhere else in the paper.

To define the new flow zij , we need to set up some terminology. Let Tu and Tv be the sets of terminals within diistance  from u and from v , respectively. Let pu = Tu xui and qu = 1 - pu , and define pv and qv similarly. Without loss of generality, assume pu  pv . The original flow variables z naturally fall into four categories : fpp , fpq , fqp and fqq , where, for example, the flow fpq goes from i  Tu to j  Tv . Notice that the new flow zij transforms xu to xv , and is thus nonzero only on flow going from Tu to Tv . To create the new flow, first scale up the first kind of flow fpp by a factor of 1/pu . Set the flows in fqp and fqq to zero. Finally, scale up the flows in fpq by 1/pu and re-route them greedily to terminals in Tv . It is easy to verify that the total flow leaving Tu is 1, the total flow leaving V \ Tu is 0, the total flow entering V \ Tv is 0, and by conservation of flow, the total flow entering Tv is 1. Furthermore, the total flow leaving every i  Tu is at most xui , and, since pu  pv , the last step can be done so that the total flow entering every i  Tv is at most xvi . Thus we get a feasible flow. Notice that by Lemma 2.1, pu , pv  2/3, and hence the scaling only increases any cost by a small factor. The flows in fpq are rerouted to terminals further away. Thus these flows contribute to an increase in cost, and this cost needs to be bounded. We first show that there isn't too much such flow. Consider any flow variable zij . For this flow to be in fpq or fqp (note that we now cannot assume that pu  pv ), it must be the case that exactly one of the two events dEM (u, i)   and dEM (v , j )   happen. In other words,  must fall between dEM (u, i) and dEM (v , j ). The probability of that happening is clearly at most |dEM (v , j ) - dEM (u, i)|/3 . Further, by the triangle inequality, |dEM (v , j ) - dEM (u, i)|  dEM (u, v ) + dEM (i, j ). Thus flow zij falls in the bad categories with probability at most min{1, (dEM (u, v ) + dEM (i, j ))/3 }. On the other hand, if a flow in fpq is rerouted, it now goes from i  Tu to some j  Tv (or is distributed amongst more than one such j ). The distance traveled by the flow is dEM (i, j )  dEM (i, u) + dEM (u, v ) + dEM (v , j )  dEM (u, v ) + 14 . Thus the expected cost of the rerouted flow (ignoring the scaling) is i zij ˇ Pr[zij is rerouted] ˇ (cost of rerouting)
,j


i

zij ˇ min{1, (dEM (u, v ) + dEM (i, j ))/3 }
,j

ˇ(dEM (u, v ) + 14 ) i 14 i ˇ zij (dEM (i, j ) + dEM (u, v ))  zij ˇ dEM (u, v ) + 3 ,j ,j

1083


Kleinb erg-Tardos rounding. We now go over the rounding algorithm given by Kleinberg and Tardos 29 [KT02] for rounding the 1 linear program they used = dEM (u, v ), 3 for metric labeling when the terminal metric is uniform. where in the second to last step, we use the fact Given a set of terminals 1, . . . , k , a set of vertices V and i distribution vectors xu for every u  V , their algorithm that ,j zij dEM (i, j ) is exactly dEM (xu , xv ). Since the scaling is by a factor no larger than 3/2 assigns each vertex u  V to some terminal. The algorithm is as follows: C d (1 + 29 ) ˇ 3 = 16, the lemma follows. an  dEM (u, v ) +
3 2

14 14 dEM (u, v ) + dEM (u, v ) 3 3

1. While there is an unassigned vertex do lustering and pro jection. Intuitively, we would like 1.1 Pick a terminal i uniformly at random from the clustering and pro jection steps to convert the metric T. on the host space T to an approximately uniform metric. 1.2 Pick a value c uniformly at random from [0, 1]. On this metric, the earthmover distance is almost the same as the 1 metric, and we can the proceed using the 1.3 Assign to i all u  V such that xui  c. Kleinberg-Tardos rounding. It is not difficult to show that the algorithm terWe now show that clustering of terminals and the pro jection does just this. More precisely, we relate the minates in expected polynomial time. Kleinberg and 1 distance between xu and xv with the earthmover Tardos show that the algorithm has the following nice properties. We repeat the proof here for completeness. distance dEM (xu , xv ). Lemma 2.3. For any u, v  Vs , E[||xu - xv ||1 ]   ˇ dEM (xu , xv ) 2s Lemma 2.4. The algorithm KT has the fol lowing properties: ˇ For any vertex u  V , the probability that u gets assigned to i is exactly xui .

Proof. Let the flow values zij be such that i i dEM (xu , xv ) = We will make use ˇ For a pair of vertices u, v  V , the probability that u ,j d(i, j )z j . of those values below. Let (i, j ) be 1 if terminals i and and v get assigned to different terminals is at most j are separated by the clustering step, and 0 otherwise. xu - xv 1. Let E M be the earthmover distance w.r.t. . Now we Proof. In any iteration, conditioned on u being unashave that signed, the probability that it gets assigned to i is ex1 (2.1) E[||xu - xv ||1 ] = E[E M (xu , xv )] actly |T | xui . Thus the first property follows. Moreover, i 1 it follows that each vertex has a probability |T | of being (2.2)  E[ (i, j )zij ] assigned to some terminal in each iteration. ,j i Now let us look at a pair of vertices u and v . The i (2.3) d(i, j )z j  probability that exactly one of u and v is assigned 2s ,j to some terminal in a particular iteration is exactly  iu 1 1 = dE M (xu , xv ). |x i - xvi | = |T | xu - xv 1. |T | 2s Thus, the probability that u and v are assigned to The first equality (2.1) holds because  is a uniform met- different terminals can be upper bounded by ric, for which the earthmover metric is 1. The following Pr[exactly one of u and v assigned in an iteration] inequality (2.2) holds because one (not necessarily optiu v Pr[at least one of u and v assigned in an iteration] mal) way to route flow from x to x is to take the flow induced by zij , i.e., the flow from one cluster representaPlugging in the values of these probabilities, the second tive to another equals the total flow that zij has routed I property follows. from all the nodes in the first cluster to all the nodes in the second cluster; in this case, flow originating from zij is associated with cost (i, j ). The last inequality (2.3) ntragroup summary. Since the Kleinberg-Tardos algorithm assigns a vertex u to a terminal i with nonis due to the requirements of Definition 1.2. A zero xui , our truncation step ensures that any vertex u lso note that since the clusters are of diameter  , is assigned to a terminal within distance 8 from u. the distribution xu for any terminal u is still concenWhen vertices u and v are assigned to different trated on terminals within distance 8 of u. terminals f (u) and f (v ), the distance d(f (u), f (v )) is

1084


steps above could in fact have been done at any scale larger than 2s ; the analysis for the intragroup edges still goes through. Thus, if the terminal metric has a better decomposition at some large scale, the intragroup edges do not overpay in expectation. (The bound on the distance of a vertex from its assigned terminal, however, goes up, increasing the expected cost of intergroup edges.) To take advantage of this, we pick a random i  Lemma 2.5. For every u, v  Vs , [0, 1 log log k ], and do the decomposition at scale 2s+i 2 instead. In expectation, this can be shown to be E[d(f (u), f (v ))] = O() ˇ dEM (xu , xv ). a worst-case O(log k / log log k )-decomposition, and the intergroup edges are also fine in expectation. Thus we Intergroup edges. We now argue about the inter- can improve the result for general graphs from O(log k ) to O(log k / log log k ), matching the result in [FHRT03]. group edges. It can be shown that the techniques of [CKR01] Lemma 2.6. For any edge uv  E , the contribution imply an O( ) approximation to the problem when the to the expected cost of uv due to intergroup cutting is terminal metric satisfies |B (x, 2r)|/|B (x, r)|  2 for O(dEM (xu , xv )). every x and r. (Here B (x, r) is the set of points within radius r from x in the metric.) Since any such metric Proof. As argued above, each vertex u  Vs is assigned has a  -decomposition, their results follow from ours. to a terminal closer than 8 ˇ 2s . Let uv  E be However, there are metrics where  is much larger than an edge that goes from u  Vs to v  Vs where  (e.g. d-dimensional Euclidean spaces have  = d  s  s. Then d(f (u), f (v )) is at most dEM (f (u), u) + but have d-decompositions) and thus our results are s+3 dEM (u, v ) + dEM (v , f (v )). Since dEM (u, f (u))  2 stronger than those implied by previous techniques. and dEM (f (v ), v )  2s+3 , it follows that d(f (u), f (v ))  dEM (u, v ) + 2s+4 . 3 Metric lab eling via tree-rounding Moreover, note that uv is an intergroup edge In this section we use the earthmover LP to obtain an only if Au  rs-1 < Av , which happens with O(log n)-approximation for the metric labeling problem. s-2 probability at most min{1, (Av - Au )/2 }  The basic idea of our algorithm is to use the LP to define s-2 min{1, dEM (u, v )/2 }. Thus the expected contribuan embedding of the nodes of V into the earthmover tion to the length of an edge uv due to the intergroup metric space defined from the label metric (L, d). We cutting is O(dEM (u, v )). W then approximate the metric space (V , dEM ) with a tree, e note that the proof of lemma 2.6 does not and round in a coordinated way based on the tree. The use any property of the earthmover distance, and this crux of this method is that we can round on a tree while grouping can be applied to the semi-metric relaxation as preserving both separation costs along tree edges and well. This leads to a much simpler proof of the Calinescu all assignment costs. We start off by showing how to do et al. [CKR01] result on decomposable graphs. We omit this. the details here. Combining lemmas 2.5 and 2.6, we get Theorem 2.1. The algorithm described above gives an O() approximation algorithm to the 0-extension problem when the terminal metric is -decomposable. We note that each of the random steps of the algorithm can be easily derandomized using standard techniques, leading to a deterministic algorithm with the same performance guarantee. We omit the details from this extended abstract. Discussion. Recall that it is always the case that  = O(log k ), so that our algorithm has an O(log k ) worstcase bound. However, the clustering and pro jection Co ordinated rounding. Given two distributions x and y over labels, we show how to randomly round x and y to labels in a way that obeys the specified distributions, but is coordinated so that the expected distance between the resulting labels is dEM (x, y ). For i-1 i  [k + 1], define the breakpoint Bi = =1 xi . Identify label i with the interval Ii = [Bi , Bi+1 ). Generate a uniform random variable Ux  [0, 1), and round x to the label into whose interval Ux falls. We now use the flow variables zij to determine how to round y . Further subdivide Ii into intervals Iij = [Bij , Bi(j +1) ), where j -1 Bij = Bi + =1 zij , j  [k + 1]. If Ux lands in the subinterval Iij , we round x to i (since Ux  Iij  Ii ) and y to j . We also generate a [0, 1) random variable Uy as a

bounded by dEM (f (u), u) + dEM (u, v ) + dEM (v , f (v )), where dEM (f (u), u)  8 and dEM (f (v ), v )  8 . Moreover, by Lemma 2.4, the probability that u and v are assigned different terminals is at most xu - xv 1. Thus, using Lemmas 2.2 and 2.3, the expected value of dEM (f (u), f (v )) is at most min{1, 16(/ ) ˇ dEM (u, v )} ˇ (dEM (u, v ) + 16 ) = O() ˇ dEM (u, v ). Thus we have shown the following.

1085


function of Ux (for use later in further propagating the rounding). Break [0, 1) into intervals Ij , and partition each Ij into subintervals Iij by the zij flow values, analogously to what we did for x. If Ux  Iij , then set Uy = Bij + (Ux - Bij ). That is, Uy is defined so that it lands in the same spot in interval Iij as Ux did in interval Iij . This process leads easily to the following proposition.

2. Define the metric dLP on V by dLP (u, v ) = dEM (xu , xv ). Use Theorems 3.1 and 3.2 to generate a random tree T spanning just V that probabilistically approximates dLP with dilation O(log n). 3. Compute the earthmover distances along each edge of T to obtain the flow variables. Select one node arbitrarily as the root, randomly round it, and propagate the results to the rest of the tree.

Proposition 3.1. In the rounding scheme above, x Let A and S denote the assignment and separation and y are each rounded to labels with probabilities costs of our solution, while A LP and SLP denote the matching their given distributions, and the expected corresponding costs in the optimal LP solution. distance between these labels is dEM (x, y ). Moreover, Theorem 3.3. Our algorithm achieves E[A] = ALP Uy is distributed uniformly in [0, 1). and E[S ]  O(log n)SLP . Thus, it provides O(log n)This rounding technique easily extends to trees, approximation for metric labeling. where each node is assigned a distribution over labels. Fix an arbitrary root node r, and generate a uniform Proof. The LP solution provides a lower bound on the [0, 1) random variable Ur . Round r according to Ur cost of OP T . By Proposition 3.1, the nodes of the tree using the procedure described above, and propagate this are each rounded to labels according to the distribution rounding to all of r's children, thereby generating an specified by the LP, so the assignment cost is preserved associated random rounding variable Uc for each child in expectation. The separation cost is preserved in c. In this way, we propagate the rounding all the way expectation along each tree edge, so by the triangle inequality, E[d(f (u), f (v ))]  dT (u, v ) for all u, v  V down the tree. (here the expectation is over the randomized rounding). Corollary 3.1. If the input graph is a tree, the earth- But ET [dT (u, v )]  O(log n)dEM (xu , xv ), so summing mover LP has an integrality gap of 1. over the edges of G yields the desired bound. W We note that the metric labeling problem is polye note that this algorithm can also be easily nomial time solvable on trees using dynamic programming. The above corollary then provides an alternative derandomized. We omit the details. approach. More importantly, this rounding technique leads to a new approach for the general metric labeling Acknowledgment We would like to thank Eran Halperin and Satish Rao for several helpful discussions. problem. Our algorithm. A probabilistic tree approximation of References a metric d with distortion c is a probability distribution over a collection of trees T such that for all u, v  V , we [AP90] B. Awerbuch and D. Peleg. Sparse partitions. have ET T [dT (u, v )]  cd(u, v ), while for every T  T In 31st Annual IEEE Symposium on Foundations of we have dT (u, v )  d(u, v ). In general, the trees may Computer Science, pages 503­513, 1990. contain Steiner points not in the original metric space. [Bar98] Y. Bartal. On approximating arbitrary metrics by Our algorithm uses the following two results. tree metrics. In 30th Annual ACM Symposium on
Theory of Computing, pages 161­168. ACM, 1998.

Theorem 3.1. (Fakcharoenphol et al. [FRT03]) [BVZ98] Y. Boykov, O. Veksler, and R. Zabih. Markov For every n-point metric d there is a probabilistic tree random fields with efficient approximations. In 7th approximation with distortion O(log n) from which we IEEE International Conference on Computer Vision, can sample in polynomial time. pages 377­384, 1998.
[BVZ01] Y. Boykov, O. Veksler, and R. Zabih. Fast ap-

Theorem 3.2. (Gupta [Gup01]) For every tree T = proximate energy minimization via graph cuts. IEEE (V , E , w) and a set of required vertices V  V , there Transactions on Pattern Analysis & Machine Intel ligence, 23(11):1222­1239, Nov. 2001. exists a tree T  = (V , E  , w ) such that for al l u, v  V , [CCG+ 98] M. Charikar, C. Chekuri, A. Goel, S. Guha, and dT (u, v )  dT  (u, v )  8dT (u, v ). Here is our algorithm: 1. Solve the earthmover LP.

S. Plotkin. Approximating a finite metric by a small number of tree metrics. In 39th Annual Symposium on Foundations of Computer Science, pages 379­388, 1998.

1086


[Cha00] M. Charikar. Personal communication, January 2000. [CKNZ01] C. Chekuri, S. Khanna, J. Naor, and L. Zosin. Approximation algorithms for the metric labeling problem via a new linear programming formulation. In 12th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 109­118, 2001. [CKR00] G. Calinescu, H. J. Karloff, and Y. Rabani. An improved approximation algorithm for multiway cut. JCSS, 60(3):564­574, 2000. [CKR01] G. Calinescu, H. Karloff, and Y. Rabani. Approximation algorithms for the 0-extension problem. In 12th Annual ACM-SIAM Symposium on Discrete Algorithms, 2001. [DJP+ 94] E. Dahlhaus, D. S. Johnson, C. H. Papadimitriou, P. D. Seymour, and M. Yannakakis. The complexity of multiterminal cuts. SIAM J. Comput., 23(4):864­894, 1994. [FH00] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient matching of pictorial structures. In IEEE Computer Vision and Pattern Recognition Conference, pages 66­ 73, 2000. [FHRT03] J. Fakcharoenphol, C. Harrelson, S. Rao, and K. Talwar. An improved approximation algorithm for the 0-extension problem. In 14th Annual ACMSIAM Symposium on Discrete Algorithms, pages 257­ 265, 2003. [FRT03] J. Fakcharoenphol, S. Rao, and K. Talwar. A tight bound on approximating arbitrary metrics by tree metrics. In 35th Annual ACM Symposium on Theory of Computing, pages 448­455, 2003. [FT03] J. Fakcharoenphol and K. Talwar. Improved decompositions of graphs with forbidden minors. In 6th International Workshop on Approximation Algorithms for Combinatorial Optimization, pages 36­46, 2003. ´ [GT00] A. Gupta and E. Tardos. A constant factor ap-

proximation algorithm for a class of classification algorithms. In 32nd Annual ACM Symposium on Theory of Computing, pages 652­658, 2000. [Gup01] A. Gupta. Steiner points in tree metrics don't (really) help. In 12th Annual ACM-SIAMSymposium on Discrete Algorithms, pages 220­227, 2001. [IG98] H. Ishikawa and D. Geiger. Segmentation by grouping junctions. In IEEE Conference on Computer Vision and Pattern Recognition, 1998. [Kar98] A. Karzanov. Minimum 0-extensions of graph metrics. European Journal of Combinatorics, 19(1):71­ 101, 1998. [KKS+ 99] D. R. Karger, P. Klein, C. Stein, M. Thorup, and N. E. Young. Rounding algorithms for a geometric embedding of minimum multiway cut. In 31st Annual ACM Symposium on Theory of Computing, pages 668­ 678, 1999. [KPR93] P. Klein, S. A. Plotkin, and S. Rao. Excluded minors, network decomposition, and multicommodity flow. In 25th Annual ACM Symposium on Theory of Computing, pages 682­690, May 1993. ´ [KT02] J. M. Kleinberg and E. Tardos. Approximation algorithms for classification problems with pairwise relationships: metric labeling and markov random fields. Journal of the Association for Computing Machinery, 49:616­639, 2002. [LN03] J. R. Lee and A. Naor. Metric decomposition, smooth measures, and clustering, 2003. Submitted. [LR99] F. T. Leighton and S. Rao. Multicommodity maxflow min-cut theorems and their use in designing approximation algorithms. JACM, 46(6):787­832, 1999. [LS93] N. Linial and M. Saks. Low diameter graph decompositions. Combinatorica, 13(4):441­454, 1993. [Vaz01] V. V. Vazirani. Approximation Algorithms. Springer-Verlag, New York, 2001.

1087