Efficiently Solving Convex Relaxations for MAP Estimation

M. Pawan Kumar, Department of Engineering Science, University of Oxford (pawan@robots.ox.ac.uk)
P.H.S. Torr, Department of Computing, Oxford Brookes University (philiptorr@brookes.ac.uk)

Abstract
The problem of obtaining the maximum a posteriori (MAP) estimate of a discrete random field is of fundamental importance in many areas of Computer Science. In this work, we build on the tree reweighted message passing (TRW) framework of (Kolmogorov, 2006; Wainwright et al., 2005). TRW iteratively optimizes the Lagrangian dual of a linear programming relaxation for MAP estimation. We show how the dual formulation of TRW can be extended to include cycle inequalities (Barahona & Mahjoub, 1986) and some recently proposed second order cone (SOC) constraints (Kumar et al., 2007). We propose efficient iterative algorithms for solving the resulting duals. Similar to the method described in (Kolmogorov, 2006), these algorithms are guaranteed to converge. We test our approach on a large set of synthetic data, as well as real data. Our experiments show that the additional constraints (i.e. cycle inequalities and SOC constraints) provide better results in cases where the TRW framework fails (namely MAP estimation for non-submodular energy functions).

Appearing in Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008. Copyright 2008 by the author(s)/owner(s).

1. Introduction
The problem of obtaining the maximum a posteriori (MAP) estimate of a discrete random field plays a central role in various applications, e.g. stereo reconstruction (Szeliski et al., 2006) and protein side-chain prediction (Sontag & Jaakkola, 2007). Furthermore, it is closely related to many important combinatorial optimization problems such as maxcut (Goemans & Williamson, 1995) and 0-extension (Karzanov, 1998).

It is therefore not surprising that a number of approximate MAP estimation approaches exist in the literature. One such class of approaches which provides a good approximation, both in theory and in practice, is based on convex relaxations (e.g. see (Kumar et al., 2007) for an overview). In this work, we focus on the issue of solving these relaxations efficiently with the goal of handling a large number of random variables, e.g. variables corresponding to pixels in an image.

A discrete random field is defined over random variables $\mathbf{v} = \{v_0, \cdots, v_{n-1}\}$, each of which can take a label from the set $\mathbf{l} = \{l_0, \cdots, l_{h-1}\}$. Throughout this paper, we will assume a conditional random field (CRF) while noting that all our results are applicable to the Markov random field framework. A CRF describes a neighbourhood relationship $\mathcal{E}$ between the variables such that $(a, b) \in \mathcal{E}$ if, and only if, $v_a$ and $v_b$ are neighbours. A labelling of the CRF is specified by a function $f : \{0, \cdots, n-1\} \rightarrow \{0, \cdots, h-1\}$ (i.e. variable $v_a$ takes label $l_{f(a)}$). Given data $D$, the energy of the labelling is given by

$$Q(f; D, \theta) = \sum_{v_a \in \mathbf{v}} \theta^1_{a;f(a)} + \sum_{(a,b) \in \mathcal{E}} \theta^2_{ab;f(a)f(b)}, \tag{1}$$

where $\theta^1_{a;f(a)}$ and $\theta^2_{ab;f(a)f(b)}$ are the data-dependent unary and pairwise potentials respectively, and $\theta$ denotes the parameter of the CRF. The problem of MAP estimation is to obtain the labelling $f^*$ with the minimum energy (or equivalently the maximum posterior probability), i.e. $f^* = \arg\min_f Q(f; D, \theta)$.
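As an illustration of equation (1), the following is a minimal sketch that evaluates the energy of a labelling and finds the MAP labelling by exhaustive search. The names `energy` and `map_brute_force`, the array layout, and the toy potentials are assumptions made for this example only; exhaustive search is feasible only for a handful of variables and is not the method proposed in this paper.

```python
import itertools
import numpy as np

def energy(f, unary, pairwise):
    """Energy Q(f; D, theta) of equation (1).

    unary[a][i]          plays the role of theta^1_{a;i}
    pairwise[(a, b)][i, j] plays the role of theta^2_{ab;ij}
    (hypothetical data layout chosen for this sketch)
    """
    total = sum(unary[a][f[a]] for a in range(len(f)))
    total += sum(theta[f[a], f[b]] for (a, b), theta in pairwise.items())
    return total

def map_brute_force(unary, pairwise):
    """Enumerate all h^n labellings and return the one with minimum energy."""
    n, h = unary.shape
    best = min(itertools.product(range(h), repeat=n),
               key=lambda f: energy(f, unary, pairwise))
    return best, energy(best, unary, pairwise)

# Toy chain of three binary variables with a penalty of 1 for disagreeing neighbours.
unary = np.array([[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]])
disagree = np.array([[0.0, 1.0], [1.0, 0.0]])
pairwise = {(0, 1): disagree, (1, 2): disagree}

f_star, q_star = map_brute_force(unary, pairwise)
print(f_star, q_star)
```

The search visits $h^n$ labellings, which is why relaxations are needed once $n$ reaches the size of an image.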

Related Work: We build upon the linear programming (LP) relaxation of (Wainwright et al., 2005), which we call LP-S (since it was first proposed by (Schlesinger, 1976) for the special case of hard constraint pairwise potentials). Although the LP-S relaxation can be solved in polynomial time using Interior Point algorithms, state-of-the-art software can only handle up to a few hundred variables due to its large memory requirements. To overcome this problem, two iterative algorithms were proposed by (Wainwright et al., 2005) for solving the dual of the LP-S relaxation. Similar to min-sum belief propagation (BP), these algorithms are not guaranteed to converge. The work of (Kolmogorov, 2006) addressed this problem by proposing a convergent sequential tree-reweighted message passing (TRW-S) algorithm for solving the dual. Despite its strong theoretical foundation, it was observed that TRW-S yields labellings with very high energies when the energy function contains non-submodular terms (Kolmogorov, 2006). This is not surprising since the LP-S relaxation provides an inaccurate approximation in such cases (e.g. see (Kumar et al., 2007)). In this work, we address this deficiency of TRW-S by appending some useful constraints to the LP-S relaxation.

Our Results: We show how the dual formulation of the LP-S relaxation can be extended to include linear cycle inequalities (Barahona & Mahjoub, 1986) (section 3). Furthermore, we incorporate the recently proposed second order cone (SOC) constraints of (Kumar et al., 2007) within this framework (section 4). Note that although the importance of cycle inequalities and SOC constraints is well-recognized, their use has been limited to a small number of random variables due to the lack of efficient algorithms (Sontag & Jaakkola, 2007). Our results on including these constraints within the TRW formulation allow us to develop efficient convergent algorithms for solving the resulting duals. We successfully apply these algorithms to several synthetic and real problems containing a large number of variables which could not be handled by previous approaches (section 5). Our experiments indicate that incorporating these constraints provides a much better approximation for the MAP estimation problem within reasonable computational times compared to several state-of-the-art algorithms. Additional experimental results and proofs are provided in (Kumar & Torr, 2008).

2. Preliminaries
We begin by introducing some notation which will allow us to describe our results concisely.

Optimal Energy and Min-Marginals: The energy of the optimal labelling and the min-marginals of random variables and of neighbouring random variables are given by the following equations respectively:

$$q(\theta) = \min_{f} Q(f; D, \theta), \tag{2}$$
$$q_{a;i}(\theta) = \min_{f, f(a)=i} Q(f; D, \theta), \tag{3}$$
$$q_{ab;ij}(\theta) = \min_{f, f(a)=i, f(b)=j} Q(f; D, \theta), \tag{4}$$

where the term $D$ is dropped from the left-hand side to make the notation less cluttered.

Reparameterization: A parameter $\bar{\theta}$ is called a reparameterization of the parameter $\theta$ (denoted by $\bar{\theta} \equiv \theta$) if, and only if,

$$Q(f; D, \bar{\theta}) = Q(f; D, \theta), \quad \forall f. \tag{5}$$

Over-complete Representations: A labelling $f$ can be represented using an over-complete set of boolean variables $\mathbf{y}$ defined as

$$y_{a;i} = \begin{cases} 1 & \text{if } f(a) = i, \\ 0 & \text{otherwise,} \end{cases} \qquad y_{ab;ij} = y_{a;i}\, y_{b;j}. \tag{6}$$

We also define variables $(\mathbf{x}, \mathbf{X})$ such that

$$x_{a;i} = 2y_{a;i} - 1, \qquad X_{ab;ij} = 4y_{ab;ij} - 2y_{a;i} - 2y_{b;j} + 1. \tag{7}$$

We will sometimes specify the additional constraints (i.e. cycle inequalities and SOC constraints) using variables $(\mathbf{x}, \mathbf{X})$, since they will allow us to write these constraints concisely.

The LP-S Relaxation: The LP-S relaxation of the MAP estimation problem is given by

$$\mathbf{y}^* = \arg\min_{\mathbf{y} \in LOCAL(\mathbf{v}, \mathcal{E})} \mathbf{y}^\top \theta, \qquad LOCAL(\mathbf{v}, \mathcal{E}) = \left\{ \mathbf{y} \;:\; y_{a;i} \in [0, 1],\; y_{ab;ij} \in [0, 1],\; \sum_{l_i \in \mathbf{l}} y_{a;i} = 1,\; \sum_{l_j \in \mathbf{l}} y_{ab;ij} = y_{a;i} \right\}. \tag{8}$$

The term $LOCAL(\mathbf{v}, \mathcal{E})$ stands for local consistency polytope (Wainwright et al., 2005) and denotes the feasibility region of the LP-S relaxation (specified by the above constraints for all $v_a \in \mathbf{v}$, $(a, b) \in \mathcal{E}$, $l_i, l_j \in \mathbf{l}$).

Dual of the LP-S Relaxation: Let $\mathcal{T}$ denote a set of tree-structured CRFs defined over subsets of the given random variables. For a CRF $T \in \mathcal{T}$, we denote its random variables by $\mathbf{v}_T$, its neighbourhood relationship by $\mathcal{E}_T$ and its parameter as $\theta^T$. The parameter $\theta^T$ consists of unary potentials $\theta^{T1}_{a;i}$ and pairwise potentials $\theta^{T2}_{ab;ij}$. Let $\rho = \{\rho(T), T \in \mathcal{T}\}$ be a set of non-negative real numbers which sum to one. Using the above notation, the dual of the LP-S relaxation can be written as follows (Kolmogorov, 2006; Wainwright et al., 2005):

$$\max_{\sum_T \rho(T)\theta^T \equiv \theta} \; \sum_T \rho(T)\, q(\theta^T). \tag{9}$$

The TRW-S Algorithm: Table 1 describes the TRW-S algorithm (Kolmogorov, 2006) which attempts to solve the dual of the LP-S relaxation. In other words, it solves for the set of parameters $\theta^T$, $T \in \mathcal{T}$, which maximize the dual (9). There are two main steps: (i) reparameterization, which involves running one pass of BP on the tree structured CRFs $T$; and (ii) an averaging operation. TRW-S is guaranteed not to decrease the value of the dual (9) at each iteration. Further, it can be shown that it converges to a solution which satisfies the weak tree agreement (WTA) condition (Kolmogorov, 2006).
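To make the dual (9) concrete, the sketch below evaluates its objective at one feasible point, using trees that consist of single edges and uniform weights $\rho(T) = 1/|\mathcal{T}|$, and compares it with the optimal energy $q(\theta)$ of equation (2) obtained by enumeration. The function names and the toy triangle are assumptions for this example; the code evaluates the dual at a single reparameterization and does not run TRW-S.

```python
import itertools
import numpy as np

def brute_force_min(n, h, unary, pairwise):
    """q(theta) of equation (2), by enumerating all labellings."""
    return min(
        sum(unary[a][f[a]] for a in range(n))
        + sum(t[f[a], f[b]] for (a, b), t in pairwise.items())
        for f in itertools.product(range(h), repeat=n)
    )

def edge_tree_dual(n, unary, pairwise):
    """Value of sum_T rho(T) q(theta^T) when every tree is one edge.

    Assumes every variable touches at least one edge. Unary potentials are
    split evenly among the trees containing the variable, so that
    sum_T rho(T) theta^T equals theta (the constraint of the dual (9)).
    """
    edges = list(pairwise)
    rho = 1.0 / len(edges)
    degree = np.zeros(n)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    dual = 0.0
    for a, b in edges:
        ua = unary[a] / (rho * degree[a])
        ub = unary[b] / (rho * degree[b])
        pab = pairwise[(a, b)] / rho
        # q(theta^T) for a two-variable tree: minimum over the h x h table.
        dual += rho * (ua[:, None] + ub[None, :] + pab).min()
    return dual

# A triangle whose edges all penalise equal labels (a non-submodular energy).
n, h = 3, 2
unary = np.zeros((n, h))
same = np.eye(h)
pairwise = {(0, 1): same, (1, 2): same, (0, 2): same}

primal = brute_force_min(n, h, unary, pairwise)
dual = edge_tree_dual(n, unary, pairwise)
print(primal, dual)
```

On this frustrated triangle every labelling must give two neighbours the same label, whereas each edge tree on its own can avoid the penalty, so the bound computed here lies strictly below the optimal energy.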
Initialization
1. For every $\omega \in \mathbf{v} \cup \mathcal{E}$, find all trees $\mathcal{T}_\omega \subseteq \mathcal{T}$ which contain $\omega$.
2. Initialize $\theta^T$ such that $\sum_T \rho(T)\theta^T \equiv \theta$.
   Typically, we set $\rho(T) = \frac{1}{|\mathcal{T}|}$ for all $T \in \mathcal{T}$.
   Then we can initialize $\theta^{T1}_{a;i} = \theta^1_{a;i} \frac{|\mathcal{T}|}{|\mathcal{T}_{v_a}|}$ for all $T \in \mathcal{T}_{v_a}$.
   Similarly, $\theta^{T2}_{ab;ij} = \theta^2_{ab;ij} \frac{|\mathcal{T}|}{|\mathcal{T}_{(a,b)}|}$ for all $T \in \mathcal{T}_{(a,b)}$.
Iterative Steps
3. Pick an element $\omega \in \mathbf{v} \cup \mathcal{E}$.
4. For all $T \in \mathcal{T}_\omega$, reparameterize $\theta^T$ to $\bar{\theta}^T$ such that
   (i) $\bar{\theta}^{T1}_{a;i} = q_{a;i}(\theta^T)$, if $\omega = v_a \in \mathbf{v}$,
   (ii) $\bar{\theta}^{T1}_{a;i} + \bar{\theta}^{T1}_{b;j} + \bar{\theta}^{T2}_{ab;ij} = q_{ab;ij}(\theta^T)$, if $\omega = (a, b) \in \mathcal{E}$.
   This step involves running one iteration of BP for $T$.
5. Averaging operation:
   (i) If $\omega = v_a \in \mathbf{v}$,
       (a) Compute $\nu_{a;i} = \frac{1}{\rho_a} \sum_{T \in \mathcal{T}_{v_a}} \rho(T)\, \bar{\theta}^{T1}_{a;i}$.
       (b) Set $\bar{\theta}^{T1}_{a;i} = \nu_{a;i}$, for all $T \in \mathcal{T}_{v_a}$.
   (ii) If $\omega = (a, b) \in \mathcal{E}$,
       (a) Compute $\nu_{ab;ij} = \frac{1}{\rho_{ab}} \sum_{T \in \mathcal{T}_{(a,b)}} \rho(T)\left(\bar{\theta}^{T1}_{a;i} + \bar{\theta}^{T1}_{b;j} + \bar{\theta}^{T2}_{ab;ij}\right)$.
       (b) Set $\bar{\theta}^{T1}_{a;i} + \bar{\theta}^{T1}_{b;j} + \bar{\theta}^{T2}_{ab;ij} = \nu_{ab;ij}$, for all $T \in \mathcal{T}_{(a,b)}$.
6. Repeat steps 3, 4 and 5 till convergence.

Table 1. The TRW-S algorithm. Recall that $\theta^{T1}_{a;i}$ and $\theta^{T2}_{ab;ij}$ are the unary and pairwise potentials for the parameter $\theta^T$. Similarly, $\bar{\theta}^{T1}_{a;i}$ and $\bar{\theta}^{T2}_{ab;ij}$ are the unary and pairwise potentials defined by the parameter $\bar{\theta}^T$. The terms $\rho_a = \sum_{T, v_a \in \mathbf{v}_T} \rho(T)$ and $\rho_{ab} = \sum_{T, (a,b) \in \mathcal{E}_T} \rho(T)$ are the variable and edge appearance terms for $v_a \in \mathbf{v}$ and $(a, b) \in \mathcal{E}$ respectively. In step 4, the value of the dual (9) remains unchanged. Step 5, i.e. the averaging operation, ensures that the value of the dual does not decrease. TRW-S converges to a solution which satisfies the WTA condition.

3. Adding Linear Constraints
We now show how the results of (Kolmogorov, 2006; Wainwright et al., 2005) can be extended to include an arbitrary number of linear cycle inequalities (Barahona & Mahjoub, 1986; Kumar et al., 2007). This requires us to incorporate cycle inequalities into the dual (9). We begin by briefly describing cycle inequalities. Consider a cycle of length $c$ in the graphical model of the given CRF, which is specified over a set of random variables $\mathbf{v}_C = \{v_b, b = a_1, a_2, \cdots, a_c\}$ such that $\mathcal{E}_C = \{(a_1, a_2), (a_2, a_3), \cdots, (a_c, a_1)\} \subseteq \mathcal{E}$. Further, let $\mathcal{E}_F \subseteq \mathcal{E}_C$ such that $|\mathcal{E}_F|$ (i.e. the cardinality of $\mathcal{E}_F$) is odd. Using these sets of edges, a cycle inequality can be specified as

$$\sum_{(a_k, a_m) \in \mathcal{E}_F} X_{a_k a_m; i_k i_m} - \sum_{(a_k, a_m) \in \mathcal{E}_C - \mathcal{E}_F} X_{a_k a_m; i_k i_m} \geq 2 - c, \tag{10}$$

where $l_{i_k}, l_{i_m} \in \mathbf{l}$. The variables $X_{a_k a_m; i_k i_m}$ are defined in equation (7). (Note that using the variable $\mathbf{y}$ would result in a less compact representation of cycle inequalities.) It can be shown that adding cycle inequalities to LP-S, i.e. problem (8), provides a better relaxation than LP-S alone. Their importance is reflected in their wide use in recent literature such as (Sontag & Jaakkola, 2007; Zwick, 1999).

In general, a set of $N_C$ cycle inequalities defined on a cycle $C = (\mathbf{v}_C, \mathcal{E}_C)$ (using different labels $l_{i_k}$ for variables $v_{a_k} \in \mathbf{v}_C$) can be written as $A^C \mathbf{y} \geq b^C$. In other words, for every cycle we can define up to $h^c$ cycle inequalities (where $h = |\mathbf{l}|$), i.e. $N_C \in \{0, 1, \cdots, h^c\}$. Let $\mathcal{C}$ be a set of cycles in the given CRF. Theorem 1 (given below) provides us with the dual of the LP relaxation obtained by appending problem (8) with cycle inequalities (defined over cycles in the set $\mathcal{C}$). We refer to the resulting relaxation as LP-C (where C denotes cycles).

Theorem 1: The following problem is the dual of problem (8) appended with a set of cycle inequalities $A^C \mathbf{y} \geq b^C$, for all $C \in \mathcal{C}$ (hereby referred to as the LP-C relaxation):

$$\begin{aligned} \max \quad & \sum_T \rho(T)\, q(\theta^T) + \sum_C \rho'(C)\, (b^C)^\top u^C, \\ \text{s.t.} \quad & \sum_T \rho(T)\, \theta^T + \sum_C \rho'(C)\, (A^C)^\top u^C \equiv \theta, \\ & u^C_k \geq 0, \; \forall k \in \{1, 2, \cdots, N_C\}, \; C \in \mathcal{C}. \end{aligned} \tag{11}$$

Here $\rho' = \{\rho'(C), C \in \mathcal{C}\}$ is some (fixed) set of non-negative real numbers which sum to one, and $u^C = \{u^C_k, k = 1, \cdots, N_C\}$ are some non-negative slack variables.

Similar to the dual (9), the above problem cannot be solved using standard software for a large number of variables $\mathbf{v}$. In order to overcome this deficiency we propose a convergent algorithm (similar to TRW-S) to approximately solve problem (11). We call our approach the TRW-S(LP-C) algorithm. In order to describe TRW-S(LP-C), we need the following definitions. We say that a tree structured random field $T = (\mathbf{v}_T, \mathcal{E}_T) \in \mathcal{T}$ belongs to a cycle $C = (\mathbf{v}_C, \mathcal{E}_C) \in \mathcal{C}$ (denoted by $T \in C$) if, and only if, there exists an edge $(a, b) \in \mathcal{E}_T$ such that $(a, b) \in \mathcal{E}_C$. In other words, $T \in C$ if they share a common pair of neighbouring random variables $(a, b) \in \mathcal{E}$. We also define the following problem:

$$\begin{aligned} \max \quad & \sum_{T \in C} \rho(T)\, q(\theta^T) + \rho'(C)\, (b^C)^\top u^C, \\ \text{s.t.} \quad & \sum_{T \in C} \rho(T)\, \theta^T + \rho'(C)\, (A^C)^\top u^C = \theta^C, \\ & u^C_k \geq 0, \; \forall k \in \{1, 2, \cdots, N_C\}, \end{aligned} \tag{12}$$

for some parameter $\theta^C$. The variables of the above problem are restricted to $u^C_k$, $\theta^{T1}_{a;i}$ and $\theta^{T2}_{ab;ij}$ where $(a, b) \in \mathcal{E}_T \cap \mathcal{E}_C$ for some $T \in C$. In other words, problem (12) has fewer variables and constraints than dual (11) and can be solved easily using standard Interior Point algorithms for small cycles $C$. As will be seen, even using cycles of size 3 or 4 results in much better approximations of the MAP estimation problem for non-submodular energy functions. Table 2 describes the convergent TRW-S(LP-C) algorithm for approximately solving the dual (11). The algorithm consists of two main steps: (i) solving problem (12) for a cycle; and (ii) running steps 4 and 5 of the TRW-S algorithm. Note that our approach is different from other generalizations of TRW, e.g. (Wiegernick, 2005) which computes marginals. Specifically, we do not cluster random variables but include additional constraints to reduce the feasibility region of the relaxation. Our experiments in section 5 show that, unlike (Wiegernick, 2005), our algorithms outperform BP in all the cases we tested. The properties of the TRW-S(LP-C) algorithm are summarized in § 3.1.
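The effect of a cycle inequality can be seen on a single triangle. The sketch below, written under the assumption of binary labels with the label $l_0$ chosen for every variable, maps over-complete variables to $X$ via equation (7) and checks inequality (10) for every odd-sized subset $\mathcal{E}_F$. The helper names are hypothetical and the code only tests feasibility; it does not solve problem (12).

```python
import itertools

def X_from_y(y_ab, y_a, y_b):
    """X_{ab;ij} = 4 y_{ab;ij} - 2 y_{a;i} - 2 y_{b;j} + 1, equation (7)."""
    return 4.0 * y_ab - 2.0 * y_a - 2.0 * y_b + 1.0

def cycle_lhs(X, cycle_edges, flipped):
    """Left-hand side of (10): sum over E_F minus sum over E_C - E_F."""
    return sum(X[e] if e in flipped else -X[e] for e in cycle_edges)

def satisfies_cycle_inequalities(X, cycle_edges, tol=1e-9):
    """Check (10) for every subset E_F of odd cardinality."""
    c = len(cycle_edges)
    for size in range(1, c + 1, 2):
        for subset in itertools.combinations(cycle_edges, size):
            if cycle_lhs(X, cycle_edges, set(subset)) < 2 - c - tol:
                return False
    return True

edges = [(0, 1), (1, 2), (2, 0)]

# Every integral labelling of the triangle satisfies all cycle inequalities.
integral_ok = all(
    satisfies_cycle_inequalities(
        {(a, b): X_from_y(float(f[a] == 0 and f[b] == 0),
                          float(f[a] == 0), float(f[b] == 0))
         for (a, b) in edges},
        edges)
    for f in itertools.product(range(2), repeat=3)
)

# A fractional point of LOCAL: y_{a;0} = 0.5 everywhere and y_{ab;00} = 0,
# i.e. every pair of neighbours "disagrees" with weight one.
fractional = {e: X_from_y(0.0, 0.5, 0.5) for e in edges}
fractional_ok = satisfies_cycle_inequalities(fractional, edges)

print(integral_ok, fractional_ok)
```

The fractional point gives $X = -1$ on all three edges, so with $\mathcal{E}_F = \mathcal{E}_C$ the left-hand side is $-3 < 2 - c = -1$; the inequality removes a point that LP-S alone admits.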
Initialization
1. Choose a set of tree structured random fields $\mathcal{T}$. Choose a set of cycles $\mathcal{C}$. For example, if the 4-neighbourhood is employed, $\mathcal{C}$ can be the set of all cycles of size 4.
2. Initialize $\theta^T$ such that $\sum_T \rho(T)\theta^T \equiv \theta$. Initialize $u^C_k = 0$ for all $C$ and $k$.
Iterative Steps
3. Pick an element $\omega \in \mathbf{v} \cup \mathcal{E}$. Find all cycles $\mathcal{C}_\omega \subseteq \mathcal{C}$ which contain $\omega$.
4. For a cycle $C \in \mathcal{C}_\omega$, compute $\theta^C = \sum_{T \in C} \rho(T)\theta^T + \rho'(C)(A^C)^\top u^C$ using the values of $\theta^T$ and $u^C$ obtained in the previous iteration. Solve problem (12) using an Interior Point method. Update the values of $\theta^T$ and $u^C$.
5. For all trees $T \in \mathcal{T}$ which contain $\omega$, run steps 4 and 5 of the TRW-S algorithm.
6. Repeat steps 3 and 4 for all cycles $C \in \mathcal{C}_\omega$.
7. Repeat steps 3 to 5 for all elements $\omega$ till convergence.

Table 2. The TRW-S(LP-C) algorithm.

3.1. Properties of the TRW-S(LP-C) Algorithm

Property 1: At each step of the algorithm, the reparameterization constraint is satisfied, i.e.

$$\sum_T \rho(T)\, \theta^T + \sum_C \rho'(C)\, (A^C)^\top u^C \equiv \theta. \tag{13}$$

The constraint in problem (12) ensures that the parameter vector $\theta^C$ of cycle $C$ remains unchanged. Hence, after step 4 of the TRW-S(LP-C) algorithm, the reparameterization constraint is satisfied. It was also shown that step 5 (i.e. running TRW-S) provides a reparameterization of $\theta$ (see Lemma 3.3 of (Kolmogorov, 2006) for details). This proves Property 1.

Property 2: At each step of the algorithm, the value of the dual (11) never decreases. Clearly, step 4 of the TRW-S(LP-C) algorithm does not decrease the value of the dual (11) (since the objective function of problem (12) is part of the objective function of dual (11)). The work of (Kolmogorov, 2006) showed that step 5 (i.e. TRW-S) also does not decrease this value. Note that the LP-C relaxation is guaranteed to be bounded since it dominates the LP-S relaxation (Kumar et al., 2007), which itself is bounded (Kolmogorov, 2006). Therefore, by the Bolzano-Weierstrass theorem (Fitzpatrick, 2006), it follows that TRW-S(LP-C) will converge.

Property 3: Like TRW-S, a necessary condition for convergence of TRW-S(LP-C) is that the parameter vectors $\theta^T$ of the trees $T \in \mathcal{T}$ satisfy WTA. This follows from the fact that TRW-S increases the value of the dual in a finite number of steps as long as the set of parameters $\theta^T$, $T \in \mathcal{T}$, do not satisfy WTA (see (Kolmogorov, 2006) for details).

Property 4: Unlike TRW-S, WTA is not a sufficient condition for convergence. One of the main drawbacks of the TRW-S algorithm is that it converges as soon as the WTA condition is satisfied. Experiments in (Kolmogorov, 2006) indicate that this results in high energy solutions for the MAP estimation problem when the energy function is non-submodular. Using a counter-example, it can be shown that WTA is not a sufficient condition for the convergence of TRW-S(LP-C) (Kumar & Torr, 2008).

Obtaining the Labelling: Similar to the TRW-S algorithm, TRW-S(LP-C) solves the dual (11) and not the primal problem. In other words, it does not directly provide a labelling of the random variables. In order to obtain a labelling, we use the same scheme as the one suggested in (Kolmogorov, 2006) for the TRW-S algorithm. Briefly, we assign labels to the variables $\mathbf{v} = \{v_0, v_1, \cdots, v_{n-1}\}$ in increasing order (i.e. we label variable $v_0$, followed by variable $v_1$ and so on). Let $\theta^{\mathcal{T}} = \sum_T \rho(T)\theta^T$. At each stage, a variable $v_a$ is assigned the label $l_{f(a)}$ such that

$$f(a) = \arg\min_{i,\, l_i \in \mathbf{l}} \left( \theta^{\mathcal{T}1}_{a;i} + \sum_{b < a,\, (a,b) \in \mathcal{E}} \theta^{\mathcal{T}2}_{ab;i f(b)} \right), \tag{14}$$

where $\theta^{\mathcal{T}1}_{a;i}$ and $\theta^{\mathcal{T}2}_{ab;i f(b)}$ are the unary and pairwise potentials corresponding to the parameter $\theta^{\mathcal{T}}$ respectively. It can be shown that under certain conditions the above procedure provides the optimal labelling (Meltzer et al., 2005).
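The sequential labelling rule of equation (14) can be sketched as follows. The function name `decode_sequential` and the dictionary-of-arrays layout are assumptions for this example, and the inputs stand in for the aggregated potentials obtained after the dual has been optimized; here they are simply toy numbers.

```python
import numpy as np

def decode_sequential(unary, pairwise):
    """Label variables v_0, v_1, ... in order, following equation (14).

    unary[a, i]            : aggregated unary potential of v_a taking label l_i
    pairwise[(b, a)][j, i] : aggregated pairwise potential for v_b = l_j, v_a = l_i
    Only neighbours b < a (already labelled) contribute to the cost of v_a.
    """
    n, _ = unary.shape
    f = []
    for a in range(n):
        cost = unary[a].astype(float)  # astype returns a copy
        for b in range(a):
            if (b, a) in pairwise:
                cost += pairwise[(b, a)][f[b], :]
            elif (a, b) in pairwise:
                cost += pairwise[(a, b)][:, f[b]]
        f.append(int(np.argmin(cost)))
    return f

# Toy chain with a penalty of 1 for disagreeing neighbours.
unary = np.array([[0.0, 1.0], [0.4, 0.5], [1.0, -0.5]])
disagree = np.array([[0.0, 1.0], [1.0, 0.0]])
pairwise = {(0, 1): disagree, (1, 2): disagree}

labels = decode_sequential(unary, pairwise)
print(labels)
```

The procedure is greedy: each variable is fixed once and never revisited, so its quality depends on how informative the optimized potentials are.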

4. Adding SOC Constraints
We now show how second order cone (SOC) constraints can be added to the dual (9). Specifically, we consider the two SOC constraints proposed in (Kumar et al., 2007) which result in the SOCP-C and SOCP-Q relaxations described below.

The SOCP-C Relaxation: Consider a set of random variables $\mathbf{v}_C = \{v_b, b = a_1, \cdots, a_c\} \subseteq \mathbf{v}$ such that $\mathcal{E}_C = \{(a_1, a_2), (a_2, a_3), \cdots, (a_c, a_1)\} \subseteq \mathcal{E}$ (i.e. $\mathbf{v}_C$ forms a cycle of length $c$). We define a vector $\mathbf{x}_C$ whose $k$th element is given by $x_{a_k; i_k}$ and a matrix $\mathbf{X}_C$ whose $(k, m)$th element is given by $X_{a_k a_m; i_k i_m}$ (where $l_{i_k}, l_{i_m} \in \mathbf{l}$). SOCP-C specifies constraints $\|\mathbf{U}^\top \mathbf{x}_C\|^2 \leq \mathbf{C} \bullet \mathbf{X}_C$ where $\mathbf{C} = \mathbf{D}_c + \lambda_c \mathbf{I} = \mathbf{U}\mathbf{U}^\top$ and $(\bullet)$ represents the Frobenius inner product. The $c \times c$ matrix $\mathbf{D}_c$ is given by

$$\mathbf{D}_c(i, j) = \begin{cases} (-1)^{c-1} & \text{if } |i - j| = c - 1, \\ 1 & \text{if } |i - j| = 1, \\ 0 & \text{otherwise,} \end{cases} \tag{15}$$

and $\lambda_c$ is the absolute value of the smallest eigenvalue of $\mathbf{D}_c$.

The SOCP-Q Relaxation: Consider a set of random variables $\mathbf{v}_C = \{v_b, b = a_1, \cdots, a_c\} \subseteq \mathbf{v}$ such that $\mathcal{E}_C = \{(a_i, a_j), i, j = 1, \cdots, c\} \subseteq \mathcal{E}$ (i.e. $\mathbf{v}_C$ forms a clique of size $c$). SOCP-Q specifies constraints of the form $\|\mathbf{U}^\top \mathbf{x}_C\|^2 \leq \mathbf{C} \bullet \mathbf{X}_C$ where $\mathbf{C}$ is a matrix whose elements are all 1.

In general, a set of $N_C$ SOC constraints on a cycle/clique can be defined as

$$\|A^C_k \mathbf{y} + b^C_k\| \leq \mathbf{y}^\top c^C_k + d^C_k, \quad k \in \{1, 2, \cdots, N_C\}. \tag{16}$$

Let $\mathcal{C}$ be a set of cycles/cliques in the graphical model of the given random field. The following theorem provides us with the dual of the SOCP relaxation obtained by appending problem (8) with SOC constraints defined over the set $\mathcal{C}$.

Theorem 2: The following problem is the dual of problem (8) appended with a set of SOC constraints $\|A^C_k \mathbf{y} + b^C_k\| \leq \mathbf{y}^\top c^C_k + d^C_k$ for $k \in \{1, 2, \cdots, N_C\}$ and $C \in \mathcal{C}$:

$$\begin{aligned} \max \quad & \sum_T \rho(T)\, q(\theta^T) - \sum_C \rho'(C) \sum_k p^C_k, \\ \text{s.t.} \quad & \sum_T \rho(T)\, \theta^T + \sum_C \rho'(C) \sum_k q^C_k \equiv \theta, \\ & \|u^C_k\| \leq v^C_k, \; \forall k \in \{1, 2, \cdots, N_C\}, \; C \in \mathcal{C}, \end{aligned} \tag{17}$$

where

$$p^C_k = (b^C_k)^\top u^C_k + d^C_k v^C_k, \tag{18}$$
$$q^C_k = (A^C_k)^\top u^C_k + c^C_k v^C_k. \tag{19}$$

Here $u^C_k$ and $v^C_k$ are some slack variables.

We can define up to $h^c$ SOC constraints for a cycle/clique, where $c$ is the size of the cycle/clique (i.e. $N_C \in \{0, 1, \cdots, h^c\}$). Before proceeding further, we also define the following problem:

$$\begin{aligned} \max \quad & \sum_{T \in C} \rho(T)\, q(\theta^T) - \rho'(C) \sum_k p^C_k, \\ \text{s.t.} \quad & \sum_{T \in C} \rho(T)\, \theta^T + \rho'(C) \sum_k q^C_k = \theta^C, \\ & \|u^C_k\| \leq v^C_k, \; \forall k \in \{1, 2, \cdots, N_C\}, \end{aligned} \tag{20}$$

where $\theta^C$ is some parameter vector. The variables of the above problem are restricted to $u^C_k$, $v^C_k$, $\theta^{T1}_{a;i}$ and $\theta^{T2}_{ab;ij}$ where $(a, b) \in \mathcal{E}_T \cap \mathcal{E}_C$. Like problem (12), we can solve problem (20) using standard Interior Point algorithms for small cycles/cliques $C$.

Similar to TRW-S(LP-C), a convergent algorithm can now be described for solving the dual (17). This algorithm differs from TRW-S(LP-C) only in step 4, where it solves problem (20) for a cycle/clique $C$ instead of problem (12). We refer to this algorithm as either TRW-S(SOCP-C) or TRW-S(SOCP-Q) depending upon the SOCP relaxation that we are solving. When using the TRW-S(SOCP-Q) algorithm, we include all slack variables corresponding to the cycle inequalities defined over the cycles in clique $C$. It can easily be shown that both TRW-S(SOCP-C) and TRW-S(SOCP-Q) satisfy all the properties given in § 3.1. Note that, like TRW-S and TRW-S(LP-C), these algorithms do not directly provide a labelling for the random variables of the CRF. Instead we use the procedure described in § 3.1 to obtain the final solution.
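The matrix of equation (15) and the resulting SOCP-C constraint are easy to inspect numerically. The sketch below builds $\mathbf{D}_c$, shifts it by $\lambda_c$ to obtain a positive semidefinite $\mathbf{C}$, and tests $\|\mathbf{U}^\top \mathbf{x}_C\|^2 \leq \mathbf{C} \bullet \mathbf{X}_C$ through the identity $\|\mathbf{U}^\top \mathbf{x}\|^2 = \mathbf{x}^\top \mathbf{C} \mathbf{x}$. The function names are hypothetical, and setting the diagonal of $\mathbf{X}_C$ to 1 is an assumption of this example rather than something stated above.

```python
import itertools
import numpy as np

def cycle_matrix(c):
    """D_c of equation (15), for a cycle of length c >= 3."""
    D = np.zeros((c, c))
    for i in range(c):
        for j in range(c):
            if abs(i - j) == c - 1:
                D[i, j] = (-1.0) ** (c - 1)
            elif abs(i - j) == 1:
                D[i, j] = 1.0
    return D

def socp_c_matrix(c):
    """C = D_c + lambda_c I, with lambda_c = |smallest eigenvalue of D_c|."""
    D = cycle_matrix(c)
    lam = abs(np.linalg.eigvalsh(D).min())
    return D + lam * np.eye(c)

def soc_constraint_holds(C, x, X, tol=1e-9):
    """Check x^T C x <= C . X, where '.' is the Frobenius inner product."""
    return float(x @ C @ x) <= float(np.sum(C * X)) + tol

C3 = socp_c_matrix(3)

# Integral points (x in {-1, +1}^3, X = x x^T) satisfy the constraint with equality.
integral_ok = all(
    soc_constraint_holds(C3, np.array(x, dtype=float), np.outer(x, x).astype(float))
    for x in itertools.product([-1, 1], repeat=3)
)

# Fractional point from the triangle example: x = 0, X = -1 off the diagonal,
# with the diagonal of X set to 1 (an assumption of this sketch).
x_frac = np.zeros(3)
X_frac = 2.0 * np.eye(3) - np.ones((3, 3))
fractional_ok = soc_constraint_holds(C3, x_frac, X_frac)

print(C3, integral_ok, fractional_ok)
```

For $c = 3$ the shifted matrix is the all-ones matrix, so on a triangle the SOCP-C constraint coincides with the SOCP-Q constraint for a clique of size 3.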

5. Experiments
We tested the approaches described in this paper using both synthetic and real data. For synthetic data experiments, we closely follow the setup of (Kolmogorov, 2006). We show that our algorithms overcome a well-known deficiency of TRW-S, namely that it does not provide good MAP estimates for non-submodular energy functions. Next, we consider the problem of segmentation using real data and show that our methods compare favourably with several other standard MAP estimation techniques.

5.1. Synthetic Data
Datasets: We conducted two sets of experiments using binary grid CRFs (i.e. $h = |\mathbf{l}| = 2$) of size 30 × 30. In the first experiment the edges of the graphical model, i.e. $\mathcal{E}$, were defined using a 4-neighbourhood system while the second experiment used an 8-neighbourhood system. Similar to (Kolmogorov, 2006), the unary potentials $\theta^1_{a;0}$ and $\theta^1_{a;1}$ were generated using the normal distribution $\mathcal{N}(0, 1)$. The pairwise potentials $\theta^2_{ab;00}$ and $\theta^2_{ab;11}$ were set to 0 while $\theta^2_{ab;01}$ and $\theta^2_{ab;10}$ were generated using $\mathcal{N}(0, \sigma^2)$. For both experiments, 50 CRFs were generated using the method described above. All the CRFs defined non-submodular energy functions (i.e. there exists an $(a, b) \in \mathcal{E}$ such that $\theta^2_{ab;01} + \theta^2_{ab;10} < 0$) which are in general NP-hard to minimize. As noted in (Kolmogorov, 2006), TRW-S performs considerably worse than BP on such examples.
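A generator for CRFs of this kind might look like the sketch below. It is only an approximation of the setup described above: the function names, the value of `sigma`, the random seed, and the choice to draw $\theta^2_{ab;01}$ and $\theta^2_{ab;10}$ independently are all assumptions of this example, not details taken from the experiments.

```python
import numpy as np

def grid_edges(rows, cols, eight_connected=False):
    """Edges of a grid graph; pixel (r, c) has index r * cols + c."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            a = r * cols + c
            if c + 1 < cols:
                edges.append((a, a + 1))             # right neighbour
            if r + 1 < rows:
                edges.append((a, a + cols))          # neighbour below
            if eight_connected and r + 1 < rows:
                if c + 1 < cols:
                    edges.append((a, a + cols + 1))  # diagonal, down-right
                if c - 1 >= 0:
                    edges.append((a, a + cols - 1))  # diagonal, down-left
    return edges

def random_binary_crf(rows, cols, sigma, rng, eight_connected=False):
    """Unary potentials ~ N(0, 1); off-diagonal pairwise terms ~ N(0, sigma^2)."""
    unary = rng.normal(0.0, 1.0, size=(rows * cols, 2))
    pairwise = {}
    for edge in grid_edges(rows, cols, eight_connected):
        theta = np.zeros((2, 2))          # theta_{ab;00} = theta_{ab;11} = 0
        theta[0, 1] = rng.normal(0.0, sigma)
        theta[1, 0] = rng.normal(0.0, sigma)
        pairwise[edge] = theta
    return unary, pairwise

def is_non_submodular(pairwise):
    """True if some edge has theta_01 + theta_10 < theta_00 + theta_11."""
    return any(t[0, 1] + t[1, 0] < t[0, 0] + t[1, 1] for t in pairwise.values())

rng = np.random.default_rng(0)            # seed chosen arbitrarily
unary, pairwise = random_binary_crf(30, 30, sigma=1.0, rng=rng)
print(unary.shape, len(pairwise), is_non_submodular(pairwise))
```

A 30 × 30 grid has 1740 edges under the 4-neighbourhood and 3422 under the 8-neighbourhood, which gives a sense of the problem sizes involved.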

Implementation Details: We tested the LP-C and the SOCP-C relaxations in the first experiment. Constraints were defined on all cycles of size 4. The LP-C and SOCP-Q relaxations were tested in the second experiment. Cycle inequalities were defined on all cycles of size 3. In addition, for SOCP-Q, SOC constraints were defined on all cliques of size 4. In both the experiments, our algorithms were tested using trees defined by individual edges of the graphical model for ease of implementation. In other words, each tree is $T = (\mathbf{v}_T, \mathcal{E}_T) \in \mathcal{T}$ such that $\mathbf{v}_T = \{v_a, v_b\}$ and $\mathcal{E}_T = \{(a, b)\} \subseteq \mathcal{E}$. However, we note here that our algorithms are general and can be applied for any choice of trees. Although our current set of trees is quite restrictive, the results show that our algorithms outperform several state-of-the-art algorithms. The TRW-S algorithm, as well as other standard approaches, was tested using the publicly available code which uses monotonic chains as trees. The terms $\rho(T)$ and $\rho'(C)$ were set to $1/|\mathcal{T}|$ and $1/|\mathcal{C}|$ respectively for all $T \in \mathcal{T}$ and $C \in \mathcal{C}$. We found it sufficient to define one cycle inequality per cycle $C$ using a set of labels $\{l_{i_1}, l_{i_2}, \cdots, l_{i_c}\}$ which satisfies

$$\sum_{(a_k, a_m) \in \mathcal{E}_F} \theta^2_{a_k a_m; i_k i_m} - \sum_{(a_k, a_m) \in \mathcal{E}_C - \mathcal{E}_F} \theta^2_{a_k a_m; i_k i_m} \geq \sum_{(a_k, a_m) \in \mathcal{E}_F} \theta^2_{a_k a_m; j_k j_m} - \sum_{(a_k, a_m) \in \mathcal{E}_C - \mathcal{E}_F} \theta^2_{a_k a_m; j_k j_m},$$

for all sets of labels $\{l_{j_1}, \cdots, l_{j_c}\}$. Here $\mathcal{E}_C = \{(a_1, a_2), \cdots, (a_c, a_1)\}$ and $\mathcal{E}_F \subseteq \mathcal{E}_C$ such that $|\mathcal{E}_F| = 3$. As proposed in (Kumar et al., 2007), we also define only one SOC constraint per cycle/clique when considering the SOCP-C and the SOCP-Q relaxations. At each iteration, problems (12) and (20) were solved using the MOSEK software (available at http://www.mosek.com).

Results: Figure 1 (a) shows the results obtained for the first experiment (with $\sigma$ chosen as a function of $d$, the degree of the variables in the graphical model). Note that since the energy functions are non-submodular, TRW-S provides labellings with higher energies than BP as observed in (Kolmogorov, 2006). However, the additional constraints in the LP-C and SOCP-C algorithms enable us to obtain labellings with lower energies than BP. Further, unlike BP, they also provide us with the value of the dual at each iteration. This value allows us to find out how close we are to the global optimum (since the energy of the optimal labelling cannot be less than the value of the dual). Also note that the value of the LP-C dual is greater than the value of the SOCP-C dual. This provides empirical evidence that LP-C dominates SOCP-C as conjectured in (Kumar et al., 2007).

The results of the second experiment are shown in Figure 1 (b), with $\sigma$ again chosen as a function of $d$. Again, BP outperforms TRW-S, while LP-C and SOCP-Q provide better approximations. The SOC constraints defined over cliques in SOCP-Q provide a greater value of the dual compared to the LP-C relaxation. The complexity and timings for all the algorithms are given in tables 3 and 4.

Figure 1. Results of the synthetic data experiment. (a) First experiment. The x-axis shows the iteration number. The lower curves show the average value of the dual at each iteration over 50 random CRFs while the upper curves show the average energy of the best labelling found till that iteration. The additional constraints in the LP-C and SOCP-C relaxations enable us to obtain labellings with lower energy compared to TRW-S and BP. Cycle inequalities provide a better approximation than the SOC constraint of the SOCP-C relaxation. (b) Second experiment. Note that the value of the dual obtained using SOCP-Q is greater than the value of the dual of the LP-C relaxation.

Algorithm   No. of Var.      No. of Cons.    Time (sec)
BP          -                -               0.0018
TRW-S       nh + |E|h^2      n + 2|E|h       0.0018
LP-C        nh + |E|h^2      2n + 2|E|h      7.5222
SOCP-C      nh + |E|h^2      2n + 2|E|h      8.9091

Table 3. Complexity and timings of the algorithms for the first synthetic data experiment with a 4-neighbourhood relationship. Recall that n = |v| is the number of random variables, h = |l| is the size of the label set and E is the neighbourhood relationship defined by the CRF. The second and third columns show the number of variables and constraints in the primal problem respectively. The fourth column shows the average time of each algorithm for one iteration (in seconds). All timings are reported for a Pentium IV 3.3 GHz processor with 2GB RAM.

Algorithm   No. of Var.      No. of Cons.    Time (sec)
BP          -                -               0.0027
TRW-S       nh + |E|h^2      n + 2|E|h       0.0027
LP-C        nh + |E|h^2      5n + 2|E|h      7.7778
SOCP-Q      nh + |E|h^2      6n + 2|E|h      9.1170

Table 4. Complexity and timings for the second synthetic data experiment with an 8-neighbourhood relationship. Note that SOCP-Q includes all the constraints of LP-C.

5.2. Real Data - Segmentation
We now present the results of our method on interactive segmentation (Boykov & Jolly, 2001) where, given some seed pixels for all the segments present in an image, we wish to obtain the segmentation of the image.

Problem Formulation: The problem of obtaining the segmentation of an image can be cast within the CRF framework. Specifically, we define a CRF over random variables $\mathbf{v} = \{v_0, \cdots, v_{n-1}\}$, where each variable corresponds to a pixel of the frame. Each label in the set $\mathbf{l} = \{l_0, \cdots, l_{h-1}\}$ corresponds to a segment (where $h$ is the total number of segments). The unary potential of assigning a variable $v_a$ to segment $l_i$ is specified by the negative log-likelihood of the RGB value of pixel $a$ given the seed pixels of the segment $l_i$. The pairwise potentials encourage continuous segments whose boundaries lie on image edges. For more details, we refer the reader to (Boykov & Jolly, 2001). The problem of obtaining the segmentation of a frame then boils down to that of finding the MAP estimate of the CRF.

Datasets and Implementation Details: We used the well-known `Garden' sequence to conduct our experiments (with frame size 120 × 175). The seed pixels were provided using the ground truth segmentation of a keyframe as shown in Fig. 2. Similar to the synthetic data experiment, we defined the trees as individual edges of the graphical model of the CRF for our algorithms. Other algorithms were tested using publicly available code (including TRW-S which uses monotonic chains as trees). We specified one cycle inequality and one SOC constraint for each cycle/clique (as described in the previous section). The terms $\rho(T)$ and $\rho'(C)$ were set to $1/|\mathcal{T}|$ and $1/|\mathcal{C}|$ respectively for all $T \in \mathcal{T}$ and $C \in \mathcal{C}$. Once again, problems (12) and (20) were solved using MOSEK.

Figure 2. Segmented keyframe of the `Garden' sequence. The left image shows the keyframe while the right image shows the corresponding segmentation provided by the user. The four different colours indicate pixels belonging to the four segments namely sky, house, garden and tree.

Results: For the first set of experiments, we used a 4-neighbourhood system and tested the following algorithms: TRW-S, LP-C, SOCP-C, αβ-swap, α-expansion and BP. Fig. 3 shows the segmentations (of frames other than the keyframe) and the values of the energy function obtained for all algorithms. Note that, by incorporating additional constraints using all cycles of length 4, LP-C and SOCP-C outperform other methods. Further, the cycle inequalities in LP-C provide better results than the SOC constraints of SOCP-C. Table 5 provides the average time for all algorithms.

The second set of experiments used an 8-neighbourhood system and tested the following algorithms: TRW-S, LP-C, SOCP-Q, αβ-swap, α-expansion and BP. For the LP-C algorithm, cycle inequalities were specified for all cycles of size 3. In addition, the SOCP-Q algorithm specifies SOC constraints on all cliques of size 4. Fig. 4 shows the segmentations and energies obtained for all the algorithms. The average timings per iteration are shown in table 5. Note that, similar to the synthetic data examples, SOCP-Q outperforms LP-C by incorporating additional SOC constraints.

Algorithm        Avg. Time-1 (s)   Avg. Time-2 (s)
BP               0.1400            0.1740
TRW-S            0.1400            0.1740
αβ-swap          0.1052            0.1201
α-expansion      0.1100            0.1240
LP-C             140.3320          142.2226
SOCP-C/SOCP-Q    143.6365          144.9890

Table 5. Average timings of the algorithms (per iteration) for the first experiment on video segmentation with a 4-neighbourhood relationship (column 2) and the second experiment with an 8-neighbourhood relationship (column 3). Again, all timings are reported for a Pentium IV 3.3 GHz processor with 2GB RAM.

Figure 3. Segmentations obtained for the `Garden' video sequence using 4-neighbourhood (columns: Input, BP, αβ-swap, α-expansion, TRW-S, LP-C, SOCP-C). The corresponding energy values (scaled up to integers for using αβ-swap and α-expansion) of all the algorithms are shown below the segmentation. The following constant terms are subtracted from the energy values of all algorithms for the three frames respectively (to make minimum energy among all algorithms 0): 5139499, 5145234 and 5126941.

Figure 4. Segmentations obtained for the `Garden' video sequence using 8-neighbourhood (columns: Input, BP, αβ-swap, α-expansion, TRW-S, LP-C, SOCP-Q). The corresponding energy values (reduced by 5304466, 5299756 and 5292224 for the three frames respectively) are also shown.

6. Discussion
We extended the LP-S relaxation based approach of (Kolmogorov, 2006; Wainwright et al., 2005) for the MAP estimation problem. Specifically, we showed how cycle inequalities and SOC constraints can be incorporated within the TRW framework. We also proposed convergent algorithms for solving the resulting duals. Our experiments indicate that these additional constraints provide a more accurate approximation for MAP estimation when the energy function is non-submodular. Although our algorithm is much faster than Interior Point methods, it is slower than TRW-S and BP. An interesting direction for future research would be to develop specialized algorithms for solving problems (12) and (20) (which are used in our approach).

References
Barahona, F., & Mahjoub, A. (1986). On the cut polytope. Mathematical Programming, 36, 157-173.
Boykov, Y., & Jolly, M. (2001). Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. ICCV (pp. I: 105-112).
Fitzpatrick, P. (2006). Advanced calculus. Thomson Brooks/Cole.
Goemans, M., & Williamson, D. (1995). Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of ACM, 42, 1115-1145.
Karzanov, A. (1998). Minimum 0-extension of graph metrics. European Journal of Combinatorics, 19, 71-101.
Kolmogorov, V. (2006). Convergent tree-reweighted message passing for energy minimization. PAMI, 28, 1568-1583.
Kumar, M. P., Kolmogorov, V., & Torr, P. H. S. (2007). An analysis of convex relaxations for MAP estimation. NIPS.
Kumar, M. P., & Torr, P. H. S. (2008). Efficiently solving convex relaxations for MAP estimation (Technical Report). Oxford Brookes University.
Meltzer, T., Yanover, C., & Weiss, Y. (2005). Globally optimal solutions for energy minimization in stereo vision using reweighted belief propagation. ICCV.
Schlesinger, M. (1976). Sintaksicheskiy analiz dvumernykh zritelnikh signalov v usloviyakh pomekh. Kibernetika, 4, 113-130.
Sontag, D., & Jaakkola, T. (2007). New outer bounds on the marginal polytope. NIPS.
Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., & Rother, C. (2006). A comparative study of energy minimization methods for Markov random fields. ECCV (pp. II: 16-29).
Wainwright, M., Jaakkola, T., & Willsky, A. (2005). MAP estimation via agreement on trees: Message passing and linear programming. IEEE Trans. on Information Theory, 51, 3697-3717.
Wiegernick, W. (2005). Approximations with reweighted generalized belief propagation. AISTATS.
Zwick, U. (1999). Outward rotations: A tool for rounding solutions of semidefinite relaxations, with applications to MAX CUT and other problems. STOC (pp. 679-687).