Local Algorithms for Approximate Inference in Minor-Excluded Graphs

Anonymous Author(s)

Abstract

We present a new local approximation algorithm for computing the MAP assignment and the log-partition function of an arbitrary exponential-family distribution represented by a finite-valued pairwise Markov random field (MRF), say $G$. Our algorithm is based on decomposing $G$ into appropriately chosen small components, computing estimates locally in each of these components, and then producing a good global solution. We prove that the algorithm can provide an approximate solution within arbitrary accuracy when $G$ excludes some finite-sized graph as a minor and $G$ has bounded degree; all planar graphs with bounded degree are examples of such graphs. The running time of the algorithm is $O(n)$ ($n$ is the number of nodes in $G$), with the constant dependent on the accuracy, the degree of the graph, and the size of the graph that is excluded as a minor (a constant for planar graphs). Our algorithm for minor-excluded graphs uses the decomposition scheme of Klein, Plotkin and Rao (1993). In general, our algorithm works with any decomposition scheme and provides a quantifiable approximation guarantee that depends on the decomposition scheme.

1 Introduction

The Markov random field (MRF) based exponential family of distributions allows representing distributions in an intuitive parametric form; it has therefore been successful for modeling in many applications. Specifically, consider an exponential family on $n$ random variables $X = (X_1, \dots, X_n)$ represented by a pairwise (undirected) MRF with graph structure $G = (V, E)$, where the vertex set is $V = \{1, \dots, n\}$ and the edge set is $E \subset V \times V$. Each $X_i$ takes values in a finite alphabet $\Sigma$ (e.g. $\Sigma = \{0, 1\}$). The joint distribution of $X = (X_i)$: for $x = (x_i) \in \Sigma^n$,

$$\Pr[X = x] \propto \exp\Big( \sum_{i \in V} \phi_i(x_i) + \sum_{(i,j) \in E} \psi_{ij}(x_i, x_j) \Big). \qquad (1)$$

Here, the functions $\phi_i : \Sigma \to \mathbb{R}_+ = \{x \in \mathbb{R} : x \ge 0\}$ and $\psi_{ij} : \Sigma^2 \to \mathbb{R}_+$ are assumed to be arbitrary non-negative (real-valued) functions.
The two most important computational questions of interest are: (i) finding the maximum a posteriori (MAP) assignment $x^*$, where $x^* = \arg\max_{x \in \Sigma^n} \Pr[X = x]$; and (ii) computing marginal distributions of variables, i.e. $\Pr[X_i = x]$ for $x \in \Sigma$, $1 \le i \le n$. MAP is equivalent to a minimal-energy assignment (or ground state), where the energy $\mathcal{E}(x)$ of a state $x \in \Sigma^n$ is defined as $\mathcal{E}(x) = -\mathcal{H}(x) + \text{Constant}$, with

$$\mathcal{H}(x) = \sum_{i \in V} \phi_i(x_i) + \sum_{(i,j) \in E} \psi_{ij}(x_i, x_j).$$

Similarly, computing marginals is equivalent to computing the log-partition function, defined as

$$\log Z = \log \sum_{x \in \Sigma^n} \exp\Big( \sum_{i \in V} \phi_i(x_i) + \sum_{(i,j) \in E} \psi_{ij}(x_i, x_j) \Big).$$

¹Here, we assume the positivity of the $\phi_i$'s and $\psi_{ij}$'s for simplicity of analysis.

In this paper, we will find $\varepsilon$-approximate solutions of MAP and the log-partition function: that is, $\hat{x}$ and $\log \hat{Z}$ such that

$$(1-\varepsilon)\,\mathcal{H}(x^*) \le \mathcal{H}(\hat{x}) \le \mathcal{H}(x^*), \qquad (1-\varepsilon) \log Z \le \log \hat{Z} \le (1+\varepsilon) \log Z.$$

Previous Work. The question of finding MAP (or a ground state) comes up in many important application areas such as coding theory, discrete optimization, and image denoising. Similarly, the log-partition function is used in counting combinatorial objects, loss-probability computation in computer networks, etc. Both problems are NP-hard for exact and even (constant-factor) approximate computation for an arbitrary graph $G$. However, applications require solving these problems using very simple algorithms. A plausible approach is as follows. First, identify a wide class of graphs that admit simple algorithms for computing MAP and the log-partition function. Then, try to build the system (e.g. codes) so that such good graph structure emerges, and use the simple algorithm, or else use the algorithm as a heuristic. This approach has produced many interesting recent results, starting with the belief propagation (BP) algorithm designed for tree graphs [1]. Since there is a vast literature on this topic, we recall only a few results.
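To make the quantities defined above concrete, the following minimal sketch computes $\log Z$ and a MAP assignment by brute-force enumeration for a tiny MRF (all function and variable names here are ours, chosen for illustration; this is exactly the computation that is intractable for large $n$ and that the paper's local algorithm approximates):

```python
import itertools
import math

def brute_force_inference(n, sigma, phi, psi, edges):
    """Exhaustively compute log Z and a MAP assignment for a tiny pairwise MRF.

    phi[i][x] is the node potential, psi[(i, j)][(x, y)] the edge potential;
    H(x) = sum_i phi_i(x_i) + sum_{(i,j)} psi_ij(x_i, x_j).
    Feasible only for very small n: the loop is over |sigma|^n assignments.
    """
    log_z_terms = []
    best_x, best_h = None, float("-inf")
    for x in itertools.product(sigma, repeat=n):
        h = sum(phi[i][x[i]] for i in range(n))
        h += sum(psi[(i, j)][(x[i], x[j])] for (i, j) in edges)
        log_z_terms.append(h)
        if h > best_h:
            best_h, best_x = h, x
    # log-sum-exp over all assignments, stabilized by the maximum term
    m = max(log_z_terms)
    log_z = m + math.log(sum(math.exp(t - m) for t in log_z_terms))
    return log_z, best_x, best_h
```

For a two-node model with $\phi_0(1) = 1$ and all other potentials zero, this returns $\log Z = \log(2 + 2e)$ and a MAP assignment with $x_0 = 1$.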
Two important algorithms are generalized belief propagation (BP) [2] and the tree-reweighted algorithm (TRW) [3,4]. Key properties of interest for these iterative procedures are the correctness of fixed points and convergence. Many results characterizing properties of the fixed points are known, starting from [2]. Various sufficient conditions for their convergence are known, starting from [5]. However, simultaneous convergence and correctness of such algorithms have been established only for specific problems, e.g. [6].

Finally, we discuss two relevant results. The first concerns properties of TRW. The TRW algorithm provides a provable upper bound on the log-partition function for an arbitrary graph [3]; however, to the best of the authors' knowledge, the error is not quantified. TRW for MAP estimation has a strong connection to a specific linear programming (LP) relaxation of the problem [4]. This was made precise in a sequence of works by Kolmogorov [7] and Kolmogorov and Wainwright [8] for binary MRFs. It is worth noting that the LP relaxation can be poor even for simple problems. The second result is an approximation algorithm proposed by Globerson and Jaakkola [9] for computing the log-partition function using planar graph decomposition (PDC). PDC uses techniques of [3] in conjunction with a known result about exact computation of the partition function for a binary MRF when $G$ is planar and the exponential family has a specific form. Their algorithm provides a provable upper bound for an arbitrary graph. However, they do not quantify the error incurred. Further, their algorithm is limited to binary MRFs.

Contribution. We propose a novel local algorithm for approximate computation of MAP and the log-partition function. For any $\varepsilon > 0$, our algorithm can produce an $\varepsilon$-approximate solution for MAP and the log-partition function for an arbitrary MRF $G$ as long as $G$ excludes a finite graph as a minor (precise definition later). For example, planar graphs exclude $K_{3,3}$ and $K_5$ as minors.
The running time of the algorithm is $O(n)$, with the constant dependent on $\varepsilon$, the maximum vertex degree of $G$, and the size of the graph that is excluded as a minor. Specifically, for a planar graph with bounded degree, it takes $C(\varepsilon)\,n$ time to find an $\varepsilon$-approximate solution, with $\log \log C(\varepsilon) = O(1/\varepsilon)$. In general, our algorithm works for any $G$ and we can quantify the bound on the error it incurs. It is worth noting that, unlike many previous works, our algorithm provides a provable lower bound on the log-partition function as well. The precise results for minor-excluded graphs are stated in Theorems 1 and 2. The results concerning general graphs are stated in Lemmas 2, 3 and 4 for the log-partition function and Lemmas 5, 6 and 7 for MAP.

Techniques. Our algorithm is based on the following idea. First, decompose $G$ into small connected components, say $G_1, \dots, G_k$, by removing a few edges of $G$. Second, compute estimates (of either MAP or the log-partition function) in each $G_i$ separately. Third, combine these estimates to produce a global estimate while accounting for the effect of the removed edges. We show that the error in the estimate depends only on the removed edges. This error-bound characterization applies to arbitrary graphs. Klein, Plotkin and Rao [10] introduced a clever and simple decomposition method for minor-excluded graphs to study the gap between max-flow and min-cut for multicommodity flows. We use their method to obtain a good edge set for decomposing a minor-excluded $G$, so that the error induced in our estimate is small (and can be made as small as required). In general, as long as $G$ admits such a good edge set for decomposing it into small components, our algorithm will provide a good estimate. To compute estimates in the individual components, we use dynamic programming. Since each component is small, this is not computationally burdensome.
One may, however, obtain even simpler heuristics by replacing dynamic programming with another method, such as BP or TRW, for the computation within the components.

2 Preliminaries

Here we present useful definitions and previous results about the decomposition of minor-excluded graphs from [10,11].

Definition 1 (Minor Exclusion) A graph $H$ is called a minor of $G$ if we can transform $G$ into $H$ through an arbitrary sequence of the following two operations: (a) removal of an edge; (b) merging two connected vertices $u, v$: that is, remove the edge $(u,v)$ as well as the vertices $u$ and $v$, add a new vertex, and make all edges that were incident on $u$ or $v$ incident on this new vertex. If $H$ is not a minor of $G$, we say that $G$ excludes $H$ as a minor.

The following observations may help in understanding the definition. Any graph $H$ with $r$ nodes is a minor of $K_r$, the complete graph on $r$ nodes: one may obtain $H$ by removing from $K_r$ the edges that are absent in $H$. More generally, if $G'$ is a subgraph of $G$ and $G'$ has $H$ as a minor, then $G$ has $H$ as a minor. Let $K_{r,r}$ denote the complete bipartite graph with $r$ nodes in each partition. Then $K_r$ is a minor of $K_{r,r}$. An important implication is the following: to prove a property P for a graph $G$ that excludes $H$, of size $r$, as a minor, it is sufficient to prove that any graph excluding $K_{r,r}$ as a minor has property P. This fact was cleverly used by Klein et al. [10] to obtain the good decomposition scheme described next. First, a definition.

Definition 2 (($\varepsilon, \Delta$)-decomposition) Given a graph $G = (V, E)$, a randomly chosen subset of edges $B \subset E$ is called an $(\varepsilon, \Delta)$-decomposition of $G$ if the following holds: (a) for any edge $e \in E$, $\Pr(e \in B) \le \varepsilon$; (b) let $S_1, \dots, S_K$ be the connected components of the graph $G' = (V, E \backslash B)$ obtained by removing the edges of $B$ from $G$; then, for any component $S_j$, $1 \le j \le K$, and any $u, v \in S_j$, the shortest-path distance between $u$ and $v$ in the original graph $G$ is at most $\Delta$ with probability 1.
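Condition (b) of Definition 2 is easy to test mechanically, which may clarify it. The helper below (our own code, not from the paper; it assumes a dict-of-sets adjacency with edges stored as sorted pairs) verifies that every pair of nodes sharing a component of $(V, E \backslash B)$ is within distance $\Delta$ in the original graph:

```python
from collections import deque

def check_diameter_property(adj, B, delta):
    """Check condition (b) of an (epsilon, Delta)-decomposition:
    every pair of nodes in one component of (V, E \\ B) must be within
    shortest-path distance `delta` in the ORIGINAL graph G."""
    def bfs(src, removed):
        # breadth-first distances from src, skipping edges in `removed`
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if (min(u, v), max(u, v)) in removed or v in dist:
                    continue
                dist[v] = dist[u] + 1
                q.append(v)
        return dist
    seen = set()
    for s in adj:
        if s in seen:
            continue
        comp = set(bfs(s, B))       # component of s in (V, E \ B)
        seen |= comp
        for u in comp:
            d = bfs(u, set())       # distances in the original graph G
            if any(d.get(v, float("inf")) > delta for v in comp):
                return False
    return True
```

On the path $1\!-\!2\!-\!3\!-\!4\!-\!5$ with $B = \{(3,4)\}$, the components are $\{1,2,3\}$ and $\{4,5\}$, so the property holds for $\Delta = 2$ but fails for $\Delta = 1$.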
The existence of an $(\varepsilon, \Delta)$-decomposition implies that it is possible to remove an $\varepsilon$ fraction of edges so that the graph decomposes into connected components of small diameter. We describe a simple and explicit construction of such a decomposition for the minor-excluded class of graphs. The scheme was proposed by Klein, Plotkin, Rao [10] and Rao [11].

DeC($G, r, \Lambda$)
(0) Input: graph $G = (V, E)$ and $r, \Lambda \in \mathbb{N}$. Initially, $i = 0$, $G^0 = G$, $B = \emptyset$.
(1) For $i = 0, \dots, r-1$, do the following.
  (a) Let $S_1^i, \dots, S_{k_i}^i$ be the connected components of $G^i$.
  (b) For each $S_j^i$, $1 \le j \le k_i$, pick an arbitrary node $v_j \in S_j^i$. Create a breadth-first search tree $T_j^i$ rooted at $v_j$ in $S_j^i$. Choose a number $L_j^i$ uniformly at random from $\{0, \dots, \Lambda-1\}$. Let $B_j^i$ be the set of edges at levels $L_j^i, \Lambda + L_j^i, 2\Lambda + L_j^i, \dots$ in $T_j^i$. Update $B = B \cup_{j=1}^{k_i} B_j^i$.
  (c) Set $i = i + 1$, and let $G^i$ be the graph $(V, E \backslash B)$.
(2) Output $B$ and the graph $G' = (V, E \backslash B)$.

As stated above, the basic idea is to apply the following step recursively (up to depth $r$ of the recursion): in each connected component, say $S$, choose a node arbitrarily and create a breadth-first search tree, say $T$. Choose a number, say $L$, uniformly at random from $\{0, \dots, \Lambda-1\}$. Remove (add to $B$) all edges of $T$ that are at level $L + k\Lambda$, $k \ge 0$. Clearly, the total running time of such an algorithm is $O(r(n + |E|))$ for a graph $G = (V, E)$ with $|V| = n$, with a possible parallel implementation across different connected components. The algorithm DeC($G, r, \Lambda$) is designed to provide a good decomposition for the class of graphs that exclude $K_{r,r}$ as a minor. Figure 1 illustrates the algorithm on a line graph of $n = 9$ nodes, which excludes $K_{2,2}$ as a minor; the example is a sample run of DeC($G, 2, 3$), of which Figure 1 shows the first iteration.

Figure 1: The first of two iterations in the execution of DeC($G, 2, 3$) is shown.
Lemma 1 If $G$ excludes $K_{r,r}$ as a minor, then the algorithm DeC($G, r, \Lambda$) outputs $B$ which is an $(r/\Lambda, O(\Lambda))$-decomposition of $G$.

It is known that planar graphs exclude $K_{3,3}$ as a minor. Hence, Lemma 1 implies the following.

Corollary 1 Given a planar graph $G$, the algorithm DeC($G, 3, \Lambda$) produces a $(3/\Lambda, O(\Lambda))$-decomposition for any $\Lambda \ge 1$.

3 Approximate log Z

Here we describe an algorithm for approximate computation of $\log Z$ for any graph $G$. The algorithm uses a decomposition algorithm as a subroutine. In what follows, we use the term DECOMP for a generic decomposition algorithm. The key point is that our algorithm provides provable upper and lower bounds on $\log Z$ for any graph; the approximation guarantee and computation time depend on the properties of DECOMP. Specifically, for $K_{r,r}$ minor-excluded $G$ (e.g. a planar graph, with $r = 3$), we will use DeC($G, r, \Lambda$) in place of DECOMP. Using Lemma 1, we show that our algorithm based on DeC provides an approximation to arbitrary multiplicative accuracy by tuning the parameter $\Lambda$.

LOGPARTITION($G$)
(1) Use DECOMP($G$) to obtain $B \subset E$ such that $G' = (V, E \backslash B)$ is made of connected components $S_1, \dots, S_K$.
(2) For each connected component $S_j$, $1 \le j \le K$: compute the partition function $Z_j$ restricted to $S_j$ by dynamic programming (or exhaustive computation).
(3) Let $\psi_{ij}^L = \min_{(x,x') \in \Sigma^2} \psi_{ij}(x, x')$ and $\psi_{ij}^U = \max_{(x,x') \in \Sigma^2} \psi_{ij}(x, x')$. Then set

$$\log \hat{Z}_{LB} = \sum_{j=1}^{K} \log Z_j + \sum_{(i,j) \in B} \psi_{ij}^L; \qquad \log \hat{Z}_{UB} = \sum_{j=1}^{K} \log Z_j + \sum_{(i,j) \in B} \psi_{ij}^U.$$

(4) Output: lower bound $\log \hat{Z}_{LB}$ and upper bound $\log \hat{Z}_{UB}$.

In words, LOGPARTITION($G$) produces upper and lower bounds on $\log Z$ of the MRF $G$ as follows: decompose the graph $G$ into (small) components $S_1, \dots, S_K$ by removing (few) edges $B \subset E$ using DECOMP($G$); compute the exact log-partition function in each component; and to produce the bounds $\log \hat{Z}_{LB}$, $\log \hat{Z}_{UB}$, take the sum of the component-wise log-partition functions together with the minimal and maximal effect of the edges in $B$.
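The steps above can be sketched as follows (our own code, not the paper's; per-component partition functions are computed by enumeration rather than dynamic programming, which is fine when components are small):

```python
import itertools
import math

def log_partition_bounds(nodes, sigma, phi, psi, edges, B):
    """Sketch of LOGPARTITION given an already-chosen removed edge set B.

    Computes log Z_j exactly in each component of (V, E \\ B), then adds
    min (resp. max) of psi over each removed edge for the lower (resp.
    upper) bound on log Z."""
    kept = [e for e in edges if e not in B]
    # union-find over kept edges to get the components S_1, ..., S_K
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for (u, v) in kept:
        parent[find(u)] = find(v)
    comps = {}
    for v in nodes:
        comps.setdefault(find(v), []).append(v)
    total = 0.0                            # sum_j log Z_j
    for comp in comps.values():
        comp_edges = [(u, v) for (u, v) in kept if u in comp and v in comp]
        terms = []
        for assign in itertools.product(sigma, repeat=len(comp)):
            x = dict(zip(comp, assign))
            h = sum(phi[i][x[i]] for i in comp)
            h += sum(psi[(u, v)][(x[u], x[v])] for (u, v) in comp_edges)
            terms.append(h)
        m = max(terms)                     # stabilized log-sum-exp
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    lo = total + sum(min(psi[e].values()) for e in B)
    hi = total + sum(max(psi[e].values()) for e in B)
    return lo, hi
```

For two nodes joined by a single removed edge with $\psi(1,1) = 1$ and all other potentials zero, this returns the bounds $2\log 2 \le \log Z \le 2\log 2 + 1$, which indeed bracket the true value $\log(3 + e)$.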
Analysis of LOGPARTITION for general $G$: Here we analyze the performance of LOGPARTITION for any $G$. In the next section, we specialize our analysis to minor-excluded $G$, when LOGPARTITION uses DeC as the DECOMP algorithm.

Lemma 2 Given an MRF $G$ described by (1), LOGPARTITION produces $\log \hat{Z}_{LB}$, $\log \hat{Z}_{UB}$ such that

$$\log \hat{Z}_{LB} \le \log Z \le \log \hat{Z}_{UB}, \qquad \log \hat{Z}_{UB} - \log \hat{Z}_{LB} = \sum_{(i,j) \in B} \big( \psi_{ij}^U - \psi_{ij}^L \big).$$

It takes $O(|E| + K|\Sigma|^{|S|}) + T_{\mathrm{DECOMP}}$ time to produce this estimate, where $|S| = \max_{j=1}^{K} |S_j|$, with DECOMP producing a decomposition of $G$ into $S_1, \dots, S_K$ in time $T_{\mathrm{DECOMP}}$.

Lemma 3 If $G$ has maximum vertex degree $D$, then

$$\log Z \ge \frac{1}{D+1} \sum_{(i,j) \in E} \big( \psi_{ij}^U - \psi_{ij}^L \big).$$

Lemma 4 If $G$ has maximum vertex degree $D$ and DECOMP($G$) produces $B$ that is an $(\varepsilon, \Delta)$-decomposition, then

$$\mathbb{E}\big[ \log \hat{Z}_{UB} - \log \hat{Z}_{LB} \big] \le \varepsilon (D+1) \log Z,$$

with respect to the randomness in $B$, and LOGPARTITION takes time $O\big(n D |\Sigma|^{D^{O(\Delta)}}\big) + T_{\mathrm{DECOMP}}$.

Analysis of LOGPARTITION for minor-excluded $G$: Here we specialize the analysis of LOGPARTITION to minor-excluded graphs $G$. For $G$ that excludes $K_{r,r}$ as a minor, we use the algorithm DeC($G, r, \Lambda$). We now state the main result for log-partition function computation.

Theorem 1 Let $G$ exclude $K_{r,r}$ as a minor and have maximum vertex degree $D$. Given $\varepsilon > 0$, use the LOGPARTITION algorithm with DeC($G, r, \Lambda$), where $\Lambda = \lceil r(D+1)/\varepsilon \rceil$. Then

$$\log \hat{Z}_{LB} \le \log Z \le \log \hat{Z}_{UB}; \qquad \mathbb{E}\big[ \log \hat{Z}_{UB} - \log \hat{Z}_{LB} \big] \le \varepsilon \log Z.$$

Further, the algorithm takes $O(n\, C(D, |\Sigma|, \varepsilon))$ time, where the constant $C(D, |\Sigma|, \varepsilon) = D |\Sigma|^{D^{O(rD/\varepsilon)}}$.

We obtain the following immediate implication of Theorem 1.

Corollary 2 For any $\varepsilon > 0$, the LOGPARTITION algorithm with the DeC algorithm, for an MRF based on a constant-degree planar graph $G$, produces $\log \hat{Z}_{LB}$, $\log \hat{Z}_{UB}$ such that

$$(1-\varepsilon) \log Z \le \log \hat{Z}_{LB} \le \log Z \le \log \hat{Z}_{UB} \le (1+\varepsilon) \log Z,$$

in time $O(n C(\varepsilon))$, where $\log \log C(\varepsilon) = O(1/\varepsilon)$.

4 Approximate MAP

Now we describe the algorithm to compute MAP approximately.
It is very similar to the L O G PA R T I T I O N algorithm: given G, decompose it into (small) components S 1 , . . . , SK by removing (few) edges B E . Then, compute an approximate MAP assignment by computing exact MAP restricted to the components. As in L O G PA RT I T I O N, the computation time and performance of the algorithm depends on property of decomposition scheme. We describe algorithm for any graph G; which will be specialized for K r,r minor excluded G using DeC(G, r, ). M O D E(G) (1) Use D E C O M P(G) to obtain B E such that (a) G = (V , E \B ) is made of connected components S 1 , . . . , SK . (2) For each connected component S j , 1 j K , do the following: (a) Through dynamic programming (or exhaustive computation) find exact MAP x ,j for component S j , where x,j = (x,j )iSj . i (3) Produce output x , which is obtained by assigning values to nodes using x ,j , 1 j K . Analysis of M O D E for General G : Here, we analyze performance of M O D E for any G. Later, we will specialize our analysis for minor excluded G when it uses DeC as the D E C O M P algorithm. Lemma 5 G(iven an RF G described by (1), the M O D E algorithm produces ou+ uts x such that M tp | U L H(x ) - H(x ) H(x ). It takes O E |K |S | TDECOMP time to ij - ij i,j )B produce this estimate, where |S | = maxK 1 |Sj | with D E C O M P producing decomposition of G into j= S1 , . . . , SK in time TDECOMP . 5 Lemma 6 If G has maximum vertex degree D, then 1 ( 1 ( U H(x ) ij D+1 D+1 i,j )E U L ij - ij . i,j )E Lemma 7 If G has maximum vertex degree D and the D E C O M P(G) produces B that is (, )decomposition, then H E (x ) - H(x ) (D + 1)H(x ), where expectation is w.r.t. the randomness in B . Further, M O D E takes time O(nD|| D ) + TDECOMP . Analysis of M O D E for Minor-excluded G : Here, we specialize analysis of M O D E for minor exclude graph G. For G that exclude minor K r,r , we use algorithm DeC(G, r, ). Now, we state the main result for MAP computation. 
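Before the main result, a minimal sketch of the MODE procedure (our own code, not the paper's; enumeration stands in for dynamic programming, and the edges in $B$ are simply ignored when stitching the per-component maximizers together):

```python
import itertools

def approx_map(nodes, sigma, phi, psi, edges, B):
    """Sketch of MODE: drop the edges in B, solve MAP exactly in each
    resulting component by enumeration, and concatenate the per-component
    maximizers into one global assignment."""
    kept = [e for e in edges if e not in B]
    # union-find over kept edges to get the components S_1, ..., S_K
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for (u, v) in kept:
        parent[find(u)] = find(v)
    comps = {}
    for v in nodes:
        comps.setdefault(find(v), []).append(v)
    xhat = {}
    for comp in comps.values():
        comp_edges = [(u, v) for (u, v) in kept if u in comp and v in comp]
        best, best_h = None, float("-inf")
        for assign in itertools.product(sigma, repeat=len(comp)):
            x = dict(zip(comp, assign))
            h = sum(phi[i][x[i]] for i in comp)
            h += sum(psi[(u, v)][(x[u], x[v])] for (u, v) in comp_edges)
            if h > best_h:
                best_h, best = h, x
        xhat.update(best)                  # stitch x^{*,j} into x-hat
    return xhat
```

For two isolated nodes (their connecting edge in $B$) with $\phi_0(1) = 1$ and $\phi_1(0) = 0.5$ dominating, the stitched assignment is $\hat{x}^* = (1, 0)$.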
Theorem 2 Let $G$ exclude $K_{r,r}$ as a minor and have $D$ as the maximum vertex degree. Given $\varepsilon > 0$, use the MODE algorithm with DeC($G, r, \Lambda$), where $\Lambda = \lceil r(D+1)/\varepsilon \rceil$. Then

$$(1-\varepsilon)\, \mathcal{H}(x^*) \le \mathcal{H}(\hat{x}^*) \le \mathcal{H}(x^*).$$

Further, the algorithm takes $n \cdot C(D, |\Sigma|, \varepsilon)$ time, where the constant $C(D, |\Sigma|, \varepsilon) = D |\Sigma|^{D^{O(rD/\varepsilon)}}$.

We obtain the following immediate implication of Theorem 2.

Corollary 3 For any $\varepsilon > 0$, the MODE algorithm with the DeC algorithm, for an MRF based on a constant-degree planar graph $G$, produces an estimate $\hat{x}^*$ such that

$$(1-\varepsilon)\, \mathcal{H}(x^*) \le \mathcal{H}(\hat{x}^*) \le \mathcal{H}(x^*),$$

in time $O(n C(\varepsilon))$, where $\log \log C(\varepsilon) = O(1/\varepsilon)$.

5 Experiments

Our algorithm provides a provably good approximation for any MRF with a minor-excluded graph structure, with planar graphs as a special case. In this section, we present an experimental evaluation of our algorithm on a popular synthetic model.

Setup 1.² Consider a binary (i.e. $\Sigma = \{0, 1\}$) MRF on an $n \times n$ lattice $G = (V, E)$:

$$\Pr(x) \propto \exp\Big( \sum_{i \in V} \theta_i x_i + \sum_{(i,j) \in E} \theta_{ij} x_i x_j \Big), \quad \text{for } x \in \{0, 1\}^{n^2}.$$

Figure 2 shows a lattice or grid graph with $n = 4$ (on the left side). There are two scenarios for choosing the parameters (with the notation $U[a,b]$ denoting the uniform distribution over the interval $[a,b]$): (1) Varying interaction: $\theta_i$ is chosen independently from the distribution $U[-0.05, 0.05]$, and $\theta_{ij}$ is chosen independently from $U[-\alpha, \alpha]$ with $\alpha \in \{0.2, 0.4, \dots, 2\}$. (2) Varying field: $\theta_{ij}$ is chosen independently from the distribution $U[-0.5, 0.5]$, and $\theta_i$ is chosen independently from $U[-\alpha, \alpha]$ with $\alpha \in \{0.2, 0.4, \dots, 2\}$.

The grid graph is planar. Hence, we run our algorithms LOGPARTITION and MODE with the decomposition scheme DeC($G, 3, \Lambda$), $\Lambda \in \{3, 4, 5\}$. We consider two measures to evaluate performance: error in $\log Z$, defined as $\frac{1}{n^2} |\log Z_{\mathrm{alg}} - \log Z|$; and error in $\mathcal{H}(x^*)$, defined as $\frac{1}{n^2} |\mathcal{H}(x_{\mathrm{alg}}) - \mathcal{H}(x^*)|$. We compare our algorithm's error in $\log Z$ with the two recently very successful algorithms: the tree-reweighted algorithm (TRW) and the planar decomposition algorithm (PDC). The comparison is plotted in Figure 3, where $n = 7$ and the results are averages over 40 trials.
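The two parameter-sampling scenarios above can be reproduced with a few lines (a sketch; the function and variable names are ours, not from the paper):

```python
import random

def sample_grid_mrf(n, alpha, varying="interaction", rng=random):
    """Sample the synthetic n x n lattice model from the experiments.

    varying="interaction": theta_i ~ U[-0.05, 0.05], theta_ij ~ U[-alpha, alpha]
    varying="field":       theta_ij ~ U[-0.5, 0.5],  theta_i  ~ U[-alpha, alpha]
    Returns (nodes, edges, theta) with theta keyed by node or edge."""
    nodes = [(i, j) for i in range(n) for j in range(n)]
    edges = [((i, j), (i, j + 1)) for i in range(n) for j in range(n - 1)]
    edges += [((i, j), (i + 1, j)) for i in range(n - 1) for j in range(n)]
    if varying == "interaction":
        a_node, a_edge = 0.05, alpha
    else:
        a_node, a_edge = alpha, 0.5
    theta = {v: rng.uniform(-a_node, a_node) for v in nodes}
    theta.update({e: rng.uniform(-a_edge, a_edge) for e in edges})
    return nodes, edges, theta
```

For $n = 4$ this produces the 16-node, 24-edge grid of Figure 2 (left), with all node parameters in $[-0.05, 0.05]$ in the varying-interaction scenario.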
Figure 3(A) plots the error with respect to varying interaction strength, while Figure 3(B) plots the error with respect to varying field strength. Our algorithm essentially outperforms TRW for these values of $\Lambda$ and performs very competitively with respect to PDC.

²Though this setup has $\theta_i$, $\theta_{ij}$ taking negative values, it is equivalent to the setup considered in the paper, as the function values are lower bounded and hence an affine shift makes them non-negative without changing the distribution.

The key feature of our algorithm is scalability. Specifically, the running time of our algorithm with a given parameter value $\Lambda$ scales linearly in $n$, while keeping the relative error bound exactly the same. To illustrate this important feature, we plot the theoretically evaluated bound on the error in $\log Z$ in Figure 4 with tags (A), (B) and (C). Note that the error-bound plot is the same for $n = 100$ (A) and $n = 1000$ (B). Clearly, the actual error is likely to be smaller than these theoretically plotted bounds. We note that these bounds depend only on the interaction strengths and not on the values of the field strengths (C). Results similar to those of LOGPARTITION are expected from MODE. We plot the theoretically evaluated bounds on the error in MAP in Figure 4, again with tags (A), (B) and (C). The bound on the MAP relative error for a given parameter value remains the same for all values of $n$, as shown in (A) for $n = 100$ and (B) for $n = 1000$. There is no change in the error bound with respect to the field strength (C).

Setup 2. Everything is exactly the same as in the above setup, with the only difference that the grid graph is replaced by the criss-cross graph, which is obtained by adding four extra neighboring edges per node (with the exception of boundary nodes). Figure 2 shows the criss-cross graph with $n = 4$ (on the right side). We again run the same algorithms as in the above setup on this graph. For the criss-cross graph, we obtained its graph decomposition from the decomposition of its grid subgraph.
Though the criss-cross graph is not planar, due to its structure it can be shown that the running time of our algorithm remains the same (in order) and the error bound becomes only 3 times weaker than that for the grid graph. We compute these theoretical error bounds for $\log Z$ and MAP, plotted in Figure 5. That figure is the analogue of Figure 4 for the grid graph. This clearly exhibits the generality of our algorithm even beyond minor-excluded graphs.

References

[1] J. Pearl, "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference," San Francisco, CA: Morgan Kaufmann, 1988.
[2] J. Yedidia, W. Freeman and Y. Weiss, "Generalized Belief Propagation," Mitsubishi Elect. Res. Lab., TR2000-26, 2000.
[3] M. J. Wainwright, T. Jaakkola and A. S. Willsky, "Tree-based reparameterization framework for analysis of sum-product and related algorithms," IEEE Trans. on Info. Theory, 2003.
[4] M. J. Wainwright, T. S. Jaakkola and A. S. Willsky, "MAP estimation via agreement on (hyper)trees: Message-passing and linear-programming approaches," IEEE Trans. on Info. Theory, 51(11), 2005.
[5] S. C. Tatikonda and M. I. Jordan, "Loopy Belief Propagation and Gibbs Measure," Uncertainty in Artificial Intelligence, 2002.
[6] M. Bayati, D. Shah and M. Sharma, "Maximum Weight Matching via Max-Product Belief Propagation," IEEE ISIT, 2005.
[7] V. Kolmogorov, "Convergent Tree-reweighted Message Passing for Energy Minimization," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006.
[8] V. Kolmogorov and M. Wainwright, "On optimality of tree-reweighted max-product message-passing," Uncertainty in Artificial Intelligence, 2005.
[9] A. Globerson and T. Jaakkola, "Bound on Partition function through Planar Graph Decomposition," NIPS, 2006.
[10] P. Klein, S. Plotkin and S. Rao, "Excluded minors, network decomposition, and multicommodity flow," ACM STOC, 1993.
[11] S. Rao, "Small distortion and volume preserving embeddings for Planar and Euclidean metrics," ACM SCG, 1999.

Figure 2: Example of grid graph (left) and criss-cross graph (right) with $n = 4$.

Figure 3: Comparison of TRW, PDC and our algorithm for the grid graph with $n = 7$ with respect to error in $\log Z$, under varying interaction strength (A) and varying field strength (B). Our algorithm outperforms TRW and is competitive with respect to PDC.

Figure 4: The theoretically computable error bounds for $\log Z$ and MAP under our algorithm for the grid graph with $n = 100$ and $n = 1000$, under the varying-interaction and varying-field models. This clearly shows the scalability of our algorithm.

Figure 5: The theoretically computable error bounds for $\log Z$ and MAP under our algorithm for the criss-cross graph with $n = 100$ and $n = 1000$, under the varying-interaction and varying-field models. This clearly shows the scalability of our algorithm and its robustness to graph structure.