Orbit-Product Representation and Correction of Gaussian Belief Propagation Jason K. Johnsona jasonj@lanl.gov Vladimir Y. Chernyakb,a chernyak@chem.wayne.edu Michael Chertkova chertkov@lanl.gov a Theoretical Division and Center for Nonlinear Studies, LANL, Los Alamos, NM 87545, USA b Department of Chemistry, Wayne State University, Detroit, MI 48202, USA Abstract We present a new view of Gaussian belief propagation (GaBP) based on a representation of the determinant as a product over orbits of a graph. We show that the GaBP determinant estimate captures totally backtracking orbits of the graph and consider how to correct this estimate. We show that the missing orbits may be grouped into equivalence classes corresponding to backtrackless orbits and the contribution of each equivalence class is easily determined from the GaBP solution. Furthermore, we demonstrate that this multiplicative correction factor can be interpreted as the determinant of a backtrackless adjacency matrix of the graph with edge weights based on GaBP. Finally, an efficient method is proposed to compute a truncated correction factor including all backtrackless orbits up to a specified length. 2008; Moallemi & Van Roy, 2009)), which also provides new insights into the algorithm by interpreting it as computing weighted sums of walks (walk-sums) within the graph. Our aim in this present paper is to extend this graphical/combinatorial view of GaBP to include estimation of the determinant (partition function) of the Gaussian graphical model. This work is also inspired by the loop-series correction method for belief propagation (Chertkov & Chernyak, 2006) that was recently extended to Gaussian graphical models (Chernyak & Chertkov, 2008). Our present study leads to a new perspective on GaBP having close ties to graphical zeta functions (Stark & Terras, 1996). We find that for walk-summable models the determinant may be represented as a product over all orbits (cyclic walks) of the graph. The estimate of the determinant provided by GaBP only captures totally backtracking orbits, which can be embedded as orbits in the computation tree (universal cover) of the graph. The missing orbits may then be grouped into equivalence classes corresponding to backtrackless orbits. The orbit-product over each such equivalence class may be simply computed from the solution of GaBP. Also, the product over all backtrackless orbits may be interpreted as the determinant of a backtrackless adjacency matrix of the graph with appropriately defined edge weights based on the GaBP solution. Finally, we propose a simple, efficient method to compute truncated orbit-products including all orbits up to some specified length and provide an error-bound on the resulting estimates. In certain classes of graphs (e.g., grids), this leads to an efficient method with complexity linear in the number of nodes and the required precision of the determinant estimate. This paper differs fundamentally from (Chernyak & Chertkov, 2008) in that we rely heavily on the walksummable property to develop multiplicative expansions using infinitely many orbits of the graph, whereas 1. Introduction Belief Propagation is a widely used method for inference in graphical models. We study this algorithm in the context of Gaussian graphical models. There have been several studies of Gaussian belief propagation (GaBP) (Weiss & Freeman, 2001; Rusmevichientong & Van Roy, 2001; Plarre & Kumar, 2004) as well as numerous applications (Moallemi & Van Roy, 2006; Bickson et al., 2008b; Bickson et al., 2008a). The best known sufficient condition for its convergence is the walk-summable condition (Johnson et al., 2006; Malioutov et al., 2006) (see also (Cseke & Heskes, Appearing in Proceedings of the 26 th International Conference on Machine Learning, Montreal, Canada, 2009. Copyright 2009 by the author(s)/owner(s). Orbit-Products & Gaussian BP (Chernyak & Chertkov, 2008) develops additive expansions over a finite number of "generalized loops" (which may be disconnected) using methods of Grassman calculus. Our present approach leads naturally to approximation methods (with accuracy guarantees in walk-summable models) based on truncated orbitproducts. { L|() = } for (L). In particular, L denotes the class of totally backtracking orbits, which plays a special role in our interpretation of GaBP. We will use to denote a generic orbit and reserve to denote irreducible orbits. Example. Orbits [1231], [1231451], [123421561] are backtrackless; [1234321], [1232421], [1231321] are totally backtracking; [123241] is both reducible and nontrivial (neither backtrackless nor totally backtracking). 2.2. Gaussian Belief Propagation A Gaussian graphical model is a probability distribution 1 (1) p(x) = Z -1 exp - 2 xT Jx + hT x 2. Preliminaries 2.1. Walks and Orbits of a Graph Let G be a graph on vertices (nodes) V = {1, . . . , n} with undirected edges {i, j} G. We may also treat each undirected edge {i, j} as a symmetric pair of directed edges (ij) and (ji). A walk w is a sequence of adjacent vertices (w0 . . . wL ) (wt V for t = 0, . . . , L and {wt , wt+1 } G for t = 1, . . . , L - 1) where |w| L is the length of the walk. A walk may be equivalently specified as a sequence of steps w = ((w0 w1 )(w1 w2 ) . . . (wL-1 wL )) such that each step is a (directed) edge of the graph that ends where the following step begins. A walk may visit the same node or cross the same edge multiple times and may also backtrack --that is, it may step back to the preceding vertex. A walk is closed if it begins and ends at the same node w0 = wL . A closed walk is primitive if it is not a multiple of some shorter walk (e.g., the walk (1231) is primitive but (1231231) is not). We define an orbit = [w] to be an equivalence class of closed primitive walks, where two walks are considered equivalent [w] = [w ] if one is a cyclic shift of the other (i.e., if wt = wt+s(mod ) for some s and t = 1, . . . , L). Hence, there is a one-to-one correspondence between orbits and (non-terminating) cyclic walks. The following classification of walks and orbits plays an essential role in our analysis: A walk (or orbit) is said to be reducible if it contains a backtracking pair of consecutive steps . . . (ij)(ji) . . . , otherwise the walk is irreducible. By repeatedly deleting backtracking pairs until none remain one obtains the (unique) irreducible core = (w) of the walk w. For closed walks it may happen that = , where () denotes the trivial (empty) walk. We then say that the walk is totally reducible. We say that a walk is non-trivial if it is not totally reducible. Totally reducible walks have been called backtracking (Malioutov et al., 2006), although totally backtracking is perhaps a better description. Irreducible walks have been called backtrackless (or nonbacktracking) elsewhere in the literature. Notation. L denotes the set of all orbits of G, (L) denotes the irreducible (backtrackless) orbits, and we partition L into (disjoint) equivalence classes L of random variables x Rn where J is a sparse, symmetric, positive-definite matrix. The fill-pattern of J defines a graph G with vertices V = {1, . . . , n} and edges (ij) for all Jij = 0. The partition function is defined by Z(h, J) exp{- 1 xT Jx + hT x}dx = 2 1/2 1 T -1 (2)n det J -1 e 2 h J h so as to normalize the distribution. Given such a model, we may compute the mean vector µ p(x)xdx = J -1 h and covariance matrix K p(x)(x - µ)(x - µ)T dx = J -1 . This generally requires O(n3 ) computation in dense graphs using Gaussian elimination. If G is sparse and only certain elements of K are required (the diagonal and edge-wise covariances), then the complexity of Gaussian elimination may be substantially reduced (e.g., O(n3/2 ) in planar graphs using nested dissection) but still generally has complexity growing as O(w3 ) in the tree-width w of the graph. Gaussian belief propagation (GaBP) is a simple, distributed, iterative message-passing algorithm to estimate the marginal distribution p(xi ) of each variable, which is specified by its mean µi and variance Kii . GaBP is parameterized by a set of messages 2 1 mij (xj ) = e 2 ij xj +ij xj defined on each directed edge (ij) of the graph (mij is regarded as a message being passed from i to j). The GaBP equations are: mij (xj ) i (xi ) ki\j mki (xi )ij (xi , xj )dxi where i = ij = e-Jij xi xj and i denotes the set of neighbors of i in G. This reduces to the following rules for computing (, )-messages: ij ij 2 = Jij (Jii - i\j )-1 1 2 e- 2 Jii xi +hi xi , = -Jij (Jii - i\j )-1 (hi + i\j ) (2) where i\j ki\j ki . ki\j ki and i\j These equations are solved by iteratively recomputing Orbit-Products & Gaussian BP each message from the other messages until convergence. The marginal distribution is then estimated as pbp (xi ) i (xi ) ki mki (xi ), which gives variance bp estimates Ki = (Jii - k ki )-1 and mean estimates bp bp µi = Ki (hi + k ki ). In trees, this method is equivalent to Gaussian elimination, terminates after a finite number of steps and then provides the correct marginals. In loopy graphs, it may be viewed as performing Gaussian elimination in the computation tree (universal cover) of the graph (Plarre & Kumar, 2004; Malioutov et al., 2006) (obtained by "unrolling" loops) and may therefore fail to converge due to the infinite extent of the computation tree. If it does converge, the mean estimates are still correct but the variances are only approximate. We also obtain an estimate of the pairwise covariance matrix on edges {i, j} G: bp K(ij) = Jii - i\j Jij Jij Jjj - j\i -1 where (R) denotes the spectral radius of the matrix R (the maximum modulus of the eigenvalues of R). This allows us to interpret Kij as a sum over all walks in the graph G which begin at node i and end at node j where the weight of a walk is defined nij (w) and nij (w) is a count of as Rw = (ij)w rij how many times step (ij) occurs in the walk. We write this walk-sum as Kij = w:ij Rw . However, in order for the walk-sum to be well-defined, it must converge to the same value regardless of the order in which we add the walks. This is equivalent to requiring that it converges absolutely. Thus, we say that R is walk-summable if w:ij |Rw | converges for all i, j V . This is equivalent to the spectral condition that (|R|) < 1 where |R| (|rij |) is the element-wise absolute-value matrix of R. A number of other equivalent or sufficient conditions are given in (Malioutov et al., 2006). In walk-summable models it then holds that variances correspond to closed walk-sums Kii = w:ii Rw and means correspond to a (reweighted) walk-sum over all walks which end at a specific node µi = w:i h Rw (here denotes the arbitrary starting point of the walk). Moreover, we may interpret the GaBP message parameters (, ) as recursively computing walksums within the computation tree (Malioutov et al., 2006). This implies that GaBP converges in walksummable models and converges to the same "walksum" solution independent of the order in which we update messages. This interpretation also shows that GaBP computes the correct walk-sums for the means but only computes a subset of the closed walks needed bp for the variances. Specifically, Ki only includes totally backtracking walks at node i. This is seen as a walk is totally backtracking if and only if it can be embedded as a closed walk in the computation tree of the graph and it is these closed walks of the computation tree which GaBP captures in its variance estimates. In this paper, we are concerned with the GaBP estimate of the determinant Z = det K = det J -1 (which is closely linked to computation of the partition function Z). We obtain an estimate of Z from the GaBP solution as: Z bp = iV bp Zi {i,j}G bp Zij bp bp Zi Zj (3) bp bp bp bp where Zi = Ki and Zij = det K(ij) . The motivation for this form of estimate is that it becomes exact if G is a tree. In loopy graphs, there may generally be no stable solution to GaBP (or multiple unstable solutions). The main objective of this paper is to interpret the estimate Z bp in the context of walksummable models (described below), for which there is a well-defined stable solution, and to suggest methods to correct this estimate. Note that the variance and covariance estimates (and hence the determinant estimate Z bp ) are independent of h and the -messages in (2), they are determined solely by the -messages determined by J. Since GaBP correctly computes the means in walk-summable models, we are mainly concerned with how to correct Z bp (and hence its derivatives, which correspond to the GaBP estimates of variances/covariances). 3. Orbit-Product Interpretation of Gaussian BP 3.1. Determinant Z as Orbit-Product Let Z(R) det(I - R)-1 . In walk-summable models, we may give this determinant another graphical interpretation as a product over orbits of a graph, one closely related to the so-called zeta function of a graph (Stark & Terras, 1996). Theorem 1 If (|R|) < 1 then it holds that Z(R) = -1 (1 - R ) Z where the product is taken over n () all orbits of G and R = (ij) rijij where nij () is Walk-Sum Interpretation Our approach in this paper may be considered as as extension of the walksum interpretation of GaBP (Malioutov et al., 2006). Let J be normalized to have unit-diagonal, such that J = I - R with R having zeros along its diagonal. The walk-sum idea is based on the series K = k (I - R)-1 = k R , which converges if (R) < 1 Orbit-Products & Gaussian BP the number of times step (ij) occurs in orbit . Proof. log det(I - R)-1 = tr log(I - R)-1 = tr k R k (Rw )m Rw = = = closed w |w| primitive w m=1 m|w| 1 w -1 = orbits log(1 - primitive w |w| log(1 - R ) R )-1 = log (1 - R )-1 . We have used the identity log det A = tr log A and the series expansion Ak log(I - A)-1 = k=1 k . Each closed walk is expressed as a multiple of a primitive walk. Every primitive walk w has exactly |w| distinct cyclic shifts. We emphasize that (|R|) < 1 is necessary to insure that the the orbit-products we consider are welldefined. This condition is assumed throughout the remainder of the paper. 3.2. Z bp as Totally Backtracking Orbits Totally backtracking walks play an important role in the walk-sum interpretation of the GaBP variance estimates. We now derive an analogous interpretation of Z bp defined by (3): Theorem 2 Z bp = L Z where the product is taken over the set of totally backtracking orbits of G. Although this result seems intuitive in view of prior work, its proof is non-trivial involving arguments not used previously. To prove the theorem, we first summarize some useful lemmas. Consider a block matrix A = (A11 A12 ; A21 A22 ). The Schur complement of block A11 is A A22 - A21 A-1 A21 . It holds 22 11 that det A = det A11 det A and (A )-1 = (A-1 )22 . 22 22 Using these well-known identities, it follows: Lemma 1 Let R = (R11 R12 ; R21 R22 ) and K = (I - Z(R) R)-1 = (K11 K12 ; K21 K22 ). Then det K11 = Z(R22 ) . For walk-summable models, we then have det K11 = Z = G2 Z G k bp Zij = Tij |i or j Z where the product is over all orbits of Tij that include either endpoint of the marked edge {i, j}. Proof of Theorem 2. Using Lemma 2 and the correspondence between orbits of the computation tree and totally backtracking orbits of G, we may expand (3) to express Z bp entirely as a product over totally N where N is backtracking orbits Z bp = L Z the count of how many times appears in the orbitproduct--the number of times it appears in the numerator of (3) minus the number of times in appears in the denominator. It remains to show that N = 1 for each totally backtracking orbit. This may be seen by considering the subtree T of the computation tree T traced out by orbit . Let v and e respectively denote the number of nodes and edges of T (hence, e = v - 1) and let c denote the number of edges of T with exactly one endpoint in T . First, we count how many powers bp of Z appear in the orbit product i Zi . For each vertex i T we may pick this as the marked node in the computation tree and this shows one way that can be embedded in Ti so as to include its marked node. Thus, v gives the total number of multiples bp of Z in i Zi . Similarly, we could mark any edge {i, j} T with one or both endpoints in T and this gives one way to embed into Tij so as to intersect the bp marked edge. Thus, the product ij Zij contributes bp bp e + c powers of Z . Lastly, the product ij Zi Zj contains 2e + c powers of Z . This represents the number of ways we may pick a directed edge (ij) of T such that at least one endpoint is in T . The total count is then N = v + (e + c) - (2e + c) = v - e = 1. Combining Theorems 1 and 2, we obtain the following orbit-product correction to Z bp : Corollary 1 Z = Z bp × L Z . Z G| intersects G1 where the final orbit-product is taken over all orbits of G which include any node of subgraph G1 (corresponding to submatrix R11 ). Next, using this result and the interpretation of GaBP as inference on the computation tree, we are led to the following interbp bp pretation of the quantities Zi and Zij appearing in (3). Let Ti denote the computation tree of the graph G with one copy of node i marked. Let Tij denote the computation tree with one copy of edge {i, j} G marked. Then, bp Lemma 2 Zi = Ti |i Z where the product is over all orbits of Ti that include the marked node i. This formula includes a correction for every missing orbit, that is, for every non-trivial orbit. This implies that Z = Z bp for trees since all orbits of trees are totally backtracking. 3.3. Z bp Error Bound One useful consequence of the orbit-product interpretation of Z bp is that it provides a simple error bound on GaBP. Let g denote the girth of the graph G, defined as the length of the shortest cycle of G. We note that the missing orbits L must all have length greater than or equal to g. Then, Theorem 3 1 n log Z bp Z (|R|)g g(1-(|R|)) . Orbit-Products & Gaussian BP Proof. log ng g Z Z bp We derive the chain of inequalities: (a) G = K4 r14 4 r24 r34 1 r12 r13 2 r23 3 r31 2 3 4 4 1 3 r23 1 2 4 1 4 2 = L log Z -1 (c) ||g log(1 - |R| ) k=0 = tr k = |R| )-1 . that tr tr | log Z | = ng g(1-) . (R )k k=1 k |R|k k=g k ||g | log Z | (b) (a) Corollary 1. (|R| )k k=1 k n k k=g k (b) 1 3 2 r13 4 2 (a) G[(12)(23)(31)] |R|k kg k (c) The proof of Theorem 1 shows Rk -1 = . Similarly, log(1 - R ) k1 k = -1 . ||g log(1 - |R| ) = log(1 - r14 4 3 r12 3 1 This is consistent with the usual intuition that belief propagation is most accurate in large girth graphs with weak interactions. (b) T1\2 T2\3 r31 T3\1 1 r12 2 r23 3 4. Backtrackless Orbit Correction In this section we show that the set of orbits omitted in the GaBP estimate can be grouped into equivalence classes corresponding to backtrackless orbits and that the orbit-product over each such equivalence class is simply computed with the aid of the GaBP solution: Theorem 4 Z = Z bp × = Z where the product is over all backtrackless orbits of G and we define Z = (1 - r (rij )nij () )-1 (ij) ki\j ki is com- (c) 1\2 = 31 + 41 2\3 = 12 + 42 3\1 = 23 + 43 31 r31 = 1- 3\1 r 1 r 2 12 r12 = 1- 1\2 23 r23 = 1- 2\3 r 3 (d) Figure 1. Illustration of construction to combine equivalent orbits. (a) The graph G = K4 . (b) The computation graph G for = [(12)(23)(31)]. (c) Finite graph with self-loops at each node to capture totally backtracking walks. (d) Equivalent graph with modified edge weights to capture totally backtracking walks. ij where rij 1-i\j and i\j = puted from the solution of GaBP. In comparison to Corollary 1, here the correction factor is expressed as an orbit-product over just the backtrackless orbits (whereas Corollary 1 uses a separate correction for each non-trivial orbit). However, all orbits are still correctly accounted for because we modify the edge weights of the graph so as to include a factor (1 - i\j )-1 (computed by GaBP) which serves to "factor in" totally-backtracking excursions at each point along the backtrackless orbit, thereby generating all non-trivial orbits. The basic idea underlying this construction is depicted in Figure 1. For each backtrackless orbit we define an associated computation graph G as follows. First, we start with a single directed cycle based on = [1 2 · · · L ] (any duplicated nodes of the orbit map to distinct nodes in this directed cycle). Then, for each node k of this graph, we attach a copy of the computation tree Tk \k+1 , obtained by taking the full computation tree Tk rooted at node k and deleting the branch (k , k+1 ) incident to the root. This construction is illustrated in Figure 1(a,b) for the graph G = K4 and orbit = [(12)(23)(31)]. The cycle has "one-way" directed edges whereas each computation tree has "two-way" undirected edges. This is understood to mean that walks are allowed to backtrack within the computation tree but not within the cycle. The importance of this graph is based on the following lemma (the proof is omitted): Lemma 3 Let be a backtrackless orbit of G. Then, there is a one-to-one correspondence between the class of orbits L of G and the non-trivial orbits of G . Next, we demonstrate how to compute all of the orbits within an equivalence class as a simple determinant calculation based on the backtrackless orbit and the GaBP solution. Let R be defined as the edge-weight matrix of a simple single-loop graph based on with r ,k+1 edge-weights defined by rk ,k+1 = 1-k \ . This k k+1 construction is illustrated in Figure 1(d). Then, Lemma 4 Z = det(I - R )-1 = L Z . Proof. Using Lemma 3, we see that the orbit-product L Z is equal to the product over all non-trivial orbits of the graph G , that is, the product over all orbits in G which intersect the subgraph corresponding to . Using Lemma 1, this is equivalent to computing the determinant of the corresponding submatrix Orbit-Products & Gaussian BP of KG = (I - RG )-1 where RG is the edge-weight matrix of the computation graph. This is equivalent to first eliminating each computation tree (by Gaussian elimination/GaBP) attached to each node of to obtain a reduced graphical model I - R and then computing det(I - R )-1 . Using the GaBP solution, the effect of eliminating each computation tree is to add a "self-loop" (diagonal element) to R with edgeweight k \k+1 = v=k+1 v,k , obtained by summing the incoming messages to node k from each of its neighbors in the subtree Tk \k+1 . This elimination step is illustrated if Figure 1(b,c). We may use the orbit-product formula to compute the determinant. However, there are infinitely many orbits in this graph due to the presence of a self-loop at each of the remaining nodes. At each node, an orbit may execute any number of steps m around this self-loop each with edge-weight k \k+1 . Summing these, we m -1 obtain . Hence, m=0 k \k+1 = (1 - k \k+1 ) we can delete each self-loop and multiply the following edge's weight by (1 - k \k+1 )-1 and this preserves the value of the determinant. This final reduction step is illustrated in Figure 1(c,d). Then, the orbit-product L Z is equal to det(I -R )-1 (e.g., based on the graph seen in Figure 1(d)). It is straightforward to compute the resulting determinant with respect to the single (directed) cycle graph with edge weights R . There is only one orbit in this graph and hence det(I - R )-1 = Z (1 - (R ) )-1 where nij () (R ) = (ij) (rij ) and rij = rij (1 - i\j )-1 . Proof of Theorem 4. Using these results, it is now simple to show ZZ = L Z = = L Z = bp det(I - R )-1 = = (1 - (R ) )-1 . = 21 1 2 3 12 14 41 54 4 5 6 45 47 74 87 7 8 9 58 85 25 52 32 23 36 65 63 56 69 98 96 (a) (b) 78 89 Figure 2. (a) 3 × 3 grid G. (b) Graph G representing the backtrackless adjacency matrix R . Each node ij represents a directed edge of G, directed edges are drawn between nodes ij and jk which are non-backtracking (k = i). backtrackless walks of the graph G. The weight of an edge ((ij)(jk)) in R is defined as the (modified) edge weight rjk of the endpoint (jk). The weight of an orbit in R may then be equivalently defined as the product of node weights rij taken over the orbit in R , which is equal to the weight of the corresponding backtrackless orbit of G (using the modified edge weights rij ). Theorem 5 Z = Z bp × Z where Z det(I - R )-1 , that is, det(I - R)-1 = Z bp × det(I - R )-1 . Before providing the proof, we establish that walk-summability with respect to R implies walksummability with respect to R : Lemma 5 If (|R|) < 1 then (|R |) (|R|). Proof. Once the -parameters converge, the parameters follow a linear system k+1 = R k + b (Moallemi & Van Roy, 2009). Hence, the asymptotic convergence rate of GaBP is (R ). Compare this to the Gauss-Jacobi (GJ) iteration µk+1 = µk + (h - k+1 t Jµk ) = t=0 R h (µ0 = 0), which has convergence rate (R). It is clear that the GaBP iteration captures a superset of those walks computed by GJ at each iteration (because the depth-k computation tree includes all k-length walks). Hence, for non-negative models (R 0 and h 0) it must hold that the error in the GaBP estimate of µ is less than or equal to the error of GJ (at every iteration). This implies (R ) (R) if R 0, from which we conclude (|R |) (|R|) in walk-summable models. Proof of Theorem 5. By construction, there is a oneto-one weight-preserving correspondence between orbits of G and backtrackless orbits of G. The result then follows from the orbit-product representation of Z over G (Theorem 1, Lemma 5), which is equivalent to the backtrackless orbit-product of Theorem 4. One useful consequence of this result is that the error bp 1 bound of Theorem 3 can be improved to n log ZZ = 5. Backtrackless Determinant Correction Next, we show that the correction factor Z = Z bp Z = = = det(I - R )-1 may also be calculated as a single determinant based on the following backtrackless adjacency matrix of the graph. We define R R2|G|×2|G| as follows. Let the rows and columns of R be indexed by directed edges (ij) of the graph G. Then, the elements of R are defined R(ij),(kl) = rkl , 0, j = k and i = l otherwise. (4) This construction is illustrated in Figure 2. Note that the walks generated by taking powers R correspond to Orbit-Products & Gaussian BP 1 n |log Z | (|R |)g g(1-(|R |)) . 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 (|R|) (|R|) 0.25 0.2 It is impractical to compute the complete correction factor Z = det(I - R )-1 , as this is not easier than directly computing Z = det(I -R)-1 . However, because R is itself walk-summable, we can use this representation as a starting point for constructing approximate corrections such as the one considered in the next section. n-1 log Ztrue n-1 log Z bp n-1 log Z (L=2,4,8,...) B 0.15 0.1 0.05 0.05 0.1 r 0.15 0.2 0.25 0 0 0.05 0.1 r 0.15 0.2 0.25 (a) 0.25 0.2 (b) 10 -2 6. Block-Resummation Method Next, we consider an efficient method to approximate Z(A) = det(I - A)-1 for walk-summable models (|A|) < 1. This method can be used to either directly approximate Z(R) (A = R) or to approximate the GaBP-correction Z (A = R ). Given a graph G based on vertices V , we specify a set of blocks B = (Bk V, k = 1, . . . , |B|) chosen such that: (1) Every short orbit || < L is covered by some block B B, and (2) If B, B B then B B B. We also define block weights wB as follows: wB = 1 for maximal blocks (not contained by another block) and wB = 1 - B B wB for non-maximal blocks (these weights may be negative). This insures that B B wB = 1 for each B B. Then, we define our estimate ZB B w ZB B B n-1 log Ztrue -1 n log Zbp -1 n log Zbp ZB 10 -4 0.15 10 -6 0.1 10 -8 0.05 10 -10 n-1|log Z-1 Z | true B n-1|log Z-1 Z Z | true L bp 20 B 0 0 0.05 0.1 r 0.15 0.2 0.25 10 -12 5 10 15 25 30 (c) (d) Figure 3. Demonstration of determinant approximation method for 256 × 256 periodic grid with uniform edge weights r (0, .25). Plots of (a) (|R|) and (|R |) vs 1 r; (b) ( n log of) Z, Z bp and ZB (with L = 2, 4, 8, 16, 32) 1 1 vs r; (c) n log(Z bp ZB ) vs r; and (d) n | log(Z -1 ZB )| and -1 bp 1 | log(Z Z ZB )| vs L = 2, 4, 8, 16, 32 for r = .23. In (b) n and (c) the estimates for L = 8, 16, 32 are all nearly exact (and therefore hard to distinguish in the plot) and the errors are largest near the walk-summable threshold r = .25. Estimates (b) are not based on GaBP and are actually worst than Z bp for L = 2. However, the GaBP-corrections (c) are strictly better than Z bp . (det(I - AB )-1 )wB (5) where AB denotes the |B| × |B| principle submatrix of A corresponding to B. This approximation method is similar in spirit to approximations used elsewhere (e.g., Kikuchi approximations to free-energy (Yedidia et al., 2005)). However, the new insights offered by the orbit-product view allows us to give our estimate a precise interpretation in walk-summable Gaussian models: BB LB Theorem 6 ZB = LB Z where LB and LB is the set of all orbits covered by B. w Proof. ZB = B B Z B = LB Z B = Z where B wB = 1 follows from the defiLB nition of the block weights. P wB determinant as the parameter L is made large with error decaying exponentially in L. The estimate ZB (R) includes all orbits that are covered by some block. The improved GaBP-based estimate Z bp ZB (R ) includes all orbits such that = () is covered by some block. Thus, the GaBP-based correction includes many more orbits. We also note that the error-bound using the GaBP-based estimate is typically smaller as we have shown that (|R |) (|R|) (if (|R|) < 1). Construction of B for Grids To achieve an error 1 bound n | log ZB | we must choose L log -1 . Z Then, the computation needed to achieve this precision will depend on both the number of blocks and the block size needed to cover all orbits up to this length. In certain classes of sparse graphs, it should be possible to control the complexity of the method. As an example, we demonstrate how to choose blocks for 2D grids. Consider the n× n square grid in which each vertex is connected to its four nearest neighbors. We may cover this graph by L × L blocks shifted (both vertically and horizontally) in increments of L (let L 2 be even). It can be seen that this set of blocks covers all loops shorter than L. To include all intersections of blocks, we add L × L , L × L and L × L blocks. The 2 2 2 2 block weights are wL×L = 1, wL×L/2 = wL/2×L = -1 Moreover, we can then bound the error of the estimate. Noting that LB includes all short orbits of the graph, we can derive the following result by a similar proof as for Theorem 3: Corollary 2 1 n log ZB Z (|A|)L L(1-(|A|)) . Thus, for the class of models with < 1, we obtain an approximation scheme which converges to the correct Orbit-Products & Gaussian BP and wL/2×L/2 = 1. The complexity of computing the determinant of an L × L block is O(L3 ) and the total number of blocks is O(n/L2 ). Hence, the total complexity is O(nL) = O(n log -1 ). We test our approach numerically on a 256×256 square grid (with periodic boundary conditions). We set all edge weights to r and test the quality of approximation using both estimates ZB (R) and Z bp ZB (R ) for r (0, .25) (J = I - R becomes indefinite for larger values of r) and block sizes L = 2, 4, 8, 16, 32. The results are shown in Figure 3. As expected, accuracy rapidly improves with increasing L in both methods and the GaBP-correction approach is more accurate. References Bickson, D., Dolev, D., & Yom-Tov, E. (2008a). A Gaussian belief propagation solver for large scale SVMs. 5th Europ. Conf. Complex Systems. Bickson, D., Shental, O., Siegel, P., Wolf, J., & Dolev, D. (2008b). Gaussian belief propagation based multiuser detection. IEEE Int. Symp. on Inform. Th. (pp. 1878­1882). Chernyak, V., & Chertkov, M. (2008). Fermions and loops on graphs I: Loop calculus for determinant. J. Statistical Mechanics: Theory and Experiments. Chertkov, M., & Chernyak, V. (2006). Loop series for discrete statistical models on graphs. J. Statistical Mechanics: Theory and Experiments. Cseke, B., & Heskes, T. (2008). Bounds on the Bethe free energy for Gaussian networks. Uncertainty in Artificial Intelligence (pp. 97­104). Johnson, J., Malioutov, D., & Willsky, A. (2006). Walk-sum interpretation and analysis of Gaussian belief propagation. Adv. in Neural Inform. Processing Systems 18 (pp. 579­586). Malioutov, D., Johnson, J., & Willsky, A. (2006). Walk-sums and belief propagation in Gaussian graphical models. J. of Machine Learning Research, 7, 2031­2064. Moallemi, C., & Van Roy, B. (2006). Consensus propagation. IEEE Trans. on Inform. Th., 52, 4753­4766. Moallemi, C., & Van Roy, B. (2009). Convergence of min-sum message passing for quadratic optimization. IEEE Trans. on Inform. Th., 55, 2413­2423. Plarre, K., & Kumar, P. (2004). Extended message passing algorithm for inference in loopy Gaussian graphical models. Ad Hoc Networks, 2, 153­169. Rusmevichientong, P., & Van Roy, B. (2001). An analysis of belief propagation on the turbo decoding graph with Gaussian densities. IEEE Trans. on Inform. Th., 47, 745­765. Stark, H., & Terras, A. (1996). Zeta functions of finite graphs and coverings. Adv. in Math., 121, 124­165. Weiss, Y., & Freeman, W. (2001). Correctness of belief propagation in Gaussian graphical models of arbitrary topology. Neural Computation, 13, 2173­2200. Yedidia, J., Freeman, W., & Weiss, Y. (2005). Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans. Inform. Th., 51, 2282­2312. 7. Conclusion and Future Work We have demonstrated an orbit-product representation of the determinant (the partition function of the Gaussian model) and interpreted the estimate obtained by GaBP as corresponding to totally backtracking orbits. Furthermore, we have shown how to correct the GaBP estimate in various ways which involve incorporating backtrackless orbits (e.g. cycles) of the graph. In particular, we demonstrated an efficient block-resummation method to compute truncated orbit-products in sparse graphs (demonstrated for grids). These methods also extend to address estimation of the matrix inverse (the covariance matrix of the Gaussian model), which may in turn be used as an efficient preconditioner for iterative solution of linear systems. We leave these extensions for a longer report. In future work, we plan to extend the method of constructing an efficient set of blocks to other classes of sparse graphs. It may also be fruitful to extend our analysis to generalized belief propagation (Yedidia et al., 2005) in Gaussian models. In a related direction, we intend to explore methods to "bootstrap" GaBP using the factorization Z(R) = k=0 Z(-R2 )) -1 k -1 , which follows from the formula k k (I - R) = k (I + R2 ). By computing Z bp (-R2 ) for small values of k we may capture short backtrackless orbits of the graph. Another direction is to investigate generalization of the formula Z = Z bp Z to non-walksummable models, perhaps using methods of (Chernyak & Chertkov, 2008). A related idea is to approximate a non-walksummable model by a walksummable one and then correct estimates obtained from the walk-summable model to better approximate the non-walksummable model.