Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation

Lukas Kroc, Ashish Sabharwal, Bart Selman
Department of Computer Science, Cornell University, Ithaca NY 14853-7501, U.S.A.
{kroc,sabhar,selman}@cs.cornell.edu

Abstract

We show that an important and computationally challenging solution space feature of the graph coloring problem (COL), namely the number of clusters of solutions, can be accurately estimated by a technique very similar to one for counting the number of solutions. This cluster counting approach can be naturally written in terms of a new factor graph derived from the factor graph representing the COL instance. Using a variant of the Belief Propagation inference framework, we can efficiently approximate cluster counts in random COL problems over a large range of graph densities. We illustrate the algorithm on instances with up to 100,000 vertices. Moreover, we supply a methodology for computing the number of clusters exactly using advanced techniques from the knowledge compilation literature. This methodology scales up to several hundred variables.

1 Introduction

Message passing algorithms, in particular Belief Propagation (BP), have been very successful in efficiently computing interesting properties of succinctly represented large spaces, such as joint probability distributions. Recently, these techniques have also been applied to compute properties of discrete spaces, in particular, properties of the space of solutions of combinatorial problems. For example, for propositional satisfiability (SAT) and graph coloring (COL) problems, marginal probability information about the uniform distribution over solutions (or similar combinatorial objects) has been the key ingredient in the success of BP-like algorithms for these problems. Most notably, the survey propagation (SP) algorithm utilizes this information to solve very large hard random instances of these problems [3, 11].
Earlier work on random ensembles of Constraint Satisfaction Problems (CSPs) has shown that the computationally hardest instances occur near phase boundaries, where instances go from having solutions to having no globally satisfying solution (a "solution focused picture"). In recent years, this picture has been refined, and it was found that a key factor in determining the hardness of instances for search (or sampling) algorithms is how the solutions are distributed throughout the search space. This has made the structure of the solution space in terms of its clustering properties a key factor in determining the performance of combinatorial search methods (a "cluster focused picture"). Can BP-like algorithms be used to provide such cluster-focused information? For example, how many clusters are there in a solution space? How big are the clusters? Answers to such questions will shed further light on our understanding of these hard combinatorial problems and lead to better algorithmic approaches for reasoning about them, be it for finding one solution or answering queries of probabilistic inference about the set of solutions. The study of the solution space geometry has indeed been the focus of a number of recent papers [e.g. 1, 2, 3, 7, 9, 11], especially by the statistical physics community, which has developed extensive theoretical tools to analyze such spaces under certain structural assumptions and large size limits. We provide a purely combinatorial method for counting the number of clusters, which is applicable even to small size problems and can be approximated very well by message passing techniques. Solutions can be thought of as 'neighbors' if they differ in the value of one variable, and the transitive closure of the neighbor relation defines clusters in a natural manner.

(This work was supported by IISI, Cornell University (AFOSR grant FA9550-04-1-0151), DARPA (REAL grant FA8750-04-2-0216), and NSF (grant 0514429).)
Counting the number of clusters is a challenging problem. To begin with, it is not even clear what the best succinct way to represent clusters is. One relatively crude but useful way is to represent a cluster by the set of 'backbone' variables in that cluster, i.e., variables that take a fixed value in all solutions within the cluster. Interestingly, while it is easy (polynomial time) to verify whether a variable assignment is indeed a solution of a CSP, the same check is much harder for a candidate cluster represented by the set of its backbone variables.

We propose one of the first scalable methods for estimating the number of clusters of solutions of graph coloring problems, using a belief propagation like algorithm. While the naïve method, based on enumeration of solutions and pairwise distances, scales to graph coloring problems with 50 or so nodes, and a recently proposed local search based method provides estimates up to a few hundred node graphs [7], our approach--being based on BP--easily provides fast estimates for graphs with 100,000 nodes. We validate the accuracy of our approach by also providing a fairly non-trivial exact counting method for clusters, utilizing advanced knowledge compilation techniques.

Our approach works with the factor graph representation of the graph coloring problem. Yedidia et al. [12] showed that if one can write the so-called "partition function", Z, for a quantity of interest in a factor graph with non-negative weights, then there is a fairly mechanical variational method derivation that yields belief propagation equations for estimating Z. Under certain assumptions, we derive a partition function style quantity, Z(-1), to count the number of clusters. We then use the variational method to obtain BP equations for estimating Z(-1).
Our experiments with random graph coloring problems show that Z(-1) itself is an extremely accurate estimate of the number of clusters, and so is its approximation, ZBP(-1), obtained from our BP equations.

2 Preliminaries

The graph coloring problem can be expressed in the form of a factor graph, a bipartite graph with two kinds of nodes. The variable nodes, x = (x_1, ..., x_n), represent the variables in the problem (vertices to be colored) with their discrete domain Dom = {c_1, ..., c_k} (the set of k colors). The factor nodes α, β, ..., with associated factor functions f_α, f_β, ..., represent the constraints of the problem (no two adjacent vertices have the same color). Each factor function f_α is a Boolean function with arguments x_α (a subset of variables from x) and range {0, 1}, and it evaluates to 1 iff the associated constraint is satisfied. An edge connects a variable x_i with factor f_α iff the variable appears in the constraint represented by the factor node, which we denote by i ∈ α. In the graph coloring problem, each factor function has exactly two variables.

The factor representation weights all variable assignments x according to the product of the values of all factors. We denote this product by F(x) := Π_α f_α(x_α); it evaluates to 1 iff all of the factors have value 1, and to 0 otherwise. The assignments with weight 1 correspond exactly to legal colorings, or solutions to the problem. The number of solutions can thus be expressed as the weighted sum across all possible assignments. We denote this quantity by Z, the so-called partition function:

    Z := Σ_{x ∈ Dom^n} F(x) = Σ_{x ∈ Dom^n} Π_α f_α(x_α)    (1)

We define the solution space of a graph coloring problem to be the set of all its legal colorings. Two legal colorings (or solutions) are called neighbors if they differ in the color of one vertex.

Definition 1 (Solution Cluster).
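To make Eq. (1) concrete, here is a minimal brute-force evaluation of Z on tiny instances (a sketch; the function name and the example graphs are ours for illustration, and the enumeration is exponential, so it is only usable on very small graphs):

```python
from itertools import product

def count_colorings(n, edges, k):
    """Brute-force partition function Z of Eq. (1): sum F(x) over all
    assignments, where F(x) = 1 iff every edge factor is satisfied,
    i.e., iff x is a proper k-coloring."""
    return sum(1 for x in product(range(k), repeat=n)
               if all(x[u] != x[v] for u, v in edges))

# Triangle with 3 colors: 3! = 6 proper colorings.
print(count_colorings(3, [(0, 1), (1, 2), (0, 2)], 3))  # -> 6
# Path on 3 vertices: 3 * 2 * 2 = 12 proper colorings.
print(count_colorings(3, [(0, 1), (1, 2)], 3))          # -> 12
```

The exact counts produced this way also serve as ground truth when checking the cluster-counting expressions that follow.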
A set of solutions C ⊆ S of a solution space S is a cluster if it is a maximal subset such that any two solutions in C can be connected by a sequence of solutions from C in which consecutive solutions are neighbors. (In other words, clusters are connected components of a "solution graph" whose nodes correspond to solutions and whose edges join solutions that differ in the value of exactly one variable.)

3 A Partition Function Style Expression for Counting Clusters

In this section we consider a method for estimating the number of solution clusters of a graph coloring problem. We briefly describe the concepts here; a more in-depth treatment, including formal results, may be found in [8].

First let us extend the definition of the function F so that it can be evaluated on an extended domain DomExt := P({c_1, ..., c_k}) \ {∅}, where c_1, ..., c_k are the k domain values (colors) of each of the problem variables, and P is the power set operator (so |DomExt| = 2^k - 1). Each generalized assignment y ∈ DomExt^n thus associates a (nonempty) set of values with each original variable, defining a hypercube in the search space for F. We generalize F and f_α to this extended domain in the natural way: F(y) := Π_{x ∈ y} F(x) and f_α(y_α) := Π_{x_α ∈ y_α} f_α(x_α), where the ∈ relation is applied point-wise, as will be the case with any relational operators used on vectors in this text. This means that the generalized function evaluates to 1 on a hypercube iff it evaluates to 1 on all points within that hypercube.

Let us first assume that the solution space we work with decomposes into a set of separated hypercubes, so that clusters correspond exactly to hypercubes; by separated hypercubes, we mean that points in one hypercube differ from points in the other in at least two values. E.g., y_1 = ({c_1}, {c_1}, {c_1}) and y_2 = ({c_2}, {c_2}, {c_1, c_2}) are separated hypercubes in three dimensions.
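Definition 1 can be realized directly as a flood fill over the solution graph; the following brute-force sketch (our own helper names and example graphs) enumerates the clusters of a tiny instance:

```python
from itertools import product

def solution_clusters(n, edges, k):
    """Clusters per Definition 1: connected components of the solution
    graph, where two proper colorings are neighbors iff they differ in
    the color of exactly one vertex. Brute-force flood fill."""
    sols = {x for x in product(range(k), repeat=n)
            if all(x[u] != x[v] for u, v in edges)}
    seen, clusters = set(), []
    for s in sols:
        if s in seen:
            continue
        comp, stack = [], [s]
        seen.add(s)
        while stack:
            cur = stack.pop()
            comp.append(cur)
            for i in range(n):          # re-color one vertex at a time
                for c in range(k):
                    nb = cur[:i] + (c,) + cur[i + 1:]
                    if nb != cur and nb in sols and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        clusters.append(comp)
    return clusters

# Triangle, k = 3: each vertex is forced by its two neighbors, so each of
# the 6 proper colorings is an isolated cluster.
print(len(solution_clusters(3, [(0, 1), (1, 2), (0, 2)], 3)))  # -> 6
# Single edge, k = 3: all 6 colorings are mutually reachable -> 1 cluster.
print(len(solution_clusters(2, [(0, 1)], 3)))                  # -> 1
```

This is exactly the naïve enumeration baseline mentioned in Section 4, and it only scales to a few dozen vertices.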
This allows us to develop a surprisingly simple expression for counting the number of clusters, and we will later see that the same expression applies with high precision also to solution spaces of much more complex random instances of coloring problems. Consider the indicator function χ(y) for the property that y ∈ DomExt^n is a maximal solution hypercube contained in the solution space:

    χ(y) := F(y) · Π_i Π_{v_i ∉ y_i} (1 - F(y[y_i → y_i ∪ {v_i}]))

where the first term requires that y is legal, the second that no point-wise generalization of y is legal, and y[y_i → y_i'] denotes the substitution of y_i' for y_i in y. Note that if the solution clusters are in fact hypercubes, then variables that can be extended independently can all be extended at once (that is, F(y[y_i → y_i ∪ {v_i}]) = 1 and F(y[y_j → y_j ∪ {v_j}]) = 1 implies F(y[y_i → y_i ∪ {v_i}, y_j → y_j ∪ {v_j}]) = 1). Moreover, F(y[y_i → y_i ∪ {v_i}]) = 1 implies F(y) = 1. Using these observations, χ can be reformulated by expanding the product as follows (#o(y) is the number of odd-size elements of y, and #e(y) the number of even-size ones):

    χ(y) = Σ_{z ∈ (P(Dom))^n : z ⊇ y} (-1)^{#o(z\y)} F(z)        [by the hypercube assumption]
         = (-1)^{#e(y)} Σ_{z ⊇ y} (-1)^{#e(z)} F(z)

Now, to count the number of maximal hypercubes fitting into the set of solutions, we sum the indicator function χ(y) across all vectors y ∈ DomExt^n:

    Σ_y χ(y) = Σ_y (-1)^{#e(y)} Σ_{z ⊇ y} (-1)^{#e(z)} F(z)
             = Σ_z (-1)^{#e(z)} F(z) Π_i ( Σ_{∅ ≠ y_i ⊆ z_i} (-1)^{(|y_i| is even)} )
             = Σ_z (-1)^{#e(z)} F(z)

where the last step uses the fact that each inner sum over nonempty subsets equals 1. The expression above is important for our study, and we denote it by Z(-1):

    Z(-1) := Σ_{z ∈ DomExt^n} (-1)^{#e(z)} F(z) = Σ_{y ∈ DomExt^n} (-1)^{#e(y)} Π_α f_α(y_α)    (2)

The chosen notation is meant to emphasize its relatedness to the partition function (1), denoted by Z; indeed, the two expressions differ only in the (-1) term. If the solution space consists of a set of separated hypercubes, then Z(-1) exactly captures the number of clusters (each hypercube is a cluster).
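Expression (2) can be checked directly on tiny instances. For a coloring constraint, the generalized factor is 1 on a hypercube iff the two endpoint color sets are disjoint (only then is every point of the hypercube a proper coloring). A brute-force sketch, with our own helper names and example graphs (exponential, illustration only):

```python
from itertools import product

def z_minus_one(n, edges, k):
    """Brute-force evaluation of Eq. (2): sum (-1)^{#e(y)} F(y) over all
    generalized assignments y, where each y_i is a nonempty set of colors
    and F(y) = 1 iff adjacent vertices receive disjoint color sets."""
    dom_ext = [frozenset(j for j in range(k) if m >> j & 1)
               for m in range(1, 1 << k)]          # all nonempty subsets
    total = 0
    for y in product(dom_ext, repeat=n):
        if all(not (y[u] & y[v]) for u, v in edges):   # F(y) = 1
            num_even = sum(1 for s in y if len(s) % 2 == 0)
            total += (-1) ** num_even
    return total

# Triangle, k = 3: six isolated solutions, hence six clusters.
print(z_minus_one(3, [(0, 1), (1, 2), (0, 2)], 3))          # -> 6
# A single edge is a tree: one cluster, but Z(-1) = 0 (see Section 6).
print(z_minus_one(2, [(0, 1)], 3))                          # -> 0
# 4-cycle: a chordless cycle of length not divisible by 3 also gives 0.
print(z_minus_one(4, [(0, 1), (1, 2), (2, 3), (3, 0)], 3))  # -> 0
```

The triangle matches the exact cluster count, while the tree and short-cycle cases illustrate the known failure modes of Z(-1) discussed in the experimental section.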
Surprisingly, this number is remarkably accurate even for random coloring problems, as we will see in Section 6, Figure 1.

4 Exact Computation of the Number of Clusters and Z(-1)

Obtaining the exact number of clusters for reasonable size problems is crucial for evaluating our proposed approach based on Z(-1) and the corresponding BP equations to follow in Section 5. A naïve way is to explicitly enumerate all solutions, compute their pairwise Hamming distances, and infer the cluster structure. Not surprisingly, this method does not scale well, because the number of solutions typically grows exponentially as the number of variables of the graph coloring problem increases. We discuss here a much more scalable approach that uses two advanced techniques to this effect: decomposable negation normal form (DNNF) and binary decision diagrams (BDDs). Our method scales to graph coloring problems with a few hundred variables (see experimental results) for computing both the exact number of clusters and the exact value of Z(-1).

Both DNNF [5] and BDDs [4] are graph-based data structures that have proven to be very effective in "knowledge compilation", i.e., in converting a 0-1 function F into a (potentially exponentially long, but often reasonably sized) standard form from which various interesting properties of F can be inferred easily, often in time linear in the size of the DNNF formula or BDD. For our purposes, we use DNNF to succinctly represent all solutions of F, and a set of BDDs to represent solution clusters that we create as we traverse the DNNF representation.
The only details of these two representations relevant for us are the following: (1) a DNNF formula is represented as an acyclic directed graph with variables and their negations at the leaves and two kinds of internal nodes, "or" and "and"; "or" nodes split the set of solutions such that the solutions differ in the value of the variable labeling the node but otherwise range over identical variables, while "and" nodes partition the space into disjoint sets of variables; (2) BDDs represent arbitrary sets of solutions and support efficient intersection and projection (onto a subset of variables) operations on these sets.

We use the compiler c2d [6] to obtain the DNNF form for F. Since c2d works on Boolean formulas and our F often has non-Boolean domains, we first convert F to a Boolean function F' using a unary encoding, i.e., by replacing each variable x_i of F with domain size t with t Boolean variables x_{i,j}, 1 ≤ j ≤ t, respecting the semantics: x_i = j iff x_{i,j} = 1. In order to ensure that F and F' have similar cluster structure of solutions, we relax the usual condition that only one of x_{i,1}, ..., x_{i,t} may be 1, thus effectively allowing the original x_i to take multiple values simultaneously. This yields a generalized function: the domains of the variables of F' correspond to the power sets of the domains of the respective variables of F. This generalization has the following useful property: if two solutions x^(1) and x^(2) are neighbors in the solution space of F, then the corresponding solutions of F' are in the same cluster in the solution space of F'.

Computing the number of clusters. Given F', we run c2d on it to obtain an implicit representation of all its solutions as a DNNF formula. Next, we traverse this DNNF from the leaf nodes up, creating clusters as we go along. Specifically, with each node U of the DNNF, we associate a set S_U of BDDs, one for each cluster in the sub-formula contained under U.
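The relaxed unary encoding can be sketched as follows. The clause representation (lists of (variable, polarity) literals) and all names are ours; we keep one at-least-one clause per vertex so that satisfying assignments correspond to nonempty color sets, which we believe matches the intent, and we encode each edge by one binary clause per color:

```python
from itertools import product

def unary_encode(n, edges, k):
    """Relaxed unary encoding F' of a coloring instance F: Boolean
    variable (i, j) means 'color j is allowed for vertex i'. The usual
    at-most-one-color constraint is deliberately dropped, so a model of
    F' is a generalized assignment (a nonempty set of colors per vertex).
    A clause is a list of ((vertex, color), polarity) literals."""
    clauses = []
    for i in range(n):
        # at least one color per vertex (color sets stay nonempty)
        clauses.append([((i, j), True) for j in range(k)])
    for u, v in edges:
        for j in range(k):
            # adjacent vertices may not share any allowed color
            clauses.append([((u, j), False), ((v, j), False)])
    return clauses

def count_sat(n, k, clauses):
    """Brute-force model count of the Boolean function F'."""
    count = 0
    for bits in product([False, True], repeat=n * k):
        val = {(i, j): bits[i * k + j] for i in range(n) for j in range(k)}
        if all(any(val[v] == pol for v, pol in cl) for cl in clauses):
            count += 1
    return count

# Single edge, k = 3: the 12 pairs of disjoint nonempty color sets.
print(count_sat(2, 3, unary_encode(2, [(0, 1)], 3)))  # -> 12
# Triangle, k = 3: only the 6 singleton partitions survive.
print(count_sat(3, 3, unary_encode(3, [(0, 1), (1, 2), (0, 2)], 3)))  # -> 6
```

In practice these clauses would be written out in DIMACS CNF and handed to c2d; the brute-force counter here just checks that the models of F' are exactly the generalized assignments from Section 3.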
The set of BDDs for the root node of the DNNF then corresponds precisely to the set of solution clusters of the Boolean function F', and thus of F. These BDDs are computed as follows. If U is a leaf node, it represents a Boolean variable or its negation, and S_U consists of the single one-node BDD corresponding to this Boolean literal. If U is an internal node labeled with the variable x_U and with children L and R, the set of BDDs S_U is computed as follows. If U is an "or" node, then we consider the union S_L ∪ S_R of the two sets of BDDs and merge any two of these BDDs if they are adjacent, i.e., have two solutions that are neighbors in the solution space. (Since the DNNF form guarantees that the BDDs in S_L and S_R must already differ in the value of the variable x_U labeling U, the adjacency check is equivalent to testing whether the two BDDs, with x_U projected out, have a solution in common; this is a straightforward projection and intersection operation for BDDs.) In the worst case, this leads to |S_L| + |S_R| cluster BDDs in S_U. Similarly, if U is an "and" node, then S_U is constructed by considering the cross product {b_L ∧ b_R | b_L ∈ S_L, b_R ∈ S_R} of the two sets of BDDs and merging adjacent resulting BDDs as before; in the worst case, this leads to |S_L| · |S_R| cluster BDDs in S_U.

Evaluating Z(-1). The exact value of Z(-1) on F can also be evaluated easily once we have the DNNF representation. In fact, as is reflected in our experimental results, evaluating Z(-1) is a much more scalable process than counting clusters, because it requires only a simple traversal of the DNNF, without the need for maintaining BDDs. With each node U of the DNNF, we associate a value V_U which equals precisely the difference between the number of solutions below U with an even number of positive literals and the number of those with an odd number of positive literals; Z(-1) then equals (-1)^N times the value thus associated with the root node, where N is the number of variables of F. These values are computed bottom-up as follows.
If U is a leaf node labeled with a positive (resp. negative) literal, then V_U = -1 (resp. 1). If U is an "or" node with children L and R, then V_U = V_L + V_R; this works because L and R range over identical variables. Finally, if U is an "and" node with children L and R, then V_U = V_L · V_R. This last computation works because L and R are on disjoint sets of variables, and because of the following observation. Suppose L has V_L^e solutions with an even number of positive literals and V_L^o solutions with an odd number of positive literals, and similarly for R. Then

    V_U = (V_L^e V_R^e + V_L^o V_R^o) - (V_L^e V_R^o + V_L^o V_R^e) = (V_L^e - V_L^o)(V_R^e - V_R^o) = V_L V_R.

5 Belief Propagation Inference for Clusters

We present a version of the Belief Propagation algorithm that allows us to deal with the alternating signs of Z(-1). The derivation closely follows the one given by Yedidia et al. [12] for standard BP, i.e., we write equations for a stationary point of the KL divergence of two sequences (not necessarily probability distributions in our case). Since the Z(-1) expression involves both positive and negative terms, we must appropriately generalize some of the steps.

Given a function p(y) (the target function, with real numbers as its range) on DomExt^n that is known up to a normalization constant but whose marginal sums are unknown, we seek a function b(y) (the trial function) to approximate p(y), such that b's marginal sums are known. The target function p(y) is defined as

    p(y) := (1/Z(-1)) (-1)^{#e(y)} Π_α f_α(y_α).

We adopt previously used notation [12]: y_α are the values in y of the variables that appear in factor f_α; y_{-i} are the values of all variables in y except y_i. The marginal sums can be extended in a similar way to allow for any number of variables fixed in y, specified by the subscript. When convenient, we treat the symbol α as a set of indexes of the variables in f_α, so that we can index them.
We begin by listing the assumptions used in the derivation, both the ones used in "standard" BP and two additional ones needed for the generalization. An assumption on b(y) is called legitimate if the corresponding condition holds for p(y).

Assumptions: The standard assumptions, present in the derivation of standard BP [12], are:

· Marginalization: b_i(y_i) = Σ_{y_{-i}} b(y) and b_α(y_α) = Σ_{y_{-α}} b(y). This condition is legitimate, but cannot be enforced. Moreover, it might happen that the solution found by BP does not satisfy it, which is a known problem with BP [10].

· Normalization: Σ_{y_i} b_i(y_i) = Σ_{y_α} b_α(y_α) = 1. This is legitimate, and explicitly enforced.

· Consistency: ∀α, i ∈ α, y_i: b_i(y_i) = Σ_{y_{α\i}} b_α(y_α). This is a legitimate assumption, and explicitly enforced.

· Tree-like decomposition: the weights b(y) of each configuration can be obtained from the marginal sums as follows (d_i is the degree of the variable node y_i in the factor graph):

    |b(y)| = Π_α |b_α(y_α)| / Π_i |b_i(y_i)|^{d_i - 1}

(The standard assumption is without the absolute values.) This assumption is not legitimate, and it is built-in, i.e., it is used in the derivation of the BP equations.

To appropriately handle the signs of b(y) and p(y), we have two additional assumptions. These are necessary for the BP derivation applicable to Z(-1), but not for the standard BP equations.

· Sign-correspondence: b(y) and p(y) have the same signs for all configurations y (zero, being a singular case, is treated as having a positive sign). This is a built-in assumption, and legitimate.

· Sign-alternation: b_i(y_i) is negative iff |y_i| is even, and b_α(y_α) is negative iff #e(y_α) is odd. This is also a built-in assumption, but not necessarily legitimate: whether it is legitimate depends on the structure of the solution space of the particular problem.
The Sign-alternation assumption can be viewed as an application of the inclusion-exclusion principle, and is easy to illustrate on a graph coloring problem with only two colors. In such a case, if F(y) = 1, then y_i = {c_1} means that y_i can have color 1, y_i = {c_2} that y_i can have color 2, and y_i = {c_1, c_2} that y_i can have both colors. The third event is included in the first two, and its probability must thus appear with a negative sign if the sum of the probabilities is to be 1.

Kullback-Leibler divergence: The KL divergence is traditionally defined for probability distributions, and for sequences of non-negative terms in particular. We need a more general measure, as our sequences p(y) and b(y) have alternating signs. But using the Sign-correspondence assumption, we observe that the usual definition of the KL divergence is still applicable, since the term inside the logarithm is non-negative:

    D(b ‖ p) := Σ_{y ∈ DomExt^n} b(y) log (b(y)/p(y)) = Σ_{y ∈ DomExt^n} b(y) log (|b(y)|/|p(y)|).

Moreover, the following lemma shows that the two properties of the KL divergence that make it suitable for distance minimization are still valid.

Lemma 1. Let b(.) and p(.) be (possibly negative) weight functions on the same domain D, with the property that they agree on signs for all states (i.e., ∀y ∈ D: sign(b(y)) = sign(p(y))) and that they sum to the same constant (i.e., Σ_y b(y) = Σ_y p(y) = c). Then the KL divergence D(b ‖ p) satisfies D(b ‖ p) ≥ 0 and D(b ‖ p) = 0 ⇔ b ≡ p.

The proof is essentially identical to that of the equivalent statement about the KL divergence of probability distributions.

Minimizing D(b ‖ p): We write p(y) = sign(p(y)) · |p(y)|, and analogously for b(y). This allows us to isolate the signs, and the minimization then follows exactly the steps of the standard BP derivation, namely writing a set of equations characterizing the stationary points of D(b ‖ p). At the end, using the Sign-alternation assumption, we are able to implant the signs back.
BP equations: The resulting modified BP updates (denoted BP(-1)) are, for y_i ∈ DomExt:

    n_{i→α}(y_i) = Π_{β ∋ i, β ≠ α} m_{β→i}(y_i)    (3)

    m_{α→i}(y_i) = Σ_{y_{α\i} ∈ DomExt^{|α|-1}} f_α(y_α) Π_{j ∈ α\i} (-1)^{(|y_j| is even)} n_{j→α}(y_j)    (4)

(These are almost equivalent to standard BP, except for the (-1) term.) One iterates these equations from a suitable starting point to find a fixed point, and then obtains the beliefs b_i(y_i) and b_α(y_α) (i.e., estimates of the marginal sums) using the Sign-alternation assumption and the standard BP relations:

    b_i(y_i) ∝ (-1)^{(|y_i| is even)} Π_{α ∋ i} m_{α→i}(y_i),    b_α(y_α) ∝ (-1)^{#e(y_α)} f_α(y_α) Π_{i ∈ α} n_{i→α}(y_i)    (5)

To approximately count the number of clusters in large problems for which exact cluster counting or exact Z(-1) evaluation is infeasible, we employ the generic BP(-1) scheme derived above. We substitute the extended factors f_α into Equations (3)-(4), iterate from a random initial starting point to find a fixed point, and then use Equations (5) to compute the beliefs. The actual estimate of Z(-1) is obtained with the standard BP formula (with signs properly taken care of), where d_i is the degree of the variable node y_i in the factor graph:

    log ZBP(-1) := - Σ_α Σ_{y_α} b_α(y_α) log |b_α(y_α)| + Σ_i (d_i - 1) Σ_{y_i} b_i(y_i) log |b_i(y_i)|    (6)

6 Experimental Evaluation

We empirically evaluate the accuracy of our Z(-1) and ZBP(-1) approximations on an ensemble of random graph 3-coloring problems. The results are discussed in this section.

Z(-1) vs. the number of clusters. The left panel of Figure 1 compares the number of clusters (on the x-axis, log-scale) with Z(-1) (on the y-axis, log-scale) for 2500 solvable random 3-COL problems on graphs with 20, 50, and 100 vertices, with average vertex degree ranging between 1.0 and 4.7 (the threshold of 3-colorability). As can be seen, the Z(-1) expression captures the number of clusters almost exactly. The inaccuracies come mostly from low graph density regions; on all instances we tried with density > 3.0, the Z(-1) expression is exact.
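For intuition about the message-passing machinery, here is a minimal sum-product implementation of the un-signed analogue of Equations (3)-(6), i.e., standard BP for counting proper colorings; BP(-1) differs only in the extended domain and the (-1) factors. This is a sketch with our own function and variable names; BP of this form is exact on trees and an approximation on loopy graphs:

```python
import math
from itertools import product

def bp_log_z(n, edges, k, iters=50):
    """Standard sum-product BP estimate of log Z for proper k-colorings.
    Messages are kept normalized; log Z is assembled from the Bethe
    expression (the form of Eq. (6) without the sign terms)."""
    m = {(a, i): [1.0 / k] * k for a, e in enumerate(edges) for i in e}
    nmsg = {(i, a): [1.0 / k] * k for a, e in enumerate(edges) for i in e}
    nbrs = {i: [a for a, e in enumerate(edges) if i in e] for i in range(n)}
    for _ in range(iters):
        for a, (u, v) in enumerate(edges):
            for i in (u, v):
                # variable-to-factor: product of the other factor messages
                msg = [math.prod(m[(b, i)][c] for b in nbrs[i] if b != a)
                       for c in range(k)]
                s = sum(msg)
                nmsg[(i, a)] = [x / s for x in msg]
            for i, j in ((u, v), (v, u)):
                # factor-to-variable: sum over the other endpoint's colors
                msg = [sum(nmsg[(j, a)][c2] for c2 in range(k) if c2 != c)
                       for c in range(k)]
                s = sum(msg)
                m[(a, i)] = [x / s for x in msg]
    logz = 0.0
    for a, (u, v) in enumerate(edges):     # factor belief entropies
        bal = {(c1, c2): nmsg[(u, a)][c1] * nmsg[(v, a)][c2]
               for c1, c2 in product(range(k), repeat=2) if c1 != c2}
        s = sum(bal.values())
        logz -= sum(b / s * math.log(b / s) for b in bal.values() if b > 0)
    for i in range(n):                     # variable belief entropies
        bi = [math.prod(m[(a, i)][c] for a in nbrs[i]) for c in range(k)]
        s = sum(bi)
        logz += (len(nbrs[i]) - 1) * sum(b / s * math.log(b / s)
                                         for b in bi if b > 0)
    return logz

# Path 0-1-2 with 3 colors: Z = 3 * 2 * 2 = 12, and BP is exact on trees.
print(round(math.exp(bp_log_z(3, [(0, 1), (1, 2)], 3)), 4))  # -> 12.0
```

The BP(-1) variant replaces the k color states with the 2^k - 1 states of DomExt and inserts the sign factors of Equations (4)-(6) into the same update loop.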
We remark that although uncolorable instances were not considered in this comparison, Z(-1) = 0 = num-clusters for them by construction. It is worth noting that for tree graphs (with more than one vertex), the Z(-1) expression gives 0 for any k ≥ 3 colors, although there is exactly one solution cluster. Moreover, given a disconnected graph with at least one tree component, Z(-1) also evaluates to 0, due to the symmetry of colorings in the tree component. We have thus removed all tree components from the generated graphs prior to computing Z(-1) (removing tree components does not change the number of clusters). For low graph densities, there are still some instances for which Z(-1) evaluates to 0 even after removing tree components (these instances are not visible in Figure 1 due to the log-log scale). In fact, all our instances with fewer than 5 clusters have Z(-1) = 0. This is because of other substructures for which Z(-1) evaluates to 0, e.g., a chordless cycle of length not divisible by 3 (for k = 3 coloring) with attached trees. These structures become rarer as the density increases.

Figure 1: Left: Z(-1) vs. number of clusters in random 3-COL problems with 20, 50, and 100 vertices, and average vertex degree between 1.0 and 4.7. Right: cluster marginals vs. Z(-1)-marginals for one instance of a random 3-COL problem with 100 vertices.

Figure 2: Average ZBP(-1) and Z(-1) for 3-COL vs. average vertex degree, for small and large random graphs.

Z(-1)-marginals vs. cluster marginals. For a given problem instance, we can define the cluster marginal of a variable x_i to be the fraction of solution clusters in which x_i appears with only one particular value (i.e., x_i is a backbone of the cluster).
Since Z(-1) counts the number of clusters well, it is natural to ask whether it can also provide marginal information. Indeed, Z(-1) does provide an estimate of the cluster marginals, and we call these estimates Z(-1)-marginals. Recall that the semantics of factors in the extended domain is such that a variable can assume a set of values only if any value in the set yields a solution to the problem. This extends to the Z(-1) estimate of the number of clusters, and one can therefore use the principle of inclusion-exclusion to compute the number of clusters in which a variable can assume only one particular value. The definition of Z(-1) conveniently provides the correct signs, and the number of clusters where x_i is fixed to v_i is thus Σ_{y_i ∋ v_i} Z(-1)(y_i), where Z(-1)(y_i) is a marginal sum of Z(-1). The Z(-1)-marginal estimate for x_i = v_i is obtained by dividing this number by Z(-1).

The right panel of Figure 1 shows the results of such a comparison on one random 3-COL problem with 100 vertices. The axes show cluster marginals and Z(-1)-marginals for one color; points correspond to individual variables. The Z(-1)-marginals are close to perfect. This is a typical situation, although it is important to mention that Z(-1)-marginals are not necessarily correct, or even non-negative. They are merely an estimate of the true cluster marginals, and how well they work depends on the solution space structure at hand. They are exact if the solution space decomposes into separated hypercubes and, as the figure shows, remarkably accurate also for random coloring instances.

The number of clusters vs. ZBP(-1). Figure 3 compares ZBP(-1) with Z(-1) for the 3-COL problem on colorable random graphs of various sizes and graph densities. It plots Z(-1) (on the x-axis, log-scale) against ZBP(-1) (y-axis, log-scale) for 1300 solvable 3-COL problems on random graphs with 50, 100, and 200 vertices, with average vertex degree ranging from 1.0 to 4.7.
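The agreement between cluster marginals and Z(-1)-marginals can be reproduced by brute force on a tiny instance. The following sketch (our own helper names; the triangle example is ours) computes both quantities directly from their definitions:

```python
from itertools import product

def clusters(n, edges, k):
    """Connected components of the solution graph (Definition 1)."""
    sols = {x for x in product(range(k), repeat=n)
            if all(x[u] != x[v] for u, v in edges)}
    comps, seen = [], set()
    for s in sols:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            cur = stack.pop()
            comp.add(cur)
            for i, c in product(range(n), range(k)):
                nb = cur[:i] + (c,) + cur[i + 1:]
                if nb in sols and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps

def cluster_marginal(comps, i, c):
    """Fraction of clusters in which vertex i is a backbone fixed to c."""
    return sum({s[i] for s in comp} == {c} for comp in comps) / len(comps)

def zm_marginal(n, edges, k, i, c):
    """Z(-1)-marginal: signed sum over generalized assignments whose i-th
    color set contains c, divided by Z(-1)."""
    dom_ext = [frozenset(j for j in range(k) if m >> j & 1)
               for m in range(1, 1 << k)]
    num = den = 0
    for y in product(dom_ext, repeat=n):
        if all(not (y[u] & y[v]) for u, v in edges):
            sign = (-1) ** sum(len(s) % 2 == 0 for s in y)
            den += sign
            if c in y[i]:
                num += sign
    return num / den

edges = [(0, 1), (1, 2), (0, 2)]           # triangle, k = 3
comps = clusters(3, edges, 3)
print(cluster_marginal(comps, 0, 0))       # -> 0.333...
print(zm_marginal(3, edges, 3, 0, 0))      # matches exactly here
```

On this instance the solution space is a union of separated (degenerate) hypercubes, so the two marginals coincide exactly, as claimed above.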
The plots show that BP is quite accurate in estimating Z(-1) for individual instances, which in turn captures the number of clusters. Instances that are not 3-colorable are not shown; for those, BP in general estimates a non-zero number of clusters.

Estimates on very large graphs and for various graph densities. Figure 2 shows similar data from a different perspective: it plots a rescaled average estimate of the number of clusters (y-axis) against the average vertex degree, from 1.0 to 4.7 (x-axis). The average is taken across different colorable instances of a given size, and the rescaling assumes that the number of clusters equals exp(|V| · c) for a constant c independent of the number of vertices [3]. The three curves show, respectively, BP's estimate for graphs with 100,000 vertices, BP's estimate for graphs with 100 vertices, and Z(-1) for the same graphs of size 100. The averages are taken across 3000 instances for the small graphs, and only 10 instances for the large ones, where the instance-to-instance variability is practically nonexistent.

Figure 3: ZBP(-1) compared to Z(-1) for the 3-COL problem on random graphs with 50, 100, and 200 vertices and average vertex degree in the range 1.0 to 4.7.

The fact that the curves nicely overlay shows that BP(-1) computes Z(-1) very accurately on average for colorable instances (where we can compare it with exact values), and that the estimate remains accurate for large problems. Note that the Survey Propagation algorithm developed by Braunstein et al. [3] also aims at computing the number of certain clusters in the solution space. However, that algorithm counts only the number of clusters with a "typical size", and would show non-zero values in Figure 2 only for average vertex degrees between 4.42 and 4.7.
Our algorithm counts clusters of all sizes, and is very accurate in the entire range of graph densities.

7 Conclusion

We discuss a purely combinatorial construction to estimate the number of solution clusters in graph coloring problems with very high accuracy. The technique is based on an inclusion-exclusion argument coupled with solution counting, and lends itself to an application of a modified belief propagation algorithm. In this way, the number of clusters in huge random combinatorial problems can be accurately and efficiently estimated, as we show for the case of graph colorability. Moreover, due to the nature of the construction, combinatorial arguments can be used to argue that the counts are exact on certain kinds of solution spaces. This is the direction we are currently pursuing, with the hope that these insights can lead to new techniques for solving hard coloring problems and to bounding solvability transitions in random ensembles.

References

[1] D. Achlioptas and F. Ricci-Tersenghi. On the solution-space geometry of random constraint satisfaction problems. In 38th STOC, pages 130-139, Seattle, WA, May 2006.
[2] J. Ardelius, E. Aurell, and S. Krishnamurthy. Clustering of solutions in hard satisfiability problems. J. Statistical Mechanics, P10012, 2007.
[3] A. Braunstein, R. Mulet, A. Pagnani, M. Weigt, and R. Zecchina. Polynomial iterative algorithms for coloring and analyzing random graphs. Physical Review E, 68:036702, 2003.
[4] R. E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers, 35(8):677-691, 1986.
[5] A. Darwiche. Decomposable negation normal form. J. ACM, 48(4):608-647, 2001.
[6] A. Darwiche. New advances in compiling CNF into decomposable negation normal form. In 16th European Conf. on AI, pages 328-332, Valencia, Spain, Aug. 2004.
[7] A. Hartmann, A. Mann, and W. Radenback. Clusters and solution landscapes for vertex-cover and SAT problems.
In Workshop on Physics of Distributed Systems, Stockholm, Sweden, May 2008.
[8] L. Kroc, A. Sabharwal, and B. Selman. Counting solution clusters of combinatorial problems using belief propagation, 2008. In preparation.
[9] F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, and L. Zdeborova. Gibbs states and the set of solutions of random constraint satisfaction problems. PNAS, 104(25):10318-10323, June 2007.
[10] D. Mackay, J. Yedidia, W. Freeman, and Y. Weiss. A conversation about the Bethe free energy and sum-product, 2001. URL citeseer.ist.psu.edu/mackay01conversation.html.
[11] M. Mézard, G. Parisi, and R. Zecchina. Analytic and algorithmic solution of random satisfiability problems. Science, 297(5582):812-815, 2002.
[12] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7):2282-2312, 2005.