Fast mixing for independent sets, colorings and other models on trees
Fabio Martinelli
Abstract

Alistair Sinclair

Dror Weitz

spin. Note that setting U (s1 , s2 ) =  corresponds to a hard constraint, i.e., spin values s1 , s2 are forbidden to be adjacent. We denote by   S V the set of all valid spin configurations, i.e., those for which ľ( ) > 0. We will give several concrete examples in a moment. Usually G is a finite portion of some regular lattice, such as Zd . In this paper we concentrate on what is known in statistical physics as the Bethe lattice Tb , i.e., G is a complete rooted tree in which each interior vertex has b  2 children. Spin systems on trees are not only a useful simplification of their more classical counterparts on Zd , but have recently attracted a lot of attention in their own right as the canonical example of models on "non-amenable" graphs (i.e., those whose boundary is of comparable size to their volume) -- see, e.g., [1, 2, 3, 8, 15]. 1 Introduction A boundary condition corresponds to fixing the spins 1.1 Spin systems on trees at the leaves of G to some specified values (e.g., all Spin systems capture a wide range of probabilistic modare set to the same value s). This allows us to formalels studied in statistical physics, applied probability, arize the central concept of a phase transition. If we let tificial intelligence and elsewhere. A (nearest neighthe size of the tree G grow to infinity, the Gibbs disbor) spin system on a graph G = (V , E ) is specified by tribution tends to a limit known as the (infinite-volume) a finite set S of spin values, a symmetric pair potenGibbs measure. This limit may or may not depend on the tial U : S × S  R  {}, and a singleton potential boundary condition: i.e., there may be either a unique W : S  R. A configuration   S V of the system asGibbs measure, or multiple Gibbs measures ("phases") signs to each vertex (site) x  V a spin value x  S . corresponding to different boundary conditions. InforThe probability of finding the system in configuration  mally, the existence of multiple Gibbs measures correis determined by the Gibbs distribution sponds to the fact that the spin configuration on the - x x . leaves can have a non-zero influence on the spin at the ľ( )  exp U (x , y ) + W (x ) root even as the depth of the tree tends to infinity. A y E V phase transition occurs when an infinitesimal change in Thus the pair potential specifies the likelihood of seeing the potentials leads to a switch from a unique Gibbs a given pair of spins at adjacent sites, while the single- measure to multiple Gibbs measures. See, e.g., [7] for ton potential specifies the likelihood of seeing a given more background. We now illustrate the above ideas with some con Department of Mathematics, University of Roma Tre, Largo San crete examples. The following four spin systems are Murialdo 1, 00146 Roma, Italy. Email: martin@mat.uniroma3.it. among the most widely studied in the literature, and This work was done while this author was visiting the Departments of EECS and Statistics, University of California, Berkeley, supported will serve as the motivating examples in this paper: We study the mixing time of the Glauber dynamics for general spin systems on bounded-degree trees, including the Ising model, the hard-core model (independent sets) and the antiferromagnetic Potts model at zero temperature (colorings). We generalize a framework, developed in our recent paper [18] in the context of the Ising model, for establishing mixing time O(n log n), which ties this property closely to phase transitions in the underlying model. We use this framework to obtain rapid mixing results for several models over a significantly wider range of parameter values than previously known, including situations in which the mixing time is strongly dependent on the boundary condition.
in part by a Miller Visiting Professorship.  Computer Science Division, University of California, Berkeley, CA 94720-1776, U.S.A. Email: {sinclair,dror}@cs.berkeley.edu. Supported in part by NSF Grant CCR-0121555 and DARPA cooperative agreement F30602-002-0601.
 Strictly speaking, in the Bethe lattice all vertices (including the root) have degree b + 1; for convenience we assume that the root has degree b. This difference is purely technical and does not affect our results.

Copyright Š 2004 by the Association for Computing Machinery, Inc. and the Society for industrial and Applied Mathematics. All Rights reserved. Printed in The United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Association for Computing Machinery, 1515 Broadway, New York, NY 10036 and the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688

456


A. The (ferromagnetic) Ising model There are two spin values S = {ą1}, and the potentials are U (s1 , s2 ) = - s1 s2 , W (s) = 0, where  is the inverse temperature. Thus  = {ą1}V . The Gibbs distribution ľ assigns higher weight to configurations in which many neighboring spins are aligned with one another. This effect increases with  , so that at high temperatures (low  ) the spins behave almost independently, while at low temperatures (high  ) large connected regions of equal spins tend to form. In fact, as is well known [23], a phase transition 1 occurs at the critical value  = 0 = 2 ln( b+1 ); in other b-1 words, in the "high temperature" region   0 there is a unique Gibbs measure independent of the boundary condition, but as soon as  > 0 we get (at least) two different Gibbs measures corresponding to the +1 and -1 boundary conditions respectively; i.e., if the leaves are +1 then the root has probability bounded away 1 from 2 of being +1. B. The hard-core model (independent sets) This has been used in statistical physics as a model of lattice gases [7], and also in other areas such as the modeling of communication networks [13]. Again there are just two spins S = {0, 1}, and we refer to a site as occupied if it has spin value 1 and unoccupied otherwise. The potentials are U (1, 1) = , U (1, 0) = U (0, 0) = 1, W (1) = L and W (0) = 0, where L  R. The hard constraint here means that no two adjacent sites may be occupied, so  can be identified with the set of all independent sets in G. Also, the aggregated potential of a valid configuration is proportional to the number of occupied sites. Hence the Gibbs distribution takes the simple form ľ( )  N () , where N ( ) is the number of occupied sites and the parameter  = exp(-L) > 0, which controls the density of occupation, is referred to as the "activity." The hard-core model also undergoes a phase tranb sition at a critical activity  = 0 = (b-b )b+1 (see, e.g., 1 [24, 13]). For   0 there is a unique Gibbs measure independent of the boundary condition, while for  > 0 there are (at least) two distinct phases, corresponding to the "odd" and "even" boundary conditions respectively. The even (odd) boundary condition is obtained by making the leaves of the tree all occupied if the depth is even (odd), and all unoccupied otherwise. For  > 0 , the probability of occupation of the root in the infinite-volume Gibbs measure differs for odd and even boundary conditions. C. The (ferromagnetic) Potts model This model was introduced by Potts [22] as a generalization of the Ising model to more than two spin values; see [28] for a survey. Here S = {1, 2, . . . , q } and the po-

tentials are U (s1 , s2 ) = -2 s1 ,s2 , W (s) = 0. Thus the spin at each site can take one of q possible values, and the aggregated potential of any configuration depends on the number of adjacent pairs of equal spins. Note that the Ising model is the special case q = 2. There are no hard constraints, so  = S V . Qualitatively the behavior of this model is similar to that of the Ising model, though less is known in precise quantitative terms. Again there is a phase transition at a critical  = 0 , which depends on b and q , so that for  > 0 (and indeed for   0 when q > 2) there are multiple phases. This value 0 does not in general have q 1 a closed form, but it is known [8] that 0 < 2 ln( b+--1 ) b1 for all q > 2. (For q = 2, this value is exactly 0 for the Ising model as quoted earlier.) D. The antiferromagnetic Potts model (colorings) In this model S = {1, 2, . . . , q }, and the potentials are U (s1 , s2 ) = 2 s1 ,s2 , W (s) = 0, where  is the inverse temperature. This is analogous to the Ising and Potts models except that the interactions are antiferromagnetic, i.e., neighbors with unequal spins are favored. The most interesting case of this model is when  =  (i.e., zero temperature), which introduces hard constraints. Thus if we think of the q spin values as colors,  is the set of proper colorings of G, i.e., assignments of colors to vertices so that no two adjacent vertices receive the same color. The Gibbs distribution is uniform over proper colorings. In this model it is q that provides the parameterization. This model has been widely studied not only in statistical physics, but also in computer science because of its connection to graph coloring. See, e.g., [3] for an informative account. For colorings on the b-ary tree it is well known that, when q  b + 1, there are multiple Gibbs measures; this follows immediately from the existence of "frozen configurations," i.e., colorings in which the color of every internal vertex is forced by the colors of the leaves (see, e.g., [3]). Recently it has been proved that, as soon as q  b + 2, the Gibbs measure is unique [12]. 1.2 Glauber dynamics While classical statistical physics focused on static questions about the infinite volume Gibbs measure (such as the existence of a phase transition), the emphasis in recent years has shifted towards the study of the Glauber dynamics, a local Markov chain on the set of spin configurations  of a finite graph G. For definiteness, we describe the "heat-bath" version of Glauber dynamics: at each step, pick a site x u.a.r., and replace the spin at x by a random spin drawn from the conditional distribution given the spins of its neighbors. It is easy to check that this dynamics converges to ľ as its stationary distribution, providing both an algorithm for sampling from

457


the Gibbs distribution and a plausible model for the actual evolution of the physical system. The key question in the study of the dynamics is to determine the mixing time, i.e., the number of steps until the distribution is close to ľ, starting from an arbitrary configuration. Recent developments in statistical physics have revealed a remarkable connection between the mixing time and phase transitions in the special case of the Ising model: for the Ising model on an n-vertex square in Z2 (which also has a phase transition at a critical 0 ), when  < 0 the mixing time with arbitrary boundary conditions is O(n log n) (which is optimal), but as soon as  > 0 there are boundary conditions for which  the mixing time jumps to exp(( n )) [17]. This conforms with the intuition that the existence of multiple phases creates a "bottleneck" which dramatically slows down mixing. For the Ising model on trees the situation is even more interesting (see [2, 18]): on an nvertex b-ary tree, the mixing time remains O(n log n) for all boundaries, not only for   0 but in fact for b+ > 1 0 . As soon as all  < 1 , where 1 = 2 ln b-1 1  > 1 , the mixing time for certain boundaries becomes n1+(1) , and the exponent is unbounded as   . Thus there is again a sharp transition in mixing behavior, but it occurs inside the multiple-phase region.§ Finally, in the low temperature region   1 , the mixing time is heavily influenced by the boundary condition: recently, we proved that with the +1 (or, symmetrically, -1) boundary condition, the mixing time remains O(n log n) throughout this region (and hence for all values of  ) [18]. This formalizes the intuition that the boundary condition breaks the symmetry between the two phases at low temperatures, thus eliminating the bottleneck and enabling rapid mixing. The above discussion of the Ising model highlights two central issues in the study of the Glauber dynamics for general models. Firstly, for which range of parameter values is the mixing time "optimal" (i.e., O(n log n)) for arbitrary boundary conditions? In particular, does this range extend throughout the region of uniqueness of the Gibbs measure, and even beyond (as it does for the Ising model)?ś And secondly, for parameter values outside this range, are there natural boundary conditions for which the mixing time is still optimal? In this paper we present a unified framework for
a mixing time of O (n log n) hides constants that depend only on the potentials and on the degree b. § This second critical value  has other interpretations in terms 1 of extremality of the Gibbs measure and the threshold for noisy data transmission on the tree [6]. ś Far into the uniqueness region -- e.g., for the Ising model at very high temperatures -- it is easy to see using the Dobrushin Uniqueness Condition that the mixing time is O (n log n) (see, e.g., [27]).
 Throughout,

answering such questions for general spin systems on trees. The framework is adapted and extended from our earlier paper [18] on the Ising model, where we bounded the log-Sobolev constant (and hence the mixing time) in terms of two simple quantities derived from the Gibbs measure, leading to a clean criterion for O(n log n) mixing time. Using this framework, we are able to substantially extend the range of parameter values for which O(n log n) mixing time is known for the models described above, and explicitly relate this to properties of the Gibbs measure. In the case of specific boundaries, these are apparently the first results of this kind for any of these models. Beyond providing new results for specific models, our goal is also to demonstrate the wide applicability of our framework. 1.3 Our results In this subsection we state our results and explain how they relate to previous work. A. The Ising model Our results for this model are outlined above, and were derived in our recent paper [18]. B. The hard-core model (independent sets) The Glauber dynamics for the hard-core model on trees is known to have mixing time polynomial in n at all activities  > 0 with arbitrary boundaries [11]. Moreover, a rather general result of Luby and Vigoda [14, 26] en2 sures a mixing time of O(n log n) when  < b-1 , with arbitrary boundaries. This latter result actually holds for any graph G of maximum degree b + 1. In this paper, we prove results for the hard-core model that mirror those stated above for the Ising model. First, we show that the mixing time is O(n log n) for all activities   0 (and indeed beyond), with arbitrary boundary conditions. Second, for the even (or odd) boundary condition, we get the same result for all activities : T H E O R E M 1 . 1 . For the hard-core model on the n-vertex b-ary tree, the mixing time of the Glauber dynamics is O(n log n) in both of the following situations: (i) the boundary condition is arbitrary, and    ; max 0 , b1 1 - (ii) the boundary condition is even (or odd), and  > 0 is arbitrary.
 A recent paper of Jerrum et al. [11] provides alternative tools based on decomposition ideas for bounding the log-Sobolev constant; those tools work in much more general settings, but give weaker bounds than ours for the scenarios discussed in this paper. In particular, it seems unlikely that those methods are sensitive enough to isolate the regime where the mixing time is O (n log n).

458


1 for all b  5, and indeed for large b it grows as ( b ) compared to the ( 1 ) growth of 0 . Thus for b  5 we b establish O(n log n) mixing time in a region above the critical value 0 . To the best of our knowledge this is the first such result. (Note that the result of [14, 26] mentioned earlier establishes O(n log n) mixing time only 2 for  < b-1 , which is less than 0 for all b and so does not even cover the whole uniqueness region.) We also mention that our analysis in this region has consequences for the infinite volume Gibbs measure itself, implying that when   b1 1 any Gibbs measure that - is the limit of finite Gibbs distributions for some fixed boundary configuration is extremal, again a new result. (For results on extremality with specific boundary conditions, see [4, 16].) We elaborate on this point in the full version of the paper, where we also derive similar results for an arbitrary system with two spin values.

Part (ii) of this theorem is analogous to our earlier result that the mixing time for the Ising model with +1-boundary is O(n log n) at all temperatures. This is in line with the intuition that the even boundary eliminates the only bottleneck in the dynamics. Part (i) identifies a region in which the mixing time is insensitive to the boundary condition. We would expect this to hold throughout the low-activity region   0 , and indeed, by analogy with the Ising model, also in some intermediate region beyond this. Our bound in part (i) confirms this behavior: note that the quantity b1 1 exceeds 0 -

 q ln( b+--1 ) > 0 when q  2( b + 1), this result exb1 tends into the multiple phase region for many combinations of b and q . Part (ii) of the theorem is an analog of our earlier result that the mixing time of the Ising model with +1-boundaries is O(n log n) at all temperatures. Part (iii) is of interest for two reasons. First, since 1 > 0 always, it exhibits a natural boundary condition under which the mixing time is O(n log n) beyond the uniqueness region (but not for arbitrary  ) for all combinations of b and q . Second, because of an intimate connection between the free boundary case and so-called "reconstruction problems" on trees [20] (in which the edges are noisy channels and the goal is to reconstruct a value transmitted from the root), we obtain an alternative proof of the best known value of the noise parameter under which reconstruction is impossible [21]. As we observe later, a slight strengthening of part (iii) marginally improves on this threshold.
1 2

D. The antiferromagnetic Potts model (colorings) The sharpest result known for the Glauber dynamics on colorings is due to Vigoda [25], who shows that for arbitrary boundary conditions the mixing time is 1 O(n log n) provided q > 16 (b + 1). This result actually holds not only for trees but for any n-vertex graph G of maximum degree b + 1. In this paper, we extend this rapid mixing result throughout the uniqueness region, i.e., all the way down to the "critical" value q = b + 2 for which the Gibbs measure remains unique. T H E O R E M 1 . 3 . The mixing time of the Glauber dynamics for colorings on the n-vertex b-ary tree is O(n log n) for arbitrary boundary conditions and q  b + 2. Note that this result is optimal: when q = b + 1, it is not too hard to construct boundary conditions under which the Glauber dynamics is not connected. The remainder of the paper is organized as follows. After some basic definitions in Section 2, in Section 3 we extend the analytic framework from our previous paper to general spin systems on trees, defining the quantities  and  and relating them to the mixing time. Then in Sections 4, 5 and 6 we specialize the analysis to the hard-core, colorings and Potts models respectively. Owing to space limitations, some details are left for the full version.

C. The (ferromagnetic) Potts model Little is known about the Glauber dynamics for the Potts model on trees, beyond the facts that the mixing time is O(n log n) for arbitrary boundaries at very high temperatures (by the Dobrushin Uniqueness Condition), and is (n1+ ) for some boundaries at very low temperatures (combining results in [2, 21]). In this paper, we prove: T H E O R E M 1 . 2 . The mixing time of the Glauber dynamics for the Potts model on an n-vertex b-ary tree is O(n log n) in all of the following situations: (i) the boundary conditi;on is arbitrary and    1 max 0 , 2 ln( b+1 ) b-1 <

(ii) the boundary condition is constant (e.g., all sites on the boundary have spin 1) and  is arbitrary; (iii) the boundary condition is free (i.e., the boundary spins are unconstrained) and  < 1 , where 1 is e21 e21 1 the solution to the equation e21 +--1 ˇ e21 -1 = 1 . b q +1 Part (i) of this theorem shows that we get O(n log n) mixing time for arbitrary boundaries throughout  1 the uniqueness region; also, since 2 ln( b+1 )  b-1

recent sequence of papers [5, 19, 9] have reduced the required number of colors further for general graphs, under the assumption that the maximum degree is (log n). The current state of the art requires q  (1 + )(b + 1), for arbitrarily small > 0 [10], but these results do not apply in our setting where the degree b + 1 is fixed.

A

459


2 Preliminaries For b  2, let Tb denote the infinite rooted b-ary tree (in which every vertex has b children). We will be concerned with complete finite subtrees T that are initial portions of Tb , i.e., share the same root; if T has depth m then it has n = (bm+1 - 1)/(b - 1) vertices, and its boundary  T consists of the children (in Tb ) of its leaves, i.e., | T | = bm+1 . We identify subgraphs of T with their vertex sets, and write E (A) for the edges within a subset A, and  A for the boundary of A (i.e., the neighbors of A in (T   T ) \ A). Consider a spin system on T specified by spin values S , pair potential U and singleton potential W as b in the Introduction. Let   S T be a spin configuration b on the infinite tree T . We denote by  the set of T configurations   S T  T that agree with  on  T ; i.e.,  specifies a boundary condition on T . The spin at x is denoted x . For any    and any subset T A  T , the Gibbs distribution on A conditional on the configuration outside A being  is denoted ľ and is A defined as follows: if    agrees with  outside A then - x ; x U (x , y ) + (1) W (x ) ľ ( )  exp A

stationary distribution ľ. We measure the rate of convergence by the mixing time: tmix = min{t : P t (, ˇ ) - ľ 
1 2e

for all   }, (2)

where P t (, ˇ ) denotes the distribution of the dynamics after t steps starting from configuration  , and ˇ is 1 variation distance. (The constant 2e in this definition is for algebraic convenience only.) When we say that the mixing time of the Glauber dynamics is O(n log n) for some boundary condition  , we mean that for all finite T , the mixing time for ľ is T  cn log n for a constant c that depends only on b and the potentials. 3 A criterion for O (n log n) mixing time

In [18] we developed a criterion for O(n log n) mixing time of the Glauber dynamics on trees, and used it to analyze the Ising model both for arbitrary boundary conditions and in the important special case of +1boundaries. This criterion generalizes immediately to arbitrary spin systems, as we describe in this section along with some useful extensions. The key ingredients are two quantities, which we call  and  , that bound the rate of percolation of A y E (A A) disagreements down and up the tree respectively. Both otherwise, ľ ( ) = 0. In particular, when A = T , are properties of the collection of Gibbs distributions A  ľ = ľ is simply the Gibbs distribution on the whole {ľT }, where the boundary condition  is fixed and T T T of T with boundary condition  . We will assume that ranges over all finite complete subtrees of Tb . To ľ is well defined (i.e., that the expression in (1) is define  and  we need a little notation. For a vertex A positive for at least one  ) for every A,  and  (even x  T , write Tx for the (maximal) subtree rooted at x. when ľ ( ) = 0); we call the spin system permissive in When x is not the root of T , let ľs x denote the Gibbs T T this case. (Note that this is an issue only for systems distribution in which the parent of x has its spin fixed with hard constraints.) All the examples in this paper to s and the configuration on the bottom boundary of Tx are clearly permissive. We usually abbreviate  and is specified by  (the global boundary condition on T )§ . T ľ to  and ľ respectively. When there are hard For two distributions ľ1 , ľ2 on , ľ1 -ľ2 x denotes the T o constraints (i.e., U (s1 , s2 ) =  for some s1 , s2 ) we variation distance between the projections s f ľ1 and ľ2 1 remove invalid configurations (i.e., those for which onto the spin at x, i.e., ľ1 - ľ2 x = 2 S |ľ1 (x = s) - ľ2 (x = s)|. Recall that  x,s is the configuration  ľ( ) = 0) from . The (heat-bath) Glauber dynamics is the following with the spin at x set to s. Markov chain on  =  . In configuration   , T D E F I N I T I O N 3 . 1 . For a collection of Gibbs distributions transitions are made as follows: {ľ } as above, define   ({ľ }) and    ({ľ }) by T T T (i) pick a vertex x  T u.a.r., and a spin value s chosen T s s from the distribution of the spin at x conditional on (i)  = supT maxz,s,s ľTz - ľ z z; the spins of its neighbors (i.e., s has the distribution A ,s y  y,s (ii)  = supT max ľA - ľ z, where the maximum of the spin at x in ľx} ); { is taken over all subsets A  T , all boundary (ii) go to configuration  x,s obtained from  by setting configurations  , all sites y   A, all neighbors z  the spin at x to s. A of y , and all spins s, s  S . It is a well-known fact (and easily checked) that the Glauber dynamics is ergodic and reversible w.r.t. the Gibbs distribution ľ = ľ , and so converges to the T
do not specify the configuration in the rest of T \ Tx as it has no influence on the distribution inside Tx once the spin at the parent of x is fixed.
§ We

460


Theorem 3.3 tells us that, to prove O(n log n) mixing time, it is enough to estimate the quantities  and  for the spin system and boundary condition in question. As we shall see, this can be done using calculations The intuition for these definitions comes from the specific to the situation at hand. In [18] we carried following claim, which relates  and  to the rate of out these calculations for the Ising model with various disagreement percolation in the tree. For any T and boundary conditions; our goal in this paper is to persite x  T , write Tx for Tx \ {x}, the subtree Tx form analogous calculations for some other important s excluding its root, and ľT for the Gibbs distribution models, thus demonstrating the utility of the approach. x when the spin at x is fixed to s. Also, for  For some of these other models, we will require two height(x) + 1 write Bx, for the subtree (or "block") of minor but useful generalizations of the above frameheight - 1 rooted at x (i.e., Bx, has levels). For work, which we now describe. Both generalizations two configurations ,   , let | -  |x, denote the stem from the observation that the role of the defininumber of sites levels below x (i.e., on the bottom tions of  and  is to obtain the bounds on disagreement boundary of Bx, ) at which  and  differ. Note that percolation stated in Claim 3.2. In fact, in Theorem 3.3 | -  |  b . we can replace  and  by any two values  and  for which the upper bounds in parts (i) and (ii) of Claim 3.2 C L A I M 3 . 2 . For every x  T and all  height(x) + 1 are O(( b) ) and O( ) respectively (where the O( ˇ ) the following hold: hides constants independent of ). The arguments lead, s,s o s ing to Theorem 3.3 are easily seen to hold in this slightly (i) For all s, s there is a coupling  =  f ľT x looser setting. T s | . and ľ for which E | -  x,  (b) Our first generalization (which will be particularly x useful for "non-attractive" systems, including systems (ii) For any  ,    that have the same spin value at B with hard constraints) is to consider two levels of the parent of x, ľ x, - ľ x, x   ˇ | -  |x, . B the tree at a time, rather than a single level as in Definition 3.1. Accordingly, define The proof of this claim follows from a standard reľ cursive coupling along paths in the tree: see [18, s1 s2 s1 s2 2 = sup max Claim 4.4]. Part (i) shows that  bounds the probability Tz - ľ Tz z ˇ ľ Tw - ľ Tw w , z ,w z, T of a disagreement percolating down the tree: i.e., when s1 ,s1 ,s2 ,s2 we fix a disagreement at x and recursively couple the (3) distributions on the children of x, the expected propor- where w z denotes "w is a child of z ." In fact, we tion of disagreements after levels is at most  . Simi- may restrict the maximization to sites z of even (or larly, from part (ii) we see that  bounds the probability odd) height. With this definition, it is easy to see that of a disagreement percolating up the tree: i.e., when we the upper bound in Claim 3.2(i) for the probability of fix a single disagreement at level below x, the proba- disagreement percolating down the tree can be replaced bility of this disagreement reaching x is at most  . by (2 )2( /2-1) b = O((2 b) ). We therefore get the We now state a theorem that will be our main following generalization of Theorem 3.3: analytical tool. The theorem gives a sufficient condition . for O(n log n) mixing time in terms of the quantities  T H E O R E M 3 . 3 In the setting of Theorem 3.3, if 2 and  satisfy max{ 2 b,  } < 1 then the mixing time of the and  . associated Glauber dynamics is O(n log n). T H E O R E M 3 . 3 . Consider an arbitrary (permissive) spin system and a boundary condition  (a configuration ś These theorems are stated for the special case of the Ising model. on Tb ). If   ({ľ }) and    ({ľ }) satisfy T T However, it is easily seen that their proofs make no use of the specific max{ b,  } < 1 then the mixing time of the associated form of the Ising potentials, and thus apply to arbitrary permissive Glauber dynamics is O(n log n). spin systems on trees.

Remark: Note that  is the same as  , except that the maximization is restricted to A = Tz and the boundary vertex y being the parent of z ; hence always    . Since  involves Gibbs distributions only on maximal subtrees Tz , it may depend on the boundary condition  at the bottom of the tree. By contrast,  bounds the worst-case probability of disagreement for an arbitrary subset A and arbitrary boundary configuration around A, and hence depends only on the potentials of the system and not on  . It is the dependence of  on  that opens up the possibility of an analysis that is specific to the boundary condition.

Proof: The proof follows by combining Theorems 3.4 and 5.1 of [18]ś , which together imply that, under the above conditions on  and  , the logarithmic Sobolev 1 constant of the dynamics is ( n ). By standard facts relating the log-Sobolev constant to the mixing time, this implies that the mixing time is O(n log n).

461


Our second generalization exploits the fact that, when deriving the bound on upward percolation in part (ii) of Claim 3.2, it is enough to control the probability of a disagreement percolating upwards one level from y to z only when z is sufficiently far from the boundary and the root of Bx, . Let  be defined in the same way as  , but with the maximization restricted to sets A that include the full subtree of depth d rooted at z under the orientation in which y is the parent of z ; here d is an implicit parameter whose value may change from model to model, but will in each case be a constant independent of the size of T . Then the probability of disagreement percolating one level upwards to z , where z is at distance at least d from the boundary and root of Bx, , is bounded by . Thus it is easy to modify the proof of Claim 3.2 so that the factor  in part (ii) is replaced by  -2d = O( ). Similarly, we define  as before but with the maximization restricted to z that are at distance at least d from the boundary of T . Whenever we use , we will also use  with the same value of d so that we still have   . This leads to our second generalization of Theorem 3.3: T H E O R E M 3 . 3 . In the setting of Theorem 3.3, if  and  satisfy max{ b, } < 1 then the mixing time of the associated Glauber dynamics is O(n log n).

of part (ii) (when  > 0 ) follows immediately from part (ii) of the above lemma, using Theorem 3.3 and the fact that  < 1. (Note that analyzing the 0-boundary for all depths of T handles both odd and even boundary conditions.) Recall that the calculation of  and  involves bounding the effect of changing the spin of a boundary site on an inside neighbor (measured in terms of variation distance). Consider a subset A, an arbitrary boundary configuration  , and sites y   A and z  A as in Definition 3.1. We need to compare two distribu tions, ľA and ľ . In the first, the site y is occupied (spin value s = 1) and in the second y is unoccupied (spin value s = 0). Now, from the definition of the hard-core model, in the first distribution the site z is unoccupied with certainty, and hence the variation distance between the two distributions at z is exactly the probability that z is occupied in the second distribution (where y is unoccupied, or equivalently, where the edge connecting y and z is removed). Let pz stand for this last probability. To summarize, for any subset A, any two adjacent sites y   A and z  A, and any boundary configuration  ,
y,s A ,s y

ľ A

y,1

 - ľA

y,0

z = ľ (z = 1)  pz . A

y,0

(4)

4 Independent sets (hard-core model) In this section, we will prove that the mixing time of the Glauber dynamics for sampling independent sets is O(n log n) in all the situations covered by Theorem 1.1 in the Introduction. In light of Theorem 3.3, in order to show O(n log n) mixing time it is enough to bound the quantities  and  such that max { b,  } < 1, where  and  can also be replaced by their variants as explained in Section 3. Thus, Theorem 1.1 will follow from:

Our main goal in the rest of this section is to bound the probability of occupation pz for the relevant values of  and global boundary condition  .

Proof of Lemma 4.1 part (i) We start with the easy observation that, for any subset A, any boundary configuration  and any site z  A,  ľ (z = 1)  1+ , simply because the r.h.s. is the probA ability of z being occupied when all its neighbors are unoccupied, and if one of its neighbors is occupied than z is unoccupied with certainty. Using (4), we deduce that     1+ . We can strengthen this to  < 1+ by noticL E M M A 4 . 1 . For the independent sets model with activity ing that equality is achieved in the above only when all parameter : neighbors of z are unoccupied with certainty, which can happen only if all neighbors of z are in  A or adjacent  (i) For all values of , we have  < 1+ (for some choice to  A. So by taking d = 3 in the definition of  we are of the implicit parameter d in ). In addition, for done.   0 and every  > 0, we have   1+ (again for b We go on to consider the case   0 , i.e., when some d = d()). the infinite-volume Gibbs measure is unique. Here, in (ii) For  > 0 , when  is the 0-boundary condition (i.e., order to bound , we consider subsets A that include the full subtree Bz,d of height d rooted at z , where d all sites are unoccupied), we have 2  1 . b is a large enough constant; in fact, w.l.o.g. we may Recall that   , so from part (i) of this lemma we take A to be exactly such a subtree. Recall that we are  conclude that  b < 1 when ( 1+ )2  1 , i.e., when interested in p , the probability that z is occupied when b z   b1 1 , and also whenever   0 . Since also its parent y is unoccupied, or equivalently, when Bz,d is -  < 1 for all finite , part (i) of Theorem 1.1 follows disconnected at its root z from the rest of T . We need using Theorem 3.3 . This also dispenses with part (ii) to show that pz  1+ for every boundary condition  b of Theorem 1.1 in the region   0 . The remainder on the bottom of Bz,d .

462


Now since we only consider the case   0 , where by definition the Gibbs measure is unique, if we let the depth d of Bz,d grow then the corresponding sequence of probabilities pz converges to some value p0 independent of the boundary configuration  . It is therefore enough to show that p0  1 , since then for b large enough d we will have pz  1+ , as required. b The calculation of p0 can be done by recursively calculating the sequence pz as was done in, e.g., [13] and as we did for the Ising model in [18]. Let Rz = pz 1-pz stand for the ratio of probabilities that the site z is occupied and unoccupied respectively, given that its parent is unoccupied. A, simple calculation verifies that 1 w Rz =  where Rw is computed w.r.t. z 1+Rw A = Bw,d-1 and the same  . For each , we thus define the function 1 b J (a) =  1+a (5)
p and conclude that R0  1+0 0 must be a fixed point of J . p Notice that since J is monotonically decreasing and J (0) =  > 0, it has a unique fixed point for every ; we denote this fixed point by a0 = a0 (). Now uniqueness of the Gibbs measure is equivalent to the fixed point a0 being attractive (which corresponds to the existence of the limit p0 ). This in turn corresponds to the derivative of J at a0 being at least -1; indeed, when  = 0 is critical the derivative at a0 = a0 (0 ) is exactly -1. Now let J (a) =  J (a) . We will make use of a the fact that J (a0 ) = J (R0 )  -1 for   0 in order to establish that p0  1 . First note that, by a b straightforward calculation,

this time the sequence Rz does not converge since a0 is not an attractive fixed point. In fact, the sequence oscillates around a0 . It hence makes sense to look at two separate sequences, one for even heights and one for odd heights. This leads us to analyze the function J2 (a)  J (2) (a)  J (J (a)), which corresponds to jumping two levels at a time. Notice that for a site z () at height 2 from the bottom, Rz = J2 (0). Let us now describe some properties of the function J2 (see Fig. 1): 1. J2 is continuous and increasing on [0, ), with J2 (0) = /(1 + )b and supa J2 (a) = . 2. a0 (the unique fixed point of J ) is a fixed point of J2 . 3. If   0 then a0 is the unique fixed point of J2 . If  > 0 then J2 has three fixed points a1 < a0 < a2 , where J (a1 ) = a2 and J (a2 ) = a1 .
2( 4. The derivative J2 (a)   J aa) is continuous, and 2 for  > 0 , J (a) is monotonically increasing on [0, a1 ] with J2 (a1 ) < 1.

a J2 (a)

J (a) = -b ˇ

J (a) . 1+a

(6)
a1 a0 a2

a

Thus, since R0 is a fixed point of J , we have -J (R0 ) = (R0 R0 b ˇ J+R0) = b ˇ 1+R0 = bp0 . On the other hand, since Figure 1: Curve of the function J2 (a), used in the proof 1 -J (R0 ) = -J (a0 )  1, we conclude that p0  1 , of Lemma 4.1, for  > 0 . The points a1 , a0 , a2 are the b as required. This completes the proof of part (i) of fixed points of J2 in increasing order. Lemma 4.1. Proof of Lemma 4.1 part (ii) In this part, the boundary condition  is set to all-0 and  > 0 , meaning that the Gibbs measure is not unique or equivalently, as we saw in the previous part, J (a0 ) < -1. Once again, our aim is to calculate the probabilities of occupation pz . Here, however, A is a maximal subtree Tz and the bottom boundary condition is  (rather than arbitrary). We start by noticing that since  is the all-0 configuration, then Rz = J ( ) (0), where J ( ) stands for the -fold composition of J and is the distance of z from the bottom boundary of T . However, It is thus easy to see that, for example, for  > 0 , () J2 (0) converges to a1 from below, i.e, for all sites z whose height is even, Rz  a1 . Similarly, if the height of z is odd, Rz  a2 . We will use the properties of J2 in order to bound 2 by 1 . Recall that 2 = 2 b maxw z ľ1 z - ľ0 z z ˇ ľ1 w - ľ0 w w = maxw z pz pw , T T T T and that it is enough to consider sites z whose height in T is odd. We will show that, for every site z of odd height, and every child w of z , pz pw  b1 . 2 Again we show a connection between the derivative, this time of J2 , and the relevant probabilities of occupation, i.e., pz pw . Using (6) to calculate J2 (a) gives:

463


= J (a) = J J (a)) ˇ J (a) = b
2 (

(J (a)) 1 + J (a)

J

ˇ

b

(a) 1+a

J

This can be shown as follows. Notice that for every  y,s color s that differs from both s and s , ľA (z = s ) =
 1-ľA y,s

(z =s

)

b2 ˇ

J2 (a) J (a) a ˇ ˇ . a 1 + J (a) 1 + a

Recalling that z is the parent of w, together with the fact that since  is uniformly 0 we have Rw = Rw for any child w of z , we observe that Rz = J (Rw ). Therefore, ( ( Rz Rw J2 (Rw ) = b2 ˇ J2RRw ) ˇ 1+Rz ˇ 1+Rw = b2 ˇ J2RRw ) ˇ w w pz pw . Now, since z is of odd height then w is of even height and therefore Rw  a1 . Notice that this means J2 (Rw )  Rw , because J is continuous with J (0) > 0 and a1 is the smallest fixed point of J . Combining this with the fact that J2 (Rw ) < 1 (since Rw  a1 ), we conclude that pz pw  b1 . This completes the proof of 2 part (ii) of Lemma 4.1. 5 Colorings (antiferromagnetic Potts model at zero temperature) In this section we will prove Theorem 1.3: that the mixing time of the Glauber dynamics for sampling proper colorings is O(n log n) for all boundary conditions provided that the number of colors q  b + 2, i.e., whenever the infinite-volume Gibbs measure is unique. In order to again make use of the machinery from Section 3 involving  and  , we prove the following:

ľ (z = s), then ľ (z = s )  ľ (z = s ) for A every s = s and so the event E = {z = s } maximizes the expression |ľ (E ) - ľ (E )| over all events E that A only depend on z . It is now convenient to consider the distribution induced by removing the edge from z to y (i.e., with y, a "free" condition at y ). Call this distribution ľ . A y, Let pz (s) = ľ (z = s), and notice that for the A  y,s (s ) colorings model ľA (z = s ) = 1pzpz (s) simply because - ľ ( ˇ ) = ľ (ˇ | z = s) in this model. A A
y,s maxs,s ľ - A (s ) maxs,s 1pzpz (s) . - y,s y, y,s A ,s y

1-ľ
A

A ,s y

(z =s)

ˇ ľ

A ,s y

(z = s ). Hence, if ľ (z = s )  A
y,s A ,s y

y,s

 y,s

Thus,

ľ

 y,s

A

z=

y,s maxs,s ľ (z A

= s) =

The fact that  < 1 means not only that the b influence of any boundary configuration on the spin at the root decays with the distance of the boundary from the root (as is already implied by the fact the Gibbs measure is unique), but that it decays exponentially fast. This fact is of independent interest, and to the best of our understanding was not obvious from the proof of uniqueness given in [12].

Since in [12] it was shown that the Gibbs measure is unique for all q  b + 2, we conclude that for these 1 values of q ,   q-+- < 1 . Theorem 1.3 now follows 1 b from Theorem 3.3 and   . Remark:

L E M M A 5 . 1 . For the colorings model with q colors, if the infinite-volume Gibbs measure is unique then for maxs,s 1 every  > 0 we have   q-+- (for a suitable choice 1 d = d() of the implicit constant in ).

So to obtain the claimed bound on  we have to show that, for all sets A as above, and all boundary (s ) 1 configurations  , maxs,s 1pzpz (s)  q-++ . It is at this - 1 point that we use the assumption that the Gibbs measure is unique. This means that, if d (the depth of the full subtree contained in A) is large enough, the distribution pz ( ˇ ) is arbitrarily close to the uniform distribution, regardless of the boundary configuration  . Thus, for every  > 0 there exists a (large enough) constant d such that pz (s)  1+ for all colors s. Hence, q
) z (s p 1-pz (s)


1+ q -1- ,

as required.

6 Ferromagnetic Potts model

In this section we will prove that, for the Potts model, the mixing time is O(n log n) in all the situations described in Theorem 1.2. As in the discussion of other models in previous sections, we use the machinery from Section 3. The following lemma sets out the relevant properties of  and  ; the proof uses similar ideas to those in the proofs of Lemmas 4.1 and 5.1 and is left for the full version. L E M M A 6 . 1 . For the Potts model with q colors at inverse temperature  the following hold:
e -1 (i)   e2 +1 -  , where  =  (b, q ,  )  0 with equality if and only if q = 2. Furthermore,  (b, q ,  ) is increasing in q and decreasing in b and  . [The exact definition of  (b, q ,  ) is rather involved and will be given in the full version of the paper].
2

Proof of Lemma 5.1: Recall that in order to bound , we need to consider a subset A that includes the full subtree of depth d rooted at z , and bound the variation z for an arbitrary bounddistance maxs,s ľ - ľ A ary configuration  , where y is the (unique) neighbor of z in  A. Now it is easy to see that for the colorings model, ľ - ľ A
y,s A ,s y y,s A ,s y

z = max{ľ (z = s ), ľ A

y,s

A ,s y

(z = s)}.

(ii) If  < 0 (i.e., the Gibbs measure is unique) then  < 1 (for an appropriate choice of d, the implicit b constant in ).

464


(iii) If   0 and the boundary condition  is constant, then   1 . b (iv) If the boundary condition  is free, then  = e2 -1 . e2 +q -1 Part (i) of Theorem 1.2 follows from parts (i) and (ii) of Lemma 6.1 and the fact that for any boundary condition    . Part (ii) of Theorem 1.2 follows from parts (ii) and (iii) of the lemma and the fact that  < 1 (as is apparent from part (i) of the lemma). Finally, part (iii) of Theorem 1.2 follows from parts (i) and (iv) of the lemma. In fact, in light of the bound on  from part (i), the range of  in part (iii) of Theorem 1.2 can be improved slightly by letting 1 be the so21 1 e21 1 lution to the equation e21 +--1 ( e21 -1 -  ) = 1 , where b e q +  =  (b, q ,  ) is as in part (i) of Lemma 6.1. We note that this modified definition of 1 is only marginally larger than the original definition of 1 in Theorem 1.2, and we mention it only in order to show that we can go further than the original threshold, a fact that is interesting due to its implication for the reconstruction problem [20] as mentioned in the Introduction. We elaborate on this point in the full version of the paper. References
[1] R . J . B A X T E R, Exactly solved models in statistical mechanics, Academic Press, London, 1982. [2] N . B E R G E R , C . K E N Y O N , E . M O S S E L and Y. P E R E S, "Glauber dynamics on trees and hyperbolic graphs," preprint, 2003. Preliminary version: C . K E N Y O N , E . M O S S E L and Y. P E R E S, "Glauber dynamics on trees and hyperbolic graphs," Proc. 42nd IEEE Symposium on Foundations of Computer Science, 2001, pp. 568­578. [3] G . R . B R I G H T W E L L and P W I N K L E R, "Random colorings . of a Cayley tree," Contemporary combinatorics, Bolyai Soc. Math. Studies 10, Janos Bolyai Math. Soc., Bu´ dapest, 2002, pp. 247­276. [4] G . R . B R I G H T W E L L and P W I N K L E R, "A second threshold . for the hard-core model on a Bethe lattice," INI Report NI03002-CMP To appear in Rand. Struc. & Alg. . [5] M . D Y E R and A . F R I E Z E, "Randomly colouring graphs with lower bounds on girth and maximum degree," Proc. 42nd IEEE Symposium on Foundations of Computer Science, 2001, pp. 579­587. [6] W. E VA N S , C . K E N Y O N , Y. P E R E S and L . J . S C H U L M A N, "Broadcasting on trees and the Ising model," Ann. Appl. Probab. 10 (2000), pp. 410­433. [7] H . - O. G E O R G I I, Gibbs measures and phase transitions, Walter de Gruyter & Co., Berlin, 1988. ¨ ¨ [8] O. H A G G S T R O M, "The random-cluster model on a homogeneous tree," Probab. Theory Related Fields 104 (1996), pp. 231­253. [9] T. P H AY E S, "Randomly coloring graphs with girth five," . Proc. 35th ACM Symposium on Theory of Computing, 2003, pp. 269­278.

[10] T. P H AY E S and E . V I G O D A, "A non-Markovian coupling . for randomly sampling colorings," to appear in Proc. 44th IEEE Symp. on Found. of Comp. Sci., 2003. [11] M . J E R R U M , J . - B . S O N , P T E TA L I and E . V I G O D A, "Ele. mentary bounds on Poincare and log-Sobolev constants ´ for decomposable Markov chains," INI Report NI03006CMP To appear in Ann. Appl. Probab. . [12] J . J O N A S S O N "Uniqueness of uniform random colorings of regular trees," Statist. Probab. Lett. 57 (2002), pp. 243­248. [13] F. P K E L LY, "Stochastic models of computer commu. nication systems," J. Royal Statist. Soc. B 47 (1985), pp. 379­395. [14] M . L U B Y and E . V I G O D A, "Fast convergence of the Glauber dynamics for sampling independent sets," Rand. Struct. & Algo. 15 (1999), pp. 229­241. [15] R . LY O N S, Phase transitions on non amenable graphs, J. Math. Phys. 41 (2000), pp. 1099­1127. [16] J . B . M A RT I N, "Reconstruction thresholds on regular trees," http://arxiv.org/abs/math.PR/0305400, to appear in DMTCS volume Proc. Discrete Random Walks 2003. [17] F. M A RT I N E L L I, "Lectures on Glauber dynamics for discrete spin models," Lectures on Probability Theory and Statistics (Saint-Flour, 1997), Lecture notes in Mathematics 1717, pp. 93­191, Springer, Berlin, 1998. [18] F. M A RT I N E L L I , A . S I N C L A I R and D . W E I T Z, "The Ising model on trees: Boundary conditions and mixing time," Tech. Report UCB//CSD -031256, Dept. of EECS, UC Berkeley, July 2003. http://sunsite.Berkeley.EDU/NCSTRL/ Extended abstract to appear in Proc. 44th IEEE Symposium on Foundations of Computer Science, 2003. [19] M . M O L L O Y, "The Glauber dynamics on colorings of a graph with high girth and maximum degree," Proc. 34th ACM Symp. on Th. of Comp., 2002, pp. 91­98. [20] E . M O S S E L, "Survey: Information flow on trees," Preprint, October 2002, to appear in DIMACS volume Graphs, Morphisms and Statistical Physics. [21] E . M O S S E L and Y. P E R E S, "Information flow on trees," Ann. Appl. Probab. 13 (2003), pp. 817­844. [22] R . B . P O T T S, "Some generalized order-disorder transformations," Proc. Cambridge Philos. Soc. 48, pp. 106­ 109. [23] C . J . P R E S T O N, Gibbs states on countable sets, Cambridge University Press, London, 1974. [24] F. S P I T Z E R, "Markov random fields on an infinite tree," Ann. Probab. 3 (1975), pp. 387­398. [25] E . V I G O D A, "Improved bounds for sampling colorings," J. Math. Phys. 41 (2000), pp. 1555­1569. [26] E . V I G O D A, "A note on the Glauber dynamics for sampling independent sets," Electronic Journal of Combinatorics 8(1) (2001). [27] D . W E I T Z "Combinatorial criteria for uniqueness of the Gibbs measure," preprint, 2003. [28] F. Y. W U, "The Potts model," Reviews of Modern Physics 54 (1982), pp. 235­268.

465