Adaptive Bayesian Inference

Ozgur Sumer (Univ. of Chicago, Chicago, IL, osumer@cs.uchicago.edu), Umut A. Acar (Toyota Tech. Inst., Chicago, IL, umut@tti-c.org), Alexander T. Ihler (U.C. Irvine, Irvine, CA, ihler@ics.uci.edu), Ramgopal R. Mettu (Univ. of Massachusetts, Amherst, MA, mettu@ecs.umass.edu)

U. A. Acar is supported by a gift from Intel. R. R. Mettu is supported by a National Science Foundation CAREER Award (IIS-0643768).

Abstract

Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates, etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods, all in expected logarithmic time. We give an implementation of our algorithm and provide experiments showing that, compared to the sum-product algorithm, it can yield up to two orders of magnitude speedup in answering queries and responding to dynamic changes.

1 Introduction

Graphical models [14, 8] are a powerful tool for probabilistic reasoning over sets of random variables. Problems of inference, including marginalization and MAP estimation, form the basis of statistical approaches to machine learning. In many applications, we need to perform inference under dynamically changing conditions, such as the acquisition of new evidence or an alteration of the conditional relationships which make up the model. Such changes arise naturally in the experimental setting, where the model quantities are empirically estimated and may change as more data are collected, or in which the goal is to assess the effects of a large number of possible interventions. Motivated by such applications, Delcher et al. [6] identify dynamic Bayesian inference as the problem of performing Bayesian inference on a dynamically changing graphical model. Dynamic changes to the graphical model may include changes to the observed evidence, to the structure of the graph itself (such as edge or node insertions/deletions), and changes to the conditional relationships among variables.

To see why adapting to dynamic changes is difficult, consider the simple problem of Bayesian inference in a Markov chain with n variables. Suppose that all marginal distributions have been computed in O(n) time using the sum-product algorithm, and that some conditional distribution at a node u is subsequently updated. One way to update the marginals would be to recompute the messages sent by sum-product from u to the other nodes in the network. This can take Ω(n) time, because regardless of where u is in the network, there is always another node v at distance Ω(n) from u. A similar argument holds for general tree-structured networks. Thus, simply updating sum-product messages can be costly in applications where marginals must be adaptively updated after changes to the model (see Sec. 5 for further discussion).

In this paper, we present a technique for efficient adaptive inference on graphical models. For a tree-structured graphical model with n nodes, our approach supports the computation of any marginal, updates to conditional probability distributions (including observed evidence), and edge insertions and deletions, all in expected O(log n) time.
As an example of where adaptive inference can be effective, consider a computational biology application that requires computing the state of the active site in a protein as the user modifies the protein (e.g., mutagenesis). In this application, we can represent the protein with a graphical model and use marginal computations to determine the state of the active site. We reflect the modifications to the protein by updating the graphical model representation and performing marginal queries to obtain the state of the active site. We show in Sec. 5 that our approach can achieve a speedup of one to two orders of magnitude over the sum-product algorithm in such applications.

Our approach achieves logarithmic update and query times by mapping an arbitrary tree-structured graphical model into a balanced representation that we call a cluster tree (Sec. 3-4). We perform an O(n)-time preprocessing step to compute the cluster tree using a technique known as tree contraction [13]. We ensure that for an input network with n nodes, the cluster tree has expected depth O(log n) and expected size O(n). We show that the nodes of the cluster tree can be tagged with partial computations (corresponding to marginalizations of subtrees of the input network) in a way that allows marginal computations and changes to the network to be performed in O(log n) expected time. We give simulation results (Sec. 5) showing that our algorithm can achieve a speedup of one to two orders of magnitude over the sum-product algorithm.

Although we focus primarily on the problem of answering marginal queries, it is straightforward to generalize our algorithms to other, similar inference goals, such as MAP estimation and evaluating the likelihood of evidence. We note that although tree-structured graphs provide a relatively restrictive class of models, junction trees [14] can be used to extend some of our results to more general graphs. In particular, we can still support changes to the parameters of the distribution (evidence and conditional relationships), although changes to the underlying graph structure become more difficult. Additionally, a number of more sophisticated graphical models require efficient inference over trees at their core, including learning mixtures of trees [12] and tree-reparameterized max-product [15]. Both these methods involve repeatedly running a message-passing algorithm over a set of trees with changing parameters or evidence, making efficient updates and recomputations a significant issue.

Related Work. It is important to contrast our notion of adapting to dynamic updates to the graphical model (due to changes in the evidence, or alterations of the structure and distribution) with the potentially more general definition of dynamic Bayes' nets (DBNs) [14]. Specifically, a DBN typically refers to a Bayes' net in which the variables have an explicit notion of time, and past observations have some influence on estimates about the present and future. Marginalizing over unobserved variables at time t-1 typically produces increased complexity in the model of variables at time t. However, in both [6] and this work, the emphasis is on performing inference with current information only, and efficiency is obtained by leveraging the similarity between the previous and newly updated models.
Our work builds on previous work by Delcher, Grove, Kasif and Pearl [6]; they give an algorithm to update Bayesian networks dynamically as the observed variables in the network change, and to compute belief queries of hidden variables in logarithmic time. The key difference between their work and ours is that their algorithm only supports updates to observed evidence, and does not support dynamic changes to the graph structure (i.e., insertion/deletion of edges) or to conditional probabilities. In many applications it is important to consider the effect of changes to conditional relationships between variables, for example, to study protein structure (see Sec. 5 for further discussion). In fact, Delcher et al. cite structural updates to the given network as an open problem. Another difference is in the use of tree contraction: they use tree contractions directly to answer queries and perform updates, whereas we use tree contraction to construct a cluster tree, which we then use to perform queries and all other updates (except for insertions/deletions). We provide an implementation and show that this approach yields significant speedups.

Our approach to clustering factor graphs using tree contractions is based on previous results showing that tree contractions can be updated in expected logarithmic time under certain dynamic changes by using a general-purpose change-propagation algorithm [2]. The approach has also been applied to a number of basic problems on trees [3], but has not been considered in the context of statistical inference. The change-propagation approach used in this work has also been extended to provide a general-purpose technique for updating computations under changes to their data, and applied to a number of applications (e.g., [1]).

2 Background

Graphical models provide a convenient formalism for describing the structure of a function g defined over a set of variables x_1, ..., x_n (most commonly a joint probability distribution over the x_i). Graphical models use this structure to organize computations and create efficient algorithms for many inference tasks over g, such as finding a maximum a-posteriori (MAP) configuration, marginalization, or computing data likelihood. For the purposes of this paper, we assume that each variable x_i takes on values from some finite set, denoted A_i. We write the operation of marginalizing over x_i as $\sum_{x_i}$, and let X_j represent an index-ordered subset of variables and f(X_j) a function defined over those variables, so that for example if X_j = {x_2, x_3, x_5}, then f(X_j) = f(x_2, x_3, x_5). We use X to indicate the index-ordered set of all variables {x_1, ..., x_n}.

Factor Graphs. A factor graph [10] is one type of graphical model, similar to a Bayes' net [14] or Markov random field [5], used to represent the factorization structure of a function g(x_1, ..., x_n). In particular, suppose that g decomposes into a product of simpler functions, $g(X) = \prod_j f_j(X_j)$, for some collection of real-valued functions f_j, called factors, whose arguments are (index-ordered) sets $X_j \subseteq X$. A factor graph is a graph-theoretic abstraction of g's factorization, with vertices of the graph representing the variables x_i and the factors f_j. Because of the close correspondence between these quantities, we abuse notation slightly and use x_i to indicate both the variable and its associated vertex, and f_j to indicate both the factor and its vertex.
Definition 2.1. A factor graph is a bipartite graph $G = (X \cup F, E)$ where $X = \{x_1, x_2, \ldots, x_n\}$ is a set of variables, $F = \{f_1, f_2, \ldots, f_m\}$ is a set of factors, and $E \subseteq X \times F$. A factor tree is a factor graph G where G is a tree. The neighbor set N(v) of a vertex v is the (index-ordered) set of vertices adjacent to v. The graph G represents the function $g(X) = \prod_j f_j(X_j)$ if, for each factor f_j, the arguments of f_j are its neighbors in G, i.e., $N(f_j) = X_j$.

Other types of graphical models, such as Bayes' nets [14], can be easily converted into a factor graph representation. When the Bayes' net is a polytree (singly connected directed acyclic graph), the resulting factor graph is a factor tree.

The Sum-Product Algorithm. The factorization of g(X) and its structure as represented by the graph G can be used to organize various computations about g(X) efficiently. For example, the marginals of g(X), defined for each i by $g_i(x_i) = \sum_{X \setminus \{x_i\}} g(X)$, can be computed using the sum-product algorithm. Sum-product is best described in terms of messages sent between each pair of adjacent vertices in the factor graph. For every pair of neighboring vertices $(x_i, f_j) \in E$, the vertex x_i sends a message $\mu_{x_i \to f_j}$ as soon as it has received the messages from all of its neighbors except f_j, and similarly for the message from f_j to x_i. The messages between these vertices take the form of a real-valued function of the variable x_i; for discrete-valued x_i this can be represented as a vector of length |A_i|. The message $\mu_{x_i \to f_j}$ sent from a variable vertex x_i to a neighboring factor vertex f_j, and the message $\mu_{f_j \to x_i}$ from factor f_j to variable x_i, are given by
$$\mu_{x_i \to f_j}(x_i) = \prod_{f \in N(x_i) \setminus f_j} \mu_{f \to x_i}(x_i), \qquad \mu_{f_j \to x_i}(x_i) = \sum_{X_j \setminus x_i} f_j(X_j) \prod_{x \in X_j \setminus x_i} \mu_{x \to f_j}(x).$$
Once all the messages (2|E| in total) are sent, we can calculate the marginal g_i(x_i) by simply multiplying all the incoming messages, i.e., $g_i(x_i) = \prod_{f \in N(x_i)} \mu_{f \to x_i}(x_i)$. Sum-product can be thought of as selecting an efficient elimination ordering of variables (leaf to root) and marginalizing in that order.

Other Inferences. Although in this paper we focus on marginal computations using sum-product, similar message-passing operations can be generalized to other tasks. For example, the operations of sum-product can be used to compute the data likelihood of any observed evidence; such computations are an inherent part of learning and model comparison (e.g., [12]). More generally, similar algorithms can be defined to compute functions over any semiring possessing the distributive property [11]. Most commonly, the max operation produces a dynamic programming algorithm ("max-product") to compute joint MAP configurations [15].
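To make the message recursions above concrete, here is a minimal Python sketch of sum-product on a small discrete factor tree. The representation (dictionaries of domains and factor tables) and the numbers are our own illustration, not the paper's Matlab implementation; on a tree, the mutual recursion below computes exactly the messages defined above.

```python
# A minimal sketch of sum-product on a small discrete factor tree.
# Illustrative representation and values only, not the paper's implementation.
import numpy as np

# Tiny chain-shaped factor tree: f1 - x1 - f12 - x2 - f23 - x3.
domains = {'x1': 2, 'x2': 2, 'x3': 2}                       # |A_i| for each variable
factors = {
    'f1':  (('x1',),      np.array([0.6, 0.4])),
    'f12': (('x1', 'x2'), np.array([[0.9, 0.1], [0.2, 0.8]])),
    'f23': (('x2', 'x3'), np.array([[0.7, 0.3], [0.4, 0.6]])),
}
# Bipartite adjacency: the factors neighboring each variable.
nbrs = {v: [f for f, (args, _) in factors.items() if v in args] for v in domains}

def msg_var_to_fac(v, f):
    """mu_{x->f}(x): product of messages from v's other neighboring factors."""
    m = np.ones(domains[v])
    for g in nbrs[v]:
        if g != f:
            m = m * msg_fac_to_var(g, v)
    return m

def msg_fac_to_var(f, v):
    """mu_{f->x}(x): multiply the factor by the incoming variable messages,
    then sum out every argument of f except x."""
    args, table = factors[f]
    m = np.asarray(table, dtype=float)
    for axis, u in enumerate(args):
        if u != v:
            shape = [1] * m.ndim
            shape[axis] = domains[u]
            m = m * msg_var_to_fac(u, f).reshape(shape)   # broadcast along u's axis
    keep = args.index(v)
    return m.sum(axis=tuple(a for a in range(m.ndim) if a != keep))

def marginal(v):
    """g_i(x_i): the product of all incoming factor-to-variable messages."""
    m = np.ones(domains[v])
    for f in nbrs[v]:
        m = m * msg_fac_to_var(f, v)
    return m

for v in domains:
    print(v, marginal(v))           # unnormalized marginals of g = f1 * f12 * f23
```

Every query here recomputes its messages from scratch; the cluster tree developed next is designed to avoid exactly this repeated work when the model changes.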
Figure 1: Clustering a factor graph with rake, compress, and finalize operations.

3 Constructing the Cluster Tree

In this section, we describe an algorithm for constructing a balanced representation of the input graphical model, which we call a cluster tree. Given the input graphical model, we first apply a clustering algorithm that hierarchically clusters the graphical model, and then apply a labeling algorithm that labels the clusters with cluster functions that can be used to compute marginal queries.

Clustering Algorithm. Given a factor graph as input, we first tag each node v with a unary cluster consisting of v, and each edge (u, v) with a binary cluster consisting of the edge (u, v). We then cluster the tree hierarchically by applying rake, compress, and finalize operations. When applied to a leaf node v with neighbor u, the rake operation deletes v and the edge (u, v), and forms a unary cluster by combining the clusters that tag either v or (u, v); u is then tagged with the resulting cluster. When applied to a degree-two node v with neighbors u and w, a compress operation deletes v and the edges (u, v) and (v, w), inserts the edge (u, w), and forms a binary cluster by combining the clusters that tag the deleted node and edges; (u, w) is then tagged with the resulting cluster. A finalize operation is applied when the tree consists of a single node (when no edges remain); it constructs a final cluster consisting of all the clusters with which the final node is tagged.

We cluster a tree T by applying rake and compress operations in rounds, where each round consists of the following two steps, repeated until no more edges remain: (1) apply the rake operation to each leaf; (2) apply the compress operation to an independent set of degree-two nodes. We choose a random independent set: we flip a coin for each node in each round and apply compress to a degree-two node only if it flips heads and both of its neighbors flip tails. This ensures that no two adjacent nodes apply compress simultaneously. When all edges are deleted, we complete the clustering by applying the finalize operation.

Figure 2: A cluster tree.

Fig. 1 shows a four-round clustering of a factor graph and Fig. 2 shows the corresponding cluster tree. In round 1, nodes f_1, f_2, f_5 are raked and f_4 is compressed. In round 2, x_1, x_2, x_4 are raked. In round 3, f_3 is raked. A finalize operation is applied in round 4 to produce the final cluster. The leaves of the cluster tree correspond to the nodes and the edges of the factor graph. Each internal node $\bar{v}$ corresponds to a unary or binary cluster formed by deleting v. The children of an internal node are the edges and the nodes deleted during the operation that forms the cluster. For example, the cluster $\bar{f}_1$ is formed by the rake operation applied to f_1 in round 1. The children of $\bar{f}_1$ are the node f_1 and the edge (f_1, x_1), which are deleted during that operation.
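The rake/compress construction can be written down compactly. The sketch below (a Python illustration, not the paper's implementation) builds only the structure of the cluster tree; cluster functions are added by the labeling pass described next. For brevity it applies operations sequentially within each round rather than strictly simultaneously, which does not change the kind of cluster tree produced.

```python
# A sketch of the randomized rake/compress clustering that builds a cluster
# tree from an undirected tree.  Structure only; names are illustrative.
import random
from collections import defaultdict

class Cluster:
    def __init__(self, label, children=()):
        self.label = label                 # a graph node, an edge, or 'final'
        self.children = list(children)

def build_cluster_tree(edges, seed=0):
    """edges: the edge list of an undirected tree with at least one edge."""
    rng = random.Random(seed)
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    # Initial tags: a unary cluster per node and a binary cluster per edge.
    node_tag = {v: [Cluster(v)] for v in adj}
    edge_tag = {frozenset(e): [Cluster(tuple(e))] for e in edges}

    def absorb(v):
        """Delete v and its incident edges; return the clusters tagging them."""
        kids = node_tag.pop(v)
        for u in list(adj[v]):
            kids += edge_tag.pop(frozenset((u, v)))
            adj[u].discard(v)
        del adj[v]
        return kids

    while len(adj) > 1:
        live = list(adj)
        coins = {v: rng.random() < 0.5 for v in live}       # heads with prob. 1/2
        for v in live:
            if v not in adj:
                continue
            if len(adj[v]) == 1:                            # rake a leaf into u
                (u,) = adj[v]
                node_tag[u].append(Cluster(v, absorb(v)))
            elif (len(adj[v]) == 2 and coins[v]
                  and not any(coins[w] for w in adj[v])):   # compress a deg-2 node
                u, w = adj[v]
                kids = absorb(v)
                adj[u].add(w); adj[w].add(u)                # splice in edge (u, w)
                edge_tag[frozenset((u, w))] = [Cluster(v, kids)]
    (last,) = adj                                           # finalize
    return Cluster('final', node_tag[last])

def depth(c):
    return 1 + max((depth(k) for k in c.children), default=0)

# Example: a path of 31 nodes; the resulting depth stays small (expected O(log n)).
root = build_cluster_tree([(i, i + 1) for i in range(30)])
print('cluster tree depth:', depth(root))
```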
Labeling Algorithm. After building the cluster tree, we compute cluster functions, along with a notion of orientation for neighboring clusters, in a second pass, which we call labeling.¹ The cluster function at a node $\bar{v}$ in the tree is computed recursively using the cluster functions at $\bar{v}$'s child clusters, which we denote $S_{\bar{v}} = \{\bar{v}_1, \ldots, \bar{v}_k\}$. Intuitively, each cluster function corresponds to a partial marginalization of the factors contained in the cluster $\bar{v}$.

Since each cluster function is defined over a subset of the variables in the original graph, we require some additional notation to represent these sets. Specifically, for a cluster $\bar{v}$, let $A(\bar{v})$ be the arguments of its cluster function, and let $V(\bar{v})$ be the set of all arguments of its children, $V(\bar{v}) = \bigcup_i A(\bar{v}_i)$. In a slight abuse of notation, we let A(v) be the arguments of the node v in the original graph, so that if v is a variable node A(v) = {v}, and if v is a factor node A(v) = N(v).

The cluster functions $c_{\bar{v}}(\cdot)$ and their arguments are then defined recursively, as follows. For the base case, the leaf nodes of the cluster tree correspond to nodes v of the original graph, and we define c_v using the original variables and factors. If v is a factor node f_j, we take $c_v(A(v)) = f_j(X_j)$, and if v is a variable node x_i, we take $A(v) = \{x_i\}$ and $c_v = 1$. For nodes of the cluster tree corresponding to edges (u, v) of the original graph, we simply take $A(u, v) = \emptyset$ and $c_{u,v} = 1$.

The cluster function at an internal node of the cluster tree is then given by combining the cluster functions of its children and marginalizing over as many variables as possible. Let $\bar{v}$ be the internal node corresponding to the removal of v in the original graph. If $\bar{v}$ is a binary cluster (u, w), that is, at v's removal it had two neighbors u and w, then $c_{\bar{v}}$ is given by
$$c_{\bar{v}}(A(\bar{v})) = \sum_{V(\bar{v}) \setminus A(\bar{v})} \; \prod_{\bar{v}_i \in S_{\bar{v}}} c_{\bar{v}_i}(A(\bar{v}_i)),$$
where the arguments are $A(\bar{v}) = V(\bar{v}) \cap (A(u) \cup A(w))$. For a unary cluster $\bar{v}$, where v had a single neighbor u at its removal, $c_{\bar{v}}(\cdot)$ is calculated in the same way with $A(w) = \emptyset$.

We also compute an orientation for each cluster's neighbors based on their proximity to the cluster tree's root. This is also calculated recursively, using the orientations of each node's ancestors. For a unary cluster $\bar{v}$ with parent $\bar{u}$ in the cluster tree, we define $\mathrm{in}(\bar{v}) = \bar{u}$. For a binary cluster $\bar{v}$ with neighbors u, w at v's removal, define $\mathrm{in}(\bar{v}) = \bar{w}$ and $\mathrm{out}(\bar{v}) = \bar{u}$ if $\bar{w} = \mathrm{in}(\bar{u})$; otherwise $\mathrm{in}(\bar{v}) = \bar{u}$ and $\mathrm{out}(\bar{v}) = \bar{w}$.

We now describe the efficiency of our clustering and labeling algorithms and show that the resulting cluster tree is linear in the size of the input factor graph.

Theorem 1 (Hierarchical Clustering). A factor tree of n nodes with maximum degree k can be clustered and labeled in expected O(d^{k+2} n) time, where d is the domain size of each variable in the factor tree. The resulting cluster tree has exactly 2n - 1 leaves and n internal clusters (nodes), and expected O(log n) depth, where the expectation is taken over the internal randomization (the coin flips). Furthermore, the cluster tree has the following properties: (1) each cluster has at most k + 1 children, and (2) if $\bar{v} = (u, w)$ is a binary cluster, then $\bar{u}$ and $\bar{w}$ are ancestors of $\bar{v}$, and one of them is the parent of $\bar{v}$.

Proof. Consider first the construction of the cluster tree. The time and depth bounds follow from previous work [2]. The bound on the number of nodes holds because the leaves of the cluster tree correspond to the n - 1 edges and n nodes of the factor graph. To see that each cluster has at most k + 1 children, note that a rake or compress operation deletes one node and the at most k edges incident on that node. Every edge appearing in any level of the tree contraction is represented as a binary cluster $\bar{v} = (u, w)$ in the cluster tree. Whenever an edge is removed, one of the nodes incident to that edge, say u, is also removed, making $\bar{u}$ the parent of $\bar{v}$. The fact that $\bar{w}$ is also an ancestor of $\bar{v}$ follows from an induction argument on the levels.

Consider now the labeling step. By inspection of the labeling algorithm, the computation of the argument sets A(·) and V(·) requires O(k) time. To bound the time for computing a cluster function, observe that $A(\bar{v})$ is always a singleton set if $\bar{v}$ is a unary cluster, and $A(\bar{v})$ has at most two variables if $\bar{v}$ is a binary cluster. Therefore, $|V(\bar{v})| \le k + 2$. The number of operations required to compute the cluster function at $\bar{v}$ is bounded by $O(|S_{\bar{v}}| \, d^{|V(\bar{v})|})$, where $S_{\bar{v}}$ are the children of $\bar{v}$. Since each cluster can appear only once as a child, the total $\sum_{\bar{v}} |S_{\bar{v}}|$ is O(n), and thus the labeling step takes O(d^{k+2} n) time.

Although the running time may appear large, note that the representation of the factor graph itself takes O(d^k n) space if the functions associated with the factors are given explicitly.

¹ Although presented here as a separate labeling operation, the cluster functions can alternatively be computed at the time of the rake or compress operation, as they depend only on the children of $\bar{v}$, and the orientations can be computed during the query operation, since they depend only on the ancestors of $\bar{v}$.
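The work done at each cluster during labeling is "multiply the children's tables, then sum out every variable the endpoints do not need." The sketch below shows that single step in isolation, using dense numpy tables; the helper names and example values are our own, chosen to loosely follow Fig. 1 (the rake of f_3 into x_3 after x_1 and x_2 have been raked onto it), not taken from the paper's code. Applying this step bottom-up over the cluster tree yields the labels used by the queries and updates of Sec. 4.

```python
# One labeling step: combine children's cluster functions and marginalize down
# to A(v) = V(v) ∩ (A(u) ∪ A(w)).  Illustrative values, not the paper's code.
import numpy as np

dom = {'x1': 2, 'x2': 2, 'x3': 2}          # variable domain sizes

def multiply(tables):
    """Product of (args, ndarray) pairs over the union of their arguments."""
    args = []
    for a, _ in tables:
        args += [x for x in a if x not in args]
    out = np.ones([dom[x] for x in args])
    for a, t in tables:
        order = sorted(range(len(a)), key=lambda i: args.index(a[i]))
        shape = [dom[x] if x in a else 1 for x in args]
        out = out * np.transpose(np.asarray(t, float), order).reshape(shape)
    return args, out

def cluster_function(children, endpoint_args):
    """children: (args, table) pairs for the child clusters' functions.
    endpoint_args: A(u) ∪ A(w) for a binary cluster, or A(u) for a unary one."""
    V, prod = multiply(children)
    keep = [x for x in V if x in endpoint_args]
    drop = tuple(i for i, x in enumerate(V) if x not in endpoint_args)
    return keep, prod.sum(axis=drop)

# Unary cluster for the rake of f3 (arguments x1, x2, x3) into x3, after x1 and
# x2 have already been raked onto f3: the children are the leaf cluster for f3,
# the edge leaf (f3, x3) (constant 1), and the unary clusters for x1 and x2.
f3    = (('x1', 'x2', 'x3'), np.arange(8, dtype=float).reshape(2, 2, 2))
xbar1 = (('x1',), np.array([0.6, 0.4]))    # illustrative cluster functions
xbar2 = (('x2',), np.array([0.3, 0.7]))
edge  = ((), np.ones(()))
args, table = cluster_function([f3, edge, xbar1, xbar2], endpoint_args={'x3'})
print(args, table)     # ['x3'] and a length-2 vector: x1 and x2 are summed out
```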
4 Queries and Dynamic Changes

We give algorithms for computing marginal queries on the cluster tree and for restructuring the cluster tree with respect to changes in the underlying graphical model. For all of these operations, our algorithms require expected logarithmic time in the size of the graphical model.

Queries. We answer a marginal query at a vertex v of the graphical model by using the cluster tree. At a high level, the idea is to find the leaf of the cluster tree corresponding to v and compute messages along the path from the root of the cluster tree down to v. Using the orientations computed during the labeling pass, for each cluster $\bar{v}$ we define the following messages:
$$m_{\bar{u} \to \bar{v}} = \begin{cases} \displaystyle \sum_{V(\bar{u}) \setminus A(\bar{v})} m_{\mathrm{in}(\bar{u}) \to \bar{u}} \prod_{\bar{u}_i \in S_{\bar{u}} \setminus \{\bar{v}\}} c_{\bar{u}_i}(A(\bar{u}_i)) & \text{if } \bar{u} = \mathrm{in}(\bar{v}), \\[2ex] \displaystyle \sum_{V(\bar{u}) \setminus A(\bar{v})} m_{\mathrm{out}(\bar{u}) \to \bar{u}} \prod_{\bar{u}_i \in S_{\bar{u}} \setminus \{\bar{v}\}} c_{\bar{u}_i}(A(\bar{u}_i)) & \text{if } \bar{u} = \mathrm{out}(\bar{v}), \end{cases}$$
where $S_{\bar{u}}$ is the set of children of $\bar{u}$. Note that for unary clusters, out(·) is undefined; we define the message in this case to be 1.

Theorem 2 (Query). Given a factor tree with n nodes, maximum degree k, and domain size d, and its cluster tree, the marginal at any x_i can be computed in O(k d^{k+2} log n) time with the formula
$$g_i(x_i) = \sum_{V(\bar{x}_i) \setminus \{x_i\}} m_{\mathrm{out}(\bar{x}_i) \to \bar{x}_i} \; m_{\mathrm{in}(\bar{x}_i) \to \bar{x}_i} \prod_{\bar{v}_i \in S_{\bar{x}_i}} c_{\bar{v}_i}(A(\bar{v}_i)),$$
where $S_{\bar{x}_i}$ is the set of children of $\bar{x}_i$.

Messages are computed only at the ancestors of the query node $\bar{x}_i$ and downward along the path to $\bar{x}_i$; there are at most O(log n) nodes on this path by Theorem 1. Computing each message requires at most O(k d^{k+2}) time, so any marginal query takes O(k d^{k+2} log n) time.

Dynamic Updates. Given a factor graph and its cluster tree, we can change the function of a factor and update the cluster tree by starting at the leaf of the cluster tree that corresponds to the factor and relabeling all the clusters on the path to the root. Updating these labels suffices, because the label of a cluster is a function of its children only. Since relabeling a cluster takes O(k d^{k+2}) time and the cluster tree has expected O(log n) depth, any such update requires expected O(k d^{k+2} log n) time.
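Both operations have very simple control flow, which the following structural sketch makes explicit: an update touches only the clusters on the leaf-to-root path, and a query pushes messages from the root back down that path. The per-cluster numerical work is abstracted behind two stubs (it is exactly the cluster-function and message computation above), and the class and method names are our own illustration rather than the paper's interface.

```python
# Structural sketch of the dynamic operations on the cluster tree: which
# clusters are touched, and in what order.  Per-cluster math is stubbed out.

class ClusterNode:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        if parent is not None:
            parent.children.append(self)

    def relabel(self):
        # Recompute c_v from the (already up-to-date) children: O(k d^{k+2}) work.
        pass

    def send_message(self, child):
        # Compute m_{v -> child} from this cluster's other children and the
        # message received from above: O(k d^{k+2}) work.
        pass

def update_factor(leaf, new_table):
    """Change a factor's table, then relabel the path up to the root."""
    leaf.table = new_table
    node, touched = leaf, []
    while node is not None:                 # expected O(log n) clusters touched
        node.relabel()
        touched.append(node.name)
        node = node.parent
    return touched

def marginal_query(leaf):
    """Send messages from the root down to the query leaf (Theorem 2)."""
    path, node = [], leaf
    while node is not None:
        path.append(node)
        node = node.parent
    path.reverse()                          # root, ..., leaf
    for parent, child in zip(path, path[1:]):
        parent.send_message(child)          # expected O(log n) messages
    return [p.name for p in path]

# Toy usage on a path-shaped cluster tree: root -> a -> x1.
root = ClusterNode('root'); a = ClusterNode('a', root); x1 = ClusterNode('x1', a)
print(update_factor(x1, None))              # ['x1', 'a', 'root']
print(marginal_query(x1))                   # ['root', 'a', 'x1']
```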
To allow changes to the factor graph itself by insertion and deletion of edges, we maintain a forest of factor trees and the corresponding cluster trees (obtained by clustering the trees one by one). We also maintain the sequence of operations used to construct each cluster tree, i.e., a data structure that represents the state of the clustering at each round. Note that this structure is also of size O(n), since at each round a constant fraction of nodes is removed (raked or compressed) in expectation.

Suppose now that the user inserts an edge that connects two trees, or deletes an edge, splitting a tree into two subtrees. It turns out that both operations have only a limited effect on the sequence of clustering operations performed during construction, affecting only a constant number of nodes at each round of the process. Using a general-purpose change-propagation technique (detailed in previous work [2, 1]), the necessary alterations can be made to the cluster tree in expected O(log n) time. Change propagation gives us a new cluster tree that corresponds to the cluster tree we would have obtained by re-clustering from scratch, conditioned on the same internal randomization.

In addition to changing the structure of the cluster tree via change propagation, we must also update the labeling information (cluster functions and orientations) of the affected nodes, which can be done using the same process described in Sec. 3. It is a property of the tree contraction process that all such affected clusters form a subtree of the cluster tree that includes the root. Since change propagation affects an expected O(log n) clusters, and since each cluster can be labeled in O(k d^{k+2}) time, the new labels can be computed in O(k d^{k+2} log n) time. For dynamic updates, we thus have the following theorem.

Theorem 3 (Dynamic Updates). For a factor forest F of n vertices with maximum degree k and domain size d, the forest of cluster trees can be updated in expected O(k d^{k+2} log n) time under edge insertions/deletions and changes to factors.

5 Implementation and Experimental Results

We have implemented our algorithm in Matlab² and compared its performance against the standard two-pass sum-product algorithm (used to recompute marginals after dynamic changes). Fig. 3 shows the results of a simulation experiment in which we considered randomly generated factor trees of between 100 and 1000 nodes, with each variable having 5^1, 5^2, or 5^3 states, so that each factor has size between 5^2 and 5^6. These factor trees correspond roughly to the junction trees of models with between 200 and 6000 nodes, where each node has up to 5 states. Our results show that the time required to build the cluster tree is comparable to one run of sum-product. Furthermore, the query and update operations on the cluster tree incur relatively small constant factors in their asymptotic running time, and are one to two orders of magnitude faster than recomputing from scratch.

² Available for download at http://www.ics.uci.edu/ihler/code/.
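Ignoring constant factors, the bounds above already predict the observed gap: a full recomputation costs on the order of n message computations, while an adaptive update or query touches only O(log n) clusters. The sketch below is a back-of-envelope check of this ratio for the simulation sizes used here; it is not a reproduction of the measured timings in Fig. 3.

```python
# Rough cost-model comparison: O(n) recomputation vs. O(k d^{k+2} log n)
# adaptive update/query.  Constant factors ignored; sanity check only.
import math

k, d = 2, 5                        # degree bound and per-variable domain size
per_cluster = k * d ** (k + 2)     # rough per-message / per-relabel work
for n in (100, 300, 1000):
    naive = n * per_cluster                     # recompute every message
    adaptive = per_cluster * math.log2(n)       # touch O(log n) clusters
    print(n, round(naive / adaptive, 1))        # roughly 15x, 36x, 100x
```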
A particularly compelling application area, and one of the original motivations for developing our algorithm, is the analysis of protein structure. Graphical models constructed from protein structures have recently been used to successfully predict structural properties [17] as well as free energy [9]. These models are typically constructed by taking each node to be an amino acid whose states represent its most common conformations, known as rotamers [7], and basing conditional probabilities on proximity and a physical energy function (e.g., [16]) and/or empirical data [4]. Our algorithm is a natural choice for problems in which various aspects of protein structure are allowed to change. One such application is computational mutagenesis, in which functional amino acids in a protein structure are identified by examining systematic amino acid mutations in the protein structure (i.e., to characterize when a protein "loses" function). In this setting, performing updates to the model (i.e., mutations) and queries (i.e., the free energy or maximum-likelihood set of rotameric states) to determine the effect of each update would likely be far more efficient than standard methods. Thus, our algorithm has the potential to substantially speed up computational studies that examine each of a large number of local changes to protein structure, such as in the study of protein flexibility and dynamics. Interestingly, [6] also give a sample application in computational biology, although their model is a simple sequence-based HMM in which they consider only the effect of changing the observed sequence on secondary structure.

The simulation results in Fig. 3 validate the use of our algorithm for these applications, since protein-structure-based graphical models have complexity similar to the inputs we consider: proteins range in size from hundreds to thousands of amino acids, and each amino acid typically has relatively few rotameric states and local interactions. As in prior work [17], our simulation considers a small number of rotamers per amino acid, but the one to two order-of-magnitude speedups obtained by our algorithm indicate that it may also be possible to handle higher-resolution models (e.g., with more rotamer states, or with degrees of freedom in the protein backbone).

6 Conclusion

We give an algorithm for adaptive inference in dynamically changing tree-structured Bayesian networks. Given n nodes in the network, our algorithm supports updates to the observed evidence and conditional probability distributions, as well as updates to the network structure (as long as they keep the network tree-structured), with O(n) preprocessing time and O(log n) time for queries on any marginal distribution. Our algorithm can easily be modified to maintain MAP estimates as well as to compute data likelihoods dynamically, with the same time bounds. We implement the algorithm and show that it can speed up Bayesian inference by orders of magnitude after small updates to the network. Applying our algorithm to the junction tree representation of a graph yields an inference algorithm that can handle updates to conditional distributions and observed evidence in general Bayesian networks (e.g., with cycles); an interesting open question is whether updates to the network structure (i.e., edge insertions/deletions) can also be supported.

Figure 3: Log-log plot of run time for naive sum-product, building the cluster tree, computing queries, updating factors, and restructuring (adding and deleting edges). Although building the cluster tree is slightly more expensive than sum-product, each subsequent update and query is between 10 and 100 times more efficient than recomputing from scratch.

References

[1] Umut A. Acar, Guy E. Blelloch, Matthias Blume, and Kanat Tangwongsan. An experimental analysis of self-adjusting computation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2006.
[2] Umut A. Acar, Guy E. Blelloch, Robert Harper, Jorge L. Vittes, and Maverick Woo. Dynamizing static algorithms with applications to dynamic trees and history independence. In ACM-SIAM Symposium on Discrete Algorithms (SODA), 2004.
[3] Umut A. Acar, Guy E. Blelloch, and Jorge L. Vittes. An experimental analysis of change propagation in dynamic trees.
In Workshop on Algorithm Engineering and Experimentation (ALENEX), 2005.
[4] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. The Protein Data Bank. Nucl. Acids Res., 28:235-242, 2000.
[5] P. Clifford. Markov random fields in statistics. In G. R. Grimmett and D. J. A. Welsh, editors, Disorder in Physical Systems, pages 19-32. Oxford University Press, Oxford, 1990.
[6] A. L. Delcher, A. J. Grove, S. Kasif, and J. Pearl. Logarithmic-time updates and queries in probabilistic networks. Journal of Artificial Intelligence Research, 4:37-59, 1995.
[7] R. L. Dunbrack Jr. Rotamer libraries in the 21st century. Curr Opin Struct Biol, 12(4):431-440, 2002.
[8] M. I. Jordan. Graphical models. Statistical Science, 19:140-155, 2004.
[9] H. Kamisetty, E. P. Xing, and C. J. Langmead. Free energy estimates of all-atom protein structures using generalized belief propagation. In Proceedings of the 11th Annual International Conference on Research in Computational Molecular Biology, 2007. To appear.
[10] F. Kschischang, B. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47:498-519, 2001.
[11] R. McEliece and S. M. Aji. The generalized distributive law. IEEE Transactions on Information Theory, 46(2):325-343, March 2000.
[12] Marina Meila and Michael I. Jordan. Learning with mixtures of trees. Journal of Machine Learning Research, 1(1):1-48, October 2000.
[13] Gary L. Miller and John H. Reif. Parallel tree contraction and its application. In Proceedings of the 26th Annual IEEE Symposium on Foundations of Computer Science, pages 487-489, 1985.
[14] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, 1988.
[15] M. J. Wainwright, T. Jaakkola, and A. S. Willsky. Tree consistency and bounds on the performance of the max-product algorithm and its generalizations. Statistics and Computing, 14:143-166, April 2004.
[16] S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Singh, G. Alagona, S. Profeta Jr., and P. Weiner. A new force field for the molecular mechanical simulation of nucleic acids and proteins. J. Am. Chem. Soc., 106:765-784, 1984.
[17] C. Yanover and Y. Weiss. Approximate inference and protein folding. In Proceedings of Neural Information Processing Systems Conference, pages 84-86, 2002.