An Adaptive Dynamic Programming Algorithm for the Side Chain Placement Problem A. Leaver-Fay, B. Kuhlman, and J. Snoeyink Pacific Symposium on Biocomputing 10:16-27(2005)


September 23, 2004

0:18

Pro ceedings Trim Size: 9in x 6in

adaptiveDPpsbv12

AN ADAPTIVE DYNAMIC PROGRAMMING ALGORITHM FOR THE SIDE CHAIN PLACEMENT PROBLEM

ANDREW LEAVER-FAY

BRIAN KUHLMAN

JACK SNOEYINK

UNC-Chapel Hil l Chapel Hil l, NC 27514, USA leaverfa@email.unc.edu bkuhlman@email.unc.edu snoeyink@email.unc.edu
Larger rotamer libraries, which provide a fine grained discretization of side chain conformation space by sampling near the canonical rotamers, allow protein designers to find b etter conformations, but slow down the algorithms that search for them. We present a dynamic programming solution to the side chain placement problem which treats rotamers at high or low resolution only as necessary. Dynamic programming is an exact technique; we turn it into an approximation, but can still analyze the error that can b e introduced. We have used our algorithm to redesign the surface residues of ubiquitin's beta sheet.

1. Intro duction Current techniques for protein redesign fit new amino acids onto the rigid backbone scaffold of a known protein. Side chain conformations are almost always modeled by a rotamer library that discretely samples the conformation space. These libraries are obtained from observed side chain conformations in solved structures and/or by quantum calculations with small molecules.5,15 The energy of a placement depends on the rotamer (or state, Sv ) assigned to each residue v , and can be expressed as a sum of internal/background energies for each rotamer and interaction energies between pairs of residues: v v Eself (Sv ) + Epair (Sv1 , Sv2 ). (1)
1 <v2

Minimizing energy expression (1) is the side chain placement problem. There are two natural subproblems: the rotamer relaxation problem, where the search is only over conformational space, and the redesign problem that explicitly involves the additional search over sequence space.


September 23, 2004

0:18

Pro ceedings Trim Size: 9in x 6in

adaptiveDPpsbv12

Many computational methods have been applied to side chain placement, including simulated annealing (SA),2,8,16 dead-end elimination (DEE),4,14 branch-and-bound,7 potentials of mean force,10 genetic algorithms,3 and integer linear programming.6 We are not aware of a viable dynamic programming1 (DP) solution that scales to the size of practical rotamer libraries. DP, unlike all other methods except DEE and branchand-bound, is guaranteed to find the global optimal state assignment. To make DP viable, we capture all pairs with non-negligible interactions in an interaction graph, which guides the dynamic programming as we describe in the next section. This is an extension of our previous work on the hydrogen placement problem.13 For side chain placement, we make the algorithm adaptively explore the rotamers at either high or low resolution. 2. Metho ds 2.1. Interaction Graph Formulation A graph G = {V , E } consists of a set V of vertices and E  V × V of edges. We say that vertices u and v are adjacent if the pair (u, v )  E . A hypergraph is a generalization of a graph in which an edge (sometimes called a hyperedge) can contain any number of vertices of V . The degree of a hyperedge is the number of vertices it is incident upon. In Fig. 1b, we draw a hypergraph with the vertices as points and the hyperedges as curves encircling them. We can initially create an interaction graph, a hypergraph, G = {V , E }, that captures energy expression (1). Each residue is represented by a vertex, v  V , and each vertex carries a state space S (v ), which is the set of rotamers that can be placed at v . We initially create a (hyper)edge for each vertex and each pair. We assign each (hyper)edge e  E a scoring function fe (S ) that maps the states v chosen for its vertices to their interaction potentials: fe : e S (v )  . Specifically, each vertex, v  V , has a corresponding degree-1 hyperedge, {v }  E , with a hyperedge scoring function f{v} (Sv ) = Eself (Sv ). Each pair of vertices, v1 , v2  V , has a corresponding degree-2 hyperedge, {v1 , v2 }  E , with a hyperedge scoring function f{v1 ,v2 } (Sv1 , Sv2 ) = Epair (Sv1 , Sv2 ). We can omit an edge e when fe is zero under all possible state assignments to the vertices of e. To ignore small contributions, we can also omit edges for which |fe | is always less than a chosen magnitude threshold, ľ. Figure 1a shows an interaction graph for ubiquitin design. We denote a state assignment to all vertices by SV , and the induced as-


September 23, 2004

0:18

Pro ceedings Trim Size: 9in x 6in

adaptiveDPpsbv12

signment to any subset (particularly hyperedges) e by Se . Any assignment e induces a score for the interaction graph, E fe (Se ), and the optimal assignment minimizes the score. 2.2. Dynamic Programming A dynamic programming (DP) algorithm can optimize an interaction graph by eliminating vertex v from the graph by solving for the optimal state of v for all possible states of its neighbors, then replacing v with a hyperedge containing these neighbors. Specifically, let Ev be the set of hyperedges that contain v , and Nv = e Ev e \ {v } b e the neighb ors of v . If Nv is not already a hyp eredge, create Nv with an initial scoring function fNv = 0. Next, add to fNv the scoring functions of Ev with the best assignment to v , and eliminate v and edges Ev ^ from the hypergraph. Let fe,v=s denote the function whose domain is the states of Nv obtained from fe by restricting the state of v to be s  S (v ). e ^ Then we can write fNv = minsS (v) Ev fe,v =s . Also record v 's optimal state as a function of its neighbors' states for later retrieval. The scoring function for Nv now represents the simultaneous interaction of the vertices of Nv with the optimal state of v . The minimum score for this reduced interaction graph is the same as the score of the original. Each function is represented in a multi-dimensional table; the table dimensionality is the degree of its corresponding hyperedge. Computing fNv amounts to filling all cells in the table representing it. We can reduce a graph to a single vertex by repeated elimination; Figure 1 illustrates the first two vertex eliminations. Once we have the optimal state of this remaining vertex, we can trace back the optimal states for eliminated vertices by reading off their optimal states in the reverse order of their elimination.

a Figure 1.

b

c

d

Interaction graph reduction. The edges in (a) are drawn as curves encircling the vertices they contain in (b). The DP algorithm eliminates the upper-left vertex of (b) and updates the existing degree-2 hyperedge (c). It creates a degree-3 hyperedge as it eliminates the next vertex (d).

In the remainder of this subsection we give upper bounds on the running time and memory for an interaction graph with n vertices. If we assume


September 23, 2004

0:18

Pro ceedings Trim Size: 9in x 6in

adaptiveDPpsbv12

at most s states per vertex, and at most w neighbors at each elimination, then DP runs in O(nsw+1 ) time and uses O(nsw ) space. The parameter w is known as the treewidth9 of the interaction graph. Our algorithm shows that side chain placement is another instance of an NP-Hard problem with a polynomial time solution for graphs of bounded treewidth. To define treewidth, we first define the tree decomposition of a hypergraph G = {V , E } as a tree with T whose vertices {X1 , X2 , . . . , Xm } represent subsets of V that satisfy the following three properties: 1 (1) The union of sets im Xi = V . (2) Each edge e  E is in some set: e  Xi for some 1  i  m. (3) Each vertex v  V occupies a connected part of tree T : for any Xj on a path in T from Xi to Xk , the intersection Xi  Xk  Xj . The treewidth of a tree decomposition is maxi |Xi | - 1. The treewidth of a graph G is the minimum treewidth over all possible tree decompositions of G. Figure 2b illustrates a treewidth-3 tree decomposition of the interaction graph in Fig. 2a.

a Figure 2.

b

Tree decomposition. The tree in (b) is a tree-decomposition of the labeled graph in (a). Each tree node contains the vertex label set, Xi .

To show that bounded-treewidth interaction graphs yield polynomial time optimization, we convert a tree decomposition into a canonical form, which will then specify the elimination order for the vertices of the graph. Lemma 2.1. A tree decomposition T with treewidth w of the graph G can be converted into a rooted tree decomposition with treewidth w in which the vertex sets Xi satisfy |Xi \ Xj | = 1 whenever Xj is the parent of Xi . Pro of. We give a constructive proof. Begin by choosing an arbitrary node of T as the root. Now, consider any node Xi and parent Xj for which |Xi \ Xj | = 1. If |Xi \ Xj | = 0, then delete Xi and connect any children of Xi to the parent Xj . Otherwise take one vertex from the set difference, v  Xi \ Xj , and create a new node, Y = (Xi  Xj )  {v } between Xi and Xj .


September 23, 2004

0:18

Pro ceedings Trim Size: 9in x 6in

adaptiveDPpsbv12

Node Xi now has the desired property. Tree T is still a tree decomposition: the first two properties for tree decompositions hold trivially, and the third property holds because (Xi  Xj ) = (Xi  Y  Xj ). The treewidth of the new decomposition is still w since |Y | < |Xi |  (w + 1). Recurse on Y until the desired property holds. Now, a depth-first traversal of a rooted, canonical tree decomposition gives an order for vertex elimination: Reaching a node, write down the one vertex in the set difference between it and its parent. At the root, r, write down the vertices of Xr in any order. Because each vertex appears in a connected part of the tree, each vertex is written down once, and the recorded list of vertices provides the elimination order. Theorem 2.1. For an interaction graph G on n vertices with at most s states per vertex and a tree decomposition T of treewidth w, we can compute the optimum state assignment in O(nsw+1 + wn) time and O(nsw + wn) space by dynamic programming. Pro of. The O(wn) terms come from the time and space to construct and use the canonical tree decomposition. To bound the running time of DP, we use induction on the number of nodes eliminated so far. Before eliminating node v , we assert that G is the interaction graph that results from having eliminated the vertices up to vertex v , and that the canonical tree decomposition T that we maintain is a tree decomposition of treewidth w for G. If no vertices before v have been eliminated, then our assertion is trivially true. Vertex v was chosen for elimination because it was contained in a leaf node Xi of T , but not in the parent of Xi . If Xi was the root, then Xi has fewer than w + 1 nodes, and we may apply brute force optimization in O(sw+1 ) time. Otherwise, let Xj be Xi 's parent. By definition of the canonical tree decomposition, {v } = Xi \ Xj . Therefore v is adjacent to at most w vertices, and we can record the best state for v for each assignment to these vertices in O(sw+1 ) time and O(sw ) space. The hyperedge that v 's elimination produces is a subset of Xj , so we can delete Xi from T to obtain a tree decomposition of G after the elimination of v . Since each vertex is eliminated once, the theorem is established. Although computing the treewidth of a graph is NP-hard, the interaction graphs we have observed are small enough that we can make lowtreewidth tree decompositions by hand or by heuristics and feed them as input to our algorithm.


September 23, 2004

0:18

Pro ceedings Trim Size: 9in x 6in

adaptiveDPpsbv12

2.3. The Move Towards Large Rotamer Libraries The analysis of the DP algorithm shows that it is principally limited by two parameters: the number of states per node, s, which can be in the tens to the thousands, and the treewidth, w, which is fixed by the interaction graph and where we have only tried problems when it was between three to five. Since the number of vertices n is in the tens for a redesign problem, we seek to reduce the impact of increasing s. How and why does s increase? Large rotamer libraries are typically cousins of the canonical rotamer libraries,5,15 obtained by sampling around the canonical rotamers. Specifically, rotamers are usually defined as particular values of  dihedral angles for side chains of amino acids. A large rotamer library might expand the canonical leucine rotamer represented by  angles (60, 180) into many samples inside the box (60 ą 5, 180 ą 10). The canonical rotamers represent the bottoms of shallow energy wells for side chain conformations; the large rotamer libraries sample these wells. We can therefore organize our rotamer libraries as a set of base-rotamers (boxes) which contain many sub-rotamers (points): we denote the sub-rotamers of a base-rotamer b by subscripts b1 , b2 , . . . Protein designers have found that large rotamer libraries produce better designs. The improvement can be attributed to the high penalty assigned to colliding atoms. The Lennard-Jones "6-12" potential suggests a larger rotamer library: the d12 repulsive term is sensitive to small changes when 1 d is small, and slight flexes of the  dihedrals resolve some high penalty collisions. The remaining terms of the energy functions are less sensitive: 1 electrostatic interaction, for instance, is a function of d . Thus, for rotamers that are not near each other, we hope to represent many sub-rotamer interactions by a single base-rotamer interaction. We formalize this idea by defining the concept of stiff rotamer interactions. 2.4. Stiff Interactions An assignment of base-rotamers, a, b, to the adjacent vertices u and v is stiff in {u, v }  E if any sub-rotamers ai , aj and bk , bl satisfy |f{u,v} (ai , bk ) - f{u,v} (aj , bl )| > . If a base-rotamer interaction fails to exceed the stiffness threshold, then we will approximate the energy between pairs of sub-rotamers by the energy between the canonical rotamers and will bound the error in this approximation.


September 23, 2004

0:18

Pro ceedings Trim Size: 9in x 6in

adaptiveDPpsbv12

Our analogy is with stiff differential equations: the stiff interactions are those where the energy is rapidly changing over a collection of sub-rotamers. As adaptive schemes for numerical integration increase their temporal resolution when their input ODEs become stiff, and decrease their resolution when the ODEs behave smoothly, our algorithm increases its spatial resolution when it encounters stiff base-rotamer combinations, examining all possible sub-rotamer combinations, and decreases its resolution for non-stiff base-rotamer combinations. 2.5. Adaptive Dynamic Programming In adaptive dynamic programming, each vertex carries two levels of state spaces. At the top level is the base-state state space, L(v ), and each basestate b  L(v ) carries a sub-state state space, H(b). The hyperedge scoring v b functions are defined as a mapping fe : ) e L(v )  ( B H(b)  where B , a base state assignment to the vertices in e, is the input argument to the outer function. For each edge e and each base-state assignment B , B we maintain a stiffness descriptor, Ce which is a family of subsets of B. B B B Let Ce,k  B represent the k th element of Ce where 1  k  |Ce |. Alongside the stiffness descriptor, we define the set of all stiff base-states B |Ce | B B in an assignment, Ue = k=1 Ce,k . If e is an input hyperedge {u, v } under the base-state assignment {a, b}, {a,b} the stiffness descriptor is either C{u,v} = {{a, b}} if a and b's interaction is stiff or {{}} otherwise. A hyperedge created over the course of the reduction corresponds to an eliminated vertex; the stiffness descriptors for such a hyperedge correspond to the stiffnesses that existed with the optimal states of the eliminated vertex. The elimination of vertex v with incident edges Ev leaves behind a hyperedge Nv containing v 's former neighbors. We will describe the stiffness descriptor for a single base-state assignment, B , to the vertices of Nv . Define the stiffly-interacting base-states of a single base-state of v , b  L(v ) (B ,b) (B ,b)e to be TEv = eEv Ue - b where (B , b)e represents the base-states assigned to the vertices e contains. Let P B represent the power set of B , where if |Nv | = d then |P B | = 2d , and let PiB be the ith member of PiB . (B ,b) Let LPiB (v ) be {b | TEv = PiB }, that is, the set of all base-states of v which have the same stiffly-interacting base-states of PiB . ^ Let fe ((B , b)e )v=s denote the function whose domain is the sub-states of Nv under the base-state assignment B obtained from fe ((B , b)e ) by restricting v to sub-state s. Then for each i with a non-empty LPiB (v ), compute


September 23, 2004

0:18

Pro ceedings Trim Size: 9in x 6in

adaptiveDPpsbv12

the function fPiB :

b

B Pi

H(b)  min
i

fPiB =

bLP B (v ) sH(b)

so that e ^ min fe ((B , b)e )v=s
Ev

We represent the range of fPiB by [bestPiB . . . worstPiB ]. We define the best-worst score as mini worstPiB and define the set of competing stiffnesses as I = {i | bestPiB < best-worst}. Now, there is a natural partial order based on the subset property for power sets. We use this partial order to define a maximal family of sets to be any where no element in the family is a subset of any other. Given a family of sets, P we define maximal(P ) to be the function which returns the maximal family produced by throwing out any set if it is a subset of any B other. The stiffness descriptor CNv will be assigned maximal({PiB | i  I }). B B Let the set Ik , corresponding to CNv ,k , represent {i | PiB  CNv ,k }. Then B we define a set of functions, fCN ,k = miniIk fPiB . Then the hyperedge v scoring function with the state assignment B can be defined as fNv (B ) = B mink fCN ,k . v Where we represented each hyperedge scoring function before as a multidimensional table, we now represent each function as multi-resolution multidimensional table: a table of tables. The top level table has an entry for each base-state assignment to the vertices the edge contains. In each entry B resides a set of tables holding the fCN ,k functions. Each table requires v b B CNv ,k |H(b)| space. The smaller tables reduce the memory required by the adaptive algorithm compared to standard dynamic programming. 2.6. Irresolvable Col lisions With a final parameter,  , we define a pair of base-rotamers b and c to be in an irresolvable collision if Źi,j Epair (bi , cj ) <  While some pairs of base-rotamers can resolve their collisions by slight dihedral flexes, others cannot. As long as we have hope that a collision-free placement of side chains exists, we need not examine the colliding pairs. Within the DP framework, this means we may avoid calculating the best state of a vertex for any combination of neighbors's base-states that put them in an irresolvable collision. Since the d12 collision term is fluctuating 1 wildly, irresolvable collisions will meet our stiff interaction threshold, , and we would wastefully treat them at high resolution if we did not ignore them.


September 23, 2004

0:18

Pro ceedings Trim Size: 9in x 6in

adaptiveDPpsbv12

2.7. Error Analysis Theorem 2.2. The score induced by the adaptive algorithm's state assignment SV is within 2 |E2 | of the global optimum score where |E2 | is the number of degree-2 hyperedges in the input interaction graph. Pro of. Let SV be the energy of the adaptive algorithm's state assignment and opt be the global minimum energy. Now, in selecting a sub-state of some base-state and representing it's interaction energy with another basestate non-stiffly, the algorithm can misrepresent the energy by at most . On one hand, this may decrease the apparent score of the state assignment SV while on the other increasing the apparent score of the global optimal state. Over the course of the optimization, the algorithm may misrepresent at most |E2 | non-stiff interactions, so the apparent score of SV may decrease to SV - |E2 | and the apparent score of the optimal state may increase to opt + |E2 |. Because the algorithm chose SV , we know SV - |E2 |  opt + |E2 | and thus SV - opt  2 |E2 |. 3. Results We tested our algorithm at the rotamer relaxation task and the redesign task. For both tasks, we selected 15 surface residues from ubiquitin's  sheet, pictured in Fig. 3a. We excluded the following amino acids to keep the treewidth of our interaction graphs low: arginine, lysine, and methionine.

a

b

c

d

Figure 3. Ubiquitin's  -sheet. The  -sheet in (a) is flattened in (b) with it's 15 surface residues shown. We observed the treewidth-4 interaction graph in (c) by including edges between residues if any pair of rotamers ever interacted with an energy magnitude at least ľ = 0.2 kcal/mol. We artificial ly created the treewidth-3 interaction graph in (d) by dropping a single edge.

For the rotamer relaxation task, we first created 100 sequences for the ubiquitin backbone, asking the design module of the Rosetta molecular


September 23, 2004

0:18

Pro ceedings Trim Size: 9in x 6in

adaptiveDPpsbv12

modeling program11 to stochastically redesign these 15 surface residues. We then evaluated Rosetta's experimentally validated energy function12 between all pairs of sub-rotamers, and included hyperedges that met our interaction magnitude threshold ľ = 0.2 kcal/mol. This produced a treewidth-4 interaction graph, shown in Fig. 3c. We set our irresolvable collision cutoff to  = 1 kcal/mol. We compared the standard DP algorithm against the adaptive algorithm with values of 0, 10-4 , 10-3 , 10-2 , 0.1, and 1.0. Against DP, we compared the time a single Rosetta SA design required and scores it produced. In the relaxation problem, the average residue had 32 total rotamers, breaking down into 3 base-states and 10 sub-states per base-state. The median state space size was  1018 . We measured performance on a dual 2 GHz AMD Athlon with 2 GB RAM. In Fig. 4 we plot the relative running time of the adaptive and standard DP algorithms against the actual error observed. In table 1, we present the actual running times. Except for three instances, SA produced the optimal answer when run for as long as standard DP.
Table 1. Average running time comparison, in milliseconds, at the rotamer relaxation task. Run Time Mean Median Std Dev DP 206.2 117.3 399.3 =0 63.7 53.7 49.1 10-4 62.9 53.2 48.6 10-3 63.0 53.7 48.9 10-2 61.2 50.7 48.8 0.1 17.5 11.3 38.2 1.0 6.4 4.8 7.6 SA 65.1 65.0 6.5

Rotamer relaxation task. Increasing to as high as 1.0 kcal/mol gives a theoretical error bound of ą 64 kcal/mol but actual ly preserves accuracy and greatly decreases running time.
Figure 4.


September 23, 2004

0:18

Pro ceedings Trim Size: 9in x 6in

adaptiveDPpsbv12

For the redesign task, we artificially imposed a treewidth-3 interaction graph on the problem, pictured in figure 3d. This interaction graph differs by a single edge from the graph in 3c. The absence of this edge decreases the quality of the design. We none the less include the task as it pushes the DP algorithm's limits. Each residue in the design problem averaged 680 total rotamers. This broke down into about 57 base-rotamers per residue and 12 sub-rotamers per base-rotamer. The size of the state space was  1042 . We measured the performance of both the standard and the adaptive DP algorithms on a dual 900 MHz 64-bit Itanium-2 with 10 GB of RAM. We compared against a single SA run on the 2 GHz Athlon. In table 2 we present the results.
Table 2. Run Time Memory Usage Score (kcal / mol) Error (kcal / mol) Redesign task p erformance comparison. DP 15.99 hrs. 3.7 GB -42.5893 = 0.1 5.07 hrs. 3.4 GB -42.5893 0.0000 = 1.0 1.52 hrs. 1.5 GB -42.5579 0.0314 SA 3.42 seconds 0.2 GB -42.5692 0.0201

4. Discussion We have presented a novel application of our DP algorithm, and an improvement upon it that begins to make it competitive. DP offers an alternative to DEE and branch-and-bound for finding the global optimal solution in the side chain placement problem. The other algorithms are designed to solve a very difficult problem; the generic interaction graph implied by energy expression (1) is fully connected. Distant amino acids, however, have interaction energies of zero, so the interaction graphs we encounter are sparsely connected. Actual instances of the side chain placement problem may not require algorithms as generic as DEE and branch-and-bound. We were surprised to observe the small error for high values of . None of our experiments with > 10-2 produced an error near the theoretical bound we proved in Sec. 2.7. When we set as high as 1 kcal/mol, we limit our high resolution focus to only those sometimes-colliding pairs of base-rotamers. It is possible that in general the interactions in the global optimal solution induces are either very stiff or almost totally insensitive to small change; rotamers will either pack tightly or are far apart. It is also possible that the non-canonical rotamers' internal strain prevents their selection except in the instances of collision resolution. By representing base-state interactions with the canonical rotamer interaction energies, we would introduce error only for the non-stiff interactions induced alongside


September 23, 2004

0:18

Pro ceedings Trim Size: 9in x 6in

adaptiveDPpsbv12

a resolved collision--the resolved collision itself would be modeled stiffly. 5. Future Work We have allowed our adaptive algorithm only two spatial resolutions: high or low. We want to heirarchically group rotamers with varying resolutions dictated by the sub-rotamer energy range. This would let us make tighter theoretical bounds while hopefully preserving our performance gains. Moreover, we want to treat some states at even lower resolution. If two baserotamers for a residue reached away from the vertex v being eliminated so that their interaction energies with the rotamers of v were the same, we could treat these two base-rotamers as one and reduce the redundant computations. We would also like to incorporate partial dynamic programming into a simulated annealing algorithm. We can decrease the problem complexity if we allow DP to eliminate all degree-1 and -2 vertices. This fixes the eliminated vertices in their optimal states. We expect this adaptation will produce better designs. Acknowledgements This research was partially funded by NSF grant 0076984. References
1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. R. Bellman. Dynamic Programming. Princeton Univ. Press, 1957. B. I. Dahiyat and S. L. Mayo. Science, 278 82, (1997). J. R. Desjarlais and T. M. Handel. Prot. Sci., 4 2006, (1995). J. Desmet, M. D. Maeyer, B. Hazes, and I. Lasters. Nature, 356 539, (1992). R. L. Dunbrack. Jr. Curr. Opin. Struct. Biol., 12 431, (2002). O. Eriksson, Y. Zhou, and A. Elofsson. In WABI'01, 128, (2001). D. B. Gordon and S. L. Mayo. Structure 7 1089, (1999). L. Holm and C. Sander. Proteins, 14(2):213, (1992). T. Kloks. Treewidth: computations and approximations. Springer, (1994). P. Koehl and M. Delarue. J Mol Biol, 239(2) 249, (1994). B. Kuhlman, and D. Baker. Proc Natl Acad Sci USA, 97 10383, (2000). B. Kuhlman, G. Dantas, G. C. Ireton, G. Varani, B. L. Stoddard, and D. Baker. Science, 302 1364, (2003). A. Leaver-Fay, Y. Liu, and J. Snoeyink. In ALENEX'04, (2004). L. L. Looger and H. W. Hellinga. J Mol Biol, 307(1) 429, (2001). S. C. Lovell, J. M. Word, J. S. Richardson, and D. C. Richardson. Proteins, 40 389, (2000). J. G. Saven and P. G. Wolynes. J. Phys. Chem. B, 101 8375, (1997).