An Adaptive Dynamic Programming Algorithm for the Side Chain Placement Problem A. Leaver-Fay, B. Kuhlman, and J. Snoeyink Pacific Symposium on Biocomputing 10:16-27(2005) September 23, 2004 0:18 Pro ceedings Trim Size: 9in x 6in adaptiveDPpsbv12 AN ADAPTIVE DYNAMIC PROGRAMMING ALGORITHM FOR THE SIDE CHAIN PLACEMENT PROBLEM ANDREW LEAVER-FAY BRIAN KUHLMAN JACK SNOEYINK UNC-Chapel Hil l Chapel Hil l, NC 27514, USA leaverfa@email.unc.edu bkuhlman@email.unc.edu snoeyink@email.unc.edu Larger rotamer libraries, which provide a fine grained discretization of side chain conformation space by sampling near the canonical rotamers, allow protein designers to find b etter conformations, but slow down the algorithms that search for them. We present a dynamic programming solution to the side chain placement problem which treats rotamers at high or low resolution only as necessary. Dynamic programming is an exact technique; we turn it into an approximation, but can still analyze the error that can b e introduced. We have used our algorithm to redesign the surface residues of ubiquitin's beta sheet. 1. Intro duction Current techniques for protein redesign fit new amino acids onto the rigid backbone scaffold of a known protein. Side chain conformations are almost always modeled by a rotamer library that discretely samples the conformation space. These libraries are obtained from observed side chain conformations in solved structures and/or by quantum calculations with small molecules.5,15 The energy of a placement depends on the rotamer (or state, Sv ) assigned to each residue v , and can be expressed as a sum of internal/background energies for each rotamer and interaction energies between pairs of residues: v v Eself (Sv ) + Epair (Sv1 , Sv2 ). (1) 1 . If a base-rotamer interaction fails to exceed the stiffness threshold, then we will approximate the energy between pairs of sub-rotamers by the energy between the canonical rotamers and will bound the error in this approximation. September 23, 2004 0:18 Pro ceedings Trim Size: 9in x 6in adaptiveDPpsbv12 Our analogy is with stiff differential equations: the stiff interactions are those where the energy is rapidly changing over a collection of sub-rotamers. As adaptive schemes for numerical integration increase their temporal resolution when their input ODEs become stiff, and decrease their resolution when the ODEs behave smoothly, our algorithm increases its spatial resolution when it encounters stiff base-rotamer combinations, examining all possible sub-rotamer combinations, and decreases its resolution for non-stiff base-rotamer combinations. 2.5. Adaptive Dynamic Programming In adaptive dynamic programming, each vertex carries two levels of state spaces. At the top level is the base-state state space, L(v ), and each basestate b L(v ) carries a sub-state state space, H(b). The hyperedge scoring v b functions are defined as a mapping fe : ) e L(v ) ( B H(b) where B , a base state assignment to the vertices in e, is the input argument to the outer function. For each edge e and each base-state assignment B , B we maintain a stiffness descriptor, Ce which is a family of subsets of B. B B B Let Ce,k B represent the k th element of Ce where 1 k |Ce |. Alongside the stiffness descriptor, we define the set of all stiff base-states B |Ce | B B in an assignment, Ue = k=1 Ce,k . If e is an input hyperedge {u, v } under the base-state assignment {a, b}, {a,b} the stiffness descriptor is either C{u,v} = {{a, b}} if a and b's interaction is stiff or {{}} otherwise. A hyperedge created over the course of the reduction corresponds to an eliminated vertex; the stiffness descriptors for such a hyperedge correspond to the stiffnesses that existed with the optimal states of the eliminated vertex. The elimination of vertex v with incident edges Ev leaves behind a hyperedge Nv containing v 's former neighbors. We will describe the stiffness descriptor for a single base-state assignment, B , to the vertices of Nv . Define the stiffly-interacting base-states of a single base-state of v , b L(v ) (B ,b) (B ,b)e to be TEv = eEv Ue - b where (B , b)e represents the base-states assigned to the vertices e contains. Let P B represent the power set of B , where if |Nv | = d then |P B | = 2d , and let PiB be the ith member of PiB . (B ,b) Let LPiB (v ) be {b | TEv = PiB }, that is, the set of all base-states of v which have the same stiffly-interacting base-states of PiB . ^ Let fe ((B , b)e )v=s denote the function whose domain is the sub-states of Nv under the base-state assignment B obtained from fe ((B , b)e ) by restricting v to sub-state s. Then for each i with a non-empty LPiB (v ), compute September 23, 2004 0:18 Pro ceedings Trim Size: 9in x 6in adaptiveDPpsbv12 the function fPiB : b B Pi H(b) min i fPiB = bLP B (v ) sH(b) so that e ^ min fe ((B , b)e )v=s Ev We represent the range of fPiB by [bestPiB . . . worstPiB ]. We define the best-worst score as mini worstPiB and define the set of competing stiffnesses as I = {i | bestPiB < best-worst}. Now, there is a natural partial order based on the subset property for power sets. We use this partial order to define a maximal family of sets to be any where no element in the family is a subset of any other. Given a family of sets, P we define maximal(P ) to be the function which returns the maximal family produced by throwing out any set if it is a subset of any B other. The stiffness descriptor CNv will be assigned maximal({PiB | i I }). B B Let the set Ik , corresponding to CNv ,k , represent {i | PiB CNv ,k }. Then B we define a set of functions, fCN ,k = miniIk fPiB . Then the hyperedge v scoring function with the state assignment B can be defined as fNv (B ) = B mink fCN ,k . v Where we represented each hyperedge scoring function before as a multidimensional table, we now represent each function as multi-resolution multidimensional table: a table of tables. The top level table has an entry for each base-state assignment to the vertices the edge contains. In each entry B resides a set of tables holding the fCN ,k functions. Each table requires v b B CNv ,k |H(b)| space. The smaller tables reduce the memory required by the adaptive algorithm compared to standard dynamic programming. 2.6. Irresolvable Col lisions With a final parameter, , we define a pair of base-rotamers b and c to be in an irresolvable collision if ¬i,j Epair (bi , cj ) < While some pairs of base-rotamers can resolve their collisions by slight dihedral flexes, others cannot. As long as we have hope that a collision-free placement of side chains exists, we need not examine the colliding pairs. Within the DP framework, this means we may avoid calculating the best state of a vertex for any combination of neighbors's base-states that put them in an irresolvable collision. Since the d12 collision term is fluctuating 1 wildly, irresolvable collisions will meet our stiff interaction threshold, , and we would wastefully treat them at high resolution if we did not ignore them. September 23, 2004 0:18 Pro ceedings Trim Size: 9in x 6in adaptiveDPpsbv12 2.7. Error Analysis Theorem 2.2. The score induced by the adaptive algorithm's state assignment SV is within 2 |E2 | of the global optimum score where |E2 | is the number of degree-2 hyperedges in the input interaction graph. Pro of. Let SV be the energy of the adaptive algorithm's state assignment and opt be the global minimum energy. Now, in selecting a sub-state of some base-state and representing it's interaction energy with another basestate non-stiffly, the algorithm can misrepresent the energy by at most . On one hand, this may decrease the apparent score of the state assignment SV while on the other increasing the apparent score of the global optimal state. Over the course of the optimization, the algorithm may misrepresent at most |E2 | non-stiff interactions, so the apparent score of SV may decrease to SV - |E2 | and the apparent score of the optimal state may increase to opt + |E2 |. Because the algorithm chose SV , we know SV - |E2 | opt + |E2 | and thus SV - opt 2 |E2 |. 3. Results We tested our algorithm at the rotamer relaxation task and the redesign task. For both tasks, we selected 15 surface residues from ubiquitin's sheet, pictured in Fig. 3a. We excluded the following amino acids to keep the treewidth of our interaction graphs low: arginine, lysine, and methionine. a b c d Figure 3. Ubiquitin's -sheet. The -sheet in (a) is flattened in (b) with it's 15 surface residues shown. We observed the treewidth-4 interaction graph in (c) by including edges between residues if any pair of rotamers ever interacted with an energy magnitude at least µ = 0.2 kcal/mol. We artificial ly created the treewidth-3 interaction graph in (d) by dropping a single edge. For the rotamer relaxation task, we first created 100 sequences for the ubiquitin backbone, asking the design module of the Rosetta molecular September 23, 2004 0:18 Pro ceedings Trim Size: 9in x 6in adaptiveDPpsbv12 modeling program11 to stochastically redesign these 15 surface residues. We then evaluated Rosetta's experimentally validated energy function12 between all pairs of sub-rotamers, and included hyperedges that met our interaction magnitude threshold µ = 0.2 kcal/mol. This produced a treewidth-4 interaction graph, shown in Fig. 3c. We set our irresolvable collision cutoff to = 1 kcal/mol. We compared the standard DP algorithm against the adaptive algorithm with values of 0, 10-4 , 10-3 , 10-2 , 0.1, and 1.0. Against DP, we compared the time a single Rosetta SA design required and scores it produced. In the relaxation problem, the average residue had 32 total rotamers, breaking down into 3 base-states and 10 sub-states per base-state. The median state space size was 1018 . We measured performance on a dual 2 GHz AMD Athlon with 2 GB RAM. In Fig. 4 we plot the relative running time of the adaptive and standard DP algorithms against the actual error observed. In table 1, we present the actual running times. Except for three instances, SA produced the optimal answer when run for as long as standard DP. Table 1. Average running time comparison, in milliseconds, at the rotamer relaxation task. Run Time Mean Median Std Dev DP 206.2 117.3 399.3 =0 63.7 53.7 49.1 10-4 62.9 53.2 48.6 10-3 63.0 53.7 48.9 10-2 61.2 50.7 48.8 0.1 17.5 11.3 38.2 1.0 6.4 4.8 7.6 SA 65.1 65.0 6.5 Rotamer relaxation task. Increasing to as high as 1.0 kcal/mol gives a theoretical error bound of ± 64 kcal/mol but actual ly preserves accuracy and greatly decreases running time. Figure 4. September 23, 2004 0:18 Pro ceedings Trim Size: 9in x 6in adaptiveDPpsbv12 For the redesign task, we artificially imposed a treewidth-3 interaction graph on the problem, pictured in figure 3d. This interaction graph differs by a single edge from the graph in 3c. The absence of this edge decreases the quality of the design. We none the less include the task as it pushes the DP algorithm's limits. Each residue in the design problem averaged 680 total rotamers. This broke down into about 57 base-rotamers per residue and 12 sub-rotamers per base-rotamer. The size of the state space was 1042 . We measured the performance of both the standard and the adaptive DP algorithms on a dual 900 MHz 64-bit Itanium-2 with 10 GB of RAM. We compared against a single SA run on the 2 GHz Athlon. In table 2 we present the results. Table 2. Run Time Memory Usage Score (kcal / mol) Error (kcal / mol) Redesign task p erformance comparison. DP 15.99 hrs. 3.7 GB -42.5893 = 0.1 5.07 hrs. 3.4 GB -42.5893 0.0000 = 1.0 1.52 hrs. 1.5 GB -42.5579 0.0314 SA 3.42 seconds 0.2 GB -42.5692 0.0201 4. Discussion We have presented a novel application of our DP algorithm, and an improvement upon it that begins to make it competitive. DP offers an alternative to DEE and branch-and-bound for finding the global optimal solution in the side chain placement problem. The other algorithms are designed to solve a very difficult problem; the generic interaction graph implied by energy expression (1) is fully connected. Distant amino acids, however, have interaction energies of zero, so the interaction graphs we encounter are sparsely connected. Actual instances of the side chain placement problem may not require algorithms as generic as DEE and branch-and-bound. We were surprised to observe the small error for high values of . None of our experiments with > 10-2 produced an error near the theoretical bound we proved in Sec. 2.7. When we set as high as 1 kcal/mol, we limit our high resolution focus to only those sometimes-colliding pairs of base-rotamers. It is possible that in general the interactions in the global optimal solution induces are either very stiff or almost totally insensitive to small change; rotamers will either pack tightly or are far apart. It is also possible that the non-canonical rotamers' internal strain prevents their selection except in the instances of collision resolution. By representing base-state interactions with the canonical rotamer interaction energies, we would introduce error only for the non-stiff interactions induced alongside September 23, 2004 0:18 Pro ceedings Trim Size: 9in x 6in adaptiveDPpsbv12 a resolved collision--the resolved collision itself would be modeled stiffly. 5. Future Work We have allowed our adaptive algorithm only two spatial resolutions: high or low. We want to heirarchically group rotamers with varying resolutions dictated by the sub-rotamer energy range. This would let us make tighter theoretical bounds while hopefully preserving our performance gains. Moreover, we want to treat some states at even lower resolution. If two baserotamers for a residue reached away from the vertex v being eliminated so that their interaction energies with the rotamers of v were the same, we could treat these two base-rotamers as one and reduce the redundant computations. We would also like to incorporate partial dynamic programming into a simulated annealing algorithm. We can decrease the problem complexity if we allow DP to eliminate all degree-1 and -2 vertices. This fixes the eliminated vertices in their optimal states. We expect this adaptation will produce better designs. Acknowledgements This research was partially funded by NSF grant 0076984. References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. R. Bellman. Dynamic Programming. Princeton Univ. Press, 1957. B. I. Dahiyat and S. L. Mayo. Science, 278 82, (1997). J. R. Desjarlais and T. M. Handel. Prot. Sci., 4 2006, (1995). J. Desmet, M. D. Maeyer, B. Hazes, and I. Lasters. Nature, 356 539, (1992). R. L. Dunbrack. Jr. Curr. Opin. Struct. Biol., 12 431, (2002). O. Eriksson, Y. Zhou, and A. Elofsson. In WABI'01, 128, (2001). D. B. Gordon and S. L. Mayo. Structure 7 1089, (1999). L. Holm and C. Sander. Proteins, 14(2):213, (1992). T. Kloks. Treewidth: computations and approximations. Springer, (1994). P. Koehl and M. Delarue. J Mol Biol, 239(2) 249, (1994). B. Kuhlman, and D. Baker. Proc Natl Acad Sci USA, 97 10383, (2000). B. Kuhlman, G. Dantas, G. C. Ireton, G. Varani, B. L. Stoddard, and D. Baker. Science, 302 1364, (2003). A. Leaver-Fay, Y. Liu, and J. Snoeyink. In ALENEX'04, (2004). L. L. Looger and H. W. Hellinga. J Mol Biol, 307(1) 429, (2001). S. C. Lovell, J. M. Word, J. S. Richardson, and D. C. Richardson. Proteins, 40 389, (2000). J. G. Saven and P. G. Wolynes. J. Phys. Chem. B, 101 8375, (1997).