Gaussian Mixture Modeling of Helix Subclasses: Structure and Sequence Variations

Ashish V. Tendulkar, Babatunde Ogunnaike, and Pramod P. Wangikar

Pacific Symposium on Biocomputing 11:291-302 (2006)

Mixture density for a sample $y = \{x_1, x_2, \ldots, x_n\}$ with $k$ components:

$$ f(y) = \sum_{i=1}^{k} \pi_i \, f_i(y, \theta_i) $$

Log-posterior of component $i$ given $x$:

$$ \ln p(i \mid x) = \left( -\tfrac{1}{2} \ln |\Sigma_i| - \tfrac{1}{2} (x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i) + \ln \pi_i \right) $$

[Equation: propensity $P_{ij}$, expressed in terms of $f_{ij}$, $f_i$, $n_{ij}$ and $N_i$.]

[Figure: two panels with axes PC1 and PC2.]

[Figure labels: PC4 < -0.70; +0.68 < PC3; +0.72 < PC4; PC3 < -0.84.]

[Table: mixture component parameters, with columns labeled $\mu_1$ to $\mu_6$.]

[Table: geometric features $d_{i,i+3}$, $Area_{158}$, $d_{18}$ and $Vol_{i,i+1,i+2,i+3}$. Surviving entries, in order of appearance: 5.15 ± 0.20, 6.99 ± 0.57, 10.64 ± 0.33, 10.40 ± 1.49, 11.38 ± 0.74, 18.53 ± 4.16, 5.56 ± 2.61, 5.51 ± 0.45, 5.64 ± 0.57, 5.31 ± 3.00, 8.64 ± 1.59, 11.38 ± 5.93.]

[Figure: Number of Sequences versus Sequence Length.]

[Figure: Propensity versus Sequence Position, two panels; legend: Curved Helix, Regular Helix, Kinked Helix.]

[Figure: Propensity versus Sequence Position, two panels; legend: Length 10, Length 15, Length 25.]
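
As an illustration of the log-posterior expression ln p(i|x), the following is a minimal sketch of how a point might be assigned to the mixture component with the largest score. It is not the authors' implementation. The function names (`cholesky`, `log_score`, `classify`) and the two-component example parameters are hypothetical, and the symbols lost in extraction are assumed to be the component covariance (`cov`) and mixing weight (`weight`). The score is the bracketed expression only, so it omits any term that is the same for every component.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = a[r][c] - sum(L[r][k] * L[c][k] for k in range(c))
            L[r][c] = math.sqrt(s) if r == c else s / L[c][c]
    return L

def log_score(x, mean, cov, weight):
    """Return -1/2 ln|cov| - 1/2 (x-mean)^T cov^{-1} (x-mean) + ln(weight)."""
    L = cholesky(cov)
    d = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution: solve L z = d, so that z.z is the quadratic term.
    z = []
    for r in range(len(d)):
        z.append((d[r] - sum(L[r][k] * z[k] for k in range(r))) / L[r][r])
    # ln|cov| = 2 * sum of the logs of the diagonal of L.
    log_det = 2.0 * sum(math.log(L[r][r]) for r in range(len(d)))
    return -0.5 * log_det - 0.5 * sum(v * v for v in z) + math.log(weight)

def classify(x, components):
    """Index of the component (mean, cov, weight) with the largest score."""
    scores = [log_score(x, m, c, w) for (m, c, w) in components]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical two-component mixture in 2-D (illustrative numbers only,
# not taken from the paper).
components = [
    ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0.5),
    ([4.0, 4.0], [[2.0, 0.5], [0.5, 1.0]], 0.5),
]
print(classify([0.2, -0.1], components))
print(classify([3.8, 4.3], components))
```

The sketch uses a Cholesky factor rather than an explicit matrix inverse, so the determinant and the quadratic form both come from one factorization. This is a design choice of the example, not something stated in the source.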