September 22, 2008 16:55 Proceedings Trim Size: 9in x 6in rcolak DENSE GRAPHLET STATISTICS OF PROTEIN INTERACTION AND RANDOM NETWORKS ¨ R. COLAK , F. HORMOZDIARI , F. MOSER , A. SCHONHUTH , J. HOLMAN, M. ESTER AND S.C. SAHINALP+ School of Computing Science, Simon Fraser University Joint first authors. + Contact author: cenk@cs.sfu.ca Understanding evolutionary dynamics from a systemic point of view crucially depends on knowledge about how evolution affects size and structure of the organisms' functional building blocks (modules). It has been recently reported that statistics over sparse PPI graphlets can robustly monitor such evolutionary changes. However, there is abundant evidence that in PPI networks modules can be identified with highly interconnected (dense) and/or bipartite subgraphs. We count such dense graphlets in PPI networks by employing recently developed search strategies that render related inference problems tractable. We demonstrate that corresponding counting statistics differ significantly between prokaryotes and eukaryotes as well as between "real" PPI networks and scale free network emulators. We also prove that another class of emulators, the low-dimensional geometric random graphs (GRGs) cannot contain a specific type of motifs, complete bipartite graphs, which are abundant in PPI networks. 1. Introduction On the biochemical level, life can be explained by gene products facilitating and controlling essential cellular mechanisms, thereby establishing overall cellular viability. To suitably model the structural features that underlie the complex wirings of gene products is presently at the core of computational systems biology. As further explained by the modularity paradigm, global interplay of cellular mechanisms can be decomposed into interaction of functional subunits (functional modules), which consist of subsets of gene products that facilitate essential functionalities by concerted actions. Therefore, beyond studying global properties of biomolecular networks (BNs) such as the degree distribution, there has been considerable interest in identifying and quantifying local topological properties of biomolecular networks. In particular, small subgraphs or (graphlets) that appear significantly more frequently in biomolecular networks than in random graphs can yield insights on cellular functionalities10 . In order to identify dense graphlets, a variety of methods that count induced subgraphs of different types and node sizes of up to 5, 6 and 7 have been suggested12,7,5 . In the most recent approach, an 1 September 22, 2008 16:55 Proceedings Trim Size: 9in x 6in rcolak algorithm has been developed that counts all non-induced subtrees (as well as certain other "sparse" graphlets) of a PPI network1 . Corresponding subtree statistics turned out to be robust similarity measures between PPI networks, hence, in principle, can quantify organismic changes due to evolutionary dynamics. It has been argued that graphlet distributions resulting from graphlet counting algorithms can also be used to assess the suitability of random graph models for emulating the growth of and achieving the observable topological features of BNs. A thorough assessment would be highly desirable, as available models, although correctly accounting for many of the global features of BNs, can be different in terms of local features. For example, while exhibiting degree distributions similar to those of prior models (e.g. the preferential attachment model (PAM) which gives rise to scale-free networks), the generalized duplication model (GDM) is superior in terms of graphlet distributions that have recently been suggested7. This may be explained by that the GDM is the only model under actual consideration that is based on emulation of processes that guide BN growth, i.e. gene duplications. Apart from the GDM and the PAM, the geometric random graph model (GRGM) has recently been introduced as an intuitive alternative option. Although not being biologically motivated, PPI networks were successfully fit to GRGMs of low dimensions6 (for definitions see Section 3). 1.1. Motivation: Dense Biomolecular Graphlets It has been widely established in the biological literature10 , modules in PPI networks are most likely encoded as highly interconnected (dense) regions, i.e., connected subgraphs that contain relatively large numbers of edges. It is thus of interest how these regions change over evolutionary time and whether they play a significant role in how the PPI networks evolve. As a result, determining the number of specific dense graphlets and analyzing how they vary with respect to the "complexity" of their respective organism is a problem of significant interest. Unfortunately, counting dense graphlets, even approximately, is a highly non-trivial problem; for general graphs it is known to be intractable. In fact, the simpler problem of determining whether an input graph G includes one or more cliques of size k , or any graphlet with density 1 - O(1/n), is NP-hard. Furthermore, the number of any specific dense graphlet M in a dense graph G would be exponential with the size of M . Fortunately it is well known that PPI networks are typically very sparse, with average degree of 7 or less. For such networks, a novel data mining tool for determining whether a given network G includes a dense graphlet M has recently been described4. This tool is based on a pruning technique derived from combinatorial September 22, 2008 16:55 Proceedings Trim Size: 9in x 6in rcolak observations on dense subgraphs4. Although, in the worst case, the running time of this pruning technique is exponential with the size of the dense graphlet it is looking for, it is of considerable interest whether it can be generalized to count a given dense graphlet in a PPI network of interest. Once such statistics are obtained for all dense graphlets one can consider a number of interesting questions. (i) Are prominent PPI network generation models, in particular GRGM (i-a) and GDM (i-b) in accordance with statistics over such dense graphlets? If not, it is highly unlikely that they sufficiently account for central cellular functionalities. (ii) Can we use dense graphlet distributions of various organisms, in order to quantify changes in organism complexity due to evolutionary dynamics? In general, it would be interesting to find out whether evolutionary trends towards more complex organisms can be monitored this way. If one accepts that PPI networks have evolved in a duplication oriented procedure, then one might be able to suggest that more evolutionary complex species might have more complex graphlets. As a last point, we noticed that in networks generated by low-dimensional GRGMs, complete bipartite graphs of the types Kn,m where n, m 3 do not/can not exist. However, there is evidence that induced bipartite graphs abundantly occur in the PPI networks of E.coli3 and others. Moreover, there is evidence that complete bipartite graphs in PPI networks are related to "parallel" functional modules which increase cellular flexibility and robustness.Furthermore, complete bipartite graphs of four nodes (known as bi-fan) are the main building blocks of dense overlap regulons (a regulon is the set of genes regulated by given transcription factor) 2,9 . Bi-fan generalizations to larger patterns with row of inputs and row of outputs (bipartite graphs of larger size) are also abundant in PPI networks 2 . Therefore, we put particular emphasis on bipartite graphlet statistics. 1.2. Contributions Counting graphlets (induced or non-induced form) with more than 7 nodes has been a challenging computational task. Recent advances allow improvement for sparse, non-induced graphlets: it is now possible to count trees and bounded treewidth graphlets (with very small treewidth)1. In this paper we generalize these results to all graphlets of density 0.85, up to a node size of 14 (for definitions see Section 2), as well as all complete bipartite graphlets of density 0.55 up to a node size of 10. The reason that we limited the results to graphlets of density 0.85 is that for lower densities the number of these occurrences seems to be extremely high and the algorithm took alot of time to give results. Based on these counts we present graphlet statistics for (1) two of the best studied PPI networks: Yeast (eukaryotic) and E.Coli (prokaryotic) as well as (2) for random networks September 22, 2008 16:55 Proceedings Trim Size: 9in x 6in rcolak that are obtained through GDM and the GRGM models, whose parameters are set to emulate the growth of these PPI networks. Our main observation is that for the two organisms considered, the motif statistics were in accordance with their complexity: the Yeast network, in comparison to the E.coli network, contained (relatively) many more denser graphlets than sparser graphlets. We also observe that dense motif statistics of the Yeast network was highly divergent from that of the GDM whose parameters were set to emulate its growth as closely as possible. Finally, we prove that low-dimensional GRGM cannot generate complete, bipartite graphs - which abundantly occur in the PPI networks we considered. These observations may help resolve the inconclusive discussion on the suitability of random network generators in emulating the evolution of PPI networks. 2. Inference of Densely Connected and Bipartite Subgraphs An undirected graph G = (V , E ) is said to be a supergraph of G if G is an induced graphletof G. As usual, a graph is defined to be bipartite if V = V1 V2 such that E ((V1 × V1 ) (V2 × V2 )) = and it is moreover called complete if E = (V1 × V2 ) (V2 × V1 ). A Kn,m is a complete, bipartite graph such that |V1 | = n, |V2 | = m. We define the density d(G) of a graph G as the number of edges divided by 2E the number of possible edges d(G) = ||E|| = |V |(||V ||-1) . G is said to be -dense (V ) 2 if d(G) and dense in general if = 0.5. As usual, G is said to be connected if there is a path between any pair of nodes in G. G is said to be densely connected if it is dense and connected. Given a biomolecular network G = (V , E ) the driving question is to count all its induced densely connected and complete bipartite subgraphs. Note that inference of all densely connected subgraphs is N P -hard which can be shown by a straightforward reduction from the max-clique problem which is N P -complete8. Therefore, one has to screen all 2|V | subgraphs in the worst case. Help comes from the following insights: (i) For 0.5, in each -dense graph of node size n there is an induced -dense graphletof size n - 1 (ii) Each induced graphlet of a complete bipartite graph is a complete bipartite g r ap h . While proving (ii) is straightforward, (i) is based on some more subtle arguments that have been presented elsewhere4 . As a consequence of a combination of (i), (ii) one can employ a search strategy starting with induced 2-node subgraphs and iteratively not considering supergraphs that contradict (i) and/or (ii). September 22, 2008 16:55 Proceedings Trim Size: 9in x 6in rcolak Thanks to the peculiarities of PPI networks (sparseness, scale-freeness), by employing a search strategy that incorporates this core idea, the inference problem becomes tractable. 3. Models for PPI Network Emulation 3.1. The Generalized Duplication Model The duplication model grows iteratively in discrete time steps. It starts with an arbitrary connected network G(t0 ), of node size t0 . In iteration t > t0 , one node, denoted as vt , is added to G(t - 1), the network resulting from iteration t - 1, as follows: A node w G(t - 1) is picked uniformly at random and then "duplicated" by creating a new node vt that is connected to all neighbors of w but not to w itself. Subsequently, (1) Edges (u, vt ) (where u is a neighbor of w) are deleted with probability p, (2) Edges (u, vt ) are created with probability r/(t - 1) where u runs through all nodes in the network. Resulting parallel edges are merged if necessary. In this process, p and r are fixed parameters. 3.2. Geometric Random Graphs A geometric random graph (GRG) is created by drawing points uniformly at random from some restricted area (e.g. the hypercube) as nodes and, given some fixed threshold r, connecting two nodes vi , vj by an edge if ||vi - vj || < r where ||.|| is the Euclidean norm. The dimension of the hypercube is referred to as the dimension of the graph. GRGs have been suggested as a model for emulating PPI networks12,6 . Although geometric random graphs can successfully capture a couple of PPI network features we would like to outline that GRGs cannot generate induced, complete bipartite subgraphs in the following. However, such subgraphs are crucial ingredients of PPI networks3 . Theorem 3.1. A 2- resp. 3-dimensional GRG does not contain a K2,3 resp. K3,3 as an induced subgraph. As we will demonstrate (see the results Section 4) that PPI networks contain significant amounts of such induced bipartite subgraphs in the following, this will rule out low-dimensional GRGs as a model that is suited to capture important local topological features of PPI networks. Moreover, as inspired by the theorem's proof, we conjecture that there is no Kn,2 in (n - 1)-dimensional and no Kn,3 in n-dimensional GRGs. We are currently close to finishing respective results for September 22, 2008 16:55 Proceedings Trim Size: 9in x 6in rcolak n = 4 and, indeed, we couldn't observe K4,2 's resp. K4,3 's in the 3- resp. 4dimensional GRGs we generated. 3.3. Notations In the following, for X, Y Rd let X Y := {v Rd : v = X + (1 - )Y : [0, 1]} be the line segment that connects X and Y and - - X Y := {v Rd : v = X + (Y - X ), 0} be the half ray that leaves X in direction of Y . For X Rd let Ur (X ), Sr (X ), Br (X ) := {v Rd : ||v - X || {<, =, } r} be the open ball, the (hyper)sphere and the closed ball with radius r around X . Note that for d = 2, Sr (X ) is just a circle around X with radius r. 3.4. Proof of theorem 3.1 We prepare the proof with two essential lemmata. Lemma 3.1. Let P, A, B , C R2 such that ||P - A||, ||P - B ||, ||P - C || < r ||A - B ||, ||A - C ||, ||B - C || Then, for s max{||P - A||, ||P - B ||, ||P - C ||}, it holds that Ur (A) Ur (B ) Ur (C ) Us (P ). (2 ) (1 ) Proof. Obviously, if A, B , C are located on one line, U := Ur (A) Ur (B ) Ur (C ) is empty such that there is nothing to prove. Therefore, we can assume that the affine dimension of A, B , C is 2. In combination with some elementary - - - - - geometric arguments, this implies that P A, P B , P C divide R2 into three sections - - -- such that the section SX Y which is bounded by P X , P Y does not contain Z where X, Y , Z {A, B , C } (X, Y , Z pairwise different). Hence R2 = S AB S AC S B C We will show that Ur (A) SB C , Ur (B ) SAC , Ur (C ) SAB Us (P ) (4 ) a n d A S B C , B S AC , C S AB . (3 ) September 22, 2008 16:55 Proceedings Trim Size: 9in x 6in rcolak which yields U = U (SAB SAC SB C ) = (U SAB ) (U SAC ) (U SB C ) (Ur (C ) SAB ) (Ur (B ) SAC ) (Ur (A) SB C ) Us (P ) and therefore establishes (2). In order to show (4), we can restrict our attention to showing Ur (C ) SAB Us (P ) as the remaining two cases follow from arguments that are completely analogous. If we can show that Sr (C ) SAB Us (P ). (5 ) (4) (3) we will be done with the proof by the following argumentation. Let D Ur (C ) - - SAB . Consider P D. By definition of SAB as a section we have - - P D S AB . (6 ) Note that every ray starting within Ur (C ) will finally hit the boundary of Ur (C ) - - which is Sr (C ). Therefore, while traveling from D Ur (C ) on P D away from P , one will hit Sr (C ). Denote by E the point of that intersection. We compute (5) - (6) - E Sr (C ) P D Sr (C ) SAB Us (P ). However, by construction, D is at least as close to P as E , so D Us (P ) with which we have completed the proof. In order to finally show (5) observe that, because of (1), P Ur (C ), that is, P - is inside of Sr (C ), whereas A, B are not inside. Therefore, Sr (C ) intersects P A - - resp. P B between P and, at the latest (maybe just there), A resp. B . Therefore, by definition of s, - - - (7 ) Sr (C ) P A, Sr (C ) P B Bs (P ). - - - - However, for F Sr (C ) P C we have that, as all P, C, F P C (*), ||F - P || = ||P - C || + ||C - F || = ||P - C || + r > r s where ||P - C || > r follows from (i) which implies P = C whereas the last inequation follows from the definition of s. This translates to - - Sr (C ) P C Bs (C ) (8 ) Note that two non-concentric circles (here: Sr (C ), Ss (P )) can intersect at most two points. In our case, these two points establish the transitions of Sr (C ) from being inside Bs (P ) to being outside Bs (P ). Combining this insight with (7,8) implies that, within SAB , Sr (C ) is contained in Bs (P ) which establishes (5). ( ) September 22, 2008 16:55 Proceedings Trim Size: 9in x 6in rcolak In the following we will write Rx , Ry , Rz for the coordinates of R = (Rx , Ry , Rz ) R3 . Lemma 3.2. Let P, Q, A, B , C R3 such that Az = Bz = Cz = 0 and Pz Qz 0 (9 ) (1 0 ) (1 1 ) ||P - A||, ||P - B ||, ||P - C ||, ||Q - A||, ||Q - B ||, ||Q - C || < r ||A - B ||, ||A - C ||, ||B - C || r. Then, for s max{||P - A||, ||P - B ||, ||P - C ||}, it holds that Q Us (P ). (1 2 ) Proof. Intuitively speaking, this lemma is trying to show that if three points in two-dimensional space are in proximity of two other points, then these two points have to be proximate to each other. Let P = (Px , Py , 0), Q = (Qx , Qy , 0) be the projection of P and Q onto the x - y -space. Combining (9) and (10) implies that also ||P - A||, ||P - B ||, ||P - C ||, ||Q - A||, ||Q - B ||, ||Q - C || < r. (13) Hence P , A, B , C satisfy the conditions of lemma 3.1 and applying the lemma yields ||P - Q || max{||P - A||, ||P - B ||, ||P - C ||}. Wlog. let A such that ||P - Q || ||P - A||. We compute ( ||P - Q|| = Px - Qx )2 + (Py - Qy )2 + (Pz - Qz )2 = (ii) (14) | | |P - Q ||2 + (Pz - Qz )2 |P - A||2 + (Pz - Qz )2 | 2 |P - A||2 + Pz = ||P - A||. (1 4 ) We are now in position to prove theorem 3.1. Proof. [Th. 3.1] We assume the contrary in both cases. For the first part let A, B , C R2 and P, Q R2 be the partitions of the sampled K3,2 . By definition of a K3,2 in a GRG, P, A, B , C satisfy (1) in lemma 3.1 where r is the threshold of the 2-dimensional GRG. Applying lemma 3.1 yields ||P - Q|| < s r which is a contradiction to having an edge between P and Q. For the second part let P, Q, R R3 and A, B , C R3 be the two partitions of the sampled K3,3 . Observe that any three points in R3 lie on a plane. By necessarily rotating all points around the origin (which is an orthogonal transformation September 22, 2008 16:55 Proceedings Trim Size: 9in x 6in rcolak hence norm preserving) we can assume wlog. that A, B , C lie in the x - y -space of R3 . By necessarily reflecting P, Q, R at the x - y -space which, again, is a norm preserving transformation, we can moreover assume that, for two of the P, Q, R, the z-coordinates are non-negative. Wlog. Pz Qz 0. By definition of a K3,3 in a 3-dimensional GRG, this establishes (9,10,11) of lemma 3.2 where r is the threshold of the 3-dimensional GRG. Applying lemma 3.2 to P, Q, A, B , C yields Q Us (P ) which yields ||P - Q|| < s r. This is a contradiction to that there is no edge between P and Q. 4. Results We present our results on dense graphlet statistics for both Yeast and E.coli PPI networks for graphlet densities in the range [0.85, 1.00] and number of nodes in the range [7, 14].a We also present similar dense motif statistics for random graphs generated by both 3-dimensional GRGM (unit cube) and GDM which are set to generate identical number of nodes and edges to the Yeast PPI network13 b , i.e. |V | 4900 and |E | 17000.c Due to limitations on computational resources, we were not able to count graphlets of larger sizes or smaller density. In our experiments we have used the network available from DIP 13 for Yeast and E.coli. Yeast PPI network has around 4900 nodes and 17000 edges, and E.coli network has around 1441 nodes and 5871 edges. By considering the distribution (fraction) of dense graphlets (between range [0.85, 1.00] ) for each species, we are trying to eliminate the effect caused by difference in average degree between these two species. In Figure 1 we depict for each n {7, . . . , 12} (the number of nodes in the graphlet), how the proportion of all graphlets with k edges among all dense graphlets (with density 0.85) vary with respect to k - for all four PPI networks considered. Note that we give only proportional distributions here as the Yeast and the E.coli networks have different number of edges and nodes and thus it is not meaningful to compare absolute figures. It is possible to make a number of observations on Figure 1: (1a) The fractional distribution of dense graphlets in GRGM is significantly different from that of the other networks: for example, it contains no graphlets with density 0.85 for n = 12, 13 and 14 and no graphlets with density 0.89 (i.e. remind the reader that the density of a graphlet with n nodes is the ratio of the number of edges in the graphlet and n(n - 1)/2, the maximum number of edges possible between the nodes of the graphlet; a density of 1 indicates a clique. b The DIP release date for Yeast and E.coli PPI networks is July 7, 2007. c This is possible by setting the radius of the GRGM to an appropriate value and setting p = 0.365 and r = 0.12 in the GDM as per7,1 . a We September 22, 2008 16:55 Proceedings Trim Size: 9in x 6in rcolak 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 18 18.5 19 19.5 20 20.5 21 Yeast Ecoli Duplication Geometric 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 24 24.5 25 25.5 26 26.5 27 27.5 28 Yeast Ecoli Duplication Geometric (a) n=7 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 31 32 33 34 35 36 Yeast Ecoli Duplication Geometric 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 39 40 (b) n=8 Yeast Ecoli Duplication Geometric 41 42 43 44 45 (c) n=9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 47 48 49 50 51 52 53 54 55 Yeast Ecoli Duplication Geometric 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 57 58 59 (d) n=10 Yeast Ecoli Duplication Geometric 60 61 62 63 64 65 66 (e) n=11 (f) n=12 Figure 1. Here we demonstrate for each n {7, 8, 9, 10, 11, 12} the fraction of graphlets in each PPI network which have k edges, among all graphlets with density 0.85. The specific PPI networks are depicted with the following colors: Yeast (Red), E.coli (Green), GDM (Blue) and GRGM (Purple). For example, plot (a) indicates that, the graphlets in the E.coli PPI network (Green) with 7 nodes and 18 edges, cumulatively, form 80% of all of its dense graphlets with density 0.85. with 49 edges or more) for n = 11. This, together with the proof that we provide in Section 3.4, this observation seems to support a negative answer to the question (i-a) in Section 1.1. (1b) The fractional distribution of GDM largely agrees with the two PPI networks for n 12. However, GDM fails to generate any dense network (with density 0.85) with n = 13 or 14. September 22, 2008 16:55 Proceedings Trim Size: 9in x 6in rcolak (2) The fractional distribution (of dense graphlets) in Yeast and E.coli are quite similar for smaller values of n. However for n = 11 and especially for n = 12, the fraction of graphlets in Yeast increase with density. For example, for n = 12, the fraction of graphlets with 59 edges is 0.1 in Yeast whereas it is 0.05 in E.coli. This observation seems to provide a positive answer to question (ii) in Section 1.1 (although we believe much more study should be done regarding this question). In Figure 2, we depict how the total number of dense graphlets (with density 0.85) with n nodes (n = {3, . . . , 14}) vary as a function of n - for the Yeast PPI network as well as the specific networks generated by GRGM and GDM whose parameters were set to emulate the Yeast network. 1e+06 100000 10000 1000 100 10 1 2 4 6 8 10 12 14 Yeast Duplication Geometric Figure 2. Total number of dense subgraphs (with density 0.85) with n nodes - in the Yeast PPI network as well as networks generated by GRGM and GDM - whose parameters are set to emulate the Yeast PPI network. The specific colors used are GRGM (Blue), GDM (Green) and Yeast (Red). As can be seen in Figure 2, there is a wide gap between the total numbers of dense graphlets in the the Yeast PPI network and the random networks generated by GDM and GRGM. Although the number of dense graphlets for n = 6 is consistent with an earlier study 7 , the figure 2 shows substantially difference for n > 6 between GDM (or GRGM) and Yeast PPI network, especially for n 8, where there is a 7-fold (or 50-fold) difference respectively. More drastically GRGM includes no dense graphlets with n = 12 nodes and GDM includes no dense graphlets with n = 14 nodes. Once again, Figure 2 seems to support a negative answer to question (i-a) in Section 1.1: i.e. GRGM is not suitable for emulating the growth of PPI networks. The figure also seems to imply (somewhat less strongly) a negative answer to question (i-b), i.e., the GDM used in this paper does not capture the dense graphlet distribution of the Yeast PPI network. We leave the possibility of how (or if) we can modify the seed network or the GDM itself so September 22, 2008 16:55 Proceedings Trim Size: 9in x 6in rcolak as to better capture the distribution of denser graphlets of size bigger than 6 (see 7 ) via a duplication oriented model to future study. Our final results are on the fractional distributions of all Kn,n 's and all Kn,n-1 's (which are all 0.55-dense complete bipartite graphs) up to n = 5 in each of the PPI networks we considered. Bipartite Graph K2,3 K3,3 K3,4 K4,4 K4,5 K5,5 Ecoli 2685054 2188868 11103153 5155489 13561155 1125496+ Yeast 498844 376186 1677626 852301 2077675 659614 Duplication 337218 23311 21623 519 129 2 Geometric 153 0 0 0 0 0 In the above table, it can be seen that in the E.Coli and the Yeast PPI networks, complete bipartite graphlets are abundant. However, as can be deduced from theorem 3.1, the GRGM cannot generate Kn,m 's for n, m 3. Our experiments confirm this finding: there are no Kn,n 's and no Kn,n+1 's in the GRGM network fo r n 3 . References 1. N. Alon, P. Dao, I. Hajirasouliha, F. Hormozdiari and S.C. Sahinalp, Proceedings of the ISMB 2008, (2008). 2. U. Alon, An Introduction to Systems Biology: Design Principles of Biological Circuits (2006). 3. K. Baskerville and M. Paczuski, Physical Review E 74, 051903 (2006). 4. R. Colak, F. Moser, A. Rafiey and M. Ester, Tech Report, Simon Fraser University, (2008). 5. J. Grochow and M. Kellis, Proc. RECOMB 2008, 92, (2008). 6. D.J. Higham, M. Rasajski and N. Przulj, Bioinformatics 24, 1093 (2008). 7. F. Hormozdiari, P. Berenbrink, N. Przulj and S.C. Sahinalp, PLoS Computational Biology 3, e118 (2007). 8. R.M. Karp, in Complexity of Computer Computations, Plenum Press, 85, 1972. Theoretical Computer Science 369, 234 (2006). 9. M. Middendrof, E. Ziv, and C. Wiggins, PNAS 102, 9 (2005). 10. R. Milo, S. Shen-Orr, S. Itzkovski, N. Kashtan, D. Chklovskii and U. Alon, Science 298, 824 (2002). 11. ME. Newman, SH. Strogatz, DJ. Watts, Phys Rev E Stat Nonlin Soft Matter Phys. (2001). 12. N. Przulj, D.G Corneil and I. Jurisica, Bioinformatics 20, 3508 (2004). 13. L. Salwinski, C.S. Miller, A.J. Simith, F.K. Pettit, J.U. Bowie and D. Eisenberg, Nucleic Acid Research 32, Database issue:D449-52 (2004).