Computational Strategy for Discovering Druggable Gene Networks from Genome-Wide RNA Expression Profiles Seiya Imoto, Yoshinori Tamada, Hiromitsu Araki, Kaori Yasuda, Cristin G. Print, Stephen D. Charnock-Jones, Deborah Sanders, Christopher J. Savoie, Kousuke Tashiro, Satoru Kuhara, and Satoru Miyano Pacific Symposium on Biocomputing 11:559-571(2006) September 23, 2005 21:15 Proceedings Trim Size: 9in x 6in imoto COMPUTATIONAL STRATEGY FOR DISCOVERING DRUGGABLE GENE NETWORKS FROM GENOME-WIDE RNA EXPRESSION PROFILES SEIYA IMOTO1, , YOSHINORI TAMADA2, , HIROMITSU ARAKI3, , KAORI YASUDA3 , CRISTIN G. PRINT4, , STEPHEN D. CHARNOCK-JONES4 , DEBORAH SANDERS4 , CHRISTOPHER J. SAVOIE3 , KOUSUKE TASHIRO5 , SATORU KUHARA5 , SATORU MIYANO1 Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1, Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan 2 Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, 611-0011, Japan 3 Gene Networks International, 4-2-12, Toranomon, Minato-ku, Tokyo, 105-0001, Japan 4 Department of Pathology, Cambridge University, Tennis Court Road, Cambridge, CB2 1QP, United Kingdom 5 Graduate School of Genetic Resources Technology, Kyushu University, 6-10-1, Hakozaki, Higashi-ku, Fukuoka, 812-8581, Japan We propose a computational strategy for discovering gene networks affected by a chemical compound. Two kinds of DNA microarray data are assumed to be used: One dataset is short time-course data that measure responses of genes following an experimental treatment. The other dataset is obtained by several hundred single gene knock-downs. These two datasets provide three kinds of information; (i) A gene network is estimated from time-course data by the dynamic Bayesian network model, (ii) Relationships between the knocked-down genes and their regulatees are estimated directly from knock-down microarrays and (iii) A gene network can be estimated by gene knock-down data alone using the Bayesian network model. We propose a method that combines these three kinds of information to provide an accurate gene network that most strongly relates to the mode-of-action of the chemical compound in cells. This information plays an essential role in pharmacogenomics. We illustrate this method with an actual example where human endothelial cell gene networks were generated from a novel time course of gene expression following treatment with the drug fenofibrate, and from 270 novel gene knock-downs. Finally, we succeeded in inferring the gene network related to PPAR-, which is a known target of fenofibrate. 1 *These authors contributed equally to this work. Current affiliation: Department of Molecular Medicine & Pathology, School of Medical Sciences, University of Auckland, Private Bag 92019, Auckland, New Zealand September 23, 2005 21:15 Proceedings Trim Size: 9in x 6in imoto 1. Intro duction The microarray technology has produced a huge amount of gene expression data under various conditions such as gene knock-down, overexpression, experimental stressors, transformation, exposure to a chemical compound, and so on. Using a large volume of microarray gene expression data, a number of algorithms together with mathematical models1,5,7,9,12,23 for estimating gene networks has been proposed and successfully applied to the gene network estimation of S. cerevisiae, E. coli etc. As a real application of gene network estimation techniques, computational drug target discovery19 enhanced with gene network inference6,14,20,22 has made tremendous impacts on pharmacogenomics. In this paper, we propose a computational strategy for discovering the druggable gene networks, which are most strongly affected by a chemical compound. For this purpose, we use two types of microarray data: One is gene expression data obtained by measuring transcript abundance responses over time following treatment with the chemical compound. The other is gene knock-down expression data, where one gene is knocked-down for each microarray. Figure 1 is the conceptual view of our strategy. First, we estimate dynamic relationships denoted by GT between genes based on time-course data by using dynamic Bayesian networks.17 Second, in gene knock-down expression data, since we know the information of knockeddown genes, possible regulatory relationships between knocked-down gene and its regulatees can be obtained. We denote this information by R. Finally, the gene network GK is estimated by gene knock-down data denoted by X K together with GT and R by using Bayesian networks based on multi-source biological information.13 The key idea for estimating a gene network based on multi-source biological information is to use GT and R as the Bayesian prior probability of GK . The prior probability of the graph proposed by Imoto et al.13 only uses binary prior information, i.e. known or unknown for each gene-gene relation. In this paper, we extend the prior probability of graph13 in order to use prior information represented as continuous values. After estimating a gene network, for extracting biologically plausible information from the estimated gene network, we have also developed a gene network analysis tool called iNET that is an extended version of G.NET.14 The iNet tool provides a computational environment for various path searches among genes with annotated gene network visualization. As for related works, Basso et al.2 estimated a gene network of human B cells as an undirected graph by their proposed algorithm. Our aim is to September 23, 2005 21:15 Proceedings Trim Size: 9in x 6in imoto time Estimation: Time-course expression data dynamic Bayesian network time t Estimation: time t+1 Possible dynamic relations measure mRNAs shock Bayesian network based on multi-source biological information with bootstrapping Analysis: iNet } knock-down knock-down knock-down ( Gene knock-down library estimate druggable gene networks as directed graphs, that are sub-networks of the tissue-specific network. In this method, the edge direction is very important information and selection of compound-related genes is necessary. Therefore, the aim of this paper is clearly different from theirs. Di Bernardo et al.6 proposed an interesting method for identifying mode-ofaction of a chemical compound based on microarray gene expression data. Di Bernardo et al.6 used statistical inference of a linear regression-based network model to find affected genes by a chemical compound. On the other hand, our interest is not only in the identification of affected genes, but also in the elucidation of their dependency as the network. In addition, since di Bernardo et al.6 used examples of S. cerevisiae genes, more discussions might be needed in order to apply their method to human genes. To demonstrate the whole process of the proposed method, we analyze expression data from human endothelial cells. We generate new timecourse data that reveal the responses of human endothelial cell transcripts to treatment with the anti-hyperlipidaemia drug fenofibrate. We also generate new data from 270 gene knock-down experiments in human endothelial cells. The fenofibrate-related gene network is estimated based on fenofibrate time-course data and 270 gene knock-down expression data by the proposed method. The estimated gene network reveals gene regulatory relationships related to PPAR-, which is known to be activated by fenofibrate. Our ( Treatment shocks chemical compound ... KD Information: Which genes are knocked-down Possible regulatory relations measure mRNAs Gene knock-down expression data Figure 1. Conceptual view of the proposed method. September 23, 2005 21:15 Proceedings Trim Size: 9in x 6in imoto computational analysis suggests that this computational strategy based on gene knock-down and drug-dosed time-course microarrays will give a new way to druggable gene discovery. 2. Metho ds for Reverse-Engineering Gene Networks In the proposed method, we use Bayesian networks and dynamic Bayesian networks for estimating gene networks from gene knock-down and timecourse microarray data, respectively. In this section, we briefly describe these two network models and then elucidate how we combine multi-source biological information to estimate more accurate gene networks. 2.1. Preliminary Suppose that we have the observational data X of the set of p random variables X = {X1 , ..., Xp } and that the dependency among p random variables, shown as a directed graph G, is unknown and we want to estimate it from X . In gene network estimation based on microarray data, a gene is regarded as a random variable representing the abundance of a specific RNA species, and X is the microarray data. From a Bayes approach, the optimal graph is selected by maximizing the posterior probability of the graph conditional on the observed data. By the Bayes' theorem, the posterior probability of the graph can be represented as p(G|X ) = p(G)p(X |G) p(G)p(X |G), p(X ) where p(G) is the prior probability of the graph, p(X |G) is the likelihood of the data X conditional on G and p(X ) is the normalizing constant and does not depend on the selection of G. Therefore, we need to set p(G) and compute p(X |G) for the graph selection based on p(G|X ). The prior probability of the graph p(G) enables us to use biological data other than microarray data to estimate gene networks and the likelihood p(X |G) can be computed by Bayesian networks and dynamic Bayesian networks from gene knock-down and time-course microarray data, respectively. We elucidate how we construct p(G|X ) in the following sections. 2.2. Bayesian Networks Bayesian networks are a graphical model that represents the causal relationship in random variables. In the Bayesian networks, we use a directed September 23, 2005 21:15 Proceedings Trim Size: 9in x 6in imoto acyclic graph encoding Markov relationship between connected nodes. Suppose that we have a set of random variables X = {X1 , ..., Xp } and that there is a causal relationship in X by representing a directed acyclic graph GK . Bayesian networks then enable us to compute the joint probability by the product of conditional probabilities Pr(X ) = p j =1 Pr(Xj |P aj ), (1) where P aj is the set of random variables corresponding to the direct parents of Xj in GK . In gene network estimation, we regard a gene as a random variable representing the abundance of a specific RNA species, shown as a node in a graph, and the interaction between genes is represented by the direct edge between nodes. Let X K be an N × p gene knock-down data matrix whose (i, j )-th element xj |Di corresponds to the expression data of j -th gene when Di th gene is knocked down, where j = 1, ..., p and i = 1, ..., N . Here we assume that i-th knock-down microarray is measured by knocking-down Di -th gene. Since microarray data take continuous variables, we represent the decomposition (1) by using densities fBN (X K | , GK ) = Np i=1 j =1 fj (xj |Di |paj |Di , j ), where = ( 1 , ..., p ) is a parameter vector, paj |Di is the expression value vector of P aj measured by i-th knock-down microarray. Hence, the construction of the graph GK is equivalent to model the conditional probabilities fj (j = 1, ..., p), that is essentially the same as the regression problem. For constructing fj (xj |Di |paj |Di , j ), we assume the nonparametric regression model with B -splines of the form |P aj | xj | D i = (k) paj |Di mj k (paj |Di ) + j |Di , (k) k=1 where is the k -th element of paj |Di , j |Di i.i.d.N (0, 2 ) for i = 1, ..., N , and mj k (k = 1, ..., |P aj |) are smooth functions constructed by Mjk (j k) (j k) (j k) (j k) B -splines as mj k (x) = m=1 m bm (x). Here m and bm (x) (m = 1, ..., Mj k ) are parameters and B -splines, respectively. The likelihood p(X K |GK ) is then obtained by p(X K |GK ) = fBN (X K | , GK )p( |, GK )d , (2) September 23, 2005 21:15 Proceedings Trim Size: 9in x 6in imoto where p( |, GK ) is the prior distribution on the parameter specified by the hyperparameter . The high-dimensional integral can be asymptotically approximated with an analytical form by the Laplace approximation and Imoto et al.12 defined a graph selection criterion, named BNRC, of the form BNRC(GK ) = -2 log{p(GK )} - r log(2 /N ) ^ ^ + log |J ( |X K )| - 2N l ( |X K ), where 1 {log fBN (X K | , GK ) + log p( |, GK )} , N 2 J ( |X K ) = - ( |X K ), l l ( |X K ) = ^ r is the dimension of , and is the mode of l ( |X K ). The network structure is learned so that BNRC(GK ) decreases by the greedy hillclimbing algorithm.12 We should note that the solution obtained by the greedy hill-climbing algorithm cannot be guaranteed as the optimal. To find better solution, we repeat the greedy algorithm and choose the best ^ one as GK . It happens quite often that the likelihood p(X K |GK ) gives almost the same values for several network structures, construction an effective p(GK ) based on various kinds of biological information is a key technique. We elucidate how we construct p(GK ) in Section 2.4. 2.3. Dynamic Bayesian Networks Dynamic Bayesian networks represent the dependency in random variables based on time-course data. Let X (t) = {X1 (t), ..., Xp (t)} be the set of p random variables at time t (t = 1, ..., T ). In the dynamic Bayesian networks, a directed graph that contains p nodes is rewritten as a complete bipartite graph that allows direct edges from X (t) to X (t + 1), where t = 1, ..., T - 1. The directed graph GT of the causal relationship among p random variables is then constructed by estimating the bipartite graph defined above. Under GT structure, we then have the decomposition Pr(X (1), ..., X (T )) = p T t=1 j =1 Pr(Xj (t)|P aj (t - 1)), (3) where P aj (t) is the set of random variables at time t corresponding to the direct parents of Xj in GT . September 23, 2005 21:15 Proceedings Trim Size: 9in x 6in imoto Let X T be a T × p time-course data matrix whose (t, j )-th element xj (t) corresponds to the expression data of j -th gene at time t, where j = 1, ..., p and t = 1, ..., T . As we described in the Bayesian networks, the decomposition in (3) holds by using densities fDBN (X T | , GT ) = p T t=1 j =1 fj (xj (t)|paj (t - 1), j , GT ), where = (1 , ..., p ) is a parameter vector, paj (t) is the expression value vector of direct parents of Xj measured at time t. Here we set paj (0) = . We can construct fDBN by using nonparametric regression with B -splines in the same way of the Bayesian networks. Therefore, by replacing fBN by fDBN in (2), Kim et al.17 proposed a graph selection criterion for dynamic Bayesian networks, named BNRCdynamic , with successful applications. 2.4. Combining Multi-Source Biological Information for Gene Network Estimation Imoto et al.13 proposed a general framework for combining biological knowledge with expression data aimed at estimating more accurate gene networks. In Imoto et al.13 , the biological knowledge is represented as the binary values, e.g. known or unknown, and is used for constructing p(G). In reality, there are, however, various confidence in biological knowledge in practice. Bernard and Hartemink3 constructed p(G) using the binding location data18 that is a collection of p-values (continuous information). In this paper, we construct p(G) by using multi-source information including continuous and discrete prior information. Let Z k is the matrix representation of k -th prior information, where (k ) (i, j )-th element zij represents the information of "gene i gene j ". For example, (1) If we use a prior network Gprior for Z k , zij takes 1 if e(i, j ) Gprior or 0 if e(i, j ) Gprior . Here e(i, j ) denotes the direct edge from / (k) gene i to gene j . (2) By using the gene knock-down data for Z k , zij represents the value that indicates how gene j changes by knocking down gene i. We can use the absolute value of the log-ratio of gene j for gene i (k) knock-down data as zij . Using the adjacent matrix E = (eij )1i,j p of G, where eij = 1 for e(i, j ) G or 0 for otherwise, we assume the Bernoulli distribution on eij having probabilistic function p(eij ) = ijij (1 - ij )1-eij , e (k) September 23, 2005 21:15 Proceedings Trim Size: 9in x 6in imoto where ij = Pr(eij = 1). For constructing ij , we use the logistic model K (k) with linear predictor ij = k=1 wk (zij - ck ) as ij = {1 + exp(-ij )}-1 , where wk and ck (k = 1, ..., K ) are weight and baseline parameters, respectively. We then define a prior probability of the graph based on prior information Z k (k = 1, ..., K ) by p(G) = p(eij ). i j This prior probability of the graph assumes that edges e(i, j ) (i, j = 1, ..., p) are independent of each other. In reality, there are several dependencies among eij 's such as p(eij = 1) < p(eij = 1|eki = 1), and so on, we consider adding such information into p(G) is premature by the quality of such information. 3. Application to Human Endothelial Cells' Gene Network 3.1. Fenofibrate Time-Course Data We measure the time-responses of human endothelial cell genes to 25µM fenofibrate. The expression levels of 20,469 probes are measured by CodeLinkTM Human Uniset I 20K at six time-points (0, 2, 4, 6, 8 and 18 hours). Here time 0 means the start point of this observation and just before exposure to the fenofibrate. In addition, we measure this time-course data as the duplicated data in order to confirm the quality of experiments. Since our fenofibrate time-course data are duplicated data and contain six time-points, there are 26 = 64 possible combinations to create a timecourse dataset. We should fit the same regression function to a parent-child relationship in the 64 datasets. Under this constrain, we consider fitting nonparametric regression model to the connected data of 64 datasets. That (c) (c) is, if we consider gene i gene j , we will fit the model xj (t) = mj (xi (t - 1)) + j (t), where xj (t) is the expression data of gene j at time t in the c-th dataset for c = 1, ..., 64. In the Bayesian networks, the reliability of estimated edges can be measured by using the bootstrap method. For timecourse data, several modifications of the bootstrap method are proposed such as block resampling, but it is difficult to apply these methods to the small number of data points generated by short time-courses. However, by using above time-course modeling, we can define a method based on the bootstrap as follows: Let D = {D(1), ..., D(64)} be the combinatorial timecourse data of all genes. We randomly resample D(c) with replacement and define a bootstrap sample D = {D (1), ..., D (64)}. We then re(c) September 23, 2005 21:15 Proceedings Trim Size: 9in x 6in imoto estimate a gene network based on D . We repeat 1000 times bootstrap ^ ^ ^ replications and obtain G1 , ..., G1000 , where GB is the estimated graph T T T based on the B -th bootstrap sample. The estimated reliability of edge can be used as the matrix representation of the first prior information Z 1 as (1) ^ zij = #{B |e(i, j ) GB , B = 1, ..., 1000}/1000. T 3.2. Gene Knock-Down Data by siRNA For estimating gene networks, we newly created 270 gene knock-down data by using siRNA. We measure 20,469 probes by CodeLinkTM Human Uniset I 20K for each knock-down microarray after 24 hours of siRNA transfection. The knock-down genes are mainly transcription factors and signaling ~ molecules. Let xDi = (x1|Di , ..., xp|Di ) be the raw intensity vector of i-th ~ ~ knock-down microarray. For normalizing expression values of each microarray, we compute the median expression value vector v = (v1 , ..., vp ) as the control data, where vj = mediani (xj |Di ). We apply the loess normaliza~ tion method to the MA transformed data and the normalized intensity xj |Di is obtained by applying the inverse transformation to the normalized log(xj |Di /vj ). We refer to the normalized log(xj |Di /vj ) as the log-ratio. ~ ~ In 270 gene knock-down microarray data, we know which gene is knocked-down for each microarray. Thus, when we knock-down gene Di , genes that significantly change their expression levels can be considered as the direct regulatees of gene Di . We measure this information by computing corrected log-ratio as follows: The fluctuations of the log-ratios depend on their sum of sample's and control's intensities. From the normalized MA transformed data, we can obtain the conditional variance sj = Var[log(xj |Di /vj )| log(xj |Di · vj )] and the log-ratios can be corrected (2) (2) zij = log(xj |Di /vj )/sj satisfying Var(zij ) = 1. 3.3. Results For estimating fenofibrate-related gene networks from fenofibrate timecourse data and 270 gene knock-down data, we first define the set of genes that are possibly related to fenofibrate as follows: First, we extract the set of genes whose variance-corrected log-ratios, | log(xj |Di /vj )/sj |, are greater than 1.5 from each time point. We then find significant clusters of selected genes using GO Term Finder. Table 1 shows the significant clusters of genes at 18 hours. The first column indicates how expression values are changed, i.e. " " and " " mean "overexpressed" and "suppressed", respectively. The GO annotations of clusters with " " are mainly related to cell cycle, September 23, 2005 21:15 Proceedings Trim Size: 9in x 6in imoto Table 1. Significant GO annotations of selected fenofibrate-related genes from 18 hours microarray. GO:0007049 GO:0000278 GO:0000279 GO:0006629 GO:0007067 GO:0000087 GO:0000074 GO:0044255 GO:0016126 GO:0016125 GO:0008203 GO:0006695 GO:0008202 GO:0000375 GO:0000377 GO:0000398 GO:0006694 GO:0016071 GO Function cell cycle mitotic cell cycle M phase lipid metabolism mitosis M phase of mitotic cell cycle regulation of cell cycle cellular lipid metabolism sterol biosynthesis sterol metabolism cholesterol metabolism cholesterol biosynthesis steroid metabolism RNA splicing, via transesterification reactions RNA splicing, via transesterification reactions with bulged adenosine as nucleophile nuclear mRNA splicing, via spliceosome steroid biosynthesis mRNA metabolism p-value 1.0E-08 3.7E-07 5.0E-06 1.3E-05 1.3E-05 1.6E-05 2.7E-05 4.4E-05 4.3E-04 4.5E-04 1.5E-03 2.4E-03 3.6E-03 4.1E-03 4.1E-03 4.1E-03 6.0E-03 6.3E-03 #genes 35 19 17 25 15 15 22 21 6 8 7 5 10 9 9 9 7 13 the genes in these clusters are expressed ubiquitously and this is a common biological function. On the other hand, the GO annotations of clusters with " " are mainly related to lipid metabolism. In biology, it is reported that the fenofibrate acts around 12 hours after exposure.8,10 Our first analysis for gene selection suggests that fenofibrate affects genes related to lipid metabolism and this is consistent with biological facts. We also focus on the genes from the 8 hour time-point microarray. Unfortunately, no cluster with specific function could be found in the selected genes from the 8 hour time-point microarray However, there also exist some genes related to lipid metabolism. Therefore we use the genes from the 8 and 18 hour time-point microarrays. Finally we add the 267 knock-down genes (three genes are not spotted on our chips) to the selected genes above, total 1192 genes are defined as possible fenofibrate-related genes and used for the next network analysis. By converting the estimated dynamic network and knock-down gene information into the matrix representations of the first and second prior ^ information Z 1 and Z 2 , respectively, we estimate the gene network GK based on Z 1 , Z 2 and the knock-down data matrix X K . For extracting biological information from the estimated gene network, we first focus on lipid metabolism-related genes, because the clusters related this func- September 23, 2005 21:15 Proceedings Trim Size: 9in x 6in imoto Figure 2. Down-stream of PPAR-. tion are significantly changed at 18 hours microarray. In the estimated gene network, there are 42 lipid metabolism-related genes and PPAR- (Homo sapiens peroxisome proliferative PPAR- activated receptor, alpha) is the only transcription factor among them. AcPLA2R1 ZNF198 tually, PPAR- is a known target of PTPRB fenofibrate. Therefore, we next focus ENC1 ZNF251 on the node down-stream of PPAR-. ATR In Figure 2, the node down-stream of PCDHA PPAR- (491 genes). Here we consider STAT5B that genes in the four steps down-stream GLS VLDLR of PPAR- are candidate regulatees of LDLR PPAR-. Among the candidate regFigure 3. A sub-network related to ulatees of PPAR-, there are 21 lipid PPAR-. metabolism-related genes and 11 mole- September 23, 2005 21:15 Proceedings Trim Size: 9in x 6in imoto cules previously identified experimentally to be related to PPAR-. Actually, PPAR- is known to be activated by fenofibrate. We show one sub-network having PPAR- as a root node in Figure 3. One of the drug efficacies of fenofibrate whose target is PPAR- is to reduce LDL cholesterol. LDLR and VLDLR mainly contribute the transporting of cholesterol and they are children of PPAR-, namely candidate regulatees of PPAR-, in our estimated network. As for LDLR, it has been reported the relationship with PPAR-.15 Moreover, several genes related to cholesterol metabolism are children of PPAR- in our network. We also could extract STAT5B and GLS that are children of PPAR- and have been reported their regulationrelationships with PPAR-.16,21 Therefore, it is not surprising that our network shows that many direct and indirect relationships involving known PPAR- regulatees are triggered in endothelial cells by fenofibrate treatment. In the node up-stream of PPAR-, PPAR- and RXR-, which form a heterodimer, share a parent. We could extract fenofibrate-related gene network and estimate that PPAR- is the one of the key molecules of fenofibrate regulations without previous biological knowledge. 4. Discussion From the point of view of pharmacogenomics, it is very important to know druggable gene networks. Our gene networks have the potential to predict the mode-of-action of a chemical compound, discover more effective drug target and predict side-effects. In this paper, we proposed a computational method to discover gene networks relating to a chemical compound. We use gene knock-down microarray data and time-course response microarray data for this purpose and combine multiple information obtained from observational data in order to estimate accurate gene networks under a Bayesian statistics framework. We illustrated the entire process of the proposed method using an actual example of gene network inference in human endothelial cells. Using fenofibrate time-course data and data from gene knock-downs in human endothelial cells, we successfully estimated a gene network related to the drug fenofibrate, which is a known agonist of PPAR-. In the estimated gene network, PPAR- has many direct and indirect regulatees including lipid metabolism related genes and this result indicates PPAR- works as a trigger of the estimated fenofibrate-related network. There are many known relationships in the candidate regulatees of PPAR- and we could find the relationship between PPAR- and RXR in the estimated network. Peroxisome proliferator-activated receptors September 23, 2005 21:15 Proceedings Trim Size: 9in x 6in imoto (PPARs) are ligand-activated transcription factors expressed by endothelial cells and several other cell types. They are activated by ligands such as naturally occurring fatty acids and synthetic fibrates. Once activated, they heterodimerize with the retinoid-X-receptor (RXR) to activate the transcription of target genes. Many of these genes encode proteins that control carbohydrate and glucose metabolism and down-regulate inflammatory responses.4 The further details on the relation between PPAR- and RXR- and their common parent will be discussed in another paper with biological evidences. Acknowledgements We wish to acknowledge Ben Dunmore, Sally Humphries, Muna Affara and Yuki Tomiyasu for assistance with endothelial cell culture and gene array analysis. Computation time was provided by the Super Computer System, Human Genome Center, Institute of Medical Science, University of Tokyo. References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. T. Akutsu et al., Pac. Symp. Biocomput., 4:17­28, 1999. K. Basso et al., Nat. Genet., 37:382­390, 2005. A. Bernard and A.J. Hartemink, Pac. Symp. Biocomput., 10:459­470, 2005 A. Cabrero et al., Curr. Drug Targets Inflamm. Al lergy, 1:243­248, 2002. T. Chen et al., Pac. Symp. Biocomput., 4:29­40, 1999. D. di Bernardo et al., Nat. Genet., 37:382­390, 2005. N. Friedman et al., J. Comp. Biol., 7:601­620, 2000. K. Goya et al., Arterioscler. Thromb. Vasc. Biol., 24:658­663, 2004. A.J. Hartemink et al., Pac. Symp. Biocomput., 7:437­449, 2002. K. Hayashida et al., Biochem. Biophys. Res. Commun., 323:1116­1123, 2004. D. Heckerman et al., Machine Learning, 20:197­243, 1995. S. Imoto et al., Pac. Symp. Biocomput., 7:175­186, 2002. S. Imoto et al., J. Bioinform. Comp. Biol., 2:77­98, 2004. S. Imoto et al., J. Bioinform. Comp. Biol., 1:459­474, 2003. K.K. Islam et al., Biochim. Biophys. Acta., 1734:259­268, 2005. S. Kersten et al., FASEB J., 15:1971­1978, 2001. S. Kim et al., Biosystems, 75:57­65, 2004. T.I. Lee et al., Science, 298:799­804, 2002. M.J. Marton et al., Nat. Med., 4:1293­1301, 1998. C.J. Savoie et al., DNA Res., 10:19­25, 2003. J.M. Shipley and D.J. Waxman, Mol. Pharmacol., 64:355­364, 2003. Y. Tamada et al., Genome Informatics, 16:182­191, 2005. E.P. van Someren et al., Pharmacogenomics, 3:507­525, 2002.