Pacific Symposium on Biocomputing 12:193-204(2007) METABOLIC FLUX PROFILING OF REACTION MODULES IN LIVER DRUG TRANSFORMATION JEONGAH YOON, KYONGBUM LEE Department of Chemical & Biological Engineering, Tufts University, 4 Colby Street, Medford, MA, 02155, USA With appropriate models, the metabolic profile of a biological system may be interrogated to obtain both significant discriminatory markers as well as mechanistic insight into the observed phenotype. One promising application is the analysis of drug toxicity, where a single chemical triggers multiple responses across cellular metabolism. Here, we describe a modeling framework whereby metabolite measurements are used to investigate the interactions between specialized cell functions through a metabolic reaction network. As a model system, we studied the hepatic transformation of troglitazone (TGZ), an antidiabetic drug withdrawn due to idiosyncratic hepatotoxicity. Results point to a welldefined TGZ transformation module that connects to other major pathways in the hepatocyte via amino acids and their derivatives. The quantitative significance of these connections depended on the nutritional state and the availability of the sulfur containing amino acids. 1. Introduction Metabolites are intermediates of essential biochemical pathways that convert nutrient fuel to energy, maintain cellular homeostasis, eliminate harmful chemicals, and provide building blocks for biosynthesis. Many metabolites are in free exchange with the extracellular medium, and may be used to obtain quantitative estimates of biochemical pathway activities in intact cells. In recent years, metabolite measurement arrays, or metabolic profiles, in conjunction with appropriate models, have been used for a variety of applications, e.g. comparisons of plant phenotypes [1], elucidation of new gene functions [2], and discovery of disease biomarkers [3]. Another promising application is the study of drug-mediated toxicity in specialized metabolic organs such as the liver. One approach to identifying drug toxicity markers has been to extract characteristic fingerprints by applying pattern recognition techniques to `metabonomic' data obtained through nuclear magnetic resonance (NMR) spectroscopy [4]. An alternative and complementary approach is to build structured network models applicable to metabolomic data. These models could be used, for example, to globally characterize the effects of drug chemicals across cell metabolism, and thereby identify potential metabolic burdens; to associate adverse events, such as the formation of a harmful derivative, with Pacific Symposium on Biocomputing 12:193-204(2007) specific marker metabolites; and to formulate hypotheses on the mechanisms of drug toxicity. Here, we describe a modeling framework for characterizing the modularity of specific reaction clusters, in this case xenobiotic transformation. At its core, this framework consists of an algorithm for top-down partitioning of directed graphs with non-uniform edge weight distributions. The core algorithm is further augmented with metabolic flux profiling and stoichiometric vector space analysis. Thus, our modeling framework is well-suited for leveraging advances in both analytical technologies as well as biological informatics, especially genome annotation and pathway database construction [5]. As a model system, we considered the metabolic network of the liver, which is the major site of xenobiotic transformation in the body. Representative metabolic profile data were obtained for cultured rat or human hepatocytes from prior work [6, 7]. The model xenobiotic was troglitazone (TGZ), an anti-diabetic drug that has recently been withdrawn due to idiosyncratic liver toxicity [8]. The exact mechanisms of toxicity remain unknown, but could involve the formation of harmful derivatives through metabolic activation, cellular energy depletion via mitochondrial membrane damage [9], or other metabolic burdens such as oxidative stress [10]. In this work, we utilize our modularity analysis model to characterize the connections between the reactions of known TGZ conjugates and the major pathways of liver cellular metabolism. This type of analysis should complement more detailed studies on the roles of specific conjugation enzymes by identifying their interdependence with other major components of the cellular metabolic network. In the case of TGZ transformation, our results indicate that key connectors are sulfur-containing amino acids and their derivatives. 2. Methods 2.1. Liver metabolic network Stoichiometric models of liver central carbon metabolism were constructed as follows. First, a list of enzyme-mediated reactions was collected from an annotated genome database [11]. Second, stoichiometric information was added for each of the collected enzymes by cross-referencing their common names and enzyme commission (EC) numbers using the KEGG database [12]. Third, biochemistry textbooks and the published literature [13] were consulted to build organ (liver) and nutritional state (fed or fasted) specific models. Net flux directions of reversible or reciprocally regulated pathways were set based on the nutritional state. These models were rendered into compound, directed graphs, visualized using the MATLAB (MathWorks, Natick, MA) Bioinformatics Pacific Symposium on Biocomputing 12:193-204(2007) toolbox, and corrected for missing steps and nonsensical dead ends. Reversible reactions flanked by irreversible reactions were assigned directionality so as to ensure unidirectional metabolic flux between the flanking reactions. The pathway memberships and other dimensional characteristics are summarized for each of the two models in Table 1 * . Table 1. Pathway memberships of the fed- and fasted-state liver models Pathway Alcohol metabolism Amino acid metabolism Bile acid synthesis Cholesterol synthesis Gluconeogenesis Glycogen synthesis Glycolysis Ketone body metabolism Lipogenesis Lipolysis, -oxidation Oxidative phosphorylation PPP TCA cycle TGZ metabolism Urea cycle Fed Fasted 2.2. TGZ metabolism The base models were augmented with TGZ conjugation reactions identified in the literature. Upon entry into the hepatocyte, TGZ is almost entirely transformed into one of its four main conjugate forms [14]: TGZ-sulfate (TGZS), TGZ-quinone (TGZ-Q), TGZ-glucuronide (TGZ-G), and TGZ-gluthathione (TGZ-GSH). Extension of the liver models with these derivatives added 10 new intermediates and 14 reactions. 2.3. Data sets Inputs to the flux calculations were external flux measurements (rates of metabolite uptake or output) taken from previously published work. These studies profiled the metabolism of cultured hepatocytes under medium conditions that set up either a fed or fasted state. All data sets included time series measurements on glucose, lactate, ketone bodies, ammonia, and the naturally occurring amino acids. The number of measured metabolites was 25. * Complete model details, including reaction stoichiometry, the identities of balanced metabolites, and thermodynamic reaction parameters are available upon request to the authors. Pacific Symposium on Biocomputing 12:193-204(2007) Summary descriptions of the experimental settings are shown in Table 2. A representative mean value for TGZ uptake rate was estimated based on a study involving primary hepatocytes obtained from human donors [15]. Table 2: Metabolite data sets used for flux estimation Model Nutritional state Medium Supplements Hormones Reference Cultured rat hepatocytes Fed DMEM w/ high (4.5 g/L) glucose Amino acids Insulin [6] Cultured HepG2 cells Fasted (spent medium) DMEM w/ low (1.0 g/L) glucose Dexamethasone [7] 2.4. Flux calculation 2.4.1 Metabolic Flux Analysis (MFA) Intracellular fluxes were calculated using an optimization based approach as described previously [16]. Briefly, a non-linear, constrained optimization problem was set up as follows: Minimize: Subject to: (v k k o - v k bs ) 2 , k {external fluxes} (1) (2) (3) S v = 0 G v 0 where the objective function minimizes the sum squared error between o experimentally observed ( vk bs ) and predicted external fluxes ( vk ). Eq. (1) expresses the balances around intracellular metabolites using an M×R stoichiometric matrix S and an R×1 flux vector v. The number of balanced metabolites (M) and reactions (R) were 37 and 64 for the fasted-state and 60 and 102 for the fed-state model. Inequality (3) expresses constraints derived from the Second Law. To account for biochemical coupling between energetically favorable and unfavorable reactions, the thermodynamic constraints were applied to pathways, as opposed to individual reactions. Stoichiometrically balanced pathways were enumerated using the elementary flux mode (EFM) algorithm [17]. The output of the EFM analysis was collected into a P×R pathway matrix E, where P was the number of pathways (190 and 237 for the fasted- and fed-state model, respectively). To formulate the pathway G (GPATH°) constraint matrix G, we first collected the reaction Gs into an R×1 vector g and then performed element-by-element multiplications with each of the P (R-dimensional) rows of E: Gij = Eij g j , i {1...P}, j {1... N } (4) Pacific Symposium on Biocomputing 12:193-204(2007) 2.4.2 Flux Balance Analysis (FBA) We also simulated flux distributions that maximized the formation of the key liver anti-oxidant glutathione (GSH), which in vitro studies had shown to play a critical role in the detoxification of TGZ and other drugs in the liver [10]. The simulations were performed using linear programming with maximization of the GSH synthesis step (vGSH) as the objective. The equality and inequality constraints were identical to the above MFA problem. The measured external fluxes were used as upper and lower bound constraints. To prevent overconstraining, we specified five of the 25 measured metabolites as major carbon and nitrogen sources/sinks. The final form of the FBA problem was: Maximize: Subject to: vGSH S v = 0 G v 0 0.5 v i ,meas v i 2 v i , meas (5) (2) (3) (6) where vi refers to the measured rates of uptake, accumulation, or output of glucose, triglyceride, glutamine, urea or TGZ. 2.5. Modularity analysis Analysis of reaction modules was performed using an algorithm for top-down decomposition of directed graphs. Details of the algorithm have been described elsewhere [18]. The algorithm consists of the following two steps, which are iteratively applied until all edges in the graph have been removed. 1. Shortest paths through the network are calculated using Dijkstra's algorithm. This calculation critically depends on the edge-weight matrix, which specifies the relative adjustments of reactant-product node pair distances based on the activity of the intervening reaction. Here, a node pair distance was inversely scaled with the connecting reaction activity as measured by its steady-state flux. The edge-betweenness centrality index is calculated for all edges. Edgebetweenness centrality refers to the frequency of an edge that lies on the shortest paths between all pairs of vertices. The edges with highest betweenness values are most likely to lie between sub-graphs, rather than inside a sub-graph [19]. Successive removal of edges with the highest edgebetweenness values will eventually isolate sub-graphs consisting of vertices that share connections only with other vertices in the same sub-graph. The edge-betweenness centrality values were calculated using a newly developed method [18] based on an algorithm for vertex betweenness centrality calculation of large, sparse networks [20]. 2. Pacific Symposium on Biocomputing 12:193-204(2007) 2.6. Projection and Match Scores The biological significance of a partition (iteration) was assessed by mapping the modules to stoichiometrically feasible pathways (defined by the EFMs). Projection score ­ Each sub-graph was transformed into a 1×R binary reaction composition vector, where R is the total number of reactions included in the network. An element was set to 1 if both the reactants and products of the corresponding reaction were present as nodes in the module; otherwise, an element was set to 0. The EFM vectors (rows of E) were also transformed into 1×R binary pathway inventory vectors (PIVs) by replacing all non-zero entries with one. A projection score was computed for every pair-wise combination of a binary module vector and each of the fed- or fasted-state model PIVs as follows: PS ik, j = ( RCVi k PIV j ) N ik , i = 1,2,K, Lk , j = 1,2,K, m (7) where PS ik, j , RCVi k , and Nik were, respectively, the projection score, reaction composition vector, and number of nodes in module i at iteration number k, Lk was the number of modules at k, PIV j was the jth PIV, and m was the total number of EFMs. The overall projection score of an iteration number k was calculated by averaging the `best match' projection scores of this iteration: PS k = max PS ik, j i =1 Lk ( ) Lk (8) Match score - Several cases were noted where the projection score identified more than one `best match' PIV for a given module. In these cases, a 1×R consensus pathway fragment (CPF) vector was formed for each module i of iteration number k as the smallest common reaction set of the best match PIVs. The similarity between a module and its CPF was assessed by a match score: MSik = R - Wi k R , i = 1,2,K, Lk MSik (9) where was the match score of module i at iteration number k, and Wi k was the number of mismatches between the module RCVi k and the corresponding CPF. A mismatch occurs if a reaction is found in the module, but not the fragment vector or if a reaction is found in the fragment, but not the module vector. The overall match score of an iteration was calculated as a simple average of the individual module match scores: MS k = MS ik i =1 Lk Lk (10) Pacific Symposium on Biocomputing 12:193-204(2007) 3. Results 3.1. Flux distribution The predicted distributions of fluxes through the reactions of TGZ metabolism (Table 3) were only in partial agreement with the experimentally determined proportions of the derivatives reported in the literature. Radioactive tracer studies in animal models have shown that the major derivative is TGZ-S, accounting for up to 70 % of all conjugated forms [21]. For the fed-state model, the calculated distribution was, in decreasing order, 39 % TGZ-Q, 33 % TGZ-G, 15 % TGZ-GSH, and 13 % TGZ-S. The distribution calculated for the fastedstate model was 37 % TGZ-Q, 37 % TGZ-G, and 26 % TGZ-S. Thus, both models predicted the sulfate conjugate to be a minor component, contrary to the animal studies. On the other hand, the GSH conjugate was correctly predicted to be a minor derivative [14]. Table 3. Reactions of TGZ metabolism Reaction Stoichiometry Cysteine + O2 + -Ketoglutarate Pyruvate + SO32- + Glutamate Cysteine Pyruvate + NH4+ + HSHS- + 2Gluthatione + 2O2 GSSG + HSO3- + H2O TGZ uptake Glutamate + Cysteine + Glycine Glutathione TGZ + Glutathione TGZ-GSH TGZ + SO32- TGZ-Sulfate TGZ + HSO3- TGZ-Sulfate TGZ TGZ-Quinone TGZ TGZ-Glucuronide TGZ-GSH secretion TGZ-Sulfate secretion TGZ-Glucuronide secretion TGZ-Quinone secretion GRXN° kcal/mol -125.6 -38 -648.5 0 132.4 -7.5 19 28.5 -31.1 -197.5 0 0 0 0 Flux, mol/106 cells/day Fed Fasted +TGZ Max +TGZ GSH 0.06 0.00 0.00 0.46 0.07 0.07 0.06 0.00 0.18 0.15 0.07 0.06 0.15 0.18 0.32 0.00 0.00 0.91 0.59 0.59 0.32 0.00 0.00 0.00 0.59 0.32 0.00 0.00 0.12 0.00 0.00 0.46 0.00 0.00 0.12 0.00 0.17 0.17 0.00 0.12 0.17 0.17 Max GSH 0.42 0.00 0.00 0.92 0.50 0.50 0.42 0.00 0.00 0.00 0.50 0.42 0.00 0.00 Measured inputs are shown in bold. +TGZ: flux distribution calculated by MFA with total drug uptake set to 0.46 mol/106 cells/day. Max GSH: flux distribution calculated by FBA with upper and lower bounds on glucose, TG, GLN, urea, and TGZ. Interestingly, the two models predicted qualitatively similar trends despite their significantly different compositions and measured inputs, suggesting that there were a limited number of actively engaged connections between TGZ transformation and the other metabolic pathways. The major quantitative difference involved the contribution of the GSH conjugate. Thus, we next Pacific Symposium on Biocomputing 12:193-204(2007) examined the effect of increasing the availability of this conjugation substrate by simulating flux distributions that maximized GSH synthesis under the same stoichiometric and thermodynamic constraints applied to the MFA problems. To obtain flux values numerically compatible with the MFA results, we also assigned upper and lower bounds to the major carbon and nitrogen sinks and sources based on their respective measured external flux values. As expected, the flux through the GSH synthesis step (vGSH) increased significantly for both the fed- and fasted-models (in mol/106 cells/day) from 0.07 to 0.59 and 0 to 0.50, respectively, when the maximization objective was paired with no direct constraints on the uptake or output of the amino acid reactants. The only indirect constraint on GLU was applied through the upper and lower bounds on GLN (0.75 and 3 mol/106 cells/day, respectively), which were not approached. However, the higher vGSH flux for the fed-state model suggests a positive correlation with GLN uptake, which was significantly higher for the fed-state model. The predicted distribution of conjugation reaction fluxes were 65 % TGZ-GSH and 35 % TGZ-S for the fed-state model and 54 % TGZGSH and 46 % TGZ-S for the fed-state model. Both models predicted zero fluxes for the formation of the glucuronide and quinone conjugates, suggesting that the distribution of the TGZ derivatives may be dramatically altered by the availability of GSH, which in turn is influenced by the medium supply of its constituent amino acids. The increase in TGZ-GSH was accompanied by an increase in TGZ-S formation, likely because the cysteine component of GSH also acts as a source of sulfate (HSO3- and SO32-), which drive the formation of TGZ-S. Cysteine as well as its sulfate derivatives mutually interacts with other intermediates of central carbon metabolism. These interactions have been further characterized through modularity analysis. 3.2. Reaction modules To characterize the interconnections between TGZ derivatives and other major liver metabolites, we applied a partition algorithm to directed graph representations of the various network models with and without edge-weights. The left-hand panels of Fig. 1 show the optimal partitions of the fed-state model without an edge-weight matrix (a), with an edge-weight matrix derived from MFA (c), and with an edge-weight matrix derived from FBA (e). Figs. 1b, 1d, and 1f show the corresponding partitions of the fasted-state model. Optimality was evaluated based on the projection and match scores (see Methods, Fig. 2). For both the fed- and fasted-state models, the inclusion of reaction flux, or connection activity, significantly influenced their modularity. When only connectivity was considered, the (unweighted) fed-state network was optimally partitioned at iteration number 34 (Fig. 1a). Three modules were generated. Pacific Symposium on Biocomputing 12:193-204(2007) a b c d e f Figure 1. Optimal partitions of the liver network models. Left- and right-hand column panels show fed-and fasted-state models, respectively. Partition without flux weights (a, b), with flux weights (c, d), and with flux weights maximizing GSH (e, f). Arrows indicated carbon flow between modules as determined from the partition of the previous iteration. Pacific Symposium on Biocomputing 12:193-204(2007) The smallest module consisted of two metabolites in lipid synthesis (palmitate, PAL, and triglyceride, TG). The largest module included all other metabolites with the exception of TGZ and its direct derivatives, which constituted the remaining third module. When an edge-weight matrix was applied with MFA derived fluxes, the optimal partition was reached at iteration 8 (Fig. 1c). Four modules (consisting of at least two connected nodes) were found. The smallest module consisted of metabolites in the urea cycle. A second module consisted of lipid synthesis and PPP metabolites. A third module consisted of the TCA cycle metabolites. The largest module included TGZ, its direct derivatives, and the intermediates of amino acid and pyruvate metabolism. When a different edgeweight matrix was used with a flux distribution corresponding to maximal GSH synthesis, the optimal partition (reached at iteration 8) consisted of three modules (Fig. 1e). The two smaller modules were identical to the two smallest modules of the partition in Fig. 1c. The third module essentially combined the larger two modules of Fig. 1c, with connections through the reactions in and around the urea and TCA cycles. Fig. 2. Mean projection and match score plots for the fed-state model partitions. Legends refer to flux distribution used to form the edge-weight matrix. For both series of partitions, the optimal iteration was set at 8, which corresponds to the first significant rise in the two scores. The modularity of the fasted-state was also significantly influenced by the connection diversity (flux) data. Without an edge-weight matrix, the net effect of the edge removals was to reduce the network graph size (Fig. 1b). Application of the MFA derived fluxes as edge-weights generated an optimal partition with two modules at iteration 15 (Fig. 1d). Similar to Fig. 1a, TGZ and its derivatives formed a separate module. However, this module lacked TGZGSH, presumably because the fasted-state model calculated zero flux for GSH synthesis. Unlike the fed-state partition (Fig. 1c), the TGZ module did not connect directly to the other metabolic pathways. Direct connections remained absent when the GSH maximizing flux distribution was used to form the edgeweight matrix (Fig. 1f). The major effects were to isolate a small module consisting of urea cycle metabolites from the largest reaction module. As expected from the results of Table 3, TGZ-G and TGZ­Q were eliminated from the TGZ module, and replaced with TGZ-GSH. Together, Figs. 1c-d suggest Pacific Symposium on Biocomputing 12:193-204(2007) that the nutritional state of the liver directly impacts the connections between reactions of TGZ transformation and the other major pathways of liver metabolism. Moreover, a comparison of the partitions in Fig. 1c and 1e indicated that conjugation substrate availability, in this case GSH, influences the extent of integration between these reaction modules. 4. Discussion In this paper, we examined the interactions between the specialized reactions of TGZ transformation and the network of major metabolic reactions in hepatocytes. Using prior data, flux distributions were simulated that were in partial agreement with experimental observations on the relative distributions of various TGZ conjugates. With only total TGZ clearance rate as input, TGZGSH was correctly predicted as a minor derivative, but the contribution of TGZS was significantly under-estimated, suggesting that additional measurements on the conjugation reactions are needed to improve the flux calculations. Nevertheless, we noted several useful outcomes. First, the thermodynamic constraints allowed convergent solutions to be found with relatively small numbers of measured inputs. Second, we avoided potential pitfalls of individual reaction-based inequality constraints. For example, flux calculations correctly predicted significant net production of TGZ-S in all cases, even though the individual reaction Gs of the final synthesis steps were positive (Table 3). These results directly reflect the energetic coupling between sequential reaction steps as specified by the EFM calculations. Third, the EFMs generated for the flux calculations provided an inventory of stoichiometrically and energetically feasible reaction routes of the model networks. A major obstacle to applying the EFM analysis to larger, e.g. genome scale, networks is its computational intractability. One way to address this issue is to solve for a partial set of EFMs by eliminating high-degree currency metabolites. Many currency metabolites cannot be accurately measured or balanced, and thus frequently not included in the stoichiometric constraints, but form metabolic cycles that significantly expand the EFM solution space. In this work, ATP, CO2 and O2 were not balanced, and the EFM calculations became NP-hard problems. The EFMs and the calculated flux distributions were ultimately used to examine the modularity of TGZ metabolism across different nutritional states and levels of conjugation substrate availability. While the connections between the immediate reactions of TGZ metabolism were well-conserved across these different conditions, connections to other major pathways varied. In the fastedstate, interactions between the main carbon network and the TGZ module were limited, regardless of the GSH level. In contrast, a number of active connections Pacific Symposium on Biocomputing 12:193-204(2007) were found for the fed state. These connections mainly involved the sulfur containing amino acid cysteine (CYS) and its immediate reaction partners. The liberation of the sulfide moiety from CYS requires complete degradation of the amino acid via transamination reactions, which involves other high-degree metabolites such as GLU and -ketoglutarate. Along with glycine, GLU and CYS make up GSH, which also interacts with the TGZ module as a conjugation substrate. Taken together, our findings suggest that the availability of common medium nutrients could significantly influence the formation of drug derivatives. Prospectively, metabolic profile-based studies on drug reaction modules could be used to analyze drug transformation under varying metabolic states, which in turn could facilitate the development of effective nutritional approaches for managing drug toxicity [10]. Acknowledgements We thank Dr. Anselm Blumer in the Department of Computer Science at Tufts University for his help in implementing the edge-betweenness centrality algorithm. This work was in part funded by NIH grant 1-R21DK67228 to KL. References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. O. Fiehn et al., Nat Biotechnol 18, 1157 (2000). R. N. Trethewey, Curr Opin Biotechnol 12, 135 (2001). J. L. Griffin et al., Anal Biochem 293, 16 (2001). J. K. Nicholson et al., Nat Rev Drug Discov 1, 153 (2002). M. Kanehisa et al., Nucleic Acids Res 34, D354 (2006). C. Chan et al., Metab Eng 5, 1 (2003). R. P. Nolan, M.S., Tufts University (2005). E. A. Gale, Lancet 357, 1870 (2001). Y. Masubuchi et al., Toxicology 222, 233 (2006). S. Tafazoli et al., Drug Metab Rev 37, 311 (2005). H. Ma, A. P. Zeng, Bioinformatics 19, 270 (2003). M. Kanehisa, S. Goto, Nucleic Acids Res 28, 27 (2000). I. M. Arias, J. L. Boyer, The liver: biology and pathobiology. 4th ed. (2001) M. T. Smith, Chem Res Toxicol 16, 679 (2003). N. J. Hewitt et al., Chem Biol Interact 142, 73 (2002). R. P. Nolan et al., Metab Eng 8, 30 (2006). S. Schuster et al., Nat Biotechnol 18, 326 (2000). J. Yoon et al., Bioinformatics (2006). M. E. Newman, M. Girvan, Phys Rev E Stat Nonlin Soft Matter Phys 69, 026113 (2004). 20. U. Brandes, J Math Soci 25, 163 (2001). 21. S. Prabhu et al., Chem Biol Interact 142, 83 (2002).