Biological Network Analysis: Human Metabolic Network
Tomer Shlomi
Winter 2008

Lecture Outline
1. Human metabolic network reconstruction
2. Predicting tissue-specific metabolism
3. Predicting biomarkers for disease diagnosis
4. Implications of network topology for disease co-morbidity
5. Building models of tissue metabolism

1. Human metabolic network reconstruction

Why Study Human Metabolism?
• Inborn errors of metabolism cause acute symptoms and can even lead to death at an early age
• Metabolic diseases (obesity, diabetes) are major sources of morbidity and mortality
• Metabolic enzymes and their regulators are gradually becoming viable drug targets
• In vivo studies of tissue-specific metabolic functions are limited in scope

[Figure: network statistics]

Human metabolic knowledge landscape
• Confidence scores:
  – 3 – biochemical or genetic evidence
  – 2 – physiological data or evidence from other mammalian cells
  – 1 – modeling evidence
  – 0 – unevaluated

Correlated reaction sets
• Deficiencies in enzymes belonging to the same functionally coupled reaction set may have similar phenotypes
• Example: two glutathione (antioxidant) related genes, one involved in its production and one in its transport, are both implicated in hemolytic anemia (OMIM database)

Drug targets
• 3-Hydroxy-3-methylglutaryl-CoA reductase (Entrez Gene ID 3156), the primary target of the statin class of antilipidemic drugs, belongs to a coupled reaction set in cholesterol biosynthesis
• Other members of the set are potential candidates for treating hyperlipidemia
[Figure: cholesterol biosynthesis]

2. Predicting tissue-specific metabolism

Human metabolism
• The first large-scale model of human metabolism (~2000 genes) was published last year (Duarte et al., PNAS '07)
• Different cell types and tissues activate different pathways
• Tissue-specific metabolic objective functions are unknown
• Tissue-specific metabolite uptake rates are unknown
• How can we predict feasible metabolic states under various conditions?
[Figure: toy network converting growth-medium nutrients into biomass and byproducts, using cofactors; which route is active is unknown]
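The slides do not say how the correlated reaction sets of part 1 are computed. A minimal sketch, assuming a null-space criterion that ignores flux bounds and irreversibility: every steady-state flux vector satisfies S·v = 0, so if two reactions have proportional rows in a basis of the null space of S, their fluxes keep a fixed ratio in every steady state and they fall into the same set. The toy matrix S and the function names are hypothetical.

```python
import numpy as np

# Hypothetical toy network: rows = metabolites M1..M3, columns = reactions R0..R4
# R0: -> M1, R1: M1 -> M2, R2: M1 -> M3, R3: M2 ->, R4: M3 ->
S = np.array([
    [1, -1, -1, 0, 0],
    [0, 1, 0, -1, 0],
    [0, 0, 1, 0, -1],
], dtype=float)


def null_space(S, tol=1e-10):
    """Orthonormal basis (as columns) of {v : S v = 0}, computed via SVD."""
    _, s, vt = np.linalg.svd(S)
    rank = int(np.sum(s > tol))
    return vt[rank:].T


def correlated_sets(S, tol=1e-9):
    """Group reactions whose null-space rows are proportional.

    Flux bounds and irreversibility are ignored, so this is only a sketch."""
    N = null_space(S)
    sets = []
    for i, row in enumerate(N):
        if np.linalg.norm(row) < tol:
            continue  # blocked reaction: zero flux in every steady state
        for group in sets:
            # Two rows are proportional iff stacking them gives rank 1.
            if np.linalg.matrix_rank(np.vstack([row, N[group[0]]]), tol=tol) == 1:
                group.append(i)
                break
        else:
            sets.append([i])
    return sets


print(correlated_sets(S))  # [[0], [1, 3], [2, 4]]
```

In this toy, each linear branch (R1 with R3, R2 with R4) forms one set, while the uptake R0 is shared by both branches and stays on its own; the choice of basis returned by the SVD does not affect the grouping, since proportionality of rows is basis-independent.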
Our objective
• Develop a general approach for predicting tissue-specific metabolic states
• Provide the first large-scale description of the metabolism of various tissues

Our method
• Integrate the model with tissue-specific gene and protein expression data
• Motivated by the assertion that highly expressed genes are expected to carry metabolic flux

Enzyme expression level vs. metabolic flux level
• Changes in gene expression levels between conditions correlate significantly with changes in fluxes predicted via FBA:
  – Schuster et al., 2002
  – Famili et al., 2003
  – Bilu et al., 2006
• Changes in gene expression levels show high qualitative correspondence with changes in measured fluxes:
  – Daran et al., 2004
  – Fong et al., 2004

Model integration with tissue-specific expression data
• Use the expression level only as a clue for the existence of metabolic flux
• Network integration then accumulates these clues into a global, consistent metabolic state
[Figure: toy network of metabolites M1–M9 and enzymes E1–E7 from an input to two outputs, with enzymes marked as highly or lowly expressed]

Inconsistencies with expression data: putative post-transcriptional regulation
• The flux activity state of a gene is defined by the predicted flux through its reactions
• Flux can be regulated in two ways:
  – Metabolic regulation: flux regulation via mass-action effects
  – Hierarchical regulation: flux regulation via changes in enzyme concentrations
[Figure: the same toy network, with enzymes additionally marked as post-transcriptionally up- or down-regulated]

The computational method
• Relies on Mixed-Integer Linear Programming (MILP):
  – v: the steady-state flux vector
  – For a highly expressed reaction i, the binary variable a_i indicates whether it is metabolically active
  – For a lowly expressed reaction i, the binary variable n_i indicates whether it is metabolically inactive
• Maximize the correlation with the expression data over all feasible flux distributions:
$$
\begin{aligned}
\max_{v,\,a,\,n}\quad & \sum_i a_i + \sum_i n_i \\
\text{s.t.}\quad & S \cdot v = 0,\quad v_{\min} \le v \le v_{\max} && \text{(feasible flux distribution)} \\
& a_i = 1 \Rightarrow v_i > 0,\qquad n_i = 1 \Rightarrow v_i = 0 && \text{(activity/inactivity constraints)} \\
& a_i,\, n_i \in \{0, 1\}
\end{aligned}
$$
[Figure: toy network with variables a1–a3 on highly expressed enzymes and n1–n2 on lowly expressed enzymes]

The computational method: considering alternative solutions
• In this example, the lower pathway may be either activated or inactivated in an optimal solution, achieving the same maximal correlation with the expression data
• Predict gene activity states by considering all optimal flux distributions:
  – A gene is predicted to be active if it cannot be inactivated in any optimal solution
  – A gene is predicted to be inactive if it cannot be activated in any optimal solution
• Genes may therefore be predicted to have an undetermined activity state
[Figure: the toy network, with the lower pathway's activity state marked as undetermined (?)]

Validating the method in predicting yeast metabolism
• Expression data under various media are used as input
• Predictions are compared with measured fluxes (Daran et al. '04)
• Predictions are also compared with Flux Balance Analysis (FBA) growth maximization, which instead requires a biomass objective and uptake rates as input
[Figure: the expression-based method and FBA applied to the same toy network]

Applying the method to human
• Employing the model of Duarte et al.
• Gene and protein expression data from GeneNote and HPRD
• 10 tissues: brain, heart, kidney, liver, lung, pancreas, prostate, spleen, skeletal muscle and thymus
• The activity state of 644 genes was uniquely determined in at least one tissue, with an average of 408 genes per tissue

Cross validation
• The expression states of 80% of the genes are used as input
• Gene activity states for the held-out 20% are predicted
• The overlap between the predicted activity states and the expression data of the held-out set is highly significant for all tissues
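The MILP and the alternative-solution analysis above can be illustrated on a tiny network. A minimal sketch, assuming SciPy's `linprog` (HiGHS) for the steady-state feasibility checks: instead of calling a MILP solver, it enumerates every binary assignment of the a_i and n_i (only viable for a handful of expressed reactions) and approximates the strict constraint v_i > 0 by v_i ≥ EPS. A reaction is called active if forcing it off lowers the best achievable score, inactive if forcing it on does, and undetermined otherwise. The network, the expression calls, EPS and all function names are hypothetical.

```python
import itertools

import numpy as np
from scipy.optimize import linprog

EPS = 1.0  # hypothetical minimal flux standing in for the strict v_i > 0

# Hypothetical toy network: rows = metabolites M1..M3, columns = reactions R0..R4
# R0: -> M1 (uptake), R1: M1 -> M2, R2: M1 -> M3, R3: M2 ->, R4: M3 ->
S = np.array([
    [1, -1, -1, 0, 0],
    [0, 1, 0, -1, 0],
    [0, 0, 1, 0, -1],
], dtype=float)
V_MIN, V_MAX = 0.0, 10.0  # all reactions irreversible in this toy
HIGH = [1, 3, 4]  # highly expressed: a_i = 1 requires v_i >= EPS
LOW = [2]         # lowly expressed: n_i = 1 requires v_i = 0


def feasible(lo, hi):
    """Is there a steady state S v = 0 with lo <= v <= hi?"""
    if np.any(lo > hi):
        return False  # contradictory bounds, e.g. forced both on and off
    res = linprog(np.zeros(S.shape[1]), A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=list(zip(lo, hi)), method="highs")
    return res.status == 0


def best_score(forced=None):
    """Brute-force stand-in for the MILP: max of sum(a) + sum(n).

    forced: optional (reaction index, "on" or "off") extra constraint."""
    best = -1
    for bits in itertools.product([0, 1], repeat=len(HIGH) + len(LOW)):
        lo = np.full(S.shape[1], V_MIN)
        hi = np.full(S.shape[1], V_MAX)
        for r, a in zip(HIGH, bits[:len(HIGH)]):
            if a:
                lo[r] = max(lo[r], EPS)
        for r, n in zip(LOW, bits[len(HIGH):]):
            if n:
                hi[r] = 0.0
        if forced is not None:
            r, state = forced
            if state == "on":
                lo[r] = max(lo[r], EPS)
            else:
                hi[r] = 0.0
        if feasible(lo, hi):
            best = max(best, sum(bits))
    return best


def activity_states():
    """Classify every reaction by comparing constrained and free optima."""
    opt = best_score()
    states = {}
    for r in range(S.shape[1]):
        if best_score((r, "off")) < opt:
            states[f"R{r}"] = "active"        # off in no optimal solution
        elif best_score((r, "on")) < opt:
            states[f"R{r}"] = "inactive"      # on in no optimal solution
        else:
            states[f"R{r}"] = "undetermined"  # optima exist either way
    return states


print(activity_states())
```

Running it reports R0, R1 and R3 as active and R2 and R4 as undetermined: the lower branch (lowly expressed R2 feeding highly expressed R4) scores the same whether it is on or off, mirroring the "?" in the alternative-solutions slide. For networks of realistic size, the binary variables would instead be handed to a MILP solver (for instance `scipy.optimize.milp`) rather than enumerated.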
Post-transcriptional regulation plays a major role in tissue-specific metabolism
• 20% of the metabolic genes are predicted to be post-transcriptionally regulated across tissues
• On average, 42 genes (3.6%) are predicted to be post-transcriptionally up-regulated and 180 (15.4%) post-transcriptionally down-regulated in each tissue
[Figure: post-transcriptionally up- and down-regulated genes per tissue]

Large-scale validation
• The predicted tissue specificity of genes, reactions, and metabolites correlates significantly with various independent data sources
• The correlation remains high when focusing on predictions for post-transcriptionally regulated genes:
  – A significantly high fraction of the genes predicted to be post-transcriptionally up-regulated in certain tissues are known to be active there
  – A significantly low fraction of the genes predicted to be down-regulated in certain tissues are known to be active there

Metabolic disease-causing genes
• Many disease genes (OMIM) are predicted to be post-transcriptionally up-regulated specifically in the tissues affected by the disease

3. Predicting biomarkers for disease diagnosis

A method for predicting metabolic biomarkers
• Inborn errors of metabolism are commonly diagnosed via biofluid metabolomics, which identifies metabolites with altered concentrations
• Perform systematic biomarker prediction for all known genetic metabolic disorders using a genome-scale model
[Figure: metabolite exchange intervals between a tissue and the biofluids (uptake below 0, secretion above 0) for exchange fluxes V1–V7; changes are classified as reduced or elevated (with high confidence for V1 and V2), reduced (V4), elevated (V6), or unchanged (V5, V7)]

Validation via a kinetic RBC model
• Apply the method to predict biomarkers for enzyme deficiencies in human erythrocytes, for which a detailed kinetic model is available for validation (Jamshidi et al. 2001)
• Kinetic simulations identified 156 biomarkers for 43 enzymatic disorders
• Our method predicts 85 biomarkers, with a precision of 0.73 and a recall of 0.4
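The exchange-interval comparison can be sketched with flux variability analysis (FVA): compute the minimum and maximum of each exchange flux in a normal and an enzyme-deficient model, then compare the two intervals. This is one plausible reading of the interval diagram, not necessarily the exact rule behind the slides: an interval that shifts toward secretion is called elevated, one that shifts toward uptake is called reduced, and the call is high confidence when the two intervals do not overlap. The toy network, its bounds, the names, and the assumption that the enzyme carries at least EPS flux in the normal state are all hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

EPS = 1.0  # hypothetical minimal flux of the enzyme in the normal state

# Hypothetical toy tissue model; exchange fluxes: positive = secretion,
# negative = uptake. Internal metabolites: A, B, D.
RXNS = ["EX_A", "E1", "E3", "EX_B", "EX_D"]
S = np.array([
    # EX_A  E1    E3   EX_B  EX_D
    [-1.0, -1.0, -1.0, 0.0, 0.0],  # A: taken up, converted by E1 or E3
    [0.0, 1.0, 0.0, -1.0, 0.0],    # B: product of E1, secreted
    [0.0, 0.0, 1.0, 0.0, -1.0],    # D: side product of E3, secreted
])
BOUNDS = {"EX_A": (-5.0, -5.0),  # fixed uptake of 5 units of A
          "E1": (0.0, 10.0), "E3": (0.0, 10.0),
          "EX_B": (0.0, 10.0), "EX_D": (0.0, 10.0)}
EXCHANGES = ["EX_A", "EX_B", "EX_D"]


def fva(bounds, rxn):
    """Minimum and maximum steady-state flux of one reaction (FVA)."""
    c = np.zeros(len(RXNS))
    c[RXNS.index(rxn)] = 1.0
    kwargs = dict(A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    lo, hi = linprog(c, **kwargs), linprog(-c, **kwargs)
    if lo.status != 0 or hi.status != 0:
        raise ValueError("model has no feasible steady state")
    return lo.fun, -hi.fun


def classify(normal, disease, tol=1e-7):
    """Compare exchange intervals under the assumed rule described above."""
    (n_lo, n_hi), (d_lo, d_hi) = normal, disease
    up = d_lo >= n_lo - tol and d_hi >= n_hi - tol
    down = d_lo <= n_lo + tol and d_hi <= n_hi + tol
    moved = abs(d_lo - n_lo) > tol or abs(d_hi - n_hi) > tol
    if up and moved:
        label, disjoint = "elevated", d_lo > n_hi + tol
    elif down and moved:
        label, disjoint = "reduced", d_hi < n_lo - tol
    else:
        return "unchanged"  # identical, or widened/narrowed on both sides
    return label + (" (high confidence)" if disjoint else "")


def predict_biomarkers(enzyme):
    """Exchange-interval changes caused by a complete deficiency of `enzyme`."""
    j = RXNS.index(enzyme)
    normal = [BOUNDS[r] for r in RXNS]
    normal[j] = (max(normal[j][0], EPS), normal[j][1])  # assumption: enzyme active
    deficient = [BOUNDS[r] for r in RXNS]
    deficient[j] = (0.0, 0.0)  # enzyme knocked out
    return {ex: classify(fva(normal, ex), fva(deficient, ex)) for ex in EXCHANGES}


print(predict_biomarkers("E1"))
```

With E1 deficient, all of the fixed A uptake must flow through the side reaction E3, so secretion of D moves from [0, 4] to [5, 5] (elevated, high confidence) and secretion of B drops from [1, 5] to [0, 0] (reduced, high confidence), while the fixed uptake of A is unchanged. Only stoichiometry and flux bounds are used, in line with the claim that the method relies solely on reaction stoichiometry and directionality.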
• Our method correctly identifies alterations in extracellular metabolite concentrations, relying solely on reaction stoichiometry and directionality data

Predicting biomarkers for an array of inborn errors of metabolism
• The concentrations of 223 metabolites are predicted to change as a result of 176 possible dysfunctional enzymes
• A high fraction of the disorders (42%) are predicted to have very few biomarker changes (fewer than 6)
• Many of the disorders (45%) have a unique set of biomarker alterations; these predictions may be used for the unique diagnosis of metabolic disorders via biofluid metabolomics

Validating biomarker predictions
• Systematic OMIM data
  – Extracted via text mining of the disease description field of the OMIM database
  – Error-prone data
  – Shows moderate correlation with our method's predictions
• Manual OMIM data
  – Extracted via manual inspection of inborn errors of amino-acid metabolism in the OMIM database
  – High-quality data: identifies the specific reaction affected in each disease and resolves metabolite name ambiguities
  – Shows high correlation with our method's predictions
• Ramdis/HMDB – low-quality data (correlates poorly with the OMIM data)

Validating predicted biomarkers
• Extract data on known biomarkers for inborn errors of amino-acid metabolism
• The predictions are significantly correlated with the known biomarkers (p-value = 4·10⁻¹³)
  – precision = 0.76, recall = 0.56
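Precision, recall and the uniqueness of biomarker profiles are simple set computations. A minimal sketch, assuming each predicted or known biomarker is represented as a hypothetical (disorder, metabolite, direction) tuple and each disorder's profile as a set of (metabolite, direction) pairs; all disorder and metabolite names below are made up for illustration.

```python
from collections import Counter


def precision_recall(predicted, known):
    """Precision and recall of a set of predictions against known biomarkers."""
    hits = len(predicted & known)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(known) if known else 0.0
    return precision, recall


def unique_profiles(profiles):
    """Disorders whose non-empty biomarker profile no other disorder shares."""
    counts = Counter(frozenset(p) for p in profiles.values())
    return sorted(d for d, p in profiles.items() if p and counts[frozenset(p)] == 1)


# Hypothetical data: (disorder, metabolite, direction) tuples.
predicted = {("d1", "phe", "up"), ("d1", "tyr", "down"),
             ("d2", "gly", "up"), ("d3", "leu", "up")}
known = {("d1", "phe", "up"), ("d2", "gly", "up"), ("d2", "ser", "down")}
print(precision_recall(predicted, known))  # (0.5, 0.6666666666666666)

# Hypothetical per-disorder profiles: d1 and d2 cannot be told apart.
profiles = {
    "d1": {("phe", "up")},
    "d2": {("phe", "up")},
    "d3": {("phe", "up"), ("tyr", "down")},
    "d4": set(),
}
print(unique_profiles(profiles))  # ['d3']
```

The same arithmetic shows that the red-blood-cell figures are internally consistent: a precision of 0.73 on 85 predictions means about 62 correct biomarkers, and 62 / 156 ≈ 0.40, matching the reported recall.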