Lessons learned from the genome-scale metabolic reconstruction and curation of Neurospora crassa
Jeremy Zucker, Jonathan Dreyfuss, Heather Hood, James Galagan
The Eli and Edythe L. Broad Institute
A collaboration of Massachusetts Institute of Technology, Harvard University and affiliated hospitals, and Whitehead Institute for Biomedical Research

Capture metabolic knowledge
Pathway Tools/BioCyc and KEGG
• Reactions
• Interactions
• Literature

Visualizing ’omics data
Provide a visually intuitive metabolic framework for interpreting large ’omics datasets

In silico predictions
Can expression data be interpreted algorithmically in a metabolic context?
Example: Plasmodium E-Flux*
Validation
• KO phenotype predictions – 90% accuracy
• External metabolite changes – 70% accuracy
New predictions
• 40 enzymatic drug targets
• Experimental validation of novel targets
*Colijn, C., A. Brandes, J. Zucker, et al. (2009). PLoS Comput Biol.

Modeling in the Neurospora P01
• Clock
• Profiling: RNA-Seq
• Visualization and analysis
• ChIP-Seq
• Interpretation of expression profiling and regulatory network data in a metabolic context – inform experiments

BUILDING THE MODEL

Manual reconstruction protocol
Nature Protocols, Vol. 5, No. 1 (7 January 2010), pp. 93-121.

Automated reconstruction pipeline (Model SEED)
Nature Biotechnology, Vol. 28, No. 9 (29 September 2010), pp. 977-982.

Genome sequence to metabolic model
[Diagram: elements, pathways, literature, metadata, nutrient media (Vogel's), reactions, complexes, transporters, and biomass composition combine into NeurosporaCyc]

EFICAz2 predicts enzymes
9934 protein sequences -> EFICAz2 (databases, HMMs, FDR, SVM, decision tree) -> 1993 enzymes -> 1770 reactions
BMC Bioinformatics 2009, 10:107

Protein complex editor
182 reactions with isozymes or complexes
• Identify the multiple genes of a reaction (e.g., 2-oxoisovalerate alpha subunit and beta subunit; fatty acid synthase beta subunit dehydratase and alpha subunit reductase, …)
• Present all possible combinations of complexes (e.g., 2-oxoisovalerate complex, fatty acid synthase complex, …)
• Allow the curator to validate potential complexes
31 complexes experimentally validated through literature search

Transport inference parser (TIP)
• Start from 9934 free-text protein annotations
• Filter proteins for transporters (e.g., MFS glucose transporter, ATP synthase, sucrose transporter, …)
• Infer multimeric complex
• Infer substrate
• Infer energy-coupling mechanism
Result: 176 transporters assigned to 97 transport reactions
Bioinformatics (2008) 24 (13): i259-i267.

Pathologic predicts pathways
1770 enzyme-catalyzed reactions -> 265 pathways
For each MetaCyc pathway:
• X = number of reactions in the MetaCyc pathway
• Y = number of reactions with enzyme evidence
• Z = number of unique reactions in the pathway
• P(pathway present in Neurospora | X, Y, Z)
Science 293:2040-4, 2001.
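The protein complex editor step of presenting all possible combinations of a reaction's genes can be sketched in Python. Each combination is a candidate complex for the curator to accept or reject. This is a minimal illustration, not the editor's actual implementation, and it rests on several assumptions:
• `candidate_complexes` and `curate` are hypothetical names.
• Genes are plain string labels.
• Single genes are treated as candidate isozymes rather than complexes.

```python
from itertools import combinations


def candidate_complexes(genes, min_size=2):
    """Enumerate combinations of a reaction's genes as candidate complexes.

    Hypothetical helper: combinations of `min_size` or more genes are
    candidate heteromeric complexes, listed largest first. Single genes are
    left out (assumed to be handled as candidate isozymes instead).
    """
    genes = sorted(set(genes))  # deduplicate and fix a stable display order
    candidates = []
    for size in range(len(genes), min_size - 1, -1):
        candidates.extend(combinations(genes, size))
    return candidates


def curate(candidates, approved):
    """Keep only candidates a curator approved (hypothetical helper).

    Approved complexes may list their subunits in any order.
    """
    approved_keys = {tuple(sorted(c)) for c in approved}
    return [c for c in candidates if c in approved_keys]


if __name__ == "__main__":
    # Two subunits from the slide yield exactly one candidate complex.
    print(candidate_complexes(["2-oxoisovalerate beta subunit",
                               "2-oxoisovalerate alpha subunit"]))
    # Three hypothetical genes yield 2**3 - 3 - 1 = 4 candidates.
    candidates = candidate_complexes(["geneC", "geneA", "geneB"])
    print(curate(candidates, approved=[("geneB", "geneA")]))  # [('geneA', 'geneB')]
```

For n genes this enumeration produces 2^n - n - 1 candidates. That is manageable for the handful of genes typically attached to one reaction but grows quickly, which is why the curator's validation step matters.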
Literature curation validates predictions
1212 citations associated with 307 pathways, 31 complexes, and 168 genes

Neurospora cellular overview

NEUROSPORACYC

NeurosporaCyc: new feature on the Broad website

NeurosporaCyc cellular overview
• Google Maps-like zoomable interface
• Highlight genes on the overview

NeurosporaCyc Omics Viewer
• Omics data mapped onto metabolism
• Omics data mapped onto the genome

DEBUGGING THE BUG

The problem with EC numbers
Number of reactions by class:

Reaction class                          Neurospora   MetaCyc
Balanced normal reactions                      993      4585
Generic reactions                              198       688
Protein modification reactions                  82       469
Reactions with instanceless classes             80       228
Generic redox reactions                         36       212
Polymeric reactions                             24        91
Polymerization pathway reactions                11        17

Generic reactions
• Is 3.6.1.42 an instance of 3.6.1.6?

Protein modification reactions

Reactions with instanceless classes
• Solution: instantiate classes

Generic redox reactions

Polymeric reactions

Polymerization pathway reactions
• Solution: instantiate polymerization steps
  – POLYMER-INST-Fatty-Acids-C16 + coenzyme A + ATP -> POLYMER-INST-Saturated-Fatty-Acyl-CoAC16 + diphosphate + AMP + H+
  – POLYMER-INST-Fatty-Acids-C14 + coenzyme A + ATP -> POLYMER-INST-Saturated-Fatty-Acyl-CoAC14 + diphosphate + AMP + H+
  – …
  – POLYMER-INST-Fatty-Acids-C0 + coenzyme A + ATP -> POLYMER-INST-Saturated-Fatty-Acyl-CoAC0 + diphosphate + AMP + H+

What happens when the metabolic network is infeasible?
• Add a “reaction” with the smallest number of reactants and products that results in a feasible model:

  minimize card(r) subject to Sv + r = 0, l ≤ v ≤ u

  Here S is the stoichiometric matrix, v the flux vector, l and u the flux bounds, and card(r) the number of nonzero entries of r, i.e., the number of metabolites the added pseudo-reaction produces or consumes.

Fast Automated Reconstruction of Metabolism (FARM)
• Input:
  – EFICAz probabilities for each reaction
  – Biomass components
  – Experimental growth/no-growth phenotypes in different nutrient conditions
  – Gene essentiality
  – Manual curation of pathways
• Output:
  – Metabolic network of MetaCyc reactions maximally consistent with the input

VALIDATING THE MODEL WITH IN SILICO KNOCKOUT PREDICTIONS

Neurospora phenotypes for validation
• Neurospora e-Compendium
  – 29 mutants essential on minimal media
  – Non-essential on supplemented media
• P01 phenotype collection
  – 79 KOs non-essential on minimal media
  – Additional phenotypes are observed
Used FBA with the Neurospora model to simulate gene knockouts in minimal medium.

Neurospora phenotype prediction results (positive = non-essential knockout)

                          Predicted essential   Predicted non-essential
Observed essential              22 (TN)               7 (FP)
Observed non-essential          14 (FN)              65 (TP)

Precision     TP/(TP+FP)                 65/72  = 90%
Recall        TP/(TP+FN)                 65/79  = 82%
Specificity   TN/(TN+FP)                 22/29  = 76%
Accuracy      (TP+TN)/(TP+TN+FP+FN)      87/108 = 81%

Comparison of model organisms under minimal media

                                        Yeast (iND750)[1]   E. coli (iAF1260)[2]   Neurospora
Viable, correctly predicted/observed    439/455 = 96%       993/1022 = 97%         65/79 = 82%
Essential, correctly predicted/observed 35/109 = 32%        159/238 = 67%          22/29 = 76%
Overall accuracy                        84%                 91%                    81%

[1] Genome Res. 2004. 14: 1298-1309
[2] Molecular Systems Biology 2007 3:121

MODELING THE EFFECT OF OXYGEN LIMITATION ON XYLOSE FERMENTATION

Biofuels from Neurospora?
• Growing interest in obtaining biofuels from fungi
• Neurospora crassa has more cellulolytic enzymes than Trichoderma reesei
• N. crassa can degrade cellulose and hemicellulose to ethanol [Rao83]
• Simultaneous saccharification and fermentation means that N. crassa is a possible candidate for consolidated bioprocessing
[Figure: xylose -> ethanol]

Effects of oxygen limitation on xylose fermentation in Neurospora crassa
[Figure: ethanol production vs. oxygen level. Ethanol conversion (%) plotted against oxygen level (mmol/L*g), with low-, intermediate-, and high-O2 regimes marked. Schematic: xylose -> glycolysis -> pyruvate, which feeds respiration (TCA) or fermentation (ethanol).]
Zhang, Z., Qu, Y., Zhang, X., Lin, J., March 2008. Effects of oxygen limitation on xylose fermentation, intracellular metabolites, and key enzymes of Neurospora crassa as3.1602. Applied Biochemistry and Biotechnology 145 (1-3), 39-51.

Two paths from xylose to xylitol
[Diagram: xylose, xylitol, and the pentose phosphate pathway]

Model of xylose fermentation
[Diagram: pentose phosphate pathway feeding aerobic respiration (TCA cycle, oxygen, ATP) and fermentation (ethanol)]
• High oxygen (oxygen = 5): NADPH regeneration; NADPH and NAD+ utilization; NAD+ regeneration via aerobic respiration (TCA cycle); ATP = 16.3
• Low oxygen (oxygen = 0): fermentation to ethanol
• Intermediate oxygen (oxygen = 0.5): optimal ethanol; NADPH regeneration; NADPH and NAD+ utilization; NAD+ regeneration; ATP = 2.8; all O2 is used to regenerate the NAD+ used in the first step
• Intermediate oxygen, improvement targets: improve the NADH enzyme; pyruvate decarboxylase is the bottleneck

USING E-FLUX TO PREDICT DRUG TARGETS BY INTEGRATING EXPRESSION DATA WITH FBA

E-Flux explanation

Application of E-Flux to TB

Next steps
• Annotation: use phenotype predictions to improve the model
• NeurosporaCyc: use E-Flux to interpret the effect of the clock genetic regulatory program on metabolism
• Validation: add additional phenotypes

Acknowledgements
SRI: Peter Karp, Mario Latendresse, Markus Krummenacker, Ingrid Keseler, Tomer Altman, Suzanne Paley, Ron Caspi, Mike Travers
Neurospora P01 Project: Heather Hood, Jonathan Dreyfuss, James Galagan

Fast Automated Reconstruction of Metabolism (FARM)
[Diagram: gene calls (Broad), protein complex prediction, enzyme prediction (EFICAz), pathway prediction (Pathologic), nutrient media (Vogel's), literature curation (CAP), and transport prediction (TIP) feed into NeurosporaCyc]

FARM inputs
• EFICAz predictions
• Pathway predictions
• Nutrient conditions
• Biomass composition
• Protein complexes
• Transport
Output: 846 reactions, 640 metabolites, 564 genes
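The knockout-validation figures in the phenotype prediction and model comparison tables can be recomputed from their raw counts. The sketch below assumes, as the table's TP/TN labels imply, that the positive class is a non-essential (viable) knockout. The helper names `ko_metrics` and `overall_accuracy` are hypothetical.
• Specificity is computed as TN/(TN+FP), which reproduces the reported 76%.
• Overall accuracy is pooled over viable and essential knockouts, which reproduces the reported 84%, 91%, and 81% for yeast, E. coli, and Neurospora.

```python
def ko_metrics(tp, tn, fp, fn):
    """Binary-classification metrics for in silico knockout predictions.

    Assumed convention: positive = non-essential (viable) knockout, so
    TP = observed and predicted non-essential, TN = observed and predicted essential,
    FP = observed essential but predicted non-essential,
    FN = observed non-essential but predicted essential.
    """
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def overall_accuracy(viable_correct, viable_total, essential_correct, essential_total):
    """Fraction of all knockouts (viable and essential) predicted correctly."""
    return (viable_correct + essential_correct) / (viable_total + essential_total)


if __name__ == "__main__":
    # Counts from the Neurospora phenotype prediction table.
    for name, value in ko_metrics(tp=65, tn=22, fp=7, fn=14).items():
        print(f"{name}: {value:.0%}")  # precision 90%, recall 82%, specificity 76%, accuracy 81%

    # Counts from the model-organism comparison table:
    # (viable correct, viable observed, essential correct, essential observed).
    for organism, counts in {
        "Yeast (iND750)": (439, 455, 35, 109),
        "E. coli (iAF1260)": (993, 1022, 159, 238),
        "Neurospora": (65, 79, 22, 29),
    }.items():
        print(f"{organism}: {overall_accuracy(*counts):.0%}")  # 84%, 91%, 81%
```

Recall on the positive class equals the "viable" row of the comparison table (65/79), and specificity equals the "essential" row (22/29). The two tables are therefore two views of the same four counts.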