Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -1- Tissue Specific Models Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -2- LEARNING OBJECTIVES Each student should be able to: • Describe the process of creating tissue-specific models. • Explain the GIMME-like family of tools for creating tissue-specific models • Explain the iMAT-like family of tools for creating tissue-specific models • Explain the MBA-like family of tools for creating tissue-specific models Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -3- Lesson Outline • Overview • Creating Tissue-specific Models GIMME-like Family iMAT-like Family MBA-like Family • Tissue-specific Example Red Blood Cells Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -4- The multiple uses of high-throughput omics data in constraint-based models Bordbar, A., J. M. Monk, et al. (2014). "Constraintbased models predict metabolic and associated cellular functions." Nature reviews. Genetics 15(2): 107-120. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -5- Four Major Applications of Recon X I) Utilizing high-throughput data, Recon X can be tailored to cell and tissue-specific networks. The process has been done both algorithmically and manually. II) Similarly, Recon X has been transformed into other mammalian reconstructions, particularly M. musculus. The high overlap of homologous genes in Recon X with similar mammals allows for reconstructing accurate mammalian models quickly. III) High-throughput data can be interpreted by mapping the data onto Recon X’s metabolic network backbone. This process has been done to study pathological and drug-treated states. IV) Recon X can be used to simulate and predict phenotypes, providing biological clues to physiology and pathology as well as guiding experimental design. Bordbar, A. and B. O. Palsson (2012). "Using the reconstructed genome-scale human metabolic network to study physiology and pathology." J Intern Med 271(2): 131-141. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -6- Lesson Outline • Overview • Creating Tissue-specific Models GIMME-like Family iMAT-like Family MBA-like Family • Tissue-specific Example Red Blood Cells Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -7- Existing Methods for Context-specific Model Extraction Estevez, S. R. and Z. Nikoloski (2014). "Generalized framework for context-specific metabolic model extraction methods." Frontiers in plant science 5. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -8- Lesson Outline • Overview • Creating Tissue-specific Models GIMME-like Family iMAT-like Family MBA-like Family • Tissue-specific Example Red Blood Cells Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -9- GIMME-like Family • The objective employed by the GIMME-like family corresponds to the similarity of the flux phenotype to data, which is to be maximized while guaranteeing a given Required Metabolic Functionality (RMF), such as: growth or ATP production. • This family reconstructs a context-specific model in two steps: First, it optimizes an objective function, the RMF, by using the classical linear programming (LP) formulation of flux balance analysis which imposes mass balance and thermodynamic constraints. This objective function is assumed to be the main cellular task in the investigated condition. It then involves solving a second LP that minimizes a penalty function, corresponding to the discrepancies between flux values and the respective transcript levels, with the additional constraint that the flux through the previous RMF must be above a given lower bound (e.g. a fraction of the optimum value found by flux balance analysis). • The methods included in this family mainly differ in the way the discrepancies are minimized in the second step, the type of high-throughput data used, and in the treatment of reversible reactions Estevez, S. R. and Z. Nikoloski (2014). "Generalized framework for context-specific metabolic model extraction methods." Frontiers in plant science 5. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -10- GIMME-like Family GIMME GIM3E Becker, S. A. and B. O. Palsson (2008). "Context-specific metabolic networks are consistent with experiments." PLoS computational biology 4(5): e1000082. Schmidt, B. J., et al. (2013). "GIM3E: condition-specific models of cellular metabolism developed from metabolomics and expression data." Bioinformatics 29(22): 2900-2908. • The penalty function is termed inconsistency score. • GIM3E introduces several modifications to the original GIMME. • This function penalizes flux values of reactions whose associated expression levels are below a user-defined cut-off (i.e., threshold). More specifically, the inconsistency score is given by the dot product of the flux distribution and the reaction penalty, defined as the vector difference of the associated expression values from the threshold. • First, it allows integration of metabolomics data, imposing a nonzero flux value to reactions involving a metabolite for which there is evidence of being synthesized in an investigated condition. • Second, it modifies the definition of the reaction penalty; here, the penalties for all reaction-associated genes are determined separately and are then mapped to the reaction following the GPR rules. Moreover, the penalties are calculated as the distance between each transcript and the maximum expression level of the set. Consequently, after mapping transcript penalties all reactions obtain a penalty value, rather than only the set below the threshold which is the case in GIMME. • Finally, GIM3E takes into account directionality of reversible reactions by constraining them to operate in only one direction, which is modeled by introducing a binary variable for the direction of choice. As a result, GIM3E is formulated as a mixed integer linear program (MILP), which is more computationally challenging than the LP formulation of GIMME. • The reaction associated expression level is obtained following the standard GPR rules (Becker and Palsson, 2008), which take into account the presence of isoenzymes and protein complexes. • Although transcript profiles were used in the original formulation, a variant called GIMMEp allows for the integration of proteomic data (Bordbar, et al. (2012). "Model-driven multi-omic data analysis elucidates metabolic immunomodulators of macrophage activation." Molecular Systems Biology). • The result of applying this algorithm is a flux distribution which ensures that a given RMF can be carried out and is as consistent as possible to the employed data. Estevez, S. R. and Z. Nikoloski (2014). "Generalized framework for context-specific metabolic model extraction methods." Frontiers in plant science 5. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -11- Create a Tissue Model of the Eye • Step #1 – Collect and format the gene expression data for the non-treated ARPE-19 cells • Gene Expression Omnibus data (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5741) • GSM133871.CEL, GSM133872.CEL, GSM133873.CEL (Non-treatment) • Step #2 - Convert the Affymetrix data to a format that can be used by the “createTissueSpecific()” Cobra Toolbox function • A script has been created by Farhad Farjood using the language R to get the Absence/Presence (AP.txt) data and Entrez IDs (EID.txt). • R is open source and can be downloaded for free at http://cran.rstudio.com/ • A function call getExpD() will convert to AP.txt and EID.txt to expression data that can be used by the Cobra Toolbox • Step #3 – Create a tissue-specific model with the Cobra Toolbox • Use an update version of “createTissueSpecific()” called “createTissueSpecificRec()” that has been modified by Farhad Farjood to work with Recon 1 and Recon 2. • Step #4 – Create a specific objective function based on the cellular functions of reabsorption and secretion. • Step #5 - Use manual assessment with primary literature to validate the physiological functions for accuracy. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis createTissueSpecificModel H. Scott Hinton, 2015 -12- Create draft tissue specific model from mRNA expression data [tissueModel,Rxns] = createTissueSpecificModel(model,expressionData,proceedExp,orphan,exRxnRemove,solver,options,funcModel) INPUTS model global recon1 model expressionData mRNA expression Data structure Locus Vector containing GeneIDs Data Presence/Absence Calls OUTPUTS tissueModel Model produced by GIMME or Shlomi, Rxns containing only reactions carrying flux Statistics of test Use: (1 - Present, 0 - Absent) when proceedExp = 1 Use: (2 - Present, 1 - Marginal, 0 - Absent) when proceedExp = 0 Transcript RefSeq Accession (only required if proceedExp = 0) OPTINAL INPUTS proceedExp orphan ExpressedRxns - predicted by mRNA data UnExpressedRxns - predicted by mRNA data unknown - unable to be predicted by mRNA data Upregulated - added back into model Downregulated - removed from model UnknownIncluded - orphans added 1 - data are processed ; 0 - data need to be processed (Default = 1) 1 - leave orphan reactions in model for Shlomi Method 0 - remove orphan reactions (Default = 1) exRxnRemove Names of exchange reactions to remove (Default = []) solver Use either 'GIMME' or 'Shlomi' to create tissue specific model (Default = 'GIMME') options If using GIMME, enter objectiveCol here Default: objective function with 90% flux cutoff, written as: [find(model.c) 0.9] funcModel 1 - Build a functional model having only reactions that can carry a flux (using FVA), 0 - skip this step (Default = 0) http://opencobra.sourceforge.net/openCOBRA/opencobra_documentation/cobra_toolbox_2/index.html Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -13- Build a draft tissue-specific human macrophage model from the global human metabolic network and omics data Download the MAT file “testTissueModel.mat” from “https://github.com/opencobra/cobratoolbox/tree/master/testing/testTissueModel.” It contains the global human metabolic network model and a formatted expressionData structure. The model is the version of the human metabolic network reconstruction Recon 1 that was used to create an alveolar macrophage model1 using expression data from Kazeros et al.2 > > load(‘testTissueModel.mat’) The GIMME algorithm retains reactions from Recon 1 that are orphans or are present in the high-throughput data. The reactions with no detected expression are minimized and those not required to retain flux through the objective reaction are removed. > > [tissueModel,Rxns] = createTissueSpecificModel(model,expressionData); Where tissueModel is the GIMME algorithm-derived draft model; and Rxns is a structure with lists of all the reactions. The reactions fall into the following categories: • Expressed—1,769 potentially active reactions based on transcriptome data; • UnExpressed—497 reactions without requisite gene products based on transcriptome data; • Unknown—41 reactions unable to be predicted by transcriptome data; • Upregulated—52 UnExpressed reactions added back into model; • Downregulated— 0 Expressed reactions removed from model; and • UnknownIncluded—1,476 orphan reactions included. 1. Bordbar, A., Lewis, N.E., Schellenberger, J., Palsson, B.O. & Jamshidi, N. Insight into human alveolar macrophage and M. tuberculosis interactions via metabolic reconstructions. Mol. Syst. Biol. 6, 422 (2010). 2. Kazeros, A. et al. Overexpression of apoptotic cell removal receptor MERTK in alveolar macrophages of cigarette smokers. Am. J. Respir. Cell Mol. Biol. 39, 747–757 (2008). Schellenberger, J., R. Que, et al. (2011). "Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0." Nature protocols 6(9): 1290-1307. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -14- Step #1 – Collect Gene Expression Files for the Non-treated ARPE-19 Cells http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5741 Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -15- R Filename = AP_HSH.R source("http://bioconductor.org/biocLite.R") biocLite("affy") biocLite("simpleaffy") biocLite("annotate") biocLite("hgu133plus2.db") Step #2 - Convert the Affymetrix Data to a Format That can be Used by the “createTissueSpecific()” Cobra Toolbox Function library('affy') library('simpleaffy') library('annotate') library('hgu133plus2.db') setwd("/… /GSE5741_RAW") Set working directory affy.data = ReadAffy() data.AP = mas5calls(affy.data) data.AP.calls = exprs(data.AP) Affymetrix_ID <- row.names(data.AP.calls) Entrez_ID <- getEG(Affymetrix_ID, annotation(affy.data)) write.table(data.AP.calls[,1:3], file="AP1.txt", row.names=FALSE,col.names=F, quote=F, sep="\t") write.table(Entrez_ID, file="EID.txt", row.names=FALSE,col.names=F, quote=F, sep="\t") Utah State University BE 5500/6500 AP.txt EID.txt Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis Step #3 – Create a Tissue-specific GIMME-based Model with the Cobra Toolbox H. Scott Hinton, 2015 -16- Expression Data % CreateARPE19GIMMEModel.m clear; load('Recon2.mat'); ExpressionData = getExpD('EID.txt','AP.txt'); Rxns ARPE19 Model [ARPE19_GIMME,Rxns] = createTissueSpecificRec(Recon2_2,ExpressionData,1,1,[],'GIMME',options,1); writeCbModel(ARPE19_GIMME,'sbml','ARPE19_GIMME_Model'); Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -17- Lesson Outline • Overview • Creating Tissue-specific Models GIMME-like Family iMAT-like Family MBA-like Family • Tissue-specific Example Red Blood Cells Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -18- iMAT-like Family • The iMAT-like family comprises three methods, iMAT (Shlomi et al., 2008), INIT (Agren et al., 2012) and its extension, tINIT (Agren et al., 2014). • The iMAT-like family does not assume a RMF achieved by the cell. More specifically, these methods maximize the number of matches between reaction states (i.e., active or inactive) and corresponding data states (i.e., expressed or not non-expressed). • The mathematical formulation results in a MILP, in which the value of the binary variable denotes the most concordant reaction state for a given (data) context. • Although sharing the general strategy, iMAT, INIT and tINIT differ considerably respecting to how they deal with data: • iMAT integrates data in the constraints, • INIT and tINIT do so directly in the objective function. Estevez, S. R. and Z. Nikoloski (2014). "Generalized framework for context-specific metabolic model extraction methods." Frontiers in plant science 5. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis iMAT-like Family iMAT (Shlomi) • iMAT tackles this issue through an adapted flux variability analysis (FVA): First, it forces each reaction to be active and evaluates the similarity, and then repeats the process in a similar way by forcing each reaction to be inactive. The final outcome is computed by comparing the two obtained similarities. A reaction is termed active if its inclusion results in higher similarity to data, and it is termed as inactive, if its inclusion decreases this similarity. In the case that both similarities are equal, iMAT categorizes the reaction as undetermined. -19- INIT Agren, R., S. Bordel, et al. (2012). "Reconstruction of genome-scale active metabolic networks for 69 human cell types and 16 cancer types using INIT." PLoS computational biology 8(5): e1002518. Shlomi, T., M. N. Cabili, et al. (2008). "Network-based prediction of human tissue-specific metabolism." Nat Biotechnol 26(9): 1003-1010. • The algorithm first classifies reactions into two groups based on a previously defined threshold for the corresponding expression data; this results in the groups of reactions with a high and low associated expression values. It then maximizes the number of matches between a reaction state, defined through a minimum flux value, and the group to which the reaction belongs. Thus, if a reaction is included in the highly expressed group, the aim is to obtain a flux value over the minimum. H. Scott Hinton, 2015 • INIT was optimized to integrate evidences from the Human Protein Atlas, although expression data are integrated when proteomic evidences are missing. • INIT does not group reactions in categories in contrast to iMAT. Instead, it adopts experimental data to weight the binary variable of the corresponding reaction, whereby the weight is a function of experimental data (e.g., gene expression profiles) or a set of arbitrary numbers that quantify the color code of the entries of the Human Protein Atlas. • INIT imposes a positive net production of metabolites for which there is experimental support for that context or tissue. Hence, when a metabolite is experimentally determined to be present, its net production is forced to comply with a given lower bound. As a result, INIT allows the integration of metabolomics data in a qualitative way. • This method has been applied to generate a human metabolic reaction database (“Human Metabolic Atlas2 ”) where several tissuespecific model reconstructions can be examined. Estevez, S. R. and Z. Nikoloski (2014). "Generalized framework for context-specific metabolic model extraction methods." Frontiers in plant science 5. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis Step #3 – Create a Tissue-specific Shlomi-based Model with the Cobra Toolbox H. Scott Hinton, 2015 -20- Expression Data % CreateARPE19ShlomiModel.m clear; load('Recon2.mat'); Rxns Rxns.solution ARPE19 Shlomi Model ExpressionData = getExpD('EID.txt','AP.txt'); [ARPE19_Shlomi,Rxns] = createTissueSpecificRec(Recon2_2,ExpressionData,1,1,[],'Shlomi',options,1); writeCbModel(ARPE19_Shlomi,'sbml','ARPE19_Shlomi_Model'); Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -21- Lesson Outline • Overview • Creating Tissue-specific Models GIMME-like Family iMAT-like Family MBA-like Family • Tissue-specific Example Red Blood Cells Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -22- MBA-like Family • The MBA-like family is composed of MBA,mCADRE and FastCORE. • While previous methods perform both a flux prediction and a context-specific model reconstruction, MBA-like methods only return a context-specific model as output. • This family a priori categorizes reactions in two sets, the core and the non-core. • The core set includes those reactions with positive evidences (e.g., high through put data and/or well-curated biochemical knowledge) of being active in a certain context. Once these sets are defined, the MBA-like methods prune the GEM by eliminating non-core reactions that are unnecessary to ensure consistency in the core set, i.e., no blocked reaction is allowed in the final model. Thereby, all reactions must carry non-zero flux in at least one feasible solution. • Checking model consistency is a crucial part of these methods and also the main difference in comparison to the other methods. • FVA have been used to pinpoint blocked reactions, but it is computationally expensive since it requires solving two optimization problems per reaction. Thus, the major changes in formulation are due to finding faster alternatives to perform the same task. • Other differences arise when defining the core set and during the pruning process. Estevez, S. R. and Z. Nikoloski (2014). "Generalized framework for context-specific metabolic model extraction methods." Frontiers in plant science 5. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis MBA H. Scott Hinton, 2015 -23- MBA-like Family mCADRE • MBA divides the core set in two subcores: a set with high likelihood to be present in the context-specific model (CH), if evidence comes from well-curated biochemical knowledge in that particular context, and a set with moderate likelihood (CM) if evidence comes from context-specific high-throughput data. • A prominent characteristic of mCADRE lies in ranking reactions of the genome-scale reconstruction according to three scores: expression-, connectivity-, and confidence-level-based. • The algorithm performs the pruning iteratively and randomly by selecting a noncore (NC) reaction to be eliminated, and checking consistency at each step: if CH and a user-defined fraction of CM remain unblocked, MBA removes the reaction out of the model along with CM and NC corresponding blocked reactions. This routine is repeated until no reaction is left in NC. • • The topology of the final model clearly depends on the order in which non-core reactions are eliminated. Therefore, to remove artifacts due to the order, the algorithm is repeated a number of times (1000) to obtain a population of context-specific models. Later, reactions are ranked according to their occurrence in the population and added up to CH until a consistent model is obtained. This ranking determines the core set of reactions as well as the order by which non-core reactions are eliminated. The core is determined by fixing a threshold value to the expression based score; therefore, reactions whose values are above the threshold are included in the core, and the rest constitute the non-core reactions. • Unlike other methods, the expression-based score does not directly consider the levels of expression. Instead, it calculates the frequency of expressed states over a battery of transcript profiles in the same context, and, thus, requires a previous binarization of the expression data. • Reactions outside the core are then ranked according to the connectivity-based score, which assesses the connectedness of adjacent reactions, and the confidence level-based score, which accounts for the type of evidences supporting a reaction in the genome-scale reconstruction. • MBA proposes an alternative to FVA to check consistency in a more efficient way: First, it solves a LP problem which maximizes the total sum of fluxes. It then removes active reactions (i.e., carrying non-zero flux) and repeats the LP over the remaining set of reactions. If no reaction is found to be active, FVA is applied to each reaction to determine whether it is blocked. The process is repeated until all reactions have been classified either as blocked or unblocked. Estevez, S. R. and Z. Nikoloski (2014). "Generalized framework for context-specific metabolic model extraction methods." Frontiers in plant science 5. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -24- MBA-like Family mCADRE • A prominent characteristic of mCADRE lies in ranking reactions of the genome-scale reconstruction according to three scores: expression-, connectivity-, and confidence-level-based. • This ranking determines the core set of reactions as well as the order by which non-core reactions are eliminated. The core is determined by fixing a threshold value to the expression based score; therefore, reactions whose values are above the threshold are included in the core, and the rest constitute the non-core reactions. • Unlike other methods, the expression-based score does not directly consider the levels of expression. Instead, it calculates the frequency of expressed states over a battery of transcript profiles in the same context, and, thus, requires a previous binarization of the expression data. • Reactions outside the core are then ranked according to the connectivity-based score, which assesses the connectedness of adjacent reactions, and the confidence level-based score, which accounts for the type of evidences supporting a reaction in the genome-scale reconstruction. • Non-core reactions are in turn sequentially removed according to the previous ranking, and consistency is evaluated. • Here, mCADRE presents two other innovations: it defines a set of key metabolites, with positive evidences of appearing in the context specific model reconstruction, and relaxes the stringent condition of including all core reactions in the final model. More specifically, a reaction can only be eliminated if it does not prevent the production of a key metabolite and if it is unnecessary to ensure core consistency. However, if evidence exists for the respective transcript to be unexpressed in any of the context-specific samples, mCADRE allows the elimination of the reaction even if it blocks some of the core reactions. • To this end, two conditions have to be satisfied: (1) production of key metabolites is not impaired and (2) the relation between the number of blocked core and non-core reactions matches a predefined ratio. • To check model consistency, mCADRE maintains the procedure proposed in MBA, although adapted to use FastFVA instead of maximizing the total sum of flux values. Estevez, S. R. and Z. Nikoloski (2014). "Generalized framework for context-specific metabolic model extraction methods." Frontiers in plant science 5. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis MBA -25- MBA-like Family FastCore • MBA divides the core set in two subcores: a set with high likelihood to be present in the context-specific model (CH), if evidence comes from well-curated biochemical knowledge in that particular context, and a set with moderate likelihood (CM) if evidence comes from context-specific high-throughput data. • The algorithm performs the pruning iteratively and randomly by selecting a noncore (NC) reaction to be eliminated, and checking consistency at each step: if CH and a user-defined fraction of CM remain unblocked, MBA removes the reaction out of the model along with CM and NC corresponding blocked reactions. This routine is repeated until no reaction is left in NC. • The topology of the final model clearly depends on the order in which non-core reactions are eliminated. Therefore, to remove artifacts due to the order, the algorithm is repeated a number of times (1000) to obtain a population of context-specific models. Later, reactions are ranked according to their occurrence in the population and added up to CH until a consistent model is obtained. • H. Scott Hinton, 2015 MBA proposes an alternative to FVA to check consistency in a more efficient way: First, it solves a LP problem which maximizes the total sum of fluxes. It then removes active reactions (i.e., carrying non-zero flux) and repeats the LP over the remaining set of reactions. If no reaction is found to be active, FVA is applied to each reaction to determine whether it is blocked. The process is repeated until all reactions have been classified either as blocked or unblocked. • While FastCORE aims also at obtaining a minimal consistent model containing all core reactions, typical for this family of methods, it differs principally from MBA and mCADRE in the algorithmic strategy. • Instead of eliminating one non-core reaction followed by consistency evaluation at each step, FastCORE solves two LPs: The first LP maximizes the cardinality of the core set of reactions, computed as the number of reaction values above a small positive constant. On the other hand, the second LP minimizes the cardinality outside the core set by minimizing the L1-norm of the flux vector, under the constraint that the entire core set must remain active. • These two LPs are repeatedly applied in alternating fashion until the core set is consistent, whereby activation of all core reactions is ensured while including a minimum set of non-core reactions in the final model. To deal with reversible reactions, FastCORE evaluates both directions by changing the sign of the corresponding column of the stoichiometric matrix. Estevez, S. R. and Z. Nikoloski (2014). "Generalized framework for context-specific metabolic model extraction methods." Frontiers in plant science 5. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -26- Flowchart Context-specific Reconstruction Methods Optimal choice of methodologies when tackling a context-specific reconstruction problem. The choice can be made by answering a few questions, in a flowchart manner, related to: demand of model extraction and flux prediction, knowledge on a required metabolic functionality, the type of experimental data available or the computational platform. Estevez, S. R. and Z. Nikoloski (2014). "Generalized framework for context-specific metabolic model extraction methods." Frontiers in plant science 5. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -27- Summary of Methods for Context-specific Metabolic Model Extraction. Estevez, S. R. and Z. Nikoloski (2014). "Generalized framework for context-specific metabolic model extraction methods." Frontiers in plant science 5. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -28- Recon 1 Cell-specific Reconstructions Human brain Lewis, N. E., G. Schramm, et al. (2010). "Large-scale in silico modeling of metabolic interactions between cell types in the human brain." Nat Biotechnol 28(12): 1279-1285. • Liver Gille, C., C. Bolling, et al. (2010). "HepatoNet1: a comprehensive metabolic reconstruction of the human hepatocyte for the analysis of liver physiology." Molecular Systems Biology 6: 411. Jerby, L., T. Shlomi, et al. (2010). "Computational reconstruction of tissue-specific metabolic models: application to human liver metabolism." Molecular Systems Biology 6: 401. • Kidney Chang, R. L., L. Xie, et al. (2010). "Drug off-target effects predicted using structural analysis in the context of a metabolic network model." PLoS computational biology 6(9): e1000938. • Aveolar macrophage Bordbar, A., N. E. Lewis, et al. (2010). "Insight into human alveolar macrophage and M. tuberculosis interactions via metabolic reconstructions." Molecular Systems Biology 6: 422. • Red Blood Cells Bordbar, A., N. Jamshidi, et al. (2011). "iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states." BMC systems biology 5: 110. • Human small intestinal epithelial cells Sahoo, S. and I. Thiele (2013). "Predicting the impact of diet and enzymopathies on human small intestinal epithelial cells." Hum Mol Genet 22(13): 27052722. Platelets Thomas, A., S. Rahmanian, et al. (2014). "Network reconstruction of platelet metabolism identifies metabolic signature for aspirin resistance." Sci Rep 4: 3925. • Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -29- Recon 2 Cell-specific Reconstructions • Using Recon 2 Haraldsdottir, H. S., Thiele, I., Fleming, R. M. T.,"Comparative evaluation of open source software for mapping between metabolite identifiers in metabolic network reconstructions: application to Recon 2", Journal of Cheminformatics, 6(2) (2014). • Kidney Quek, L. E., S. Dietmair, et al. (2014). "Reducing Recon 2 for steady-state flux analysis of HEK cell culture." J Biotechnol 184: 172178. • Membrane Transporters Sahoo, S., Aurich, M. K., Jonsson, J. J. and Thiele, I., "Membrane transporters in a human genome-scale metabolic knowledgebase and their implications for disease", Front. Physiol., doi: 10.3389 (2014). Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -30- ‘Google Map’ of Human Metabolism http://medicalxpress.com/news/2013-03-international-consortium-google-human-metabolism.html Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -31- Recon 2–based Models http://humanmetabolism.org/ Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -32- Lesson Outline • Overview • Creating Tissue-specific Models GIMME-like Family iMAT-like Family MBA-like Family • Tissue-specific Example Red Blood Cells Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -33- Red Blood Cells (erythrocytes) • The most common type of blood cell and the vertebrate organism's principal means of delivering oxygen (O2) to the body tissues—via blood flow through the circulatory system. Red blood cells (RBCs) take up oxygen in the lungs or gills and release it into tissues while squeezing through the body's capillaries. • The cytoplasm of erythrocytes is rich in hemoglobin, an iron-containing biomolecule that can bind oxygen and is responsible for the red color of the cells. The cell membrane is composed of proteins and lipids, and this structure provides properties essential for physiological cell function such as deformability and stability while traversing the circulatory system and specifically the capillary network. Scanning electron micrograph of blood cells. From left to right: human erythrocyte, thrombocyte (platelet), leukocyte. • In humans, mature red blood cells are flexible and oval biconcave disks. They lack a cell nucleus and most organelles, in order to accommodate maximum space for hemoglobin. Approximately 2.4 million new erythrocytes are produced per second in human adults. The cells develop in the bone marrow and circulate for about 100–120 days in the body before their components are recycled by macrophages. Each circulation takes about 20 seconds. Approximately a quarter of the cells in the human body are red blood cells. https://en.wikipedia.org/wiki/Red_blood_cell Utah State University Red Blood Cell membrane major proteins BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -34- Building an in silico Red Blood Cell (Erythrocyte) Metabolic Network • The three major required data types are: the human genome sequence, high-throughput data (specifically, proteomics for an enucleated cell), and primary literature • To build the erythrocyte network, iAB-RBC-283, proteomics was used to remove non-erythrocyte related open reading frames (ORFs) or genes. • Detailed curation utilizing protein, metabolite, and transport experimental literature was needed to build a high-quality metabolic reconstruction. • Without network reconstruction and rigorous curation, the experimentally generated proteomic data is raw and difficult to interpret. Bordbar, A., N. Jamshidi, et al. (2011). "iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states." BMC systems biology 5: 110. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -35- Functional Assessment of Model • Flux variability analysis (FVA) was utilized to determine the functional metabolic pathways of the erythrocyte network. • FVA is used to define the bounding box on network capabilities (FVA determines the minimum and maximum allowable flux through each metabolic reaction). Reactions with a calculated non-zero maximum or minimum have the potential to be active and have a potential physiological function. For a reaction to have a non-zero flux, the reaction must be linked to other metabolic reactions and pathways and plays a functional role in the system. Thus, potentially active reactions are deemed as functional. Thus, we use FVA to determine the capability/capacity of the network reactions to determine metabolic functionality. • After determining which reactions were functional, the reaction list was perused to determine pathway and subsystem functionality in the network. Bordbar, A., N. Jamshidi, et al. (2011). "iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states." BMC systems biology 5: 110. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -36- Analyzing iAB-RBC-283 as a functional biomarker • The Morbid Map from the Online Mendelian Inheritance in Man (OMIM) [51] and the DrugBank [52] were downloaded from their respective databases. The enzyme names in iAB-RBC-283 were cross-referenced against the database entries to determine morbid SNPs in erythrocyte proteins and drugs with protein targets in the erythrocyte. The morbid SNPs that did not have sole pathological effects in the erythrocyte were classified using the Merck Manual [43]. • Just as FVA can be used to assess the function of a network under a particular set of constraints, it can also be used to assess the changes in function and thus has applications for characterizing disease states [53] and identifying biomarkers [54]. When simulating a morbid SNP or a drug inhibited enzyme, the lower and upper bound constraints on the affected reaction is set to zero. FVA is then used to characterize the exchange reactions under morbid SNP or drug treated conditions and then compared to the normal state. • A reaction was considered to be confidently altered if the change in the minimum or maximum flux was 40% of the total flux span. The flux span is defined as the absolute difference between the original (unperturbed) maximum and minimum fluxes. Bordbar, A., N. Jamshidi, et al. (2011). "iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states." BMC systems biology 5: 110. Utah State University BE 5500/6500 There are four major differences that can occur for an exchange reaction in two different states: i) the reaction is either active (non-zero minimum or maximum flux) or inactive (zero minimum and maximum flux), ii) the exchange becomes fixed in one direction (uptake or secretion only), iii) there is a magnitude change in exchange, iv) the reaction is unaffected and is the same for both states. Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -37- Topological Map of the Human Erythrocyte Metabolic Network • 283 genes • 292 reactions, • 267 metabolites • 77 transporters Bordbar, A., N. Jamshidi, et al. (2011). "iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states." BMC systems biology 5: 110. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -38- Additional File #2: Detected SNPs and FVA Results for SNP Perturbations OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily. OMIM is authored and edited at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, under the direction of Dr. Ada Hamosh. Its official home is omim.org. Bordbar, A., N. Jamshidi, et al. (2011). "iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states." BMC systems biology 5: 110. Utah State University BE 5500/6500 A reaction was considered to be confidently altered if the change in the minimum or maximum flux was 40% of the total flux span. The flux span is defined as the absolute difference between the original (unperturbed) maximum and minimum fluxes. Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -39- OMIM Database http://omim.org/ Amberger J, Bocchini CA, Scott AF, Hamosh A: McKusick’s Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res 2009, , 37 Database: D793-6. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -40- Additional File #3: Detected Drug Targets and FVA Results for Drug Effect Perturbations A reaction was considered to be confidently altered if the change in the minimum or maximum flux was 40% of the total flux span. The flux span is defined as the absolute difference between the original (unperturbed) maximum and minimum fluxes. Describes relationship between RBC genes and drugs in the drug bank Spreadsheet also includes drug descriptions like: Fluocinonide - A topical glucocorticoid used in the treatment of eczema. [PubChem] Bordbar, A., N. Jamshidi, et al. (2011). "iAB-RBC-283: A proteomically derived knowledge-base of erythrocyte metabolism that can be used to simulate its physiological and patho-physiological states." BMC systems biology 5: 110. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -41- DrugBank Database http://www.drugbank.ca/ Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M: DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 2008, , 36 Database: D901-6. Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -42- RBC Flux Balance Analysis (Default) EX_ac(e) EX_acnam(e) EX_ade(e) EX_adn(e) EX_co2(e) EX_fru(e) EX_fum(e) EX_gal(e) EX_gam(e) EX_glc(e) EX_h(e) EX_h2o(e) EX_hco3(e) EX_lac-L(e) EX_mal-L(e) EX_man(e) EX_nh4(e) EX_pyr(e) ACGAM2E ACGAMK 3.74e-05 -3.74e-05 0.01 -0.01 -5.87112 -0.0075 -0.25 -0.3169 -1e-05 -1.12 8.79668 -6.13121 5.87112 2.92556 0.25 -0.01 4.74e-05 3.74e-05 -3.74e-05 3.74e-05 Utah State University ACNAMt2 ACNMLr ACt2r ADEt ADNt AGDC CO2t ENO FBA FRUt1r FUM FUMtr G6PDA GALK GALT GALU GALt1r GAMt1r GAPD GLCt1r 3.74e-05 3.74e-05 -3.74e-05 -0.01 0.01 3.74e-05 5.87112 2.92556 1.46111 0.0075 0.25 0.25 4.74e-05 0.3169 1000 -1000 0.3169 1e-05 2.92556 1.12 H2Ot HCO3E HCO3_CLt HEX1 HEX10 HEX4 HEX7 Ht KCCt L-LACt2r LDH_L MALt MAN6PI MANt1r NAt NH4t3r NaKt PFK PGI PGK BE 5500/6500 6.13121 5.87112 -5.87112 1.12 1e-05 0.01 0.0075 5.87113 -5.87112 -2.92556 -2.92556 -0.25 0.01 0.01 8.80668 4.74e-05 2.93556 1.46111 1.4369 -2.92556 PGM PGMT PPM PUNP1 PYK PYRt2r RPE RPI TALA TKT1 TKT2 TPI UDPG4E UGLT -2.92556 0.3169 0.01 0.01 2.92556 -3.74e-05 0.00666667 0.00666667 0.00333333 0.00333333 0.00333333 1.46111 -0.3169 -999.683 Objective Function testRBC.m Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -43- RBC Flux Balance Analysis (Default,No Loops) EX_ac(e) EX_acnam(e) EX_ade(e) EX_adn(e) EX_arg-L(e) EX_co2(e) EX_fru(e) EX_gal(e) EX_gam(e) EX_glc(e) EX_h(e) EX_h2o(e) EX_h2o2(e) EX_hco3(e) EX_lac-L(e) EX_man(e) EX_nh4(e) EX_o2(e) EX_ptrc(e) EX_pyr(e) EX_urea(e) 3.74e-05 -3.74e-05 0.01 -0.01 -0.1152 -5.75592 -0.0075 -0.3169 -1e-05 -1.12 10 4.00359 -10 5.87112 2.2663 -0.01 4.74e-05 5 0.1152 0.659295 0.1152 Utah State University ACGAM2E ACGAMK ACNAMt2 ACNMLr ACt2r ADEt ADNt AGDC ARGN ARGt5r CAT CO2t DM_nadh ENO FBA FRUt1r G6PDA GALK GALt1r GAMt1r GAPD -3.74e-05 3.74e-05 3.74e-05 3.74e-05 -3.74e-05 -0.01 0.01 3.74e-05 0.1152 0.1152 5 5.75592 0.659258 2.92556 1.46111 0.0075 4.74e-05 0.3169 0.3169 1e-05 2.92556 GLCt1r H2O2t H2Ot HCO3E HCO3_CLt HEX1 HEX10 HEX4 HEX7 Ht KCCt L-LACt2r LDH_L MAN6PI MANt1r NAt NH4t3r NaKt O2t ORNDC PFK BE 5500/6500 1.12 10 -4.00359 5.87112 -5.87112 1.12 1e-05 0.01 0.0075 7.07445 -5.87112 -2.2663 -2.2663 0.01 0.01 8.80668 4.74e-05 2.93556 -5 0.1152 1.46111 PGI PGK PGM PGMT PPM PTRCt PUNP1 PYK PYRt2r RPE RPI TALA TKT1 TKT2 TPI UDPG4E UGLT UREAt 1.4369 -2.92556 -2.92556 0.3169 0.01 -0.1152 0.01 2.92556 -0.659295 0.00666667 0.00666667 0.00333333 0.00333333 0.00333333 1.46111 -0.3169 0.3169 -0.1152 Objective Function testRBC.m Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis RBC Metabolic Map H. Scott Hinton, 2015 -44- Citric Acid Cycle Pyruvate Metabolism Glycolysis/Gluconeogenesis Pentose Phosphate Pathway Glutamate Metabolism Entire RBC Network Utah State University BE 5500/6500 Lesson: Tissue Specific Models Constraint-based Metabolic Reconstructions & Analysis H. Scott Hinton, 2015 -45- Lesson Outline • Overview • Creating Tissue-specific Models GIMME-like Family iMAT-like Family MBA-like Family • Tissue-specific Example Red Blood Cells Utah State University BE 5500/6500 Lesson: Tissue Specific Models