SUPPLEMENTARY MATERIAL

Quantitative Prediction of Cellular Metabolism with Constraint-based Models: The COBRA Toolbox

S.1 Installation instructions for the lp_solve linear programming solver on Windows

The instructions below assume that the current version of lp_solve is 5.5.0.10; the same steps should hold for any other version. Note that lp_solve is quite slow and is not suitable for large-scale optimization problems such as flux variability analysis or single/double gene deletion studies.

1. Download and uncompress lp_solve_5.5.0.10_dev.zip from https://sourceforge.net/projects/lpsolve/.
2. Copy lpsolve55.dll into the C:/WINDOWS/system32 directory or any other location that is in the Windows path for your computer.
3. Download and uncompress lp_solve_5.5.0.10_MATLAB_exe.zip from https://sourceforge.net/projects/lpsolve/.
4. Copy the files included in the zip file to a directory that is in your Matlab path (for example, the solvers subfolder under the main COBRA Toolbox folder).
5. Test the lp_solve solver by entering mxlpsolve in the Matlab command window. If the lp_solve libraries are correctly installed, you should see the usage instructions of the mxlpsolve interface.
6. Run ex.m from the command line in Matlab to test the full functionality of lp_solve.
7. If problems occur, follow the troubleshooting instructions in the MATLAB.htm document that is included in lp_solve_5.5.0.10_MATLAB_exe.zip.
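In addition to steps 5 and 6, one quick way to confirm that the mxlpsolve driver can build and solve a problem is to run a tiny LP by hand. The sketch below assumes the installation above succeeded. It uses a made-up two-variable problem that is not part of the COBRA Toolbox. The calls follow the mxlpsolve usage described in MATLAB.htm, so consult that document if your lp_solve version behaves differently.

```matlab
% Smoke test for the lp_solve Matlab driver (mxlpsolve).
% Toy problem (illustrative only, not part of the COBRA Toolbox):
%   maximize   x1 + x2
%   subject to x1 + 2*x2 <= 4
%              3*x1 + x2 <= 6
%              x1, x2 >= 0   (lp_solve's default lower bound is 0)

if ~exist('mxlpsolve', 'file')
    error('mxlpsolve was not found on the Matlab path (see step 4).');
end

lp = mxlpsolve('make_lp', 0, 2);                % start with 0 rows, 2 columns
mxlpsolve('set_verbose', lp, 1);                % report critical messages only
mxlpsolve('set_obj_fn', lp, [1, 1]);            % objective coefficients
mxlpsolve('set_maxim', lp);                     % lp_solve minimizes by default
mxlpsolve('add_constraint', lp, [1, 2], 1, 4);  % constraint type 1 means <=
mxlpsolve('add_constraint', lp, [3, 1], 1, 6);

status   = mxlpsolve('solve', lp);              % 0 indicates an optimal solution
objValue = mxlpsolve('get_objective', lp);
x        = mxlpsolve('get_variables', lp);
mxlpsolve('delete_lp', lp);                     % release the model's memory

% The optimum of this toy problem is x = (1.6, 1.2) with objective 2.8.
if status == 0 && abs(objValue - 2.8) < 1e-6
    fprintf('lp_solve OK: x = (%g, %g), objective = %g\n', x(1), x(2), objValue);
else
    warning('Unexpected result from lp_solve (status %d).', status);
end
```

If the script stops at the first check, the driver files are not on the Matlab path. If it fails at the first mxlpsolve call, lpsolve55.dll is most likely not on the Windows path (step 2).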
S.2 Description of the model structure used by the COBRA Toolbox

The model structure used by the COBRA Toolbox contains the following fields:

rxns: A list of all reaction abbreviations, in the same order as they appear in the stoichiometric matrix (columns)
mets: A list of all metabolite abbreviations in the model, in the same order as they appear in the stoichiometric matrix (rows)
S: The stoichiometric matrix in sparse format
lb: The lower bound corresponding to each reaction, in order
ub: The upper bound corresponding to each reaction, in order
c: The relative weight of each reaction in the objective function; often a single one in the position corresponding to the biomass reaction and zeros elsewhere
subSystem: The metabolic subsystem for each reaction
rules: Boolean rules for each reaction describing the gene-reaction relationship. For example, ‘gene1 and gene2’ indicates that the two gene products are part of an enzyme complex, whereas ‘gene1 or gene2’ indicates that the two gene products are isozymes that catalyze the same reaction.
genes: The names of all the genes included in the model
rxnGeneMat: A matrix with as many rows as there are reactions in the model and as many columns as there are genes in the model. The entry in the ith row and jth column is one if the jth gene in genes is associated with the ith reaction in rxns, and zero otherwise.

S.3 Description of the SBML file structure (Level 2 version 1)

The SBML files generated in this work follow the standards outlined at http://sbml.org/documents/ for SBML Level 2 version 1, and contain the following types of data:

• reactions (format: ‘R_<reaction abbreviation>’)
  o reaction name, reversibility, reaction stoichiometry, gene-protein-reaction (GPR) association, subsystem, E.C. number
• metabolites (format: ‘M_<metabolite abbreviation>_<compartment abbreviation>’)
  o metabolite name, compartment, charge, formula (appended to the end of the name, <metabolite name>_FORMULA)
• a flux distribution associated with a steady-state modeling simulation
  o lower bound, upper bound, objective coefficient, flux value, reduced cost

Notes on the specific SBML format developed for this toolbox:

• For a reaction, the notes field is used to provide the gene association, protein association, subsystem and protein class.
• The lower and upper bounds on a reaction, the objective coefficient for the presented simulation, the flux value for that particular simulation and the reduced cost are all parameters assigned to the flux value.

Metabolites are listed in the <listOfSpecies> section of the SBML file in the following format (example for dihydroxyacetone phosphate):

    <listOfSpecies>
      …
      <species id="M_dhap_c" name="M_Dihydroxyacetone_phosphate_C3H5O6P" compartment="Cytosol" charge="-2" boundaryCondition="false"/>
      …
    </listOfSpecies>

Reactions are listed in the <listOfReactions> section of the SBML file in the following format (example for triose phosphate isomerase):

    <listOfReactions>
      …
      <reaction id="R_TPI" name="R_triose_phosphate_isomerase" reversible="true">
        <notes>
          <html:p>GENE_ASSOCIATION: b3919</html:p>
          <html:p>PROTEIN_ASSOCIATION: Tpi</html:p>
          <html:p>SUBSYSTEM: S_GlycolysisGluconeogenesis</html:p>
          <html:p>PROTEIN_CLASS: 5.3.1.1</html:p>
        </notes>
        <listOfReactants>
          <speciesReference species="M_dhap_c" stoichiometry="1.000000"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="M_g3p_c" stoichiometry="1.000000"/>
        </listOfProducts>
        <kineticLaw>
          <math xmlns="http://www.w3.org/1998/Math/MathML">
            <apply>
              <ci> LOWER_BOUND </ci>
              <ci> UPPER_BOUND </ci>
              <ci> OBJECTIVE_COEFFICIENT </ci>
            </apply>
          </math>
          <listOfParameters>
            <parameter id="LOWER_BOUND" value="-1000" units="mmol_per_gDW_per_hr"/>
            <parameter id="UPPER_BOUND" value="1000" units="mmol_per_gDW_per_hr"/>
            <parameter id="OBJECTIVE_COEFFICIENT" value="0.000000"/>
          </listOfParameters>
        </kineticLaw>
      </reaction>
      …
    </listOfReactions>

The reactants of a reaction are listed in the <listOfReactants> section, whereas the products are listed in the <listOfProducts> section. Parameters for each reaction (lower and upper bounds, objective coefficients) are listed in the <kineticLaw> section. The gene-protein-reaction associations and subsystems for each reaction are listed in the <notes> section.

S.4 Adding the capability to use other LP solvers with the COBRA Toolbox

The interfaces to LP solvers are defined in the solveLPStm.m function, which is called by all other functions in the Toolbox that use LP solvers. A new solver can be added by inserting a new section into the ‘switch-case’ statement that selects the solver the user wants to use. For example, the following segment allows the user to use the TOMLAB CPLEX solver (Tomlab Optimization Inc., San Diego, CA) by setting the CBTLPSOLVER global variable to ‘tomlab_cplex’:

    case 'tomlab_cplex'
        % Convert constraint senses into lower/upper limits on A*x
        if (~isempty(csense))
            % Equality: both limits equal b
            b_L(csense == 'E') = b(csense == 'E');
            b_U(csense == 'E') = b(csense == 'E');
            % Greater than or equal: large upper limit stands in for infinity
            b_L(csense == 'G') = b(csense == 'G');
            b_U(csense == 'G') = 1e6;
            % Less than or equal: large negative lower limit stands in for -infinity
            b_L(csense == 'L') = -1e6;
            b_U(csense == 'L') = b(csense == 'L');
        else
            % No senses given: treat every constraint as an equality
            b_L = b;
            b_U = b;
        end
        % Set up and solve the problem with the external solver
        tomlabProblem = lpAssign(osense*c,A,b_L,b_U,lb,ub);
        Result = tomRun('cplex', tomlabProblem, 0);
        % Convert solver output into the variables used by the Toolbox
        x = Result.x_k;
        f = osense*Result.f_k;
        stat = Result.Inform;
        w = Result.v_k(1:length(lb));        % multipliers for the variable bounds
        y = Result.v_k((length(lb)+1):end);  % multipliers for the constraints
        % Map the solver status code to the Toolbox solution status
        if (stat == 1)
            solStat = 1;
        elseif (stat == 3)
            solStat = 0;
        elseif (stat == 2 || stat == 4)
            solStat = 2;
        else
            solStat = -1;
        end

The key parts of this interface code are converting the description of the LP problem to a format suitable for the particular solver, calling the appropriate external functions with the correct arguments, assigning the correct solution status to be returned to the toolbox, and converting
variables returned by the function into those used by the Toolbox. Examples of how an interface is built for a number of solvers can be found in the solveLPStm.m function. Note that certain solvers may require calling multiple external functions. In this case, it may be preferable to write an additional solver interface function that handles these function calls (these can be placed in the ‘solvers’ folder).

S.5 Linear MOMA approach

The COBRA Toolbox includes an implementation of a linear version of the standard Minimization of Metabolic Adjustment (MOMA) approach introduced by Segre et al. [1]. MOMA finds the flux distribution ($v^{del}$) and growth rate ($\mu^{del}$) for a gene deletion strain by solving the following quadratic programming problem:

\[
\begin{aligned}
& \mu^{del} = c^T v^{del} \\
& \min \sum_i \left( v_i^{del} - v_i^{wt} \right)^2 \\
& S v^{del} = 0 \\
& v_{i,\min}^{del} \le v_i^{del} \le v_{i,\max}^{del}
\end{aligned}
\tag{S.1}
\]

Here the wild type flux distribution $v^{wt}$ is found by solving a separate FBA problem:

\[
\begin{aligned}
& \mu^{wt} = \max c^T v^{wt} \\
& S v^{wt} = 0 \\
& v_{i,\min}^{wt} \le v_i^{wt} \le v_{i,\max}^{wt}
\end{aligned}
\tag{S.2}
\]

In the linear version of the MOMA approach (linearMOMA), the same FBA problem (S.2) is first solved for the wild type strain model to obtain the wild type growth rate $\mu^{wt}$. To obtain the deletion strain flux distribution and growth rate, the following optimization problem is then solved:

\[
\begin{aligned}
& \mu^{del} = c^T v^{del} \\
& \min \sum_i \left| v_i^{del} - v_i^{wt} \right| \\
& S v^{del} = 0 \\
& v_{i,\min}^{del} \le v_i^{del} \le v_{i,\max}^{del} \\
& S v^{wt} = 0 \\
& v_{i,\min}^{wt} \le v_i^{wt} \le v_{i,\max}^{wt} \\
& \mu^{wt} = c^T v^{wt}
\end{aligned}
\tag{S.3}
\]

This problem can be solved as a standard linear programming problem using the approach described in [2]. The differences between MOMA and linearMOMA are 1) the use of a 1-norm objective function (absolute value) as opposed to a 2-norm objective (Euclidean norm), and 2) solving the wild-type and deletion strain problems simultaneously (the last three constraints in S.3) in order to avoid problems with alternative optimal flux distributions when solving the FBA problem (S.2). The linear MOMA approach was first proposed as an alternative to MOMA by Burgard et al. [3].

References

1. Segre, D., D. Vitkup, and G.M. Church, Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci U S A, 2002. 99(23): p. 15112-7.
2. Herrgard, M.J., S.S. Fong, and B.O. Palsson, Identification of genome-scale metabolic network models using experimentally measured flux profiles. PLoS Computational Biology, 2006. 2(7): p. e72.
3. Burgard, A.P., P. Pharkya, and C.D. Maranas, Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol Bioeng, 2003. 84(6): p. 647-57.
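Note on Section S.5: solving problem (S.3) as a standard linear program depends on removing the absolute values from the objective. One common reformulation is sketched below. It is an assumption here, since the exact construction used in [2] is not reproduced in this supplement, and the auxiliary variables $\delta_i$ are names introduced only for illustration.

```latex
\[
\begin{aligned}
& \min \sum_i \delta_i \\
& \delta_i \ge v_i^{del} - v_i^{wt} \\
& \delta_i \ge -\left( v_i^{del} - v_i^{wt} \right) \\
& S v^{del} = 0, \qquad v_{i,\min}^{del} \le v_i^{del} \le v_{i,\max}^{del} \\
& S v^{wt} = 0, \qquad v_{i,\min}^{wt} \le v_i^{wt} \le v_{i,\max}^{wt} \\
& \mu^{wt} = c^T v^{wt}
\end{aligned}
\]
```

Because each $\delta_i$ is minimized and bounded below by both signs of the difference, it equals $|v_i^{del} - v_i^{wt}|$ at the optimum. The objective value therefore matches that of (S.3), while every constraint and the objective are linear in $(v^{del}, v^{wt}, \delta)$.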