Discovery and Analysis of Novel Biochemical Transformations
Linda J. Broadbelt
Department of Chemical and Biological Engineering
Northwestern University
Evanston, IL 60208

How Can We Create Products from Natural Resources?
• Concern over dwindling petroleum-based resources is driving exploration of alternative feedstocks (image: www.clemson.edu/edisto/corn/corn.htm)
• Biochemical processes are being explored as alternatives to traditional chemical processes (image: www.timberland.com/.../tim_product_detail.jsp?OID=18298)
Overall reaction: biomass → polysaccharides → monosaccharides (glucose) → ethanol

Reaction Networks of Novel Biochemical Transformations
A reaction network specifies:
• Reactants, intermediates and products
• Thermodynamic parameters (free energy changes, ΔG)
• Kinetic parameters (rate constants, k)
• Reactions
[Figure: example reaction network in which each reaction is labeled with its free energy change ΔG_i and rate constant k_i]

Challenges for Reaction Network Development
• Reactive intermediates have not been detected
• Pathways have not been elucidated experimentally
• Thermodynamic and kinetic parameters are unknown
• Reaction networks are large
• Manual construction is tedious and prone to user bias and errors
Response: computer generation of reaction networks

Elements of Computer-Generated Reaction Networks
• Graph theory
• Reaction matrix operations
• Connectivity scan
• Uniqueness determination
• Property calculation
• Termination criteria
Inputs: reactants, reaction types and reaction rules
[Figure: computer-generated reaction network, each reaction labeled with ΔG_i and k_i]

Bond-Electron Representation Allows Implementation of Chemical Reactions
• Off-diagonal entries (i, j) give the bond order between atoms i and j
• Diagonal entries (i, i) give the number of nonbonded electrons on atom i

Methane (atoms C, H, H, H, H):
$$
\begin{bmatrix}
0&1&1&1&1\\
1&0&0&0&0\\
1&0&0&0&0\\
1&0&0&0&0\\
1&0&0&0&0
\end{bmatrix}
$$
Methyl radical (atoms C, H, H, H; the unpaired electron is the 1 on the carbon diagonal):
$$
\begin{bmatrix}
1&1&1&1\\
1&0&0&0\\
1&0&0&0\\
1&0&0&0
\end{bmatrix}
$$
Ethylene (atoms C, C, H, H, H, H):
$$
\begin{bmatrix}
0&2&1&0&0&1\\
2&0&0&1&1&0\\
1&0&0&0&0&0\\
0&1&0&0&0&0\\
0&1&0&0&0&0\\
1&0&0&0&0&0
\end{bmatrix}
$$

Chemical Reaction as a Matrix Addition Operation: H• + CH4 → •CH3 + H2
Reactant matrices: CH4 (as above) and H•, whose 1 × 1 matrix is [1]. Combined reactant matrix (atoms C, H, H, H, H, H•):
$$
\begin{bmatrix}
0&1&1&1&1&0\\
1&0&0&0&0&0\\
1&0&0&0&0&0\\
1&0&0&0&0&0\\
1&0&0&0&0&0\\
0&0&0&0&0&1
\end{bmatrix}
$$
Reordered reactant matrix, with the reacting atoms first (H, C, H•, H, H, H):
$$
\begin{bmatrix}
0&1&0&0&0&0\\
1&0&0&1&1&1\\
0&0&1&0&0&0\\
0&1&0&0&0&0\\
0&1&0&0&0&0\\
0&1&0&0&0&0
\end{bmatrix}
$$
Reaction operation on the reacting atoms: the reactant submatrix B (H, C, H•) plus the reaction matrix R gives the product submatrix E (H, C•, H):
$$
\underbrace{\begin{bmatrix}0&1&0\\1&0&0\\0&0&1\end{bmatrix}}_{B\ (\mathrm{H,\ C,\ H^{\bullet}})}
+
\underbrace{\begin{bmatrix}0&-1&1\\-1&1&0\\1&0&-1\end{bmatrix}}_{R}
=
\underbrace{\begin{bmatrix}0&0&1\\0&1&0\\1&0&0\end{bmatrix}}_{E\ (\mathrm{H,\ C^{\bullet},\ H})}
$$
Product matrix (atoms H, C•, H, H, H, H), i.e. H2 (first and third atoms) plus the methyl radical:
$$
\begin{bmatrix}
0&0&1&0&0&0\\
0&1&0&1&1&1\\
1&0&0&0&0&0\\
0&1&0&0&0&0\\
0&1&0&0&0&0\\
0&1&0&0&0&0
\end{bmatrix}
$$

Unique Patterns of Observed Enzyme Chemistry
EC i.j.k.l → unique enzyme
• i → main class
• j → functional group
• k → cofactor / cosubstrate
• l (4th level) → specific to the substrate
Tipton, S.B. and Boyce, S. Bioinformatics 16 (2000), 34-40
Kanehisa, M. and Goto, S. Nucleic Acids Research 28 (2000), 27-30

Formulation of Reaction Matrices Using the Enzyme Classification System
The Enzyme Commission (EC) number provides systematic names for enzymes:
EC i.j.k.l → unique enzyme
• i → the main class
• j → the specific functional groups
• k → cofactors
• l → specific to the substrates

Generalized Enzyme Function Examined at the i.j.k Level
• More than 5,000 specific enzyme functions (i.j.k.l)
• Fewer than 250 generalized enzyme functions (i.j.k)
• Novel enzyme functions should be expected through genomic sequencing, proteomics and protein engineering

Example of a Generalized Enzyme Reaction
Generalized enzyme reaction (EC 4.2.1): H-C-C-O-H → C=C + H2O
• EC 4.2.1.2 (fumarate hydratase): malate → fumarate + H2O
• EC 4.2.1.3 (aconitate hydratase): citrate → cis-aconitate + H2O

Matrix Representation of Generalized Enzyme Function (i.j.k)
Generalized enzyme reaction EC 4.2.1, illustrated with malate → fumarate + H2O. The reaction center H-C-C-O-H (atoms H, C, C, O; the oxygen carries 4 nonbonded electrons) becomes C=C + H-O-H. Reactant B plus reaction operator R gives products E:
$$
\underbrace{\begin{bmatrix}0&1&0&0\\1&0&1&0\\0&1&0&1\\0&0&1&4\end{bmatrix}}_{B\ (\text{reactant})}
+
\underbrace{\begin{bmatrix}0&-1&0&1\\-1&0&1&0\\0&1&0&-1\\1&0&-1&0\end{bmatrix}}_{R\ (\text{reaction operator})}
=
\underbrace{\begin{bmatrix}0&0&0&1\\0&0&2&0\\0&2&0&0\\1&0&0&4\end{bmatrix}}_{E\ (\text{products})}
$$

Discovery of Novel Biosynthetic Routes
[Figure: generalized operators (labeled I.J.K, L.M.N and Q.R.S) are applied repeatedly to the growing pool of compounds: generation 0 is A + B; generation 1 adds C; generation 2 adds D; generation 3 adds E]

Implications for Novel Pathway Development
Given a novel reaction (reactant/product), can we identify enzymes (catalysts) that could be engineered (evolved) to carry out this novel biotransformation?
If A gives B under 2.4.1 action, then target enzymes within the 2.4.1 class.

Application of the Reaction Matrix Approach
Step 1: Enumerate all enzymes in the EC system
Step 2: Choose a specific pathway to explore its synthetic ability
Example: aromatic amino acid biosynthesis
• Exists in higher plants and microorganisms
• Pathway does not exist in mammals

Aromatic Amino Acid Biosynthesis: Phenylalanine and Tyrosine
• chorismate → prephenate (chorismate mutase, EC 5.4.99.5)
• prephenate → phenylpyruvate (prephenate dehydratase, EC 4.2.1.51)
• phenylpyruvate → phenylalanine (aromatic aminotransferase, EC 2.6.1.57; amino donor glutamate)
• prephenate → 4-hydroxyphenylpyruvate (prephenate dehydrogenase, EC 1.3.1.12)
• 4-hydroxyphenylpyruvate → tyrosine (aromatic aminotransferase, EC 2.6.1.57; amino donor glutamate)

Reaction Misclassification (?)
Some reactions within classes are not general.
The general 4.2.1 reaction (4 = lyase; 4.2.1 = hydro-lyase) loses water AND forms a double bond.
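The matrix-addition step and the generalized EC 4.2.1 operator can be checked numerically. The sketch below is a minimal illustration, not the authors' network-generation code: it assumes plain Python lists for bond-electron matrices, and the helpers `is_valid_be`, `total_electrons` and `apply_operator` are hypothetical names. It encodes the reaction-center matrices given for H• + CH4 → •CH3 + H2 and for the H-C-C-O-H hydro-lyase center, adds the operator to the reactant matrix, and rejects any result that is asymmetric or contains a negative entry.

```python
"""Minimal sketch of bond-electron (BE) matrix reaction operators.

Assumption: plain Python lists stand in for whatever data structures the
original network-generation code used; all function names are hypothetical.
Off-diagonal entry (i, j) = bond order between atoms i and j;
diagonal entry (i, i) = number of nonbonded electrons on atom i.
"""
from typing import List

Matrix = List[List[int]]


def is_valid_be(m: Matrix) -> bool:
    """True if m is square, symmetric, and has no negative entries."""
    n = len(m)
    return all(len(row) == n for row in m) and all(
        m[i][j] == m[j][i] and m[i][j] >= 0 for i in range(n) for j in range(n)
    )


def total_electrons(m: Matrix) -> int:
    """Each bond appears at (i, j) and (j, i), so the plain sum counts two
    electrons per unit of bond order plus all nonbonded electrons."""
    return sum(sum(row) for row in m)


def apply_operator(b: Matrix, r: Matrix) -> Matrix:
    """Return the product matrix E = B + R for one reaction center.

    Raises ValueError if the operator does not fit the reaction center,
    e.g. it would break a bond that is not there (negative bond order).
    """
    if len(b) != len(r) or any(len(x) != len(y) for x, y in zip(b, r)):
        raise ValueError("operator and reaction center differ in size")
    e = [[x + y for x, y in zip(brow, rrow)] for brow, rrow in zip(b, r)]
    if not is_valid_be(e):
        raise ValueError("operator does not apply to this reaction center")
    return e


METHANE: Matrix = [  # atoms C, H, H, H, H
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]

# H• + CH4 -> •CH3 + H2; reaction-center atoms ordered H, C, H•.
B_ABSTRACTION: Matrix = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
R_ABSTRACTION: Matrix = [[0, -1, 1], [-1, 1, 0], [1, 0, -1]]

# Generalized EC 4.2.1 (hydro-lyase): H-C-C-O-H -> C=C + H2O.
# Reaction-center atoms ordered H, C, C, O; O carries 4 nonbonded electrons.
B_421: Matrix = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 4]]
R_421: Matrix = [[0, -1, 0, 1], [-1, 0, 1, 0], [0, 1, 0, -1], [1, 0, -1, 0]]

E_ABSTRACTION = apply_operator(B_ABSTRACTION, R_ABSTRACTION)
E_421 = apply_operator(B_421, R_421)

if __name__ == "__main__":
    print("CH4 valence electrons:", total_electrons(METHANE))
    for name, b, e in (("H abstraction", B_ABSTRACTION, E_ABSTRACTION),
                       ("EC 4.2.1", B_421, E_421)):
        conserved = total_electrons(b) == total_electrons(e)
        print(f"{name}: {e} (electrons conserved: {conserved})")
```

Because the same 4.2.1 operator is added to any reaction center matching the H-C-C-O-H pattern, a single rule covers both fumarate hydratase (4.2.1.2) and aconitate hydratase (4.2.1.3); a center without the C-H bond yields a negative bond order and `apply_operator` raises `ValueError`. Checking that the total electron count is unchanged is a cheap consistency test on any hand-written operator.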
However… 4.2.1.51 (prephenate dehydratase): prephenate → phenylpyruvate + H2O + CO2
It is both a carboxy-lyase (4.1.1) and a hydro-lyase (4.2.1).

Reaction Decomposition
4.2.1.51 (prephenate dehydratase): prephenate → phenylpyruvate + H2O + CO2
4.2.1.51 can be broken down into three general reactions:
• 4.1.1 decarboxylates (4.1.1 is a carboxy-lyase)
• 5.3.3 rearranges the double bond (5.3.3 transposes C=C bonds)
• 4.2.1 removes H2O and forms a double bond (4.2.1 is a hydro-lyase)

Mapping Results
• Although only 2,500 reactions in KEGG and 269 reactions in the iJR904 model belonged to the curated EC classes, 3,267 (50%) of the KEGG reactions and 430 (46%) of the iJR904 reactions were reproduced using the 86 reaction rules
• The reproduced reactions fall into 129 different third-level enzyme classes in KEGG and iJR904
• All (100%) of the reactions in 25 of the uncurated EC classes in KEGG were mapped to the 86 existing reaction rules

Tryptophan Biosynthesis Pathway
• Input molecules: phosphoenolpyruvate (PEP), erythrose-4-phosphate (E4P), glutamine, serine, ribose-5-phosphate (R5P)
• Cofactors: ATP, NADPH
• Specific enzyme actions: 12

The Evolution and Wealth of the Aromatic Amino Acid Biochemistry
[Figure: number of products (log scale, 1 to 10^5) versus generation number (0 to 15), with curves labeled TRP, PHE and TYR and an annotation marking convergence]

Specialty Organic Chemicals
[Figure: example product, 7-carboxyindole]

3-Hydroxypropanoate from Pyruvate
• 3-Hydroxypropanoate (3HP) is a useful chemical with known biochemical production routes
• Generate all possible compounds and reactions from pyruvate using only the reaction rules involved in the known biosynthetic routes to 3HP
• Generate all possible compounds and reactions from pyruvate using all 86 current reaction rules

Novel Biosynthetic Pathways Discovered: Pyruvate to 3HP
[Figure: distribution of pathway lengths, number of pathways (log scale, 1 to 10^6) versus pathway length (1 to 10), comparing pathways discovered using only the reaction rules involved in the known pathways to 3HP with pathways discovered using all reaction rules]
A pathway of length two and a pathway of length three were both discovered using the additional reaction rules.
[Figure: network of compounds generated from pyruvate, including lactate, lactoyl-CoA, acryloyl-CoA, malonate semialdehyde, β-alanine, aspartate, oxaloacetate and 3HP-CoA, plus by-products such as propanoate, propane-1,3-diol, ethanol, threonine and acrolein; a second panel shows the subset pyruvate, lactate, lactoyl-CoA, acryloyl-CoA, 3HP-CoA, 3-HP, oxaloacetate, aspartate, β-alanine, β-alanyl-CoA and malonate semialdehyde]

What Screening Methods Can We Use to Identify the Most Attractive Pathways?
• Pathway length
• Fewest novel intermediates
• Thermodynamic feasibility
• Maximum achievable yield of 3HP from glucose during anaerobic growth
• Maximum achievable intracellular activity at which 3HP can be produced
• Protein docking calculations
• Quantum chemical investigations

Shortest Novel Pathways to 3HP
[Figure: shortest novel pathways from pyruvate to 3-HP (including pathways labeled N1 and N2) through intermediates such as alanine, aspartate, oxaloacetate, β-alanine, β-alanyl-CoA, malonate semialdehyde, lactate, lactoyl-CoA, acryloyl-CoA, 3-oxopropionyl-CoA and 3-HP-CoA; steps are labeled with third-level EC classes (1.1.1, 2.3.1, 2.6.1, 4.1.1, 4.2.1, 4.3.1), some marked "Rev". Legend: KEGG reaction not in patented pathways; part of the patented pathway; not found in KEGG or patented pathways]

Attractive Novel Pathways Successfully Identified
• Two-step pathway identified with only one novel reaction
• Maximum achievable yield of 3HP from glucose during anaerobic growth matches that of the commercial pathway
• Slightly reduced maximum achievable intracellular activity at which 3HP can be produced
• Numerous other attractive
  candidates

Are These Novel Reactions Feasible?
Decarboxylation of keto acids
PFOR (pyruvate:ferredoxin oxidoreductase, 1.2.7.a): pyruvate + CoA + Fd(ox) → CO2 + acetyl-CoA + Fd(red)
Generalized enzyme operators can act on all of the keto acids considered here to give their corresponding products.
Can the enzyme that catalyzes the decarboxylation of pyruvate also catalyze the reaction of different substrates?

Explore Novel Reactions Using Molecular Modeling
• Substrate binding: docking analysis
• Ability to form the initial enzyme-substrate bound species with no distortion of the enzyme active site or the cofactor: QM/MM structural studies
• Whether the reaction follows the pattern of the native substrate: study of the reaction mechanism using QM methods

Enzyme Docking Results
PFOR (EC 1.2.7), scored using GLIDE (more negative scores indicate more favorable predicted binding)
Substrate                         GLIDE score
pyruvic acid                      -10.7
2-ketobutyric acid                -11.63
2-ketoisovaleric acid             -11.56
2-ketovaleric acid                -11.31
2-keto-3-methylvaleric acid       -11.27
2-keto-4-methylpentanoic acid     -11.01
phenylpyruvic acid                X

Enzyme Docking Poses
[Figure: docking poses 1-6 for pyruvic acid, 2-ketobutyric acid, 2-ketoisovaleric acid, 2-ketovaleric acid, 2-keto-3-methylvaleric acid and 2-keto-4-methylpentanoic acid]

Binding Using Quantum Mechanics/Molecular Mechanics
• MM region: 50 Å of the active site and solvent molecules (~20,000 atoms)
• QM region: 63 atoms
• Geometry: B3LYP/6-31G*

Comparison of Bound Structures of Different Acids: QM/MM
[Figure: QM/MM bound structures of pyruvic acid, 2-ketoisovaleric acid, 2-keto-3-methylvaleric acid and 2-keto-4-methylpentanoic acid]
QM/MM structural studies suggest that binding of these substrates does not distort the active site.

Kinetics of Enzyme-Catalyzed Decarboxylation: Quantum Mechanics
[Figure: mechanism from ThDP ylide + keto acid (KA) through TS1 to LThDP, then through TS2 to the HEThDP enamine]
Ville et al., Nature Chemical Biology, 2(6), 2006, 324

Free Energy Surface of Thiamine-Catalyzed Decarboxylation: Pyruvic Acid
[Figure: relative free energy (kcal/mol, -30 to 15) along the reaction coordinate ThDP + pyruvic acid → TS1 → LThDP → TS2 → enamine + CO2, computed in the gas phase, water and dichloroethane]

Comparison of Thiamine-Catalyzed Decarboxylation
[Figure: free energy barriers (ΔG‡ at 298 K in dichloroethane (DCE), kcal/mol, 0 to 35) for C-C bond formation and C-C bond breaking]

Exploring Novel Pathways and Molecules
• New routes to bioavailable species: 1,3,4,5-tetrahydroxycyclohexanecarboxylic acid (present in KEGG, the Kyoto Encyclopedia of Genes and Genomes)
• New molecules: 3-[1-carboxy-2-(1,4-dihydropyridin-3-yl)ethoxy]-4-hydroxycyclohexa-1,5-dienecarboxylic acid (NOT present in KEGG; NOT present in CAS REGISTRY)

Migration to Biocatalytic Processes
• New biochemical routes to existing chemicals: 1,3,5-trihydroxy-4-oxocyclohexanecarboxylic acid (NOT present in KEGG; present in CAS REGISTRY)

Acknowledgments
Funding
• Department of Energy
• National Science Foundation Cyber-enabled Discovery and Innovation
Collaborators
• Vassily Hatzimanikatis
• Chunhui Li
• Chris Henry
• Goran Krilov
• Raj Assary