Chakrabarti Group: Overview of Research and Educational Initiatives
CAPD Meeting, March 11, 2013

Approaches to Molecular Design and Control
Static Optimization
• Molecular Structure/Function Optimization: Enzyme Design (picoseconds, nanometers)
Dynamic Control
• Control of Biochemical Reaction Networks (milliseconds, micrometers)
• Coherent Control of Chemical Reaction Dynamics (femtoseconds, angstroms)

How enzymes work; how to design them?
• What makes them optimal for catalysis, and how to improve them?
• Problem: hyperastronomical sequence space

Catalytic Mechanisms of Enzymes
• DD-peptidase: catalytic nucleophile Ser62; electrostatic stabilizer Lys65; general acid/base Y159
• β-gal: catalytic nucleophile Glu-299; general acid/base Glu-200

The physics in the model: sequence optimization requires accurate energy functions and solvation models
• 10° resolution rotamer library (297 proteins): Xiang, Z. and Honig, B. (2001) J. Mol. Biol. 311, 421-430.
• S-GB continuum solvation: Ghosh, A., Rapp, C.S. & Friesner, R.A. (1998) J. Phys. Chem. B 102, 10983-10990.
• OPLS-AA molecular mechanics force field: Jacobson, M.P., Kaminski, G.A., Rapp, C.S. & Friesner, R.A. (2002) J. Phys. Chem. B 106, 11673-11680.
• Glidescore semiempirical binding affinity scoring function: Friesner, R.A., Banks, J.L., Murphy, R.B., Halgren, T.A. et al. (2004) J. Med. Chem. 47, 1739-1749.

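To put the "hyperastronomical sequence space" problem named above into numbers, here is a minimal sketch. It assumes the 20 standard amino acids are allowed independently at every designed position; the function name and the example lengths are illustrative and do not come from the slides.

```python
def sequence_space_size(n_positions, alphabet_size=20):
    """Number of distinct sequences when each position can independently
    take any of `alphabet_size` residue types (assumption: no restrictions)."""
    return alphabet_size ** n_positions


# Illustrative lengths only: a small active site versus longer stretches.
for n in (10, 100, 300):
    size = sequence_space_size(n)
    print(f"{n} positions: about 10^{len(str(size)) - 1} sequences")
```

Even ten designable positions give roughly 10^13 sequences, which is why exhaustive evaluation is not an option and an optimization formulation is needed.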
A model fitness measure for enzyme sequence optimization
$$J(\mathrm{seq}) = \Delta G_{\mathrm{bind}}(\mathrm{seq}) + \sum_{i=1}^{N-1} \sum_{j>i}^{N} \lambda_{ij} \left[ r_{ij,\mathrm{hbond}} - r_{ij}(\mathrm{seq}) - \delta_{ij}^{2} \right]$$
• $\Delta G_{\mathrm{bind}}(\mathrm{seq})$: enzyme-substrate binding affinity
• Catalytic constraint: interatomic distances $r_{ij} < r_{ij,\mathrm{hbond}}$ (hydrogen-bond distance), written as an equality using the slack variable $\delta_{ij}$
• Minimize J over sequence space
• Represent the dynamical constraint by requiring that the total energy of the complex be minimized for any sequence
• Omits selection pressure for product release

Computational sequence optimization correctly predicts most residues in ligand-binding sites and enzyme active sites
• Streptavidin: native sequence scores –10.04 kcal/mol
• CO2- is the covalent attachment site for biomolecules
• 9 / 10 residues predicted correctly in the top 0.5 kcal/mol of sequences
Chakrabarti, R., Klibanov, A.M. and Friesner, R.A. Computational prediction of native protein ligand-binding and enzyme active site sequences. PNAS, 2005.

Computational active site optimization is structurally accurate to near-crystallographic resolution
[Figure: RMSD to native (Å, axis from 0 to 1.2) for active-site residues Phe120, Asn161, Trp233, Arg285, Thr299, Ser326, Ser62, Lys65, Tyr159]

From Enzyme Design to Bionetwork Control
• Nature has also devised remarkable catalysts through molecular design / evolution
• Maximizing kcat/Km of a given enzyme does not always maximize the fitness of a network of enzymes and substrates
• More generally, modulate enzyme activities in real time to achieve maximal fitness or selectivity of chemical products

The Polymerase Chain Reaction: An example of bionetwork control
• Nobel Prize in Chemistry 1993; one of the most cited papers in Science (12757 citations in Science alone)
• Produces millions of DNA molecules starting from one through temperature cycling
• Used every day in every biochemistry and molecular biology lab (diagnosis, genome sequencing, gene expression, etc.)
• How to automate the choice of temperature cycling protocols?

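The claim that temperature cycling produces millions of molecules from one follows from geometric growth. The sketch below is a simplified illustration, not the group's kinetic model: it assumes each cycle multiplies the copy number by (1 + efficiency), where the per-cycle `efficiency` parameter is a hypothetical quantity introduced here (1.0 means perfect doubling).

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Copy number after `cycles` PCR cycles under idealized geometric growth.

    `efficiency` is an assumed per-cycle fraction of templates that are
    successfully duplicated; it is not a quantity defined in the slides.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


ideal = pcr_copies(1, 20)          # perfect doubling for 20 cycles
lossy = pcr_copies(1, 20, 0.8)     # hypothetical 80% per-cycle efficiency
print(f"ideal: {ideal:.3g} copies, 80% efficiency: {lossy:.3g} copies")
```

Because the final yield depends so strongly on per-cycle efficiency, the choice of temperature protocol in each cycle matters, which motivates treating the protocol as a control variable.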
PCR reaction mechanism
• DNA Melting: $D \rightleftharpoons S_1 + S_2$ (rate constants $k_m^1, k_m^2$)
• Primer Annealing: $S_1 + P_1 \rightleftharpoons S_1P_1$ (rate constants $k_1^1, k_2^1$); $S_2 + P_2 \rightleftharpoons S_2P_2$ (rate constants $k_1^2, k_2^2$)
• Single Strand-Primer Duplex Extension:
  $SP + E \rightleftharpoons E.SP$ (rate constants $k_e, k_{-e}$)
  $E.SP + N \rightleftharpoons [E.SP.N] \xrightarrow{k_{cat}} E.D_1$ (binding rate constants $k_n, k_{-n}$)
  $E.D_1 + N \rightleftharpoons [E.D_1.N] \xrightarrow{k_{cat}} E.D_2$ (binding rate constants $k_n, k_{-n}$)
  ...
  $E.D_N \xrightarrow{k_{cat}'} E + DNA$
• DNA Melting Again: $DNA \rightleftharpoons S_1 + S_2$ (rate constants $k_1^{1t}, k_2^{1t}$)

3/18/2016 School of Chemical Engineering, Purdue University

R. Chakrabarti and C.E. Schutt, Chemical PCR: Compositions for enhancing polynucleotide amplification reactions. US Patent 7,772,383, issued 8-10-10.
R. Chakrabarti and C.E. Schutt, Compositions and methods for improving polynucleotide amplification reactions using amides, sulfones and sulfoxides: II. US Patent 7,276,357, issued 10-2-07.
R. Chakrabarti and C.E. Schutt, US Patent 6,949,368, issued 9-27-05.

Optimal Control of DNA Amplification
$$\min_{T(t)} \left( C_{DNA}(t_f) - C_{DNA}^{\max} \right)^{2} \quad \text{s.t.} \quad \frac{dx}{dt} = f(x, u)$$
$$x = \left[ C_{S_1}, C_{S_2}, \ldots, C_{E.D_1}, \ldots, C_{DNA} \right]^{\mathrm{Tr}}$$
• For an N nucleotide template: 2N + 13 state equations
• Typically N ~ 10^3
R. Chakrabarti et al. Optimal Control of Evolutionary Dynamics, Phys. Rev. Lett., 2008
K. Marimuthu and R. Chakrabarti, Optimally Controlled DNA amplification, in preparation

Optimal control of PCR
[Figure: temperature (°C, axis from 45 to 95) versus time (s, axis from 0 to 140) for Cycle 1 and Cycle 2, with curves for annealing times of 10 s, 12 s and 15 s]
Geometric growth: after 15 cycles, DNA concentrations are
• red: 4×10^-10 M
• blue: 8×10^-9 M
• green: 2×10^-8 M

Chakrabarti Group Educational Initiatives: DecydEd
• DecydEd is an online course consortium with a two-pronged objective:
  1. Offer online education in systems engineering to a broader community of students, researchers, and practitioners around the world
  2.
Deliver, for the first time, fully automated real-time decision-making tools to users that build upon the course material taught
• DecydEd envisions broadening awareness of the latest academic research in systems engineering and educating users on how to apply PSE tools to industrial applications that have traditionally not been addressed using such methods.

DecydEd (cont’d)
• DecydEd offers fully automated tools, based on the content covered in the courses, aimed at solving real-world engineering problems in a host of areas including
  1. Systems Biology
  2. Molecular Design
  3. Financial Engineering
• Target applications include protein engineering, catalyst design, biochemical reaction engineering
• Funded by PMC Group, Inc.

PMC Group Global Operations
Fully integrated group of companies involved in the development, manufacture, marketing and sales of specialty, performance and fine chemicals. Among the world’s top chemical manufacturers in several of these areas.

DecydEd Courses

The DecydEd User Portal
The DecydEd User Portal provides a rich experience to registered students, including simulations, the ability to network with other users (using leading social media platforms), collaborating on homework, viewing lectures, and solving automatically graded homework exercises.

DecydEd Discussion Forum
DecydEd’s expert panel currently consists of professors from top universities including CMU, the University of Chicago, the University of Toronto and the London School of Economics. Students can ask questions and get advice from these experts on a wide range of topics while enrolled in the courses.

DecydEd’s Decision Making Tools in Chemical and Biochemical Engineering
• Molecular Design Example: Protein Engineering involves a high-dimensional search over the space of possible functional groups in an active site.
• DecydEd’s automated protein optimization software will enable any molecular biologist to apply computational protein engineering techniques
• Systems Biology Example: DNA sequencing involves the control of a biochemical reaction network through the choice of temperature profiles in the polymerase chain reaction (PCR).
• DecydEd’s automated PCR control software will enable molecular biologists to apply systems biology in lab experiments through the website
• Most practicing molecular biologists are not trained in the above methods and often do not have access to the latest tools

DecydEd Industry Application Example: Computational Enzyme Design
• Input information: target chemical; desired raw material; existing synthetic pathways; existing biocatalysts
• System, design computationally: Zymzyne™ Computational Design Process (10^30 candidates screened), yielding ~1000 potential candidates with expected catalytic activity
• System, refine experimentally: Zymzyne™ Experimental Optimization (500 candidates screened)
• Output: optimized biocatalyst

Computational Enzyme Design: Enabling renewable chemical manufacturing
• Feedstocks: starches, plant oils, biomass
• DOE Top Value Added Renewable Chemicals: 1,4-diacids (succinic, fumaric and malic acids); 2,5-furan dicarboxylic acid; 3-hydroxypropionic acid; aspartic acid; glucaric acid; glutamic acid; itaconic acid; levulinic acid; 3-hydroxybutyrolactone; glycerol; sorbitol; xylitol/arabinitol
• Products: specialty chemicals, polymers

Enzyme Design Models
Protein structure: loop; sidechain (new algorithms for side chain optimization)
Substrate binding: Glidescore; pose sampling
QM sequence refinement
Classical Sequence Optimization (fixed ligand)
Active site reshaping
• scores desired loop against other low-energy excitations
Reactive chemistry: calculating mutant enzyme reaction rates
• for QM/MM refinement of enzyme design
• speeding up mutant TS searches
Classical Sequence Optimization (free ligand)
• Hierarchical pose screening
• Locates global seq/struct optima for a given active site/ligand combination
• Estimates
“designability” of active site (fixed backbone)

DecydEd Molecular Design Decision-Making
Example of screening a focused library of sequence variants
• 3 permissible mutations identified by modeling at a target position
• 3 positions subject to mutagenesis
• 4^3 mutation combinations = 64 sequence variations
• Synthetic gene assembly and variant library construction via DNA synthesis
• Biological selection of variant library
• New enzymes: improved catalytic turnover, altered substrate selectivity
[Figure: three bar charts with amino acid one-letter codes on the horizontal axis]

DecydEd Systems Biology Models
Hybridization reaction: $S_1 + S_2 \rightleftharpoons D$ (rate constants $k_f, k_r$)
Reaction equilibrium information:
$$K = \frac{k_f}{k_r} = \exp\left( -\frac{\Delta G}{RT} \right)$$
• ΔG: from the nearest neighbor model
Relaxation time (similar to the time constant in process control):
$$\frac{1}{\tau} = k_r + k_f \left( C_{S_1}^{eq} + C_{S_2}^{eq} \right)$$
• τ: relaxation time (theoretical/experimental)
Solve the above equations to obtain the rate constants
K. Marimuthu and R. Chakrabarti, Sequence-Dependent Modeling of DNA Hybridization Kinetics: Deterministic and Stochastic Theory, in preparation

DNA Amplification Control Problem and Cancer Diagnostics
Wild type DNA; mutated DNA

DecydEd Systems Biology Decision-Making Example
Feed the PCR state equations and the objective function (noncompetitive, competitive)

DecydEd launched its business platform, called The Academic Financial Trading Platform (AFTP), in November 2012, with engineering to follow in Summer 2013

The DecydEd Backend Technology
• The DecydEd backend collects the latest simulation, optimization and estimation algorithms from the world’s top research centers
• The DecydEd Model API is an application programming interface (API) that supports integration of a continuous influx of models with optimization and estimation algorithms.
• Instructors from both academia and industry can contribute models built using standard modeling packages (e.g.
AIMMS, GAMS) for use by DecydEd students
• The backend employs MPI-based parallel computing that is massively scalable for large numbers of users, with on-demand deployment of cloud instances
• PMC Group plans to integrate open source mathematical programming and dynamic optimization libraries/solvers such as IPOPT and GLPK with the DecydEd backend

• “f”: linear objective function
• Energy constraint
• Can only have 1 rotamer at each position
• No “impossibles” allowed
• Nonlinear constraint term
Possible collaborations to identify the global optimum for the fitness measure (with pairwise decomposability assumptions, reduced energy model)
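
The annotations above describe a rotamer-selection problem: pick exactly one rotamer per position, exclude disallowed combinations, and minimize an energy that decomposes into single-position and pairwise terms. The following is a minimal sketch under those assumptions. It uses exhaustive enumeration on a toy problem rather than a mathematical programming solver, and the function name, the energy tables, and the `forbidden` set are all hypothetical illustrations, not the group's formulation.

```python
from itertools import product


def best_rotamer_assignment(self_energy, pair_energy, forbidden=frozenset()):
    """Exhaustively search assignments with exactly one rotamer per position.

    self_energy[i][r]          : energy of rotamer r at position i
    pair_energy[(i, r, j, s)]  : interaction energy for i < j (missing means 0)
    forbidden                  : set of (i, r, j, s) combinations that may not
                                 co-occur (the "impossibles")
    """
    n = len(self_energy)
    best, best_energy = None, float("inf")
    # product(...) enforces "one rotamer at each position" by construction.
    for choice in product(*(range(len(e)) for e in self_energy)):
        pairs = [(i, choice[i], j, choice[j])
                 for i in range(n) for j in range(i + 1, n)]
        if any(p in forbidden for p in pairs):
            continue  # skip assignments containing a forbidden pair
        energy = sum(self_energy[i][choice[i]] for i in range(n))
        energy += sum(pair_energy.get(p, 0.0) for p in pairs)
        if energy < best_energy:
            best, best_energy = choice, energy
    return best, best_energy


# Toy data (made up): 3 positions with 2 rotamers each.
self_energy = [[0.0, -1.0], [0.5, -0.5], [0.0, 0.2]]
pair_energy = {(0, 1, 1, 1): 3.0, (1, 1, 2, 0): -1.0}
forbidden = {(0, 0, 2, 0)}
assignment, energy = best_rotamer_assignment(self_energy, pair_energy, forbidden)
print(assignment, energy)
```

Enumeration scales as the product of the rotamer counts, so it is only viable for toy cases; at realistic sizes the same objective and constraints would be handed to an integer programming or global optimization solver, which is where the collaborations mentioned above would come in.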