SUPPORTING INFORMATION

Physics-Based Enzyme Design: Predicting Binding Affinity and Catalytic Activity

Sarah Sirin,1 David A. Pearlman,2 and Woody Sherman2*

1 Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02140, United States
2 Schrödinger, Inc., 120 West 45th Street, New York, New York 10036, United States

* Corresponding author: Phone: +1 212 295 5800, Fax: +1 212 295 5801, E-mail: woody.sherman@schrodinger.com.

Figure S1. Step-by-step illustration of the computational workflow used to predict <ΔAffinity> and rank designs.

Model native protein/enzyme with target substrate
  1. Prepare crystal structure using Protein Preparation Wizard
     A. Add missing side chains and hydrogen atoms
     B. Determine protonation states using PROPKA
     C. Optimize hydrogen bonding network
     D. Minimize system using Impref
  2. Prepare substrate with LigPrep
  3. Determine substrate binding mode with Glide docking

Generate ensemble of native-like conformations
  1. Minimize the system using Prime
  2. Add TIP3P waters and counter ions using System Builder
  3. Run metadynamics simulation for 5 ns using Desmond
  4. Cluster the production run trajectory using active site RMSD

  1. Select ensemble using cluster centers
  2. Filter unreactive poses (distance between reactive atoms)

Generate mutant variants for each ensemble conformation
  1. Predict structures for enzyme variants using Residue Scanning
     A. Refine only the mutated residue
     B. Refine mutated residue and surrounding amino acids (i.e. residues within 5 Å of the mutated residue)

Score and Rank
  1. For each mutant and wild-type ensemble
     A. Determine Boltzmann weighted affinity using MM-GBSA
     B. Compute <ΔAffinity> for each mutant
  2. Rank order variants based on <ΔAffinity> score

Figure S2. DIG binder predictions: Scatter plot of calculated vs. experimental ligand-protein affinities for DIG binding protein variants.
The Pearson correlation coefficient between the computed and experimental binding affinities is 0.84 (R² = 0.70). (Amino acids within 5 Å of the mutated residues were refined.)

Figure S3. Predicted structures for DIG binder variants: Illustration of the X-ray crystal structures of (a, d) 1S1Z, (b, e) DIG10.2 (PDB ID: 4J8T), and (c, f) DIG10.3 (PDB ID: 4J9A). The substrate DIG is shown in ball-and-stick representation, while the mutated residues are shown as sticks. The N- and C-terminal sequences were not defined in chain A of the X-ray crystal structures. The all-atom RMSD was 0.76 Å between 1S1Z and DIG10.2 (chain A) and 0.83 Å between 1S1Z and DIG10.3 (chain A). DIG10.2 included substitutions at the following residues: 7, 10, 34, 37, 41, 61, 62, 64, 90, 99, 117, 119, 124 and 127, while DIG10.3 included substitutions at the following sites: 7, 10, 23, 34, 37, 41, 61, 62, 64, 90, 92, 99, 103, 105, 117, 119, 124 and 127. [Panels A to F]

Figure S4. DIG binder B-factors: Illustration of average backbone B-factors for the DIG10.2 (4J8T) and DIG10.3 (4J9A) X-ray crystal structures. For simplicity, only chain A is illustrated.

Figure S5. KE07 reaction distance: Illustration of the key reactive distance in Kemp elimination. The labeled distance is averaged over 11 structures that were extracted from the MD simulation.

Figure S6. KE07 predictions: Scatter plots of the calculated ΔAffinity and the experimentally observed kinetics, KM (mM), kcat (s⁻¹), and kcat/KM (s⁻¹ M⁻¹), for designed KE variants. (Amino acids within 5 Å of the mutated residues were refined.) Axis label: <ΔAffinity> (kcal/mol).

Figure S7. α-Gliadin peptidase analysis: Histogram analysis of (A) experimentally determined fold improvement in protease activity and (B) number of concerted amino acid substitutions per protease variant. 38 variants had a fold improvement in activity of less than 10 (red) and were grouped as inactive, while 57 had a fold improvement in activity of greater than 10 and were grouped as active.

Figure S8.
α-Gliadin peptidase enrichment curve: Receiver operating characteristic (ROC) curves illustrating the ability of ΔAffinity to discriminate between active and inactive kumamolisin variants, corresponding to reactant (blue) and transition states (green). An active variant was defined as an enzyme with greater than 10-fold experimental improvement towards PQ peptide compared with the wild-type enzyme, as measured in crude cell lysate. Only mutable amino acids within 5 Å of the mutated residues were refined.

Figure S9. α-Gliadin peptidase predictions: Scatter plots of the calculated ΔAffinity and the experimentally observed improvement in enzyme activity for variants that were purified, sequenced and assayed for activity. The red horizontal line at -3 kcal/mol is drawn to illustrate the computational cut-off used to classify predictions as "active". Axis label: <ΔAffinity> (kcal/mol).

Table S1. DIG binder prediction correlations for different MD structures: List of individual correlations (R and R²) between the predicted ΔAffinity score and the experimental change in binding affinity for the steroid digoxigenin (DIG). A 150 amino acid long protein with unknown function was used as the starting scaffold, and the structures numbered 1 through 24 are representative wild-type conformations generated using MD. A 0 Å refinement shell indicates that only the mutated residues were optimized, while a shell of 5 Å reflects the predictions when amino acids within this sphere were also refined.
                      Refinement Shell = 0 Å    Refinement Shell = 5 Å
Structure #           R       R²                R       R²
1                     0.84    0.71              0.38    0.14
2                     0.87    0.76              0.65    0.42
3                     0.78    0.61              0.72    0.51
4                     0.90    0.81              0.79    0.63
5                     0.94    0.89              0.96    0.92
6                     0.87    0.76              0.81    0.66
7                     0.86    0.74              0.89    0.79
8                     0.93    0.86              0.89    0.79
9                     0.91    0.83              0.91    0.83
10                    0.93    0.86              0.83    0.69
11                    0.88    0.77              0.83    0.69
12                    0.91    0.82              0.82    0.68
13                    0.94    0.89              0.40    0.16
14                    0.92    0.85              0.39    0.15
15                    0.90    0.81              0.78    0.61
16                    0.89    0.79              0.92    0.85
17                    0.92    0.84              0.63    0.40
18                    0.85    0.72              0.83    0.69
19                    0.85    0.72              0.83    0.70
20                    0.94    0.89              0.83    0.68
21                    0.89    0.79              0.65    0.42
22                    0.90    0.80              0.77    0.60
23                    0.93    0.87              0.59    0.34
24                    0.94    0.89              0.93    0.86
Average               0.90    0.80              0.75    0.59
Boltzmann weighted    0.90    0.80              0.84    0.70

Table S2. KE07 prediction correlations for different MD structures: List of individual Pearson correlations (R) between the predicted ΔAffinity score and the experimentally determined Michaelis constant (KM), catalytic turnover (kcat) and catalytic efficiency (kcat/KM). Structures numbered 1 through 12 are representative structures generated using MD. (Amino acids within 5 Å of the mutated residues were refined.)

Structure #           ln(KM) (mM)    ln(kcat) (s⁻¹)    ln(kcat/KM) (s⁻¹ M⁻¹)
1                     0.08           -0.63             -0.73
2                     0.06           -0.60             -0.72
3                     0.12           -0.61             -0.75
4                     0.28           -0.34             -0.46
5                     0.10           -0.62             -0.74
6                     0.08           -0.64             -0.78
7                     0.16           -0.62             -0.75
8                     0.11           -0.51             -0.63
9                     0.03           -0.56             -0.65
10                    0.32           0.17              0.09
11                    0.42           0.54              0.48
Average               0.16           -0.40             -0.51
Boltzmann weighted    0.14           0.63              -0.76

Table S3. α-Gliadin peptidase enrichments for different MD structures: List of individual area under the ROC curve (AUC) metrics for predicting PQLP peptidase activity as measured in crude cell lysate, where the starting scaffold was kumamolisin in complex with PQLP peptide. Predictions were made using two reactive states, namely the reactant and transition state configurations.
Predictions were made under two conditions: first, where only the mutated residues were refined, and second, where both the mutated residue and amino acids within 5 Å of the mutated residues were refined.

                      Side chain repacking (0 Å)       Side chain repacking (5 Å)
Structure #           Reactant    Transition State     Reactant    Transition State
1                     0.34        0.30                 0.37        0.3
2                     0.33        0.40                 0.33        0.46
3                     0.29        0.45                 0.34        0.41
4                     0.36        0.64                 0.32        0.64
5                     0.79        0.45                 0.77        0.37
6                     0.33        0.65                 0.34        0.42
7                     0.30        0.71                 0.34        0.67
8                     0.32        0.44                 0.37        0.31
9                     0.37        0.30                 0.35        0.31
10                    0.47        0.31                 0.53        0.29
11                    0.36        0.32                 0.34        0.31
12                    0.36        0.35                 0.41        0.34
13                    0.75        0.35                 0.73        0.31
14                    0.71        0.73                 0.73        0.73
15                    0.41        0.50                 0.48        0.43
16                    0.66        0.32                 0.71        0.38
Average               0.45        0.45                 0.47        0.42
Boltzmann weighted    0.37        0.69                 0.38        0.70

Table S4. α-Gliadin peptidase truth table: Truth table for identifying active vs. inactive enzyme variants of kumamolisin towards PQLP peptide, when the transition state structures were used. An active enzyme variant was defined as having greater than 10-fold experimental improvement in activity towards PQLP peptide compared with wild type. Any computational prediction with a calculated ΔAffinity of more than ±3 (1) kcal/mol was considered a positive hit. The sensitivity for accurately predicting an active enzyme variant is around 75 percent. (Amino acids within 5 Å of the mutated residues were refined.)

                      Predicted
Actual                Active         Not-active
Active (38)           TP: 28 (30)    FN: 10 (8)
Not-active (57)       FP: 31 (35)    TN: 26 (22)
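The sensitivity quoted for Table S4 follows directly from the truth-table counts. The sketch below recomputes it, together with specificity and precision, from the counts given in the table. The class and variable names are hypothetical and chosen only for illustration. Treating the parenthesized counts as the results for the parenthesized 1 kcal/mol cut-off is an assumption; the table does not state this explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a 2x2 truth table (hypothetical helper, not from the paper)."""

    tp: int  # actual active, predicted active
    fn: int  # actual active, predicted not-active
    fp: int  # actual not-active, predicted active
    tn: int  # actual not-active, predicted not-active

    def sensitivity(self) -> float:
        # Fraction of actual actives that were predicted active.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of actual not-actives that were predicted not-active.
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        # Fraction of predicted actives that are actually active.
        return self.tp / (self.tp + self.fp)


# Counts transcribed from Table S4 (primary cut-off).
PRIMARY = ConfusionCounts(tp=28, fn=10, fp=31, tn=26)

# Parenthesized counts from Table S4.
# ASSUMPTION: these correspond to the parenthesized 1 kcal/mol cut-off.
PARENTHETICAL = ConfusionCounts(tp=30, fn=8, fp=35, tn=22)


if __name__ == "__main__":
    for label, counts in (("primary", PRIMARY), ("parenthetical", PARENTHETICAL)):
        print(
            f"{label}: sensitivity={counts.sensitivity():.2f}, "
            f"specificity={counts.specificity():.2f}, "
            f"precision={counts.precision():.2f}"
        )
```

With the primary counts the sensitivity is 28/38, roughly 0.74, in line with the "around 75 percent" stated in the table caption.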