SUPPORTING INFORMATION
Physics-Based Enzyme Design: Predicting Binding Affinity And
Catalytic Activity
Sarah Sirin,1 David A. Pearlman,2 and Woody Sherman2*
1 Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02140, United States
2 Schrödinger, Inc., 120 West 45th Street, New York, New York 10036, United States
* Corresponding author: Phone: +1 212 295 5800, Fax: +1 212 295 5801,
E-mail: woody.sherman@schrodinger.com.
Figure S1. Step-by-step illustration of the computational workflow used to predict <Affinity> and rank
designs.
Model native protein/enzyme with target substrate:
  1. Prepare crystal structure using Protein Preparation Wizard
     A. Add missing side chains and hydrogen atoms
     B. Determine protonation states using PROPKA
     C. Optimize hydrogen bonding network
     D. Minimize system using Impref
  2. Prepare substrate with LigPrep
  3. Determine substrate binding mode with Glide docking
Generate ensemble of native-like conformations:
  1. Minimize the system using Prime
  2. Add TIP3P waters and counter ions using System Builder
  3. Run metadynamics simulation for 5 ns using Desmond
  4. Cluster the production run trajectory using active site RMSD
     1. Select ensemble using cluster centers
     2. Filter unreactive poses (distance between reactive atoms)
Generate mutant variants for each ensemble conformation:
  1. Predict structures for enzyme variants using Residue Scanning
     A. Refine only the mutated residue
     B. Refine mutated residue and surrounding amino acids (i.e., residues within 5 Å of the mutated residue)
Score and Rank:
  1. For each mutant and wild-type ensemble
     A. Determine Boltzmann weighted affinity using MM-GBSA
     B. Compute <ΔAffinity> for each mutant
  2. Rank order variants based on <ΔAffinity> score
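The scoring and ranking stage of the workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the workflow does not say which energy defines the Boltzmann weights, so the sketch weights each conformation by its own affinity value, and it assumes kT of about 0.593 kcal/mol (roughly 298 K). The function names, variant labels, and affinity values are all hypothetical.

```python
import math

KT_KCAL_MOL = 0.593  # kT at roughly 298 K, in kcal/mol (assumed temperature)


def boltzmann_weights(energies, kt=KT_KCAL_MOL):
    """Normalized Boltzmann weights for a list of energies in kcal/mol."""
    e_min = min(energies)  # shift by the minimum to avoid overflow in exp()
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def boltzmann_weighted_affinity(affinities, kt=KT_KCAL_MOL):
    """Ensemble affinity, weighting each conformation by its own affinity.

    Assumption: the weighting energy is the per-conformation affinity itself.
    """
    weights = boltzmann_weights(affinities, kt)
    return sum(w * a for w, a in zip(weights, affinities))


def rank_variants(mutant_affinities, wild_type_affinities, kt=KT_KCAL_MOL):
    """Return (variant, <ΔAffinity>) pairs, most negative (best) first."""
    wt = boltzmann_weighted_affinity(wild_type_affinities, kt)
    deltas = {
        name: boltzmann_weighted_affinity(values, kt) - wt
        for name, values in mutant_affinities.items()
    }
    return sorted(deltas.items(), key=lambda item: item[1])


# Hypothetical affinities (kcal/mol), one value per ensemble conformation.
wild_type = [-50.0, -48.5, -49.2]
mutants = {
    "A10V": [-53.1, -51.0, -52.4],
    "S34G": [-47.9, -48.8, -46.5],
}
ranking = rank_variants(mutants, wild_type)
for name, delta in ranking:
    print(f"{name}: <dAffinity> = {delta:+.2f} kcal/mol")
```

In this sketch a negative <ΔAffinity> means the variant is predicted to bind more favorably than wild type, so the best candidates appear first in the ranking.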
Figure S2. DIG binder predictions: Scatter plot of calculated vs. experimental ligand-protein affinities for
DIG binding protein variants. The Pearson correlation coefficient between the computed and
experimental binding affinity is 0.84 (R² = 0.70). (Amino acids within 5 Å of the mutated residues were
refined.)
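For readers who want to reproduce this kind of statistic, the following is a minimal sketch of the Pearson correlation coefficient and its square. The function name and the data values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical calculated vs. experimental affinity changes (kcal/mol).
calculated = [-3.2, -1.1, 0.4, 2.0]
experimental = [-2.5, -1.5, 0.1, 1.2]
r = pearson_r(calculated, experimental)
r_squared = r ** 2
print(f"R = {r:.2f}, R^2 = {r_squared:.2f}")
```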
Figure S3. Predicted structures for DIG binder variants: Illustration of (a, d) 1S1Z, (b, e) DIG10.2 (PDB ID:
4J8T), and (c, f) DIG10.3 (PDB ID: 4J9A) X-ray crystal structures. Substrate DIG is shown in ball-and-stick
representation, while the mutated residues are shown as sticks. The N- and C-terminal
sequences were not defined in chain A of the X-ray crystal structures. The all-atom RMSD was 0.76 Å between
1S1Z and DIG10.2 (chain A) and 0.83 Å between 1S1Z and DIG10.3 (chain A). DIG10.2 included
substitutions at the following residues: 7, 10, 34, 37, 41, 61, 62, 64, 90, 99, 117, 119, 124, and 127, while
DIG10.3 included substitutions at the following sites: 7, 10, 23, 34, 37, 41, 61, 62, 64, 90, 92, 99, 103, 105,
117, 119, 124, and 127.
Figure S4. DIG binder B-factors: Illustration of average backbone B-factors for DIG10.2 (4J8T) and
DIG10.3 (4J9A) X-ray crystal structures. For simplicity only chain A is illustrated.
Figure S5. KE07 reaction distance: Illustration of the key reactive distance in Kemp elimination. The labeled
distance is averaged over 11 structures that were extracted from the MD simulation.
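Averaging a reactive distance over a set of MD snapshots, and discarding snapshots where the reactive atoms are too far apart, can be sketched as follows. The coordinates, the 4.0 Å cutoff, and the function names are hypothetical placeholders and are not values from the study.

```python
import math


def distance(a, b):
    """Euclidean distance between two 3D points given as (x, y, z)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def mean_reactive_distance(snapshots):
    """Average distance between the two reactive atoms over all snapshots."""
    return sum(distance(a, b) for a, b in snapshots) / len(snapshots)


def filter_reactive(snapshots, cutoff):
    """Keep only snapshots whose reactive distance is within the cutoff."""
    return [(a, b) for a, b in snapshots if distance(a, b) <= cutoff]


# Hypothetical coordinates (Å) of the two reactive atoms in three snapshots.
snapshots = [
    ((0.0, 0.0, 0.0), (3.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (0.0, 5.0, 0.0)),
    ((1.0, 1.0, 1.0), (1.0, 1.0, 4.5)),
]
HYPOTHETICAL_CUTOFF = 4.0  # Å, an assumed threshold for a "reactive" pose
mean_d = mean_reactive_distance(snapshots)
reactive = filter_reactive(snapshots, HYPOTHETICAL_CUTOFF)
print(f"mean distance = {mean_d:.2f} Å, reactive poses = {len(reactive)}")
```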
Figure S6. KE07 predictions: Scatter plots of the calculated ΔAffinity and experimentally observed
kinetics, KM (mM), kcat (s⁻¹), and kcat/KM (s⁻¹ M⁻¹), for designed KE variants. (Amino acids within 5 Å of
the mutated residues were refined.)
[Axis label: <ΔAffinity> (kcal/mol)]
Figure S7. α-Gliadin peptidase analysis: Histogram analysis of (A) experimentally determined fold
improvement in protease activity, (B) number of concerted amino acid substitutions per protease variant.
38 variants had a fold improvement in activity of less than 10 (red) and were grouped as inactive, while 57
had a fold improvement in activity of greater than 10 and were grouped as active.
Figure S8. α-Gliadin peptidase enrichment curve: Receiver operating characteristic (ROC) curves
illustrating the ability of ΔAffinity to discriminate between active and inactive kumamolisin variants,
corresponding to reactant (blue) and transition states (green). An active variant was defined as an
enzyme with greater than 10-fold experimental improvement towards the PQ peptide compared with the
wild-type enzyme, as measured in crude cell lysate. Only mutable amino acids within 5 Å of the mutated
residues were refined.
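The area under a ROC curve can be computed without drawing the curve, because it equals the fraction of (active, inactive) pairs in which the active variant receives the better score, with ties counted as one half. The sketch below assumes that a lower (more negative) ΔAffinity predicts activity. The function name and score values are hypothetical.

```python
def roc_auc(active_scores, inactive_scores):
    """AUC when a lower (more negative) score predicts an active variant.

    Computed as the fraction of active/inactive pairs ranked correctly,
    counting ties as half a correct pair.
    """
    if not active_scores or not inactive_scores:
        raise ValueError("both classes need at least one score")
    correct = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a < i:
                correct += 1.0
            elif a == i:
                correct += 0.5
    return correct / (len(active_scores) * len(inactive_scores))


# Hypothetical ΔAffinity scores (kcal/mol) for each class.
active = [-5.0, -3.5, -0.5]
inactive = [-4.0, -1.0, 0.5, 1.2]
auc = roc_auc(active, inactive)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to random ranking, so values well above 0.5 indicate that the score enriches active variants.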
Figure S9. α-Gliadin peptidase predictions: Scatter plots of the calculated ΔAffinity and
experimentally observed improvement in enzyme activity for variants that were purified, sequenced, and
assayed for activity. The red horizontal line at -3 kcal/mol illustrates the computational cutoff
used to classify predictions as “active”.
[Axis label: <ΔAffinity> (kcal/mol)]
Table S1. DIG binder prediction correlations for different MD structures: List of individual correlations (R
and R²) between the predicted ΔAffinity score and the experimental change in binding affinity for
binding to the steroid digoxigenin (DIG). A 150-amino-acid protein of unknown function was used as
the starting scaffold, and the structures numbered 1 through 24 are representative wild-type
conformations generated using MD. A refinement shell of 0 Å indicates that only the mutated residues were
optimized, while a shell of 5 Å reflects the predictions when amino acids within this sphere were also
refined.
                      Refinement Shell = 0 Å    Refinement Shell = 5 Å
Structure #              R        R²               R        R²
1                       0.84     0.71             0.38     0.14
2                       0.87     0.76             0.65     0.42
3                       0.78     0.61             0.72     0.51
4                       0.90     0.81             0.79     0.63
5                       0.94     0.89             0.96     0.92
6                       0.87     0.76             0.81     0.66
7                       0.86     0.74             0.89     0.79
8                       0.93     0.86             0.89     0.79
9                       0.91     0.83             0.91     0.83
10                      0.93     0.86             0.83     0.69
11                      0.88     0.77             0.83     0.69
12                      0.91     0.82             0.82     0.68
13                      0.94     0.89             0.40     0.16
14                      0.92     0.85             0.39     0.15
15                      0.90     0.81             0.78     0.61
16                      0.89     0.79             0.92     0.85
17                      0.92     0.84             0.63     0.40
18                      0.85     0.72             0.83     0.69
19                      0.85     0.72             0.83     0.70
20                      0.94     0.89             0.83     0.68
21                      0.89     0.79             0.65     0.42
22                      0.90     0.80             0.77     0.60
23                      0.93     0.87             0.59     0.34
24                      0.94     0.89             0.93     0.86
Average                 0.90     0.80             0.75     0.59
Boltzmann weighted      0.90     0.80             0.84     0.70
Table S2. KE07 prediction correlations for different MD structures: List of individual Pearson correlations
(R) between the predicted ΔAffinity score and the experimentally determined Michaelis constant (KM),
catalytic turnover (kcat), and catalytic efficiency (kcat/KM). Structures numbered 1 through 11 are
representative structures generated using MD. (Amino acids within 5 Å of the mutated residues were
refined.)
Structure #           ln(KM) (mM)    ln(kcat) (s⁻¹)    ln(kcat/KM) (s⁻¹ M⁻¹)
1                        0.08           -0.63              -0.73
2                        0.06           -0.60              -0.72
3                        0.12           -0.61              -0.75
4                        0.28           -0.34              -0.46
5                        0.10           -0.62              -0.74
6                        0.08           -0.64              -0.78
7                        0.16           -0.62              -0.75
8                        0.11           -0.51              -0.63
9                        0.03           -0.56              -0.65
10                       0.32            0.17               0.09
11                       0.42            0.54               0.48
Average                  0.16           -0.40              -0.51
Boltzmann weighted       0.14            0.63              -0.76
Table S3. α-Gliadin peptidase enrichments for different MD structures: List of the individual area under the
ROC curve (AUC) metrics for predicting PQLP peptidase activity as measured in crude
cell lysate, where the starting scaffold was kumamolisin in complex with the PQLP peptide. Predictions were
made using two reactive states, namely the reactant and transition state configurations, and under
two conditions: first, where only the mutated residues were refined, and second, where both the
mutated residue and amino acids within 5 Å of the mutated residues were refined.
                      Side chain repacking (0 Å)       Side chain repacking (5 Å)
Structure #           Reactant   Transition State      Reactant   Transition State
1                       0.34         0.30                0.37         0.30
2                       0.33         0.40                0.33         0.46
3                       0.29         0.45                0.34         0.41
4                       0.36         0.64                0.32         0.64
5                       0.79         0.45                0.77         0.37
6                       0.33         0.65                0.34         0.42
7                       0.30         0.71                0.34         0.67
8                       0.32         0.44                0.37         0.31
9                       0.37         0.30                0.35         0.31
10                      0.47         0.31                0.53         0.29
11                      0.36         0.32                0.34         0.31
12                      0.36         0.35                0.41         0.34
13                      0.75         0.35                0.73         0.31
14                      0.71         0.73                0.73         0.73
15                      0.41         0.50                0.48         0.43
16                      0.66         0.32                0.71         0.38
Average                 0.45         0.45                0.47         0.42
Boltzmann weighted      0.37         0.69                0.38         0.70
Table S4. α-Gliadin peptidase truth table: Truth table for identifying active vs. inactive enzyme variants of
kumamolisin towards the PQLP peptide when the transition state structures were used. An active enzyme
variant was defined as having greater than 10-fold experimental improvement in activity towards the PQLP
peptide compared with wild type. Any computational prediction with a calculated ΔAffinity of more
than ±3 (1) kcal/mol was considered a positive hit. The sensitivity for accurately predicting an active
enzyme variant is around 75 percent. (Amino acids within 5 Å of the mutated residues were refined.)
                      Predicted
Actual                Active          Not-active
Active (38)           TP: 28 (30)     FN: 10 (8)
Not-active (57)       FP: 31 (35)     TN: 26 (22)
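The quoted sensitivity can be checked directly from the truth table counts. The sketch below uses the standard definitions of sensitivity, specificity, precision, and accuracy. The function name is hypothetical, and the counts are the ones listed in Table S4 (first values, then the parenthesized values).

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard metrics derived from a 2x2 truth table."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of actual actives recovered
        "specificity": tn / (tn + fp),  # fraction of actual inactives rejected
        "precision": tp / (tp + fp),    # fraction of predicted actives that are active
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Counts from Table S4: first values, then the parenthesized values.
primary = classification_metrics(tp=28, fn=10, fp=31, tn=26)
parenthesized = classification_metrics(tp=30, fn=8, fp=35, tn=22)
for label, metrics in (("first", primary), ("parenthesized", parenthesized)):
    summary = ", ".join(f"{k} = {v:.2f}" for k, v in metrics.items())
    print(f"{label} counts: {summary}")
```

With the first set of counts the sensitivity is 28/38, or about 0.74, in line with the "around 75 percent" stated in the caption.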