pro2751-sup-0001-suppinfo01

Supplementary material
AWSEM
In the coarse-grained scheme of AWSEM, each amino acid residue of a protein is represented
by three beads: the Cα, Cβ, and O atoms (glycine, which lacks a Cβ, is the exception). Assuming
an ideal peptide-bond geometry, the positions of the remaining backbone atoms can be computed
from these beads, so the stereochemistry is considerably more accurate than in a Cα-only model. The
energy function of the standard AWSEM [1] can be written schematically as follows.
$$V_{\mathrm{AWSEM}} = V_{\mathrm{backbone}} + V_{\mathrm{non\text{-}backbone}} + V_{\mathrm{FM}} \tag{S1}$$
In Eq. (S1), Vbackbone is the force field that maintains the backbone geometry of protein chains,
while Vnon-backbone comprises physically motivated potentials that reflect the chemical and
physical properties of the protein in the context of secondary and tertiary interactions. These
two energy terms are fully transferable among different proteins. The term VFM (FM denotes
"fragment memory"), on the other hand, exploits local sequence similarity to encode local
structural tendencies, drawing on the structures of peptide fragments available in the Protein
Data Bank (PDB). Each of these energy terms is described in more detail below.
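The three-bead representation described at the start of this section can be sketched as a simple mapping from sequence to beads. This is a minimal illustration, not AWSEM's implementation: the `Bead` class, the `coarse_grain` function, and the atom labels "CA", "CB", and "O" are hypothetical names chosen here, and the reconstruction of the remaining backbone atoms from ideal peptide geometry is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bead:
    residue_index: int  # 0-based position in the chain
    residue_name: str   # one-letter amino acid code
    atom: str           # hypothetical label: "CA", "CB", or "O"


def coarse_grain(sequence: str) -> list[Bead]:
    """Map a one-letter sequence onto a three-bead-per-residue model.

    Every residue gets a Cα and an O bead; every residue except
    glycine ("G") also gets a Cβ bead.
    """
    beads = []
    for i, aa in enumerate(sequence.upper()):
        beads.append(Bead(i, aa, "CA"))
        if aa != "G":  # glycine has no Cβ
            beads.append(Bead(i, aa, "CB"))
        beads.append(Bead(i, aa, "O"))
    return beads


if __name__ == "__main__":
    # 4 residues x 3 beads, minus the missing glycine Cβ
    print(len(coarse_grain("MGAK")))  # 11
```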
Vbackbone consists of five terms, Vbackbone = Vcon + Vchain + Vχ + Vrama + Vexcl, which refer,
respectively, to the connectivity of the chain (Vcon), the bond angles around the Cα atom (Vchain),
the chirality that fixes the correct orientation of the Cβ atoms (Vχ), the backbone dihedral angles
(Vrama, the Ramachandran potential), and the excluded-volume interaction (Vexcl). Note that Vcon
and Vchain are functions of interatomic distances, which they constrain through a combination of
harmonic potentials.
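As a hedged illustration of a distance-based harmonic restraint of the kind used in Vcon and Vchain, the sketch below sums k(r - r0)^2 over restrained bead pairs. The function name, the force constant, and the target distances are hypothetical placeholders; the actual AWSEM functional forms and parameters are given in the original AWSEM paper [1].

```python
import math

Point = tuple[float, float, float]


def harmonic_distance_energy(coords: list[Point],
                             restraints: list[tuple[int, int, float]],
                             k: float) -> float:
    """Sum of harmonic penalties k * (r_ij - r0_ij)**2 over restrained pairs.

    coords: bead positions in Å.
    restraints: (i, j, r0) triples, r0 being the target distance in Å.
    k: force constant (hypothetical value and units).
    """
    energy = 0.0
    for i, j, r0 in restraints:
        r = math.dist(coords[i], coords[j])  # Euclidean distance between beads
        energy += k * (r - r0) ** 2
    return energy
```

For example, with k = 100 and a target of 3.8 Å, a pair sitting at 4.0 Å contributes 100 × 0.2^2 = 4 energy units, while a pair exactly at its target contributes nothing.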
Vnon-backbone includes three terms, Vnon-backbone = Vcontact + Vburial + Vhelical, each of which
represents a different aspect of protein chemistry and physics. Vcontact describes the contact
interactions of a protein's tertiary fold; it is defined in terms of the Cβ-Cβ distances between
residues that are far apart in sequence. The contact potential includes both direct and water- (or
protein-) mediated interactions between residues. Vburial is a non-additive potential that captures
the preference of a residue to be buried inside the protein or exposed at the protein surface; this
preference depends on residue type. Vhelical describes the propensity to form helical structure,
which requires an explicit hydrogen bond between the carbonyl oxygen of residue i and the amide
hydrogen of residue i+4 in the backbone. This helical propensity is also residue-type dependent,
being determined by the types of the residues participating in the hydrogen bond.
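The two pairwise terms above act on different residue pairs: Vcontact on Cβ-Cβ distances between residues far apart in sequence, and Vhelical on the hydrogen bond between the carbonyl O of residue i and the amide H of residue i+4. The sketch below only enumerates those pairs. The sequence-separation cutoff `min_separation` is a hypothetical parameter (its value is not given here), the function names are hypothetical, and the residue-type-dependent weights of both terms are omitted.

```python
def contact_pairs(n_residues: int, min_separation: int) -> list[tuple[int, int]]:
    """Residue pairs (i, j), i < j, with j - i >= min_separation (>= 1).

    Vcontact would be evaluated on the Cβ-Cβ distances of these pairs.
    """
    return [(i, j)
            for i in range(n_residues)
            for j in range(i + min_separation, n_residues)]


def helical_hbond_pairs(n_residues: int) -> list[tuple[int, int]]:
    """(i, i + 4) pairs: carbonyl O of residue i, amide H of residue i + 4."""
    return [(i, i + 4) for i in range(n_residues - 4)]


if __name__ == "__main__":
    print(helical_hbond_pairs(6))  # [(0, 4), (1, 5)]
```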
The VFM term biases local structure to resemble the local structures of protein fragments
with closely related local sequences. This term therefore uses a database of short fragments called
the "fragment memory" library. This strategy has been used in various forms [2] and resembles the
associative memory Hamiltonian of neural network models [3]. The fragment memory term accounts
for local steric effects of side-chain packing that are modulated by the local sequence of the
protein. This form of associative memory term has been successfully combined with physically
based interactions for de novo prediction of protein tertiary folds [1, 2, 4, 5].
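A minimal sketch of an associative-memory-style fragment memory energy follows. It assumes a Gaussian form in which each memory fragment m rewards pair distances r_ij that are close to its own r_ij^m, each matched pair contributing -λ exp(-(r_ij - r_ij^m)^2 / (2σ_ij^2)). The function name, the uniform weight `lam`, and the width rule σ_ij = sigma_scale·|i - j|^0.15 are assumptions for illustration; the exact AWSEM form and parameters are given in [1].

```python
import math

Pair = tuple[int, int]


def fragment_memory_energy(distances: dict[Pair, float],
                           memories: list[dict[Pair, float]],
                           lam: float = 1.0,
                           sigma_scale: float = 1.0) -> float:
    """Associative-memory-style energy over a set of memory fragments.

    distances: current r_ij (Å) for bead pairs (i, j) with i != j.
    memories: one dict per memory fragment holding its reference r_ij^m.
    Each pair shared by the structure and a memory contributes
    -lam * exp(-(r_ij - r_ij^m)**2 / (2 * sigma_ij**2)),
    with the hypothetical width sigma_ij = sigma_scale * |i - j|**0.15.
    """
    energy = 0.0
    for memory in memories:
        for (i, j), r_mem in memory.items():
            if (i, j) not in distances:
                continue  # pair absent from the current structure
            sigma = sigma_scale * abs(i - j) ** 0.15
            dr = distances[(i, j)] - r_mem
            energy -= lam * math.exp(-dr * dr / (2.0 * sigma * sigma))
    return energy
```

A structure that matches a memory exactly gains -lam per shared pair; deviations weaken the reward smoothly instead of imposing a hard restraint, which is what lets several memories, and the physical terms, compete.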
When predicting tertiary structure with AWSEM, the quality of the predictions depends on whether
global homologues are included in the fragment memory library. When homologues are available
and used, tertiary predictions improve substantially [1]. In this study, the structure
predictions of 12 monomeric α proteins were performed under the somewhat artificial constraint
of using a database that excludes close homologues. This strategy mimics the situation in which a
truly novel fold is encountered. The homologues excluded (HE) fragment memory library
(HE fragment library hereafter) is a database of selected sequences from which all homologues
with more than 20% sequence identity to the target sequence have been removed. If, instead, only
the native structure of the protein itself is used as the memory, the folding landscape is strongly
biased (funneled) towards this single structure; this approach is therefore called a "single fragment
memory simulation", and it rests on the premise that folding or binding is biased at the local level
towards the appropriate single structure. Here, in addition to mimicking de novo structure prediction
with the HE library, we use single fragment memory AWSEM to pinpoint the effects of physical
forces on the folding and binding landscapes. Single fragment memory AWSEM was used for both
protein S6 and the dimers, since our aim there was to highlight electrostatic effects on forming the
native structure. More detailed descriptions of all the energy terms mentioned above can be found
in the original AWSEM paper [1] and its supporting information.
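The homologue-exclusion criterion of the HE library can be sketched as a filter on pairwise sequence identity. This assumes the target and each candidate are already aligned (for example by a search tool such as BLAST), skips gapped columns, and uses one of several possible identity definitions; `percent_identity` and `homologues_excluded` are hypothetical helpers, not part of AWSEM.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns containing a gap ('-') in either sequence are skipped, and
    identity is computed over the remaining aligned columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def homologues_excluded(alignments: dict[str, tuple[str, str]],
                        max_identity: float = 20.0) -> list[str]:
    """Names of library entries whose identity to the target is <= max_identity %.

    alignments maps an entry name to (aligned target, aligned candidate).
    """
    return [name for name, (target, candidate) in alignments.items()
            if percent_identity(target, candidate) <= max_identity]
```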
Molecular dynamics simulations: simulated annealing and free energy calculations
All simulations were carried out with the LAMMPS simulation package [6], in which the
AWSEM force field has been implemented [1]. The simulation protocol generates trajectories of
protein folding and binding by numerically integrating the equations of motion.
According to the ergodic hypothesis, the time average of a property equals the ensemble
average of that property. Here, we simulated the system in the canonical ensemble, using a
Nosé-Hoover thermostat to control the temperature. To fold proteins, we performed simulated
annealing, gradually lowering the temperature, for both the monomeric proteins and the dimers.
Simulated annealing gradually biases unfolded proteins towards their native structure by slowly
lowering the temperature. For each of the monomeric proteins and dimers, we generated 30
simulated annealing trajectories. The simulation time step was 3 fs for the 12 α proteins
and 5 fs for both protein S6 and the dimers. For the monomeric proteins simulated with the
homologues excluded fragment library (see the AWSEM section above), we started from an extended
conformation at a temperature (600 K) well above the folding temperature of all proteins studied
and cooled the system to 300 K over 6 million steps; the same annealing schedule was used for the
single fragment memory simulations of protein S6. For the protein dimers, we started from their
native structures and first pulled the two monomers apart (25 Å for 1CTA and 1KDX; 60 Å for 1F36
and 1VKX) by applying a biasing force; we then followed their motions by carrying out simulated
annealing from 450 K to 300 K over 6 million steps with a Langevin thermostat. These simulations
thus document how well the force field predicts binding interfaces.
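The annealing schedules above can be sketched as a temperature ramp. The text specifies only the start and end temperatures and the number of steps, so a linear ramp is an assumption here, although LAMMPS thermostat fixes such as `fix nvt` do ramp their target temperature linearly from Tstart to Tstop over a run. The helper names are hypothetical.

```python
def annealing_temperature(step: int, total_steps: int,
                          t_start: float, t_end: float) -> float:
    """Target temperature (K) at a given MD step, assuming a linear ramp."""
    if not 0 <= step <= total_steps:
        raise ValueError("step out of range")
    return t_start + (t_end - t_start) * step / total_steps


def simulated_time_ns(total_steps: int, timestep_fs: float) -> float:
    """Total simulated time in ns (1 ns = 1e6 fs)."""
    return total_steps * timestep_fs / 1e6


if __name__ == "__main__":
    # Monomer schedule: 600 K -> 300 K over 6 million steps
    print(annealing_temperature(3_000_000, 6_000_000, 600.0, 300.0))  # 450.0
    print(simulated_time_ns(6_000_000, 3))  # 18.0
```

With 6 million steps, the total simulated time per annealing run is 18 ns at a 3 fs time step and 30 ns at a 5 fs time step.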
Free energy profiles (i.e., potentials of mean force) for the dimers were calculated by
umbrella sampling, applying a harmonic biasing potential along Q, as defined in Methods.
Equilibrium data were then collected at constant temperature and used to construct free
energy profiles with the weighted histogram analysis method (WHAM) [7].
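To make the WHAM step concrete, here is a minimal one-dimensional sketch of the self-consistent WHAM equations [7] for umbrella windows with harmonic biases w_k(Q) = 0.5 k (Q - Q_k)^2 on a shared set of histogram bins. Energies are in units of kT, the bias is evaluated at bin centres (a discretization assumption), and the function name and arguments are hypothetical; it is an illustration, not the analysis code used here.

```python
import math


def wham(bin_centers, histograms, window_centers, k_spring, kT=1.0,
         tol=1e-10, max_iter=100_000):
    """Minimal 1D WHAM for harmonic umbrella windows.

    histograms[k][b]: (possibly fractional) counts of window k in bin b.
    Returns F(x_b) = -kT ln P(x_b) shifted so that min F = 0;
    bins never visited get math.inf.
    """
    beta = 1.0 / kT
    n_windows, n_bins = len(window_centers), len(bin_centers)
    totals = [sum(h) for h in histograms]                        # N_k
    counts = [sum(h[b] for h in histograms) for b in range(n_bins)]
    # exp(-beta * w_k(x_b)) for every window k and bin b
    boltz = [[math.exp(-beta * 0.5 * k_spring * (x - xk) ** 2)
              for x in bin_centers] for xk in window_centers]
    f = [0.0] * n_windows                                        # window free energies f_k
    for _ in range(max_iter):
        # P(b) = sum_k n_k(b) / sum_k N_k exp(-beta (w_k(b) - f_k))
        prob = []
        for b in range(n_bins):
            denom = sum(totals[k] * boltz[k][b] * math.exp(beta * f[k])
                        for k in range(n_windows))
            prob.append(counts[b] / denom)
        # f_k = -kT ln sum_b P(b) exp(-beta w_k(b))
        new_f = [-kT * math.log(sum(prob[b] * boltz[k][b] for b in range(n_bins)))
                 for k in range(n_windows)]
        new_f = [fk - new_f[0] for fk in new_f]                  # fix gauge: f_0 = 0
        converged = max(abs(a - c) for a, c in zip(new_f, f)) < tol
        f = new_f
        if converged:
            break
    free = [-kT * math.log(p) if p > 0 else math.inf for p in prob]
    fmin = min(free)
    return [g - fmin for g in free]
```

Because WHAM only determines P(Q) up to a constant, the profile is reported relative to its minimum; the harmonic windows must overlap enough for the iteration to link them into one curve.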
Figures
Figure S1. Simulated annealing results (using single fragment memory) for 1R69 and 3ICB with
different electrostatic strengths. The annealing results (Qavg) are shown as a function of the
annealing index, where Qavg is the average of Q over the last 100 snapshots of each annealing
trajectory (30 trajectories in total). The corresponding native structures of the proteins are also
shown below the curves (see Fig. 1 for the color scheme). (top) 1R69: no difference in annealing
profiles can be identified between annealing sets with different εr values. (bottom) 3ICB. The
simulations were conducted with single fragment memory AWSEM, which serves as a comparison to
verify the robustness of structure prediction. These results suggest that electrostatics has no
substantial effect on the native structure and does not affect the quality of structure prediction.
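The Qavg quantity defined in the caption can be computed per trajectory as below. Q itself is defined in Methods and is taken here as a precomputed list per trajectory. Sorting the 30 Qavg values in decreasing order before plotting them against the annealing index is an assumption about how the curves were drawn, and the function names are hypothetical.

```python
def q_avg(q_trajectory: list[float], n_last: int = 100) -> float:
    """Average of Q over the last n_last snapshots of one annealing trajectory."""
    if len(q_trajectory) < n_last:
        raise ValueError("trajectory shorter than the averaging window")
    tail = q_trajectory[-n_last:]
    return sum(tail) / n_last


def annealing_profile(q_trajectories: list[list[float]],
                      n_last: int = 100) -> list[float]:
    """Qavg for every trajectory, sorted from highest to lowest (assumed ordering)."""
    return sorted((q_avg(q, n_last) for q in q_trajectories), reverse=True)
```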
[Figure S2, panels (a) and (b). Panel (a), "WT vs CD/SC": Qavg versus annealing index (1 to 30) for WT, CD, and SC.]
Figure S2. Simulated annealing results for different charge variants of protein S6 (1RIS). The
results (Qavg) are shown as a function of annealing index. (a) Comparison of the wild-type (WT),
super-charged (SC), and charge-depleted (CD) variants; the native structure of the protein is
shown. The dependence of the simulated annealing results on the electrostatic strength employed
(εr from ∞ to 16.6) is shown in (b) for WT and in (c) for CD/SC. The fraction of successful
annealing runs follows the trend WT >> CD > SC, which suggests that the folding stability of WT
is higher than that of CD and SC, consistent with Oliveberg's results. Note that the results in (a)
correspond to the wild-type (εr=33.2), charge-depleted (εr=∞), and super-charged (εr=33.2) runs
chosen from (b) and (c).
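The fraction of successful annealing runs used to rank the variants can be sketched as below. The caption does not state the Qavg threshold that defines a successful run, so `threshold` is a hypothetical parameter, as are the function names.

```python
def success_fraction(q_avgs: list[float], threshold: float) -> float:
    """Fraction of annealing runs whose Qavg reaches the (hypothetical) threshold."""
    if not q_avgs:
        raise ValueError("no annealing runs given")
    return sum(q >= threshold for q in q_avgs) / len(q_avgs)


def rank_variants(results: dict[str, list[float]], threshold: float) -> list[str]:
    """Variant names (e.g. "WT", "CD", "SC") ordered by decreasing success fraction."""
    return sorted(results,
                  key=lambda name: success_fraction(results[name], threshold),
                  reverse=True)
```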
References
1. Davtyan A, Schafer NP, Zheng W, Clementi C, Wolynes PG, Papoian GA (2012) AWSEM-MD:
protein structure prediction using coarse-grained physical potentials and bioinformatically
based local structure biasing. J Phys Chem B 116:8494-8503.
2. Friedrichs MS, Wolynes PG (1989) Toward protein tertiary structure recognition by means of
associative memory Hamiltonians. Science 246:371-373.
3. Hopfield JJ (1984) Neurons with graded response have collective computational properties like
those of two-state neurons. Proc Natl Acad Sci USA 81:3088-3092.
4. Papoian GA, Ulander J, Eastwood MP, Luthey-Schulten Z, Wolynes PG (2004) Water in protein
structure prediction. Proc Natl Acad Sci USA 101:3352-3357.
5. Hegler JA, Latzer J, Shehu A, Clementi C, Wolynes PG (2009) Restriction versus guidance in
protein structure prediction. Proc Natl Acad Sci USA 106:15302-15307.
6. Plimpton S (1995) Fast parallel algorithms for short-range molecular dynamics. J Comput Phys
117:1-19.
7. Kumar S, Bouzida D, Swendsen RH, Kollman PA, Rosenberg JM (1992) The weighted
histogram analysis method for free-energy calculations on biomolecules. I. The method. J
Comput Chem 13:1011-1021.