
Supporting information:
Ab initio simulation of a 57-residue protein in explicit solvent
reproduces the native conformation in the lowest free-energy
cluster
Jinzen Ikebe,1 Daron M. Standley,2 Haruki Nakamura,3 and Junichi Higo3

1 Graduate School of Frontier Biosciences, Osaka University, Open Laboratories for Advanced Bioscience and Biotechnology, 6-2-3 Furuedai, Suita, Osaka, 565-0874, Japan
2 Systems Immunology Lab, Immunology Frontier Research Center (IFReC), Osaka University, Suita, Osaka, 565-0871, Japan
3 Institute for Protein Research, Osaka University, Suita, Osaka, 565-0871, Japan
Table: Terms and quantities used for analyses
------------------------------------------------------------------------------------------------
Terms/quantities          Meaning
------------------------------------------------------------------------------------------------
H1 and H2                 Regions of residues 2-21 and 26-47, respectively
Core-region               Region of residues 2-47
300 K dataset             Thermodynamic ensemble of structures at 300 K
Pseudo distance           Structural dissimilarity of the core-region between structures
DNTV                      Pseudo distance between a cluster and the native structure
Clustering                Average linkage clustering method to classify structures
MDS                       Multidimensional scaling to generate a conformational space
NOE distance              Upper bound of atomic distance converted from an NOE signal
Reproduction ratio of     Ratio of reproduced NOE pairs to the experimentally obtained NOE pairs
  NOE distances
ΔR_NOE                    Tolerance to judge whether a computed NOE distance (R_NOE^calc) agrees
                          with an experimental NOE distance (R_NOE^exp): R_NOE^calc ≤ R_NOE^exp + ΔR_NOE
RMSDcore                  Root mean square deviation of the backbone (N, Cα, and C atoms) for the
                          core-region between a cluster and the native structure
Q value                   Reproduction rate of the native inter-residual contacts in a sampled structure
Region(9-13)              Region of residues 9-13, which is a part of the H1 region
Nhc1                      Number of residue-residue hydrophobic contacts between Region(9-13) and
                          the other protein regions at 300 K
Nhc2                      Number of residue-residue hydrophobic contacts between Region(9-13) and
                          H2 at 300 K
Nwat                      Number of water molecules in the vicinity of Region(9-13) at 300 K
------------------------------------------------------------------------------------------------
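The reproduction ratio of NOE distances in the table above reduces to a simple count. The Python sketch below is illustrative only: it assumes that each NOE pair is stored as a computed distance and an experimental upper bound, and that a pair counts as reproduced when the computed distance does not exceed the upper bound plus a tolerance. The function name `noe_reproduction_ratio`, the 0.5 Å tolerance, and the toy distances are hypothetical and not taken from the study.

```python
def noe_reproduction_ratio(r_calc, r_exp, tolerance):
    """Return the fraction of NOE pairs with r_calc <= r_exp + tolerance.

    Hypothetical helper: r_calc are computed distances, r_exp are the
    experimental upper bounds (both in Å); tolerance is passed in because
    its actual value is not given here.
    """
    if len(r_calc) != len(r_exp):
        raise ValueError("r_calc and r_exp must have the same length")
    if len(r_exp) == 0:
        raise ValueError("at least one NOE pair is required")
    reproduced = sum(1 for c, e in zip(r_calc, r_exp) if c <= e + tolerance)
    return reproduced / len(r_exp)


# Toy data (Å); the 0.5 Å tolerance is chosen for illustration only.
ratio = noe_reproduction_ratio([3.0, 5.2, 6.1, 4.4], [3.5, 5.0, 5.0, 4.0], 0.5)
print(ratio)  # 0.75
```

Here three of the four toy pairs satisfy the criterion; only the pair with a computed distance of 6.1 Å exceeds its bound of 5.0 + 0.5 Å.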


Multidimensional scaling
To visualize the relations among the obtained clusters, we constructed a conformational space with the multidimensional scaling (MDS) method.1,2 Given N clusters, MDS assigns to each cluster a location in an N-dimensional (i.e., full-dimensional) hyperspace that expresses the structural dissimilarities among the clusters, as follows. First, we defined a matrix B whose element $b_{\alpha\beta}$ is the inner product of the position vectors of clusters α and β in the hyperspace:

$$b_{\alpha\beta} = \frac{1}{2}\left[ d(\alpha,o)^2 + d(\beta,o)^2 - d(\alpha,\beta)^2 \right], \tag{1}$$
where o represents the origin, set arbitrarily in the hyperspace, and $d(\alpha,o)$, $d(\beta,o)$, and $d(\alpha,\beta)$ denote the pseudo distances between pairs of the three points α, β, and o. The pseudo distance $d(i,j)$ between two structures i and j is defined in equation 5 of the main text. The inter-cluster pseudo distance $d(\alpha,\beta)$ is defined as the average over the pseudo distances between structures belonging to clusters α and β. To compute $d(\alpha,o)$, we consider a cluster consisting of only the origin point. When the origin is set to the geometrical center of the N clusters, equation 1 is transformed as follows:

$$b_{\alpha\beta} = \frac{1}{2}\left[ \frac{1}{N}\sum_{r=1}^{N} d(\alpha,r)^2 + \frac{1}{N}\sum_{s=1}^{N} d(s,\beta)^2 - \frac{1}{N^2}\sum_{r=1}^{N}\sum_{s=1}^{N} d(s,r)^2 - d(\alpha,\beta)^2 \right]. \tag{2}$$
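Equation 2 is the double-centering step of classical MDS. The NumPy sketch below is an illustration, not the authors' code: `gram_from_distances` is a hypothetical helper that takes a symmetric N × N matrix of inter-cluster pseudo distances (assumed to be already averaged over structure pairs) and evaluates equation 2 term by term. As a sanity check, for ordinary Euclidean points the result should equal the matrix of inner products of coordinates measured from their geometrical center.

```python
import numpy as np


def gram_from_distances(d):
    """Build B from an N x N distance matrix d via equation 2 (origin at the centroid)."""
    d2 = np.asarray(d, dtype=float) ** 2
    n = d2.shape[0]
    row_mean = d2.sum(axis=1, keepdims=True) / n   # (1/N) sum_r d(alpha, r)^2, column vector
    col_mean = d2.sum(axis=0, keepdims=True) / n   # (1/N) sum_s d(s, beta)^2, row vector
    grand_mean = d2.sum() / n**2                   # (1/N^2) sum_r sum_s d(s, r)^2
    return 0.5 * (row_mean + col_mean - grand_mean - d2)


# Sanity check on Euclidean toy points: B should equal Xc Xc^t for centered coordinates Xc.
rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 3))
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
centered = pts - pts.mean(axis=0)
B = gram_from_distances(dist)
print(np.allclose(B, centered @ centered.T))  # True
```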
Here we assume a matrix X whose element $x_{\alpha k}$ is the kth coordinate of the αth cluster in the N-dimensional hyperspace. Then, the matrix B is decomposed as $B = XX^t$, where $X^t$ is the transpose of X. To determine X, we performed an eigenvalue decomposition:

$$B = O D^2 O^t = (OD)(OD)^t = XX^t, \tag{3}$$

where $D^2$ is a diagonal matrix whose mth diagonal element is the mth eigenvalue of B, and O is a matrix whose mth column is the mth eigenvector of B. From equation 3, X is defined as the product of O and D. Finally, the αth row of X gives the coordinates of cluster α in the N-dimensional hyperspace. The three components corresponding to the three largest eigenvalues are the coordinates of cluster α projected onto the 3D subspace that most dominates the features of the cluster distribution.
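The eigenvalue decomposition and the 3D projection can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: `mds_coordinates` is a hypothetical name, and because pseudo distances need not be Euclidean, B may have small negative eigenvalues; the sketch clips them to zero before taking square roots, a choice not described in the text. On points that already lie in 3D, the recovered coordinates reproduce all pairwise distances up to rotation and reflection.

```python
import numpy as np


def mds_coordinates(B, n_dims=3):
    """Return X = O D restricted to the n_dims largest eigenvalues of B (equation 3)."""
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]    # indices of the largest eigenvalues
    # Assumption: negative eigenvalues (possible for non-Euclidean
    # pseudo distances) are clipped to zero before the square root.
    d = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * d                  # row alpha = coordinates of cluster alpha


# Toy check: 3D points are recovered up to rotation/reflection.
rng = np.random.default_rng(1)
pts = rng.normal(size=(8, 3))
centered = pts - pts.mean(axis=0)
B = centered @ centered.T
X = mds_coordinates(B)
dist_in = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
dist_out = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
print(X.shape, np.allclose(dist_in, dist_out))  # (8, 3) True
```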
Q value
The reproduction rate (Q value) of native inter-residual contacts (native contacts) was calculated as follows. When the minimum heavy-atom distance $r_{ij}$ between residues i and j ($|i - j| \ge 3$) was smaller than 6.0 Å in an NMR model, the residue pair was regarded as a candidate native contact residue (NCR) pair. The cutoff of 6.0 Å was chosen as the sum of two heavy-atom van der Waals radii (≈ 2.0 Å each) and a constant value of 2.0 Å, which is smaller than the diameter of a water molecule (≈ 3.0 Å), so that no water molecule fits between the two residues. Then, candidates detected in more than two-thirds of the 20 NMR models (i.e., in at least 14 models) were registered as NCR pairs. The number of NCR pairs in the core-region ($N_{NCR}$) was 143. When $r_{ij}$ for an NCR pair in a sampled structure was smaller than 6.5 Å, we judged that the native contact was reproduced in the sampled structure; the cutoff of 6.5 Å is still small enough to exclude a water molecule from the inter-residual zone. Finally, Q was defined as $Q = N_{snap}/N_{NCR}$, where $N_{snap}$ is the number of native contacts reproduced in the sampled structure. We use the Q value as a measure of the reproduction of the native topology.
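The Q-value procedure can be sketched in NumPy. This is an illustration under assumptions, not the authors' program: residues are given as arrays of heavy-atom coordinates, the sequence-separation rule is assumed to be |i - j| ≥ 3 (parameter `min_sep`), restriction to the core-region is left to the caller, and all function names and toy coordinates are hypothetical. The toy "NMR ensemble" uses 20 identical models so that every candidate passes the 14-of-20 filter.

```python
import numpy as np


def min_heavy_distance_matrix(residues):
    """Minimum heavy-atom distance r_ij for every residue pair.

    residues: list of (n_atoms, 3) arrays of heavy-atom coordinates, one per residue.
    """
    n = len(residues)
    r = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(i + 1, n):
            diff = residues[i][:, None, :] - residues[j][None, :, :]
            r[i, j] = r[j, i] = np.linalg.norm(diff, axis=-1).min()
    return r


def ncr_pairs(model_matrices, cutoff=6.0, min_models=14, min_sep=3):
    """Pairs with r_ij < cutoff in at least min_models NMR models.

    min_sep encodes the assumed sequence-separation rule |i - j| >= min_sep.
    """
    counts = sum((m < cutoff).astype(int) for m in model_matrices)
    n = counts.shape[0]
    return [(i, j) for i in range(n) for j in range(i + min_sep, n)
            if counts[i, j] >= min_models]


def q_value(snapshot_matrix, pairs, cutoff=6.5):
    """Q = N_snap / N_NCR: fraction of NCR pairs with r_ij < cutoff in one structure."""
    if not pairs:
        raise ValueError("no NCR pairs")
    n_snap = sum(1 for i, j in pairs if snapshot_matrix[i, j] < cutoff)
    return n_snap / len(pairs)


def residues_from_points(points):
    """Toy helper (hypothetical): one heavy atom per residue."""
    return [np.array([p], dtype=float) for p in points]


native = residues_from_points([(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0),
                               (7.6, 5, 0), (3.8, 5, 0), (0, 5, 0)])
models = [min_heavy_distance_matrix(native)] * 20   # 20 identical toy "NMR models"
pairs = ncr_pairs(models)
snapshot = residues_from_points([(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0),
                                 (7.6, 5, 0), (3.8, 7, 0), (0, 6, 0)])
print(pairs, q_value(min_heavy_distance_matrix(snapshot), pairs))  # [(0, 5), (1, 4)] 0.5
```

In the toy snapshot, the pair (0, 5) is 6.0 Å apart and counts as reproduced under the 6.5 Å criterion, whereas the pair (1, 4) has moved to 7.0 Å, giving Q = 0.5.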


Correlation between charged residues
We analyzed the role of electrostatic interactions between the H1 and H2 regions in the folding of EPRS-R1. Recall that H1 and H2 are the N- and C-terminal helical regions, respectively, in the native structure (Figure 1a in the main text). There are fifteen charged residues in the two regions. First, we assigned the charged site to the Nζ position for Lys, and to the midpoint of the two Nη, Oδ, and Oε atoms for Arg, Asp, and Glu, respectively. Second, we calculated the geometrical centers of H1 and H2, denoted as $C_N$ and $C_C$, respectively, from the heavy-atom positions. Then, we computed two unit vectors, $\mathbf{u}_N$ and $\mathbf{u}_C$, from a charged site to $C_N$ and $C_C$, respectively, and computed their inner product ($IP = \mathbf{u}_N \cdot \mathbf{u}_C$). Figure 1 illustrates the procedure.

Figure 1. Schematic drawing to explain the inner product ($IP$). Gray and black cylinders are H1 and H2, respectively. White circles $C_N$ and $C_C$ represent the geometrical centers of H1 and H2, respectively. The filled circle represents the charged site of a charged residue, which belongs to either H1 or H2. Arrows represent the unit vectors $\mathbf{u}_N$ and $\mathbf{u}_C$ defined from the charged site to $C_N$ and $C_C$, respectively. The inner product is then defined as $IP = \mathbf{u}_N \cdot \mathbf{u}_C$. The two panels show a situation with $IP > 0$ and one with $IP < 0$.
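The inner-product construction can be sketched as follows. The charged-site atoms are given with standard PDB atom names (NZ for Lys; NH1/NH2, OD1/OD2, and OE1/OE2 for Arg, Asp, and Glu); the input layout (a dict of atom coordinates per residue and lists of helix heavy-atom coordinates), the function names, and the toy coordinates are hypothetical. The two toy cases correspond to the two situations sketched in Figure 1: a site between the helices gives IP = -1, and a site outside them gives IP = +1.

```python
import numpy as np

# Atoms defining the charged site (standard PDB atom names); the site is their mean.
CHARGED_SITE_ATOMS = {
    "LYS": ("NZ",),
    "ARG": ("NH1", "NH2"),
    "ASP": ("OD1", "OD2"),
    "GLU": ("OE1", "OE2"),
}


def charged_site(resname, atoms):
    """atoms: dict mapping atom name -> xyz coordinates for one residue (hypothetical layout)."""
    coords = [np.asarray(atoms[name], dtype=float) for name in CHARGED_SITE_ATOMS[resname]]
    return np.mean(coords, axis=0)


def inner_product(site, h1_heavy, h2_heavy):
    """IP = u_N . u_C for unit vectors from the charged site to the centers of H1 and H2."""
    c_n = np.asarray(h1_heavy, dtype=float).mean(axis=0)   # geometrical center of H1
    c_c = np.asarray(h2_heavy, dtype=float).mean(axis=0)   # geometrical center of H2
    u_n = (c_n - site) / np.linalg.norm(c_n - site)
    u_c = (c_c - site) / np.linalg.norm(c_c - site)
    return float(u_n @ u_c)


# Toy helices: H1 centered at the origin, H2 centered 10 Å away along x.
h1 = [(0, 1, 0), (0, -1, 0)]
h2 = [(10, 1, 0), (10, -1, 0)]
asp_site = charged_site("ASP", {"OD1": (5, 1, 0), "OD2": (5, -1, 0)})  # between the helices
lys_site = charged_site("LYS", {"NZ": (-5, 0, 0)})                     # outside, beyond H1
print(inner_product(asp_site, h1, h2), inner_product(lys_site, h1, h2))  # -1.0 1.0
```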


When the charged site is located in the zone between H1 and H2, the inner product takes a negative value. We then defined the distance between H1 and H2 as

$$d_{NC} = \frac{1}{N_N N_C}\sum_{i=1}^{N_N}\sum_{j=1}^{N_C} d(i,j), \tag{4}$$

where $N_N$ (or $N_C$) is the number of Cα atoms in H1 (or H2), and $d(i,j)$ is the distance between the ith and jth Cα atoms belonging to H1 and H2, respectively. The $C_N$-$C_C$ distance or the minimum H1-H2 distance could serve as an alternative measure of the proximity of H1 and H2. However, $d_{NC}$ is more appropriate because it integrates both the overall and local proximities.
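Because the double sum is divided by the number of Cα pairs, dNC is simply the mean distance over all H1-H2 Cα pairs. A NumPy sketch, with a hypothetical function name and toy coordinates:

```python
import numpy as np


def d_nc(ca_h1, ca_h2):
    """Mean Cα-Cα distance over all (i in H1, j in H2) pairs."""
    a = np.asarray(ca_h1, dtype=float)[:, None, :]   # shape (N_N, 1, 3)
    b = np.asarray(ca_h2, dtype=float)[None, :, :]   # shape (1, N_C, 3)
    return float(np.linalg.norm(a - b, axis=-1).mean())


# Toy "helices" of two Cα atoms each (Å).
ca_h1 = [(0, 0, 0), (0, 0, 3)]
ca_h2 = [(4, 0, 0), (4, 0, 3)]
print(d_nc(ca_h1, ca_h2))  # 4.5
```

In this toy geometry the CN-CC distance and the minimum H1-H2 distance are both 4.0 Å, while dNC = 4.5 Å also reflects the longer cross pairs (5.0 Å each).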
A conformation is characterized by a $d_{NC}$ value and fifteen inner products. Given a conformational ensemble, we then analyzed the relation between $d_{NC}$ and the inner products averaged over conformations that have a particular value of $d_{NC}$, denoted $\langle IP \rangle_{d_{NC}}$.
Figures 2a-d plot $\langle IP \rangle_{d_{NC}}$ for the fifteen ionic charged sites, averaged over conformations in the 300 K dataset at a given distance $d_{NC}$. The inner products were predominantly positive for all fifteen sites. If the two helical regions were stabilized by electrostatic interactions between oppositely charged amino acids in the two regions, the charged sites of the residues involved would lie between H1 and H2, and their inner products would be predominantly negative. However, no such negative values were observed. Thus, we conclude that electrostatic interactions between the charged residues do not contribute significantly to the attraction between the H1 and H2 regions.
Figure 2. Inner products $\langle IP \rangle_{d_{NC}}$ for ionic charged sites averaged over conformations that have a particular value of $d_{NC}$. The four panels show $\langle IP \rangle_{d_{NC}}$ of (a) positively charged sites in the H1 region, (b) negatively charged sites in H1, (c) positively charged sites in the H2 region, and (d) negatively charged sites in H2.


Helix-helix dipole interaction
It is known that an α-helix has an electric dipole moment resulting from local dipoles pointing along the peptide planes,3 and that proteins exploit these macroscopic dipoles to stabilize folds, bind ligands, and form selective channels.4,5 The native structural topology of the current protein, EPRS-R1, is a typical fold with two anti-parallel α-helices, which may be stabilized by the helix dipoles. Thus, although no significant ionic charge distribution directly supports the folding of EPRS-R1, as shown above, the approach of the two anti-parallel helices may be influenced by their anti-parallel helix dipoles.
Dependency of Nclust on Dthre
The resulting number of clusters, Nclust, depends on the threshold Dthre at which the merger step is terminated, as explained in the main text. Figure 3 plots the relation between Dthre and Nclust. We observed two inflection points in the dependence of Nclust on Dthre: Dthre = 20 and 32. Thus, the structural clustering of the 300 K dataset can be performed with either of the two thresholds. As reported in the main text, we obtained 20 structural clusters with Dthre = 32, where the largest cluster contained 34% of the 300 K dataset. In contrast, we obtained 107 clusters with Dthre = 20, where the largest cluster contained only 5%. Thus, Dthre = 20 yields no major cluster that determines the main feature of the cluster distribution, and we used Dthre = 32 for the clustering in this study. We also examined other clustering methods, such as the single-linkage and complete-linkage methods. Both provided results qualitatively similar to those of the current method when the Dthre value was adjusted.
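The threshold-controlled merging can be illustrated with a naive average-linkage implementation. This is a sketch, not the authors' program: it assumes that merging stops as soon as the smallest average inter-cluster pseudo distance exceeds Dthre, it uses O(n³) loops suitable only for small inputs, and it replaces real pseudo distances with one-dimensional toy "structures". The function name and data are hypothetical. Scanning several thresholds mimics the Nclust-versus-Dthre scan described above, together with the fraction of structures in the largest cluster.

```python
import numpy as np


def average_linkage(dist, d_thre):
    """Naive average-linkage clustering of an n x n pseudo-distance matrix.

    Clusters are merged pairwise, always choosing the pair with the smallest
    average inter-cluster distance; merging stops once that distance exceeds d_thre.
    Returns clusters (lists of structure indices), largest first.
    """
    dist = np.asarray(dist, dtype=float)
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > 1:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average over all structure pairs drawn from the two clusters.
                d_ab = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d_ab < best:
                    best, pair = d_ab, (a, b)
        if best > d_thre:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                       # b > a, so index a stays valid
    return sorted(clusters, key=len, reverse=True)


# Toy 1D "structures"; the pseudo distance is |x_i - x_j| (hypothetical data).
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 30.0])
dist = np.abs(x[:, None] - x[None, :])
for d_thre in (0.5, 3.0, 12.0, 30.0):
    clusters = average_linkage(dist, d_thre)
    largest = len(clusters[0]) / len(x)
    print(d_thre, len(clusters), round(largest, 2))
```

Replacing `.mean()` with `.min()` or `.max()` in the inner loop turns the same sketch into single-linkage or complete-linkage clustering, the two alternatives mentioned above.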

Figure 3. Dependence of Nclust on Dthre. Red open circles are raw data from the clustering, and blue broken lines are lines fitted to the raw data.


References
1. Yeh, I.C., Lee, M.S., and Olson, M.A. (2008) Calculation of protein heat capacity from replica-exchange molecular dynamics simulations with different implicit solvent models. J Phys Chem B 112: 15064-15073.
2. Cheung, M.S., Garcia, A.E., and Onuchic, J.N. (2002) Protein folding mediated by solvation:
water expulsion and formation of the hydrophobic core occur after the structural collapse. Proc
Natl Acad Sci USA 99: 685-690.
3. Wada, A. (1976) The alpha-helix as an electric macro-dipole. Adv Biophys: 1.
4. Murata, K., Mitsuoka, K., Hirai, T., Walz, T., Agre, P., Heymann, J.B., Engel, A., and Fujiyoshi, Y.
(2000) Structural determinants of water permeation through aquaporin-1. Nature 407: 599-605.
5. Nakamura, H. (1996) Roles of electrostatic interaction in proteins. Q Rev Biophys 29: 1-90.