Structural Robustness Confers Evolvability in Proteins Mary M. Rorick Günter P. Wagner

Complex Adaptive Systems —Resilience, Robustness, and Evolvability: Papers from the AAAI Fall Symposium (FS-10-03)
Structural Robustness Confers Evolvability in Proteins
Mary M. Rorick1,2
Günter P. Wagner2,3
1 Yale Department of Genetics, 2 Yale Department of Ecology and Evolutionary Biology, 3 Yale Systems Biology Institute
333 Cedar Street, P.O. Box 208005, New Haven, CT 06520-8005
mary.rorick@yale.edu
ber of system elements that are affected by a given perturbation (Bhattacharyya et al. 2006, Fontana 2002, Wagner
et al. 2007, Ancel and Fontana 2000, Kitano 2004, G.P.
Wagner 1996, Wagner & Altenberg 1996). Biological
systems are nonrandomly modular (Schlosser and Wagner
2004), and modularity seems to increase through evolutionary time (Bonner 1998). Theoretically, there are scenarios where modularity reduces evolvability (Hansen
2002, Griswold 2006, Ancel and Fontana 2000), but for the
most part it is thought that modularity facilitates adaptive
change (Wagner 2005; Gerhart and Kirschner 1997; Hartwell et al. 1999; Franz-Odendaal & Hall 2006; Yang 2001;
Beldade & Brakefield 2003; Chen & Dokholyan 2006;
Pereira-Leal et al. 2006; Wagner & Altenberg 1996; Bhattacharyya et al. 2006, Cui et al. 2002; Bogarad & Deem
1999; Xia & Levitt 2002). The origin of modularity remains unclear (Gardner & Zuidema 2003; Lipson et al.
2002; Force et al. 2005; Misevic et al. 2006; Wagner et al.
2007; Lynch 2007). In this study we measure protein
structural modularity.
Protein designability and structural modularity can both
be indexed via simple structural features. These are, respectively, contact density (England and Shakhnovich
2003) and “helix/sheet density” (see below). We measure
these two robustness indices and an index for adaptive evolution for a dataset of 167 mammalian proteins with known
structure in order to look for an association between robustness and evolvability. We find that proteins with high
rates of adaptive evolution have higher contact density and
secondary structure density than proteins undergoing less
adaptive evolution. This pattern is consistent with the idea
that robust folds, being less constrained, accommodate
adaptive changes at a higher rate than low-robustness proteins, which are presumably more highly constrained.
In this paper we discuss evolvability as a consequence of
biological robustness. By “biological robustness”, we
mean robustness of individual fitness to mutational or environmental perturbation. Of course, evolvability is itself a
form of robustness as well. Life has persisted as an unbroken, branching lineage for over two billion years, and that
it has sustained itself for so long, through such a dramatic
diversity of environments, certainly constitutes evidence
for robustness— the origins of which are worth exploring.
This type of robustness, which is a feature of lineages, and
Abstract
Theory suggests that biological robustness allows for the
maintenance of fitness in the face of mutational change, and
to the extent that this mutational change translates to heritable phenotypic change, that biological robustness allows for
evolvability. However, empirical demonstrations that robustness promotes evolvability remain scant. This is in part
due to the difficulty of defining and measuring both evolvability and robustness in real biological systems. Here we
test whether protein structural robustness is associated with
the extent of adaptive change a protein experiences. We find
this to be the case for two forms of protein robustness—
designability and modularity, which we measure via contact
density and helix/sheet density, respectively. We interpret
this association to be primarily the result of reduced constraints on amino acid substitutions in highly designable
and/or modular proteins, resulting in less antagonistic pleiotropy and faster adaptation through natural selection.
Introduction
The extensive robustness of biological systems has long
fascinated biologists. While it can in theory stifle adaptation under certain circumstances (Draghi et al. 2010; Ancel
& Fontana 2000; Sumedha et al. 2007), robustness has
been shown to generally confer evolvability to living systems because it allows them to undergo innovative modification without losing functionality (A. Wagner 2005, A.
Wagner 2008). Robustness also serves to maintain high
fitness under conditions of random genetic and environmental change (Gibson and Wagner 2000; Meiklejohn and
Hartl 2002; A. Wagner 2005; G.P. Wagner et al. 1997).
The goal of this study is to test the prediction that robustness confers evolvability at the level of proteins. To do
this, we look for a statistical association between protein
structural robustness and the extent of protein adaptive
change. We assess two types of protein structural robustness: designability and modularity.
Protein designability is the number of protein sequences
that stably fold into a given structure, and this is a good
index for protein mutational robustness because it is directly related to the number of mutations a structure can
tolerate (Li et al. 1996). Modularity contributes to mutational and environmental robustness by limiting the num-
and here we interpret alpha helices and beta sheets as these
modules. We can thus approximate the overall density of
functional modules in a protein by simply dividing the
number of helices and sheets (defined according to the
Dictionary of Protein Secondary Structure (Kabsch and
Sander 1983)) by the number of residues in the protein
structure. Of course, truly independent protein modules—
be they kinetic, thermodynamic or functional— are generally much larger than individual alpha helices or beta
sheets, and so our modularity index will be limited in that
it will only consider those evolutionary constraints which
fall within secondary structure features. Nevertheless, it is
probably true that a substantial proportion of a protein’s
evolutionary constraint relationships fall within individual
helices and sheets. The small size of secondary structure
modules is also important because the number of them
within a protein is much more variable, and thus more informative, than the number of larger entities, like domains. Also,
secondary structure features, unlike domains, can be ascertained reliably from basic structure data. We test the assumption that there is a correlation between the overall
number of helices and sheets in a protein and the overall
number of residues in the structure by plotting the two indices and assessing the Pearson correlation coefficient.
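As a rough illustration of the helix/sheet density index and the correlation check described above, the following Python sketch counts secondary structure elements in a per-residue, DSSP-style assignment string and divides by the number of residues. It assumes that each contiguous run of 'H' counts as one helix and each contiguous run of 'E' counts as one sheet element (strictly speaking, a strand); the text does not state the exact counting rule, so this rule, the function names, and the example string are hypothetical.

```python
from itertools import groupby
from math import sqrt


def count_elements(ss, codes=("H", "E")):
    """Count contiguous runs of helix ('H') or strand ('E') codes.

    Assumption: one run = one structural module. The paper does not
    state its exact counting rule.
    """
    return sum(1 for code, _run in groupby(ss) if code in codes)


def helix_sheet_density(ss):
    """Number of helix/strand runs divided by the number of residues."""
    if not ss:
        raise ValueError("empty secondary-structure string")
    return count_elements(ss) / len(ss)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical 17-residue assignment: one helix and two strands.
example = "CCHHHHCCEEECCEEEC"
density = helix_sheet_density(example)  # 3 elements over 17 residues
```

Given one element count and one residue count per protein, `pearson_r(element_counts, residue_counts)` would give the correlation that justifies normalizing by protein size.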
Our index for adaptive evolution measures the extent to
which directional selection, as compared to purifying
selection and neutral evolution, affects a protein’s evolution. It is the
overall amount of adaptive evolution a protein experiences
through its evolutionary history among mammals, and it is
a function of both the underlying constraints to adaptive
change (its theoretical “evolvability”), and the extent to
which it is exposed to forces of directional selection. Thus,
it is more accurate to think of this as an index of realized
evolvability. For example, even under strong forces of
directional selection, high constraints (i.e., strong purifying
selection) can cause this index to be low, and in this sense
it gauges constraint architecture. At the same time, however, this index will be low if there are low levels of directional selection— even when amino acid substitutions are
unconstrained and the protein is theoretically very evolvable.
Our adaptive evolution index is importantly different
from the protein evolutionary rate indices used in many
comparative studies (e.g., Drummond et al. 2005; Bloom &
Adami 2003; Herbeck & Wall 2005; Bloom & Adami
2004; Lin et al. 2007; Fraser 2002; Bloom et al. 2006;
Chen & Dokholyan 2006; Bustamante et al. 2000). Our
index specifically measures the rate of substitutions that
occur through directional selection. Conventional evolutionary rate indices take into account all types of substitutions and, since neutral substitutions are so much more
common than adaptive ones, primarily reflect rates of neutral change. The ability of a protein to accommodate
adaptive amino acid substitutions may not be directly related to how easily the protein can accommodate neutral
amino acid substitutions, so these typical measures of evolutionary rate cannot serve as indices of evolvability, since
evolvability is defined as the ability to respond to direc-
which allows the persistence of life through long evolutionary time, is distinct from the individual-based type of
robustness primarily discussed in this paper.
Materials and Methods
Our experimental approach is to test whether proteins with
high levels of adaptive evolution are more structurally robust than proteins with low levels of adaptive evolution.
Our dataset consists of orthologous genes that code for
proteins with solved tertiary structures. For each protein in
the dataset, we first obtain two distinct measures of protein
robustness and one measure of adaptive evolution. The
first type of robustness we assess is designability. This is
the number of sequences that stably fold into a given structure. Designability is an important determinant of protein
mutational robustness (Li et al. 1997, Bloom et al. 2005).
Designability determines the rate at which stable folding
becomes less likely as random mutations accumulate
(Wilke et al. 2005, Bloom et al. 2005). It can be accurately
approximated from basic structure data, via contact density—a metric that has been shown to tightly correlate with
designability (England and Shakhnovich 2003, Bloom et
al. 2006).
Contact Density is the average number of contacts an
amino acid makes with other amino acids in the protein
(England and Shakhnovich 2003). High contact density
implies many favorable placements of strongly interacting
amino acids, which relax energy constraints on the rest of
the structure, thus allowing more sequences to fold into the
structure (England and Shakhnovich 2003). We determine
contact density by dividing the trace of the square of the
contact matrix by the number of residues in the protein
structure. A contact matrix is generated by using the
atomic coordinates of a Protein Data Bank (PDB) structure
file. We use the Euclidean distances between -carbons
to construct a distance matrix D. Using a threshold of 8Å
to define “contact”, and excluding trivial contacts (defined
as those between residues that are separated by fewer than
two intervening residues in the sequence), we convert D to
a Boolean contact matrix C, where 1 represents “contact”
and 0 represents “no contact”. Contact density is the trace
of the square of C, divided by the number of residues in the
protein: $\mathrm{Tr}(C^2)/N$. Our specific methodological choices
represent a compromise between the methods of Liao et al.
(2005) who use -carbons and a contact threshold of 9Å,
Shakhnovich et al. (2005) who use -carbons and a threshold of 7.5Å, and Bau et al. (2006) who use -carbons and a
threshold of 8Å.
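The contact density calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes one representative atom coordinate per residue is already available (here taken to be the alpha-carbon, which is an assumption), treats a distance of exactly 8 Å as a contact, and reads "fewer than two intervening residues" as a sequence separation of less than 3. The function names and the toy coordinates are hypothetical.

```python
from math import dist


def contact_matrix(coords, threshold=8.0, min_separation=3):
    """Boolean contact matrix C (as 0/1 integers).

    coords: one (x, y, z) tuple per residue, in sequence order.
    Residues i and j are in contact if their distance is <= threshold
    and |i - j| >= min_separation (trivial contacts are excluded).
    """
    n = len(coords)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if dist(coords[i], coords[j]) <= threshold:
                c[i][j] = c[j][i] = 1
    return c


def contact_density(coords, threshold=8.0, min_separation=3):
    """Tr(C^2) / N, the mean number of contacts per residue."""
    if not coords:
        raise ValueError("no residues given")
    c = contact_matrix(coords, threshold, min_separation)
    n = len(c)
    # Tr(C^2) = sum over i and j of C[i][j] * C[j][i]
    trace_c2 = sum(c[i][j] * c[j][i] for i in range(n) for j in range(n))
    return trace_c2 / n


# Toy chain: five residues spaced 1 angstrom apart on a line.
# Non-trivial pairs are (0,3), (0,4), (1,4), so Tr(C^2) = 6.
line = [(float(i), 0.0, 0.0) for i in range(5)]
cd = contact_density(line)
```

Because C is symmetric with a zero diagonal, Tr(C^2) equals twice the number of contacting pairs, which is why dividing by N gives the average number of contacts per residue.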
The second type of robustness we assess is protein
modularity, which we define as the density of structural
modules. In measuring protein modularity, our aim is to
gauge the consolidation of evolutionary constraints in the
protein structure. The independent units of evolutionary
change within a protein can be approximated through kinetic, thermodynamic and/or functional modules (see Copley et al. 2002 for a structural/folding perspective, and
Bhattacharyya et al. 2006 for a functional perspective)—
uted approximately randomly across different protein fold
types— i.e., that the robustness of a protein does not significantly influence the selective forces it experiences. We
test this assumption by looking for an association between
protein functional importance and robustness. We measure
functional importance by measuring the extent of purifying
selection acting on the protein, which is defined here as
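$p_0(\omega_0 - 1)$, where $p_0$ is the proportion of sites in the constrained class and $\omega_0 < 1$ is the dN/dS ratio estimated for those sites.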
We perform multiple regression to tease apart the separate influences of designability and modularity on adaptive
evolution. We divide the dataset at the median value for
our adaptive evolution index and analyze the two halves
separately. We determine the quadratic best-fit functions
while constraining the functions to be equal to the median
value of adaptive evolution at the lowest observed levels of
the designability and modularity indices. We assess statistical significance of partial regression coefficients and
compare the magnitude of standardized partial regression
coefficients.
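To make the idea of comparing standardized partial regression coefficients concrete, the sketch below computes them for a plain linear model with two predictors. This is a deliberate simplification: the analysis described above uses constrained quadratic fits, which are not reproduced here. The function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def standardized_partial_coefficients(y, x1, x2):
    """Standardized partial regression coefficients for y ~ x1 + x2.

    Uses the closed form for two predictors:
        beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12^2)
        beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12^2)
    Assumption: a linear model, not the quadratic fit used in the text.
    """
    r_y1 = pearson_r(y, x1)
    r_y2 = pearson_r(y, x2)
    r_12 = pearson_r(x1, x2)
    denom = 1.0 - r_12 ** 2
    if denom == 0.0:
        raise ValueError("predictors are perfectly collinear")
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Toy data: y = 2*x1 + 3*x2 with uncorrelated predictors of equal spread.
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]
b1, b2 = standardized_partial_coefficients(y, x1, x2)
```

Because the coefficients are unitless, their magnitudes can be compared directly even when the predictors are measured on very different scales, as contact density and helix/sheet density are.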
Gene compactness is the dominant factor determining
evolutionary rate in mammals, and gene essentiality is
among the distant, though nevertheless significant, factors
of secondary importance (Liao et al. 2006). We used the
definitions of gene compactness and gene essentiality that
Liao et al. (2006) show to be significantly correlated with
dN/dS, and we also analyze three other indices of gene
compactness: coding sequence (CDS) length, the total
length of the introns, and the relative length of the introns
(intron length divided by CDS length). To determine
whether it is necessary to control for gene compactness
when assessing the relationship between robustness and
adaptive evolution, we test whether any of the compactness
indices are significantly correlated with both our adaptive
evolution index and either of our robustness indices. To
determine whether it is necessary to control for gene essentiality, we assess whether there is a significant difference in
adaptive evolution level, contact density, or helix/sheet
density between essential versus nonessential proteins (i.e.,
those corresponding to essential versus nonessential
genes).
tional selection (Wagner and Altenberg 1996; Pigliucci
2008; A. Wagner 2008).
Specifically, our index for adaptive evolution is the proportion of sites adaptively evolving multiplied by the average rate of adaptive evolution at these sites. Estimates for
these numbers are obtained by analyzing the evolutionary
history of each protein. For each of the proteins in the
dataset, a site model implemented by Phylogenetic Analysis
by Maximum Likelihood (PAML) 3.15 codeml (Yang
1997, Yang 2007) is used to analyze 25 mammalian
orthologs mapped to a known species phylogeny, to obtain
the maximum likelihood estimates of the proportions of
sites ($p_0$, $p_1$ and $p_2$) in each of three $\omega$ categories ($\omega_0$, $\omega_1$
and $\omega_2$), and the $\omega$ values themselves (where $\omega_0$ is constrained to be <1, $\omega_1$ is constrained to be 1, and $\omega_2$ is left
unconstrained). We define the proportion of sites adaptively evolving as $p_2$, and the rate of adaptive evolution at
these sites as $\omega_2 - 1$, so our index of adaptive evolution is
$p_2(\omega_2 - 1)$.
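The index itself is simple arithmetic once the site-model estimates are in hand. The Python sketch below assumes the three site proportions and the two free rate estimates have already been read from the codeml output by some other means; it does not call or parse PAML. The class name, field names (`p0`, `p1`, `p2`, `omega0`, `omega2`), and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteModelEstimates:
    """Maximum likelihood estimates for one protein (hypothetical container).

    p0, p1, p2: proportions of sites in the three rate categories.
    omega0: rate in the constrained category (must be < 1).
    omega2: rate in the unconstrained category.
    The middle category has its rate fixed at 1, so it is not stored.
    """
    p0: float
    p1: float
    p2: float
    omega0: float
    omega2: float

    def __post_init__(self):
        if abs(self.p0 + self.p1 + self.p2 - 1.0) > 1e-6:
            raise ValueError("site proportions must sum to 1")
        if not self.omega0 < 1.0:
            raise ValueError("omega0 is constrained to be < 1")


def adaptive_evolution_index(est):
    """Proportion of adaptively evolving sites times (rate - 1)."""
    return est.p2 * (est.omega2 - 1.0)


# Made-up values: 10% of sites evolving at three times the neutral rate.
est = SiteModelEstimates(p0=0.7, p1=0.2, p2=0.1, omega0=0.1, omega2=3.0)
index = adaptive_evolution_index(est)
```

Note that the index is negative whenever the estimated rate in the unconstrained category falls below 1, which is how a dataset-wide mean can come out slightly below zero.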
We obtain indices for contact density, helix/sheet density, and adaptive evolution for 167 distinct proteins within
the OrthoMaM database (Ranwez et al. 2007, accessed
February 2009). This dataset consists of all the proteins
for which there is sufficient structural information to determine contact density and helix/sheet density, and for
which orthologs from all 25 species are available. The dataset is
broken up into categories based on the broadest hierarchical Gene Ontology categories for molecular function (The
Gene Ontology Consortium 2000), according to AmiGO
version 1.7 (using the GO database release from 2010-05-08, Carbon et al. 2009). Within the dataset, there are 155
proteins that have binding activity, 87 that have catalytic
activity, 25 that have molecular transducer activity, 24 that
have transcriptional regulatory activity, 16 that have enzyme regulatory activity, 6 that have transporter activity, 5
that have structural molecule activity, 1 that has electron
carrier activity, and 5 with no known molecular function.
The average values for the robustness and adaptive evolution indices are assessed for each of the 8 molecular function subsets that have a sample size larger than 1. The
dataset is also broken up into two halves according to the
fraction of “structured” amino acids— i.e., those that are
part of an alpha helix or beta sheet.
To assess the relationship between protein robustness
and the level of adaptive evolution, each dataset is divided
into two equally sized groups according to the size of the
adaptive evolution index (dividing at the median value),
and then Student’s t test and Welch’s approximate t test are
used to identify any significant difference between the
means for either of the robustness indices. To assess
whether the variance in adaptive evolution is significantly
different for high versus low robustness proteins, the
dataset is divided into two equally sized groups according
to the size of either contact density or helix/sheet density
(dividing at the median value), and an F-ratio test is performed.
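The group comparisons described above can be sketched with the Python standard library alone. The sketch stops at the test statistics and degrees of freedom, because converting them to p-values requires t and F distribution functions that the standard library does not provide. It also assumes that proteins falling exactly on the median are dropped so that the two groups are equal in size; the text does not say how such ties are handled. All function names and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, median, variance


def median_split(values, keys):
    """Split values into (low, high) groups by the median of keys.

    Assumption: items whose key equals the median are dropped.
    """
    cut = median(keys)
    low = [v for v, k in zip(values, keys) if k < cut]
    high = [v for v, k in zip(values, keys) if k > cut]
    return low, high


def student_t(a, b):
    """Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(b) - mean(a)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    qa, qb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(b) - mean(a)) / sqrt(qa + qb)
    df = (qa + qb) ** 2 / (qa ** 2 / (len(a) - 1) + qb ** 2 / (len(b) - 1))
    return t, df


def variance_ratio(a, b):
    """F ratio of sample variances, var(b) / var(a)."""
    return variance(b) / variance(a)


# Toy example: five proteins split by a made-up adaptive evolution index.
robustness = [10.0, 20.0, 30.0, 40.0, 50.0]
adaptive = [1.0, 2.0, 3.0, 4.0, 5.0]
low_group, high_group = median_split(robustness, adaptive)
```

For the mean comparison, the split key is the adaptive evolution index and the values are a robustness index; for the variance comparison, the roles are reversed and `variance_ratio` is applied to the two groups.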
For the interpretation of our results we rely on the assumption that different selection regime types are distrib-
Results
In this study we test whether there is an association between protein structural robustness and adaptive change.
We gauge protein structural robustness by assessing two
distinct, yet not entirely independent, features of protein
structure: designability and modularity. Designability is
the total number of sequences that stably fold into a given
structure. Because it cannot be directly measured, we use
contact density, a simple physical feature of a protein that
is proportional to designability (England and Shakhnovich
2003), as our index of designability. We use helix/sheet
density as our measure of protein structural modularity.
Unlike contact density, it is not a standard and well-studied
index, so we test the basic assumption that underlies this
index: i.e., that the overall number of helices and sheets
correlates with the number of residues in a protein (if this
with relatively low amounts of adaptive evolution (0.0768)
(p=0.00135 for Student’s t test and p=0.00135 for Welch’s
approximate t test, both of null hypothesis 21) (Figure
1b). Also, as in the case of designability, the levels of
adaptive evolution experienced by relatively modular proteins are significantly more variable than those experienced
by proteins with lower modularity (0.00657 as compared to
0.00224; p<<.0001 for F-ratio test of null hypothesis that
ratio between the variances is 1), regardless of whether
outlying datapoints are included or not (the difference in
variance is significant even if the two most outlying datapoints with respect to the adaptive evolution index are removed from both halves of the dataset) (Figure 2b). Together with the corresponding results for contact density,
this implies that high protein structural robustness is associated with greater variance in the rate of adaptive evolution experienced by a protein.
Because our indices for designability and modularity
correlate with one another to some extent (data not shown),
it is unclear whether they have independent effects on the
amount of adaptive evolution a protein experiences. We
therefore perform multiple regression to tease apart the
separate influences of designability and modularity on the
rate of adaptive evolution. The dataset is divided at the
median level of adaptive evolution, and the two halves are
analyzed separately. Quadratic fits to both halves of the
dataset are highly significant (ANOVA p<<.0001). However, the estimates of the individual partial regression coefficients—the parameters which describe how the robustness indices independently influence adaptive evolution—
were not significant in either case (Student’s t-test). Thus,
the relative statistical significance of the partial regression
coefficients cannot be used to exclude either designability
or modularity as a possible independent predictor of the
adaptive evolution index. Another way we can compare the
relative importance of designability and modularity at determining the adaptive evolution index is by calculating
unitless (and thus comparable) standardized partial regression coefficients. Interestingly, for the fits to both halves
of the dataset, the standardized partial regression coefficient for helix/sheet density is nearly 100 times greater in
magnitude than the standardized partial regression coefficient for contact density (regardless of the order in which
the two variables are added to the model). Therefore, even
though we cannot conclusively reject designability as an
independent predictor of the level of adaptive evolution,
these results do suggest that modularity is likely more important than designability in determining this. These results emphasize the value of including considerations of
modularity in studies of robustness, and the importance of
developing methods to quantify modularity in real biological systems. The concepts of robustness and modularity
are intimately intertwined, and at least so far, there is not a
good way of completely separating modularity from other
forms of robustness—conceptually or practically.
We also analyzed the dataset in subsets, according to the
molecular function of the proteins and according to
whether they are “structured” or “unstructured” (see Meth-
were not the case it would be inappropriate to normalize
for protein size by dividing by the number of residues in
the protein because we would be over-correcting for the
influence of protein size). We find that there is a highly
significant correlation between the number of helices and
sheets and the number of residues (data not shown).
We analyze our indices for robustness and adaptive evolution for 167 distinct mammalian proteins, with orthologs
from 25 species. We limit our sample to mammalian proteins to avoid confounding influences from comparing proteins with different phylogenetic histories. To assess
whether protein structural robustness has an influence on
protein evolvability, we test whether there is a positive
association between either of our robustness indices and
our index for adaptive evolution. The adaptive evolution
index is plotted as a function of both the designability and
modularity indices (Figures 1 and 2). The mean contact
density for this dataset is 5.12 with a standard deviation of
1.01, the mean helix/sheet density is 0.082 with a standard
deviation of 0.023, and the mean for the adaptive evolution
index is -0.00953 with a standard deviation of 0.0661. The
relationship between designability and adaptive evolution
reveals two interesting and significant patterns. First,
when the sample of proteins is divided into two equally
sized groups according to their adaptive evolution index
(less-than-median versus greater-than-median), we find
that the mean contact density of the group experiencing
relatively high adaptive evolution (5.30) is significantly
greater than the mean contact density of the group experiencing relatively low adaptive evolution (4.94) (p=0.0101
for Student’s t test and p=0.0100 for Welch’s approximate
t test, both of null hypothesis that 21) (Figure 1a). This
implies that proteins experiencing greater amounts of adaptive evolution are generally more designable, and thus more
robust, than proteins undergoing less rapid adaptive change.
Another interesting pattern we observe is that high contact density is associated with greater variance in the
amount of adaptive evolution experienced by different proteins (Figure 2a). When the dataset is divided into two
equally sized groups according to contact density (dividing
at the median), the variance in adaptive evolution is significantly greater for highly contact dense proteins as compared to less contact dense proteins (0.00718 versus
0.00164) (p<<.0001 for F-ratio test of null hypothesis that
ratio between the variances is 1) (Figure 2a). Furthermore,
this difference in variance is not dependent on the outlying
datapoints: if the two most outlying datapoints with respect
to the adaptive evolution index are removed from both
halves of the dataset, there is still a significant difference
between the variances of the two halves. Thus, we observe
an increase in the variance of the adaptive evolution index
as contact density increases.
In order to analyze the relationship between modularity
and adaptive evolution, we perform the same tests as
above, but this time for helix/sheet density. We find that
the mean helix/sheet density of proteins with relatively
high amounts of adaptive evolution (0.0876) is significantly greater than the mean helix/sheet density of proteins
ods). While the mean contact density and helix/sheet density do differ between the various functional groups (data
not shown), we observe no significant differences among
these data subsets in regard to the relationship they reveal
between the robustness and adaptive evolution indices.
The limitations of protein designability and structural
modularity, as indicators of the extent of evolutionary constraint, are reflected in the fact that the patterns we report
above are considerably less pronounced for classes of proteins known to be highly unstructured (Garza, Ahmad and
Kumar 2009, Wright and Dyson 1999), and for the less
structured half of the dataset, implying that the structural
indices naturally fail to capture the relevant evolutionary
constraints for unstructured proteins.
Testing for potential confounding factors. To index
evolvability in proteins we measure the amount of adaptive
evolution a protein experiences. As mentioned above, in
using this index, we are assuming that high levels of adaptive evolution can be attributed at least partially to low
structural constraints (i.e., high evolvability) as opposed to
just high directional selection pressure. We test this assumption by checking whether functional importance is
associated with robustness. If functionally important proteins— which we define to be those under strong purifying
selection— are generally more robust than less important
proteins, we would have to consider the possibility that our
indices for robustness and adaptive evolution correlate due
to recruitment of robust folds into important functional
roles or through gradual selection for increased robustness
in important proteins (robustness having many potential
adaptive benefits) (see discussion). We do not find any
association between our index for functional importance
and either of our indices for protein robustness.
According to a recent study by Ridout et al. (2010), unstructured sites (i.e., those which are not part of any secondary structure feature) are more likely to have high $\omega$ (dN/dS) values. This poses a possible alternative explanation for our
observed association between the indices for modularity
and adaptive evolution: i.e., that it is just a trivial consequence of there being a greater proportion of unstructured
amino acids in highly modular proteins. This is especially
plausible since we also happen to find that proteins with
higher proportions of unstructured sites (defined here as
those not within a helix or sheet) tend to have higher helix/sheet density (data not shown). However, we rule out
this alternative interpretation because our adaptive evolution index shows no association with the proportion of sites
within alpha helices, beta sheets, or unstructured regions.
Our index of protein designability—contact density—
has been previously shown to correlate with protein length
(Bloom et al. 2006, Lipman et al. 2002), and we find this
correlation in our data also (data not shown). To rule out
the possibility that the association between contact density
and adaptive evolution (Figure 1a) is caused by a co-correlation of both indices with protein length, we test
whether there is any relationship between adaptive evolution and protein length. We find no significant correlation
between these two variables. Further, when we divide the
dataset into two groups (one comprised of those with less-than-median protein length, and the other comprised of
those with greater-than-median protein length), we find no
significant difference in the level of adaptive evolution
between the two groups.
Liao et al. (2006) demonstrate that gene compactness
and gene essentiality are both important determinants of
the overall rate of mammalian protein evolution. To determine whether it is necessary to control for gene compactness when examining the relationship between protein
structural robustness and the amount of adaptive evolution
a protein experiences, we test whether gene compactness
indices co-correlate with our indices for protein robustness
and adaptive evolution. We find no co-correlations, and
[Figure 1 plots: panel (a) shows DS (vertical axis) against D (horizontal axis); panel (b) shows DS (vertical axis) against M (horizontal axis).]
Figure 1. The amount of adaptive evolution a protein experiences
through its evolutionary history “DS” as a function of (a) the
designability index “D”, and (b) the modularity index “M”. The
color of the datapoints indicates whether they are part of the upper or lower half of the dataset with respect to the adaptive evolution index, divided at the median. The mean D or M of the
green datapoints is indicated by the upper red line, and the mean
D or M of the blue datapoints is indicated by the lower red line.
Both parabolic fits are highly significant in both a and b
(p<<.0001 according to ANOVA F-statistic).
tems can be explained by selection for evolvability, or
whether it has evolved for the sake of buffering mutational
and/or environmental noise (Meiklejohn and Hartl 2002,
Wagner 2005, Ancel and Fontana 2000, Wagner et al.
1997, de Visser et al. 2003, Hartl and Taubes 1996). Investigation into this question is stymied by the fact that there
is scant empirical evidence that robustness is a biologically
significant determinant of evolvability, and there is difficulty in defining and measuring robustness in real biological systems. Here we use one established index of protein
robustness (contact density as a measure of designability)
and another robustness index of our own design (helix/sheet density as a measure of structural modularity) to
test whether robustness is associated with evolvability in
proteins. Prior to this study we knew little about the distribution of helix/sheet density across different proteins, but
previous work had already established that contact density
is a determinant of protein family size (Shakhnovich et al.
2005), functional diversity (Ferrada and Wagner 2008),
and overall evolutionary rate (dN) in yeast (Bloom et al.
2006). These studies provide some indication that contact
density contributes to reduced constraints and possibly
evolvability. However, Bloom et al. (2006) could not fully
disentangle the effects of contact density and protein length
on dN, so it is possible that contact density only correlates
with dN through co-correlation with protein length, or
some other unmeasured factor (such as modularity, for
example). Furthermore, these studies do not infer evolvability by measuring the amount of adaptive evolution as
we do here. Instead they use protein family size, functional diversity and dN, which are all influenced by more
factors than the two which contribute to our index for
evolvability (i.e., the extent of adaptive constraints and
directional selection strength).
we find only two significant negative correlations among
all the tests we perform: between CDS length and contact
density and between CDS length and helix/sheet density
(before correcting for multiple tests, p<<0.001 and 0.047,
respectively). Because CDS length does not also negatively correlate with the adaptive evolution index, we conclude that CDS length cannot be responsible for the observed association between protein structural robustness
and adaptive evolution. To determine whether it is necessary to control for gene essentiality, we assess whether
there is a significant difference in contact density, helix/sheet density, or the adaptive evolution index between
proteins corresponding to essential versus nonessential
genes. We find no significant differences among these
comparisons (with the significance cut-off set to p=0.05
before correcting for multiple tests). Therefore, we conclude that gene essentiality is not likely to be a confounding factor.
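The confounder checks described above can be illustrated with a short sketch. The text does not state which correlation statistic or multiple-test correction was used, so the Spearman rank correlation, the permutation p-value, and the Bonferroni adjustment below are assumptions chosen for illustration, and all function names and the toy numbers are hypothetical.

```python
import random


def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Spearman correlation."""
    rng = random.Random(seed)
    rx, ry = ranks(x), ranks(y)
    observed = abs(pearson(rx, ry))
    shuffled = list(ry)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy data (made up): CDS length versus a robustness index.
cds_length = [300, 450, 600, 900, 1200, 1500, 2100, 3000]
contact_density = [6.1, 5.9, 5.8, 5.2, 5.0, 4.7, 4.1, 3.9]
print(spearman(cds_length, contact_density))
print(permutation_p(cds_length, contact_density))
```

Under a Bonferroni adjustment, a raw p-value of 0.047 is doubled to 0.094 as soon as two tests are considered, which is why it matters that the p-values above are reported before correction.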
Two forms of protein structural robustness and their
effects on evolvability. Here we address whether robustness contributes to evolvability in proteins. We consider
two forms of structural robustness. We hypothesize that
high values for either of these should reflect low structural
constraints and high evolvability. Therefore, if structural
robustness confers evolvability, and assuming different
selection regimes are distributed approximately randomly
among different protein folds, then we expect to find an
association between high robustness and high amounts of
adaptive evolution. Indeed, this is what we find. Specifically, we find that proteins with high amounts of adaptive
evolution are more robust than proteins with lower
amounts of adaptive evolution. We test whether the differences in adaptive evolution among proteins can be
attributed to differences in constraint architecture (evolvability), as opposed to differences in selection regime, by
testing for an association between robustness and protein functional importance. This test is needed
to rule out two alternative interpretations.
Figure 2. The amount of adaptive evolution a protein experiences
through its evolutionary history “DS” as a function of (a) the
designability index “D”, and (b) the modularity index “M”. The
variance of the dark red datapoints is significantly larger than the
variance of the light blue datapoints for both a and b.
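The variance comparison reported in the caption can be sketched as follows. The test actually used is not stated here, so the permutation test on median-centred values below is an assumed stand-in, and the function name and example numbers are hypothetical.

```python
import random
import statistics


def variance_permutation_test(high, low, n_perm=5000, seed=0):
    """One-sided permutation test of var(high) > var(low).

    Each group is centred on its own median first, so that a shift in
    location alone is not mistaken for a difference in spread.
    Returns (observed variance difference, permutation p-value).
    """
    med_high = statistics.median(high)
    med_low = statistics.median(low)
    high_c = [v - med_high for v in high]
    low_c = [v - med_low for v in low]
    observed = statistics.variance(high_c) - statistics.variance(low_c)

    pooled = high_c + low_c
    n_high = len(high_c)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = (statistics.variance(pooled[:n_high])
                - statistics.variance(pooled[n_high:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Made-up adaptive evolution values for high- and low-robustness proteins.
high_robustness = [-0.40, 0.30, -0.35, 0.25, -0.30, 0.20, 0.40, -0.20, 0.35, -0.25]
low_robustness = [0.01, -0.01, 0.02, -0.02, 0.00, 0.01, -0.01, 0.015, -0.015, 0.005]
difference, p_value = variance_permutation_test(high_robustness, low_robustness)
print(difference, p_value)
```

A permutation approach is used here only because it makes no distributional assumption; a Levene-type or F test would be an equally plausible choice.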
Discussion
From a theoretical standpoint, a system must be robust to
be evolvable by natural selection. And yet, it remains unclear whether the ubiquity of robustness in biological sys-
ence” in Ancel and Fontana 2000). However, the lack of an
observable association between robustness and the purifying selection/functional importance index indicates that
these alternative mechanisms play a relatively weak role at
best, and therefore, that strength of stabilizing selection
does not explain variation in robustness. Instead, we conclude that the observed association between protein structural robustness and adaptive evolution is primarily the
result of faster adaptive evolution in robust proteins, as a
consequence of lower structural constraints.
The first is that, in the long term, robust protein folds—
being more evolvable—end up being recruited into functional roles which demand high levels of evolvability because they are good at tolerating shifting selection pressures. In other words, it is possible that highly robust proteins are predisposed to biological roles where adaptive
changes are frequent, and that protein robustness persists
through association with these adaptive changes. This
would constitute a mechanism of fold selection for evolvability (England, Shakhnovich and Shakhnovich 2003,
Taverna & Goldstein 2000). Under this mechanism we
would expect to find an association between protein robustness and functional importance, so because we do not
find this, we rule it out. Furthermore, a theoretical point
limits the likelihood of this mechanism. Evolvability and
mutational robustness are traits of the genotype-phenotype
mapping function and are thus subject to selective
forces only indirectly, through association with organismal
traits that have direct fitness effects. Theoretical work has demonstrated that such second-order selection is easily overwhelmed by first-order selection unless the population size
and/or the genomic mutation rate is very high (e.g., Earl &
Deem 2004, van Nimwegen et al. 1999, Crutchfield and
Huynen 1999, Meiklejohn & Hartl 2002, Wagner 2005).
The relationship between protein structural modularity
and designability. There are some other minor conclusions that can be drawn from this work. By quantifying
both protein designability and modularity we have the opportunity to address how these indices relate to one another. The exact relationship between modularity and designability has not been thoroughly investigated in real proteins. All that is known is that, for lattice models, mutationally robust “prototype” sequences are characterized by
an overrepresentation of special sequence motifs that fold
in a context-insensitive manner—reminiscent of “folding
modules” (Cui et al. 2002). Also, Li et al. (2007) show
that modular “stabilizing fragments” can be recombined to
create highly robust chimeric proteins. Because we find
that contact density does not tightly correlate with helix/sheet density, we conclude that designability and modularity describe somewhat different information, at least as
indexed here.
The second possible interpretation is that strong directional
selection, which would be reflected as high levels of adaptive evolution, causes proteins to gradually evolve greater
robustness. Under this mechanism we would again expect
high robustness to evolve preferentially in the most functionally important proteins, and since we do not find an
association between robustness and functional importance,
we rule out this interpretation as well. Furthermore, from a
theoretical standpoint this interpretation is unwieldy to
begin with because robustness, at least as indexed here, is
unlikely to emerge through gradual evolution. Contact
density and helix/sheet density, as inherent features of the
protein fold, cannot evolve gradually because distinct protein structures are separated in sequence space by vast distances composed almost entirely of unfoldable sequences
(Babajide et al. 2001). Hence, one of the basic requirements for adaptive evolution, that the trait can be changed
in a quasi-gradual way, is not fulfilled for either designability or modularity.
Robustness of unstructured proteins. It is important to
note that both designability and modularity are types of
structural robustness, and that structural constraints are
only good approximations of evolutionary constraints
where structure is essential for function. While this is true
for many proteins, there are some important exceptions.
For example, many transcription factor proteins only require structural stability at a small fraction of their amino
acids (Garza, Ahmad and Kumar 2009). Moreover, it has
been hypothesized that proteins without a rigid structure
achieve high robustness of function, and thus high evolvability, despite very low levels of structural robustness
(e.g., Brown et al. 2002). Our results indicate that structural constraints do not capture the relevant evolutionary
constraints for some classes of proteins in our dataset—
specifically, those which are relatively unstructured.
Therefore, our results support the idea that for some proteins proper function is not directly dependent on structural
stability, and in turn, that protein fitness cannot always be
approximated through measures of structural stability or
foldability. This is significant in light of the common assumption within the field of structural biology that structure equals function. However, because the great majority
of proteins with solved structures do rely on a rigid structure to perform their functions, we did not think that these
exceptions would cause enough of a problem to warrant
their exclusion from our dataset.
On the other hand, one reason to suspect that one or both
of the above alternative mechanisms may be playing a role
to some extent is that we find that high robustness is associated with greater variance between proteins in the
amount of adaptive evolution they experience (Figures 2a
and 2b). Both of the above alternatives provide a reason to
expect an association between high robustness and very
low values for adaptive evolution because they both would,
in theory, also promote the evolution of robustness under
conditions of strong purifying selection (protein structural
robustness translates to environmental as well as mutational robustness; see the “designability principle” in
Wingreen, Li and Tang 2004, and “plastogenic congru-
Bau D, Martin AJM, Mooney C, Vullo A, Walsh I, Pollastri G
(2006) Distill: a suite of web servers for the prediction of one-,
two- and three-dimensional structural features of proteins. BMC
Bioinformatics 7: Article No. 402
Beldade P, Brakefield PM (2003) Concerted evolution and developmental integration in modular butterfly wing patterns. Evolution & Development 5(2):169-179
Bhattacharyya RP, Remenyi AR, Yeh BJ, Lim WA (2006) Domains, Motifs, and Scaffolds: The Role of Modular Interactions
in the Evolution and Wiring of Cell Signaling Circuits. Annual
Review of Biochemistry 75: 655-680
Bloom JD, Adami C (2003) Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in
protein-protein interactions data sets. BMC Evolutionary Biology
3: 21
Bloom JD, Adami C (2004) Evolutionary rate depends on number
of protein-protein interactions independently of gene expression
level: Response. BMC Evolutionary Biology 4:14
Bloom JD, Drummond DA, Arnold FH, Wilke CO (2006) Structural determinants of the rate of protein evolution in yeast. Molecular Biology and Evolution 23(9): 1751-1761
Bloom JD, Silberg JJ, Wilke CO, Drummond DA, Adami C
(2005) Thermodynamic prediction of protein neutrality. PNAS
102(3): 606-611
Bogarad LD, Deem MW (1999) A hierarchical approach to protein molecular evolution. PNAS USA 96: 2591-2595
Bonner JT (1988) The Evolution of Complexity. Princeton, NJ:
Princeton Univ. Press.
Brown CJ, Takayama S, Campen AM, Vise P, Marshall TW, Oldfield CJ, Williams CJ, Dunker AK (2002) Evolutionary rate heterogeneity in proteins with long disordered regions. J Mol Evol
55: 104-10
Bustamante C, Townsend JP, Hartl DL (2000) Solvent accessibility and purifying selection within proteins of Escherichia coli and
Salmonella enterica. Mol Biol Evol 17(2):301-308
Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S,
AmiGO Hub, Web Presence Working Group. AmiGO: online
access to ontology and annotation data. Bioinformatics. Jan
2009;25(2):288-9.
Chen Y, Dokholyan NV (2006) The coordinated evolution of
yeast proteins is constrained by functional modularity. TRENDS
in Genetics 22(8): 416-419
Copley RR, Doerks T, Letunic I, Bork P (2002) Minireview:
Protein domain analysis in the era of complete genomes. FEBS
Letters 513: 129-134.
Crutchfield JP, Huynen M (1999) Neutral evolution of mutational
robustness. PNAS USA 96(17):9716-9720.
Cui Y, Wong WH, Bornberg-Bauer E, Chan HS (2002) Recombinatoric exploration of novel folded structures: A heteropolymer-based model of protein evolutionary landscapes. PNAS
99(2): 809-814
de Visser AGM, Hermisson J, Wagner GP, et al. (2003) Perspective: Evolution and detection of genetic robustness. Evolution
57(9):1959-1972
Draghi JA, Parsons TL, Wagner GP, Plotkin JB (2010) Mutational robustness can facilitate adaptation. Nature 463(7279): 353-355
Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH
(2005) Why highly expressed proteins evolve slowly. PNAS 102:
14338-14343
The determinants of protein evolutionary rate. There
has been considerable research in the past several years
aiming to identify the important determinants of protein
evolutionary rate (dN or dN/dS). For the reasons stated
above, we believe that our index for adaptive evolution is
fundamentally different from these measures of predominantly neutral evolutionary change. Also, in this literature
studying the determinants of evolutionary rate, dN or
dN/dS is generally inferred from a comparison of only two
species, while our measure for adaptive evolution is inferred from a phylogeny of 25 species. Nevertheless, it is
certainly possible that constraints on neutral evolution to
some extent translate to constraints on adaptive evolution.
Therefore, we take into consideration the dominant factors
determining neutral evolutionary rate in order to verify that
none of these are in fact responsible for our observed association between protein robustness and adaptive evolution,
and we do not find any of them to be confounding. We
do not examine gene expression level because,
although it is the dominant factor determining protein evolutionary rate in bacteria (Rocha and Danchin 2004) and
yeast (Drummond et al. 2006, Zhang and He 2005), it
seems to have only a negligible role in determining the
evolutionary rate of mammalian proteins (Liao et al. 2006,
Vinogradov 2010).
In this study we limit our investigation to proteins from
the same clade to eliminate potential confounding effects
due to differences in phylogenetic structure between protein families from different groups. Because it has only
recently been elucidated that the determinants of mammalian protein evolutionary rate differ considerably from those
determining the rates in yeast and bacteria, our results are
of interest in that they shed some preliminary light on how
protein structure plays a role in determining the rate of at
least adaptive protein evolutionary change. The fact that
we use a dataset of mammalian proteins raises the question
of whether similar patterns would also be found in bacterial and fungal proteins.
Acknowledgements
The experimental work in the Wagner lab is supported by a
grant from the John Templeton Foundation (Grant number
12793). Some of the work leading up to this paper was
carried out while M.M.R. was funded by an NIH Training
Grant (Grant number 5T32GM007499-34). The views expressed in this paper do not necessarily reflect the views of
JTF or NIH.
References
Ancel LW and Fontana W (2000) Plasticity, evolvability, and
modularity in RNA. Journal of Experimental Zoology
288(3):242-283.
Babajide A, Farber R, Hofacker IL, Inman J, Lapedes AS, Stadler
PF (2001) Exploring protein sequence space using knowledgebased potentials. Journal of Theoretical Biology 212: 35-46
Lin Y-S, Hsu W-L, Hwang J-K, Li W-H (2007) Proportion of
solvent-exposed amino acids in a protein and rate of protein evolution. Mol. Biol. Evol. 24(4): 1005-1011
Lipman DJ, Souvorov A, Koonin EV, Panchenko AR, Tatusova TA (2002) BMC Evolutionary Biology 2: 20
Lipson H, Pollack JB, Suh NP (2002) On the origin of modular
variation. Evolution 56(8): 1549-1556
Lynch M (2007) The frailty of adaptive hypotheses for the origins
of organismal complexity. PNAS 104: 8597-8604.
Meiklejohn CD, Hartl DL (2002) A single mode of canalization.
TRENDS in Ecol & Evol 17(10): 468-473
Misevic D, Ofria C, Lenski RE (2006) Sexual reproduction reshapes the genetic architecture of digital organisms. Proc. R. Soc.
B 273: 457-464
Pereira-Leal JB, Levy ED, Teichmann SA (2006) The origins and
evolution of functional modules: lessons from protein complexes
Pigliucci, M (2008) Is evolvability evolvable? Nature Reviews
Genetics 9: 75-82
Ranwez V, Delsuc F, Ranwez S, Belkhir K, Tilak M, Douzery
EJP (2007) OrthoMaM: A database of orthologous genomic
markers for placental mammal phylogenetics. BMC Evolutionary
Biology 7: 241
Ridout KE, Dixon CJ, Filatov DA (2010) Positive selection differs between secondary structure elements in Drosophila. Genome Biology and Evolution 2010: 166-179.
Rocha EP, Danchin A. 2004. An analysis of determinants of
amino acids substitution rates in bacterial proteins. Mol Biol Evol
21:108-16
Schlosser G, Wagner GP (2004) Modularity in development and
evolution. Chicago: Univ of Chicago Press
Shakhnovich BE, Deeds E, Delisi C, Shakhnovich E (2005) Protein structure and evolutionary history determine sequence space
topology. Genome Research 15(3): 385-392
Sumedha, Martin OC, Wagner A (2007) New structural variation
in evolutionary searches of RNA neutral networks. Biosystems
90: 475–485
Taverna DM, Goldstein RA (2000) The distribution of structures
in evolving protein populations. Biopolymers 53: 1-8
The Gene Ontology Consortium, 2000. Gene Ontology: tool for
the unification of biology. Nature Genetics 25: 25-29. The GO
ontology was accessed for use in this paper during April 2010.
van Nimwegen E, Crutchfield JP, Huynen M (1999) Neutral evolution of mutational robustness. PNAS USA 96: 9716-9720
Vinogradov A (2010) Systemic factors dominate mammal protein
evolution. Proc. R. Soc. B 277: 1403-1408
Wagner A (2008) Robustness and evolvability: a paradox resolved. Proc. R. Soc. B: 275: 91-100
Wagner A (2005) Robustness and Evolvability in Living Systems. Princeton, New Jersey: Princeton University Press p. 88-89
Wagner GP (1996) Homologues, natural kinds, and the evolution
of modularity. American Zoologist 36: 36-43
Wagner GP, Booth G, Bagheri-Chaichian H (1997) A population
genetic theory of canalization. Evolution 51:329-347
Drummond DA, Raval A, Wilke CO. 2006. A single determinant
dominates the rate of yeast protein evolution. Mol Biol Evol
23:327-37
Earl DJ, Deem MW (2004) Evolvability is a selectable trait.
PNAS USA 101(32):11531-11536.
England JL, Shakhnovich BE, Shakhnovich EI (2003) Natural
selection of more designable folds: a mechanism for thermophilic
adaptation. PNAS 100(15): 8727-8731
England JL, Shakhnovich EI (2003) Structural determinants of
protein designability. Physical Review Letters 90 (21): 218101
Ferrada E, Wagner A (2008) Protein robustness promotes evolutionary innovations on large evolutionary time-scales. Proc. R.
Soc. B 275: 1595-1602
Fontana W. (2002) Modeling ‘evo-devo’ with RNA. BioEssays
Force A, Cresko WA, Pickett B, Proulx SR, Amemiya C, Lynch
M (2005) The origin of subfunctions and modular gene regulation. Genetics 170:433-446
Franz-Odendaal TA, Hall BK (2006) Modularity and sense organs in the blind cavefish, Astyanax mexicanus. Evolution &
Development 8(1): 94-100
Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW
(2002) Evolutionary rate in the protein interaction network. Science 296: 750-752
Gardner A, Zuidema W (2003) Is evolvability involved in the
origin of modular variation? Evolution 57(6): 1448-1450
Garza AS, Ahmad N, Kumar R (2009) Role of intrinsically disordered protein regions/domains in transcriptional regulation. Life
Sciences 84: 189-193
Gerhart J, Kirschner M (1997) Cells, Embryos and Evolution,
Blackwell Science.
Griswold CK (2006) Pleiotropic mutation, modularity and evolvability. Evolution & Development 8(1): 81-93
Hansen TF (2002) Is modularity necessary for evolvability? Remarks on the relationship between pleiotropy and evolvability.
BioSystems 69: 83-94
Hartwell LH, Hopfield JJ, Leibler S, Murray AW (1999) From
molecular to modular cell biology. Nature 402 Supp: C47-C52
Herbeck JT, Wall DP (2005) Converging on a general model of
protein evolution. TRENDS in Biotechnology 23(10):485-487
Kabsch W, Sander C. 1983. Dictionary of protein secondary
structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577-637
Kitano H (2004) Biological Robustness. Nature Reviews Genetics
5: 826-837
Li Y, Drummond DA, Sawayama AM, Snow CD, Bloom JD,
Arnold FH (2007) A diverse family of thermostable cytochrome
P450s created by recombination of stabilizing fragments. Nature
Biotechnology 25(9): 1051-1056
Li H, Helling R, Tang C, Wingreen N (1996) Emergence of preferred structures in a simple model of protein folding. Science
273: 666-669
Liao B-Y, Scott NM, Zhang J (2006) Impacts of gene essentiality,
expression pattern, and gene compactness on the evolutionary
rate of mammalian proteins. Mol. Biol. Evol. 23(11):2072-2080
Liao H, Yeh W, Chiang D, Jernigan RL, Lustig B (2005) Protein
sequence entropy is closely related to packing density and hydrophobicity. Protein Engineering, Design & Selection 18(2):59-64
Wagner, GP and Altenberg L (1996) Complex adaptations and
the evolution of evolvability. Evolution 50: 967
Wagner GP, Pavlicev M, Cheverud JM (2007) The road to modularity. Nature Reviews Genetics 8: 921-931
Wilke CO, Bloom JD, Drummond DA, Raval A (2005) Predicting the tolerance of proteins to random amino acid substitution.
Biophysical Journal 89: 3714-3720
Wingreen N, Li H, Tang C (2004) Designability and thermal stability of protein structures. Polymer 45: 699-705
Wright PE, Dyson HJ. (1999) Intrinsically unstructured proteins:
re-assessing the protein structure-function paradigm. Journal of
Molecular Biology 293:321-331
Xia Y, Levitt M (2002) Roles of mutation and recombination in
the evolution of protein thermodynamics. PNAS 99(16): 10382-10387
Yang AS (2001) Modularity, evolvability, and adaptive radiations: a comparison of the hemi- and holometabolous insects.
Evolution & Development 3(2): 59-72
Yang Z (1997) PAML: a program package for phylogenetic
analysis by maximum likelihood. Computer Applications in BioSciences 13: 555-556
Yang Z (2007) PAML 4: a program package for phylogenetic
analysis by maximum likelihood. Molecular Biology and Evolution
24: 1586-1591 (http://abacus.gene.ucl.ac.uk/software/paml.html)
Zhang J, He X. 2005. Significant impact of protein dispensability
on the instantaneous rate of protein evolution. Mol Biol Evol
22:1147-55