Complex Adaptive Systems - Resilience, Robustness, and Evolvability: Papers from the AAAI Fall Symposium (FS-10-03)

Structural Robustness Confers Evolvability in Proteins

Mary M. Rorick (1,2), Günter P. Wagner (2,3)
(1) Yale Department of Genetics, (2) Yale Department of Ecology and Evolutionary Biology, (3) Yale Systems Biology Institute
333 Cedar Street, P.O. Box 208005, New Haven, CT 06520-8005
mary.rorick@yale.edu

ber of system elements that are affected by a given perturbation (Bhattacharyya et al. 2006, Fontana 2002, Wagner et al. 2007, Ancel and Fontana 2000, Kitano 2004, GP Wagner 1996, Wagner & Altenberg 1996). Biological systems are nonrandomly modular (Schlosser and Wagner 2004), and modularity seems to increase through evolutionary time (Bonner 1998). Theoretically, there are scenarios where modularity reduces evolvability (Hansen 2002, Griswold 2006, Ancel and Fontana 2000), but for the most part it is thought that modularity facilitates adaptive change (Wagner 2005; Gerhard and Kirschner 1997; Hartwell et al. 1999; Franz-Odendaal & Hall 2006; Yang 2001, Beldade & Brakefield 2003, Chen & Dokholyan 2006; Pereira-Leal et al. 2006; Wagner & Altenberg 1996; Bhattacharyya et al. 2006, Cui et al. 2002; Bogarad & Deem 1999; Xia & Levitt 2002). The origin of modularity remains unclear (Gardner & Zuidema 2003; Lipson et al. 2002; Force et al. 2005; Misevic et al. 2006; Wagner et al. 2007; Lynch 2007). In this study we measure protein structural modularity. Protein designability and structural modularity can both be indexed via simple structural features. These are, respectively, contact density (England and Shakhnovich 2003) and “helix/sheet density” (see below). We measure these two robustness indices and an index for adaptive evolution for a dataset of 167 mammalian proteins with known structure in order to look for an association between robustness and evolvability.
We find that proteins with high rates of adaptive evolution have higher contact density and secondary structure density than proteins undergoing less adaptive evolution. This pattern is consistent with the idea that robust folds, being less constrained, accommodate adaptive changes at a higher rate than low-robustness proteins, which are presumably more highly constrained. In this paper we discuss evolvability as a consequence of biological robustness. By “biological robustness”, we mean robustness of individual fitness to mutational or environmental perturbation. Of course, evolvability is itself a form of robustness as well. Life has persisted as an unbroken, branching lineage for over two billion years, and that it has sustained itself for so long, through such a dramatic diversity of environments, certainly constitutes evidence for robustness— the origins of which are worth exploring. This type of robustness, which is a feature of lineages, and Abstract Theory suggests that biological robustness allows for the maintenance of fitness in the face of mutational change, and to the extent that this mutational change translates to heritable phenotypic change, that biological robustness allows for evolvability. However, empirical demonstrations that robustness promotes evolvability remain scant. This is in part due to the difficulty of defining and measuring both evolvability and robustness in real biological systems. Here we test whether protein structural robustness is associated with the extent of adaptive change a protein experiences. We find this to be the case for two forms of protein robustness— designability and modularity, which we measure via contact density and helix/sheet density, respectively. We interpret this association to be primarily the result of reduced constraints on amino acid substitutions in highly designable and/or modular proteins, resulting in less antagonistic pleiotropy and faster adaptation through natural selection. 
Introduction The extensive robustness of biological systems has long fascinated biologists. While it can in theory stifle adaptation under certain circumstances (Draghi et al. 2010; Ancel & Fontana 2000; Sumedha et al. 2007), robustness has been shown to generally confer evolvability to living systems because it allows them to undergo innovative modification without losing functionality (A. Wagner 2005, A. Wagner 2008). Robustness also serves to maintain high fitness under conditions of random genetic and environmental change (Gibson and Wagner 2000; Meiklejohn and Hartl 2002; A. Wagner 2005; G.P. Wagner et al. 1997). The goal of this study is to test the prediction that robustness confers evolvability at the level of proteins. To do this, we look for a statistical association between protein structural robustness and the extent of protein adaptive change. We assess two types of protein structural robustness: designability and modularity. Protein designability is the number of protein sequences that stably fold into a given structure, and this is a good index for protein mutational robustness because it is directly related to the number of mutations a structure can tolerate (Li et al. 1996). Modularity contributes to mutational and environmental robustness by limiting the num- 110 and here we interpret alpha helixes and beta sheets as these modules. We can thus approximate the overall density of functional modules in a protein by simply dividing the number of helices and sheets (defined according to the Dictionary of Protein Secondary Structure (Kabsch and Sander 1983)) by the number of residues in the protein structure. Of course, truly independent protein modules— be they kinetic, thermodynamic or functional— are generally much larger than individual alpha helices or beta sheets, and so our modularity index will be limited in that it will only consider those evolutionary constraints which fall within secondary structure features. 
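The helix/sheet density described above (the number of helices and sheets divided by the number of residues) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the secondary structure is available as a per-residue string of DSSP-style codes, that `H` marks alpha-helix residues and `E` marks beta-strand residues, and that each contiguous run of `E` is counted as one sheet element. The text does not say how sheets were counted, so that last choice is an assumption, and the function name `helix_sheet_density` is hypothetical.

```python
from itertools import groupby


def helix_sheet_density(ss_string, helix_codes="H", strand_codes="E"):
    """Count contiguous helix and strand runs, then divide by residue count.

    ss_string: one secondary-structure code per residue (DSSP-style letters).
    Counting each run of strand residues as one element is an assumption made
    for this sketch; the source only says "helices and sheets".
    """
    n_residues = len(ss_string)
    if n_residues == 0:
        raise ValueError("empty secondary-structure string")
    n_elements = 0
    # groupby yields one group per maximal run of identical codes
    for code, _run in groupby(ss_string):
        if code in helix_codes or code in strand_codes:
            n_elements += 1
    return n_elements / n_residues


# Hypothetical 21-residue example: two helix runs and two strand runs
example = "--HHHH--EEE-EEE--HHH-"
print(helix_sheet_density(example))
```

Dividing by the residue count is what makes the index a density, which is why the text checks that element count and protein length are correlated before normalizing.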
Nevertheless, it is probably true that a substantial proportion of a protein’s evolutionary constraint relationships fall within individual helices and sheets. The small size of secondary structure modules is also important because the number of them within a protein is a much more variable, and thus informative, than the number of larger entities, like domains. Also, secondary structure features, unlike domains, can be ascertained reliably from basic structure data. We test the assumption that there is a correlation between the overall number of helices and sheets in a protein and the overall number of residues in the structure by plotting the two indices and assessing the Pearson correlation coefficient. Our index for adaptive evolution measures the extent to which directional selection, as compared to purifying and neutral selection, affects a protein’s evolution. It is the overall amount of adaptive evolution a protein experiences through its evolutionary history among mammals, and it is a function of both the underlying constraints to adaptive change (its theoretical “evolvability”), and the extent to which it is exposed to forces of directional selection. Thus, it is more accurate to think of this as an index of realized evolvability. For example, even under strong forces of directional selection, high constraints (i.e., strong purifying selection) can cause this index to be low, and in this sense it gauges constraint architecture. At the same time, however, this index will be low if there are low levels of directional selection— even when amino acid substitutions are unconstrained and the protein is theoretically very evolvable. Our adaptive evolution index is importantly different from the protein evolutionary rate indices used in many comparative studies (e.g., Drummond et al. 2005; Bloom & Adami 2003; Herbeck & Wall 2005; Bloom & Adami 2004; Lin et al. 2007; Fraser 2002; Bloom et al. 2006; Chen & Dokholyan 2006; Bustamante et al. 2000). 
Our index specifically measures the rate of substitutions that occur through directional selection. Conventional evolutionary rate indices take into account all types of substitutions and, since neutral substitutions are so much more common than adaptive ones, primarily reflect rates of neutral change. The ability for a protein to accommodate adaptive amino acid substitutions may not be directly related to how easily the protein can accommodate neutral amino acid substitutions, so these typical measures of evolutionary rate cannot serve as indices of evolvability, since evolvability is defined as the ability to respond to direc- which allows the persistence of life through long evolutionary time, is distinct from the individual-based type of robustness primarily discussed in this paper. Materials and Methods Our experimental approach is to test whether proteins with high levels of adaptive evolution are more structurally robust than proteins with low levels of adaptive evolution. Our dataset consists of orthologous genes that code for proteins with solved tertiary structures. For each protein in the dataset, we first obtain two distinct measures of protein robustness and one measure of adaptive evolution. The first type of robustness we assess is designability. This is the number of sequences that stably fold into a given structure. Designability is an important determinant of protein mutational robustness (Li et al. 1997, Bloom et al. 2005). Designability determines the rate at which stable folding becomes less likely as random mutations accumulate (Wilke et al. 2005, Bloom et al. 2005). It can be accurately approximated from basic structure data, via contact density—a metric that has been shown to tightly correlate with designability (England and Shakhnovich 2003, Bloom et al. 2006). Contact Density is the average number of contacts an amino acid makes with other amino acids in the protein (England and Shakhnovich 2003). 
High contact density implies many favorable placements of strongly interacting amino acids, which relax energy constraints on the rest of the structure, thus allowing more sequences to fold into the structure (England and Shakhnovich 2003). We determine contact density by dividing the trace of the square of the contact matrix by the number of residues in the protein structure. A contact matrix is generated by using the atomic coordinates of a Protein Data Bank (PDB) structure file. We use the Euclidean distances between -carbons to construct a distance matrix D. Using a threshold of 8Å to define “contact”, and excluding trivial contacts (defined as those between residues that are separated by fewer than two intervening residues in the sequence), we convert D to a Boolean contact matrix C, where 1 represents “contact” and 0 represents “no contact”. Contact density is the trace of the square of C, divided by the number of residues N in the protein: $\mathrm{Tr}(C^2)/N$. Our specific methodological choices represent a compromise between the methods of Liao et al. (2005) who use -carbons and a contact threshold of 9Å, Shakhnovich et al. (2005) who use -carbons and a threshold of 7.5Å, and Bau et al. (2006) who use -carbons and a threshold of 8Å. The second type of robustness we assess is protein modularity, which we define as the density of structural modules. In measuring protein modularity, our aim is to gauge the consolidation of evolutionary constraints in the protein structure. The independent units of evolutionary change within a protein can be approximated through kinetic, thermodynamic and/or functional modules (see Copley et al. 2002 for a structural/folding perspective, and Bhattacharyya et al. 2006 for a functional perspective),

uted approximately randomly across different protein fold types, i.e., that the robustness of a protein does not significantly influence the selective forces it experiences.
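The contact density calculation described above (distance matrix, 8 Å threshold, exclusion of trivial contacts, then the trace of the squared contact matrix divided by N) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: it takes one representative carbon coordinate per residue as input, treats a distance equal to the threshold as a contact, and reads "fewer than two intervening residues" as excluding pairs with sequence separation below 3. The function and parameter names are hypothetical.

```python
import math


def contact_density(coords, threshold=8.0, min_separation=3):
    """Return Tr(C^2)/N for a list of per-residue (x, y, z) coordinates.

    coords: one representative carbon atom position per residue, in Angstroms.
    threshold: contact cutoff in Angstroms (8 in the text; <= is an assumption).
    min_separation: smallest |i - j| counted as a non-trivial contact; 3 means
        residues with fewer than two intervening residues are excluded.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("no residues given")
    # Boolean contact matrix C (symmetric, zero diagonal)
    contact = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                contact[i][j] = 1
                contact[j][i] = 1
    # Tr(C^2) = sum over i and k of C[i][k] * C[k][i]
    trace_c2 = sum(
        contact[i][k] * contact[k][i] for i in range(n) for k in range(n)
    )
    return trace_c2 / n


# Hypothetical example: five residues on a straight line, 2.5 Angstroms apart
line = [(2.5 * i, 0.0, 0.0) for i in range(5)]
print(contact_density(line))
```

Because C is symmetric and Boolean, the trace of its square equals the total number of entries set to 1, so the quantity returned is the average number of contacts per residue, matching the verbal definition of contact density.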
We test this assumption by looking for an association between protein functional importance and robustness. We measure functional importance by measuring the extent of purifying selection acting on the protein, which is defined here as $p_0(\omega_0 - 1)$, where $p_0$ is the estimated proportion of sites in the category whose $\omega_0$ is constrained to be below 1. We perform multiple regression to tease apart the separate influences of designability and modularity on adaptive evolution. We divide the dataset at the median value for our adaptive evolution index and analyze the two halves separately. We determine the quadratic best-fit functions while constraining the functions to be equal to the median value of adaptive evolution at the lowest observed levels of the designability and modularity indices. We assess statistical significance of partial regression coefficients and compare the magnitude of standardized partial regression coefficients. Gene compactness is the dominant factor determining evolutionary rate in mammals, and gene essentiality is among the distant, though nevertheless significant, factors of secondary importance (Liao et al. 2006). We used the definitions of gene compactness and gene essentiality that Liao et al. (2006) show to be significantly correlated with dN/dS, and we also analyze three other indices of gene compactness: coding sequence (CDS) length, the total length of the introns, and the relative length of the introns (intron length divided by CDS length). To determine whether it is necessary to control for gene compactness when assessing the relationship between robustness and adaptive evolution, we test whether any of the compactness indices are significantly correlated with both our adaptive evolution index and either of our robustness indices. To determine whether it is necessary to control for gene essentiality, we assess whether there is a significant difference in adaptive evolution level, contact density, or helix/sheet density between essential versus nonessential proteins (i.e., those corresponding to essential versus nonessential genes).
tional selection (Wagner and Altenberg 1996; Pigliucci 2008; A. Wagner 2008). Specifically, our index for adaptive evolution is the proportion of sites adaptively evolving multiplied by the average rate of adaptive evolution at these sites. Estimates for these numbers are obtained by analyzing the evolutionary history of each protein. For each of the proteins in the dataset, a site model implemented by Phylogenetic Analysis by Maximum Likelihood (PAML) 3.15 codeml (Yang 1997, Yang 2007) is used to analyze 25 mammalian orthologs mapped to a known species phylogeny, to obtain the maximum likelihood estimates of the proportions of sites ($p_0$, $p_1$ and $p_2$) in each of three $\omega$ categories ($\omega_0$, $\omega_1$ and $\omega_2$), and the $\omega$ values themselves (where $\omega_0$ is constrained to be <1, $\omega_1$ is constrained to be 1, and $\omega_2$ is left unconstrained). We define the proportion of sites adaptively evolving as $p_2$, and the rate of adaptive evolution at these sites as $\omega_2 - 1$, so our index of adaptive evolution is $p_2(\omega_2 - 1)$. We obtain indices for contact density, helix/sheet density, and adaptive evolution for 167 distinct proteins within the OrthoMaM database (Ranwez et al. 2007, accessed February 2009). This dataset consists of all the proteins for which there is sufficient structural information to determine contact density and helix/sheet density, and for which orthologs from all 25 species are available. The dataset is broken up into categories based on the broadest hierarchical Gene Ontology categories for molecular function (The Gene Ontology Consortium 2000), according to AmiGO version 1.7 (using the GO database release from 2010-05-08, Carbon et al. 2009).
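The adaptive evolution index defined above is a simple product, and a short sketch makes its sign behavior explicit. The names `p2` and `omega2` are chosen here for illustration: `p2` stands for the estimated proportion of sites in the unconstrained category and `omega2` for that category's estimated value. The input values in the example are hypothetical and are not taken from the paper's dataset or from any codeml output.

```python
def adaptive_evolution_index(p2, omega2):
    """Proportion of adaptively evolving sites times their rate above 1.

    p2: estimated proportion of sites in the unconstrained category (0 to 1).
    omega2: estimated value for that category (non-negative, unconstrained
        relative to 1, so the index can be negative).
    """
    if not 0.0 <= p2 <= 1.0:
        raise ValueError("p2 must be a proportion between 0 and 1")
    if omega2 < 0.0:
        raise ValueError("omega2 must be non-negative")
    return p2 * (omega2 - 1.0)


# Hypothetical estimates for two proteins
print(adaptive_evolution_index(0.05, 3.0))  # positive: sites evolving above 1
print(adaptive_evolution_index(0.20, 0.5))  # negative: category estimated below 1
```

Because the third category is left unconstrained, its estimate can fall below 1 and the index can be negative, which is consistent with the slightly negative dataset mean reported in the Results.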
Within the dataset, there are 155 proteins that have binding activity, 87 that have catalytic activity, 25 that have molecular transducer activity, 24 that have transcriptional regulatory activity, 16 that have enzyme regulatory activity, 6 that have transporter activity, 5 that have structural molecule activity, 1 that has electron carrier activity, and 5 with no known molecular function. The average values for the robustness and adaptive evolution indices are assessed for each of the 8 molecular function subsets that have a sample size larger than 1. The dataset is also broken up into two halves according to the fraction of “structured” amino acids— i.e., those that are part of an alpha helix or beta sheet. To assess the relationship between protein robustness and the level of adaptive evolution, each dataset is divided into two equally sized groups according to the size of the adaptive evolution index (dividing at the median value), and then Student’s t test and Welch’s approximate t test are used to identify any significant difference between the means for either of the robustness indices. To assess whether the variance in adaptive evolution is significantly different for high versus low robustness proteins, the dataset is divided into two equally sized groups according to the size of either contact density or helix/sheet density (dividing at the median value), and an F-ratio test is performed. For the interpretation of our results we rely on the assumption that different selection regime types are distrib- Results In this study we test whether there is an association between protein structural robustness and adaptive change. We gauge protein structural robustness by assessing two distinct, yet not entirely independent, features of protein structure: designability and modularity. Designability is the total number of sequences that stably fold into a given structure. 
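The median-split comparisons described above (Student's t test, Welch's approximate t test, and an F-ratio test on the two halves) can be sketched with the Python standard library. This sketch computes only the test statistics and degrees of freedom; converting them to p-values needs t and F distribution functions from a statistics package, which is left out. Dropping the middle observation when the sample size is odd is an assumption made here so that the groups are equally sized; the text does not say how the 167 proteins were split. All function names are hypothetical.

```python
import math
import statistics


def split_at_median(pairs):
    """Split (split_value, measured_value) pairs into low and high halves.

    With an odd number of pairs the middle one is dropped (an assumption).
    """
    ordered = sorted(pairs, key=lambda p: p[0])
    half = len(ordered) // 2
    low = [m for _, m in ordered[:half]]
    high = [m for _, m in ordered[len(ordered) - half:]]
    return low, high


def student_t(a, b):
    """Pooled-variance t statistic for mean(a) - mean(b), with its df."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch t statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    sa = statistics.variance(a) / na
    sb = statistics.variance(b) / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


def variance_ratio(a, b):
    """F ratio: sample variance of a divided by sample variance of b."""
    return statistics.variance(a) / statistics.variance(b)


# Hypothetical data: (adaptive evolution index, contact density) per protein
data = [(0.1, 4.0), (0.2, 5.0), (0.3, 6.0), (0.5, 100.0),
        (0.9, 6.0), (1.0, 7.0), (1.1, 8.0)]
low, high = split_at_median(data)
print(student_t(high, low), welch_t(high, low), variance_ratio(high, low))
```

The same helper covers both analyses in the text: splitting on the adaptive evolution index to compare mean robustness, and splitting on a robustness index to compare the variance of adaptive evolution.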
Because it cannot be directly measured, we use contact density, a simple physical feature of a protein that is proportional to designability (England and Shaknovich 2003), as our index of designability. We use helix/sheet density as our measure of protein structural modularity. Unlike contact density, it is not a standard and well-studied index, so we test the basic assumption that underlies this index: i.e., that the overall number of helices and sheets correlates with the number of residues in a protein (if this 112 with relatively low amounts of adaptive evolution (0.0768) (p=0.00135 for Student’s t test and p=0.00135 for Welch’s approximate t test, both of null hypothesis 21) (Figure 1b). Also, as in the case of designability, the levels of adaptive evolution experienced by relatively modular proteins are significantly more variable than those experienced by proteins with lower modularity (0.00657 as compared to 0.00224; p<<.0001 for F-ratio test of null hypothesis that ratio between the variances is 0), regardless of whether outlying datapoints are included or not (the difference in variance is significant even if the two most outlying datapoints with respect to the adaptive evolution index are removed from both halves of the dataset) (Figure 2b). Together with the corresponding results for contact density, this implies that high protein structural robustness is associated with greater variance in the rate of adaptive evolution experienced by a protein. Because our indices for designability and modularity correlate with one another to some extent (data not shown), it is unclear whether they have independent effects on the amount of adaptive evolution a protein experiences. We therefore perform multiple regression to tease apart the separate influences of designability and modularity on the rate of adaptive evolution. The dataset is divided at the median level of adaptive evolution, and the two halves are analyzed separately. 
Quadratic fits to both halves of the dataset are highly significant (ANOVA p<<.0001). However, the estimates of the individual partial regression coefficients—the parameters which describe how the robustness indices independently influence adaptive evolution— were not significant in either case (Student’s t-test). Thus, the relative statistical significance of the partial regression coefficients cannot be used to exclude either designability or modularity as a possible independent predictor of the adaptive evolution index. Another way we can compare the relative importance of designability and modularity at determining the adaptive evolution index is by calculating unitless (and thus comparable) standardized partial regression coefficients. Interestingly, for the fits to both halves of the dataset, the standardized partial regression coefficient for helix/sheet density is nearly 100 times greater in magnitude than the standardized partial regression coefficient for contact density (regardless of the order in which the two variables are added to the model). Therefore, even though we cannot conclusively reject designability as a independent predictor of the level of adaptive evolution, these results do suggest that modularity is likely more important than designability in determining this. These results emphasize the value of including considerations of modularity in studies of robustness, and the importance of developing methods to quantify modularity in real biological systems. The concepts of robustness and modularity are intimately intertwined, and at least so far, there is not a good way of completely separating modularity from other forms of robustness—conceptually or practically. 
We also analyzed the dataset in subsets, according to the molecular function of the proteins and according to whether they are “structured” or “unstructured” (see Meth- were not the case it would be inappropriate to normalize for protein size by dividing by the number of residues in the protein because we would be over-correcting for the influence of protein size). We find that there is a highly significant correlation between the number of helices and sheets and the number of residues (data not shown). We analyze our indices for robustness and adaptive evolution for 167 distinct mammalian proteins, with orthologs from 25 species. We limit our sample to mammalian proteins to avoid confounding influences from comparing proteins with different phylogenetic histories. To assess whether protein structural robustness has an influence on protein evolvability, we test whether there is a positive association between either of our robustness indices and our index for adaptive evolution. The adaptive evolution index is plotted as a function of both the designability and modularity indices (Figures 1 and 2). The mean contact density for this dataset is 5.12 with a standard deviation of 1.01, the mean helix/sheet density is 0.082 with a standard deviation of 0.023, and the mean for the adaptive evolution index is -0.00953 with a standard deviation of 0.0661. The relationship between designability and adaptive evolution reveals two interesting and significant patterns. First, when the sample of proteins is divided into two equally sized groups according to their adaptive evolution index (less-than-median versus greater-than-median), we find that the mean contact density of the group experiencing relatively high adaptive evolution (5.30) is significantly greater than the mean contact density of the group experiencing relatively low adaptive evolution (4.94) (p=0.0101 for Student’s t test and p=0.0100 for Welch’s approximate t test, both of null hypothesis that 21) (Figure 1a). 
This implies that proteins experiencing greater amounts of adaptive evolution are generally more designable, and thus, robust than proteins undergoing less rapid adaptive change. Another interesting pattern we observe is that high contact density is associated with greater variance in the amount of adaptive evolution experienced by different proteins (Figure 2a). When the dataset is divided into two equally sized groups according to contact density (dividing at the median), the variance in adaptive evolution is significantly greater for highly contact dense proteins as compared to less contact dense proteins (0.00164 versus 0.00718) (p<<.0001 for F-ratio test of null hypothesis that ratio between the variances is 0) (Figure 2a). Furthermore, this difference in variance is not dependent on the outlying datapoints: if the two most outlying datapoints with respect to the adaptive evolution index are removed from both halves of the dataset, there is still a significant difference between the variances of the two halves. Thus, we observe an increase in the variance of the adaptive evolution index as contact density increases. In order to analyze the relationship between modularity and adaptive evolution, we perform the same tests as above, but this time for helix/sheet density. We find that the mean helix/sheet density of proteins with relatively high amounts of adaptive evolution (0.0876) is significantly greater than the mean helix/sheet density of proteins 113 ods). While the mean contact density and helix/sheet density do differ between the various functional groups (data not shown), we observe no significant differences among these data subsets in regard to the relationship they reveal between the robustness and adaptive evolution indices. 
The limitations of protein designability and structural modularity, as indicators of the extent of evolutionary constraint, are reflected in the fact that the patterns we report above are considerably less pronounced for classes of proteins known to be highly unstructured (Garza, Ahmad and Kumar 2009, Wright and Dyson 1999), and for the less structured half of the dataset, implying that the structural indices naturally fail to capture the relevant evolutionary constraints for unstructured proteins.

Testing for potential confounding factors. To index evolvability in proteins we measure the amount of adaptive evolution a protein experiences. As mentioned above, in using this index, we are assuming that high levels of adaptive evolution can be attributed at least partially to low structural constraints (i.e., high evolvability) as opposed to just high directional selection pressure. We test this assumption by testing whether functional importance is associated with robustness. If functionally important proteins, which we define to be those under strong purifying selection, are generally more robust than less important proteins, we would have to consider the possibility that our indices for robustness and adaptive evolution correlate due to recruitment of robust folds into important functional roles or through gradual selection for increased robustness in important proteins (robustness having many potential adaptive benefits) (see discussion). We do not find any association between our index for functional importance and either of our indices for protein robustness. According to a recent study by Ridout et al. (2010), unstructured sites (i.e., those which are not part of any secondary structure feature) are more likely to have high $\omega$ values.
This raises a possible alternative explanation for our observed association between the indices for modularity and adaptive evolution: that it is just a trivial consequence of there being a greater proportion of unstructured amino acids in highly modular proteins. This is especially plausible since we also find that proteins with higher proportions of unstructured sites (defined here as those not within a helix or sheet) tend to have higher helix/sheet density (data not shown). However, we rule out this alternative interpretation because our adaptive evolution index shows no association with the proportion of sites within alpha helices, beta sheets, or unstructured regions. Our index of protein designability, contact density, has been previously shown to correlate with protein length (Bloom et al. 2006, Lipman et al. 2002), and we find this correlation in our data also (data not shown). To rule out the possibility that the association between contact density and adaptive evolution (Figure 1a) is caused by a co-correlation of both indices with protein length, we test whether there is any relationship between adaptive evolution and protein length. We find no significant correlation between these two variables. Further, when we divide the dataset into two groups (one comprised of those with less-than-median protein length, and the other comprised of those with greater-than-median protein length), we find no significant difference in the level of adaptive evolution between the two groups. Liao et al. (2006) demonstrate that gene compactness and gene essentiality are both important determinants of the overall rate of mammalian protein evolution. To determine whether it is necessary to control for gene compactness when examining the relationship between protein structural robustness and the amount of adaptive evolution a protein experiences, we test whether gene compactness indices co-correlate with our indices for protein robustness and adaptive evolution.
We found no co-correlations and

Figure 1. The amount of adaptive evolution a protein experiences through its evolutionary history “DS” as a function of (a) the designability index “D”, and (b) the modularity index “M”. The color of the datapoints indicates whether they are part of the upper or lower half of the dataset with respect to the adaptive evolution index, divided at the median. The mean D or M of the green datapoints is indicated by the upper red line, and the mean D or M of the blue datapoints is indicated by the lower red line. Both parabolic fits are highly significant in both a and b (p<<.0001 according to ANOVA F-statistic).

tems can be explained by selection for evolvability, or whether it has evolved for the sake of buffering mutational and/or environmental noise (Meiklejohn and Hartl 2002, Wagner 2005, Ancel and Fontana 2000, Wagner et al. 1997, de Visser et al. 2003, Hartl and Taubes 1996). Investigation into this question is stymied by the fact that there is scant empirical evidence that robustness is a biologically significant determinant of evolvability, and there is difficulty in defining and measuring robustness in real biological systems. Here we use one established index of protein robustness (contact density as a measure of designability) and another robustness index of our own design (helix/sheet density as a measure of structural modularity) to test whether robustness is associated with evolvability in proteins. Prior to this study we knew little about the distribution of helix/sheet density across different proteins, but previous work had already established that contact density is a determinant of protein family size (Shakhnovich et al. 2005), functional diversity (Ferrada and Wagner 2008), and overall evolutionary rate (dN) in yeast (Bloom et al. 2006).
These studies provide some indication that contact density contributes to reduced constraints and possibly evolvability. However, Bloom et al. (2006) could not fully disentangle the effects of contact density and protein length on dN, so it is possible that contact density only correlates with dN through co-correlation with protein length, or some other unmeasured factor (such as modularity, for example). Furthermore, these studies do not infer evolvability by measuring the amount of adaptive evolution as we do here. Instead they use protein family size, functional diversity and dN, which are all influenced by more factors than the two which contribute to our index for evolvability (i.e., the extent of adaptive constraints and directional selection strength).

we find only two significant negative correlations among all the tests we perform: between CDS length and contact density and between CDS length and helix/sheet density (before correcting for multiple tests, p<<0.001 and 0.047, respectively). Because CDS length does not also negatively correlate with the adaptive evolution index, we conclude that CDS length cannot be responsible for the observed association between protein structural robustness and adaptive evolution. To determine whether it is necessary to control for gene essentiality, we assess whether there is a significant difference in contact density, helix/sheet density, or the adaptive evolution index between proteins corresponding to essential versus nonessential genes. We find no significant differences among these comparisons (with the significance cut-off set to p=0.05 before correcting for multiple tests). Therefore, we conclude that gene essentiality is not likely to be a confounding factor.

Two forms of protein structural robustness and their effects on evolvability. Here we address whether robustness contributes to evolvability in proteins. We consider two forms of structural robustness.
We hypothesize that high values for either of these should reflect low structural constraints and high evolvability. Therefore, if structural robustness confers evolvability, and assuming different selection regimes are distributed approximately randomly among different protein folds, then we expect to find an association between high robustness and high amounts of adaptive evolution. Indeed, this is what we find. Specifically, we find that proteins with high amounts of adaptive evolution are more robust than proteins with lower amounts of adaptive evolution. We test whether the differences in adaptive evolution among proteins can be attributed to differences in constraint architecture (evolvability), as opposed to differences in selection regime, by looking for an association between robustness and protein functional importance. This test is needed to rule out two alternative interpretations.

Figure 2. The amount of adaptive evolution a protein experiences through its evolutionary history “DS” as a function of (a) the designability index “D”, and (b) the modularity index “M”. The variance of the dark red datapoints is significantly larger than the variance of the light blue datapoints for both a and b.

Discussion

From a theoretical standpoint, a system must be robust to be evolvable by natural selection. And yet, it remains unclear whether the ubiquity of robustness in biological sys-

ence” in Ancel and Fontana 2000). However, the lack of an observable association between robustness and the purifying selection/functional importance index indicates that these alternative mechanisms play a relatively weak role at best, and therefore, that strength of stabilizing selection does not explain variation in robustness.
Instead, we conclude that the observed association between protein structural robustness and adaptive evolution is primarily the result of faster adaptive evolution in robust proteins, as a consequence of lower structural constraints.

The first is that, in the long term, robust protein folds, being more evolvable, end up being recruited into functional roles which demand high levels of evolvability because they are good at tolerating shifting selection pressures. In other words, it is possible that highly robust proteins are predisposed to biological roles where adaptive changes are frequent, and that protein robustness persists through association with these adaptive changes. This would constitute a mechanism of fold selection for evolvability (England, Shakhnovich and Shakhnovich 2003, Taverna & Goldstein 2000). Under this mechanism we would expect to find an association between protein robustness and functional importance; because we do not find one, we rule this mechanism out. Furthermore, a theoretical point limits the likelihood of this mechanism. Evolvability and mutational robustness are traits of the genotype-phenotype mapping function, and are thus only subject to selective forces indirectly, through association with organismal traits that have direct fitness effects. Theoretical work has demonstrated that such second-order selection is easily overwhelmed by first-order selection unless the population size and/or the genomic mutation rate is very high (e.g., Earl & Deem 2004, van Nimwegen et al. 1999, Crutchfield and Huynen 1999, Meiklejohn & Hartl 2002, Wagner 2005).

The relationship between protein structural modularity and designability. There are some other minor conclusions that can be drawn from this work. By quantifying both protein designability and modularity we have the opportunity to address how these indices relate to one another. The exact relationship between modularity and designability has not been thoroughly investigated in real proteins.
All that is known is that, for lattice models, mutationally robust “prototype” sequences are characterized by an overrepresentation of special sequence motifs that fold in a context-insensitive manner, reminiscent of “folding modules” (Cui et al. 2002). Also, Li et al. (2007) show that modular “stabilizing fragments” can be recombined to create highly robust chimeric proteins. Because we find that contact density does not tightly correlate with helix/sheet density, we conclude that designability and modularity capture somewhat different information, at least as indexed here.

The second possible interpretation is that strong directional selection, which would be reflected as high levels of adaptive evolution, causes proteins to gradually evolve greater robustness. Under this mechanism we would again expect high robustness to evolve preferentially in the most functionally important proteins, and since we do not find an association between robustness and functional importance, we rule out this interpretation as well. Furthermore, from a theoretical standpoint this interpretation is unwieldy to begin with because robustness, at least as indexed here, is unlikely to emerge through gradual evolution. Contact density and helix/sheet density, as inherent features of the protein fold, cannot evolve gradually because distinct protein structures are separated in sequence space by vast distances composed almost entirely of unfoldable sequences (Babajide et al. 2001). Hence, one of the basic requirements for adaptive evolution, that the trait can be changed in a quasi-gradual way, is not fulfilled for either designability or modularity.

Robustness of unstructured proteins. It is important to note that both designability and modularity are types of structural robustness, and that structural constraints are only good approximations of evolutionary constraints where structure is essential for function. While this is true for many proteins, there are some important exceptions.
For example, many transcription factor proteins only require structural stability at a small fraction of their amino acids (Garza, Ahmad and Kumar 2009). Moreover, it has been hypothesized that proteins without a rigid structure achieve high robustness of function, and thus high evolvability, despite very low levels of structural robustness (e.g., Brown et al. 2002). Our results indicate that structural constraints do not capture the relevant evolutionary constraints for some classes of proteins in our dataset— specifically, those which are relatively unstructured. Therefore, our results support the idea that for some proteins proper function is not directly dependent on structural stability, and in turn, that protein fitness cannot always be approximated through measures of structural stability or foldability. This is significant in light of the common assumption within the field of structural biology that structure equals function. However, because the great majority of proteins with solved structures do rely on a rigid structure to perform their functions, we did not think that these exceptions would cause enough of a problem to warrant their exclusion from our dataset. On the other hand, one reason to suspect that one or both of the above alternative mechanisms may be playing a role to some extent is that we find that high robustness is associated with greater variance between proteins in the amount of adaptive evolution they experience (Figures 2a and 2b). 
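The variance comparison referred to above (Figures 2a and 2b) can be illustrated with a minimal sketch. It assumes, hypothetically, that proteins are split at the median of a robustness index (D or M) and that the variances of the adaptive evolution index DS in the two halves are compared with a one-sided permutation test. The test actually used for Figure 2 is not specified in this section, and the function names and toy values below are illustrative only, not data from this study.

```python
import random
import statistics


def split_at_median(robustness, ds):
    """Split DS values into low- and high-robustness halves at the median index.

    Assumption: a median split on the robustness index; the paper's exact
    grouping rule for Figure 2 is not stated in this section.
    """
    cutoff = statistics.median(robustness)
    low = [d for r, d in zip(robustness, ds) if r <= cutoff]
    high = [d for r, d in zip(robustness, ds) if r > cutoff]
    return low, high


def variance_permutation_test(low, high, n_perm=2000, seed=0):
    """One-sided permutation test of var(high) > var(low).

    Returns the observed variance ratio and a permutation p-value.
    """
    var_low = statistics.variance(low)
    if var_low == 0:
        raise ValueError("low group has zero variance")
    observed = statistics.variance(high) / var_low

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pooled = list(low) + list(high)
    n_low = len(low)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_var_low = statistics.variance(pooled[:n_low])
        perm_var_high = statistics.variance(pooled[n_low:])
        # A zero denominator means an infinite ratio, which counts as extreme.
        if perm_var_low == 0 or perm_var_high / perm_var_low >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy values, chosen only to exercise the functions.
    toy_d = [3.0, 3.2, 3.4, 3.6, 3.8, 4.0, 4.2, 4.4,
             5.0, 5.2, 5.4, 5.6, 5.8, 6.0, 6.2, 6.4]
    toy_ds = [0.0, 0.01, -0.01, 0.02, -0.02, 0.0, 0.01, -0.01,
              0.3, -0.4, 0.2, -0.3, 0.4, -0.2, 0.35, -0.35]
    low_half, high_half = split_at_median(toy_d, toy_ds)
    ratio, p_value = variance_permutation_test(low_half, high_half)
    print(f"variance ratio = {ratio:.1f}, permutation p = {p_value:.4f}")
```

A permutation test is used here only because it makes no distributional assumption about DS; a parametric alternative such as an F-test or Levene's test would serve the same illustrative purpose.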
Both of the above alternatives provide a reason to expect an association between high robustness and very low values for adaptive evolution because they both would, in theory, also promote the evolution of robustness under conditions of strong purifying selection (protein structural robustness translates to environmental as well as mutational robustness; see the “designability principle” in Wingreen, Li and Tang 2004, and “plastogenic congru-

Bau D, Martin AJM, Mooney C, Vullo A, Walsh I, Pollastri G (2006) Distill: a suite of web servers for the prediction of one-, two- and three-dimensional structural features of proteins. BMC Bioinformatics 7: Article No. 402
Beldade P, Brakefield PM (2003) Concerted evolution and developmental integration in modular butterfly wing patterns. Evolution & Development 5(2):169-179
Bhattacharyya RP, Remenyi AR, Yeh BJ, Lim WA (2006) Domains, Motifs, and Scaffolds: The Role of Modular Interactions in the Evolution and Wiring of Cell Signaling Circuits. Annual Review of Biochemistry 75: 655-680
Bloom JD, Adami C (2003) Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in protein-protein interactions data sets. BMC Evolutionary Biology 3: 21
Bloom JD, Adami C (2004) Evolutionary rate depends on number of protein-protein interactions independently of gene expression level: Response. BMC Evolutionary Biology 4:14
Bloom JD, Drummond DA, Arnold FH, Wilke CO (2006) Structural determinants of the rate of protein evolution in yeast. Molecular Biology and Evolution 23(9): 1751-1761
Bloom JD, Silberg JJ, Wilke CO, Drummond DA, Adami C (2005) Thermodynamic prediction of protein neutrality. PNAS 102(3): 606-611
Bogarad LD, Deem MW (1999) A hierarchical approach to protein molecular evolution. PNAS USA 96: 2591-2595
Bonner JT (1988) The Evolution of Complexity. Princeton, NJ: Princeton Univ. Press.
Brown CJ, Takayama S, Campen AM, Vise P, Marshall TW, Oldfield CJ, Williams CJ, Dunker AK. 2002.
Evolutionary rate heterogeneity in proteins with long disordered regions. J Mol Evol 55:104-10
Bustamante C, Townsend JP, Hartl DL (2000) Solvent accessibility and purifying selection within proteins of Escherichia coli and Salmonella enterica. Mol Biol Evol 17(2):301-308
Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S, AmiGO Hub, Web Presence Working Group (2009) AmiGO: online access to ontology and annotation data. Bioinformatics 25(2):288-289
Chen Y, Dokholyan NV (2006) The coordinated evolution of yeast proteins is constrained by functional modularity. TRENDS in Genetics 22(8): 416-419
Copley RR, Doerks T, Letunic I, Bork P (2002) Minireview: Protein domain analysis in the era of complete genomes. FEBS Letters 513: 129-134.
Crutchfield JP, Huynen M (1999) Neutral evolution of mutational robustness. PNAS USA 96(17):9716-9720.
Cui Y, Wong WH, Bornberg-Bauer E, Chan HS (2002) Recombinatoric exploration of novel folded structures: A heteropolymer-based model of protein evolutionary landscapes. PNAS 99(2): 809-814
de Visser AGM, Hermisson J, Wagner GP, et al. (2003) Perspective: Evolution and detection of genetic robustness. Evolution 57(9):1959-1972
Draghi JA, Parsons TL, Wagner GP, Plotkin JB (2010) Mutational robustness can facilitate adaptation. Nature 463(7279): 353-355
Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH (2005) Why highly expressed proteins evolve slowly. PNAS 102: 14338-14343

The determinants of protein evolutionary rate. There has been considerable research in the past several years aiming to identify the important determinants of protein evolutionary rate (dN or dN/dS). For the reasons stated above, we believe that our index for adaptive evolution is fundamentally different from these measures of predominantly neutral evolutionary change.
Also, in the literature on the determinants of evolutionary rate, dN or dN/dS is generally inferred from a comparison of only two species, while our measure for adaptive evolution is inferred from a phylogeny of 25 species. Nevertheless, it is certainly possible that constraints on neutral evolution to some extent translate to constraints on adaptive evolution. Therefore, we take into consideration the dominant factors determining neutral evolutionary rate in order to verify that none of these are in fact responsible for our observed association between protein robustness and adaptive evolution, and we do not find any of them to be confounding. We do not look at gene expression level because, although it is the dominant factor determining protein evolutionary rate in bacteria (Rocha and Danchin 2004) and yeast (Drummond et al. 2006, Zhang and He 2005), it seems to have only a negligible role in determining the evolutionary rate of mammalian proteins (Liao et al. 2006, Vinogradov 2010).

In this study we limit our investigation to proteins from the same clade to eliminate potential confounding effects due to differences in phylogenetic structure between protein families from different groups. Because it has only recently been elucidated that the determinants of mammalian protein evolutionary rate differ considerably from those determining the rates in yeast and bacteria, our results are of interest in that they shed some preliminary light on how protein structure plays a role in determining the rate of at least adaptive protein evolutionary change. The fact that we use a dataset of mammalian proteins raises the question of whether similar patterns would also be found in bacterial and fungal proteins.

Acknowledgements

The experimental work in the Wagner lab is supported by a grant from the John Templeton Foundation (Grant number 12793). Some of the work leading up to this paper was carried out while M.M.R.
was funded by an NIH Training Grant (Grant number 5T32GM007499-34). The views expressed in this paper do not necessarily reflect the views of JTF or NIH.

References

Ancel LW, Fontana W (2000) Plasticity, evolvability, and modularity in RNA. Journal of Experimental Zoology 288(3):242-283.
Babajide A, Farber R, Hofacker IL, Inman J, Lapedes AS, Stadler PF (2001) Exploring protein sequence space using knowledge-based potentials. Journal of Theoretical Biology 212: 35-46
Lin Y-S, Hsu W-L, Hwang J-K, Li W-H (2007) Proportion of solvent-exposed amino acids in a protein and rate of protein evolution. Mol. Biol. Evol. 24(4): 1005-1011
Lipman DJ, Souvorov A, Koonin EV, Panchenko AR, Tatusova TA (2002) BMC Evolutionary Biology 2: 20
Lipson H, Pollack JB, Suh NP (2002) On the origin of modular variation. Evolution 56(8): 1549-1556
Lynch M (2007) The frailty of adaptive hypotheses for the origins of organismal complexity. PNAS 104: 8597-8604.
Meiklejohn CD, Hartl DL (2002) A single mode of canalization. TRENDS in Ecol & Evol 17(10): 468-473
Misevic D, Ofria C, Lenski RE (2006) Sexual reproduction reshapes the genetic architecture of digital organisms. Proc. R. Soc. B 273: 457-464
Pereira-Leal JB, Levy ED, Teichmann SA (2006) The origins and evolution of functional modules: lessons from protein complexes
Pigliucci M (2008) Is evolvability evolvable? Nature Reviews Genetics 9: 75-82
Ranwez V, Delsuc F, Ranwez S, Belkhir K, Tilak M, Douzery EJP (2007) OrthoMaM: A database of orthologous genomic markers for placental mammal phylogenetics. BMC Evolutionary Biology 7: 241
Ridout KE, Dixon CJ, Filatov DA (2010) Positive selection differs between secondary structure elements in Drosophila. Genome Biology and Evolution 2010: 166-179.
Rocha EP, Danchin A. 2004. An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol Biol Evol 21:108-16
Schlosser G, Wagner GP (2004) Modularity in development and evolution.
Chicago: Univ of Chicago Press
Shakhnovich BE, Deeds E, Delisi C, Shakhnovich E (2005) Protein structure and evolutionary history determine sequence space topology. Genome Research 15(3): 385-392
Sumedha, Martin OC, Wagner A (2007) New structural variation in evolutionary searches of RNA neutral networks. Biosystems 90: 475–485
Taverna DM, Goldstein RA (2000) The distribution of structures in evolving protein populations. Biopolymers 53: 1-8
The Gene Ontology Consortium, 2000. Gene Ontology: tool for the unification of biology. Nature Genetics 25: 25-29. The GO ontology was accessed for use in this paper during April 2010.
van Nimwegen E, Crutchfield JP, Huynen M (1999) Neutral evolution of mutational robustness. PNAS USA 96: 9716-9720
Vinogradov A (2010) Systemic factors dominate mammal protein evolution. Proc. R. Soc. B 277: 1403-1408
Wagner A (2008) Robustness and evolvability: a paradox resolved. Proc. R. Soc. B 275: 91-100
Wagner A (2005) Robustness and Evolvability in Living Systems. Princeton, New Jersey: Princeton University Press p. 88-89
Wagner GP (1996) Homologues, natural kinds, and the evolution of modularity. American Zoologist 36: 36-43
Wagner GP, Booth G, Bagheri-Chaichian H (1997) A population genetic theory of canalization. Evolution 51:329-347
Drummond DA, Raval A, Wilke CO. 2006. A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol 23:327-37
Earl DJ, Deem MW (2004) Evolvability is a selectable trait. PNAS USA 101(32):11531-11536.
England JL, Shakhnovich BE, Shakhnovich EI (2003) Natural selection of more designable folds: a mechanism for thermophilic adaptation. PNAS 100(15): 8727-8731
England JL, Shakhnovich EI (2003) Structural determinants of protein designability.
Physical Review Letters 90 (21): 218101
Ferrada E, Wagner A (2008) Protein robustness promotes evolutionary innovations on large evolutionary time-scales. Proc. R. Soc. B 275: 1595-1602
Fontana W. (2002) Modeling ‘evo-devo’ with RNA. BioEssays
Force A, Cresko WA, Pickett B, Proulx SR, Amemiya C, Lynch M (2005) The origin of subfunctions and modular gene regulation. Genetics 170:433-446
Franz-Odendaal TA, Hall BK (2006) Modularity and sense organs in the blind cavefish, Astyanax mexicanus. Evolution & Development 8(1): 94-100
Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW (2002) Evolutionary rate in the protein interaction network. Science 296: 750-752
Gardner A, Zuidema W (2003) Is evolvability involved in the origin of modular variation? Evolution 57(6): 1448-1450
Garza AS, Ahmad N, Kumar R (2009) Role of intrinsically disordered protein regions/domains in transcriptional regulation. Life Sciences 84: 189-193
Gerhart J, Kirschner M (1997) Cells, Embryos and Evolution, Blackwell Science.
Griswold CK (2006) Pleiotropic mutation, modularity and evolvability. Evolution & Development 8(1): 81-93
Hansen TF (2002) Is modularity necessary for evolvability? Remarks on the relationship between pleiotropy and evolvability. BioSystems 69: 83-94
Hartwell LH, Hopfield JJ, Leibler S, Murray AW (1999) From molecular to modular cell biology. Nature 402 Supp: C47-C52
Herbeck JT, Wall DP (2005) Converging on a general model of protein evolution. TRENDS in Biotechnology 23(10):485-487
Kabsch W, Sander C. 1983. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577-637
Kitano H (2004) Biological Robustness. Nature Reviews Genetics 5: 826-837
Li Y, Drummond DA, Sawayama AM, Snow CD, Bloom JD, Arnold FH (2007) A diverse family of thermostable cytochrome P450s created by recombination of stabilizing fragments.
Nature Biotechnology 25(9): 1051-1056
Li H, Helling R, Tang C, Wingreen N (1996) Emergence of preferred structures in a simple model of protein folding. Science 273: 666-669
Liao B-Y, Scott NM, Zhang J (2006) Impacts of gene essentiality, expression pattern, and gene compactness on the evolutionary rate of mammalian proteins. Mol. Biol. Evol. 23(11):2072-2080
Liao H, Yeh W, Chiang D, Jernigan RL, Lustig B (2005) Protein sequence entropy is closely related to packing density and hydrophobicity. Protein Engineering, Design & Selection 18(2):59-64
Wagner GP, Altenberg L (1996) Complex adaptations and the evolution of evolvability. Evolution 50: 967
Wagner GP, Pavlicev M, Cheverud JM (2007) The road to modularity. Nature Reviews Genetics 8: 921-931
Wilke CO, Bloom JD, Drummond DA, Raval A (2005) Predicting the tolerance of proteins to random amino acid substitution. Biophysical Journal 89: 3714-3720
Wingreen NS, Li H, Tang C (2004) Designability and thermal stability of protein structures. Polymer 45: 699-705
Wright PE, Dyson HJ (1999) Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. Journal of Molecular Biology 293:321-331
Xia Y, Levitt M (2002) Roles of mutation and recombination in the evolution of protein thermodynamics. PNAS 99(16): 10382-10387
Yang AS (2001) Modularity, evolvability, and adaptive radiations: a comparison of the hemi- and holometabolous insects. Evolution & Development 3(2): 59-72
Yang Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Computer Applications in BioSciences 13:555-556.
Yang Z. 2007. PAML 4: a program package for phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24: 1586-1591 (http://abacus.gene.ucl.ac.uk/software/paml.html).
Zhang J, He X. 2005. Significant impact of protein dispensability on the instantaneous rate of protein evolution. Mol Biol Evol 22:1147-55