Rapid molecular evolution across amniotes of the IIS/TOR network

Suzanne E. McGaugh(a,1), Anne M. Bronikowski(b,1), Chih-Horng Kuo(c,1), Dawn M. Reding(b,2), Elizabeth A. Addis(b,3), Lex E. Flagel(d,e), Fredric J. Janzen(b), and Tonia S. Schwartz(f,1)

(a) Department of Ecology, Evolution, and Behavior, University of Minnesota, Saint Paul, MN 55108; (b) Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011; (c) Institute of Plant and Microbial Biology, Academia Sinica, Taipei 11529, Taiwan; (d) Department of Plant Biology, University of Minnesota, Saint Paul, MN 55108; (e) Monsanto Company, Chesterfield, MO 63017; and (f) Office of Energetics, School of Public Health, University of Alabama, Birmingham, AL 35294

Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved April 16, 2015 (received for review October 20, 2014)

insulin signaling

history and metabolic traits differ substantially between mammals and reptiles (11, 12), and the IIS/TOR network influences these traits (13–15). Within vertebrates, many IIS/TOR extracellular genes have evolved through gene duplication and thus are paralogs. Duplications of an insulin-like progenitor gene resulted in genes encoding insulin (INS) and insulin-like growth factors 1 and 2 (IGF1 and IGF2) (6). These paralogous hormones bind the similarly paralogous receptors, insulin receptor (INSR) and insulin-like growth factor 1 receptor (IGF1R), and this binding initiates the intracellular signaling cascade through insulin receptor substrate (IRS) and through the phosphatidylinositol 3-kinase (PI3K) and serine/threonine protein kinase intracellular nodes (Fig. S1) (5, 7). Repeated duplication of the gene encoding the ancestral IGF-binding proteins (IGFBP) resulted in six binding proteins that regulate bioavailability of the hormones (8). Generally, these receptors, hormones, and binding proteins maintain the ability for cross-talk, but binding affinities differ (16).
An additional receptor, IGF2R, is a co-opted mannose-6-phosphate receptor that regulates IGF2 bioavailability for activating IIS/TOR

Significance

Comparative analyses of central molecular networks uncover variation that can be targeted by biomedical research to develop insights and interventions into disease. The insulin/insulin-like signaling and target of rapamycin (IIS/TOR) molecular network regulates metabolism, growth, and aging. With the development of new molecular resources for reptiles, we show that genes in IIS/TOR are rapidly evolving within amniotes (mammals and reptiles, including birds). Additionally, we find evidence of natural selection that diversified the hormone-receptor binding relationships that initiate IIS/TOR signaling. Our results uncover substantial variation in the IIS/TOR network within and among amniotes and provide a critical step to unlocking information on vertebrate patterns of genetic regulation of metabolism, modes of reproduction, and rates of aging.

| insulin growth factor | molecular evolution | rapamycin

The last 20 y have provided overwhelming support that the insulin- and insulin-like signaling/target of rapamycin (IIS/TOR) molecular network responds to stress and nutrients and underlies a wide range of physiological functions (1); cancer, metabolic syndrome, and diabetes (2); and the timing of life events (e.g., growth, maturation, reproduction, and aging) (3). The vertebrate IIS/TOR network consists of peptide hormones, binding proteins that regulate hormone bioavailability, and cell membrane receptors (hereafter, extracellular proteins of the IIS/TOR network) that induce an intracellular signaling cascade (hereafter, intracellular proteins of the IIS/TOR network) to stimulate cell proliferation, survival, and metabolism (Fig. S1). The core intracellular signal transduction genes in this network are largely conserved across deep phylogenetic time (4, 5).
In contrast, genes encoding the IIS/TOR extracellular network have diverged in the vertebrate lineage (6–8) and may have variable roles among taxa (9, 10). Despite its central role in health, comparative analyses of IIS/TOR have been limited to model invertebrates and mammals. Here we conduct evolutionary analyses of IIS/TOR across amniotes: i.e., mammals and their reptile sister clade, which includes birds (Fig. S2). Many life

www.pnas.org/cgi/doi/10.1073/pnas.1419659112

Author contributions: S.E.M., A.M.B., D.M.R., E.A.A., F.J.J., and T.S.S. designed and performed research; S.E.M., A.M.B., C.-H.K., L.E.F., and T.S.S. analyzed data; and S.E.M., A.M.B., C.-H.K., D.M.R., E.A.A., L.E.F., F.J.J., and T.S.S. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequences reported in this paper have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive, www.ncbi.nlm.nih.gov/sra (accession nos. SRA062458 and SRP017466). Transcriptome assemblies, annotation summaries, and alignments for protein coevolution analyses are available through Dryad (10.5061/dryad.vn872).

1 To whom correspondence may be addressed. Email: smcgaugh@umn.edu (assemblies, alignments, and molecular evolution analyses), abroniko@iastate.edu (the study itself, transcriptomes, and accessing data), chk@gate.sinica.edu.tw (OrthoMCL analyses), or tschwartz@uab.edu (protein predictions and coevolutionary and network interactions).

2 Present address: Department of Biology, Luther College, Decorah, IA 52101.

3 Present address: Department of Biology, Gonzaga University, Spokane, WA 99258.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1419659112/-/DCSupplemental.

PNAS | June 2, 2015 | vol. 112 | no. 22 | 7055–7060

EVOLUTION

The insulin/insulin-like signaling and target of rapamycin (IIS/TOR) network regulates lifespan and reproduction, as well as metabolic diseases, cancer, and aging. Despite its vital role in health, comparative analyses of IIS/TOR have been limited to invertebrates and mammals. We conducted an extensive evolutionary analysis of the IIS/TOR network across 66 amniotes with 18 newly generated transcriptomes from nonavian reptiles and additional available genomes/transcriptomes. We uncovered rapid and extensive molecular evolution between reptiles (including birds) and mammals: (i) the IIS/TOR network, including the critical nodes insulin receptor substrate (IRS) and phosphatidylinositol 3-kinase (PI3K), exhibits divergent evolutionary rates between reptiles and mammals; (ii) compared with a proxy for the rest of the genome, genes of the IIS/TOR extracellular network exhibit exceptionally fast evolutionary rates; and (iii) signatures of positive selection and coevolution of the extracellular network suggest reptile- and mammal-specific interactions between members of the network. In reptiles, positively selected sites cluster on the binding surfaces of insulin-like growth factor 1 (IGF1), IGF1 receptor (IGF1R), and insulin receptor (INSR), whereas in mammals, positively selected sites cluster on the IGF2 binding surface, suggesting that these hormone-receptor binding affinities are targets of positive selection. Further, contrary to reports that IGF2R binds IGF2 only in marsupial and placental mammals, we found positively selected sites clustered on the hormone binding surface of reptile IGF2R that suggest that IGF2R binds to IGF hormones in diverse taxa and may have evolved in reptiles. These data suggest that key IIS/TOR paralogs have sub- or neofunctionalized between mammals and reptiles and that this network may underlie fundamental life history and physiological differences between these amniote sister clades.
IIS/TOR Network Contains Fast-Evolving Outliers. Twenty-six genes (of 61 analyzed) from the IIS/TOR network exhibited divergent evolutionary rates between reptiles and mammals (i.e., significant likelihood ratio test between the null and alternative models) using the Clade model in PAML (31) and a P value estimated following refs. 32 and 33 and corrected for multiple tests by sequential Bonferroni (Table S3, CMC reptiles). For 20 of these 26 divergent genes, the ω [nonsynonymous substitutions per nonsynonymous site (Ka)/synonymous substitutions per synonymous site (Ks)] for reptiles was significantly greater than the ω for the rest of the tree (e.g., mammals, χ2 = 7.54, P = 0.006). We obtained similar results for a paired Wilcoxon test (P = 0.056), and this

Evidence for Positive Selection. To understand how positive selection may have shaped the genes within the IIS/TOR network, we used the branch-site test in PAML, which tests for molecular evolution at the nucleotide level with functional impacts at the protein level. In the first analysis, the branch leading to reptiles was tested for evidence of positive selection (i.e., was placed in the "foreground," which functions to test the predicted ancestral reptile against all mammals). Eighteen genes showed significant signatures of positive selection along this branch leading to reptiles, six of which remained significant after sequential Bonferroni correction: IGF2R and five intracellular genes [protein kinase C gamma (PRKCG), inositol polyphosphate phosphatase-like 1, phosphatidylinositol 3-kinase regulatory subunit (PIK3R), IRS1, and IRS2] (Table S3).
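The likelihood ratio tests and the sequential Bonferroni correction described above can be illustrated with a minimal Python sketch. It assumes that each test has one degree of freedom and that "sequential Bonferroni" means Holm's step-down procedure; neither detail is stated in this section, and the function names are hypothetical rather than part of PAML.

```python
import math


def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def lrt_p_value(lnl_null, lnl_alt):
    """P value of a likelihood ratio test between nested models (assumes df = 1)."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return chi2_sf_df1(stat)


def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down correction (assumed reading of 'sequential Bonferroni').

    Returns one boolean per input P value, True where the test remains
    significant after correction.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    keep = [False] * m
    for rank, idx in enumerate(order):
        # The smallest P value is compared with alpha/m, the next with alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            keep[idx] = True
        else:
            break  # once one test fails, all larger P values fail as well
    return keep


# Illustrative P values only; these are not values from Table S3.
print(holm_bonferroni([0.001, 0.04, 0.03, 0.0005]))
```

With one degree of freedom, `chi2_sf_df1(7.54)` evaluates to roughly 0.006, which is consistent with the χ2 and P value quoted in the text.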
In the second analysis, with the branch leading to mammals designated as the foreground branch, testing this predicted ancestral mammalian branch against all reptiles, 23 genes showed significant signatures of positive selection, 9 of which remained significant after sequential Bonferroni correction. This group included most of the genes that were significant along the

Results

We identified an average of 31,060 unique ORFs per species (range, 15,893–102,156) and used OrthoMCL (29) and quality control methods to produce alignments of putative orthologs across 66 species (Table S2) (see data deposition footnote and SI Text). We focused on 61 genes from the IIS/TOR network that were identified through KEGG pathways (Kyoto Encyclopedia of Genes and Genomes, ref. 30) and/or previous publications (Fig. S1 and Table S1) (8, 24). Alignments of these focal genes contained 19–66 species (median = 62, mode = 66; mean = 58.4; Table S1). To provide a proxy for evolution of the noninsulin signaling genes in the genome, we used 1,417 putative orthologs that contained all 66 species and referred to these as control genes. We also analyzed (i) 48 of the 61 focal genes that had greater representation of species within the alignments (56–66 species, median = 63.5, mode = 66, mean = 62.6), and (ii) a control and IIS/TOR focal gene set that contained phylogenetically matched species (43 control genes and 43 focal genes with identical species). Both of these analyses are presented in the SI Materials and Methods and are consistent with the analysis reported below, which shows that extracellular genes of the network are highly divergent outliers.

difference in divergence between clades was also seen among control genes (SI Results). Extracellular genes of the IIS/TOR network exhibited greater divergence between mammals and reptiles than 1,417 control genes and intracellular genes when measured by the median of all pairwise mammal-reptile Ka/Ks measures.
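The per-gene summary statistic named above, the median of all pairwise mammal-reptile Ka/Ks values, can be sketched as follows. This is only an illustration of the summary step: the pairwise estimates themselves come from PAML in the study, and the species names, values, and function name here are hypothetical.

```python
from itertools import product
from statistics import median


def median_cross_clade(pairwise, mammals, reptiles):
    """Median of a pairwise statistic over all mammal-reptile species pairs.

    `pairwise` maps an unordered species pair, stored as a tuple in either
    order, to a value such as Ka/Ks. Within-clade pairs are ignored.
    Raises statistics.StatisticsError if no cross-clade pair is present.
    """
    values = []
    for mammal, reptile in product(mammals, reptiles):
        if (mammal, reptile) in pairwise:
            values.append(pairwise[(mammal, reptile)])
        elif (reptile, mammal) in pairwise:
            values.append(pairwise[(reptile, mammal)])
    return median(values)


# Hypothetical Ka/Ks values for one gene; not data from the study.
example = {
    ("human", "anole"): 0.10,
    ("chicken", "human"): 0.14,
    ("mouse", "anole"): 0.08,
    ("mouse", "chicken"): 0.12,
    ("human", "mouse"): 0.05,  # within-clade pair, excluded from the median
}
print(median_cross_clade(example, ["human", "mouse"], ["anole", "chicken"]))
```

Using the median rather than the mean keeps a single extreme pairwise estimate, for example one inflated by saturated Ks, from dominating the per-gene summary.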
Extracellular genes had equivalent Ks compared with control genes (Wilcoxon rank sum test, W = 5476, P = 0.2154), but had notably greater median ω (W = 2998, P = 0.002) and Ka (W = 2333.5, P < 0.001; Fig. 1). Compared with intracellular genes, extracellular genes also had significantly higher ω and Ka (ω: W = 129.5, P = 0.02; Ka: W = 88, P = 0.001), but Ks did not differ (W = 209, P = 0.375). Collectively, the intracellular IIS/TOR genes did not have elevated median Ka, Ks, or ω compared with control genes (P > 0.467 in all cases; Fig. 1). In all cases, the median for intracellular and control genes was identical to the hundredths place. The median Ks for extracellular genes was 1.59, and the median Ks for intracellular and control genes was 1.60. The median Ka for extracellular genes was 0.17, and the median Ka for intracellular and control genes was 0.09. The median ω for extracellular genes was 0.11, and the median ω for intracellular and control genes was 0.06. When comparing the distribution of ω values for each group of IIS/TOR genes to the distribution of the ω values for the control genes, the extracellular genes were 8.4 times more likely than control genes to reside in the highest 5% of ω values (OR, 8.37; 95% CI, 2.12, 33.08). In comparison, the intracellular genes were not significantly more likely than controls to be in the top 5% (OR, 2.21; 95% CI, 0.82, 5.51). These odds ratios imply that the extracellular group contains the fastest evolving components of the IIS/TOR network.

by bringing IGF2 into the cell for degradation (17). It is widely hypothesized that IGF2-IGF2R binding is unique to therian mammals (marsupials and placentals) due to sexual conflict in regulating paternal IGF2 during placental and embryo development (10, 18–21). Previous evolutionary analyses of IIS/TOR in vertebrate and invertebrate lineages suggest that extracellular genes (Table S1 and Fig.
S1) often experienced positive selection, whereas intracellular genes often experienced purifying selection (22–28), especially farther downstream in the intracellular network. These previous findings have supported the prediction that upstream extracellular factors (e.g., the initial components that interact with environmental stimuli in signal transduction pathways) may have larger impacts on signaling through the network than downstream components. However, comparative studies of (co)evolution among extracellular components of the network have not been possible with studies from invertebrates due to the near absence of these paralogs. In addition, mammalian studies of this network have not included the other half of the amniote group (avian and nonavian reptiles), except chickens. Thus, a general understanding of the evolution of this network and the coevolutionary relationships among the proteins of this network has not been possible. We analyze coding sequence data from 32 species of mammal (an order of magnitude higher than previous comparative studies of mammals) and 34 species of reptile (10 species of birds and 24 nonavian reptiles; Fig. S2). Analyses of this improved sampling revealed that members of the IIS/TOR network, particularly extracellular and critical intracellular genes, exhibit exceptionally fast evolutionary rates between mammals and reptiles relative to the rest of the genome. Additionally, strong positive selection occurs at amino acid sites important for hormone-receptor protein interactions, and this selection likely shapes binding affinities in reptile- and mammal-specific ways.

Fig. 1. Medians of pairwise measures between all reptiles and mammals per gene for Ka, Ks, and Ka/Ks calculated in PAML. Control = 1,417 genes not in the IIS/TOR network, with 66 taxa represented in the alignments. Intracellular = 51 genes.
Extracellular = 10 genes (hormones, receptors, IGF binding proteins). Extracellular genes exhibit significantly greater median Ka/Ks and median Ka than control or intracellular genes.

Positive Selection in the IIS/TOR Hormones and Receptors Shows Clade-Specific Patterns. Positively selected sites [i.e., those indicated by PAML branch-site models with Bayes Empirical Bayes (BEB) score of 0.9 or greater] were analyzed in the context of the protein structure and predicted protein-protein interactions between insulin/IGF hormones and their receptors. We found reptile- and mammal-specific patterns of positive selection in the hormone and receptor domains that are important for binding affinity. First, the mature INS hormone (containing protein domains A and B) was conserved in reptiles and mammals, whereas the C-peptide that is cleaved from the mature insulin protein (35) contained four positively selected sites in mammals. Second, 5 of the 12 amino acids of the C-domain in IGF1 in reptiles, but none in mammals, were positively selected. Third, for IGF2, 3 of the 16 amino acids of the C-domain in mammals, but none in reptiles, were positively selected (Fig. 2A). IGF hormones bind to the receptors IGF1R and INSR by interacting with specific domains on each receptor (L1, CR, and L2) (36, 37). Variation in the C-domain of IGF1 and IGF2 can regulate binding specificity to IGF1R (38) and to INSR (39) through the interactions of the C-domain of the hormones with the CR-domain in the binding pocket of IGF1R and INSR (36) (Fig. 2B). Specifically, previous mutagenesis studies revealed that altering one of the positively selected sites (IGF1 C-domain R37, human numbering used throughout) disrupts IGF1-IGF1R binding (16, 40). In reptiles, positively selected sites were clustered on the hormone-binding surface of the IGF1R CR domain and in the binding pocket of INSR. They include IGF1R site F251,
They include IGF1R site F251, A C B A C Mammal B D C C A A RepƟle Mammal IGF2 P/T4 D IGF2 R/Q1623 CR B N251 L1 IGF1 T/N1558 S37 C L2 A bloodstream to regulate their bioavailability (48, 49). These IGFBPs are characterized by N- and C-terminal domains that cooperate to bind IGFs; protease cleavage separating these domains decreases affinity to IGFs. In both reptiles and mammals (except primates), many of our assembled IGFBP transcripts were either completely missing the N-terminal domain or it was truncated (Table S5 and Fig. S3). As assembled, these transcripts would produce truncated proteins with diminished binding affinity to IGF1 and IGF2. We summarize putative losses and truncations in Table S4 to serve as a hypothesis-generating resource for future validation work. Most evident is IGFBP6, which was neither found in any archosaurs (birds and crocodilians) nor in platypus (8). The 5′ end of IGFBP6 was truncated in nearly all other reptiles including genome-derived ENSEMBL sequences of the Anolis lizard and the Pelodiscus IGF2 B D Binding Proteins Exhibit Putative Truncations of Important Functional Domains. IGF binding proteins bind to IGF1 and IGF2 in the B RepƟle IGF1R with IGF1 Ligands IGF1 RepƟle which affects IGF1-IGF1R binding in humans through its interaction with the IGF1 C-domain (36) (Table S4). In therian mammals, IGF2R binds IGF2 with relatively high affinity, but studies of this interaction in reptiles (mainly chickens) have yielded conflicting results (18, 21, 41, 42). We found that IGF2R has been shaped by putatively strong positive selection within reptiles and positively selected sites clustered on the IGF2R protein surface in domain 11, which is intimately involved in binding IGF1 and IGF2. Several of the positively selected sites on the protein surface of IGF2R in reptiles are essential for binding IGF2 based on mutagenesis studies and the crystal structure of the IGF2R-IGF2 complex (e.g., Y1542) (43– 45) (Fig. 2C and Table S4). 
Although some variants in IGF2R would predict decreased binding to IGF2, such as in chicken, many variants in snakes and lizards predict increased binding to IGF2 and/or IGF1 because they exhibit biochemical properties similar to those of the human amino acids (e.g., Y1542F in snakes and Y1542L/M in lizards; Table S4). Utilizing Coevolutionary Analysis Using Protein Sequences (CAPS) (46), we found that amino acid site P4 of IGF2 is coevolving with the positively selected site on the binding surface of IGF2R (site R1623, ρ = 0.4, P < 0.01) in reptiles (Fig. 2C). Among reptiles, MatrixMatchMaker version II (MMMvII) (47) identified sunbeam and viper boa snakes as having the tightest coevolutionary signal between IGF2 and IGF2R (ρ = 1), and identified brown anole, green anole, and gecko lizards as having the tightest coevolutionary signal between IGF1 and IGF2R (ρ = 0.33). Thus, among reptiles, IGF2R binding of IGFs is most likely to be found in the Squamates.

Fig. 2. Protein structures for reptile and mammal IGF hormones and receptors. Reptile protein structures predicted from snake sequence homology modeled onto human protein structures from the Protein Data Bank. Enlarged positions indicate the amino acid sites predicted to be under positive selection (Table S3). (A) Reptile and mammal IGF hormones with their protein domains color coded. Positively selected sites cluster on the C-domain of reptile IGF1 but are not present in the C-domain of the reptilian IGF2. In contrast, positively selected sites cluster on the C-domain of mammal IGF2. (B) The α chain of reptile IGF1R homodimer with hormone binding domains L1, CR, and L2 labeled. The square is an enlargement with IGF1 orientated in the IGF1R binding pocket to demonstrate the clustering of positively selected sites on the interacting IGF1-IGF1R binding surfaces (36).
Labeled sites (IGF1 S37 and N251; human numbering) are known to affect IGF hormone and receptor binding (Table S4). (C) Domain 11 of reptile and mammal IGF2R with IGF2 oriented toward the binding pocket to demonstrate the clustering of positively selected sites on the reptile IGF2R binding surfaces (43, 44). The magenta sites on reptile IGF2 and IGF2R were identified as coevolving amino acids using CAPS (46). Labeled IGF2R sites (1558 and 1609; human numbering) are predicted to regulate IGF2-IGF2R binding (Table S4). Like mammals, some lizards have IGF2R N1558.

reptile branch (IGF2R, IRS1, IRS2, PRKCG, and PIK3R), as well as others (Table S3). We also performed the branch-site test with specific lineages within reptiles, because previous research indicated that genes of the IIS/TOR network may be under strong positive selection in Squamata (lizards and snakes) (34) (Table S3). Overall, when separate tests were conducted for each lineage, the branch leading to Squamata had more genes under positive selection (n = 7 of 61) than the branches leading to crocodilians (n = 6), birds (n = 1), and turtles (n = 5), although most of these differences were minor (Table S3). In additional tests using the clade model, we found that snakes had larger ω relative to the rest of the tree (paired Wilcoxon signed-rank test, V = 24, P = 0.04) across the 15 IIS/TOR network genes that were significant after multiple test correction.

turtle. Furthermore, in examining the three N-terminal amino acids that are conserved across all binding proteins in humans (49), only one of these amino acids was conserved in IGFBP6, and only in two snake species, although all three sites were conserved across the reptile IGFBP2-5. For those amino acids important for binding IGFs and specific to IGFBP6 (49), only 7 of 12 are conserved in reptiles.
Two of these seven conserved amino acids have additional functions beyond IGF binding that require their conservation (Fig. S3). These multiple lines of evidence suggest that IGFBP6 does not function as an IGF binding protein across the reptile clade.

Discussion

We conducted extensive evolutionary analyses of the IIS/TOR network in amniotes (i.e., mammals and reptiles, including birds) and uncovered fundamental differences between reptiles and mammals in the evolution of this centrally important network. Our analyses revealed that members of the IIS/TOR network have exceptionally fast evolutionary rates between reptiles and mammals compared with a proxy for the rest of the genome. More specifically, the extracellular network is a target of positive selection, and the location of the selected sites suggests changes in the hormone-receptor binding relationships in reptile- and mammal-specific patterns.

Members of IIS/TOR Network Are Outliers in Evolutionary Rate. Members of the IIS/TOR network, especially the extracellular hormones, receptors, and binding proteins, exhibit remarkably high reptile-mammal divergence compared with control genes. Our results complement those of ref. 24, which found that the IIS/TOR network across human populations is enriched for genes evolving under positive selection relative to a sample representing the genomic background. Across the amniote scale that we examined, many evolutionary innovations have arisen (e.g., feathers/hair, leglessness, endothermy), and each was likely accompanied by substantial molecular evolution. However, within the 1,478 total genes that we analyzed (61 IIS/TOR network genes plus 1,417 non-IIS/TOR genes), the evolution of the IIS/TOR extracellular network is a prominent outlier in reptile-mammal divergence. Our results provide additional evidence that the phenotypes governed by this pathway, including metabolism and life histories, are key differences between reptiles and mammals.
Our data show that multiple IIS/TOR genes are under positive selection in one or more lineages of amniotes. Importantly, these include genes that encode proteins in critical nodes of the IIS/TOR network that mediate the intracellular signal (e.g., IRS and PI3K) (5) and extracellular nodes that regulate the initiation of the cascade (IGF1R, INSR, IGFBP4, IGFBP5, and IGF2R). Although these genes are implicated in aging and disease phenotypes (3, 50), here we find they are also under positive selection among amniote species. Because vertebrate IIS/TOR connects with many other networks, we cannot directly compare our results to studies in the more simplified invertebrate network (23, 26, 27). However, our findings of elevated ωs agree with those of ref. 28 and supply further support that extracellular components are among the fastest evolving genes in the IIS/TOR network (22), as is likely true in other networks. Overall, our data are in agreement with reports that receptors and other extracellular components of signal transduction pathways appear to be under less purifying selection than intracellular components (51–53). Indeed, our data indicate that one potential driver of differences in evolutionary rates among genes in the network may be the number of interactions that a gene has with other genes or proteins (i.e., connectivity; SI Results) similar to what has been seen in other systems (54–56, but see refs. 27, 57, and 58).

Evolving Interactions in the Extracellular Network. Our data strongly support the conclusion that many of the IIS/TOR extracellular proteins have undergone positive selection. Detailed evaluation of the protein structure of the hormones, receptors, and binding proteins of IIS/TOR suggests that these binding relationships are targets of clade-specific selection between mammals and reptiles.
In reptiles, structural evaluations indicated that residues on the interacting binding surfaces of IGF1 C-domain and the IGF1R CR-domain are under positive selection. Specifically, positively selected amino acid sites identified by our models have previously been shown to modulate binding when altered in humans (16). More broadly, mutations predicted to affect this binding relationship are associated with longevity in humans (59) and model organisms (3). In contrast, positive selection on the IGF2 C-domain in mammals suggests that IGF2 binding affinities with both INSR and IGF1R may be targets of positive selection in mammals. The positive selection putatively affecting hormone-receptor binding relationships across amniote species equates to selection at the cell surface start of the IIS/TOR signaling cascade. Functional studies are necessary to further our understanding of the regulatory effects of these changes and the stability of the physiological roles of extracellular IIS/TOR proteins across amniotes. Juvenile and adult IGF2 gene expression is observed in humans (60, 61) and fish (62), but not in adult mice and rats (the typical vertebrate models for studying the IIS/TOR network) (63). We found IGF1 and IGF2 gene expression in each of our reptile transcriptomes, regardless of whether the source liver was from a juvenile or an adult (Table S2). This observation underscores the importance of broad taxonomic sampling for understanding the function and evolution of pathways important to human health—for which rodent models may not always be the most appropriate. Together, our molecular evolution and expression data suggest that the IGF2 protein may have a more stable role in IIS/TOR signaling across reptiles, in contrast to the more variable and specialized roles of IGF2 across mammals (64). IGF2R-IGF2 binding is believed to have evolved in therian mammals for maternal regulation of paternally imprinted IGF2 (10). 
This hypothesis of mammalian-specific function was bolstered by early studies showing that IGF2R does not effectively bind IGF2 in chicken (18, 21, 41), Xenopus (18), or monotremes (10), and IGF2R has lower affinity for IGF2 in marsupials compared with placental mammals (19, 20). However, more sensitive assays have indicated that IGF2R-IGF2 binding occurs in chicken, trout, and garden lizards (42, 65, 66), which counters the claim that measurable IGF2-IGF2R binding is confined to mammals. Our data provide support for the hypothesis that positive selection drove the high-affinity binding between IGF2R and IGF2 in placental mammals relative to monotremes and marsupials. Our data also call into question the assumption that IGF2R does not bind IGF hormones in reptiles. IGF2R in chicken contains a substitution thought to inhibit IGF2 binding ability (isoleucine to leucine at 1572, I1572L) (67, 68). However, our work shows that this amino acid is a conserved isoleucine in many reptile species, even within other birds (66). Additionally, many of the sites that are important for binding of IGF2 to IGF2R in mammals are conserved across most reptiles in our study. Because chickens have typically been used as the sole representative of the reptile clade, we suggest that this narrow sampling promoted the premature conclusion that IGF2 binds IGF2R only in mammals. Further, in reptiles, we found a signal of coevolution between IGF2 and IGF2R in our CAPS and MMMvII analyses. Additionally, we found three sites under positive selection on the surface of IGF1 that would likely promote binding with IGF2R (34, 67). Thus, by extending the comparative genomic landscape, we suggest that IGF-IGF2R binding may not be unique to therian mammals but also may occur in some reptile species. IGF binding proteins regulate the ability of hormones to activate receptors through steric hindrance, thereby limiting the bioavailability of IGF1 and IGF2 to initiate the IIS/TOR signaling cascade (49).
Intriguingly, we found that many reptile species appear to have truncated or missing N-terminal domains across the IGFBPs that would decrease IGF binding affinity. Confirming results from ref. 8, IGFBP6 was not recovered from opossum, platypus, or any bird or crocodile. When identified in our other reptile transcriptomes and ENSEMBL-derived

been associated with longevity in humans (59, 78, 80–82). Likewise, our comparative genomic analyses show that many IIS/TOR genes are variable across amniotes and that the binding affinities of IGF1, IGF1R and INSR, and thereby the initiation of IIS/TOR signaling, are likely affected. Future comparative analyses of the IIS/TOR network across amniotes and within reptiles may provide unique insights into the regulation of body size, reproductive investment (e.g., placentation), and rates of aging (83).

Materials and Methods

We used transcriptomic and genomic data across amniotes to evaluate molecular evolution of the IIS/TOR pathway between reptiles and mammals. All animal protocols were approved by the Iowa State University Institutional Animal Care and Use Committee (log 3-2-5125J). De novo liver transcriptome assembly was performed in Trinity (Table S2), and some gene sets were obtained through past studies (Table S2). The longest ORF from each assembled transcript was used for defining homologs through OrthoMCL (29). Sequences within each putative ortholog were further clustered so that a single transcript represented each ortholog from each species. Transcripts were translated, and amino acid sequences were aligned with MSAProbs (84). Alignments were back-translated to the original nucleic acids with RevTrans (85) and trimmed of poorly aligned regions using Gblocks (86). These cleaned nucleotide alignments were analyzed for molecular evolutionary parameters and models of sequence evolution in PAML (31).
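The back-translation step in this workflow, in which a protein alignment is used to place gaps in the original coding sequences, can be illustrated with a short Python sketch. It shows the general idea only and is not the RevTrans implementation; the function name is hypothetical, and the sketch assumes that the coding sequence is in frame, contains no stop codon, and has exactly one codon per aligned residue.

```python
def back_translate(aligned_protein, cds, gap="-"):
    """Thread an unaligned coding sequence onto an aligned protein sequence.

    Each residue in the protein alignment consumes the next codon of `cds`;
    each gap character becomes a three-nucleotide gap, so the reading frame
    is preserved in the resulting codon alignment.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    if n_residues != len(codons):
        raise ValueError("residue count does not match codon count")
    remaining = iter(codons)
    pieces = []
    for ch in aligned_protein:
        pieces.append(gap * 3 if ch == gap else next(remaining))
    return "".join(pieces)


# Toy example: methionine, an alignment gap, then lysine.
print(back_translate("M-K", "ATGAAA"))  # ATG---AAA
```

Aligning at the amino acid level and then restoring codons keeps gaps in multiples of three, which codon-based models such as those in PAML require.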
Positively selected sites for extracellular genes were predicted for reptiles and mammals using the branch-site model in PAML. Sites with signatures of positive selection were evaluated for putative functional significance on human protein structures from the Protein Data Bank (PDB) or predicted reptile structures from homology modeling of snake sequences onto human structures. Hormone and IGF2R amino acid alignments were used for coevolution analyses with CAPS (46) [significance of permutations (P < 0.01) detailed in SI Materials and Methods] and MMMvII (47) (tolerance level: 0.2). We describe each of these steps in detail in SI Materials and Methods. Comparative Genomics Approach. The insights our study provide into the evolution of the IIS/TOR network were previously unattainable without adequate molecular resources in reptiles. Our work adds to the recent discoveries of rapid evolution of genes involved in development and metabolism in the branch leading to modern snakes (71) and of regulatory innovation in IGFBP2 and IGFBP5 in the branch leading to modern birds (72). Although de novo transcriptome assemblies may not fully reveal all biologically important signals in data (such as species-specific isoforms and very recent paralogs) (73), when combined with available genomes, ours revealed insights into the evolution of the IIS/TOR network. Although the core of the IIS/TOR network is conserved in animals (4, 5), we found high divergence and selection on genes in this network between mammals and their sister clade reptiles (including birds). The extracellular genes of this network had exceptionally fast divergence between reptiles and mammals relative to genomic background, and many genes have been shaped by positive selection. Hormones, receptors, and binding proteins that are essential for producing a physiological response to environmental stimuli have undergone taxon-specific patterns of positive selection. 
Our results suggest that key paralogs have subfunctionalized or neofunctionalized between reptiles and mammals and that this network may underlie fundamental life history and physiological differences between these clades. In a larger context, the strength of comparative biology in understanding human health and disease lies in its power to distinguish conserved vs. flexible mechanisms of normal and disease states and thereby suggest worthy targets of biomedical research into future interventions (74, 75). For example, lifespan extension is observed with mutant IGF1, IGF1R, and IRS across diverse model species (3, 76–79)—where a shared effect on IIS/TOR signaling is to either decrease rates of signaling by disrupting protein-protein interactions or to decrease normal levels of hormone or receptor. In addition, the IIS/TOR network has ACKNOWLEDGMENTS. We thank D. Warner, R. Telemeco, A. Cordero, N. Ford, K. Wray, T. Owerkowicz, and C. Watson for contributing specimens; E. Tillier for advice on MMMvII; and A. Brown, J. P. de Magalhaes, and an anonymous reviewer for useful comments. We thank the Baylor College of Medicine and The Genome Institute at Washington University in St. Louis for use of the unpublished genomic sequence. We thank the Broad Institute Genomics Platform, Vertebrate Genome Biology group, J. Alfoldi, and K. Lindblad-Toh for making the Mustela putorius and Microtus ochrogaster data available. We are grateful for resources from the University of Minnesota Supercomputing Institute, University of Alabama at Birmingham Office of Energetics, and the Iowa State University High Performance Computing facility. This research was supported by National Science Foundation (NSF) Grants IOS0922528 and IOS-1253896 (to A.M.B.) and DEB-DDIG-1011350 (to A.M.B. and T.S.S.) and grants from the Iowa State University Center for Integrated Animal Genomics (to A.M.B. and F.J.J.). 
We acknowledge additional support from the NSF (Graduate Research Fellowship to S.E.M.), the James S. McDonnell Foundation (postdoctoral fellowship to T.S.S.), the Howard Hughes Medical Institute (postdoctoral support to E.A.A.), and Academia Sinica (C.H.K.). 1. Wullschleger S, Loewith R, Hall MN (2006) TOR signaling in growth and metabolism. Cell 124(3):471–484. 2. Zoncu R, Efeyan A, Sabatini DM (2011) mTOR: From growth signal integration to cancer, diabetes and ageing. Nat Rev Mol Cell Biol 12(1):21–35. 3. Kenyon CJ (2010) The genetics of ageing. Nature 464(7288):504–512. 4. Oldham S (2011) Obesity and nutrient sensing TOR pathway in flies and vertebrates: Functional conservation of genetic mechanisms. Trends Endocrinol Metab 22(2):45–52. 5. Taniguchi CM, Emanuelli B, Kahn CR (2006) Critical nodes in signalling pathways: Insights into insulin action. Nat Rev Mol Cell Biol 7(2):85–96. 6. Olinski RP, Lundin L-G, Hallböök F (2006) Conserved synteny between the Ciona genome and human paralogons identifies large duplication events in the molecular evolution of the insulin-relaxin gene family. Mol Biol Evol 23(1):10–22. 7. Hernández-Sánchez C, Mansilla A, de Pablo F, Zardoya R (2008) Evolution of the insulin receptor family and receptor isoform expression in vertebrates. Mol Biol Evol 25(6):1043–1053. 8. Daza DO, Sundström G, Bergqvist CA, Duan C, Larhammar D (2011) Evolution of the insulinlike growth factor binding protein (IGFBP) family. Endocrinology 152(6):2278–2289. 9. O’Neill MJ, et al. (2007) Ancient and continuing Darwinian selection on insulin-like growth factor II in placental fishes. Proc Natl Acad Sci USA 104(30):12404–12409. 10. Killian JK, et al. (2000) M6P/IGF2R imprinting evolution in mammals. Mol Cell 5(4): 707–716. 11. Schwartz TS, Bronikowski AM (2011) Molecular stress pathways and the evolution of life histories in reptiles. Molecular Mechanisms of Life History Evolution, ed Heyland F (Oxford Univ Press, Oxford, UK). 12. 
de Magalhães JP, Toussaint O (2002) The evolution of mammalian aging. Exp Gerontol 37(6):769–775. 13. Swanson EM, Dantzer B (2014) Insulin-like growth factor-1 is associated with lifehistory variation across Mammalia. Proc Royal Soc B Biol Sci 281(1782):20132458. 14. Sparkman AM, Vleck CM, Bronikowski AM (2009) Evolutionary ecology of endocrinemediated life-history variation in the garter snake Thamnophis elegans. Ecology 90(3):720–728. 15. Sparkman AM, Byars D, Ford NB, Bronikowski AM (2010) The role of insulin-like growth factor-1 (IGF-1) in growth and reproduction in female brown house snakes (Lamprophis fuliginosus). Gen Comp Endocrinol 168(3):408–414. 16. Denley A, Cosgrove LJ, Booker GW, Wallace JC, Forbes BE (2005) Molecular interactions of the IGF system. Cytokine Growth Factor Rev 16(4-5):421–439. 17. Ghosh P, Dahms NM, Kornfeld S (2003) Mannose 6-phosphate receptors: New twists in the tale. Nat Rev Mol Cell Biol 4(3):202–212. 18. Clairmont KB, Czech MP (1989) Chicken and Xenopus mannose 6-phosphate receptors fail to bind insulin-like growth factor II. J Biol Chem 264(28):16390–16392. 19. Dahms NM, Brzycki-Wessell MA, Ramanujam KS, Seetharam B (1993) Characterization of mannose 6-phosphate receptors (MPRs) from opossum liver: Opossum cationindependent MPR binds insulin-like growth factor-II. Endocrinology 133(2):440–446. McGaugh et al. PNAS | June 2, 2015 | vol. 112 | no. 22 | 7059 EVOLUTION genomic data, the N terminus of the protein was truncated. These data suggest that across reptiles, IGFBP6 is not functioning as an IGF binding protein. Like IGF2-IGF2R binding, IGF2-IGFBP6 binding in mammals functions to regulate IGF2 levels during embryo development in placental mammals (69). The putative loss of this regulatory mechanism in both reptiles and some nonplacental mammals is particularly interesting given that placentation has evolved not only in mammals but also in various snake and lizard species (70). 
Thus, our data suggest that in many reptiles (i) IGFBP6 has been lost, (ii) IGF2R binds IGF hormones, and (iii) novel positive selection characterizes IGF1-IGF1R binding. Therefore, future functional assays should address the role of IIS/TOR extracellular signaling in the evolution of viviparity and placentation in Squamates, relative to that in placental mammals (10) and placental fish (9). 20. Yandell CA, Dunbar AJ, Wheldrake JF, Upton Z (1999) The kangaroo cation-independent mannose 6-phosphate receptor binds insulin-like growth factor II with low affinity. J Biol Chem 274(38):27076–27082. 21. Canfield WM, Kornfeld S (1989) The chicken liver cation-independent mannose 6-phosphate receptor lacks the high affinity binding site for insulin-like growth factor II. J Biol Chem 264(13):7100–7103. 22. Alvarez-Ponce D, Aguadé M, Rozas J (2013) comment on “The Molecular evolutionary patterns of the Insulin/FOXO signaling pathway”. Evol Bioinform Online 9:229–234. 23. Alvarez-Ponce D, et al. (2012) Molecular population genetics of the insulin/TOR signal transduction pathway: A network-level analysis in Drosophila melanogaster. Mol Biol Evol 29(1):123–132. 24. Luisi P, et al. (2012) Network-level and population genetics analysis of the insulin/TOR signal transduction pathway across human populations. Mol Biol Evol 29(5):1379–1392. 25. Alvarez-Ponce D, Aguadé M, Rozas J (2011) Comparative genomics of the vertebrate insulin/TOR signal transduction pathway: A network-level analysis of selective pressures. Genome Biol Evol 3:87–101. 26. Alvarez-Ponce D, Aguadé M, Rozas J (2009) Network-level molecular evolutionary analysis of the insulin/TOR signal transduction pathway across 12 Drosophila genomes. Genome Res 19(2):234–242. 27. Jovelin R, Phillips PC (2011) Expression level drives the pattern of selective constraints along the insulin/Tor signal transduction pathway in Caenorhabditis. Genome Biol Evol 3:715–722. 28. Wang M, et al. 
(2013) The molecular evolutionary patterns of the Insulin/FOXO signaling pathway. Evol Bioinform Online 9:1–16. 29. Li L, Stoeckert CJ, Jr, Roos DS (2003) OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res 13(9):2178–2189. 30. Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28(1):27–30. 31. Yang Z (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol 24(8):1586–1591. 32. Self SG, Liang K-L (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Stat Assoc 82(398):605–610. 33. Goldman N, Whelan S (2000) Statistical tests of gamma-distributed rate heterogeneity in models of sequence evolution in phylogenetics. Mol Biol Evol 17(6):975–978. 34. Sparkman AM, et al. (2012) Rates of molecular evolution vary in vertebrates for insulin-like growth factor-1 (IGF-1), a pleiotropic locus that regulates life history traits. Gen Comp Endocrinol 178(1):164–173. 35. Wahren J (2004) C-peptide: New findings and therapeutic implications in diabetes. Clin Physiol Funct Imaging 24(4):180–189. 36. Keyhanfar M, Booker GW, Whittaker J, Wallace JC, Forbes BE (2007) Precise mapping of an IGF-I-binding site on the IGF-1R. Biochem J 401(1):269–277. 37. Epa VC, Ward CW (2006) Model for the complex between the insulin-like growth factor I and its receptor: Towards designing antagonists for the IGF-1 receptor. Protein Eng Des Sel 19(8):377–384. 38. Bayne ML, et al. (1989) The C region of human insulin-like growth factor (IGF) I is required for high affinity binding to the type 1 IGF receptor. J Biol Chem 264(19): 11004–11008. 39. Denley A, et al. (2004) Structural determinants for high-affinity binding of insulin-like growth factor II to insulin receptor (IR)-A, the exon 11 minus isoform of the IR. Mol Endocrinol 18(10):2502–2512. 40. 
Zhang W, Gustafson TA, Rutter WJ, Johnson JD (1994) Positively charged side chains in the insulin-like growth factor-1 C- and D-regions determine receptor binding specificity. J Biol Chem 269(14):10609–10613. 41. Yang YW, Robbins AR, Nissley SP, Rechler MM (1991) The chick embryo fibroblast cation-independent mannose 6-phosphate receptor is functional and immunologically related to the mammalian insulin-like growth factor-II (IGF-II)/man 6-P receptor but does not bind IGF-II. Endocrinology 128(2):1177–1189. 42. Koduru S, Yadavalli S, Nadimpalli SK (2006) Mannose 6-phosphate receptor (MPR 300) proteins from goat and chicken bind human IGF-II. Biosci Rep 26(2):101–112. 43. Brown J, Jones EY, Forbes BE (2009) Interactions of IGF-II with the IGF2R/cationindependent mannose-6-phosphate receptor mechanism and biological outcomes. Vitam Horm 80:699–719. 44. Williams C, et al. (2012) An exon splice enhancer primes IGF2:IGF2R binding site structure and function evolution. Science 338(6111):1209–1213. 45. Brown J, et al. (2008) Structure and functional analysis of the IGF-II/IGF2R interaction. EMBO J 27(1):265–276. 46. Fares MA, McNally D (2006) CAPS: Coevolution analysis using protein sequences. Bioinformatics 22(22):2821–2822. 47. Rodionov A, Bezginov A, Rose J, Tillier ER (2011) A new, fast algorithm for detecting protein coevolution using maximum compatible cliques. Algorithms Mol Biol 6(1):17. 48. Duan C, Xu Q (2005) Roles of insulin-like growth factor (IGF) binding proteins in regulating IGF actions. Gen Comp Endocrinol 142(1-2):44–52. 49. Forbes BE, McCarthy P, Norton RS (2012) Insulin-like growth factor binding proteins: A structural perspective. Front Endocrinol (Lausanne) 3:38. 50. Moloney AM, et al. (2010) Defects in IGF-1 receptor, insulin receptor and IRS-1/2 in Alzheimer’s disease indicate possible resistance to IGF-1 and insulin signalling. Neurobiol Aging 31(2):224–243. 51. Han M, et al. 
(2013) Evolutionary rate patterns of genes involved in the Drosophila Toll and Imd signaling pathway. BMC Evol Biol 13(1):245. 52. Song X, Jin P, Qin S, Chen L, Ma F (2012) The evolution and origin of animal Toll-like receptor signaling pathway revealed by network-level molecular evolutionary analyses. PLoS ONE 7(12):e51657. 53. Cui Q, Purisima EO, Wang E (2009) Protein evolution on a human signaling network. BMC Syst Biol 3(1):21. 7060 | www.pnas.org/cgi/doi/10.1073/pnas.1419659112 54. Montanucci L, Laayouni H, Dall’Olio GM, Bertranpetit J (2011) Molecular evolution and network-level analysis of the N-glycosylation metabolic pathway across primates. Mol Biol Evol 28(1):813–823. 55. Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW (2002) Evolutionary rate in the protein interaction network. Science 296(5568):750–752. 56. Fraser HB, Wall DP, Hirsh AE (2003) A simple dependence between protein evolution rate and the number of protein-protein interactions. BMC Evol Biol 3(1):11. 57. Bloom JD, Adami C (2004) Evolutionary rate depends on number of protein-protein interactions independently of gene expression level: Response. BMC Evol Biol 4(1):14. 58. Larracuente AM, et al. (2008) Evolution of protein-coding genes in Drosophila. Trends Genet 24(3):114–123. 59. Suh Y, et al. (2008) Functionally significant insulin-like growth factor I receptor mutations in centenarians. Proc Natl Acad Sci USA 105(9):3438–3442. 60. Hawkes C, Kar S (2004) The insulin-like growth factor-II/mannose-6-phosphate receptor: Structure, distribution and function in the central nervous system. Brain Res Brain Res Rev 44(2-3):117–140. 61. Russo VC, Gluckman PD, Feldman EL, Werther GA (2005) The insulin-like growth factor system and its pleiotropic functions in brain. Endocr Rev 26(7):916–943. 62. Yuan X-N, Jiang X-Y, Pu J-W, Li Z-R, Zou S-M (2011) Functional conservation and divergence of duplicated insulin-like growth factor 2 genes in grass carp (Ctenopharyngodon idellus). 
Gene 470(1-2):46–52. 63. Brown AL, et al. (1986) Developmental regulation of insulin-like growth factor II mRNA in different rat tissues. J Biol Chem 261(28):13144–13150. 64. Killian JK, et al. (2001) Monotreme IGF2 expression and ancestral origin of genomic imprinting. J Exp Zool 291(2):205–212. 65. Méndez E, Planas JV, Castillo J, Navarro I, Gutiérrez J (2001) Identification of a type II insulin-like growth factor receptor in fish embryos. Endocrinology 142(3):1090–1097. 66. Sivaramakrishna Y, Amancha PK, Siva Kumar N (2009) Reptilian MPR 300 is also the IGF-IIR: Cloning, sequencing and functional characterization of the IGF-II binding domain. Int J Biol Macromol 44(5):435–440. 67. Zhou M, Ma Z, Sly WS (1995) Cloning and expression of the cDNA of chicken cation-independent mannose-6-phosphate receptor. Proc Natl Acad Sci USA 92(21):9762–9766. 68. Garmroudi F, Devi G, Slentz DH, Schaffer BS, MacDonald RG (1996) Truncated forms of the insulin-like growth factor II (IGF-II)/mannose 6-phosphate receptor encompassing the IGF-II binding site: Characterization of a point mutation that abolishes IGF-II binding. Mol Endocrinol 10(6):642–651. 69. Gadd TS, Osgerby JC, Wathes DC (2002) Regulation of insulin-like growth factor binding protein-6 expression in the reproductive tract throughout the estrous cycle and during the development of the placenta in the ewe. Biol Reprod 67(6):1756–1762. 70. Murphy BF, Thompson MB (2011) A review of the evolution of viviparity in squamate reptiles: The past, present and future role of molecular biology and genomics. J Comp Physiol B 181(5):575–594. 71. Castoe TA, et al. (2013) The Burmese python genome reveals the molecular basis for extreme adaptation in snakes. Proc Natl Acad Sci USA 110(51):20645–20650. 72. Lowe CB, Clarke JA, Baker AJ, Haussler D, Edwards SV (2015) Feather development genes and associated regulatory innovation predate the origin of Dinosauria. Mol Biol Evol 32(1):23–28. 73. 
Vijay N, Poelstra JW, Künstner A, Wolf JBW (2013) Challenges and strategies in transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments. Mol Ecol 22(3):620–634. 74. Austad SN (2010) Cats, “rats,” and bats: The comparative biology of aging in the 21st century. Integr Comp Biol 50(5):783–792. 75. Alberts SC, et al. (2013) Reproductive aging patterns in primates reveal that humans are distinct. Proc Natl Acad Sci USA 110(33):13440–13445. 76. Yamamoto R, Tatar M (2011) Insulin receptor substrate chico acts with the transcription factor FOXO to extend Drosophila lifespan. Aging Cell 10(4):729–732. 77. Bartke A (2008) Impact of reduced insulin-like growth factor-1/insulin signaling on aging in mammals: Novel findings. Aging Cell 7(3):285–290. 78. Tacutu R, et al. (2013) Human Ageing Genomic Resources: Integrated databases and tools for the biology and genetics of ageing. Nucleic Acids Res 41(Database issue, D1): D1027–D1033. 79. Li Y, de Magalhães JP (2013) Accelerated protein evolution analysis reveals genes and pathways associated with the evolution of mammalian longevity. Age (Dordr) 35(2): 301–314. 80. Soerensen M, et al. (2012) Human longevity and variation in GH/IGF-1/insulin signaling, DNA damage signaling and repair and pro/antioxidant pathway genes: Cross sectional and longitudinal studies. Exp Gerontol 47(5):379–387. 81. Ziv E, Hu D (2011) Genetic variation in insulin/IGF-1 signaling pathways and longevity. Ageing Res Rev 10(2):201–204. 82. de Magalhães JP (2014) Why genes extending lifespan in model organisms have not been consistently associated with human longevity and what it means to translation research. Cell Cycle 13(17):2671–2673. 83. Miller DAW, Janzen FJ, Fellers GM, Kleeman PM, Bronikowski A (2014) Biodemography of ectothermic tetrapods provides insights into the evolution and plasticity of mortality trajectories. 
Sociality, Hierarchy, Health: Comparative Demography Advances in Biodemography, eds Weinstein M, Lane MA (The National Academies Press, Washington, DC). 84. Liu Y, Schmidt B, Maskell DL (2010) MSAProbs: Multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities. Bioinformatics 26(16):1958–1964. 85. Wernersson R, Pedersen AG (2003) RevTrans: Multiple alignment of coding DNA from aligned amino acid sequences. Nucleic Acids Res 31(13):3537–3539. 86. Castresana J (2000) Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17(4):540–552.
Supporting Information McGaugh et al. 10.1073/pnas.1419659112
SI Text
Summary of New Resources Available. For the 18 liver transcriptomes we generated, the raw reads can be found at the NCBI Sequence Read Archive (SRA062458 at www.ncbi.nlm.nih.gov/sra/?term=SRA062458 and SRP017466 at www.ncbi.nlm.nih.gov/sra/?term=SRP017466). Transcriptome assemblies, annotation summaries, and alignments for protein coevolution analyses are available through Dryad (dx.doi.org/10.5061/dryad.vn872). Individual identifiers for these data can be found under citation in Table S2. The Dryad deposit includes the following: i) The transcriptome assembly for each of the 18 individuals sequenced. These assemblies contain the longest ORFs produced by Trinity, which were then clustered by UCLUST into centroids to reduce redundancy within a single species’ transcriptome. A centroid may have collapsed multiple isoforms, truncated transcripts, and alleles from a gene, but it may also have collapsed very recent paralogs. ii) Trinotate annotation databases for each individual. The IDs in the database correspond to the centroid IDs in the transcriptome assembly described above. iii) Putative ortholog amino acid alignments and corresponding nucleotide alignments. 
We used OrthoMCL to cluster ORF centroids into putative orthologs from all of the species included in this study. Data are available as separate files for each ortholog (104,235 total orthologs with two or more species). Additionally, we included a spreadsheet showing the best BLAST hit of each putative ortholog cluster to the uniprot database. iv) “Best” ortholog amino acid and nucleotide alignments. The 104,235 putative orthologs described above often contained more than two representative sequences per species. For the first 15,000 putative orthologs (those with the most species included in the alignments), we used UCLUST to find the best representative per species per ortholog by taking the sequence that was closest to the centroid for that ortholog. v) The final nucleotide and amino acid alignments for the 1417 “control genes.” vi) The hand-curated nucleotide and amino acid alignments for 61 IIS/TOR network genes. SI Materials and Methods Sample Collection. Animals or tissues used in this study were provided by colleagues or our research colonies. Each individual was maintained or shipped to Iowa State University (ISU). In agreement with ISU Institutional Animal Care and Use Committee protocol 3-2-5125J, animals were euthanized by decapitation, exsanguinated, and dissected with relevant organs snap frozen. The exceptions were the cottonmouth and alligator (Agkistrodon piscivorus and Alligator mississippiensis), which were euthanized onsite in Texas and California, respectively, following our established protocol; snap-frozen tissues were sent to ISU. The animals used were of a variety of ages and both sexes, thus findings reported here are robust to variation in transcripts that depend on age, sex, and rearing condition (Table S2). Tissue and RNA Extraction and Sequencing. 
Total RNA was isolated from 12 to 19 mg of snap-frozen liver from each of 18 individuals from 17 species: a single individual for each of 16 species and two individuals of different ecotypes for Thamnophis elegans (Table S2 and Fig. S2). We followed standard protocols, including the Qiagen RNeasy kit (Qiagen cat. no. 74104) with a DNA digestion on the membrane, as described in the manual. The quality and quantity of RNA were determined on an Agilent Bioanalyzer using a NanoRNA chip. For each sample, 1 μg total RNA was sent to the Duke Genome Sequencing and Analysis Core Resource for library preparation and to generate 100-bp paired-end reads using an Illumina HiSeq 2000 with TruSeq v3 chemistry with a standard insert size distribution. The library preparation protocol was based on the technical document TruSeq_RNA_SamplePrep_Guide_15008136_A. Individual libraries were uniquely barcoded (indexed), and quality was checked on the Bioanalyzer DNA100 chip. For 15 non–garter snake species, five indexed libraries were pooled in each lane, and ∼8 pM of library pool was deposited on each lane. Because garter snakes (Thamnophis spp.) are focal species in our laboratory, the two Thamnophis species (three samples) were sequenced more deeply. The Thamnophis couchii indexed library was pooled with separately indexed libraries from two individual Thamnophis elegans of different ecotypes (1) (meadow and lakeshore in Table S2). This Thamnophis pool (one T. couchii and two T. elegans individuals) was sequenced twice, resulting in larger amounts of data available overall for these two species. None of the libraries were normalized. The raw reads for the 15 species excluding Thamnophis species can be found at the SRA under SRA062458. The raw reads for the three garter snake liver transcriptomes (i.e., one from T. couchii and two from T. elegans) can be found at the SRA under SRP017466 (samples HS08, HS11, and TC).
McGaugh et al. www.pnas.org/cgi/content/short/1419659112
Processing and de Novo Assembly of Reads. For de novo assembly of each species’ transcriptome, we used the Trinity version released on February 25, 2013 (2). Before assembly, the original reads were processed with the FASTX-Toolkit (hannonlab.cshl.edu/fastx_toolkit/), Cutadapt (3), and Trimmomatic (4) as follows. i) Fastx_trimmer was used to remove the first base, as Illumina personnel indicate that this base can be unreliable (Gary Schroth). ii) Cutadapt was used to trim adapters from the 3′ ends of reads with an allowed error rate of 0.01. iii) Trimmomatic was used to remove reads with sliding windows of 6 bp that had average quality scores of 30 or less, and then reads less than 30 bp in length were removed. From this point, reads that were orphaned (only the left or the right remained after processing) were removed from the left and right read files. These reads were placed at the end of the left read files, as specified in the Trinity manual. All default settings were kept for transcriptome assembly.
Transcriptome Quality Assessment and Annotation. We sequenced 33.73–140.95 million reads per species (mean: 50.23; median: 42.10). Reads were assembled into 87,016–221,818 contigs using Trinity (mean: 155,855; median: 165,685). Contigs shorter than 200 bp were excluded (5). Table S2 contains statistics about the Trinity assemblies. To evaluate the quality of a transcriptome assembly, we aligned the assembled Trinity transcripts to the proteins of the UniProtKB/Swiss-Prot database downloaded on March 21, 2013 using blastx with an E-value cutoff of 1e-20 and allowing only a single target sequence to be reported. Next, we determined the percent of the UniProtKB/Swiss-Prot protein that aligned to the best matching Trinity transcript through the perl script analyze_blastPlus_topHit_coverage.pl provided through Trinity. Likely coding regions (ORFs) were extracted from Trinity transcripts using Transdecoder. 
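The read-filtering logic described above (drop the first base, sliding-window quality check, minimum length) can be illustrated with a minimal Python sketch. This is not the actual FASTX-Toolkit or Trimmomatic code, and the exact command lines are not given in the text. The function names are hypothetical, and the sketch assumes that a read is truncated at the first 6-bp window whose mean quality is 30 or less, which is how Trimmomatic's sliding-window step is usually described.

```python
def trim_first_base(seq, quals):
    """Drop the first base and its quality score (step i in the text)."""
    return seq[1:], quals[1:]


def sliding_window_trim(seq, quals, window=6, min_avg=30):
    """Cut the read at the start of the first window whose mean quality
    is <= min_avg (assumed truncation behaviour; see lead-in)."""
    for start in range(0, len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window <= min_avg:
            return seq[:start], quals[:start]
    return seq, quals


def process_read(seq, quals, min_len=30):
    """Apply both trimming steps, then discard reads shorter than min_len.
    Returns (seq, quals), or None if the read is discarded."""
    seq, quals = trim_first_base(seq, quals)
    seq, quals = sliding_window_trim(seq, quals)
    if len(seq) < min_len:
        return None
    return seq, quals


if __name__ == "__main__":
    good = process_read("A" * 40, [40] * 40)
    print(len(good[0]))                                  # high-quality read survives
    print(process_read("A" * 40, [40] * 20 + [10] * 20))  # early quality drop: discarded
```

Adapter trimming (step ii) and the re-pairing of orphaned reads are left out of this sketch on purpose, since they depend on adapter sequences and file layouts that the text does not specify.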
Transdecoder identified between 25,945 and 113,672 best ORFs (mean: 65,766; median: 72,152). Transcriptome size of the best ORFs identified in Transdecoder ranged from 27.80 to 113.60 Mb (mean: 69.54 Mb; median: 78.65 Mb), indicating ∼57- to 269-fold coverage when considering the amount of filtered and trimmed data input into Trinity (range, 5.21–11.55 Gb; mean: 6.80 Gb; median: 6.43 Gb). These ORFs were clustered into centroids using USEARCH (6) separately for each transcriptome (see below for a more detailed description). The coding sequence of the peptides produced by Transdecoder and the centroids were also analyzed with the analyze_blastPlus_topHit_coverage.pl script provided by Trinity to determine the percent length of coverage for the top hit in the UniProtKB/Swiss-Prot database. We conducted this analysis on the best ORF sequences and separately on the centroids to examine whether the Transdecoder or USEARCH processes resulted in ORFs that spanned a greater percent length of their best blast hit relative to the originally produced Trinity transcript contigs. Blastx analysis of the original Trinity transcripts to the UniProtKB/Swiss-Prot database resulted in an average of 54.10% (SD: 5.82%; median: 55.19%) of transcripts that matched a hit in the UniProtKB/Swiss-Prot database, covering at least 80% of the length of their best blast hit. This number increased slightly when the best ORF transcriptomes provided by Transdecoder (average: 56.30%; SD: 5.50%; median: 56.64%) or the USEARCH centroids (average: 58.00%; SD: 5.74%; median: 58.41%) were used in the Blastx analysis. Last, because the Anolis carolinensis genome is published, we examined the percent length of transcripts from the best ORF analysis from Anolis sagrei, which aligned to the Anolis carolinensis genome, using BLAT (7) (an alignment tool similar to BLAST) to provide a complementary measure of how many full-length transcripts were assembled. 
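As a worked illustration of the two summary statistics used in this section, the Python sketch below computes fold coverage as input bases divided by transcriptome size, and the fraction of transcripts that cover at least 80% of their best hit. The function names are hypothetical. The example call uses the mean values quoted in the text (6.80 Gb of input, 69.54 Mb of best ORFs), which gives roughly 98-fold coverage; this ratio of means is only indicative and is not expected to reproduce the per-species range reported above.

```python
def fold_coverage(input_bases, transcriptome_bases):
    """Fold coverage = sequenced bases fed to the assembler / assembled size."""
    return input_bases / transcriptome_bases


def fraction_at_least(percent_lengths, threshold=80.0):
    """Fraction of transcripts whose alignment spans >= threshold percent
    of the best database hit."""
    if not percent_lengths:
        return 0.0
    passing = sum(1 for p in percent_lengths if p >= threshold)
    return passing / len(percent_lengths)


if __name__ == "__main__":
    # Mean input (6.80 Gb) over mean best-ORF transcriptome size (69.54 Mb).
    print(round(fold_coverage(6.80e9, 69.54e6), 1))
    # Hypothetical percent-length values for four transcripts.
    print(fraction_at_least([90.0, 85.0, 40.0, 79.9]))
```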
We did not do this for Alligator because this genome is less complete and low-length measures can be a reflection solely of a fragmented genome assembly. We aligned Anolis sagrei Trinity-assembled Transdecoder-filtered RNAseq data to the Anolis carolinensis v2.0 genome scaffolds. From this, we found that 67% of transcripts aligned over at least 95% of their length with at least 80% identity, suggesting that ∼67% of our transcripts represent nearly full-length transcripts. Interestingly, 89.5% of transcripts aligned over at least 25% of their length, and only 51.3% of transcripts aligned over 99% of their length, indicating that, although many of our transcripts are present in the Anolis carolinensis genome, our assembly of RNAseq data did not capture all full-length transcripts. These percentages were comparable for the centroids (65.4%, 88.8%, and 49.3%, respectively). The peptides from Transdecoder and centroids created in USEARCH were annotated with the Trinotate pipeline, which incorporates homology searches, protein domain identification, protein signal prediction, and evaluation with the EMBL UniProt, eggNOG, and GO Pathways databases. Specifically, we used Trinotate to run blastp to find the top hit in the UniProtKB/Swiss-Prot database (maximum e-value cutoff 0.001), HMMER to query the PFAM database downloaded on March 29, 2013, signalP to predict the presence and location of signal peptide cleavage sites, and tmHMM to predict transmembrane helices in proteins. The final Trinotate report was made with an e-value cutoff of 0.001 for reporting the best blast hit and additional annotations. On average, 77.62% of the best ORFs had matches in the UniProtKB/Swiss-Prot database (maximum e-value cutoff of 0.001), 61.17% had matches in the PFAM database, 5.73% had matches in signalP, and 12.38% had matches in tmHMM. On average, 18.3% of centroids were left with no annotation from any procedures performed (range, 10.43–26.51%). All Trinotate annotation databases are publicly available on Dryad: dx.doi.org/10.5061/dryad.vn872.
Identifying Candidate Orthologs and Generating Multiple Species Alignments. For any comparative evolutionary analysis, identification of putative orthologs and accurate alignment are essential but can be extremely challenging due to paralogs and alternative splicing. In addition, we found that in some cases, a particular species may have Trinity transcripts that blasted with high confidence to the particular gene of interest, but this species was unrepresented in our final multiple species alignments because Transdecoder did not include the transcript from that particular gene in its best ORF candidate file. To avoid this complication, we only used ORFs from the longest ORF file and not the best ORF predictions. We reduced overlap between the ORFs for each individual species using USEARCH (6) with an identity threshold of 95% of the nucleotide sequences sorted by length (gaps are counted as differences in USEARCH). Because our goal was to cluster isoforms to have one representative sequence per gene, we reduced the gap penalties to the settings -gapopen 5I/1E -gapext 0.1I/0.1E. These clustered centroids were used for all subsequent analyses. For these clustered ORFs for each species (centroids from USEARCH), we identified putative 1:1 orthologs across species using OrthoMCL (8), a program that is based on reciprocal best blast hits. We analyzed a dataset that contained 74 total samples: the 18 samples from our transcriptome project and 56 additional transcriptomes and gene sets available from genome projects and other past studies (Table S2). These literature-derived transcriptomes were made with various technologies and sometimes pools of individuals. We used the transcriptome assemblies provided by the authors in all cases. 
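The reciprocal-best-hit idea mentioned above can be sketched in a few lines of Python. This is only an illustration of the underlying principle for two species; OrthoMCL itself goes further by applying Markov clustering to a similarity graph across all species. The input format (a dictionary of bit scores keyed by query and subject IDs) and the sequence IDs are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.
    scores: dict of (query_id, subject_id) -> bit score."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Hypothetical BLAST bit scores between centroids of two species.
    a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90}
    b_vs_a = {("b1", "a1"): 198, ("b2", "a1"): 140, ("b2", "a2"): 95}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In the toy data, a2's best hit is b2, but b2's best hit is a1, so only the a1/b1 pair is reciprocal. This is the kind of one-sided match that paralogs and isoforms tend to produce.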
Transdecoder and USEARCH were run on literature-derived transcriptomes and RNA sets downloaded from NCBI. Ensembl protein sets, and associated cDNAs were downloaded from the Ensembl website and used without additional processing steps. Species from Ensembl, where the protein or gene datasets contained large contiguous stretches of unknown bases, were not included in our analysis. All amino acid and corresponding nucleotide clusters are available as separate files (104,235 total orthologs with two or more species) on Dryad along with a spreadsheet showing the best blast hit of each ortholog cluster to the uniprot database. In total, we started with 74 species, but pared this to 66 species for the alignments because the additional eight species were not well represented. These eight species (as named in the alignments: Python, Quail, Phrynops, Tuatara, Caiman, Caretta, Elaphe, and Emys) generally had lower quality or quantity of reads mined from previous studies, and all 74 species are represented in the original alignment data available through Dryad: dx.doi.org/10.5061/dryad.vn872. We focused our analysis on 61 genes in the IIS/TOR network. The final set of genes (Fig. S1 and Tables S1 and S3) was determined by presence in KEGG pathways for Human Insulin Signaling (KEGG 04910) and Human mTOR (KEGG 04150) (9, 10), connections with Panther Pathways for MAP kinase cascade and insulin/IGF pathway-protein kinase B signaling cascade, and/or previous publications (11). We specifically wanted to include the extracellular hormones, receptors, and binding proteins in the insulin signaling network, which had not previously been included. To identify this focal set of genes in our OrthoMCL orthologs, we performed two searches using Blastp. We made a reference gene set from the KEGG proteins from chicken or anole. 
This reference gene set was used as a blast database, and Blastp was used to find hits of our translated orthologs to the KEGG-derived protein blast database with an e-value cutoff of 1e-5. We also required a percent identity of at least 50% and at least 60% of our ortholog to align to the KEGG protein. Second, we conducted a Blastp search using uniprot as the blast database. We used Blastp to identify the best hit in the uniprot blast database for each of our OrthoMCL-defined orthologs. For genes to be included in our subsequent analyses, we used only those OrthoMCL-defined orthologs for which both the criteria for the KEGG protein blast were met and the description/name of the best blast hit from the uniprot blast output matched the name of the focal KEGG protein. For the genes of interest, many of the OrthoMCL-defined orthologs contained multiple sequences from each species. Our goal was to generate alignments with one sequence per gene per species. We reduced redundancy in each OrthoMCL-defined ortholog using USEARCH as above. For each species, we used only the sequence that was most like the centroid of the USEARCH-clustered OrthoMCL-defined ortholog. In a few cases, reptiles and mammals formed separate clusters. All genes were clustered with identical parameters in USEARCH; however, the few genes that exhibited taxon-specific clusters may be particularly fast-evolving genes. For example, IGF2, PPP1R3D, MKNK1, and SOCS1 had mammal-specific and reptile-specific clusters. In some cases, we were able to combine these genes that appeared in separate clusters into one single multiple sequence alignment (e.g., IGF1R). For IRS4, marsupials and reptiles were clustered separately by USEARCH, and placental mammals were grouped in a separate ortholog by OrthoMCL. We did not combine these clusters for further analyses because the sequences were too divergent to create robust alignments.
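The two-part filter used above to assign OrthoMCL-defined orthologs to focal genes (KEGG Blastp thresholds plus a matching uniprot best-hit name) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the `BlastHit` record, its field names, and the function names are hypothetical, and query coverage is assumed to be alignment length divided by query length.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One Blastp hit of an ortholog against the KEGG reference set (hypothetical record)."""
    query_id: str            # identifier of the OrthoMCL-defined ortholog
    subject_name: str        # gene name of the KEGG reference protein
    evalue: float
    percent_identity: float  # 0-100
    alignment_length: int    # aligned residues of the query
    query_length: int        # total residues of the query


def passes_kegg_criteria(hit, max_evalue=1e-5, min_identity=50.0, min_query_cov=0.60):
    """Apply the three thresholds stated in the text."""
    coverage = hit.alignment_length / hit.query_length
    return (
        hit.evalue <= max_evalue
        and hit.percent_identity >= min_identity
        and coverage >= min_query_cov
    )


def assign_focal_genes(kegg_hits, uniprot_best_names):
    """Keep orthologs that pass the KEGG filter AND whose best uniprot hit
    carries the same gene name as the KEGG protein.

    uniprot_best_names maps query_id -> gene name of the best uniprot hit.
    """
    assigned = {}
    for hit in kegg_hits:
        if not passes_kegg_criteria(hit):
            continue
        best_name = uniprot_best_names.get(hit.query_id)
        if best_name is not None and best_name.upper() == hit.subject_name.upper():
            assigned[hit.query_id] = hit.subject_name
    return assigned


# Toy input with made-up identifiers and values.
hits = [
    BlastHit("cluster_1", "IGF1R", 1e-50, 82.0, 900, 1000),  # passes both steps
    BlastHit("cluster_2", "INSR", 1e-50, 45.0, 900, 1000),   # identity too low
    BlastHit("cluster_3", "IRS1", 1e-30, 70.0, 500, 1000),   # coverage too low
    BlastHit("cluster_4", "IGF2", 1e-20, 65.0, 150, 200),    # name mismatch below
]
names = {"cluster_1": "IGF1R", "cluster_2": "INSR", "cluster_3": "IRS1", "cluster_4": "INS"}
focal = assign_focal_genes(hits, names)
```

Requiring agreement between the two searches trades sensitivity for specificity, which fits the stated goal of excluding paralogs from the focal alignments.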
IRS4 has been identified as being under positive selection in other studies (12, 13), indicating that the alternative explanation for high divergence [i.e., that mutations in IRS4 function may be tolerated with only moderate phenotypic consequences (14)] may have weaker support. IRS4 is located on the X chromosome in mammals and chromosome 4 in chicken, and therefore it may be subjected to different selection pressures in placental mammals vs. reptiles (which include birds) due to its different location in the genome (X-linked genes in mammals have three-fourths the effective population size of autosomal genes). As with the other IRSs, IRS4 interacts with the intracellular domain of the insulin receptor and IGF1R (15–17). IRS4 functions in the cytoplasm in cell cycle progression and growth (18). It is also linked with decreased litter size, reduced growth and glucose homeostasis (14), and reduced maternal nurturing and canonical maternal behaviors in mice (e.g., aggression against intruders and extended latency in retrieving wayward pups) (14, 19). Given the high divergence of IRS4 in reptiles and mammals, it would be interesting to pursue whether IRS4 serves a particularly important role in physiological differences between reptiles and mammals. For each putative ortholog clustered by USEARCH, we created multiple species alignments of the amino acid sequences using MSAProbs (20), which is more accurate than many other common aligners (21, 22). RevTrans (23) and the original nucleotide sequence for the centroid were used to generate nucleotide alignments from amino acid alignments.
The command line version of TranslatorX (24) was used in conjunction with the MSAProbs alignments to produce Gblocks-cleaned amino acid and nucleotide alignments (25, 26) with the commands “-c 1 -t T -g -b4=2 -b5=a -b3=10 -b2=34 -t=p -p=s.” Because the nucleotide sequences were predicted ORFs from Trinity, we did not expect translation of the nucleotides to produce within-species frameshifts or stop codons; thus, we did not use a more sophisticated program such as MACSE (27). For additional quality control of the test gene alignments, we visually inspected the alignments to ensure they were correctly aligned. Typically, editing included fixing aligned gaps and truncated sequences with obviously different start or stop codons causing small chunks at the beginning and end of an alignment for one or several species to be substantially different from all others. We made every effort to be as conservative as possible. In addition, we ensured that no paralogs were present in the alignments by blasting (with Blastp) each sequence in each alignment to the uniprot database and confirming that, for a single alignment, all sequences had a best blast hit with a gene name identical to that expected for that gene. These measures were not performed for the control genes due to the enormity of manual correction for so many alignments. This approach makes comparisons between focal genes and control genes more conservative, as poorer quality alignments for control genes would artificially inflate how much positive selection is found in the control genes (28). We also note that Gblocks is thought not to perform well, especially with indels (29), and therefore for a subset of genes (n = 70), we also used PRANK and GUIDANCE (30).
We found that the nucleotide alignments contained on average 64.1% gaps (minimum = 17.4%, maximum = 95.3%) when generated by PRANK and GUIDANCE and 12.7% gaps (minimum = 0.2%, maximum = 31.5%) when processed with MSAProbs and Gblocks. For this reason, we favored the alignments generated with MSAProbs and Gblocks and used this method for all other alignments. The final focal gene alignments are available through Dryad: dx.doi.org/10.5061/dryad.vn872.

Classification of Connectivity. Because a gene’s position and extent of connections with other genes in a network influences the impact that mutations might have on the target phenotype (31, 32), we were interested in investigating whether more highly connected genes [defined as the number of other genes or proteins to which a gene is directly connected (33)] have a different evolutionary rate than peripheral genes with few connections. To estimate the level of connectivity for each gene in the IIS/TOR network, we used NetworkAnalyzer (34) within Cytoscape v3.1.0 (35) to calculate the connectivity of all nodes in the BioGrid human reactome 3.2.95 (36) (including protein-protein and protein-gene interactions). We focus on the measures of node degree (i.e., connectivity) and betweenness centrality (34). Node degree (i.e., connectivity) is the number of edges or interactions that a gene has with other genes or proteins. Betweenness centrality ranges from 0 to 1 and reflects the amount of influence a node exerts on the interactions of the other nodes (37).

Molecular Evolutionary Analyses. For many of the analyses of molecular evolution, we required a tree that best represented the species tree for the 66 taxa included in our analyses. Because no single study exists with the tree for all of these species, we combined results from refs. 38 to 45 to generate a tree topology without branch lengths. Newick Utilities (46) was used to prune trees that contained fewer than the total 66 species.

Control Genes.
We identified 1,417 putative orthologs that contained all 66 species and referred to these as control genes. The control genes may be biased toward being conserved, as it is conceivable that conserved genes are more likely to be recovered for all 66 species. Our dataset of 61 focal IIS/TOR genes generally contained most of the 66 species. In this focal gene set, 20% of the genes contained all 66 species and 62% of our focal genes contained 60 or more species (median = 62; mode = 66; mean = 58.4). The missing species in our 61 focal genes were mostly from the species for which we only had liver transcriptomes, and these species could potentially be missing in the alignments because the missing genes were not expressed in the liver and not because they were too divergent to be included. Therefore, we conducted two supplemental analyses using a reduced number of genes to test how sensitive our conclusions were to the specific control genes in our study.

Supplemental analysis I. First, we conducted an additional analysis that limited our focal gene dataset to the 48 genes containing between 56 and 66 species (mean: 62.6 species; median: 63.5; mode: 66 species). Although this is not a perfect comparison with the controls, this 48 focal gene set has a species-number distribution very similar to that of the control gene dataset. This analysis was consistent with the findings of the original 61 focal gene set; therefore, we present the 61 focal gene set in the main text. Briefly, results from our analyses of this reduced 48-gene IIS/TOR dataset include the following:

i) Extracellular genes of the IIS/TOR network exhibited greater divergence between mammals and reptiles than 1,417 control genes and intracellular genes. Extracellular genes had equivalent Ks compared with control genes (Wilcoxon rank sum test, W = 3818, P = 0.111), but had notably greater median ω (W = 2847, P = 0.015) and Ka (W = 2162.5, P < 0.003).
Compared with intracellular genes, extracellular genes also had significantly higher ω (W = 243, P = 0.022) and Ka (W = 266, P = 0.004), but Ks did not differ (W = 199, P = 0.287).

ii) Collectively, the intracellular IIS/TOR genes within the 48-gene set did not have elevated median Ka, Ks, or ω compared with control genes (P > 0.287 in all cases). For ω and Ka, the medians for intracellular and control genes were very similar. Specifically, the median Ks for extracellular genes was 1.91; the median Ks was 1.61 for intracellular genes and 1.51 for control genes. The median Ka for extracellular genes was 0.16; the median Ka was 0.087 for intracellular genes and 0.083 for control genes. Finally, the median ω for extracellular genes was 0.10; the median ω was 0.051 for intracellular genes and 0.054 for control genes.

iii) When comparing the distribution of ω values for extracellular vs. intracellular IIS/TOR genes in the 48 focal gene set to the distribution of the ω values for the control genes, the extracellular genes were 6.5 times more likely than control genes to reside in the highest 5% of ω values (OR, 6.51; 95% CI, 1.29, 32.86). The intracellular genes were not more likely than controls to be in the top 5% (OR, 1.00; 95% CI, 0.236, 4.219). These odds ratios imply that the extracellular group contains the fastest evolving components of the IIS/TOR network.

These three conclusions are in agreement with the 61 focal gene set analyses, which include some genes with fewer species, presented in the main text.

Supplemental analysis II. In addition to the 48 IIS/TOR focal gene analysis detailed above, we conducted a second analysis to address a different potential issue with the control genes. Specifically, to assess how potentially conserved the original 1,417 control genes with 66 species were, we identified additional control genes that contained the same phylogenetically matched species sets as our 61 IIS/TOR focal genes.
In many cases, we only had a single phylogenetically matched control gene for any given IIS/TOR gene. We constructed a focal data set of 43 IIS/TOR genes (31 focal IIS/TOR genes with phylogenetically matched controls + 12 focal IIS/TOR genes that contained all 66 species) and compared Ka/Ks between the 43 focal genes and the 43 phylogenetically matched control genes. These 43 pairs of matched focal and control genes contained between 34 and 66 species (mean: 61 species; median: 64 species; mode: 66 species). When there was more than one phylogenetically matched control gene for a particular focal gene, we used a random number generator and took the control gene with the largest random number. Although we would have liked to phylogenetically match all original IIS/TOR focal genes with fewer than the total 66 species to a control gene, or even multiple control genes, we did not have phylogenetically matched controls in all cases. For these 43 pairs of matched focal and control genes, the Ka/Ks and Ka values are somewhat elevated in the phylogenetically matched control gene set relative to the full set of 1,417 control genes (Ka/Ks phylo-match 43 control genes: 0.076; original 1,417 control genes: 0.054; Ka phylo-match 43 control genes: 0.115; original 1,417 control genes: 0.083). However, extracellular genes (n = 7 that had phylogenetically matched controls) still had significantly larger Ka values and marginally nonsignificant Ka/Ks even compared with the phylogenetically matched control gene set. One-tailed Wilcoxon rank sum tests indicated that trends were identical in the 43 matched control-focal comparisons relative to the other two gene sets we analyzed.
Median Ka values were significantly different between the seven extracellular genes and their phylogenetically matched control genes (W = 83.5, P = 0.032), Ka/Ks values were marginally nonsignificant between extracellular and control genes (W = 100.5, P = 0.083), and Ks values remain nonsignificant between extracellular genes and controls (W = 120.5, P = 0.204). We suspect that the Ka/Ks Wilcoxon test is not significant in this reduced gene analysis due to a lack of power. In addition, many of the extracellular genes that were consistently found to be under positive selection within PAML (IGF1, IGF1R, IGF2, IRS1, and IRS2) were not included in this reduced analysis because no appropriate phylogenetically matched control genes were available. Altogether, these two supplemental analyses that considered different means of designating control genes (i.e., the 48 focal genes that better matched the number of species in the 1,417 control genes and the 43 paired phylogenetically matched control-focal genes) are in agreement with our results reported in the main text for the 61 focal IIS/TOR genes and the corresponding 1,417 control genes.

Testing Whether the IIS/TOR Network Contains Fast-Evolving Outliers. To test for differences in evolutionary rate between mammals and reptiles for each of our focal genes, we used the clade model C, with M2a_rel as the null hypothesis (47). Clade models are less prone to false positives than branch-site models and better account for among-site variation in selective constraint (47). Importantly, the clade model C tests whether there is evidence for differential ω between the test clade and the remainder of the tree, and we did not use the results from the clade model as support for positive selection. For those test genes that were significant via the clade model, we compared the ω values (i.e., Ka/Ks) for each clade via paired Wilcoxon test and χ2 tests.
To calculate evolutionary parameters ω, Ka, and Ks, we processed the Gblocks nucleotide alignments in PAML. Because we were specifically interested in molecular evolution between mammals and reptiles, for all IIS/TOR genes and control genes, we calculated the pairwise mammal and reptile divergence (every reptile-mammal comparison) from the 2NG.dN and 2NG.dS output files from PAML, which always output the same values regardless of the model because they are calculated with the Nei and Gojobori method (48). These results were very similar to those of a confirmatory analysis conducted using the analysis package from libsequence (49). Using a Wilcoxon rank sum test on the median ω, Ka, and Ks of pairwise comparisons between reptile and mammalian taxa, we tested whether the extracellular IIS/TOR genes or the intracellular IIS/TOR genes exhibited greater divergence between mammals and reptiles than the control genes.

Testing for Positive Selection for the IIS Network Genes. We conducted branch-site tests for positive selection in PAML (50–52), which compare the likelihood of a modified model A (model = 2, NSsites = 2, ω not fixed to 1) with the likelihood of the corresponding null model with ω fixed to 1. Two times the difference in likelihood between the two models conforms to a χ2 distribution, permitting statistical tests. For the likelihood ratio test (LRT), a P value was estimated assuming a null distribution that is a 1:1 mixture of χ2 distributions with 1 and 0 df (53, 54). For negative test statistics from the LRT (meaning that the null model fit the data better than the alternative), typically one would run PAML several times for these particular genes. Because of the computational time required for the number of genes we were testing and because it was unlikely that these genes would have large positive test statistics in subsequent runs, we did not rerun any genes multiple times.

Validation of Procedure Based on IGF1.
Previously, we documented increased divergence of IGF1 in lizards and snakes relative to other reptiles and mammals (55). Those data were generated using single gene Sanger sequencing. In contrast, here we used a next-generation sequencing (NGS) approach, generating transcriptomes from Illumina RNAseq (100-bp paired end), followed by nearly automated multiple sequence alignments. We use IGF1 for comparison between these methods for both sequence quality and for molecular evolutionary analyses. To estimate sequencing error, we compared the pairwise sequence identity of IGF1 for the six species included in both approaches. For each of these pairs, the sequence identity was >99.4%. In each case that was not 100% identical between the two approaches, the difference was due to an ambiguity code in the Sanger sequencing that represented within-species allelic diversity. Thus, we are confident that our NGS approach produced highly accurate sequence data for analysis. Furthermore, our NGS approach added an additional 200 bp of sequence to the IGF1 alignment for every species. To validate the molecular evolution analyses, we compared the sites that were identified to be under positive selection in our previous IGF1 analysis (55) to our current NGS approach [both approaches using the branch-site model in PAML (50–52), with the branch leading to Squamata (snakes and lizards) as the foreground branch]. Every positively selected site identified in ref. 55 had as strong or stronger support for being under positive selection in our current analyses. Overall, our NGS methods appear to improve on traditional methods.

Mapping Positively Selected Sites onto Protein Structures of Hormones and Receptors. To understand how positive selection may affect interactions between IGF hormones and receptors, we mapped the sites with a high probability of being under positive selection from the PAML branch-site analysis onto the predicted protein structures.
Because snakes in particular appear to be highly divergent, we use a snake as a representative reptile for visualizing the predicted protein structures. We used Swiss-Model (56) to thread the snake sequences onto the human protein structures from the PDB: INS, PDB ID code 2KQP.1 (57); IGF1, PDB ID code 1BQT.1 (58); IGF2, PDB ID code 2L29.1 (59); IGF1R, PDB ID code 1IGR.1.A (60); and IGF2R, PDB ID code 2V5O.1 (61). From the PAML branch-site analyses described above, we mapped the BEB posterior probability >0.90 of being under positive selection (branch-site model of positive selection) in mammals or reptiles onto the amino acids in the mature protein structures and the full propeptide alignments. Separately for the reptile and mammal clades, we mapped the sites predicted from both branch-site models: one that specifically tests for selection on the branch leading to the clade of interest set in the foreground (e.g., the branch leading to reptiles) and one that tests for positive selection across the whole clade of interest (e.g., the whole clade of reptiles). We evaluated the clustering of positively selected sites within functional domains of the protein structure, and their relationship to the binding surfaces between the hormones and the receptors, as described by previous literature (Table S4).

Evaluating Variation in the Presence and Length of the IGF Binding Domains of the IGFBPs. The binding proteins consist of two domains: the IGF binding domain on the 5′ end and the thyroglobulin domain on the 3′ end. We noted that the IGFBPs were often truncated to various degrees on the 5′ end, leading to extensive variation among species in the length or presence of the N-terminal binding domain. We realigned the original sequences using ClustalX to specifically evaluate variation in the length of the IGF binding domain in the context of the protein structure.
Additionally, we calculated similarity for each binding protein across the whole alignment using a Poisson correction model (62) in MEGA6 (63).

Coevolution Analysis of IGF Hormones and IGF2R in Reptiles. We used CAPS (64) to test for coevolving amino acid sites between IGF1 and IGF2R and between IGF2 and IGF2R in reptiles. CAPS uses the phylogenetic relationships from the sequence alignments along with the 3D structure of the proteins to identify coevolving pairs of amino acids using Pearson correlation coefficients. For these analyses, we used the amino acid sequence alignments with their respective human protein structures from PDB: IGF1, 1BQT.1 (58); IGF2, 2L29.1 (59); and IGF2R, 2V5O.1 (61). We used the following settings: bootstrap value of 0.8, gap threshold of 0.8, α threshold of P = 0.01, and simulated 100 alignments. Significance is estimated by comparing the observed coefficients to a distribution from pseudorandomly sampled amino acid pairs, correcting for multiple comparisons and nonindependence of data using a step-down permutation procedure (64). Comparison of phylogenetic gene trees can be used to detect coevolution among genes (65). We used the MMMvII algorithm (66) to identify which subgroups of the hormone family (INS, IGF1, and IGF2) and IGF2R were most tightly coevolving across species. The MMMvII algorithm detects similarity between phylogenetic trees, using information from both the tree topology and the branch lengths, which are calculated by MMMvII. MMMvII identifies the most tightly coevolving subtrees for any given tolerance level, returning all possible solutions. For each hormone, we constructed a single multiple sequence alignment of the mature protein sequences using ClustalX (67) within Geneious v6.1.6 (68). For IGF2R, we focused on the region of the protein that is involved with binding the hormones: domains 11–13. To identify the most tightly coevolving subgroups of proteins, we set the tolerance level to 0.2.
High levels of coevolution are achieved by large or multiple subsections of the gene trees changing in a coordinated fashion (topology and branch length). With this method, highly connected proteins may have no observable coevolution if they are highly conserved.

SI Results

Divergent Evolutionary Rates Between Mammals and Reptiles. We tested for differences in mammal-specific ω and reptile-specific ω using the clade model (47) for each of our 61 focal genes (each alignment contained 19–66 species; median: 62) in PAML (69). Significant genes included five extracellular genes (of a total of 10) and 21 intracellular genes (of a total of 51). Extracellular genes were not statistically more likely to be significant than intracellular genes in the clade model (Fisher’s exact test, P = 0.430). We also compared the distribution of likelihood ratio test statistics for the clade model relative to a null model for 1,417 control genes (SI Materials and Methods) to test statistics obtained for the 61 members of the network. Only IGF2R exhibited a result that was in the largest 5% of test statistics for IIS/TOR network + control genes. We compared the ω for each clade for those control genes where the clade model indicated support for a significant difference in ω between reptiles and mammals (n = 797 before sequential Bonferroni correction, n = 491 after sequential Bonferroni correction). In short, we found no appreciable difference between control and test genes; after correction for multiple testing, both had ∼77% of genes with larger ω in reptiles relative to the rest of the tree.

Connectivity Is Associated with Evolutionary Rate. Nonsynonymous reptile-mammal divergence (Ka) and ω were highly correlated with connectivity.
For extracellular genes, Ka and ω were negatively correlated to the degree of connectivity (Ka Spearman’s ρ = −0.71, P = 0.02; ω Spearman’s ρ = −0.84, P < 0.01), and Ks exhibited a positive, but nonsignificant relationship with degree of connectivity (Spearman’s ρ = 0.40, P = 0.26). Likewise, for intracellular genes, Ka and ω were negatively correlated to degree of connectivity (Ka Spearman’s ρ = −0.39, P < 0.01; ω Spearman’s ρ = −0.34, P = 0.01), whereas Ks was not (Spearman’s ρ < 0.01, P = 0.99). In other words, more connected genes generally had smaller nonsynonymous substitution rates than less connected genes; this result suggests that more connected genes experience more purifying selection than less connected genes. Importantly, the relationship of Ka and ω to degree of connectivity was stronger for extracellular genes than for intracellular genes. Indeed, an interaction term of connectivity and classification (intracellular vs. extracellular) in a linear model was nearly significant (P = 0.07), with extracellular genes having a steeper slope. Nearly identical results were obtained when using betweenness centrality (extracellular Ka Spearman’s ρ = −0.68, P = 0.03; ω Spearman’s ρ = −0.82, P < 0.01, Ks Spearman’s ρ = −0.37, P = 0.29); therefore, we focus further analyses on connectivity. Expression level governs the amount of purifying selection (70, 71). Thus, expression must be accounted for to conclude that the lower evolutionary rates we observed in more connected genes are because of high connectivity. Finding a suitable expression measure across such a broad range of taxa is difficult. Because protein length is negatively correlated with expression level, we used the longest protein isoform in human to provide a proxy for potential impacts of expression on protein evolutionary rate.
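The rank correlations between connectivity and evolutionary rate reported above can be illustrated with a small, dependency-free sketch of Spearman’s ρ. The function names and the toy values are hypothetical (they are not data from this study), and ties are assumed to be handled by average ranks, which is the common convention.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Made-up example: node degree and Ka for five genes.
degree = [2, 5, 9, 14, 30]
ka = [0.30, 0.22, 0.15, 0.16, 0.05]
rho = spearman_rho(degree, ka)
```

A negative ρ, as in this toy example, corresponds to the pattern described in the text: genes with more connections tend to have lower nonsynonymous divergence.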
We found no relationship of Ka, Ks, ω, connectivity, or betweenness with the length of the longest protein isoform from human (Spearman’s ρ < 0.15, P > 0.24 in all cases). Also, more highly expressed genes experience higher selection on Ks for easier translatable codons. Thus, a relationship between Ks and connectivity is a strong indication that expression level, not connectivity, is driving molecular evolution (71). We see no significant relationships between Ks and connectivity; hence, expression may not be a strong driver of the relationship between ω and connectivity in our data. Evolutionary rates of members of the IIS/TOR network in our study were negatively related with connectivity. This result is consistent with findings for other pathways, such as the N-glycosylation pathway of primates (72) and the yeast proteome (71, 73–76). Likewise, a negative relationship of closeness centrality with Ka and ω occurs in the mammalian phototransduction pathway, and closeness centrality is largely influenced by connectivity (77). Interpreting our findings requires two caveats. First, GC-biased gene conversion (preferential substitution of GC during recombination) can produce results that resemble positive selection, although such a confounding effect is usually attenuated with increased phylogenetic distance due to the lack of conservation in location of recombination hotspots (78). Thus, for mammal-reptile comparisons, this may not be a substantive concern. Further, genes indicated with the branch-site model to be under positive selection are less likely to be confounded by biased gene conversion than those indicated by the branch-test model (78). Second, we did not directly account for gene expression variation, intron number, and gene essentiality, and these are all variables associated with protein evolution (71, 75, 76). Not including these covariates could affect our conclusion regarding the importance of connectivity in influencing evolutionary rate. 
The choice of an appropriate tissue and developmental time point in which to measure expression level for all 66 species and the lack of gene expression data suitable for quantification in some species are vexing problems. However, we suspect that molecular evolutionary rate is influenced, at least in part, by connectivity because we found no relationship of Ka, Ks, ω, connectivity, or betweenness with the length of the longest protein isoform from human (a proxy for expression). In addition, as explained above, highly expressed genes experience selection on Ks for easier translatable codons, and we see no significant relationships between Ks and connectivity, a relationship that would indicate that expression level, not connectivity, is driving molecular evolution (71).

Tests for Positive Selection. We tested whether positive selection shaped evolution of IIS/TOR pathway genes using a branch-site model in PAML. This model, with the reptile clade specified as the foreground branch, was favored over the null model of neutral evolution for only two genes, both of which were intracellular: RPS6KA6 and MLST8 (after sequential Bonferroni correction; Table S3). In this test, the entire clade of reptiles, including terminal branches, was specified as the foreground branch. This relative lack of significance is likely due to variable selection among the diverse terminal branches, which span >350 My of evolution. Additional models are discussed in the main text and include a branch-site model of positive selection with the branch leading to the reptile clade as the foreground branch and a similar model with the branch leading to mammals designated as the foreground branch. We also conducted a series of taxon-specific branch-site tests, where the branch leading to a particular clade was specified as a foreground branch. The results of all tests are presented in Table S3.
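The statistical steps behind these branch-site results, a likelihood ratio test evaluated against a 1:1 mixture of χ2 distributions with 0 and 1 df followed by sequential Bonferroni correction, can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function names are hypothetical, the log-likelihoods would come from the PAML null and alternative runs, and "sequential Bonferroni" is assumed to mean the Holm step-down procedure.

```python
import math


def lrt_pvalue_mixture(lnl_null, lnl_alt):
    """P value for 2*(lnL_alt - lnL_null) under a 50:50 mixture of
    chi-square distributions with 0 and 1 degrees of freedom."""
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        # Zero or negative statistics fall on the point mass at 0.
        return 1.0
    # Survival function of chi-square with 1 df is erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))


def holm_reject(pvalues, alpha=0.05):
    """Holm step-down (sequential Bonferroni) decisions, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger P values fail too
    return reject


# Made-up log-likelihoods: the statistic 3.841459 is the 5% critical
# value of chi-square with 1 df, so the mixture P value is about 0.025.
p_example = lrt_pvalue_mixture(-1001.9207295, -1000.0)
decisions = holm_reject([0.001, 0.04, 0.03, 0.2])
```

Halving the 1-df tail probability reflects that ω for the foreground class is constrained to be at least 1 in the alternative model, so half of the null distribution sits at zero.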
As detailed in the main text, our results are concordant with previous work that suggests that extracellular genes in the IIS/TOR network may evolve more rapidly and are under stronger positive selection than the remainder of the network. For instance, DAF-2 (a homolog of the vertebrate IGF1R and INSR genes) is the most divergent protein in the IIS/TOR network across Caenorhabditis species (72), and changes in this receptor and interactions with its hormone may allow for rapid adaptation under shifting environmental conditions (71). Likewise, residues within the homolog of IGF1R (Drosophila’s insulin-like receptor) evolve under positive selection in Drosophila (79). In addition, IGF1 evolves under strong positive selection in snakes and lizards (55). Evolution in Squamata. Because previous research indicates that components of the IIS/TOR network may be under strong positive selection in Squamata (lizards and snakes) (55), we also tested the branch-site model using the branch leading to snakes and lizards as the foreground branch. Fourteen genes exhibited significant support for positive selection along the branch leading to lizards and snakes; seven remained significant after sequential Bonferroni correction (IGF2R, IGF1R, PIK3R5, IRS2, IRS1, IKBKB, and TSC2; Table S3). These seven also exhibited test statistics that were in the largest 5% of test statistics for all (control and test) genes analyzed in this comparison. For crocodilians, birds, and turtles, fewer genes provided significant support for the branch-site model either before (13, 11, and 11 genes, respectively) or after multiple test correction (6, 1, and 5 genes, respectively). The bird comparison is particularly notable because birds represent an independent evolutionary origin of endothermy (vs. mammals). We more explicitly assayed higher divergence in Squamata relative to the rest of the tree by the clade model with Squamata as the foreground clade. 
We detected 24 genes with significant support (after multiple test correction) for heterogeneous rates relative to the rest of the tree (a total of 33 before multiple test correction). For 14 of these significant genes, the ω estimated for the Squamata clade was larger than the estimate for the rest of the tree. However, the difference between the number of genes that were more divergent in Squamata than in the rest of the tree and the number that were less divergent was not significant (P > 0.3). Notably, IGF1, IGFBP2, RHEB, IGF2R, and INSR exhibit test statistics that were in the largest 5% of test statistics for all (control and focal) genes analyzed for the clade model with Squamata in the foreground. In comparison, we detected 15 genes with significant support (after multiple test correction) for heterogeneous rates relative to the rest of the tree when using snakes as the foreground clade (a total of 30 before multiple test correction). For 11 of these significant genes, the ω estimated for snakes was larger than the estimate for the rest of the tree, and the reverse was true for the other 4 genes. This difference between the numbers of genes in snakes that were more or less divergent than the rest of the tree was nearly significant (χ2 = 3.27, P = 0.07). Similar results were obtained for a paired Wilcoxon test (V = 24, P = 0.04). However, only PRKCG, IGFBP2, and INSR exhibit test statistics that were in the largest 5% of test statistics for all (control and test) genes analyzed for the clade model with snakes in the foreground. Overall, it appears that Squamata has qualitatively higher divergence in IIS/TOR network genes, and several more genes may be under positive selection on the branch leading to Squamata, than on the branches leading to crocodilians, birds, and turtles (tested independently). However, these differences are not exceptional, and each branch of reptiles, except avian reptiles, contains multiple IIS/TOR genes under positive natural selection.
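The count comparisons above appear consistent with a χ² goodness-of-fit test against an equal split (1 degree of freedom). The sketch below makes that assumption explicit; the function name is hypothetical. With the snake counts (11 vs. 4) it gives χ² ≈ 3.27 and P ≈ 0.07, and with the Squamata counts (14 vs. 10, the latter derived as 24 − 14) it gives P > 0.3.

```python
from math import erfc, sqrt


def equal_split_chi2(n_higher, n_lower):
    """Chi-square goodness-of-fit test of two counts against a 50:50 split.

    Returns (chi2, p) with 1 degree of freedom; for 1 df the survival
    function is erfc(sqrt(chi2 / 2)).
    """
    expected = (n_higher + n_lower) / 2.0
    chi2 = ((n_higher - expected) ** 2 + (n_lower - expected) ** 2) / expected
    return chi2, erfc(sqrt(chi2 / 2.0))


# Genes with omega larger vs. smaller in the foreground clade
# (counts taken from the text above).
snake_chi2, snake_p = equal_split_chi2(11, 4)
squamata_chi2, squamata_p = equal_split_chi2(14, 10)

print(f"snakes:   chi2 = {snake_chi2:.2f}, P = {snake_p:.2f}")
print(f"Squamata: chi2 = {squamata_chi2:.2f}, P = {squamata_p:.2f}")
```

The paired Wilcoxon test mentioned in the text uses the per-gene ω estimates rather than counts, so it is not covered by this sketch.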
Mammal-Specific and Reptile-Specific Evolution of Hormones and Receptors. The amino acid sites that define the ability of IGF1 and IGF2 to bind IGF1R (mainly in domains A and B; Fig. 2) are conserved, indicating that these protein sequences are likely functional (80). The C-domains of IGF1 and IGF2 form a flexible loop that is oriented toward the binding pocket of INSR and IGF1R and contacts the CR domain in the binding pocket of IGF1R and INSR (81) (Fig. 2). The IGF1 and IGF2 C-domain is essential for binding IGF1R (82), and variation in the C-domain regulates the specificity of hormone binding to IGF1R (82) and to INSR (83). INSR has two isoforms due to the absence (INSR-A) or presence (INSR-B) of exon 11 (84). In mammals, both INSR isoforms bind INS with high affinity, but only INSR-A binds IGF2 with high affinity, and neither binds IGF1 with high affinity. This difference in INSR binding between IGF2 and IGF1 is driven by the C-domain of the hormones (83). For IGF1, 30% of the C-domain amino acids in reptiles are predicted to be under positive selection, whereas none of the C-domain sites of IGF1 in mammals are predicted to be under positive selection. In contrast, for IGF2, 25% of the C-domain amino acids in mammals were identified as being under positive selection, and no sites were under positive selection in the reptilian IGF2 C-domain (Fig. 2 and Table S4). This positive selection in the C-domains of reptile IGF1 and mammal IGF2 suggests their binding affinities to IGF1R and INSR are likely variable across the species in the respective clades.
IGF1R has three domains that are predicted to play a role in binding both IGF1 and IGF2 hormones (L1-, CR-, and L2-domains) (81, 85). Positively selected sites in reptiles clustered on the hormone-binding surface of the CR domain of IGF1R and include specific sites identified from mutagenesis studies to directly interact with the C-domain of the IGFs to regulate binding affinity. More specifically, from mutagenesis studies, one of these sites under positive selection on the IGF1R CR-domain (F251, human numbering) directly interacts with the IGF1 C-domain to regulate binding of IGF1R to IGF1 (81). Furthermore, one of the sites under positive selection on the reptilian IGF1 C-domain (R37, human numbering) regulates binding of IGF1 to IGF1R (80) (Table S4). Thus, the location and clustering of these positively selected sites on the hormone and the receptor suggest positive selection on the binding affinity between IGF1 and IGF1R across the reptiles. This signature of positive selection is absent in the mammalian IGF1 and IGF1R. In contrast, we see positive selection on the C-domain of IGF2 in mammals that regulates the binding to IGF1R and INSR. These positively selected sites in the C-domain of mammalian IGF2 may cause variation in the binding affinity between IGF2-IGF1R and IGF2-INSR among mammal species.
Coevolution of IGF2R and IGFs in Reptiles. In addition to high divergence in reptiles and snakes among focal genes mentioned above, many of the positively selected sites on the receptors and hormones are due to amino acid changes within the Squamates (lizards and snakes) relative to other reptiles. Our coevolution network analysis clearly signals strong coevolution of the receptors and hormones specifically within snakes or squamates. This rapid molecular evolution is in concordance with extensive recent work showing extreme adaptation in metabolic pathways of snakes (86, 87). Although nematodes and Drosophila are models for conservation of the intracellular IIS (88, 89), snakes and lizards may be models for examining the coevolution of the extracellular hormones-receptors. The CAPS analysis identified a pair of coevolving amino acids on IGF2 and IGF2R in reptiles: IGF2 P4 and IGF2R R1623 (ρ = 0.4, P < 0.01). No sites were identified as coevolving between IGF1 and IGF2R. To further predict how evolution has shaped the interactions between IGF2R and the IGF hormones in reptiles, we used MMMvII (66) to identify the species with the tightest correlated rates of evolution between IGFs and IGF2R based on the gene tree topologies and branch lengths, given a tolerance value of 0.2. Interestingly, within the reptiles, snakes (sunbeam and viper boa) had the tightest coevolutionary signal between the hormone-receptor pairing IGF2 and IGF2R (ρ = 1), and the lizards (brown and green anoles and gecko) had the tightest coevolutionary signal between IGF1 and IGF2R (ρ = 0.33), suggesting that among the reptiles, these receptor-hormone relationships are most strongly coevolving in the squamate clade specifically. Specifically, one of the IGF1 residues in mammals that inhibits high-affinity binding to IGF2R (R55) is an isoleucine (I55) in snakes, which is predicted to promote binding to IGF2R due to its hydrophobicity.
1. Sparkman AM, Vleck CM, Bronikowski AM (2009) Evolutionary ecology of endocrine-mediated life-history variation in the garter snake Thamnophis elegans. Ecology 90(3):720–728. 2. Grabherr MG, et al. (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29(7):644–652. 3. Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17(1):10–12. 4. Lohse M, et al. (2012) RobiNA: A user-friendly, integrated software solution for RNA-Seq-based transcriptomics. Nucleic Acids Res 40(Web Server issue):W622–W627. 5. Cahais V, et al. (2012) Reference-free transcriptome assembly in non-model animals from next-generation sequencing data. Mol Ecol Resour 12(5):834–845. 6. Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26(19):2460–2461. 7. Kent WJ (2002) BLAT—The BLAST-like alignment tool. Genome Res 12(4):656–664. 8. Li L, Stoeckert CJ, Jr, Roos DS (2003) OrthoMCL: Identification of ortholog groups for eukaryotic genomes.
Genome Res 13(9):2178–2189. 9. Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28(1):27–30. 10. Kanehisa M, et al. (2014) Data, information, knowledge and principle: Back to metabolism in KEGG. Nucleic Acids Res 42(Database issue):D199–D205. 11. Luisi P, et al. (2012) Network-level and population genetics analysis of the insulin/TOR signal transduction pathway across human populations. Mol Biol Evol 29(5):1379–1392. 12. Alvarez-Ponce D, Aguadé M, Rozas J (2011) Comparative genomics of the vertebrate insulin/TOR signal transduction pathway: A network-level analysis of selective pressures. Genome Biol Evol 3:87–101. 13. Wang M, et al. (2013) The molecular evolutionary patterns of the Insulin/FOXO signaling pathway. Evol Bioinform Online 9:1–16. 14. Fantin VR, Wang Q, Lienhard GE, Keller SR (2000) Mice lacking insulin receptor substrate 4 exhibit mild defects in growth, reproduction, and glucose homeostasis. Am J Physiol Endocrinol Metab 278(1):E127–E133. 15. Yenush L, White MF (1997) The IRS-signalling system during insulin and cytokine action. BioEssays 19(6):491–500. 16. Lavan BE, et al. (1997) A novel 160-kDa phosphotyrosine protein in insulin-treated embryonic kidney cells is a new member of the insulin receptor substrate family. J Biol Chem 272(34):21403–21407. 17. Fantin VR, et al. (1998) Characterization of insulin receptor substrate 4 in human embryonic kidney 293 cells. J Biol Chem 273(17):10726–10732. 18. Qu B-H, Karas M, Koval A, LeRoith D (1999) Insulin receptor substrate-4 enhances insulin-like growth factor-I-induced cell proliferation. J Biol Chem 274(44): 31179–31184. 19. Xu X, et al. (2012) Modular genetic control of sexually dimorphic behaviors. Cell 148(3):596–607. 20. Liu Y, Schmidt B, Maskell DL (2010) MSAProbs: Multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities. Bioinformatics 26(16):1958–1964.
21. Plyusnin I, Holm L (2012) Comprehensive comparison of graph based multiple protein sequence alignment strategies. BMC Bioinformatics 13(1):64. 22. Sievers F, et al. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7(1):539. 23. Wernersson R, Pedersen AG (2003) RevTrans: Multiple alignment of coding DNA from aligned amino acid sequences. Nucleic Acids Res 31(13):3537–3539. 24. Abascal F, Zardoya R, Telford MJ (2010) TranslatorX: Multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res 38(Web Server issue): W7-13. 25. Talavera G, Castresana J (2007) Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56(4):564–577. 26. Castresana J (2000) Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17(4):540–552. 27. Ranwez V, Harispe S, Delsuc F, Douzery EJ (2011) MACSE: Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons. PLoS ONE 6(9):e22594. 28. Schneider A, et al. (2009) Estimates of positive Darwinian selection are inflated by errors in sequencing, annotation, and alignment. Genome Biol Evol 1:114–118. 29. Jordan G, Goldman N (2012) The effects of alignment error and alignment filtering on the sitewise detection of positive selection. Mol Biol Evol 29(4):1125–1139. 30. Penn O, et al. (2010) GUIDANCE: A web server for assessing alignment confidence scores. Nucleic Acids Res 38(Web Server issue):W23-8. 31. Wright KM, Rausher MD (2010) The evolution of control and distribution of adaptive mutations in a metabolic pathway. Genetics 184(2):483–502. 32. Kim PM, Korbel JO, Gerstein MB (2007) Positive selection at the protein network periphery: Evaluation in terms of structural constraints and cellular context. Proc Natl Acad Sci USA 104(51):20274–20279.
33. Hahn MW, Kern AD (2005) Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol Biol Evol 22(4):803–806. 34. Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nat Protoc 7(4):670–685. 35. Shannon P, et al. (2003) Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res 13(11):2498–2504. 36. Stark C, et al. (2006) BioGRID: A general repository for interaction datasets. Nucleic Acids Res 34(Database issue, suppl 1):D535–D539. 37. Yoon J, Blumer A, Lee K (2006) An algorithm for modularity analysis of directed and weighted biological networks based on edge-betweenness centrality. Bioinformatics 22(24):3106–3108. 38. Wiens JJ, et al. (2012) Resolving the phylogeny of lizards and snakes (Squamata) with extensive sampling of genes and species. Biol Lett 8(6):1043–1046. 39. Kimball RT, Wang N, Heimer-McGinn V, Ferguson C, Braun EL (2013) Identifying localized biases in large datasets: A case study using the avian tree of life. Mol Phylogenet Evol 69(3):1021–1032. 40. McCormack JE, et al. (2013) A phylogeny of birds based on over 1,500 loci collected by target enrichment and high-throughput sequencing. PLoS ONE 8(1):e54848. 41. Thomson RC, Shaffer HB (2010) Sparse supermatrices for phylogenetic inference: Taxonomy, alignment, rogue taxa, and the phylogeny of living turtles. Syst Biol 59(1): 42–58. 42. Perelman P, et al. (2011) A molecular phylogeny of living primates. PLoS Genet 7(3): e1001342. 43. Eo SH, Bininda-Emonds OR, Carroll JP (2009) A phylogenetic supertree of the fowls (Galloanserae, Aves). Zool Scr 38(5):465–481. 44. Hedges SB, Kumar S (2009) The Timetree of Life (Oxford Univ Press, New York). 45. dos Reis M, et al. (2012) Phylogenomic datasets provide both precision and accuracy in estimating the timescale of placental mammal phylogeny. 
Proc Roy Soc B Biol Sci 279 (1742):3491–3500. 46. Junier T, Zdobnov EM (2010) The Newick utilities: High-throughput phylogenetic tree processing in the UNIX shell. Bioinformatics 26(13):1669–1670. 47. Weadick CJ, Chang BS (2012) An improved likelihood ratio test for detecting site-specific functional divergence among clades of protein-coding genes. Mol Biol Evol 29(5):1297–1300. 48. Nei M, Gojobori T (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 3(5):418–426. 49. Thornton K (2003) Libsequence: A C++ class library for evolutionary genetic analysis. Bioinformatics 19(17):2325–2327. 50. Yang Z, Nielsen R (2002) Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol 19(6):908–917. 51. Zhang J, Nielsen R, Yang Z (2005) Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol 22(12): 2472–2479. 52. Yang Z, dos Reis M (2011) Statistical properties of the branch-site test of positive selection. Mol Biol Evol 28(3):1217–1228. 53. Self SG, Liang K-L (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Stat Assoc 82(398):605–610. 54. Goldman N, Whelan S (2000) Statistical tests of gamma-distributed rate heterogeneity in models of sequence evolution in phylogenetics. Mol Biol Evol 17(6):975–978. 55. Sparkman AM, et al. (2012) Rates of molecular evolution vary in vertebrates for insulin-like growth factor-1 (IGF-1), a pleiotropic locus that regulates life history traits. Gen Comp Endocrinol 178(1):164–173. 56. Biasini M, et al. (2014) SWISS-MODEL: Modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res 42(Web Server issue):W252-8. 57. Yang Y, et al.
(2010) Solution structure of proinsulin: Connecting domain flexibility and prohormone processing. J Biol Chem 285(11):7847–7851. 58. Sato A, et al. (1993) Three-dimensional structure of human insulin-like growth factor-I (IGF-I) determined by 1H-NMR and distance geometry. Int J Pept Protein Res 41(5):433–440. 59. Williams C, et al. (2012) An exon splice enhancer primes IGF2:IGF2R binding site structure and function evolution. Science 338(6111):1209–1213. 60. Garrett TPJ, et al. (1998) Crystal structure of the first three domains of the type-1 insulin-like growth factor receptor. Nature 394(6691):395–399. 61. Brown J, et al. (2008) Structure and functional analysis of the IGF-II/IGF2R interaction. EMBO J 27(1):265–276. 62. Zuckerkandl E, Pauling L (1965) Evolutionary divergence and convergence in proteins. Evolving Genes and Proteins, eds Bryson V, Vogel HJ (Academic Press, New York). 63. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 30(12):2725–2729. 64. Fares MA, McNally D (2006) CAPS: Coevolution analysis using protein sequences. Bioinformatics 22(22):2821–2822. 65. de Juan D, Pazos F, Valencia A (2013) Emerging methods in protein co-evolution. Nat Rev Genet 14(4):249–261. 66. Rodionov A, Bezginov A, Rose J, Tillier ER (2011) A new, fast algorithm for detecting protein coevolution using maximum compatible cliques. Algorithms Mol Biol 6(1):17. 67. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL_X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25(24):4876–4882. 68. Kearse M, et al. (2012) Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28(12):1647–1649. 69. Yang Z (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol 24(8):1586–1591. 70. 
Subramanian S, Kumar S (2004) Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics 168(1):373–381. 71. Jovelin R, Phillips PC (2011) Expression level drives the pattern of selective constraints along the insulin/Tor signal transduction pathway in Caenorhabditis. Genome Biol Evol 3:715–722. 72. Montanucci L, Laayouni H, Dall’Olio GM, Bertranpetit J (2011) Molecular evolution and network-level analysis of the N-glycosylation metabolic pathway across primates. Mol Biol Evol 28(1):813–823. 73. Fraser HB, Wall DP, Hirsh AE (2003) A simple dependence between protein evolution rate and the number of protein-protein interactions. BMC Evol Biol 3(1):11. 74. Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW (2002) Evolutionary rate in the protein interaction network. Science 296(5568):750–752. 75. Bloom JD, Adami C (2004) Evolutionary rate depends on number of protein-protein interactions independently of gene expression level: Response. BMC Evol Biol 4(1):14. 76. Larracuente AM, et al. (2008) Evolution of protein-coding genes in Drosophila. Trends Genet 24(3):114–123. 77. Invergo BM, Montanucci L, Laayouni H, Bertranpetit J (2013) A system-level, molecular evolutionary analysis of mammalian phototransduction. BMC Evol Biol 13(1):52. 78. Ratnakumar A, et al. (2010) Detecting positive selection within genomes: The problem of biased gene conversion. Philos Trans R Soc Lond B Biol Sci 365(1552):2571–2580. 79. Guirao-Rico S, Aguadé M (2009) Positive selection has driven the evolution of the Drosophila insulin-like receptor (InR) at different timescales. Mol Biol Evol 26(8):1723–1732. 80. Denley A, Cosgrove LJ, Booker GW, Wallace JC, Forbes BE (2005) Molecular interactions of the IGF system. Cytokine Growth Factor Rev 16(4-5):421–439. 81. Keyhanfar M, Booker GW, Whittaker J, Wallace JC, Forbes BE (2007) Precise mapping of an IGF-I-binding site on the IGF-1R. Biochem J 401(1):269–277. 82. Bayne ML, et al. 
(1989) The C region of human insulin-like growth factor (IGF) I is required for high affinity binding to the type 1 IGF receptor. J Biol Chem 264(19):11004–11008. 83. Denley A, et al. (2004) Structural determinants for high-affinity binding of insulin-like growth factor II to insulin receptor (IR)-A, the exon 11 minus isoform of the IR. Mol Endocrinol 18(10):2502–2512. 84. Seino S, Bell GI (1989) Alternative splicing of human insulin receptor messenger RNA. Biochem Biophys Res Commun 159(1):312–316. 85. Epa VC, Ward CW (2006) Model for the complex between the insulin-like growth factor I and its receptor: Towards designing antagonists for the IGF-1 receptor. Protein Eng Des Sel 19(8):377–384. 86. Castoe TA, Jiang ZJ, Gu W, Wang ZO, Pollock DD (2008) Adaptive evolution and functional redesign of core metabolic proteins in snakes. PLoS ONE 3(5):e2201. 87. Castoe TA, et al. (2009) Evidence for an ancient adaptive episode of convergent molecular evolution. Proc Natl Acad Sci USA 106(22):8986–8991. 88. Oldham S (2011) Obesity and nutrient sensing TOR pathway in flies and vertebrates: Functional conservation of genetic mechanisms. Trends Endocrinol Metab 22(2):45–52. 89. Tatar M, Bartke A, Antebi A (2003) The endocrine regulation of aging by insulin-like signals. Science 299(5611):1346–1351.
[Fig. S1 image not reproduced: network diagram of the extracellular and intracellular components of the IIS/TOR network and their downstream processes.]
Fig. S1. The IIS/TOR signaling network. Proteins not included in this study due to lack of sequence data across species are in gray. Gene names correspond to Tables S1 and S3. Genes in yellow were identified by the CMCreptiles model (last column of Table S3) as having highly divergent Ka/Ks in reptiles relative to the rest of the tree, significant after correction for multiple comparisons. *IRS4 and *IGFBP6 were analyzed manually due to their exceptional divergence in sequence and length between reptiles and mammals (Table S5 and Fig. S3). Figure modified from ProteinLounge.com, SABiosciences.
Fig. S2. A rooted cladogram showing the phylogenetic relationships among the species included in this study.
Fig. S3. Annotated amino acid alignment of IGFBP6. The human sequence is set as a reference at the top of the alignment, and sequence differences from the reference sequence are highlighted. We provide functional annotation on the human sequence. The N- and C-terminal domains are in red; the cysteine residues are in dark blue. IGF binding sites that are conserved across all binding proteins are marked in cyan (excepting two snake species, for which only one of these is conserved). IGF binding sites specific to IGFBP6 are marked in green, and the sites with different function (e.g., integrin binding) are marked in gray.
Table S1.
IIS/TOR genes used in this study and their estimates of divergence between reptiles and mammals
Function  Symbol  EntrezID  Betweenness  Degree  Mammal  Gator  Lizard  Bird  Turtle  Snake  Total  Length  N  ω  Ka  Ks
Extracellular  IGF1  3479  2.19E-04  20  32  2  3  10  5  7  59  0.54  864  0.12  0.17  1.30
Extracellular  IGF1R  100500937  4.10E-04  88  31  1  4  10  5  5  56  0.98  772  0.05  0.09  1.66
Extracellular  IGF2  3481  0  1  27  2  7  8  7  7  58  0.69  837  0.15  0.31  2.17
Extracellular  IGF2R  3482  5.55E-05  34  25  2  7  10  8  7  59  0.81  850  0.11  0.31  3.00
Extracellular  IGFBP2  3485  0  1  31  2  7  10  6  7  63  0.67  992  0.14  0.18  1.35
Extracellular  IGFBP3  3486  4.45E-04  36  29  2  5  10  7  5  58  0.61  841  0.06  0.15  3.00
Extracellular  IGFBP4  3487  5.40E-07  5  29  2  7  0  7  5  50  1  609  0.12  0.17  1.51
Extracellular  IGFBP5  3488  4.42E-05  12  30  2  7  9  4  7  59  0.40  869  0.10  0.11  1.23
Extracellular  INS  3630  1.98E-06  6  20  1  2  10  1  0  34  0.96  280  0.24  0.31  1.32
Extracellular  INSR  3643  7.80E-04  76  32  2  6  10  8  7  65  0.96  1,041  0.05  0.12  2.88
Intracellular  AKT1S1  84335  9.10E-07  10  24  2  7  0  4  7  44  0.87  480  0.18  0.34  1.91
Intracellular  CALM1  801  1.90E-04  26  31  2  7  10  8  7  65  0.99  1,054  0.00  0.01  2.98
Intracellular  EIF4E  1977  1.81E-04  60  32  2  7  10  8  7  66  0.87  1,088  0.06  0.03  0.55
Intracellular  EIF4E2  9470  2.36E-05  13  32  2  7  10  7  7  65  0.89  1,055  0.02  0.03  1.41
Intracellular  FOXO1  2308  4.39E-05  63  31  2  7  10  7  6  63  0.59  992  0.07  0.12  1.76
Intracellular  GRB2  2885  8.00E-08  2  31  2  6  10  8  7  64  1  1,023  0.01  0.02  1.15
Intracellular  IKBKB  3551  2.96E-05  3  32  2  7  10  8  7  66  0.79  1,087  0.06  0.12  1.88
Intracellular  INPPL1  3636  0  1  32  2  6  6  7  7  60  0.67  895  0.07  0.09  1.20
Intracellular  IRS1  3667  0  2  29  2  6  10  6  0  53  0.61  696  0.07  0.11  1.40
Intracellular  IRS2  8660  4.50E-05  40  24  1  0  9  6  0  40  0.20  383  0.19  0.19  0.97
Intracellular  KRAS  3845  7.19E-05  34  26  2  7  10  8  7  60  0.98  884  0.02  0.05  3.00
Intracellular  MAPK10  5602  1.06E-04  28  29  1  2  10  3  0  45  0.92  464  0.01  0.02  0.95
Intracellular  MKNK1  8569  3.73E-06  21  30  1  6  10  8  4  59  0.89  870  0.05  0.10  2.14
Intracellular  MLST8  64223  9.28E-05  30  32  2  7  10  8  7  66  0.99  1,088  0.03  0.04  1.43
Intracellular  MTOR  2475  4.00E-08  3  31  2  7  10  8  7  65  0.99  1,041  0.01  0.02  1.62
Intracellular  NRAS  4893  3.20E-07  3  27  1  6  10  7  7  58  0.99  837  0.01  0.02  2.64
Intracellular  PDE3B  5140  1.43E-05  4  32  2  7  10  8  7  66  0.82  1,084  0.15  0.18  1.30
Intracellular  PDPK1  5170  8.32E-05  69  32  2  7  10  8  7  66  0.98  1,085  0.03  0.06  1.96
Intracellular  PHKB  5257  2.21E-06  6  32  2  7  10  8  7  66  0.98  1,087  0.05  0.09  1.68
Intracellular  PHKG1  5260  0  1  31  1  1  10  2  0  45  0.92  434  0.13  0.13  1.05
Intracellular  PIK3CA  5290  2.67E-04  59  32  2  7  10  8  7  66  0.52  1,088  0.03  0.04  1.70
Intracellular  PIK3CB  5291  9.32E-06  14  32  2  6  10  6  4  60  0.99  895  0.05  0.08  1.49
Intracellular  PIK3CD  5293  0  1  30  2  6  10  8  7  63  0.97  989  0.07  0.11  1.54
Intracellular  PIK3CG  5294  6.84E-06  27  32  2  7  10  8  7  66  1  1,082  0.03  0.10  3.00
Intracellular  PIK3R5  23533  5.51E-06  7  32  1  5  10  5  7  60  0.89  841  0.14  0.24  1.57
Intracellular  PPARGC1A  10891  1.95E-04  91  32  2  7  10  7  1  59  0.85  863  0.12  0.08  0.68
Intracellular  PPP1R3C  5507  5.60E-06  7  32  2  7  10  6  7  64  0.99  1,024  0.09  0.21  2.23
Intracellular  PPP1R3D  5509  2.40E-07  3  29  1  7  10  6  6  59  0.80  870  0.12  0.28  2.37
Intracellular  PRKAA2  5563  3.10E-04  52  31  1  3  9  4  0  48  0.94  527  0.02  0.04  2.43
Intracellular  PRKCG  5582  2.54E-04  19  31  2  5  0  5  4  47  0.64  489  0.06  0.09  1.99
Intracellular  PTEN  5728  0.00141419  114  31  2  7  10  8  7  65  0.94  1,054  0.05  0.03  0.57
Intracellular  PTPN1  5770  4.04E-05  30  30  2  7  10  8  6  63  0.95  990  0.04  0.11  2.62
Intracellular  RHEB  6009  2.03E-06  36  30  2  7  9  8  6  62  0.99  960  0.01  0.01  0.89
Intracellular  RICTOR  253260  3.27E-04  86  32  2  7  10  8  7  66  0.83  1,079  0.05  0.07  1.22
Intracellular  RPS6  6194  3.91E-05  44  32  2  6  10  8  6  64  0.99  1,024  0.01  0.02  1.87
Intracellular  RPS6KA6  27330  1.08E-04  28  4  1  6  0  2  6  19  0.99  60  0.06  0.09  1.62
Intracellular  SGK1  6446  2.63E-04  70  31  2  7  10  6  6  62  0.80  961  0.02  0.04  1.42
Intracellular  SH2B2  10603  0  2  30  2  6  10  6  6  60  0.13  880  0.14  0.18  1.41
Intracellular  SHC1  6464  6.00E-08  2  31  2  7  10  6  7  63  0.78  992  0.06  0.09  1.30
Intracellular  SHC2  25759  2.00E-08  5  28  2  1  7  2  0  40  0.70  336  0.15  0.13  0.93
Intracellular  SHC3  53358  3.42E-06  10  30  1  0  10  2  0  43  0.70  390  0.06  0.16  2.98
Intracellular  SOCS1  8651  0  8  29  1  7  10  7  4  58  0.92  841  0.14  0.23  1.63
Intracellular  SOCS3  9021  0  6  30  0  6  8  4  5  53  0.92  688  0.09  0.07  0.75
Intracellular  SOCS4  122809  0  6  29  2  5  9  8  3  56  0.95  783  0.07  0.10  1.82
Intracellular  SOS1  6654  3.03E-04  114  31  2  8  10  8  7  66  0.97  1,042  0.04  0.05  1.15
Intracellular  STK11  6794  0  1  31  2  7  9  8  7  64  1  1,023  0.03  0.07  2.44
Intracellular  STRADA  92335  4.00E-08  12  31  2  7  10  8  7  65  0.49  1,054  0.04  0.09  2.88
Intracellular  TSC1  7248  0  1  32  2  7  10  8  7  66  0.96  1,084  0.11  0.15  1.34
Intracellular  TSC2  7249  6.56E-05  120  32  2  7  10  8  7  66  0.36  1,063  0.06  0.11  1.83
Intracellular  ULK2  9706  7.04E-06  9  32  2  7  10  6  7  64  0.28  1,017  0.05  0.10  1.60
Intracellular  ULK3  25989  0  1  32  2  7  10  7  7  65  0.90  1,054  0.11  0.14  1.23
Bold HGNC gene symbols are genes classified as extracellular; not bold are intracellular (see the Function column). Betweenness is the amount of influence a node exerts on the interactions of the other nodes (range 0–1). Degree is a measure of connectivity and is the number of edges or interactions that a gene has with other genes or proteins based on BioGrid human reactome 3.2.95 (1) (including protein-protein and protein-gene interactions). The numbers below each taxon represent the number of sequences from that group represented in the alignment. Total is the number of sequences in the alignment; N = total pairwise comparisons between reptiles and mammals used to calculate divergence measures. Divergence measures (Ka, nonsynonymous divergence; Ks, synonymous; ω, nonsynonymous/synonymous) are the median of the pairwise comparisons calculated in PAML between reptiles and mammals. Length is the median length of sequences in the multiple species alignment given as a proportion of the longest human isoform.
1. Stark C, et al.
(2006) BioGRID: A general repository for interaction datasets. Nucleic Acids Res 34(Database issue, suppl 1)D535–D539. McGaugh et al. www.pnas.org/cgi/content/short/1419659112 13 of 19 Table S2. Genomic and transcriptomic datasets used in this study Common name Species name Tissue Green anole* Red-eared slider turtle Anolis carolinesis Trachemys scripta Painted turtle† Galápagos tortoise† Chinese softshell turtle* Chinese alligator† Pigeon† Darwin finch† Budgerigar◇ Saker falcon† Peregrine falcon† Collared flycatcher* Turkey* Chicken* Zebrafinch* Duck* Tenrec† Elephant* Rat* Mouse* Shrew† Vole† Chrysemys picta Chelonoidis nigra Pelodiscus sinensis Alligator sinensis Columba livia Geospiza fortis Melopsittacus undulatus Falco cherrug Falco peregrinus Ficedula albicollis Meleagris gallopavo Gallus gallus Taeniopygia guttata Anas platyrhynchos Echinops telfairi Loxodonta africana Rattus norvegicus Mus musculus Sorex araneus Microtus ochrogaster Multiple Brain Embryonic stage 14, 17 Multiple Blood Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Ground squirrel* Pika† European rabbit* Naked mole rat† Guinea pig* Bush baby* Macaque* White-cheeked gibbon* Ictidomys tridecemlineatus Ochotona princeps Oryctolagus cuniculus Heterocephalus glaber Cavia porcellus Otolemur garnettii Macaca mulatta Nomascus leucogenys Orangutan* Gorilla gorilla* Chimpanzee* Human* Pig* Cow* Dolphin† Horse* Little brown bat* Brandt’s bat† Cat* Dog* Giant Panda* Ferret* Armadillo† Opossum* Platypus* Tasmanian devil* Alligator Anolis lizard Alligator lizard Fence lizard Bearded dragon Skink Gecko African house snake Cottonmouth Sunbeam snake Total contigs Mean (bp) N50 (bp) n:N50 19,177 55,456 included above 1,589 767 2,094 1,074 25,802 19,668 20,668 38,114 31,132 28,607 26,145 26,628 27,810 15,893 16,496 16,354 18,204 16,353 38,810 25,635 25,725 50,718 40,099 46,900 1,646 615 1,588 
1,104 118 1,140 1,179 1,207 1,206 1,635 1,596 1,669 1,347 1,494 1,097 1,623 1,532 1,358 1,125 1,042 2,091 687 2,013 1,686 1,737 1,749 1,818 1,875 1,869 2,202 2,148 2,223 1,911 2,142 1,605 2,109 2,043 2,013 1,590 1,620 5,881 5,265 4,770 6,732 5,435 5,017 4,610 4,689 4,797 3,430 3,634 3,537 3,644 3,265 7,135 5,771 5,571 9,740 7,676 8,080 Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple 20,000 40,749 20,588 69,635 19,774 19,986 36,384 19,988 1,542 1,092 1,602 1,046 1,567 1,619 1,442 1,626 1,932 1,632 2,100 1,578 2,058 2,085 1,920 2,133 4,560 7,378 4,533 12,738 4,357 4,505 7,979 4,435 Pongo abelii Gorilla gorilla Pan troglodytes Homo sapiens Sus scrofa Bos taurus Tursiops truncatus Equus caballus Myotis lucifugus Myotis brandtii Felis catus Canis familiaris Ailuropoda melanoleuca Mustela putorius Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple Multiple 21,414 27,473 19,907 102,156 25,883 22,118 38,169 22,654 20,719 47,102 20,259 25,160 21,136 20,062 1,507 1,608 1,582 1,147 1,354 1,605 979 1,688 1,535 1,023 1,587 1,734 1,618 1,606 2,040 2,166 2,094 1,839 1,824 2,082 1,377 2,319 2,037 1,557 2,112 2,298 2,154 2,127 4,562 5,842 4,327 17,747 5,574 4,830 7,665 4,641 4,466 8,315 4,354 5,395 4,520 4,295 Dasypus novemcinctus Monodelphis domestica Ornithorhynchus anatinus Sarcophilus harrisii Alligator mississippiensis Anolis sagrei Elgaria multicarinata Sceloporus undulatus Pogona vitticeps Scincella lateralis Eublepharis macularius Lamprophis fuliginosus Agkistrodon piscivorus Xenopeltis unicolor Multiple Multiple Multiple Multiple Liver, f‡, juvenile Liver, m, adult Liver, u, juvenile Liver, m, adult Liver, u, juvenile Liver, u, adult Liver, m, adult Liver, f, adult Liver, f, adult Liver, f, adult 57,911 22,310 23,584 22,404 47,884 23,392 24,018 32,046 38,739 50,129 37,488 32,952 25,220 27,211 991 1,592 1,166 1,604 868 891 888 1,000 933 945 931 818 903 956 1,407 2,049 1,593 2,091
1,206 1,227 1,242 1,479 1,323 1,359 1,338 1,077 1,257 1,359 11,113 4,975 4,777 4,987 9,548 4,843 4,978 6,178 7,910 9,867 7,508 7,149 5,353 5,606 McGaugh et al. www.pnas.org/cgi/content/short/1419659112 GC 4,305 48.42 10,920 50.88 Citation (1 2) (3) (4) 48.83 (5) 45.89 (6) 48.46 (7) 49.70 (8) 50.56 (9) 51.04 (10) 49.28 (11) 49.37 (12) 49.63 (12) 52.17 (13) 48.61 (14) 50.32 (15, 16) 50.75 (17) 49.25 (18) 54.18 (19) 51.76 (19) 51.77 (20) 51.95 (19) 55.44 (19) 52.22 Unpublished Broad Institute 51.83 (19) 54.49 (19) 53.88 (19) 53.87 (21) 52.56 (19) 51.55 (19) 51.54 (22) 51.52 Baylor College of Medicine 52.03 (23) 52.15 (24) 51.96 (25) 52.24 (19) 53.25 (26) 53.33 (19) 53.63 (19) 51.57 (19) 53.16 (19) 53.04 (27) 52.67 (19) 52.77 (28) 52.89 (29) 53.35 Unpublished Broad Institute 54.18 (19) 48.32 (30) 54.07 (31) 47.91 (32) 49.35 This study, SM07 47.77 This study, SM02 48.76 This study, SM03 47.48 This study, SM08 49.44 This study, SM09 51.22 This study, SM12 48.76 This study, SM15 47.69 This study, SM04 47.57 This study, SM05 47.63 This study, SM06 14 of 19 Table S2. Cont. Common name Viper boa W. 
aquatic garter snake Garter snake-lake Garter snake-meadow Snapping turtle Stinkpot turtle Sideneck turtle Box turtle Species name Candoia aspera Thamnophis couchii Thamnophis elegans Thamnophis elegans Cheyldra serpentina Sternotherus odoratus Pelusios castaneus Terrapene ornata Tissue Liver, Liver, Liver, Liver, Liver, Liver, Liver, Liver, f, adult f, adult m, juvenile f, juvenile m, juvenile f, juvenile f, juvenile u, juvenile Total contigs 34,984 38,648 37,723 36,090 26,251 43,717 40,755 43,109 Mean (bp) N50 (bp) 947 986 1,013 1,053 835 971 984 959 1,332 1,410 1,443 1,566 1,119 1,413 1,434 1,401 n:N50 GC 7,215 7,666 7,635 6,963 5,688 8,652 7,943 8,207 48.56 47.77 47.83 47.64 50.45 50.97 49.70 50.44 Citation This This This This This This This This study, study, study, study, study, study, study, study, SM14 TC HS08 HS11 SM01 SM10 SM11 SM13 Contigs less than 200 bp were not included in our study. n:N50 is defined here as the number of contigs that add up to 50% of the total assembly size when sorted longest to shortest, and the N50 refers to the mean length of the contig such that half of all bases in the assembly are made of sequences of equal or longer length. Liver transcriptome was sequenced for all individuals in our study and the sex and stage is given. Individual identifier abbreviation of raw sequence data for the liver transcriptome data generated from this study can be found under Citation. U, unknown. *Sequence was downloaded from Ensembl, thus, is also annotated using the genomic sequence. † Sequence was RNA downloaded from NCBI’s genome ftp. ‡ Sex: f, female; m, male; u, unknown. 1. Alföldi J, et al. (2011) The genome of the green anole lizard and a comparative analysis with birds and mammals. Nature 477(7366):587–591. 2. Eckalbar WL, et al. (2013) Genome reannotation of the lizard Anolis carolinensis based on 14 adult and embryonic deep transcriptomes. BMC Genomics 14(1):49. 3. 
Tzika AC, Helaers R, Schramm G, Milinkovitch MC (2011) Reptilian-transcriptome v1.0, a glimpse in the brain transcriptome of five divergent Sauropsida lineages and the phylogenetic position of turtles. Evodevo 2(1):19. 4. Kaplinsky NJ, et al. (2013) The embryonic transcriptome of the red-eared slider turtle (Trachemys scripta). PLoS ONE 8(6):e66357. 5. Shaffer HB, et al. (2013) The western painted turtle genome, a model for the evolution of extreme physiological adaptations in a slowly evolving lineage. Genome Biol 14(3):R28. 6. Chiari Y, Cahais V, Galtier N, Delsuc F (2012) Phylogenomic analyses support the position of turtles as the sister group of birds and crocodiles (Archosauria). BMC Biol 10(1):65. 7. Wang Z, et al. (2013) The draft genomes of soft-shell turtle and green sea turtle yield insights into the development and evolution of the turtle-specific body plan. Nat Genet 45(6): 701–706. 8. Wan Q-H, et al. (2013) Genome analysis and signature discovery for diving and sensory properties of the endangered Chinese alligator. Cell Res 23(9):1091–1105. 9. Shapiro MD, et al. (2013) Genomic diversity and evolution of the head crest in the rock pigeon. Science 339(6123):1063–1067. 10. Parker P, Li B, Li H, Wang J (2012) The genome of Darwin’s Finch (Geospiza fortis). GigaScience. Available at dx.doi.org/10.5524/100040. Accessed September 10, 2013. 11. Bradnam KR, et al. (2013) Assemblathon 2: Evaluating de novo methods of genome assembly in three vertebrate species. Gigascience 2(1):10. 12. Zhan X, et al. (2013) Peregrine and saker falcon genome sequences provide insights into evolution of a predatory lifestyle. Nat Genet 45(5):563–566. 13. Ellegren H, et al. (2012) The genomic landscape of species divergence in Ficedula flycatchers. Nature 491(7426):756–760. 14. Dalloul RA, et al. (2010) Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): Genome assembly and analysis. PLoS Biol 8(9):e1000475. 15. Rubin C-J, et al. 
(2010) Whole-genome resequencing reveals loci under selection during chicken domestication. Nature 464(7288):587–591. 16. Hillier LW, et al.; International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432(7018):695–716. 17. Warren WC, et al. (2010) The genome of a songbird. Nature 464(7289):757–762. 18. Huang Y, et al. (2013) The duck genome and transcriptome provide insight into an avian influenza virus reservoir species. Nat Genet 45(7):776–783. 19. Lindblad-Toh K, et al.; Broad Institute Sequencing Platform and Whole Genome Assembly Team; Baylor College of Medicine Human Genome Sequencing Center Sequencing Team; Genome Institute at Washington University (2011) A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478(7370):476–482. 20. Gibbs RA, et al.; Rat Genome Sequencing Project Consortium (2004) Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428(6982):493–521. 21. Kim EB, et al. (2011) Genome sequencing reveals insights into physiology and longevity of the naked mole rat. Nature 479(7372):223–227. 22. Gibbs RA, et al.; Rhesus Macaque Genome Sequencing and Analysis Consortium (2007) Evolutionary and biomedical insights from the rhesus macaque genome. Science 316(5822): 222–234. 23. Locke DP, et al. (2011) Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533. 24. Scally A, et al. (2012) Insights into hominid evolution from the gorilla genome sequence. Nature 483(7388):169–175. 25. Chimpanzee Sequencing and Analysis Consortium (2005) Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437(7055):69–87. 26. Groenen MA, et al. (2012) Analyses of pig genomes provide insight into porcine demography and evolution. Nature 491(7424):393–398. 27. Seim I, et al. 
(2013) Genome analysis reveals insights into physiology and longevity of the Brandt’s bat Myotis brandtii. Nat Commun 4:2212. 28. Lindblad-Toh K, et al. (2005) Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438(7069):803–819. 29. Li R, et al. (2010) The sequence and de novo assembly of the giant panda genome. Nature 463(7279):311–317. 30. Mikkelsen TS, et al.; Broad Institute Genome Sequencing Platform; Broad Institute Whole Genome Assembly Team (2007) Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 447(7141):167–177. 31. Warren WC, et al. (2008) Genome analysis of the platypus reveals unique signatures of evolution. Nature 453(7192):175–183. 32. Murchison EP, et al. (2012) Genome sequencing and analysis of the Tasmanian devil and its transmissible cancer. Cell 148(4):780–791. McGaugh et al. www.pnas.org/cgi/content/short/1419659112 15 of 19 Table S3. Results from tests for positive selection on each IIS/TOR gene Classification HGNC symbol bs_reptilesC bs_reptiles bs_mammal bs_croc bs_bird bs_turtle bs_squamata CMC_squamata CMCreptiles Extracellular Extracellular Extracellular Extracellular Extracellular Extracellular Extracellular Extracellular Extracellular Extracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular Intracellular 
IGF1 IGF1R IGF2 IGF2R IGFBP2 IGFBP3 IGFBP4 IGFBP5 INS INSR AKT1S1 CALM1 EIF4E EIF4E2 FOXO1 GRB2 IKBKB INPPL1 IRS1 IRS2 KRAS MAPK10 MKNK1 MLST8 MTOR NRAS PDE3B PDPK1 PHKB PHKG1 PIK3CA PIK3CB PIK3CD PIK3CG PIK3R5 PPARGC1A PPP1R3C PPP1R3D PRKAA2 PRKCG PTEN PTPN1 RHEB RICTOR RPS6 RPS6KA6 SGK1 SH2B2 SHC1 SHC2 SHC3 SOCS1 SOCS3 SOCS4 SOS1 STK11 STRADA TSC1 TSC2 ULK2 ULK3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 −6.08 0.90 0.00 −25.65 0.00 0.00 0.00 0.00 0.00 6.58 0.00 24.72 0.02 0.00 0.00 −21.23 0.00 0.00 −538.65 −121.84 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 −0.21 0.00 −0.02 0.00 0.00 23.84 0.00 0.00 6.30 0.00 0.00 0.00 0.00 0.00 −38.49 −0.03 0.00 0.00 0.00 0.00 0.00 4.19 6.54 0.85 38.16 8.30 0.00 2.21 7.33 3.59 9.40 0.70 0.00 0.00 0.00 0.00 0.00 1.78 14.01 66.81 14.35 0.99 0.00 0.00 0.00 2.18 0.00 5.80 0.00 7.50 0.05 0.00 2.98 6.86 0.00 50.68 0.17 0.00 0.48 0.57 12.88 0.00 0.00 0.00 4.15 0.00 0.00 0.00 0.00 0.15 1.43 0.00 0.00 0.03 0.23 0.00 0.00 0.01 0.00 3.36 0.00 0.00 0 0.00 1.54 29.44 0.00 0.00 4.45 3.29 3.02 15.62 8.31 0.0 0.0 0.0 1.64 0.0 3.45 0.0 20.51 14.34 0.0 0.0 6.47 2.03 4.16 0.0 0.0 0.0 8.48 0.37 0.0 8.87 14.65 4.18 24.83 0.0 0.56 0.59 0.0 27.81 0.0 0.0 0.0 12.67 0.0 0.0 0.0 2.11 0.05 6.62 2.06 0.01 3.50 0.0 0.25 0.0 1.98 2.85 17.23 0.0 3.09 0.00 12.16 0.00 8.77 0.00 0.00 10.55 11.87 0.00 24.17 0.00 0.00 0.00 0.00 0.60 0.00 0.00 0.00 6.15 10.38 0.00 6.93 0.00 0.00 NA −0.01 0.00 0.00 0.00 0.55 0.00 1.23 0.77 1.19 5.14 0.00 0.00 0.00 0.00 4.74 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 3.32 0.00 NA 1.63 0.00 0.00 0.00 12.95 −509.79 7.62 0.61 0.00 0.00 0.00 17.32 0.00 0.00 NA 3.75 0.00 −245.07 NA 0.07 0.00 −34.75 0.00 0.00 7.08 7.41 0.00 0.00 0.00 0.00 0.00 0.00 0.00 −0.29 0.00 0.00 −1.13 2.40 0.00 2.40 1.27 0.00 5.66 0.00 0.00 4.73 0.00 NA 0.00 3.02 0.00 3.40 0.00 NA 0.00 0.00 0.00 0.00 0.00 5.60 0.00 0.00 0.00 0.00 0.00 2.79 7.95 0.00 2.28 0.00 0.00 0.00 25.59 1.88 0.00 0.00 10.94 0.00 5.77 0.00 0.00 0.00 0.00 0.00 0.00 0.00 6.37 
34.06 7.13 0.00 0.00 0.00 0.00 NA 0.07 0.00 0.00 −3.03 0.01 61.91 0.39 −0.01 0.00 0.00 0.00 0.00 13.59 0.00 5.82 0.00 0.00 0.00 −0.01 0.00 0.00 0.00 0.00 0.00 8.13 0.47 0.00 0.00 0.00 1.06 −0.22 0.00 8.12 2.17 0.00 1.17 4.16 10.44 1.75 31.03 5.84 0.00 2.61 2.59 0.00 0.84 3.65 0.00 0.00 0.00 0.00 0.00 72.16 1.36 21.81 14.35 0.99 0.00 0.00 0.00 0.79 0.00 6.52 0.00 0.00 6.67 −3.61 1.63 1.90 0.00 13.56 0.65 0.00 0.00 3.03 0.00 0.00 0.00 0.00 5.32 0.00 0.00 0.00 0.00 2.43 6.64 NA 0.00 0.02 0.50 0.00 0.00 0.18 0.00 177.56 0.00 0.00 94.02 2.09 21.80 156.70 109.35 0.02 16.76 0.89 −99.60 183.93 3.90 0.00 10.83 −9.30 51.56 −41.64 7.36 0.00 12.80 1.21 −90.99 2.90 0.06 17.46 16.02 0.92 3.83 3.99 11.47 12.86 27.03 8.19 4.54 0.90 19.40 27.20 0.44 0.13 4.37 39.70 −67.73 −453.77 125.64 9.08 0.48 51.16 −33.02 16.07 0.16 0.01 NA 9.92 8.01 1.76 −0.03 1.11 9.60 14.98 2.41 0.75 0.35 −33.35 2.57 63.69 372.49 23.43 −209.79 24.01 3.54 −99.60 31.49 14.55 −0.34 0.46 −9.30 48.00 −41.67 31.28 76.69 24.89 1.21 −90.99 −13.98 0.65 3.26 17.98 1.29 0.58 7.73 0.85 0.00 97.43 2.57 29.62 3.35 35.65 22.46 7.17 11.89 0.18 24.75 −67.73 −436.67 0.44 43.17 22.79 61.63 4.07 69.51 31.10 0.02 64.93 39.29 2.22 −543.68 −0.18 6.11 0.87 25.44 2.37 54.64 1.39
Values are χ2 statistics from likelihood ratio tests in PAML, where significant values suggest evidence for positive selection at the gene level for the specified phylogenetic clade or branch. Italic and bold = significant at P < 0.05 before multiple test correction. Bold and underlined = significant at P < 0.05 after multiple test correction. The CMCs used the entire clade as the foreground. bs, branch-site test; bs_reptilesC, branch-site test with the entire reptile clade as the foreground branch (all other branch-site tests used only the branch leading to the specific taxon as the foreground branch); CMC, clade model; NA, not applicable for the specific gene.

Table S4.
Positively selected amino acid sites in hormones and binding domains of the receptors
Human mature protein Mammal clade Snake proto-protein Reptile clade Signal peptide C-domain C-domain C-domain C-domain Signal peptide Signal peptide Propeptide Propeptide B-domain C-domain C-domain C-domain R6 D60 L80 L82 Q87 0.97 0.99 0.95 0.97 P2 S33 S34 R37 R6 Q60 Q78 Q80 V85 A17 V18 I22 F37 Q54 G85 S86 S89 0.99 0.99 0.94 0.98 1.00 1.00 0.93 1.00 IGF1 IGF1 IGF1 C-domain C-domain A-domain A38 Q40 R55 S90 T92 I107 0.99 1.00 0.99 IGF1 IGF1 IGF1 IGF1 IGF1 IGF1 IGF1 IGF2 IGF2 IGF2 IGF2 IGF2 IGF2 IGF2 IGF2 IGF2 IGF2 IGF2 IGF2 IGF2 IGF2 IGF2 IGF2 IGF2 IGF2 INSR INSR INSR INSR INSR INSR INSR INSR INSR INSR INSR INSR INSR INSR INSR INSR INSR INSR INSR D-domain E peptide E peptide E peptide E peptide E peptide E peptide Signal peptide Signal peptide C-domain C-domain C-domain Protopeptide Protopeptide Protopeptide Protopeptide Protopeptide Protopeptide Protopeptide Protopeptide Protopeptide Protopeptide Protopeptide Protopeptide Protopeptide L1 domain L1 domain L1 domain L1 domain CR domain CR domain CR domain CR domain CR domain CR domain CR domain L2 domain FnIII-1 FnIII-1 FnIII-2 FnIII-2 FnIII-2 FnIII-2 FnIII-2 L64 Y87 Q88 S91 K94 K97 K102 V I A32 V35 S36 P74 F81 R83 Y92 V117 K120 E123 F125 R126 K129 A136 T139 Q140 V1 P3 R13 D68 Q171 S180 T188 Y226 R230 Q266 P280 G311 P537 Q540 S658 G735 V737 V744 A746 V116 V140 H141 N144 R147 T150 Y155 L3 V15 V48 N51 R52 L91 F102 K104 Y113 W139 E142 Q145 S147 E148 K151 V158 T161 H162 V1 P3 N13 K68 D170 S179 A187 V225 R229 S265 P277 E307 S533 K536 NA A719 S721 T728 G730 0.97 Gene Protein domain INS INS INS INS INS IGF1 IGF1 IGF1 IGF1 IGF1 IGF1 IGF1 IGF1 Mammal branch 0.92 0.96 0.99 1.00 0.92 1.00 0.98 0.97 0.96 1.00 1.00 1.00 0.99 1.00 1.00 0.99 1.00 0.99 1.00 0.92 0.92 0.92 0.96 1.00 1.00 0.99 0.92 0.91 0.97 0.91 1.00 0.97 0.91 Reptile branch Functional annotations 1.00 Affects binding affinity to IGF1R and INSR (1, 2) 0.99 0.98 Affects binding affinity to IGF2R (3) 0.99 0.99 1.00 0.95 1.00 0.98 0.95 1.00 1.00 0.99 0.91

Table S4. Cont.
Gene Protein domain INSR INSR INSR INSR INSR INSR INSR INSR INSR FnIII-2 FnIII-2 FnIII-2 FnIII-2 FnIII-2 FnIII-3 FnIII-3 FnIII-3 Transmembrane region Transmembrane region Transmembrane region Transmembrane region INSR INSR INSR Human mature protein Mammal clade T757 S758 V769 N770 T796 L886 L865 S884 K923 1.00 Mammal branch Snake proto-protein 0.98 E740 V741 V752 F753 A779 Q846 S848 Q867 A904 1.00 0.97 Reptile clade 1.00 1.00 1.00 1.00 1.00 I916 1.00 V938 F918 0.96 G922 1.00 R1241 S12 W14 L16 S29 K31 E44 A175 Y222 T278 N281 0.91 0.91 INSR IGF1R IGF1R IGF1R IGF1R IGF1R IGF1R IGF1R IGF1R IGF1R IGF1R Signal peptide Signal peptide Signal peptide Signal peptide L1 domain L1 domain CR domain CR domain CR domain CR domain P1266 * * * * E1 Q14 P145 R192 D248 F251 IGF1R IGF1R IGF1R IGF1R IGF1R IGF2R IGF2R CR domain CR domain CR domain L2 domain L2 domain Domain 11 Domain 11 E259 D262 Q275 M319 L379 A541 Y1542 P289 L292 Q306 S349 N409 Y1456 F1458 0.96 0.97 1.00 1.00 1.00 IGF2R Domain 11 E1544 N1460 0.98 IGF2R IGF2R IGF2R Domain 11 Domain 11 Domain 11 K1545 Y1549 N1558 Q1461 Q1641 T1474 1.00 0.95 0.90 IGF2R IGF2R IGF2R Domain 11 Domain 11 Domain 11 P1561 G1568 Q1569 G1478 G1487 H1488 0.98 IGF2R IGF2R IGF2R IGF2R IGF2R IGF2R Domain 11 Domain 11 Domain 11 Domain 11 Domain 11 Domain 11 T1570 R1571 A1577 K1593 D1594 G1603 Q1489 P1490 L1497 K1512 E1513 A1522 0.94 0.99 0.96 1.00 0.91 0.97 IGF2R Domain 11 V1609 IGF2R IGF2R Domain 11 Domain 11 R1623 I1627 Q1542 I1546 0.98 1.00 IGF2R Domain 11 Q1632 K1551 0.98 IGF2R IGF2R IGF2R Domain 11 Domain 11 Domain 11 P1643 −1648 R1655 V1562 R1569 T1576 0.99 0.92 0.97 1.00 0.94 0.94 1.00 Functional annotations 1.00 S936 I942 Reptile branch 1.00 0.98 0.96 0.99 0.96 0.93 0.95 0.98 0.92 0.96 Interacts with IGF1 C-domain (not IGF2) (4) 0.98 1.00 Y1528 Predicted to affect IGF2 binding based on substitution in Chicken/Monotreme (5) Predicted to affect IGF2 binding based on substitution in Chicken/Monotreme (5) Predicted to affect IGF2 binding based on substitution in Chicken/Monotreme (5) Predicted to affect IGF2 binding based on substitution in Chicken/Monotreme (5) Predicted to affect IGF2 binding based on substitution in Chicken/Monotreme (5) Predicted to affect IGF2 binding based on substitution in Chicken/Monotreme (5) Predicted to affect IGF2 binding based on substitution in Chicken/Monotreme (5) Predicted to affect IGF2 binding based on substitution in Chicken/Monotreme (5) 1.00

Listed are the sites with a posterior probability > 0.9 of being under positive selection in the PAML branch-site model, using either the branch leading to the clade or the entire clade as the foreground. The amino acid sites in the “human mature protein” sequence correspond to the expanded amino acids in Fig. 2. For the “human mature protein,” the amino acid listed is the human variant. For the “snake proto-protein,” the amino acid listed is the snake variant. The “Functional annotations” column lists studies that have assigned functional significance to particular sites based on mutagenesis, antibody binding, and crystal structures of complexes (not an exhaustive list). NA, not applicable.

1. Denley A, Cosgrove LJ, Booker GW, Wallace JC, Forbes BE (2005) Molecular interactions of the IGF system. Cytokine Growth Factor Rev 16(4-5):421–439.
2. Zhang W, Gustafson TA, Rutter WJ, Johnson JD (1994) Positively charged side chains in the insulin-like growth factor-1 C- and D-regions determine receptor binding specificity. J Biol Chem 269(14):10609–10613.
3. Sakano K, et al. (1991) The design, expression, and characterization of human insulin-like growth factor II (IGF-II) mutants specific for either the IGF-II/cation-independent mannose 6-phosphate receptor or IGF-I receptor. J Biol Chem 266(31):20626–20635.
4. Keyhanfar M, Booker GW, Whittaker J, Wallace JC, Forbes BE (2007) Precise mapping of an IGF-I-binding site on the IGF-1R. Biochem J 401(1):269–277.
5. Brown J, Jones EY, Forbes BE (2009) Keeping IGF-II under control: Lessons from the IGF-II-IGF2R crystal structure. Trends Biochem Sci 34(12):612–619.

Table S5. Variation in the sequence and presence of the IGF binding domain in IGF binding proteins 2–6 (% is the amino acid percent identity over the complete alignments)
Taxon BP2 (71%) BP3 (75%) BP4 (80%) BP5 (83%) BP6 (56%)
Reptiles Archosaurs Turtles Squamates Mammals Primates Other placental mammals Monotreme/marsupials
G: 5/5 M T: 7/7 M G: 0 T: 2/6 F 3/6 R 1/6 M G: 1/1 M T: 9/12 F 1/12 R 2/12 M G: 1/5 F 3/5 R 1/5 M T: 2/7 R 5/7 M G: 1/1 F T: 3/6 F 2/6 R 1/6 M G: 1/1 F T: 3/8 F 2/8 R 3/8 M G: 3/5 R 2/5 M T: 2/6 F 2/6 R 2/6 M G: 1/1 F T: 6/6 F G: 2/4 F 2/4 M T: 1/7 F 4/7 R 2/7 M G: 1/1 M T: 2/4 F 2/4 R G: 0 T: 0 Suspect Gene Lost G: 1/1 M T: 2/2 R G: 1/1 F T: 10/11 F* 1/11 R G: 1/1 M T: 10/12 F 2/12 R G: 1/1 M T: 5/8 F 3/8 R G: 6/7 F 1/7 M G: 6/14 F 4/14 R 4/14 M T: 6/9 F 2/9 R 1/9 M G: 1/2 F 1/2 M G: 6/7 F 1/7 M G: 6/14 F 4/14 R 4/14 M T: 1/6 F 1/6 R 5/6 M G: 2/3 F 1/3 M G: 7/7 F G: 7/7 F G: 6/6 F G: 12/14 F 2/14 R T: 5/7 F 1/1 R 1/1 M G: 14/14 F T: 7/7 F G: 14/14 F T: 7/7 F G: 2/3 F 1/3 M G: 1/2 F 1/2 R G: 1/1 F
Within reptiles and mammals, for each specified group of species, we report the proportion of sequences from genomic data (G) and/or transcriptomic data (T) that have the full N-terminal domain (F), a truncated N-terminal domain (R), or a missing binding domain (M).
*Two species of snakes also showed an isoform with a missing IGF domain.
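
As an illustration of the N50 and n:N50 assembly statistics reported in Table S2, the following is a minimal Python sketch and not the code used in the study. It assumes the contig lengths are already available as a list of integers; the function name `n50_stats` and the `min_length` parameter are hypothetical, with the 200-bp cutoff taken from the Table S2 footnote.

```python
def n50_stats(contig_lengths, min_length=200):
    """Return (N50, n:N50) for a list of contig lengths in bp.

    Hypothetical helper for illustration only.
    N50   : length of the contig at which the running total, taken from the
            longest contig downward, first reaches half of the assembly size.
    n:N50 : number of contigs needed to reach that point.
    """
    # Drop contigs below the cutoff (assumed 200 bp, per the Table S2 footnote)
    kept = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs at or above the minimum length")

    half_assembly = sum(kept) / 2
    running_total = 0
    for count, length in enumerate(kept, start=1):
        running_total += length
        if running_total >= half_assembly:
            return length, count


if __name__ == "__main__":
    # Toy contig lengths, not data from the study
    print(n50_stats([1000, 800, 600, 400, 200, 100]))
```

With the toy lengths above, the 100-bp contig is discarded, the retained assembly totals 3,000 bp, and the two longest contigs together pass the 1,500-bp halfway mark, so the sketch reports an N50 of 800 bp and an n:N50 of 2.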