Rapid molecular evolution across amniotes of the IIS/TOR network

Suzanne E. McGaugha,1, Anne M. Bronikowskib,1, Chih-Horng Kuoc,1, Dawn M. Redingb,2, Elizabeth A. Addisb,3,
Lex E. Flageld,e, Fredric J. Janzenb, and Tonia S. Schwartzf,1
aDepartment of Ecology, Evolution, and Behavior, University of Minnesota, Saint Paul, MN 55108; bDepartment of Ecology, Evolution, and Organismal
Biology, Iowa State University, Ames, IA 50011; cInstitute of Plant and Microbial Biology, Academia Sinica, Taipei 11529, Taiwan; dDepartment of Plant
Biology, University of Minnesota, Saint Paul, MN 55108; eMonsanto Company, Chesterfield, MO 63017; and fOffice of Energetics, School of Public Health,
University of Alabama, Birmingham, AL 35294
Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved April 16, 2015 (received for review October 20, 2014)
insulin signaling
history and metabolic traits differ substantially between mammals
and reptiles (11, 12), and the IIS/TOR network influences these
traits (13–15).
Within vertebrates, many IIS/TOR extracellular genes have
evolved through gene duplication and thus are paralogs. Duplications of an insulin-like progenitor gene resulted in genes
encoding insulin (INS) and insulin-like growth factors 1 and 2
(IGF1 and IGF2) (6). These paralogous hormones bind the
similarly paralogous receptors, insulin receptor (INSR) and
insulin-like growth factor 1 receptor (IGF1R), and this binding
initiates the intracellular signaling cascade through insulin receptor substrate (IRS) and through the phosphatidylinositol
3-kinase (PI3K) and serine/threonine protein kinase intracellular
nodes (Fig. S1) (5, 7). Repeated duplication of the gene encoding
the ancestral IGF-binding proteins (IGFBP) resulted in six binding proteins that regulate bioavailability of the hormones (8).
Generally, these receptors, hormones, and binding proteins maintain the ability for cross-talk, but binding affinities differ (16). An
additional receptor, IGF2R, is a co-opted mannose-6-phosphate
receptor that regulates IGF2 bioavailability for activating IIS/TOR
Significance
Comparative analyses of central molecular networks uncover
variation that can be targeted by biomedical research to develop insights and interventions into disease. The insulin/insulin-like signaling and target of rapamycin (IIS/TOR) molecular
network regulates metabolism, growth, and aging. With the
development of new molecular resources for reptiles, we show
that genes in IIS/TOR are rapidly evolving within amniotes
(mammals and reptiles, including birds). Additionally, we find
evidence of natural selection that diversified the hormone-receptor binding relationships that initiate IIS/TOR signaling. Our
results uncover substantial variation in the IIS/TOR network
within and among amniotes and provide a critical step to unlocking information on vertebrate patterns of genetic regulation
of metabolism, modes of reproduction, and rates of aging.
| insulin growth factor | molecular evolution | rapamycin
The last 20 y has provided overwhelming support that the insulin- and insulin-like signaling/target of rapamycin (IIS/TOR)
molecular network responds to stress and nutrients and underlies a
wide range of physiological functions (1); cancer, metabolic syndrome, and diabetes (2); and the timing of life events (e.g., growth,
maturation, reproduction, and aging) (3). The vertebrate IIS/TOR
network consists of peptide hormones, binding proteins that regulate hormone bioavailability, and cell membrane receptors (hereafter, extracellular proteins of the IIS/TOR network) that induce an
intracellular signaling cascade (hereafter, intracellular proteins of
the IIS/TOR network) to stimulate cell proliferation, survival, and
metabolism (Fig. S1). The core intracellular signal transduction
genes in this network are largely conserved across deep phylogenetic time (4, 5). In contrast, genes encoding the IIS/TOR extracellular network have diverged in the vertebrate lineage (6–8) and
may have variable roles among taxa (9, 10). Despite its central role
in health, comparative analyses of IIS/TOR have been limited to
model invertebrates and mammals. Here we conduct evolutionary
analyses of IIS/TOR across amniotes: i.e., mammals and their
reptile sister clade, which includes birds (Fig. S2). Many life
www.pnas.org/cgi/doi/10.1073/pnas.1419659112
Author contributions: S.E.M., A.M.B., D.M.R., E.A.A., F.J.J., and T.S.S. designed and performed research; S.E.M., A.M.B., C.-H.K., L.E.F., and T.S.S. analyzed data; and S.E.M.,
A.M.B., C.-H.K., D.M.R., E.A.A., L.E.F., F.J.J., and T.S.S. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The sequences reported in this paper have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive,
www.ncbi.nlm.nih.gov/sra (accession nos. SRA062458 and SRP017466). Transcriptome assemblies,
annotation summaries, and alignments for protein coevolution analyses are available
through Dryad (10.5061/dryad.vn872).
1To whom correspondence may be addressed. Email: smcgaugh@umn.edu (assemblies,
alignments, and molecular evolution analyses), abroniko@iastate.edu (the study itself,
transcriptomes, and accessing data), chk@gate.sinica.edu.tw (OrthoMCL analyses), or
tschwartz@uab.edu (protein predictions and coevolutionary and network interactions).
2Present address: Department of Biology, Luther College, Decorah, IA 52101.
3Present address: Department of Biology, Gonzaga University, Spokane, WA 99258.
This article contains supporting information online at
www.pnas.org/lookup/suppl/doi:10.1073/pnas.1419659112/-/DCSupplemental.
PNAS | June 2, 2015 | vol. 112 | no. 22 | 7055–7060
EVOLUTION
The insulin/insulin-like signaling and target of rapamycin (IIS/TOR)
network regulates lifespan and reproduction, as well as metabolic
diseases, cancer, and aging. Despite its vital role in health, comparative analyses of IIS/TOR have been limited to invertebrates
and mammals. We conducted an extensive evolutionary analysis
of the IIS/TOR network across 66 amniotes with 18 newly generated
transcriptomes from nonavian reptiles and additional available
genomes/transcriptomes. We uncovered rapid and extensive molecular evolution between reptiles (including birds) and mammals:
(i) the IIS/TOR network, including the critical nodes insulin receptor
substrate (IRS) and phosphatidylinositol 3-kinase (PI3K), exhibits divergent evolutionary rates between reptiles and mammals; (ii)
compared with a proxy for the rest of the genome, genes of the
IIS/TOR extracellular network exhibit exceptionally fast evolutionary rates; and (iii) signatures of positive selection and coevolution
of the extracellular network suggest reptile- and mammal-specific
interactions between members of the network. In reptiles, positively selected sites cluster on the binding surfaces of insulin-like
growth factor 1 (IGF1), IGF1 receptor (IGF1R), and insulin receptor
(INSR); whereas in mammals, positively selected sites cluster on
the IGF2 binding surface, suggesting that these hormone-receptor
binding affinities are targets of positive selection. Further, contrary
to reports that IGF2R binds IGF2 only in marsupial and placental
mammals, we found positively selected sites clustered on the hormone binding surface of reptile IGF2R that suggest that IGF2R binds
to IGF hormones in diverse taxa and may have evolved in reptiles.
These data suggest that key IIS/TOR paralogs have sub- or neofunctionalized between mammals and reptiles and that this network
may underlie fundamental life history and physiological differences
between these amniote sister clades.
IIS/TOR Network Contains Fast-Evolving Outliers. Twenty-six genes
(of 61 analyzed) from the IIS/TOR network exhibited divergent
evolutionary rates between reptiles and mammals (i.e., significant likelihood ratio test between the null and alternative models)
using the Clade model in PAML (31) and a P value estimated
following refs. 32 and 33 and corrected for multiple tests by sequential Bonferroni (Table S3, CMC reptiles). For 20 of these
26 divergent genes, the ω [nonsynonymous substitutions per nonsynonymous site (Ka)/synonymous substitutions per synonymous
site (Ks)] for reptiles was significantly greater than the ω for the
rest of the tree (e.g., mammals, χ² = 7.54, P = 0.006). We obtained
similar results for a paired Wilcoxon test (P = 0.056), and this
Evidence for Positive Selection. To understand how positive selection may have shaped the genes within the IIS/TOR network,
we used the branch-site test in PAML, which tests for molecular
evolution at the nucleotide level with functional impacts at the
protein level. In the first analysis, the branch leading to reptiles
was tested for evidence of positive selection (i.e., was placed in
the “foreground,” which functions to test the predicted ancestral
reptile against all mammals). Eighteen genes showed significant
signatures of positive selection along this branch leading to reptiles, six of which remained significant after sequential Bonferroni
correction: IGF2R and five intracellular genes [protein kinase C
gamma (PRKCG), inositol polyphosphate phosphatase-like 1,
phosphatidylinositol 3-kinase regulatory subunit (PIK3R), IRS1,
and IRS2] (Table S3). In the second analysis, with the branch
leading to mammals designated as the foreground branch, testing
this predicted ancestral mammalian branch against all reptiles, 23
genes showed significant signatures of positive selection, 9 of which
remained significant after sequential Bonferroni correction. This
group included most of the genes that were significant along the
Results
We identified an average of 31,060 unique ORFs per species
(range, 15,893–102,156) and used OrthoMCL (29) and quality
control methods to produce alignments of putative orthologs
across 66 species (Table S2) (see data deposition footnote and
SI Text).
We focused on 61 genes from the IIS/TOR network that were
identified through KEGG pathways (Kyoto Encyclopedia of
Genes and Genomes, ref. 30) and/or previous publications (Fig. S1
and Table S1) (8, 24). Alignments of these focal genes contained
19–66 species (median = 62, mode = 66; mean = 58.4; Table S1).
To provide a proxy for evolution of the noninsulin signaling
genes in the genome, we used 1,417 putative orthologs that
contained all 66 species and referred to these as control genes.
We also analyzed (i) 48 of the 61 focal genes that had greater
representation of species within the alignments (56–66 species,
median = 63.5, mode = 66, mean = 62.6), and (ii) a control and
IIS/TOR focal gene set that contained phylogenetically matched
species (43 control genes and 43 focal genes with identical species). Both of these analyses are presented in the SI Materials and
Methods and are consistent with the analysis reported below that
extracellular genes of the network are highly divergent outliers.
difference in divergence between clades was also seen among
control genes (SI Results).
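The chi-square statistic reported for the 20 of 26 divergent genes with higher ω in reptiles can be reproduced with a short calculation. The sketch below assumes, although the text does not state it explicitly, that the 20-versus-6 split was compared against an even 13:13 expectation with one degree of freedom. The function name is hypothetical, and the P value uses the closed form for the upper tail of a chi-square distribution with 1 df.

```python
import math

def chi_square_even_split(count_a, count_b):
    """Goodness-of-fit test of two counts against an equal-split expectation (1 df)."""
    expected = (count_a + count_b) / 2.0
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# 20 genes with higher omega in reptiles vs. 6 with higher omega in the rest of the tree.
chi2, p_value = chi_square_even_split(20, 6)
print(f"chi2 = {chi2:.2f}, P = {p_value:.3f}")
```

Under this assumption the statistic and P value agree with the reported values (7.54 and 0.006) after rounding.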
Extracellular genes of the IIS/TOR network exhibited greater
divergence between mammals and reptiles than 1,417 control
genes and intracellular genes when measured by the median of
all pairwise mammal-reptile Ka/Ks measures. Extracellular genes
had equivalent Ks compared with control genes (Wilcoxon rank
sum test, W = 5476, P = 0.2154), but had notably greater median
ω (W = 2998, P = 0.002) and Ka (W = 2333.5, P < 0.001; Fig. 1).
Compared with intracellular genes, extracellular genes also had
significantly higher ω and Ka (ω: W = 129.5, P = 0.02; Ka: W = 88,
P = 0.001), but Ks did not differ (W = 209, P = 0.375).
Collectively, the intracellular IIS/TOR genes did not have elevated median Ka, Ks, or ω compared with control genes (P >
0.467 in all cases; Fig. 1). In all cases, the median for intracellular
and control genes was identical to the hundredths place. The
median Ks for extracellular genes was 1.59, and the median Ks
for intracellular and control genes was 1.60. The median Ka for
extracellular genes was 0.17, and the median Ka for intracellular
and control genes was 0.09. The median ω for extracellular genes
was 0.11 and for intracellular and control genes ω was 0.06.
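The per-gene values summarized above are medians taken over every mammal-reptile species pair. As a minimal sketch of that summary step, assume the pairwise estimates are stored in dictionaries keyed by (mammal, reptile) tuples. The function name, species labels, and numbers below are hypothetical; the actual estimates were calculated in PAML.

```python
from statistics import median

def median_pairwise(values_by_pair, mammals, reptiles):
    """Median of a pairwise measure over every mammal-reptile species pair."""
    return median(values_by_pair[(m, r)] for m in mammals for r in reptiles)

# Hypothetical pairwise Ka and Ks estimates for a single gene.
mammals = ["human", "mouse"]
reptiles = ["anole", "python"]
ka = {("human", "anole"): 0.10, ("human", "python"): 0.12,
      ("mouse", "anole"): 0.08, ("mouse", "python"): 0.14}
ks = {("human", "anole"): 1.50, ("human", "python"): 1.60,
      ("mouse", "anole"): 1.55, ("mouse", "python"): 1.65}
# omega is Ka/Ks for each pair; the gene-level value is the median across pairs.
omega = {pair: ka[pair] / ks[pair] for pair in ka}

median_ka = median_pairwise(ka, mammals, reptiles)
median_ks = median_pairwise(ks, mammals, reptiles)
median_omega = median_pairwise(omega, mammals, reptiles)
print(median_ka, median_ks, median_omega)
```

Note that the median of the pairwise ω values need not equal median Ka divided by median Ks, which is why the three summaries are computed separately.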
When comparing the distribution of ω values for each group of
IIS/TOR genes to the distribution of the ω values for the control
genes, the extracellular genes had 8.4-fold higher odds than
control genes of residing in the highest 5% of ω values (OR, 8.37;
95% CI, 2.12, 33.08). In comparison, the intracellular genes were
not significantly more likely than controls to be in the top 5% (OR,
2.21; 95% CI, 0.82, 5.51). These odds ratios imply that the extracellular group contains the fastest evolving components of the
IIS/TOR network.
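An odds ratio of this kind, with its 95% confidence interval, can be computed from a 2 x 2 table of gene counts. The sketch below uses a Wald (Woolf) interval on the log odds ratio. The counts are assumptions, because the text does not report them: 3 of 10 extracellular genes and 69 of 1,417 control genes in the top 5% of ω values. These counts were chosen only because they are consistent with the reported extracellular estimate, and the function name is hypothetical.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] with a Wald (Woolf) interval."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Assumed counts (not reported in the text):
# a = extracellular genes in the top 5%, b = extracellular genes outside it,
# c = control genes in the top 5%,       d = control genes outside it.
or_, ci_low, ci_high = odds_ratio_with_ci(a=3, b=7, c=69, d=1348)
print(f"OR = {or_:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

With only 10 extracellular genes the interval is wide, which is why the lower bound sits close to 2 even though the point estimate exceeds 8.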
by bringing IGF2 into the cell for degradation (17). It is widely
hypothesized that IGF2-IGF2R binding is unique to therian
mammals (marsupials and placentals) due to sexual conflict in
regulating paternal IGF2 during placental and embryo development (10, 18–21).
Previous evolutionary analyses of IIS/TOR in vertebrate and
invertebrate lineages suggest that extracellular genes (Table S1
and Fig. S1) often experienced positive selection, whereas intracellular genes often experienced purifying selection (22–28),
especially farther downstream in the intracellular network. These
previous findings have supported the prediction that upstream
extracellular factors (e.g., the initial components that interact
with environmental stimuli in signal transduction pathways) may
have larger impacts on signaling through the network than downstream components. However, comparative studies of (co)evolution among extracellular components of the network have not
been possible with studies from invertebrates due to the near
absence of these paralogs. In addition, mammalian studies of this
network have not included the other half of the amniote group
(avian and nonavian reptiles), except chickens. Thus, a general
understanding of the evolution of this network and the coevolutionary relationships among the proteins of this network has
not been possible.
We analyze coding sequence data from 32 species of mammal—
an order of magnitude higher than previous comparative studies
of mammals—and 34 species of reptile (10 species of birds and
24 nonavian reptiles; Fig. S2). Analyses of this improved sampling
revealed that members of the IIS/TOR network, particularly extracellular and critical intracellular genes, exhibit exceptionally
fast evolutionary rates between mammals and reptiles relative to
the rest of the genome. Additionally, strong positive selection
occurs at amino acid sites important for hormone-receptor protein interactions, and this selection likely shapes binding affinities
in reptile- and mammal-specific ways.
Fig. 1. Medians of pairwise measures between all reptiles and mammals
per gene for Ka, Ks, and Ka/Ks calculated in PAML. Control = 1,417 genes not
in the IIS/TOR network, with 66 taxa represented in the alignments. Intracellular = 51 genes. Extracellular = 10 genes (hormones, receptors, IGF binding
proteins). Extracellular genes exhibit significantly greater median Ka/Ks and
median Ka than control or intracellular genes.
McGaugh et al.
Positive Selection in the IIS/TOR Hormones and Receptors Shows
Clade-Specific Patterns. Positively selected sites [i.e., those indicated by PAML branch-site models with a Bayes Empirical
Bayes (BEB) score of 0.9 or greater] were analyzed in the context of the protein structure and predicted protein-protein interactions between insulin/IGF hormones and their receptors.
We found reptile- and mammal-specific patterns of positive selection in the hormone and receptor domains that are important
for binding affinity. First, the mature INS hormone (containing
protein domains A and B) was conserved in reptiles and mammals, whereas the C-peptide that is cleaved from the mature
insulin protein (35) contained four positively selected sites in
mammals. Second, 5 of the 12 amino acids of the C-domain in
IGF1 in reptiles, but none in mammals, were positively selected.
Third, for IGF2, 3 of the 16 amino acids of the C-domain in
mammals, but none in reptiles, were positively selected (Fig. 2A).
IGF hormones bind to the receptors IGF1R and INSR by
interacting with specific domains on each receptor (L1, CR, and
L2) (36, 37). Variation in the C-domain of IGF1 and IGF2 can
regulate binding specificity to IGF1R (38) and to INSR (39)
through the interactions of the C-domain of the hormones with
the CR-domain in the binding pocket of IGF1R and INSR (36)
(Fig. 2B). Specifically, previous mutagenesis studies revealed that
altering one of the positively selected sites (IGF1 C-domain R37,
human numbering used throughout) disrupts IGF1-IGF1R binding (16, 40). In reptiles, positively selected sites were clustered
on the hormone-binding surface of the IGF1R CR domain and
in the binding pocket of INSR. They include IGF1R site F251,
bloodstream to regulate their bioavailability (48, 49). These IGFBPs
are characterized by N- and C-terminal domains that cooperate to
bind IGFs; protease cleavage separating these domains decreases
affinity to IGFs. In both reptiles and mammals (except primates),
many of our assembled IGFBP transcripts were either completely
missing the N-terminal domain or it was truncated (Table S5 and
Fig. S3). As assembled, these transcripts would produce truncated
proteins with diminished binding affinity to IGF1 and IGF2. We
summarize putative losses and truncations in Table S4 to serve as a
hypothesis-generating resource for future validation work. Most
evident is IGFBP6, which was found neither in any archosaurs
(birds and crocodilians) nor in platypus (8). The 5′ end of IGFBP6
was truncated in nearly all other reptiles including genome-derived ENSEMBL sequences of the Anolis lizard and the Pelodiscus
Binding Proteins Exhibit Putative Truncations of Important Functional
Domains. IGF binding proteins bind to IGF1 and IGF2 in the
which affects IGF1-IGF1R binding in humans through its interaction with the IGF1 C-domain (36) (Table S4).
In therian mammals, IGF2R binds IGF2 with relatively high
affinity, but studies of this interaction in reptiles (mainly
chickens) have yielded conflicting results (18, 21, 41, 42). We
found that IGF2R has been shaped by putatively strong positive
selection within reptiles and positively selected sites clustered on
the IGF2R protein surface in domain 11, which is intimately
involved in binding IGF1 and IGF2. Several of the positively
selected sites on the protein surface of IGF2R in reptiles are
essential for binding IGF2 based on mutagenesis studies and the
crystal structure of the IGF2R-IGF2 complex (e.g., Y1542) (43–
45) (Fig. 2C and Table S4). Although some variants in IGF2R
would predict decreased binding to IGF2, such as in chicken,
many variants in snakes and lizards predict increased binding to
IGF2 and/or IGF1 because they exhibit similar biochemical properties as the human amino acids (e.g., Y1542F in snakes and
Y1542L/M in lizards; Table S4). Utilizing Coevolutionary Analysis
Using Protein Sequences (CAPS) (46), we identified that amino
acid site P4 of IGF2 is coevolving with the positively selected site
on the binding surface of IGF2R (site R1623, ρ = 0.4, P < 0.01) in
reptiles (Fig. 2C). Among reptiles, MatrixMatchMaker version II
(MMMvII) (47) identified sunbeam and viper boa snakes as having
the tightest coevolutionary signal between IGF2 and IGF2R (ρ =
1), and identified brown anole, green anole, and gecko lizards as
having the tightest coevolutionary signal between IGF1 and
IGF2R (ρ = 0.33). Thus, among reptiles, IGF2R binding of IGFs
is most likely to be found in the Squamates.
Fig. 2. Protein structures for reptile and mammal IGF hormones and receptors. Reptile protein structures predicted from snake sequence homology modeled
onto human protein structures from the Protein Data Bank. Enlarged positions indicate the amino acid sites predicted to be under positive selection (Table
S3). (A) Reptile and mammal IGF hormones with their protein domains color coded. Positively selected sites cluster on the C-domain of reptile IGF1 but are not
present in the C-domain of the reptilian IGF2. In contrast, positively selected sites cluster on the C-domain of mammal IGF2. (B) The α chain of reptile IGF1R
homodimer with hormone binding domains L1, CR, and L2 labeled. The square is an enlargement with IGF1 oriented in the IGF1R binding pocket to
demonstrate the clustering of positively selected sites on the interacting IGF1-IGF1R binding surfaces (36). Labeled sites (IGF1 S37 and IGF1R N251; human numbering) are known to affect IGF hormone and receptor binding (Table S4). (C) Domain 11 of reptile and mammal IGF2R with IGF2 oriented toward the binding
pocket to demonstrate the clustering of positively selected sites on the reptile IGF2R binding surfaces (43, 44). The magenta sites on reptile IGF2 and IGF2R
were identified as coevolving amino acids using CAPS (46). Labeled IGF2R sites (1558 and 1609; human numbering) are predicted to regulate IGF2-IGF2R
binding (Table S4). Like mammals, some lizards have IGF2R N1558.
reptile branch (IGF2R, IRS1, IRS2, PRKCG, and PIK3R), as well
as others (Table S3).
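The sequential Bonferroni correction applied to these branch-site tests is commonly implemented as Holm's step-down procedure. The following is a minimal sketch of that procedure, assuming a family-wise α of 0.05. The function name and the P values are hypothetical and are not taken from Table S3.

```python
def holm_sequential_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans marking which tests remain significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest P value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # once one test fails, every larger P value also fails
    return significant

# Hypothetical P values for four genes (illustrative only).
example_p = [0.001, 0.04, 0.03, 0.005]
result = holm_sequential_bonferroni(example_p)
print(result)
```

In this illustration the two smallest P values survive correction, whereas 0.03 and 0.04 would be significant only without correction. This mirrors how 18 nominally significant genes on the reptile branch were reduced to 6.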
We also performed the branch-site test with specific lineages
within reptiles, because previous research indicated that genes
of the IIS/TOR network may be under strong positive selection
in Squamata (lizards and snakes) (34) (Table S3). Overall, the
branch leading to Squamata had more genes under positive
selection (7 of 61 genes) than the branches
leading to crocodilians (n = 6), birds (n = 1), and turtles (n = 5)
when separate tests were conducted for each lineage, although
these differences were minor (Table S3). In additional tests using the clade
model, we found that snakes had larger ω relative to the rest of
the tree (paired Wilcoxon signed-rank test, V = 24, P = 0.04)
across the 15 IIS/TOR network genes that were significant after
multiple test correction.
turtle. Furthermore, in examining the three N-terminal amino
acids that are conserved across all binding proteins in humans
(49), only one of these amino acids was conserved in only two
snake species in IGFBP6, although all three sites were conserved
across the reptile IGFBP2-5. For those amino acids important
for binding IGFs and specific to IGFBP6 (49), only 7 of 12 are
conserved in reptiles. Two of these seven conserved amino acids
have additional functions beyond IGF binding, which requires
conservation (Fig. S3). These multiple lines of evidence suggest
that IGFBP6 does not function as an IGF binding protein across
the reptile clade.
Discussion
We conducted extensive evolutionary analyses of the IIS/TOR
network in amniotes (i.e., mammals and reptiles, including birds)
and uncovered fundamental differences between reptiles and
mammals in the evolution of this centrally important network.
Our analyses revealed that members of the IIS/TOR network
have exceptionally fast evolutionary rates between reptiles and
mammals compared with a proxy for the rest of the genome.
More specifically, the extracellular network is a target of positive
selection, and the location of the selected sites suggests changes
in the hormone-receptor binding relationships in reptile- and
mammal-specific patterns.
Members of IIS/TOR Network Are Outliers in Evolutionary Rate.
Members of the IIS/TOR network, especially the extracellular
hormones, receptors, and binding proteins, exhibit remarkably
high reptile-mammal divergence compared with control genes.
Our results complement those of ref. 24, who found that the IIS/
TOR network across human populations is enriched for genes
evolving under positive selection relative to a sample representing the genomic background. Across the amniote scale that
we examined, many evolutionary innovations have arisen (e.g.,
feathers/hair, leglessness, endothermy), and each was likely accompanied by substantial molecular evolution. However, within
the 1,478 total genes that we analyzed (61 IIS/TOR network
genes plus 1,417 non-IIS/TOR genes), the evolution of the IIS/
TOR extracellular network is a prominent outlier in reptile-mammal divergence. Our results provide additional evidence
that the phenotypes governed by this pathway, including metabolism and life histories, are key differences between reptiles
and mammals.
Our data show that multiple IIS/TOR genes are under positive
selection in one or more lineages of amniotes. Importantly, these
include genes that encode proteins in critical nodes of the IIS/
TOR network that mediate the intracellular signal (e.g., IRS and
PI3K) (5) and extracellular nodes that regulate the initiation of
the cascade (IGF1R, INSR, IGFBP4, IGFBP5, and IGF2R).
Although these genes are implicated in aging and disease phenotypes (3, 50), here we find they are also under positive selection
among amniote species. Because vertebrate IIS/TOR connects
with many other networks, we cannot directly compare our results
to studies in the more simplified invertebrate network (23, 26, 27).
However, our findings of elevated ωs agree with those of ref. 28
and supply further support that extracellular components are
among the fastest evolving genes in the IIS/TOR network (22),
as is likely true in other networks. Overall, our data are in
agreement with reports that receptors and other extracellular
components of signal transduction pathways appear to be under
less purifying selection than intracellular components (51–53).
Indeed, our data indicate that one potential driver of differences
in evolutionary rates among genes in the network may be the
number of interactions that a gene has with other genes or
proteins (i.e., connectivity; SI Results) similar to what has been
seen in other systems (54–56, but see refs. 27, 57, and 58).
Evolving Interactions in the Extracellular Network. Our data strongly
support the conclusion that many of the IIS/TOR extracellular
proteins have undergone positive selection. Detailed evaluation
of the protein structure of the hormones, receptors, and binding
proteins of IIS/TOR suggests that these binding relationships are
targets of clade-specific selection between mammals and reptiles.
In reptiles, structural evaluations indicated that residues on the
interacting binding surfaces of IGF1 C-domain and the IGF1R
CR-domain are under positive selection. Specifically, positively
selected amino acid sites identified by our models have previously
been shown to modulate binding when altered in humans (16).
More broadly, mutations predicted to affect this binding relationship are associated with longevity in humans (59) and model organisms (3). In contrast, positive selection on the IGF2 C-domain in
mammals suggests that IGF2 binding affinities with both INSR and
IGF1R may be targets of positive selection in mammals. The
positive selection putatively affecting hormone-receptor binding
relationships across amniote species equates to selection at the
cell surface start of the IIS/TOR signaling cascade. Functional
studies are necessary to further our understanding of the regulatory effects of these changes and the stability of the physiological roles of extracellular IIS/TOR proteins across amniotes.
Juvenile and adult IGF2 gene expression is observed in humans (60, 61) and fish (62), but not in adult mice and rats (the
typical vertebrate models for studying the IIS/TOR network)
(63). We found IGF1 and IGF2 gene expression in each of our
reptile transcriptomes, regardless of whether the source liver was
from a juvenile or an adult (Table S2). This observation underscores the importance of broad taxonomic sampling for understanding the function and evolution of pathways important to
human health—for which rodent models may not always be the
most appropriate. Together, our molecular evolution and expression data suggest that the IGF2 protein may have a more
stable role in IIS/TOR signaling across reptiles, in contrast to the
more variable and specialized roles of IGF2 across mammals (64).
IGF2R-IGF2 binding is believed to have evolved in therian
mammals for maternal regulation of paternally imprinted IGF2
(10). This hypothesis of mammalian-specific function was bolstered by early studies showing that IGF2R does not effectively
bind IGF2 in chicken (18, 21, 41), Xenopus (18), or monotremes
(10), and IGF2R has lower affinity for IGF2 in marsupials compared with placental mammals (19, 20). However, more sensitive
assays have indicated that IGF2R-IGF2 binding occurs in
chicken, trout, and garden lizards (42, 65, 66), which counters the
claim that measurable IGF2-IGF2R binding is confined to
mammals. Our data provide support for the hypothesis that
positive selection drove the high-affinity binding between IGF2R
and IGF2 in placental mammals relative to monotremes and
marsupials. Our data also call into question the assumption that
IGF2R does not bind IGF hormones in reptiles. IGF2R in
chicken contains a substitution thought to inhibit IGF2 binding
ability (isoleucine to leucine at 1572, I1572L) (67, 68). However,
our work shows that this amino acid is a conserved isoleucine in
many reptile species, even within other birds (66). Additionally,
many of the sites that are important for binding of IGF2 to
IGF2R in mammals are conserved across most reptiles in our
study. Because chickens have typically been used as the sole representative of the reptile clade, we suggest that this narrow sampling promoted the premature conclusion that IGF2 binds IGF2R
only in mammals. Further, in reptiles, we found a signal of coevolution between IGF2 and IGF2R in our CAPS and MMMvII
analyses. Additionally, we found three sites under positive selection
on the surface of the IGF1 that would likely promote binding with
IGF2R (34, 67). Thus, by extending the comparative genomic
landscape, we suggest that IGF-IGF2R binding may not be unique
to therian mammals but also may occur in some reptile species.
IGF binding proteins regulate the ability of hormones to activate receptors through steric hindrance, thereby limiting the
bioavailability of IGF1 and IGF2 to initiate the IIS/TOR signaling cascade (49). Intriguingly, we found that many reptile
species appear to have truncated or missing N-terminal domains
across the IGFBPs that would decrease IGF binding affinity.
Confirming results from ref. 8, IGFBP6 was not recovered from
opossum, platypus, or any bird or crocodile. When identified
in our other reptile transcriptomes and ENSEMBL-derived
been associated with longevity in humans (59, 78, 80–82).
Likewise, our comparative genomic analyses show that many
IIS/TOR genes are variable across amniotes and that the
binding affinities of IGF1, IGF1R, and INSR, and thereby the
initiation of IIS/TOR signaling, are likely affected. Future
comparative analyses of the IIS/TOR network across amniotes
and within reptiles may provide unique insights into the regulation of body size, reproductive investment (e.g., placentation), and rates of aging (83).
Materials and Methods
We used transcriptomic and genomic data across amniotes to evaluate
molecular evolution of the IIS/TOR pathway between reptiles and mammals.
All animal protocols were approved by the Iowa State University Institutional
Animal Care and Use Committee (log 3-2-5125J). De novo liver transcriptome
assembly was performed in Trinity (Table S2), and some gene sets were
obtained through past studies (Table S2). The longest ORF from each assembled transcript was used for defining homologs through OrthoMCL (29). Sequences within each putative ortholog were further clustered so that a single
transcript represented each ortholog from each species. Transcripts were
translated, and amino acid sequences were aligned with MSAProbs (84).
Alignments were back-translated to the original nucleic acids with RevTrans
(85) and trimmed of poorly aligned regions using Gblocks (86).
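The back-translation step can be illustrated with a minimal Python sketch. It threads each unaligned coding sequence onto its aligned protein so that every gap in the amino acid alignment becomes a three-base gap in the codon alignment. This is not RevTrans itself; the function name `back_translate` and the toy sequences are assumptions made only for illustration.

```python
def back_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto its aligned protein."""
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be three times the number of residues")
    # Split the coding sequence into consecutive codons.
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    # Emit a three-base gap for each protein gap, otherwise the next codon.
    return "".join(gap * 3 if aa == gap else next(codons) for aa in aligned_protein)


# Toy example (hypothetical sequences, not data from the study).
protein_alignment = {"sp1": "MK-L", "sp2": "MKAL"}
coding_sequences = {"sp1": "ATGAAACTG", "sp2": "ATGAAGGCTCTG"}
codon_alignment = {
    name: back_translate(protein_alignment[name], coding_sequences[name])
    for name in protein_alignment
}
```

Because gaps are inserted in whole codons, the reading frame is preserved, which is what codon-based models of sequence evolution require.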
These cleaned nucleotide alignments were analyzed for molecular evolutionary parameters and models of sequence evolution in PAML (31). Positively selected sites for extracellular genes were predicted for reptiles and
mammals using the branch-site model in PAML. Sites with signatures of
positive selection were evaluated for putative functional significance on
human protein structures from the Protein Data Bank (PDB) or predicted
reptile structures from homology modeling of snake sequences onto human
structures. Hormone and IGF2R amino acid alignments were used for coevolution analyses with CAPS (46) [significance of permutations (P < 0.01)
detailed in SI Materials and Methods] and MMMvII (47) (tolerance level: 0.2).
We describe each of these steps in detail in SI Materials and Methods.
Comparative Genomics Approach. The insights our study provides
into the evolution of the IIS/TOR network were previously unattainable because adequate molecular resources in reptiles were lacking. Our
work adds to the recent discoveries of rapid evolution of genes
involved in development and metabolism in the branch leading
to modern snakes (71) and of regulatory innovation in IGFBP2
and IGFBP5 in the branch leading to modern birds (72). Although de novo transcriptome assemblies may not fully reveal all
biologically important signals in data (such as species-specific
isoforms and very recent paralogs) (73), when combined with
available genomes, ours revealed insights into the evolution of
the IIS/TOR network. Although the core of the IIS/TOR network is conserved in animals (4, 5), we found high divergence
and selection on genes in this network between mammals and
their sister clade reptiles (including birds). The extracellular
genes of this network had exceptionally fast divergence between
reptiles and mammals relative to genomic background, and many
genes have been shaped by positive selection. Hormones, receptors, and binding proteins that are essential for producing a
physiological response to environmental stimuli have undergone
taxon-specific patterns of positive selection. Our results suggest
that key paralogs have subfunctionalized or neofunctionalized
between reptiles and mammals and that this network may underlie
fundamental life history and physiological differences between
these clades.
In a larger context, the strength of comparative biology in understanding human health and disease lies in its power to distinguish conserved vs. flexible mechanisms of normal and disease
states and thereby suggest worthy targets of biomedical research
into future interventions (74, 75). For example, lifespan extension
is observed with mutant IGF1, IGF1R, and IRS across diverse
model species (3, 76–79)—where a shared effect on IIS/TOR
signaling is to either decrease rates of signaling by disrupting
protein-protein interactions or to decrease normal levels of
hormone or receptor. In addition, the IIS/TOR network has
ACKNOWLEDGMENTS. We thank D. Warner, R. Telemeco, A. Cordero,
N. Ford, K. Wray, T. Owerkowicz, and C. Watson for contributing specimens;
E. Tillier for advice on MMMvII; and A. Brown, J. P. de Magalhaes, and an
anonymous reviewer for useful comments. We thank the Baylor College of
Medicine and The Genome Institute at Washington University in St. Louis for
use of the unpublished genomic sequence. We thank the Broad Institute
Genomics Platform, Vertebrate Genome Biology group, J. Alfoldi, and
K. Lindblad-Toh for making the Mustela putorius and Microtus ochrogaster
data available. We are grateful for resources from the University of Minnesota
Supercomputing Institute, University of Alabama at Birmingham Office of
Energetics, and the Iowa State University High Performance Computing facility.
This research was supported by National Science Foundation (NSF) Grants IOS-0922528 and IOS-1253896 (to A.M.B.) and DEB-DDIG-1011350 (to A.M.B.
and T.S.S.) and grants from the Iowa State University Center for Integrated
Animal Genomics (to A.M.B. and F.J.J.). We acknowledge additional support
from the NSF (Graduate Research Fellowship to S.E.M.), the James S. McDonnell
Foundation (postdoctoral fellowship to T.S.S.), the Howard Hughes Medical Institute (postdoctoral support to E.A.A.), and Academia Sinica (C.H.K.).
1. Wullschleger S, Loewith R, Hall MN (2006) TOR signaling in growth and metabolism.
Cell 124(3):471–484.
2. Zoncu R, Efeyan A, Sabatini DM (2011) mTOR: From growth signal integration to
cancer, diabetes and ageing. Nat Rev Mol Cell Biol 12(1):21–35.
3. Kenyon CJ (2010) The genetics of ageing. Nature 464(7288):504–512.
4. Oldham S (2011) Obesity and nutrient sensing TOR pathway in flies and vertebrates: Functional conservation of genetic mechanisms. Trends Endocrinol Metab
22(2):45–52.
5. Taniguchi CM, Emanuelli B, Kahn CR (2006) Critical nodes in signalling pathways:
Insights into insulin action. Nat Rev Mol Cell Biol 7(2):85–96.
6. Olinski RP, Lundin L-G, Hallböök F (2006) Conserved synteny between the Ciona genome and human paralogons identifies large duplication events in the molecular
evolution of the insulin-relaxin gene family. Mol Biol Evol 23(1):10–22.
7. Hernández-Sánchez C, Mansilla A, de Pablo F, Zardoya R (2008) Evolution of the insulin receptor family and receptor isoform expression in vertebrates. Mol Biol Evol
25(6):1043–1053.
8. Daza DO, Sundström G, Bergqvist CA, Duan C, Larhammar D (2011) Evolution of the insulin-like growth factor binding protein (IGFBP) family. Endocrinology 152(6):2278–2289.
9. O’Neill MJ, et al. (2007) Ancient and continuing Darwinian selection on insulin-like
growth factor II in placental fishes. Proc Natl Acad Sci USA 104(30):12404–12409.
10. Killian JK, et al. (2000) M6P/IGF2R imprinting evolution in mammals. Mol Cell 5(4):
707–716.
11. Schwartz TS, Bronikowski AM (2011) Molecular stress pathways and the evolution of
life histories in reptiles. Molecular Mechanisms of Life History Evolution, ed Heyland F
(Oxford Univ Press, Oxford, UK).
12. de Magalhães JP, Toussaint O (2002) The evolution of mammalian aging. Exp Gerontol 37(6):769–775.
13. Swanson EM, Dantzer B (2014) Insulin-like growth factor-1 is associated with life-history variation across Mammalia. Proc Royal Soc B Biol Sci 281(1782):20132458.
14. Sparkman AM, Vleck CM, Bronikowski AM (2009) Evolutionary ecology of endocrine-mediated life-history variation in the garter snake Thamnophis elegans. Ecology
90(3):720–728.
15. Sparkman AM, Byars D, Ford NB, Bronikowski AM (2010) The role of insulin-like
growth factor-1 (IGF-1) in growth and reproduction in female brown house snakes
(Lamprophis fuliginosus). Gen Comp Endocrinol 168(3):408–414.
16. Denley A, Cosgrove LJ, Booker GW, Wallace JC, Forbes BE (2005) Molecular interactions of the IGF system. Cytokine Growth Factor Rev 16(4-5):421–439.
17. Ghosh P, Dahms NM, Kornfeld S (2003) Mannose 6-phosphate receptors: New twists in
the tale. Nat Rev Mol Cell Biol 4(3):202–212.
18. Clairmont KB, Czech MP (1989) Chicken and Xenopus mannose 6-phosphate receptors
fail to bind insulin-like growth factor II. J Biol Chem 264(28):16390–16392.
19. Dahms NM, Brzycki-Wessell MA, Ramanujam KS, Seetharam B (1993) Characterization
of mannose 6-phosphate receptors (MPRs) from opossum liver: Opossum cation-independent MPR binds insulin-like growth factor-II. Endocrinology 133(2):440–446.
PNAS | June 2, 2015 | vol. 112 | no. 22 | 7059
genomic data, the N terminus of the protein was truncated.
These data suggest that across reptiles, IGFBP6 is not functioning as an IGF binding protein. Like IGF2-IGF2R binding,
IGF2-IGFBP6 binding in mammals functions to regulate IGF2
levels during embryo development in placental mammals (69).
The putative loss of this regulatory mechanism in both reptiles and
some nonplacental mammals is particularly interesting given that
placentation has evolved not only in mammals but also in various
snake and lizard species (70). Thus, our data suggest that in many
reptiles (i) IGFBP6 has been lost, (ii) IGF2R binds IGF hormones, and (iii) novel positive selection characterizes IGF1-IGF1R
binding. Therefore, future functional assays should address the
role of IIS/TOR extracellular signaling in the evolution of
viviparity and placentation in Squamates, relative to that in
placental mammals (10) and placental fish (9).
20. Yandell CA, Dunbar AJ, Wheldrake JF, Upton Z (1999) The kangaroo cation-independent mannose 6-phosphate receptor binds insulin-like growth factor II with low
affinity. J Biol Chem 274(38):27076–27082.
21. Canfield WM, Kornfeld S (1989) The chicken liver cation-independent mannose
6-phosphate receptor lacks the high affinity binding site for insulin-like growth factor
II. J Biol Chem 264(13):7100–7103.
22. Alvarez-Ponce D, Aguadé M, Rozas J (2013) Comment on “The molecular evolutionary
patterns of the Insulin/FOXO signaling pathway”. Evol Bioinform Online 9:229–234.
23. Alvarez-Ponce D, et al. (2012) Molecular population genetics of the insulin/TOR signal
transduction pathway: A network-level analysis in Drosophila melanogaster. Mol Biol
Evol 29(1):123–132.
24. Luisi P, et al. (2012) Network-level and population genetics analysis of the insulin/TOR
signal transduction pathway across human populations. Mol Biol Evol 29(5):1379–1392.
25. Alvarez-Ponce D, Aguadé M, Rozas J (2011) Comparative genomics of the vertebrate
insulin/TOR signal transduction pathway: A network-level analysis of selective pressures. Genome Biol Evol 3:87–101.
26. Alvarez-Ponce D, Aguadé M, Rozas J (2009) Network-level molecular evolutionary
analysis of the insulin/TOR signal transduction pathway across 12 Drosophila genomes. Genome Res 19(2):234–242.
27. Jovelin R, Phillips PC (2011) Expression level drives the pattern of selective constraints
along the insulin/Tor signal transduction pathway in Caenorhabditis. Genome Biol
Evol 3:715–722.
28. Wang M, et al. (2013) The molecular evolutionary patterns of the Insulin/FOXO signaling pathway. Evol Bioinform Online 9:1–16.
29. Li L, Stoeckert CJ, Jr, Roos DS (2003) OrthoMCL: Identification of ortholog groups for
eukaryotic genomes. Genome Res 13(9):2178–2189.
30. Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic
Acids Res 28(1):27–30.
31. Yang Z (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol
24(8):1586–1591.
32. Self SG, Liang K-L (1987) Asymptotic properties of maximum likelihood estimators and
likelihood ratio tests under nonstandard conditions. J Am Stat Assoc 82(398):605–610.
33. Goldman N, Whelan S (2000) Statistical tests of gamma-distributed rate heterogeneity in models of sequence evolution in phylogenetics. Mol Biol Evol 17(6):975–978.
34. Sparkman AM, et al. (2012) Rates of molecular evolution vary in vertebrates for insulin-like growth factor-1 (IGF-1), a pleiotropic locus that regulates life history traits.
Gen Comp Endocrinol 178(1):164–173.
35. Wahren J (2004) C-peptide: New findings and therapeutic implications in diabetes.
Clin Physiol Funct Imaging 24(4):180–189.
36. Keyhanfar M, Booker GW, Whittaker J, Wallace JC, Forbes BE (2007) Precise mapping
of an IGF-I-binding site on the IGF-1R. Biochem J 401(1):269–277.
37. Epa VC, Ward CW (2006) Model for the complex between the insulin-like growth
factor I and its receptor: Towards designing antagonists for the IGF-1 receptor. Protein Eng Des Sel 19(8):377–384.
38. Bayne ML, et al. (1989) The C region of human insulin-like growth factor (IGF) I is
required for high affinity binding to the type 1 IGF receptor. J Biol Chem 264(19):
11004–11008.
39. Denley A, et al. (2004) Structural determinants for high-affinity binding of insulin-like
growth factor II to insulin receptor (IR)-A, the exon 11 minus isoform of the IR. Mol
Endocrinol 18(10):2502–2512.
40. Zhang W, Gustafson TA, Rutter WJ, Johnson JD (1994) Positively charged side chains
in the insulin-like growth factor-1 C- and D-regions determine receptor binding
specificity. J Biol Chem 269(14):10609–10613.
41. Yang YW, Robbins AR, Nissley SP, Rechler MM (1991) The chick embryo fibroblast
cation-independent mannose 6-phosphate receptor is functional and immunologically related to the mammalian insulin-like growth factor-II (IGF-II)/man 6-P receptor
but does not bind IGF-II. Endocrinology 128(2):1177–1189.
42. Koduru S, Yadavalli S, Nadimpalli SK (2006) Mannose 6-phosphate receptor (MPR 300)
proteins from goat and chicken bind human IGF-II. Biosci Rep 26(2):101–112.
43. Brown J, Jones EY, Forbes BE (2009) Interactions of IGF-II with the IGF2R/cation-independent mannose-6-phosphate receptor mechanism and biological outcomes.
Vitam Horm 80:699–719.
44. Williams C, et al. (2012) An exon splice enhancer primes IGF2:IGF2R binding site
structure and function evolution. Science 338(6111):1209–1213.
45. Brown J, et al. (2008) Structure and functional analysis of the IGF-II/IGF2R interaction.
EMBO J 27(1):265–276.
46. Fares MA, McNally D (2006) CAPS: Coevolution analysis using protein sequences. Bioinformatics 22(22):2821–2822.
47. Rodionov A, Bezginov A, Rose J, Tillier ER (2011) A new, fast algorithm for detecting
protein coevolution using maximum compatible cliques. Algorithms Mol Biol 6(1):17.
48. Duan C, Xu Q (2005) Roles of insulin-like growth factor (IGF) binding proteins in
regulating IGF actions. Gen Comp Endocrinol 142(1-2):44–52.
49. Forbes BE, McCarthy P, Norton RS (2012) Insulin-like growth factor binding proteins:
A structural perspective. Front Endocrinol (Lausanne) 3:38.
50. Moloney AM, et al. (2010) Defects in IGF-1 receptor, insulin receptor and IRS-1/2 in
Alzheimer’s disease indicate possible resistance to IGF-1 and insulin signalling. Neurobiol Aging 31(2):224–243.
51. Han M, et al. (2013) Evolutionary rate patterns of genes involved in the Drosophila
Toll and Imd signaling pathway. BMC Evol Biol 13(1):245.
52. Song X, Jin P, Qin S, Chen L, Ma F (2012) The evolution and origin of animal Toll-like
receptor signaling pathway revealed by network-level molecular evolutionary analyses. PLoS ONE 7(12):e51657.
53. Cui Q, Purisima EO, Wang E (2009) Protein evolution on a human signaling network.
BMC Syst Biol 3(1):21.
54. Montanucci L, Laayouni H, Dall’Olio GM, Bertranpetit J (2011) Molecular evolution
and network-level analysis of the N-glycosylation metabolic pathway across primates.
Mol Biol Evol 28(1):813–823.
55. Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW (2002) Evolutionary rate
in the protein interaction network. Science 296(5568):750–752.
56. Fraser HB, Wall DP, Hirsh AE (2003) A simple dependence between protein evolution
rate and the number of protein-protein interactions. BMC Evol Biol 3(1):11.
57. Bloom JD, Adami C (2004) Evolutionary rate depends on number of protein-protein
interactions independently of gene expression level: Response. BMC Evol Biol 4(1):14.
58. Larracuente AM, et al. (2008) Evolution of protein-coding genes in Drosophila. Trends
Genet 24(3):114–123.
59. Suh Y, et al. (2008) Functionally significant insulin-like growth factor I receptor mutations in centenarians. Proc Natl Acad Sci USA 105(9):3438–3442.
60. Hawkes C, Kar S (2004) The insulin-like growth factor-II/mannose-6-phosphate receptor: Structure, distribution and function in the central nervous system. Brain Res
Brain Res Rev 44(2-3):117–140.
61. Russo VC, Gluckman PD, Feldman EL, Werther GA (2005) The insulin-like growth
factor system and its pleiotropic functions in brain. Endocr Rev 26(7):916–943.
62. Yuan X-N, Jiang X-Y, Pu J-W, Li Z-R, Zou S-M (2011) Functional conservation and divergence of duplicated insulin-like growth factor 2 genes in grass carp (Ctenopharyngodon idellus). Gene 470(1-2):46–52.
63. Brown AL, et al. (1986) Developmental regulation of insulin-like growth factor II
mRNA in different rat tissues. J Biol Chem 261(28):13144–13150.
64. Killian JK, et al. (2001) Monotreme IGF2 expression and ancestral origin of genomic
imprinting. J Exp Zool 291(2):205–212.
65. Méndez E, Planas JV, Castillo J, Navarro I, Gutiérrez J (2001) Identification of a type II
insulin-like growth factor receptor in fish embryos. Endocrinology 142(3):1090–1097.
66. Sivaramakrishna Y, Amancha PK, Siva Kumar N (2009) Reptilian MPR 300 is also the
IGF-IIR: Cloning, sequencing and functional characterization of the IGF-II binding
domain. Int J Biol Macromol 44(5):435–440.
67. Zhou M, Ma Z, Sly WS (1995) Cloning and expression of the cDNA of chicken
cation-independent mannose-6-phosphate receptor. Proc Natl Acad Sci USA
92(21):9762–9766.
68. Garmroudi F, Devi G, Slentz DH, Schaffer BS, MacDonald RG (1996) Truncated forms
of the insulin-like growth factor II (IGF-II)/mannose 6-phosphate receptor encompassing the IGF-II binding site: Characterization of a point mutation that abolishes
IGF-II binding. Mol Endocrinol 10(6):642–651.
69. Gadd TS, Osgerby JC, Wathes DC (2002) Regulation of insulin-like growth factor
binding protein-6 expression in the reproductive tract throughout the estrous cycle and
during the development of the placenta in the ewe. Biol Reprod 67(6):1756–1762.
70. Murphy BF, Thompson MB (2011) A review of the evolution of viviparity in squamate
reptiles: The past, present and future role of molecular biology and genomics. J Comp
Physiol B 181(5):575–594.
71. Castoe TA, et al. (2013) The Burmese python genome reveals the molecular basis for
extreme adaptation in snakes. Proc Natl Acad Sci USA 110(51):20645–20650.
72. Lowe CB, Clarke JA, Baker AJ, Haussler D, Edwards SV (2015) Feather development
genes and associated regulatory innovation predate the origin of Dinosauria. Mol
Biol Evol 32(1):23–28.
73. Vijay N, Poelstra JW, Künstner A, Wolf JBW (2013) Challenges and strategies in
transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments. Mol Ecol 22(3):620–634.
74. Austad SN (2010) Cats, “rats,” and bats: The comparative biology of aging in the 21st
century. Integr Comp Biol 50(5):783–792.
75. Alberts SC, et al. (2013) Reproductive aging patterns in primates reveal that humans
are distinct. Proc Natl Acad Sci USA 110(33):13440–13445.
76. Yamamoto R, Tatar M (2011) Insulin receptor substrate chico acts with the transcription factor FOXO to extend Drosophila lifespan. Aging Cell 10(4):729–732.
77. Bartke A (2008) Impact of reduced insulin-like growth factor-1/insulin signaling on
aging in mammals: Novel findings. Aging Cell 7(3):285–290.
78. Tacutu R, et al. (2013) Human Ageing Genomic Resources: Integrated databases and
tools for the biology and genetics of ageing. Nucleic Acids Res 41(Database issue, D1):
D1027–D1033.
79. Li Y, de Magalhães JP (2013) Accelerated protein evolution analysis reveals genes and
pathways associated with the evolution of mammalian longevity. Age (Dordr) 35(2):
301–314.
80. Soerensen M, et al. (2012) Human longevity and variation in GH/IGF-1/insulin signaling, DNA damage signaling and repair and pro/antioxidant pathway genes: Cross
sectional and longitudinal studies. Exp Gerontol 47(5):379–387.
81. Ziv E, Hu D (2011) Genetic variation in insulin/IGF-1 signaling pathways and longevity.
Ageing Res Rev 10(2):201–204.
82. de Magalhães JP (2014) Why genes extending lifespan in model organisms have not
been consistently associated with human longevity and what it means to translation
research. Cell Cycle 13(17):2671–2673.
83. Miller DAW, Janzen FJ, Fellers GM, Kleeman PM, Bronikowski A (2014) Biodemography
of ectothermic tetrapods provides insights into the evolution and plasticity of mortality
trajectories. Sociality, Hierarchy, Health: Comparative Demography Advances in Biodemography, eds Weinstein M, Lane MA (The National Academies Press, Washington, DC).
84. Liu Y, Schmidt B, Maskell DL (2010) MSAProbs: Multiple sequence alignment based
on pair hidden Markov models and partition function posterior probabilities. Bioinformatics 26(16):1958–1964.
85. Wernersson R, Pedersen AG (2003) RevTrans: Multiple alignment of coding DNA from
aligned amino acid sequences. Nucleic Acids Res 31(13):3537–3539.
86. Castresana J (2000) Selection of conserved blocks from multiple alignments for their
use in phylogenetic analysis. Mol Biol Evol 17(4):540–552.
Supporting Information
McGaugh et al. 10.1073/pnas.1419659112
SI Text
Summary of New Resources Available. For the 18 liver transcriptomes
we generated, the raw reads can be found at the NCBI Sequence
Read Archive (SRA062458 at www.ncbi.nlm.nih.gov/sra/?term=
SRA062458 and SRP017466 at www.ncbi.nlm.nih.gov/sra/?term=
SRP017466). Transcriptome assemblies, annotation summaries, and
alignments for protein coevolution analyses are available through
Dryad (dx.doi.org/10.5061/dryad.vn872). Individual identifiers for
these data can be found under citation in Table S2. The Dryad
repository contains the following resources.
i) The transcriptome assembly for each of the 18 individuals
sequenced. These assemblies contain the longest ORFs produced by Trinity, which were then clustered by UCLUST into
centroids to reduce redundancy within a single species’ transcriptome. A centroid may have collapsed multiple isoforms,
truncated transcripts, and alleles from a gene, but it may also
have collapsed very recent paralogs.
ii) Trinotate annotation databases for each individual. The IDs
in the database correspond to the centroid IDs in the transcriptome assembly described above.
iii) Putative ortholog amino acid alignments and corresponding
nucleotide alignments. We used OrthoMCL to cluster ORF
centroids into putative orthologs from all of the species included in this study. Data are available as separate files for
each ortholog (104,235 total orthologs with two or more
species). Additionally, we included a spreadsheet showing
the best BLAST hit of each putative ortholog cluster to
the UniProt database.
iv) “Best” ortholog amino acid and nucleotide alignments. The
104,235 putative orthologs described above often contained
more than two representative sequences per species. For the
first 15,000 putative orthologs (those with the most species
included in the alignments), we used UCLUST to find the
best representative per species per ortholog by taking the
sequence that was closest to the centroid for that ortholog.
v) The final nucleotide and amino acid alignments for the 1417
“control genes.”
vi) The hand-curated nucleotide and amino acid alignments for
61 IIS/TOR network genes.
SI Materials and Methods
Sample Collection. Animals or tissues used in this study were
provided by colleagues or our research colonies. Each individual
was maintained at or shipped to Iowa State University (ISU). In
agreement with ISU Institutional Animal Care and Use Committee protocol 3-2-5125J, animals were euthanized by decapitation, exsanguinated, and dissected with relevant organs snap
frozen. The exceptions were the cottonmouth and alligator
(Agkistrodon piscivorus and Alligator mississippiensis), which were
euthanized onsite in Texas and California, respectively, following
our established protocol; snap-frozen tissues were sent to ISU.
The animals used were of a variety of ages and both sexes; thus,
findings reported here are robust to variation in transcripts that
depend on age, sex, and rearing condition (Table S2).
Tissue and RNA Extraction and Sequencing. Total RNA was isolated
from 12 to 19 mg of snap-frozen liver from each of 18 individuals
from 17 species: a single individual for 16 species and two different
ecotypes from one species for Thamnophis elegans (Table S2 and
Fig. S2). We followed standard protocols, including the Qiagen
RNeasy kit (Qiagen cat. no. 74104) with a DNA digestion on the
membrane, as described in the manual. The quality and quantity of
RNA were determined on an Agilent Bioanalyzer using a NanoRNA chip. For each sample, 1 μg total RNA was sent to the Duke
Genome Sequencing and Analysis Core Resource for library
preparation and to generate 100-bp paired-end reads using an
Illumina HiSeq 2000 with TruSeq v3 chemistry with a standard
insert size distribution. The library preparation protocol was
based on the technical document TruSeq_RNA_SamplePrep_
Guide_15008136_A. Individual libraries were uniquely barcoded
(indexed), and quality was checked on the Bioanalyzer DNA100
chip. For 15 non–garter snake species, five indexed libraries were
pooled in each lane, and ∼8 pM of library pool was deposited on
each lane. Because garter snakes (Thamnophis spp.) are focal
species in our laboratory, the two Thamnophis species (three
samples) were sequenced more deeply. The Thamnophis couchii indexed library was pooled with separately indexed libraries
from two individual Thamnophis elegans of different ecotypes
(1) (meadow and lakeshore in Table S2). This Thamnophis pool
(one T. couchii and two T. elegans individuals) was sequenced
twice, resulting in larger amounts of data available overall for
these two species. None of the libraries were normalized. The
raw reads for the 15 non-Thamnophis species are available from
the SRA under accession SRA062458. The raw reads for the three
garter snake liver transcriptomes (i.e., one from T. couchii and
two from T. elegans) are available under accession SRP017466 (samples
HS08, HS11, and TC).
Processing and de Novo Assembly of Reads. For de novo assembly of
each species’ transcriptome, we used the Trinity version released
on February 25, 2013 (2). Original reads were processed with the
Fastx toolkit (hannonlab.cshl.edu/fastx_toolkit/), Cutadapt (3), and
Trimmomatic (4), as follows.
i) Fastx_trimmer was used to remove the first base, as Illumina
personnel indicate that this base can be unreliable (Gary
Schroth).
ii) Cutadapt was used to trim adapters from the 3′ ends of reads
with an allowed error rate of 0.01.
iii) Trimmomatic was used to remove reads with sliding windows of 6 bp that had average quality scores of 30 or less,
and then reads less than 30 bp in length were removed.
From this point, reads that were orphaned (only the left or the
right remained after processing) were removed from the left and
right read files. These reads were placed at the end of the left read
files, as specified in the Trinity manual. All default settings were
kept for transcriptome assembly.
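The quality filtering and orphan handling described above can be sketched in Python. This is an illustration under stated assumptions, not the behavior of Trimmomatic itself: we assume that a read is cut at the first 6-bp window whose mean quality is 30 or less, that reads shorter than 30 bp are then discarded, and that a mate whose partner was discarded is set aside as an orphan. The names `sliding_window_trim` and `split_pairs` are hypothetical.

```python
from typing import List, Tuple

Read = Tuple[str, List[int]]  # (bases, per-base Phred quality scores)


def sliding_window_trim(read: Read, window: int = 6, max_bad_mean: float = 30.0) -> Read:
    """Cut the read at the first window whose mean quality is max_bad_mean or lower."""
    seq, quals = read
    for start in range(len(seq) - window + 1):
        if sum(quals[start:start + window]) / window <= max_bad_mean:
            return seq[:start], quals[:start]
    return seq, quals


def split_pairs(pairs: List[Tuple[Read, Read]], min_len: int = 30):
    """Return (intact pairs, orphaned reads) after trimming and length filtering."""
    paired, orphans = [], []
    for left, right in pairs:
        left_t, right_t = sliding_window_trim(left), sliding_window_trim(right)
        keep_left = len(left_t[0]) >= min_len
        keep_right = len(right_t[0]) >= min_len
        if keep_left and keep_right:
            paired.append((left_t, right_t))
        elif keep_left:
            orphans.append(left_t)   # right mate was discarded
        elif keep_right:
            orphans.append(right_t)  # left mate was discarded
    return paired, orphans
```

Keeping intact pairs and orphans in separate lists mirrors the description above, in which orphaned reads are removed from the paired files and appended to the end of the left read file.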
Transcriptome Quality Assessment and Annotation. We sequenced
33.73–140.95 million reads per species (mean: 50.23; median:
42.10). Reads were assembled into 87,016–221,818 contigs using
Trinity (mean: 155,855; median: 165,685). Contigs shorter than
200 bp were excluded (5). Table S2 contains statistics about the
Trinity assemblies.
To evaluate the quality of a transcriptome assembly, we aligned
the assembled Trinity transcripts to the proteins of the UniProtKB/
Swiss-Prot database downloaded on March 21, 2013 using blastx with
an E-value cutoff of 1e-20 and allowing only a single target sequence
to be reported. Next, we determined the percent of the UniProtKB/
Swiss-Prot protein that aligned to the best matching Trinity transcript
through the perl script analyze_blastPlus_topHit_coverage.pl provided through Trinity.
Likely coding regions (ORFs) were extracted from Trinity
transcripts using Transdecoder. Transdecoder identified between
25,945 and 113,672 best ORFs (mean: 65,766; median: 72,152).
Transcriptome size of the best ORFs identified in Transdecoder
ranged from 27.80 to 113.60 Mb (mean = 69.54 Mb; median =
78.65 Mb), indicating ∼57- to 269-fold coverage when considering the amount of filtered and trimmed data input into Trinity
(range, 5.21–11.55 Gb; mean: 6.80 Gb; median = 6.43 Gb).
These ORFs were clustered into centroids using USEARCH (6)
separately for each transcriptome (see below for a more detailed
description).
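The fold-coverage figures above follow from dividing the amount of filtered input data by the size of the assembled ORF set. A minimal sketch of that arithmetic, assuming 1 Gb = 10^9 bases and 1 Mb = 10^6 bases (the function name `fold_coverage` is hypothetical):

```python
def fold_coverage(input_gb: float, transcriptome_mb: float) -> float:
    """Sequenced bases divided by assembled ORF bases."""
    return (input_gb * 1e9) / (transcriptome_mb * 1e6)


# Using the reported means (6.80 Gb of input, 69.54 Mb of best ORFs)
# gives roughly 98-fold coverage.
mean_based = fold_coverage(6.80, 69.54)
```

This value is computed from the two means and is only indicative; the reported range (∼57- to 269-fold) comes from per-species ratios, and a ratio of means need not equal the mean of those ratios.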
The coding sequences of the peptides produced by Transdecoder and the centroids were also analyzed with the analyze_
blastPlus_topHit_coverage.pl script provided by Trinity to determine the percent length of coverage for the top hit in the
UniProtKB/Swiss-Prot database. We conducted this analysis on
the best ORF sequences and separately on the centroids to examine whether the Transdecoder or USEARCH processes resulted in ORFs that spanned a greater percent length of their best
blast hit relative to the originally produced Trinity transcript
contigs. Blastx analysis of the original Trinity transcripts to the
UniProtKB/Swiss-Prot database resulted in an average of 54.10%
(SD = 5.82%; median = 55.19%) of transcripts that matched a
hit in the UniProtKB/Swiss-Prot database, covering at least 80%
of the length of their best blast hit. This number increased
slightly when the best ORF transcriptomes provided by Transdecoder (average: 56.30%; SD: 5.50%; median: 56.64%) or the
USEARCH centroids (average: 58.00%; SD: 5.74%; median:
58.41%) were used in the Blastx analysis.
Last, because the Anolis carolinensis genome is published, we
examined the percent length of transcripts from the best ORF
analysis from Anolis sagrei, which aligned to the Anolis carolinensis
genome, using BLAT (7) (an alignment tool similar to BLAST) to
provide a complementary measure of how many full-length transcripts were assembled. We did not do this for Alligator because
this genome is less complete and low-length measures can be a
reflection solely of a fragmented genome assembly.
We aligned Anolis sagrei Trinity-assembled Transdecoder-filtered RNAseq data to the Anolis carolinensis genome v2.0 scaffolds. From this, we found that 67% of transcripts
aligned over at least 95% of their length with at least 80%
identity, suggesting that ∼67% of our transcripts represent nearly
full-length transcripts. Interestingly, 89.5% of transcripts aligned
over at least 25% of their length, and only 51.3% of transcripts
aligned over 99% of their length, indicating that, although many
of our transcripts are present in the Anolis carolinensis genome, our
assembly of RNAseq data did not capture all full-length transcripts.
These percentages were comparable for the centroids (65.4%,
88.8%, and 49.3%, respectively).
The peptides from Transdecoder and centroids created in
USEARCH were annotated with the Trinotate pipeline, which
incorporates homology searches, protein domain identification,
protein signal prediction, and evaluation with EMBL Uniprot
eggNOG and GO Pathways databases. Specifically, we used
Trinotate to run blastp to find the top hit in the UniProtKB/Swiss-Prot database (maximum e-value cutoff 0.001), HMMER to query
the PFAM database downloaded on March 29, 2013, signalP to
predict the presence and location of signal peptide cleavage sites,
and tmHMM to predict transmembrane helices in proteins. The
final Trinotate report was made with an e-value cutoff of 0.001 for
reporting the best blast hit and additional annotations. On average,
77.62% of the best ORFs had matches in UniProtKB/Swiss-Prot
database (maximum e-value cutoff of 0.001), 61.17% had matches
in the PFAM database, 5.73% had matches in signalP, and 12.38%
had matches in tmHMM. On average, 18.3% of centroids
were left with no annotation from any procedures performed
(range, 10.43–26.51%). All Trinotate annotation databases are
publicly available on Dryad: dx.doi.org/10.5061/dryad.vn872.
Identifying Candidate Orthologs and Generating Multiple Species
Alignments. For any comparative evolutionary analysis, identification
of putative orthologs and accurate alignment are essential
but can be extremely challenging due to paralogs and alternative
splicing. In addition, we found that in some cases, a particular
species may have Trinity transcripts that blasted with high confidence to the particular gene of interest, but this species was
unrepresented in our final multiple species alignments because
Transdecoder did not include the transcript from that particular
gene in its best ORF candidate file. To avoid this complication, we
only used ORFs from the longest ORF file and not the best ORF
predictions.
We reduced overlap between the ORFs for each individual
species using USEARCH (6) with an identity threshold of 95%
of the nucleotide sequences sorted by length (gaps are counted
as differences in USEARCH). Because our goal was to cluster
isoforms to have one representative sequence per gene, we reduced the gap penalties to the settings -gapopen 5I/1E -gapext
0.1I/0.1E. These clustered centroids were used for all subsequent
analyses.
For these clustered ORFs for each species (centroids from
USEARCH), we identified putative 1:1 orthologs across species
using OrthoMCL (8), a program that is based on reciprocal best
blast hits. We analyzed a dataset that contained 74 total samples:
the 18 samples from our transcriptome project and 56 additional
transcriptomes and gene sets available from genome projects
and other past studies (Table S2). These literature-derived transcriptomes were made with various technologies and sometimes
pools of individuals. We used the transcriptome assemblies provided by the authors in all cases. Transdecoder and USEARCH
were run on literature-derived transcriptomes and RNA sets
downloaded from NCBI. Ensembl protein sets and associated
cDNAs were downloaded from the Ensembl website and used
without additional processing steps. Species from Ensembl, where
the protein or gene datasets contained large contiguous stretches
of unknown bases, were not included in our analysis. All amino
acid and corresponding nucleotide clusters are available as separate files (104,235 total orthologs with two or more species) on
Dryad along with a spreadsheet showing the best blast hit of each
ortholog cluster to the uniprot database. In total, we started with
74 species, but pared this to 66 species for the alignments because
the additional eight species were not well represented. These eight
species (as named in the alignments: Python, Quail, Phrynops,
Tuatara, Caiman, Caretta, Elaphe, and Emys) generally had lower
quality or quantity of reads mined from previous studies, and all
74 species are represented in the original alignment data available
through Dryad: dx.doi.org/10.5061/dryad.vn872.
We focused our analysis on 61 genes in the IIS/TOR network.
The final set of genes (Fig. S1 and Tables S1 and S3) was determined by presence in KEGG pathways for Human Insulin
Signaling (KEGG 04910) and Human mTOR (KEGG 04150) (9,
10), connections with Panther Pathways for MAP kinase cascade
and insulin/IGF pathway-protein kinase B signaling cascade,
and/or previous publications (11). We specifically wanted to include the extracellular hormones, receptors, and binding proteins
in the insulin signaling network, which had not previously been
included.
To identify this focal set of genes in our OrthoMCL orthologs,
we performed two searches using Blastp. We made a reference
gene set from the KEGG proteins from chicken or anole. This
reference gene set was used as a blast database, and Blastp was
used to find hits of our translated orthologs to the KEGG-derived
protein blast database with an e-value cutoff of 1e-5. We also
required a percent identity of at least 50% and at least 60% of our
ortholog to align to the KEGG protein. Second, we conducted a
Blastp search using uniprot as the blast database. We used Blastp
to identify the best hit in the uniprot blast database for each of our
OrthoMCL-defined orthologs. For genes to be included in our
subsequent analyses, we used only those OrthoMCL-defined
orthologs where both the criteria for the KEGG protein blast was
met, and the description/name of the best blast hit from the uniprot
blast output matched the name of the focal KEGG protein.
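The two-part inclusion rule described above can be sketched in Python. This is a hedged illustration rather than the script used in this study: the function names and the dictionary layout are hypothetical, and the case-insensitive exact match on gene name is an assumption standing in for the comparison of the best UniProt hit description with the KEGG protein name.

```python
def passes_kegg_filter(evalue, pct_identity, aln_len, query_len,
                       max_evalue=1e-5, min_identity=50.0, min_query_cov=0.60):
    """Apply the three KEGG Blastp criteria: e-value cutoff, minimum percent
    identity, and minimum fraction of the ortholog (query) that aligns."""
    return (evalue <= max_evalue
            and pct_identity >= min_identity
            and aln_len / query_len >= min_query_cov)


def keep_ortholog(kegg_hit, uniprot_best_name, kegg_gene_name):
    """Keep an ortholog only if the KEGG criteria are met AND the best
    UniProt hit carries the same name as the focal KEGG protein.
    Exact case-insensitive matching is an assumption of this sketch."""
    names_match = (uniprot_best_name.strip().upper()
                   == kegg_gene_name.strip().upper())
    return passes_kegg_filter(**kegg_hit) and names_match


# Toy hits, not results from the study.
good_hit = dict(evalue=1e-30, pct_identity=72.5, aln_len=300, query_len=400)
short_hit = dict(evalue=1e-30, pct_identity=72.5, aln_len=200, query_len=400)

kept = keep_ortholog(good_hit, "igf1r", "IGF1R")        # True
dropped = keep_ortholog(short_hit, "IGF1R", "IGF1R")    # False: only 50% aligned
mismatch = keep_ortholog(good_hit, "INSR", "IGF1R")     # False: names differ
```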
For the genes of interest, many of the OrthoMCL-defined
orthologs contained multiple sequences from each species. Our
goal was to generate alignments with one sequence per gene per
species. We reduced redundancy in each OrthoMCL-defined
ortholog using USEARCH as above. For each species, we used only
the sequence that was most like the centroid of the USEARCH-clustered OrthoMCL-defined ortholog. In a few cases, reptiles and
mammals formed separate clusters. All genes were clustered with
identical parameters in USEARCH; however, the few genes that
exhibited taxon-specific clusters may be particularly fast evolving
genes. For example, IGF2, PPP1R3D, MKNK1, and SOCS1 had
mammal-specific and reptile-specific clusters. In some cases, we
were able to combine these genes that appeared in separate clusters
into one single multiple sequence alignment (e.g., IGF1R).
For IRS4, marsupials and reptiles were clustered separately by
USEARCH, and placental mammals were grouped in a separate
ortholog by OrthoMCL. We did not combine these clusters for
further analyses because the sequences were too divergent to
create robust alignments. IRS4 has been identified as being under
positive selection in other studies (12, 13), indicating that the
alternative explanation for high divergence [i.e., that mutations
in IRS4 function may be tolerated with only moderate phenotypic consequences (14)] may have weaker support. IRS4 is located on the X chromosome in mammals and chromosome 4 in
chicken, and therefore it may be subjected to different selection
pressures in placental mammals vs. reptiles (which include
birds) due to its different location in the genome (an X-linked gene
has three-fourths the effective population size of an autosomal
gene in mammals). As with the other IRSs, IRS4 interacts with the intracellular domain of the insulin receptor and IGF1R (15–17).
IRS4 functions in the cytoplasm in cell cycle progression and
growth (18). It is also linked with decreased litter size, reduced
growth and glucose homeostasis (14), and reduced maternal
nurturing and canonical maternal behaviors in mice (e.g., aggression against intruders and extended latency in retrieving wayward
pups) (14, 19). Given the high divergence of IRS4 in reptiles and
mammals, it would be interesting to pursue whether IRS4 serves a
particularly important role in physiological differences between
reptiles and mammals.
For each putative ortholog clustered by USEARCH, we created multiple species alignments of the amino acid sequences
using MSAProbs (20), which is more accurate than many other
common aligners (21, 22). RevTrans (23) and the original nucleotide sequence for the centroid were used to generate nucleotide alignments from amino acid alignments. The command
line version of TranslatorX (24) was used in conjunction with the
MSAProbs alignments to produce Gblocks-cleaned amino acid
and nucleotide alignments (25, 26) with the commands “-c 1 -t T
-g -b4=2 -b5=a -b3=10 -b2=34 -t=p -p=s.” Because the
nucleotide sequences were predicted ORFs from Trinity, we did
not expect translation of the nucleotides to produce within-species frameshifts or stop codons; thus, we did not use a more
sophisticated program such as MACSE (27).
For additional quality control of the test gene alignments, we
visually inspected the alignments to ensure they were correctly
aligned. Typically, editing included fixing aligned gaps and
truncated sequences with obviously different start or stop codons,
which caused small chunks at the beginning and end of an alignment for
one or several species to be substantially different from all others.
We made every effort to be as conservative as possible. In addition, we ensured that no paralogs were present in the alignments
by blasting (with Blastp) each sequence in each alignment to the
uniprot database and confirming that, for a single alignment, all
sequences had a best blast hit with a gene name identical to that
expected for that gene. These measures were not performed for
the control genes due to the enormity of manual correction for so
many alignments. This approach makes comparisons between
focal genes and control genes more conservative, as poorer
quality alignments for control genes would artificially inflate how
much positive selection is found in the control genes (28).
We also note that Gblocks is thought not to perform well,
especially with indels (29), and therefore for a subset of genes
(n = 70), we also used PRANK and GUIDANCE (30). We found
that the nucleotide alignments contained on average 64.1% gaps
(minimum = 17.4%, maximum = 95.3%) when generated by
PRANK and GUIDANCE and 12.7% gaps (minimum = 0.2%,
maximum = 31.5%) when processed with MSAProbs and Gblocks.
For this reason, we favored the alignments generated with
MSAProbs and Gblocks and used this method for all other alignments. The final focal gene alignments are available through Dryad:
dx.doi.org/10.5061/dryad.vn872.
Classification of Connectivity. Because a gene’s position and extent
of connections with other genes in a network influences the
impact that mutations might have on the target phenotype (31,
32), we were interested in investigating whether more highly
connected genes [defined as the number of other genes or proteins to which a gene is directly connected (33)] have a different
evolutionary rate than peripheral genes with few connections. To
estimate the level of connectivity for each gene in the IIS/TOR
network, we used NetworkAnalyzer (34) within Cytoscape v3.1.0
(35) to calculate the connectivity of all nodes in the BioGrid human reactome 3.2.95 (36) (including protein-protein and protein-gene interactions). We focus on the measures of node degree
(i.e., connectivity) and betweenness centrality (34). Node degree
is the number of edges or interactions that a gene
has with other genes or proteins. Betweenness centrality ranges
from 0 to 1 and reflects the amount of influence a node exerts on
the interactions of the other nodes (37).
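To make the two measures concrete, the following Python sketch computes node degree and normalized betweenness centrality for a small undirected, unweighted graph using Brandes' algorithm. It is an illustration only and is not NetworkAnalyzer's code; the normalization by (n - 1)(n - 2)/2 node pairs and the toy node names are assumptions of this sketch.

```python
from collections import deque


def node_degree(adj):
    """Number of edges incident to each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm for an undirected, unweighted graph.
    Values are scaled to [0, 1] by the number of node pairs that
    exclude the node itself (assumed normalization)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                        # nodes in order of nondecreasing distance
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}     # dependency of s on each node
        while stack:                      # accumulate from farthest to nearest
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair is counted twice (once per endpoint as source).
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: b * scale for v, b in bc.items()}


# Toy star graph: every shortest path between leaves passes through "hub".
toy = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
degree = node_degree(toy)
betweenness = betweenness_centrality(toy)
```

In this toy graph the hub has degree 3 and betweenness 1, whereas each leaf has degree 1 and betweenness 0.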
Molecular Evolutionary Analyses. For many of the analyses of
molecular evolution, we required a tree that best represented the
species tree for the 66 taxa included in our analyses. Because no
single study exists with the tree for all of these species, we combined
results from refs. 38 to 45 to generate a tree topology without
branch lengths. Newick Utilities (46) was used to prune trees that
contained fewer than the total 66 species.
Control Genes. We identified 1,417 putative orthologs that contained all 66 species and referred to these as control genes. The
control genes may be biased toward being conserved, as it is
conceivable that conserved genes are more likely to be recovered
for all 66 species. Our dataset of 61 focal IIS/TOR genes generally
contained most of the 66 species. In this focal gene set, 20% of the
genes contained all 66 species and 62% of our focal genes
contained 60 or more species (median = 62; mode = 66; mean =
58.4). The missing species in our 61 focal genes were mostly from
the species for which we only had liver transcriptomes, and these
species could potentially be missing in the alignments because
the missing genes were not expressed in the liver and not because
they were too divergent to be included. Therefore, we conducted
two supplemental analyses using a reduced number of genes to
test how sensitive our conclusions were to the specific control
genes in our study.
Supplemental analysis I. First, we conducted an additional analysis
that limited our focal gene dataset to the 48 genes containing
between 56 and 66 species (mean: 62.6 species; median: 63.5;
mode: 66 species). Although this is not a perfect comparison with
the controls, this 48 focal gene set represents a very similar species
number distribution as the control gene dataset. This analysis was
consistent with the findings of the original 61 focal gene set;
therefore, we present the 61 focal gene set in the main text.
Briefly, results from our analyses of this reduced 48-gene IIS/
TOR dataset include the following:
i) Extracellular genes of the IIS/TOR network exhibited greater
divergence between mammals and reptiles than 1,417 control
genes and intracellular genes. Extracellular genes had equivalent Ks compared with control genes (Wilcoxon rank sum
test, W = 3818, P = 0.111), but had notably greater median
ω (W = 2847, P = 0.015) and Ka (W = 2162.5, P < 0.003).
Compared with intracellular genes, extracellular genes also
had significantly higher ω (W = 243, P = 0.022) and Ka (W =
266, P = 0.004), but Ks did not differ (W = 199, P = 0.287).
ii) Collectively, the intracellular IIS/TOR genes within the 48-gene set did not have elevated median Ka, Ks, or ω compared with control genes (P > 0.287 in all cases). For ω and
Ka, the medians for intracellular and control genes were very
similar. Specifically, the median Ks for extracellular genes
was 1.91; the median Ks was 1.61 for intracellular genes and
1.51 for control genes. The median Ka for extracellular
genes was 0.16; the median Ka was 0.087 for intracellular
genes and 0.083 for control genes. Finally, the median ω for
extracellular genes was 0.10; the median ω was 0.051 for intracellular
genes and 0.054 for control genes.
iii) When comparing the distribution of ω values for extracellular vs. intracellular IIS/TOR genes in the 48 focal gene set to
the distribution of the ω values for the control genes, the
extracellular genes were 6.5 times more likely than control
genes to reside in the highest 5% of ω values (OR, 6.51; 95%
CI, 1.29, 32.86). The intracellular genes were not more likely
than controls to be in the top 5% (OR, 1.00; 95% CI, 0.236,
4.219). These odds ratios imply that the extracellular group
contains the fastest evolving components of the IIS/TOR
network. These three conclusions are in agreement with the
61 focal gene set analyses, which include some genes with
fewer species, presented in the main text.
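The odds ratios reported in item iii compare the odds of falling in the top 5% of ω values between a focal gene class and the control genes. The Python sketch below shows one common way to compute an odds ratio with a Wald 95% confidence interval from a 2 × 2 table. The counts are hypothetical, and both the Wald interval and the 0.5 continuity correction for empty cells are assumptions of this sketch, not a statement of how the values above were obtained.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2 x 2 table.

    a: focal genes in the top 5% of omega     b: focal genes outside it
    c: control genes in the top 5% of omega   d: control genes outside it
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction so the ratio and its SE are defined.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not the counts from this study.
or_value, ci_low, ci_high = odds_ratio_ci(2, 8, 10, 190)  # odds ratio = 4.75
```

An interval that excludes 1 indicates that the focal class is overrepresented (or underrepresented) among the highest ω values relative to controls.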
Supplemental analysis II. In addition to the 48 IIS/TOR focal gene
analysis detailed above, we conducted a second analysis to address
a different potential issue with the control genes. Specifically, to
assess how potentially conserved the original 1,417 control genes
with 66 species were, we identified additional control genes that
contained the same phylogenetically matched species sets as our 61 IIS/
TOR focal genes. In many cases, we only had a single phylogenetically matched control gene for any given IIS/TOR gene. We
constructed a focal data set of 43 IIS/TOR genes (31 focal IIS/
TOR genes with phylogenetically matched controls + 12 focal
IIS/TOR genes that contained all 66 species) and compared
Ka/Ks between the 43 focal genes and the 43 phylogenetically
matched control genes. These 43 pairs of matched focal and
control genes contained between 34 and 66 species (mean: 61
species; median: 64 species; mode: 66 species). When there was
more than one phylogenetically matched control gene for a particular focal gene, we used a random number generator and took
the control gene with the largest random number. Although we
would have liked to phylogenetically match all original IIS/TOR
focal genes with fewer than the total 66 species to a control gene,
or even multiple control genes, we did not have phylogenetically
matched controls in all cases.
For these 43 pairs of matched focal and control genes, the
Ka/Ks and Ka values are somewhat elevated in the phylogenetically
matched control gene set relative to the full set of 1,417 control
genes (Ka/Ks phylo-match 43 control genes: 0.076; original 1,417
control genes: 0.054; Ka phylo-match 43 control genes: 0.115;
original 1,417 control genes: 0.083). However, extracellular
genes (n = 7 that had phylogenetically matched controls) still
had significantly larger Ka values and marginally nonsignificant
Ka/Ks even compared with the phylogenetically matched control
gene set. One-tailed Wilcoxon rank sum tests indicated that
trends were identical in the 43 matched control-focal comparisons relative to the other two gene sets we analyzed. Median Ka
values were significantly different between the seven extracellular genes and their phylogenetically matched control genes
(W = 83.5, P = 0.032), Ka/Ks values were marginally nonsignificant
between extracellular and control genes (W = 100.5, P = 0.083),
and Ks values remain nonsignificant between extracellular genes
and controls (W = 120.5, P = 0.204). We suspect that the Ka/Ks
Wilcoxon test is not significant in this reduced gene analysis due to
a lack of power. In addition, many of the extracellular genes that
were consistently found to be under positive selection within
PAML (IGF1, IGF1R, IGF2, IRS1, and IRS2) were not included
in this reduced analysis because no appropriate phylogenetically
matched control genes were available.
Altogether, these two supplemental analyses that considered
different means of designating control genes (i.e., the 48 focal
genes that better matched the number of species in the 1,417
control genes and the 43 paired phylogenetically matched control-focal genes) are in agreement with our results reported in the
main text for the 61 focal IIS/TOR genes and the corresponding
1,417 control genes.
Testing Whether the IIS/TOR Network Contains Fast-Evolving Outliers.
To test for differences in evolutionary rate between mammals and
reptiles for each of our focal genes, we used the clade model C,
with M2a_rel as the null hypothesis (47). Clade models are less
prone to false positives than branch-site models and better account for among-site variation in selective constraint (47). Importantly, the clade model C tests whether there is evidence for
differential ω between the test clade and the remainder of the tree,
and we did not use the results from the clade model as support for
positive selection. For those test genes that were significant via the
clade model, we compared the ω values (i.e., Ka/Ks) for each clade
via paired Wilcoxon test and χ2 tests.
To calculate evolutionary parameters ω, Ka, and Ks, we processed the Gblocks nucleotide alignments in PAML. Because
we were specifically interested in molecular evolution between
mammals and reptiles, for all IIS/TOR genes and control genes,
we calculated the pairwise mammal and reptile divergence (every reptile-mammal comparison) from the 2NG.dN and 2NG.dS
output files from PAML, which always output the same values
regardless of the model because they are calculated with the Nei
and Gojobori method (48). These results were very similar to
those of a confirmatory analysis conducted using the analysis package from
libsequence (49). Using a Wilcoxon rank sum test on the median
ω, Ka, and Ks of pairwise comparisons between reptile and mammalian taxa, we tested whether the extracellular IIS/TOR genes
or the intracellular IIS/TOR genes exhibited greater divergence
between mammals and reptiles than the control genes.
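The per-gene summary described above (the median of every reptile-mammal pairwise comparison) can be sketched in Python as follows. This is an illustration under stated assumptions: the dictionaries keyed by species pair are a hypothetical in-memory representation of the pairwise Ka and Ks values, the species names and numbers are toy data, and we assume that the median ω is taken over per-pair Ka/Ks ratios rather than as a ratio of medians.

```python
from statistics import median


def cross_clade_medians(ka, ks, reptiles, mammals):
    """Median Ka, Ks, and omega over every reptile-mammal pair.

    ka, ks: dicts keyed by frozenset({species1, species2}).
    Within-clade pairs are ignored; pairs with Ks == 0 are skipped
    for omega because the ratio is undefined.
    Returns None if no cross-clade pair is available.
    """
    ka_vals, ks_vals, omega_vals = [], [], []
    for r in reptiles:
        for m in mammals:
            pair = frozenset((r, m))
            if pair not in ka or pair not in ks:
                continue
            ka_vals.append(ka[pair])
            ks_vals.append(ks[pair])
            if ks[pair] > 0:
                omega_vals.append(ka[pair] / ks[pair])
    if not ka_vals or not omega_vals:
        return None
    return median(ka_vals), median(ks_vals), median(omega_vals)


# Toy values for one gene, not data from the study.
def _p(a, b):
    return frozenset((a, b))


toy_ka = {_p("anole", "human"): 0.10, _p("anole", "mouse"): 0.12,
          _p("chicken", "human"): 0.08, _p("chicken", "mouse"): 0.10,
          _p("human", "mouse"): 0.02}   # within-clade pair, ignored
toy_ks = {_p("anole", "human"): 1.0, _p("anole", "mouse"): 1.5,
          _p("chicken", "human"): 1.6, _p("chicken", "mouse"): 2.0,
          _p("human", "mouse"): 0.5}

med_ka, med_ks, med_omega = cross_clade_medians(
    toy_ka, toy_ks, ["anole", "chicken"], ["human", "mouse"])
```

The resulting per-gene medians are the values that would then be compared among gene classes with a Wilcoxon rank sum test.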
Testing for Positive Selection for the IIS Network Genes. We conducted branch-site tests for positive selection in PAML (50–52),
which compare the likelihood of a modified model A (model =
2, NSsites = 2, ω not fixed to 1) with the likelihood of the corresponding null model with ω fixed to 1. Twice the difference in log-likelihood between the two models approximately follows a χ2
distribution, permitting statistical tests. For the likelihood ratio
test (LRT), a P value was estimated assuming a null distribution
that is a 1:1 mixture of χ2 distributions with 1 and 0 df (53, 54).
For negative test statistics from the LRT (meaning that the null
model fit the data better than the alternative), typically one
would run PAML several times for these particular genes. Because
of the computational time required for the number of genes we
were testing, and because it was unlikely that these genes would have
large positive test statistics in subsequent runs, we did not rerun
any genes multiple times.
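The P value calculation for the branch-site LRT can be written compactly. The sketch below assumes the standard result that the upper tail of a χ2 distribution with 1 df equals erfc(sqrt(x/2)), and that the 0-df component of the mixture is a point mass at zero, so the mixture P value for a positive statistic is half the χ2 (1 df) tail probability. The function name and the log-likelihood values are hypothetical, and treating a nonpositive statistic as P = 1 is a convention of this sketch.

```python
import math


def branch_site_lrt(lnl_alt, lnl_null):
    """Likelihood ratio test against a 1:1 mixture of chi-square
    distributions with 0 and 1 df.

    Returns (test statistic, P value). A nonpositive statistic means the
    null model fit at least as well as the alternative; P is set to 1.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0:
        return stat, 1.0
    chi2_1df_tail = math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)
    return stat, 0.5 * chi2_1df_tail


# Hypothetical log-likelihoods, not values from the study.
stat, p_value = branch_site_lrt(-1000.00, -1001.92)   # stat = 3.84, P is about 0.025
neg_stat, neg_p = branch_site_lrt(-1001.0, -1000.5)   # negative statistic, P = 1
```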
Validation of Procedure Based on IGF1. Previously, we documented
increased divergence of IGF1 in lizards and snakes relative to
other reptiles and mammals (55). Those data were generated
using single gene Sanger sequencing. In contrast, here we used a
next-generation sequencing (NGS) approach, generating transcriptomes from Illumina RNAseq (100-bp paired end)
followed by nearly automated multiple sequence alignments. We
use IGF1 for comparison between these methods for both sequence quality and for molecular evolutionary analyses. To estimate sequencing error, we compared the pairwise sequence
identity of IGF1 for the six species included in both approaches.
For each of these pairs, the sequence identity was >99.4%.
In each case that was not 100% identical between the
two approaches, the difference was due to an ambiguity code in
the Sanger sequencing that represented within-species allelic
diversity. Thus, we are confident that our NGS approach produced highly accurate sequence data for analysis. Furthermore,
our NGS approach added an additional 200 bp of sequence to
the IGF1 alignment for every species.
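The comparison between Sanger and NGS sequences can be illustrated with a short Python sketch that computes pairwise identity over aligned positions and, optionally, treats an IUPAC ambiguity code as matching any base it encodes. The function and the toy sequences are hypothetical; skipping gap positions is an assumption of this sketch.

```python
# IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pairwise_identity(seq1, seq2, ambiguity_compatible=False):
    """Proportion of identical positions between two aligned sequences.

    Positions with a gap in either sequence are skipped. When
    ambiguity_compatible is True, two symbols match if the sets of bases
    they encode overlap (e.g., R matches A or G).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
        elif ambiguity_compatible and set(IUPAC.get(x, x)) & set(IUPAC.get(y, y)):
            matches += 1
    return matches / compared if compared else float("nan")


# Toy sequences: the Sanger read carries an R (A/G) at one heterozygous site.
sanger = "ACGTRACGTA"
ngs = "ACGTAACGTA"
strict = pairwise_identity(sanger, ngs)                                # 0.9
compatible = pairwise_identity(sanger, ngs, ambiguity_compatible=True)  # 1.0
```

A gap between the strict and ambiguity-compatible values, as in this toy case, indicates that the apparent differences arise from ambiguity codes rather than from conflicting base calls.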
To validate the molecular evolution analyses, we compared the
sites that were identified to be under positive selection in our
previous IGF1 analysis (55) to our current NGS approach [both
approaches using the branch-site model in PAML (50–52), with
the branch leading to Squamata (snakes and lizards) as the
foreground branch]. Every positively selected site identified in
ref. 55 had as strong or stronger support for being under positive
selection in our current analyses. Overall, our NGS methods
appear to improve on traditional methods.
Mapping Positively Selected Sites onto Protein Structures of Hormones
and Receptors. To understand how positive selection may affect
interactions between IGF hormones and receptors, we mapped
the sites with a high probability of being under positive selection
from the PAML branch-site analysis onto the predicted protein
structures. Because snakes in particular appear to be highly
divergent, we use a snake as a representative reptile for visualizing the predicted protein structures. We used Swiss-Model
(56) to thread the snake sequences onto the human protein
structures from the PDB: INS, PDB ID code 2KQP.1 (57);
IGF1, PDB ID code 1BQT.1 (58); IGF2, PDB ID code 2L29.1
(59); IGF1R, PDB ID code 1IGR.1.A (60); and IGF2R, PDB
ID code 2V5O.1 (61). From the PAML branch-site analyses
described above, we mapped sites with a BEB posterior probability >0.90
of being under positive selection (branch-site model of positive
selection) in mammals or reptiles onto the amino acids in the
mature protein structures and the full propeptide alignments.
Separately for the reptile and mammal clades, we mapped the
sites predicted from both branch-site models: one that specifically
tests for selection on the branch leading to the clade of interest set
in the foreground (e.g., the branch leading to reptiles) and one
that tests for positive selection across the whole clade of interest
(e.g., the whole clade of reptiles). We evaluated the clustering of
positively selected sites within functional domains of the protein
structure, and their relationship to the binding surfaces between
the hormones and the receptors, as described by previous literature (Table S4).
Evaluating Variation in the Presence and Length of the IGF Binding
Domains of the IGFBPs. The binding proteins consist of two domains:
the IGF binding domain on the 5′ end and the thyroglobulin domain on the 3′ end. We noted that the IGFBPs were
often truncated to various degrees on the 5′ end, leading to extensive variation among species in the length or presence of the
N-terminal binding domain. We realigned the original sequences
using ClustalX to specifically evaluate variation in the length of
the IGF binding domain in the context of the protein structure.
Additionally, we calculated similarity for each binding protein
across the whole alignment using a Poisson correction model (62)
in MEGA6 (63).
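For reference, the Poisson correction converts the observed proportion p of differing amino acid sites into an estimated number of substitutions per site, d = -ln(1 - p). The Python sketch below is a minimal illustration, not MEGA6's implementation; the function name and toy sequences are hypothetical, and excluding gapped positions pair by pair is an assumption.

```python
import math


def poisson_distance(seq1, seq2):
    """Poisson-corrected amino acid distance, d = -ln(1 - p), where p is
    the proportion of aligned, ungapped sites that differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 1.0:
        return float("inf")   # correction is undefined when every site differs
    return -math.log(1.0 - p)


# Toy peptides differing at 1 of 10 sites: p = 0.1, d = -ln(0.9), about 0.105.
d = poisson_distance("MKTAYIAKQR", "MKTAYIAKQK")
```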
Coevolution Analysis of IGF Hormones and IGF2R in Reptiles. We used
CAPS (64) to test for coevolving amino acid sites between IGF1
and IGF2R and between IGF2 and IGF2R in reptiles. CAPS
uses the phylogenetic relationships from the sequence alignments along with the 3D structure of the proteins to identify
coevolving pairs of amino acids using Pearson correlation coefficients. For these analyses, we used the amino acid sequence
alignments with their respective human protein structures from
PDB: IGF1, 1BQT.1 (58); IGF2, 2L29.1 (59); and IGF2R, 2V5O.1
(61). We used the following settings: bootstrap value of 0.8, gap
threshold of 0.8, α threshold of P = 0.01, and simulated 100
alignments. Significance is estimated by comparing the observed
coefficients to a distribution from pseudorandomly sampled amino
acid pairs, correcting for multiple comparisons and nonindependence of data using a step-down permutation procedure (64).
Comparison of phylogenetic gene trees can be used to detect
coevolution among genes (65). We used the MMMvII algorithm
(66) to identify which subgroups of the hormone family (INS, IGF1,
and IGF2) and IGF2R were most tightly coevolving across species.
The MMMvII algorithm detects similarity between phylogenetic
trees, using information from both the tree topology and the
branch lengths, which are calculated by MMMvII. MMMvII
identifies the most tightly coevolving subtrees for any given
tolerance level, returning all possible solutions. For each hormone, we constructed a single multiple sequence alignment
of the mature protein sequences using ClustalX (67) within
Geneious v6.1.6 (68). For IGF2R, we focused on the region of
the protein that is involved with binding the hormones: domains 11–13. To identify the most tightly coevolving subgroups of
proteins, we set the tolerance level to 0.2. High levels of coevolution are achieved by large or multiple subsections of the gene
trees changing in a coordinated fashion (topology and branch
length). With this method, highly connected proteins may have
no observable coevolution if they are highly conserved.
SI Results
Divergent Evolutionary Rates Between Mammals and Reptiles. We
tested for differences in mammal-specific ω and reptile-specific ω
using the clade model (47) for each of our 61 focal genes (each
alignment contained 19–66 species; median: 62) in PAML (69).
Significant genes included five extracellular genes (of a total of
10) and 21 intracellular genes (of a total of 51). Extracellular
genes were not statistically more likely to be significant than
intracellular genes in the clade model (Fisher’s exact test, P =
0.430). We also compared the distribution of likelihood ratio test
statistics for the clade model relative to a null model for 1,417
control genes (SI Materials and Methods) to test statistics obtained for the 61 members of the network. Only IGF2R exhibited a result that was in the largest 5% of test statistics for
IIS/TOR network + control genes. We compared the ω for each
clade for those control genes where the clade model indicated
support for a significant difference in ω between reptiles and
mammals (n = 797 before sequential Bonferroni correction, n =
491 after sequential Bonferroni correction). In short, we found
no appreciable difference between control and test genes; after
correction for multiple testing, both had ∼77% of genes with
larger ω in reptiles relative to the rest of the tree.
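The sequential Bonferroni correction referred to above is commonly implemented as Holm's step-down procedure, and the Python sketch below assumes that interpretation. The P values are sorted in ascending order, the smallest is compared with α/m, the next with α/(m - 1), and so on, stopping at the first test that fails. The function name and the toy P values are hypothetical.

```python
def sequential_bonferroni(pvalues, alpha=0.05):
    """Holm's step-down (sequential Bonferroni) procedure.

    Returns a list of booleans, in the input order, indicating which
    hypotheses are rejected at family-wise error rate alpha.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest P value (rank = k - 1) is tested at alpha / (m - k + 1).
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break   # once one test fails, all larger P values are retained
    return reject


# Toy P values, not results from the study.
all_rejected = sequential_bonferroni([0.02, 0.005, 0.04, 0.011])
one_rejected = sequential_bonferroni([0.001, 0.04, 0.03, 0.2])
```

In the first toy example all four tests remain significant, whereas a standard Bonferroni threshold of 0.05/4 = 0.0125 would retain only two, which shows why the sequential procedure is less conservative.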
Connectivity Is Associated with Evolutionary Rate. Nonsynonymous
reptile-mammal divergence (Ka) and ω were highly correlated
with connectivity. For extracellular genes, Ka and ω were negatively correlated to the degree of connectivity (Ka Spearman’s
ρ = −0.71, P = 0.02; ω Spearman’s ρ = −0.84, P < 0.01), and Ks
exhibited a positive, but nonsignificant relationship with degree
of connectivity (Spearman’s ρ = 0.40, P = 0.26). Likewise, for
intracellular genes, Ka and ω were negatively correlated to degree of connectivity (Ka Spearman’s ρ = −0.39, P < 0.01; ω
Spearman’s ρ = −0.34, P = 0.01), whereas Ks was not (Spearman’s ρ < 0.01, P = 0.99). In other words, more connected genes
generally had smaller nonsynonymous substitution rates than less
connected genes; this result suggests that more connected genes
experience more purifying selection than less connected genes.
Importantly, the relationship of Ka and ω to degree of connectivity was stronger for extracellular genes than for intracellular
genes. Indeed, an interaction term of connectivity and classification (intracellular vs. extracellular) in a linear model was nearly
significant (P = 0.07), with extracellular genes having a steeper
slope. Nearly identical results were obtained when using betweenness centrality (extracellular Ka Spearman’s ρ = −0.68, P =
0.03; ω Spearman’s ρ = −0.82, P < 0.01, Ks Spearman’s ρ = −0.37,
P = 0.29); therefore, we focus further analyses on connectivity.
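Spearman's ρ, used throughout this section, is the Pearson correlation of the ranks of two variables, with tied values assigned their average rank. A minimal Python sketch follows; the function names and toy values are hypothetical, and the sketch does not compute P values.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Raises ZeroDivisionError if either variable is constant."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy data: Ka falls monotonically as connectivity rises, so rho = -1.
connectivity = [1, 2, 3, 4, 5]
ka = [0.50, 0.40, 0.30, 0.20, 0.10]
rho = spearman_rho(connectivity, ka)
```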
Expression level governs the amount of purifying selection
(70, 71). Thus, expression must be accounted for to conclude
that the lower evolutionary rates we observed in more connected
genes are because of high connectivity. Finding a suitable expression measure across such a broad range of taxa is difficult.
Because protein length is negatively correlated with expression
level, we used the longest protein isoform in human to provide a
proxy for potential impacts of expression on protein evolutionary
rate. We found no relationship of Ka, Ks, ω, connectivity, or
betweenness with the length of the longest protein isoform from
human (Spearman’s ρ < 0.15, P > 0.24 in all cases). Also, more
highly expressed genes experience stronger selection on Ks for
more easily translated codons. Thus, a relationship between Ks and
connectivity is a strong indication that expression level, not connectivity, is driving molecular evolution (71). We see no significant
relationships between Ks and connectivity; hence, expression may
not be a strong driver of the relationship between ω and connectivity in our data.
Evolutionary rates of members of the IIS/TOR network in our
study were negatively related to connectivity. This result is
consistent with findings for other pathways, such as the N-glycosylation pathway of primates (72) and the yeast proteome (71,
73–76). Likewise, a negative relationship of closeness centrality
with Ka and ω occurs in the mammalian phototransduction pathway,
and closeness centrality is largely influenced by connectivity (77).
Interpreting our findings requires two caveats. First, GC-biased
gene conversion (preferential substitution of GC during recombination) can produce results that resemble positive selection, although such a confounding effect is usually attenuated with
increased phylogenetic distance due to the lack of conservation in
location of recombination hotspots (78). Thus, for mammal-reptile
comparisons, this may not be a substantive concern. Further,
genes indicated with the branch-site model to be under positive
selection are less likely to be confounded by biased gene conversion than those indicated by the branch-test model (78). Second, we did not directly account for gene expression variation,
intron number, and gene essentiality, and these are all variables
associated with protein evolution (71, 75, 76). Not including these
covariates could affect our conclusion regarding the importance
of connectivity in influencing evolutionary rate. The choice of an
appropriate tissue and developmental time point in which to
measure expression level for all 66 species and the lack of gene
expression data suitable for quantification in some species are
vexing problems. However, we suspect that molecular evolutionary rate is influenced, at least in part, by connectivity because
we found no relationship of Ka, Ks, ω, connectivity, or betweenness with the length of the longest protein isoform from human
(a proxy for expression). In addition, as explained above, highly
expressed genes experience selection on Ks for more easily translated
codons, and we see no significant relationships between Ks and
connectivity, a relationship that would indicate that expression
level, not connectivity, is driving molecular evolution (71).
Tests for Positive Selection. We tested whether positive selection
shaped evolution of IIS/TOR pathway genes using a branch-site
model in PAML. This model, with the reptile clade specified as
the foreground branch, was favored over the null model of neutral
evolution for only two genes, both of which were intracellular:
RPS6KA6 and MLST8 (after sequential Bonferroni correction;
Table S3). In this test, the entire clade of reptiles, including
terminal branches, was specified as the foreground branch. This
relative lack of significance is likely due to variable selection
among the diverse terminal branches, which span >350 My of
evolution. Additional models are discussed in the main text and
include a branch-site model of positive selection with the branch
leading to the reptile clade as the foreground branch and a similar
model with the branch leading to mammals designated as the
foreground branch. We also conducted a series of taxon-specific
branch-site tests, where the branch leading to a particular clade
was specified as a foreground branch. The results of all tests are
presented in Table S3.
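The logic of comparing a branch-site model with its null and then applying a sequential Bonferroni correction can be sketched as follows. This is a minimal illustration under stated assumptions: the likelihood ratio statistic is compared against a chi-square distribution with 1 degree of freedom, the sequential Bonferroni correction is taken to be Holm's step-down procedure, and the gene names and log-likelihoods are hypothetical. It does not reproduce the PAML analysis itself.

```python
import math

def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test, assuming a chi-square reference with df = 1."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For df = 1 the chi-square survival function is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

def holm_reject(pvalues, alpha=0.05):
    """Return the genes still significant after Holm's sequential Bonferroni."""
    ordered = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ordered)
    rejected = set()
    for rank, (gene, p) in enumerate(ordered):
        # The smallest P is tested at alpha/m, the next at alpha/(m-1), and so on.
        if p <= alpha / (m - rank):
            rejected.add(gene)
        else:
            break  # once one test fails, all larger P values also fail
    return rejected

# Hypothetical (null, alternative) log-likelihoods for four genes.
fits = {
    "geneA": (-5230.4, -5221.1),
    "geneB": (-8112.9, -8110.6),
    "geneC": (-3020.0, -3019.8),
    "geneD": (-4400.0, -4393.0),
}

pvals = {gene: lrt_pvalue(null, alt)[1] for gene, (null, alt) in fits.items()}
significant = holm_reject(pvals)
print(sorted(significant))
```

In this toy example, geneB is nominally significant (P < 0.05) but does not survive the correction, which mirrors the drop in gene counts before and after correction reported in Table S3.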
As detailed in the main text, our results are concordant with
previous work that suggests that extracellular genes in the
IIS/TOR network may evolve more rapidly and are under
stronger positive selection than the remainder of the network. For
instance, DAF-2 (a homolog of the vertebrate IGF1R and INSR
genes) is the most divergent protein in the IIS/TOR network
across Caenorhabditis species (72), and changes in this receptor
and interactions with its hormone may allow for rapid adaptation
under shifting environmental conditions (71). Likewise, residues
within the homolog of IGF1R (Drosophila’s insulin-like receptor)
evolve under positive selection in Drosophila (79). In addition, IGF1
evolves under strong positive selection in snakes and lizards (55).
Evolution in Squamata. Because previous research indicates that
components of the IIS/TOR network may be under strong positive selection in Squamata (lizards and snakes) (55), we also
tested the branch-site model using the branch leading to snakes
and lizards as the foreground branch. Fourteen genes exhibited
significant support for positive selection along the branch leading
to lizards and snakes; seven remained significant after sequential
Bonferroni correction (IGF2R, IGF1R, PIK3R5, IRS2, IRS1,
IKBKB, and TSC2; Table S3). These seven also exhibited test
statistics that were in the largest 5% of test statistics for all
(control and test) genes analyzed in this comparison. For crocodilians, birds, and turtles, fewer genes provided significant
support for the branch-site model either before (13, 11, and 11
genes, respectively) or after multiple test correction (6, 1, and 5
genes, respectively). The bird comparison is particularly notable
because birds represent an independent evolutionary origin of
endothermy (vs. mammals).
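The comparison against the largest 5% of test statistics across control and focal genes amounts to an empirical ranking. A minimal sketch follows, assuming the cutoff is defined by rank and rounded to the nearest whole gene. All gene names and statistics are hypothetical, and the handling of ties and rounding in the actual analysis is not stated in this section.

```python
def top_fraction(stats, fraction=0.05):
    """Return gene names whose statistic falls in the largest `fraction`."""
    n_top = max(1, round(len(stats) * fraction))
    ranked = sorted(stats, key=stats.get, reverse=True)
    return ranked[:n_top]

# Hypothetical likelihood ratio statistics: 37 control genes and 3 focal genes.
stats = {f"ctrl{i:02d}": 0.1 * i for i in range(37)}
stats.update({"focalA": 9.5, "focalB": 1.2, "focalC": 0.0})

top = top_fraction(stats)
print(top)  # the two genes with the largest statistics among all 40
```

Ranking focal genes against a control set in this way gives an empirical reference that does not depend on the asymptotic distribution of the test statistic.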
We more explicitly assayed higher divergence in Squamata
relative to the rest of the tree by the clade model with Squamata
as the foreground clade. We detected 24 genes with significant
support (postmultiple test correction) for heterogeneous rates
relative to the rest of the tree (a total of 33 before multiple test
correction). For 14 of these significant genes, the ω estimated for
the Squamata clade was larger than the estimate for the rest of
the tree. However, the difference between the number of genes that were more divergent in Squamata and the number that were
less divergent than the rest of the
tree was not significant (P > 0.3). Notably, IGF1, IGFBP2,
RHEB, IGF2R, and INSR exhibited test statistics that were in the
largest 5% of test statistics for all (control and focal) genes analyzed for the clade model with Squamata in the foreground.
In comparison, we detected 15 genes with significant support
(after multiple test correction) for heterogeneous rates relative to
the rest of the tree when using snakes as the foreground clade (a total
of 30 before multiple test correction). For 11 of these significant
genes, the ω estimated for snakes was larger than the estimate for
the rest of the tree, and the reverse was true for the other 4 genes.
This difference between the numbers of genes in snakes that
were more or less divergent than the rest of the tree was nearly
significant (χ2 = 3.27, P = 0.07). Similar results were obtained
for a paired Wilcoxon test (V = 24, P = 0.04). However, only
PRKCG, IGFBP2, and INSR exhibit test statistics that were in the
largest 5% of test statistics for all (control and test) genes analyzed for the clade model with snakes in the foreground.
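The reported χ2 for snakes can be reproduced with a goodness-of-fit test against an equal split of genes with higher and lower ω. The sketch below assumes a 50:50 expectation and 1 degree of freedom. For Squamata it further assumes that the 10 significant genes not reported as higher had lower ω, which the text implies but does not state.

```python
import math

def chi_square_equal_split(n_higher, n_lower):
    """Chi-square goodness-of-fit test against a 50:50 expectation (df = 1)."""
    expected = (n_higher + n_lower) / 2.0
    stat = ((n_higher - expected) ** 2 + (n_lower - expected) ** 2) / expected
    # For df = 1 the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Snakes: 11 genes with higher omega, 4 with lower (15 significant genes).
snake_stat, snake_p = chi_square_equal_split(11, 4)

# Squamata: 14 higher, and an assumed 10 lower (24 significant genes).
squamata_stat, squamata_p = chi_square_equal_split(14, 10)

print(f"snakes:   chi2 = {snake_stat:.2f}, P = {snake_p:.2f}")
print(f"Squamata: chi2 = {squamata_stat:.2f}, P = {squamata_p:.2f}")
```

Under these assumptions the snake comparison gives χ2 = 3.27 with P of about 0.07, matching the values in the text, and the Squamata comparison gives P > 0.3.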
Overall, it appears that Squamata has qualitatively higher divergence in IIS/TOR network genes, and more genes may be
under positive selection on the branch leading to Squamata than on
the branches leading to crocodilians, birds, or turtles (each tested independently). However, these differences are modest,
and each branch of reptiles, excepting avian reptiles, contains
multiple IIS/TOR genes under positive natural selection.
Mammal-Specific and Reptile-Specific Evolution of Hormones and
Receptors. The amino acid sites that define the ability of IGF1
and IGF2 to bind IGF1R (mainly in domains A and B; Fig. 2) are
conserved, indicating that these protein sequences are likely
functional (80). The C-domain of IGF1 and IGF2 forms a flexible
loop that is oriented toward the binding pocket of INSR and
IGF1R and contacts the CR domain in the binding pocket of the
IGF1R and INSR (81) (Fig. 2). The IGF1 and IGF2 C-domain is
essential for binding IGF1R (82), and variation in the C-domain
regulates the specificity of the hormones binding to IGF1R (82)
and to INSR (83). INSR has two isoforms due to the absence
(INSR-A) or presence (INSR-B) of exon 11 (84). In mammals,
both INSR isoforms bind INS with high affinity, but only INSR-A
binds IGF2 with high affinity, and neither binds IGF1 with high
affinity. This difference in INSR binding between IGF2 and IGF1
is driven by the C-domain of the hormones (83). For IGF1, 30%
of the C-domain amino acids in reptiles are predicted to
be under positive selection, whereas none of the C-domain sites of
IGF1 in mammals are predicted to be under positive selection. In
contrast, for IGF2, 25% of the C-domain amino
acids in mammals were identified as being under positive selection, and no sites were under positive selection in the reptilian
IGF2 C-domain (Fig. 2 and Table S4).
This positive selection in the C-domains of reptile IGF1 and
mammal IGF2 suggests their binding affinities to IGF1R and
INSR are likely variable across the species in the respective
clades. IGF1R has three domains that are predicted to play a role
in binding both IGF1 and IGF2 hormones (L1-, CR-, and L2-domains) (81, 85). Positively selected sites in reptiles clustered
on the hormone-binding surface of the CR domain of IGF1R
and include specific sites identified from mutagenesis studies to
directly interact with the C-domain of the IGFs to regulate
binding affinity. More specifically, from mutagenesis studies, one
of these sites under positive selection on the IGF1R CR-domain
(F251, human numbering) directly interacts with the IGF1
C-domain to regulate binding of IGF1R to IGF1 (81). Furthermore, one of the sites under positive selection on the reptilian
IGF1 C-domain (R37, human numbering) regulates binding of
IGF1 to IGF1R (80) (Table S4). Thus, the location and clustering of these positively selected sites on the hormone and the
receptor suggest positive selection on the binding affinity between IGF1 and IGF1R across the reptiles. This signature of
positive selection is absent in the mammalian IGF1 and IGF1R.
In contrast, we see positive selection on the C-domain of IGF2 in
mammals that regulates the binding to IGF1R and INSR. These
positively selected sites in the C-domain of mammalian IGF2
may cause variation in the binding affinity between IGF2-IGF1R
and IGF2-INSR among mammal species. Specifically, one of the
IGF1 residues in mammals that inhibits high-affinity binding to
IGF2R (R55) is an isoleucine (I55) in snakes, which is predicted
to promote binding to IGF2R due to its hydrophobicity.
Coevolution of IGF2R and IGFs in Reptiles. In addition to high divergence in reptiles and snakes among focal genes mentioned above,
many of the positively selected sites on the receptors and hormones
are due to amino acid changes within the Squamates (lizards and
snakes) relative to other reptiles. Our coevolution network analysis
clearly signals strong coevolution of the receptors and hormones
specifically within snakes or squamates. This rapid molecular evolution is in concordance with extensive recent work showing extreme
adaptation in metabolic pathways of snakes (86, 87). Although
nematodes and Drosophila are models for conservation of the intracellular IIS (88, 89), snakes and lizards may be models for examining the coevolution of the extracellular hormones-receptors.
The CAPS analysis identified a pair of coevolving amino acids on
IGF2 and IGF2R in reptiles: IGF2 P4 and IGF2R R1623 (ρ = 0.4,
P < 0.01). No sites were identified as coevolving between IGF1
and IGF2R. To further predict how evolution has shaped the
interactions between IGF2R and the IGF hormones in reptiles,
we used MMMvII (66) to identify the species with the tightest
correlated rates of evolution between IGFs and IGF2R based on
the gene tree topologies and branch lengths, given a tolerance
value of 0.2. Interestingly, within the reptiles, snakes (sunbeam and
viper boa) had the tightest coevolutionary signal between hormone-receptor pairings IGF2 and IGF2R (ρ = 1), and the lizards (brown
and green anoles and gecko) had the tightest coevolutionary signal
between IGF1 and IGF2R (ρ = 0.33), suggesting that among the
reptiles, these receptor-hormone relationships are most strongly
coevolving in the squamate clade specifically.
1. Sparkman AM, Vleck CM, Bronikowski AM (2009) Evolutionary ecology of endocrine-mediated life-history variation in the garter snake Thamnophis elegans. Ecology
90(3):720–728.
2. Grabherr MG, et al. (2011) Full-length transcriptome assembly from RNA-Seq data
without a reference genome. Nat Biotechnol 29(7):644–652.
3. Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17(1):10–12.
4. Lohse M, et al. (2012) RobiNA: A user-friendly, integrated software solution for RNA-Seq-based transcriptomics. Nucleic Acids Res 40(Web Server issue):W622–W627.
5. Cahais V, et al. (2012) Reference-free transcriptome assembly in non-model animals
from next-generation sequencing data. Mol Ecol Resour 12(5):834–845.
6. Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26(19):2460–2461.
7. Kent WJ (2002) BLAT—The BLAST-like alignment tool. Genome Res 12(4):656–664.
8. Li L, Stoeckert CJ, Jr, Roos DS (2003) OrthoMCL: Identification of ortholog groups for
eukaryotic genomes. Genome Res 13(9):2178–2189.
9. Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic
Acids Res 28(1):27–30.
10. Kanehisa M, et al. (2014) Data, information, knowledge and principle: Back to metabolism in KEGG. Nucleic Acids Res 42(Database issue):D199–D205.
11. Luisi P, et al. (2012) Network-level and population genetics analysis of the insulin/TOR signal transduction pathway across human populations. Mol Biol Evol
29(5):1379–1392.
12. Alvarez-Ponce D, Aguadé M, Rozas J (2011) Comparative genomics of the vertebrate
insulin/TOR signal transduction pathway: A network-level analysis of selective pressures. Genome Biol Evol 3:87–101.
13. Wang M, et al. (2013) The molecular evolutionary patterns of the Insulin/FOXO signaling pathway. Evol Bioinform Online 9:1–16.
14. Fantin VR, Wang Q, Lienhard GE, Keller SR (2000) Mice lacking insulin receptor substrate 4 exhibit mild defects in growth, reproduction, and glucose homeostasis. Am J
Physiol Endocrinol Metab 278(1):E127–E133.
15. Yenush L, White MF (1997) The IRS-signalling system during insulin and cytokine
action. BioEssays 19(6):491–500.
16. Lavan BE, et al. (1997) A novel 160-kDa phosphotyrosine protein in insulin-treated
embryonic kidney cells is a new member of the insulin receptor substrate family. J Biol
Chem 272(34):21403–21407.
17. Fantin VR, et al. (1998) Characterization of insulin receptor substrate 4 in human
embryonic kidney 293 cells. J Biol Chem 273(17):10726–10732.
18. Qu B-H, Karas M, Koval A, LeRoith D (1999) Insulin receptor substrate-4 enhances
insulin-like growth factor-I-induced cell proliferation. J Biol Chem 274(44):
31179–31184.
19. Xu X, et al. (2012) Modular genetic control of sexually dimorphic behaviors. Cell
148(3):596–607.
20. Liu Y, Schmidt B, Maskell DL (2010) MSAProbs: Multiple sequence alignment based on
pair hidden Markov models and partition function posterior probabilities. Bioinformatics
26(16):1958–1964.
21. Plyusnin I, Holm L (2012) Comprehensive comparison of graph based multiple protein
sequence alignment strategies. BMC Bioinformatics 13(1):64.
22. Sievers F, et al. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7(1):539.
23. Wernersson R, Pedersen AG (2003) RevTrans: Multiple alignment of coding DNA from
aligned amino acid sequences. Nucleic Acids Res 31(13):3537–3539.
24. Abascal F, Zardoya R, Telford MJ (2010) TranslatorX: Multiple alignment of nucleotide
sequences guided by amino acid translations. Nucleic Acids Res 38(Web Server issue):
W7-13.
25. Talavera G, Castresana J (2007) Improvement of phylogenies after removing divergent and
ambiguously aligned blocks from protein sequence alignments. Syst Biol 56(4):564–577.
26. Castresana J (2000) Selection of conserved blocks from multiple alignments for their
use in phylogenetic analysis. Mol Biol Evol 17(4):540–552.
27. Ranwez V, Harispe S, Delsuc F, Douzery EJ (2011) MACSE: Multiple Alignment of
Coding SEquences accounting for frameshifts and stop codons. PLoS ONE 6(9):e22594.
28. Schneider A, et al. (2009) Estimates of positive Darwinian selection are inflated by
errors in sequencing, annotation, and alignment. Genome Biol Evol 1:114–118.
29. Jordan G, Goldman N (2012) The effects of alignment error and alignment filtering on
the sitewise detection of positive selection. Mol Biol Evol 29(4):1125–1139.
30. Penn O, et al. (2010) GUIDANCE: A web server for assessing alignment confidence
scores. Nucleic Acids Res 38(Web Server issue):W23-8.
31. Wright KM, Rausher MD (2010) The evolution of control and distribution of adaptive
mutations in a metabolic pathway. Genetics 184(2):483–502.
32. Kim PM, Korbel JO, Gerstein MB (2007) Positive selection at the protein network
periphery: Evaluation in terms of structural constraints and cellular context. Proc Natl
Acad Sci USA 104(51):20274–20279.
33. Hahn MW, Kern AD (2005) Comparative genomics of centrality and essentiality in
three eukaryotic protein-interaction networks. Mol Biol Evol 22(4):803–806.
34. Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and
interactive visualization of biological networks and protein structures. Nat Protoc
7(4):670–685.
35. Shannon P, et al. (2003) Cytoscape: A software environment for integrated models of
biomolecular interaction networks. Genome Res 13(11):2498–2504.
36. Stark C, et al. (2006) BioGRID: A general repository for interaction datasets. Nucleic
Acids Res 34(Database issue, suppl 1):D535–D539.
37. Yoon J, Blumer A, Lee K (2006) An algorithm for modularity analysis of directed and
weighted biological networks based on edge-betweenness centrality. Bioinformatics
22(24):3106–3108.
38. Wiens JJ, et al. (2012) Resolving the phylogeny of lizards and snakes (Squamata) with
extensive sampling of genes and species. Biol Lett 8(6):1043–1046.
39. Kimball RT, Wang N, Heimer-McGinn V, Ferguson C, Braun EL (2013) Identifying localized biases in large datasets: A case study using the avian tree of life. Mol Phylogenet Evol 69(3):1021–1032.
40. McCormack JE, et al. (2013) A phylogeny of birds based on over 1,500 loci collected by
target enrichment and high-throughput sequencing. PLoS ONE 8(1):e54848.
41. Thomson RC, Shaffer HB (2010) Sparse supermatrices for phylogenetic inference:
Taxonomy, alignment, rogue taxa, and the phylogeny of living turtles. Syst Biol 59(1):
42–58.
42. Perelman P, et al. (2011) A molecular phylogeny of living primates. PLoS Genet 7(3):
e1001342.
43. Eo SH, Bininda-Emonds OR, Carroll JP (2009) A phylogenetic supertree of the fowls
(Galloanserae, Aves). Zool Scr 38(5):465–481.
44. Hedges SB, Kumar S (2009) The Timetree of Life (Oxford Univ Press, New York).
45. dos Reis M, et al. (2012) Phylogenomic datasets provide both precision and accuracy in
estimating the timescale of placental mammal phylogeny. Proc Roy Soc B Biol Sci 279
(1742):3491–3500.
46. Junier T, Zdobnov EM (2010) The Newick utilities: High-throughput phylogenetic tree
processing in the UNIX shell. Bioinformatics 26(13):1669–1670.
47. Weadick CJ, Chang BS (2012) An improved likelihood ratio test for detecting site-specific functional divergence among clades of protein-coding genes. Mol Biol Evol
29(5):1297–1300.
48. Nei M, Gojobori T (1986) Simple methods for estimating the numbers of synonymous
and nonsynonymous nucleotide substitutions. Mol Biol Evol 3(5):418–426.
49. Thornton K (2003) Libsequence: A C++ class library for evolutionary genetic analysis.
Bioinformatics 19(17):2325–2327.
50. Yang Z, Nielsen R (2002) Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol 19(6):908–917.
51. Zhang J, Nielsen R, Yang Z (2005) Evaluation of an improved branch-site likelihood
method for detecting positive selection at the molecular level. Mol Biol Evol 22(12):
2472–2479.
52. Yang Z, dos Reis M (2011) Statistical properties of the branch-site test of positive
selection. Mol Biol Evol 28(3):1217–1228.
53. Self SG, Liang K-L (1987) Asymptotic properties of maximum likelihood estimators and
likelihood ratio tests under nonstandard conditions. J Am Stat Assoc 82(398):605–610.
54. Goldman N, Whelan S (2000) Statistical tests of gamma-distributed rate heterogeneity in models of sequence evolution in phylogenetics. Mol Biol Evol 17(6):975–978.
55. Sparkman AM, et al. (2012) Rates of molecular evolution vary in vertebrates for insulin-like growth factor-1 (IGF-1), a pleiotropic locus that regulates life history traits. Gen
Comp Endocrinol 178(1):164–173.
56. Biasini M, et al. (2014) SWISS-MODEL: Modelling protein tertiary and quaternary structure
using evolutionary information. Nucleic Acids Res 42(Web Server issue):W252-8.
57. Yang Y, et al. (2010) Solution structure of proinsulin: Connecting domain flexibility
and prohormone processing. J Biol Chem 285(11):7847–7851.
58. Sato A, et al. (1993) Three-dimensional structure of human insulin-like growth factor-I
(IGF-I) determined by 1H-NMR and distance geometry. Int J Pept Protein Res 41(5):433–440.
59. Williams C, et al. (2012) An exon splice enhancer primes IGF2:IGF2R binding site
structure and function evolution. Science 338(6111):1209–1213.
60. Garrett TPJ, et al. (1998) Crystal structure of the first three domains of the type-1
insulin-like growth factor receptor. Nature 394(6691):395–399.
61. Brown J, et al. (2008) Structure and functional analysis of the IGF-II/IGF2R interaction.
EMBO J 27(1):265–276.
62. Zuckerkandl E, Pauling L (1965) Evolutionary divergence and convergence in proteins.
Evolving Genes and Proteins, eds Bryson V, Vogel HJ (Academic Press, New York).
63. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 30(12):2725–2729.
64. Fares MA, McNally D (2006) CAPS: Coevolution analysis using protein sequences. Bioinformatics 22(22):2821–2822.
65. de Juan D, Pazos F, Valencia A (2013) Emerging methods in protein co-evolution. Nat
Rev Genet 14(4):249–261.
66. Rodionov A, Bezginov A, Rose J, Tillier ER (2011) A new, fast algorithm for detecting
protein coevolution using maximum compatible cliques. Algorithms Mol Biol 6(1):17.
67. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL_X
windows interface: Flexible strategies for multiple sequence alignment aided by quality
analysis tools. Nucleic Acids Res 25(24):4876–4882.
68. Kearse M, et al. (2012) Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics
28(12):1647–1649.
69. Yang Z (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol
24(8):1586–1591.
70. Subramanian S, Kumar S (2004) Gene expression intensity shapes evolutionary rates
of the proteins encoded by the vertebrate genome. Genetics 168(1):373–381.
71. Jovelin R, Phillips PC (2011) Expression level drives the pattern of selective constraints along
the insulin/Tor signal transduction pathway in Caenorhabditis. Genome Biol Evol 3:715–722.
72. Montanucci L, Laayouni H, Dall’Olio GM, Bertranpetit J (2011) Molecular evolution
and network-level analysis of the N-glycosylation metabolic pathway across primates.
Mol Biol Evol 28(1):813–823.
73. Fraser HB, Wall DP, Hirsh AE (2003) A simple dependence between protein evolution
rate and the number of protein-protein interactions. BMC Evol Biol 3(1):11.
74. Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW (2002) Evolutionary rate
in the protein interaction network. Science 296(5568):750–752.
75. Bloom JD, Adami C (2004) Evolutionary rate depends on number of protein-protein
interactions independently of gene expression level: Response. BMC Evol Biol 4(1):14.
76. Larracuente AM, et al. (2008) Evolution of protein-coding genes in Drosophila. Trends
Genet 24(3):114–123.
77. Invergo BM, Montanucci L, Laayouni H, Bertranpetit J (2013) A system-level, molecular evolutionary analysis of mammalian phototransduction. BMC Evol Biol 13(1):52.
78. Ratnakumar A, et al. (2010) Detecting positive selection within genomes: The problem
of biased gene conversion. Philos Trans R Soc Lond B Biol Sci 365(1552):2571–2580.
79. Guirao-Rico S, Aguadé M (2009) Positive selection has driven the evolution of the Drosophila insulin-like receptor (InR) at different timescales. Mol Biol Evol 26(8):1723–1732.
80. Denley A, Cosgrove LJ, Booker GW, Wallace JC, Forbes BE (2005) Molecular interactions of the IGF system. Cytokine Growth Factor Rev 16(4-5):421–439.
81. Keyhanfar M, Booker GW, Whittaker J, Wallace JC, Forbes BE (2007) Precise mapping
of an IGF-I-binding site on the IGF-1R. Biochem J 401(1):269–277.
82. Bayne ML, et al. (1989) The C region of human insulin-like growth factor (IGF) I is required
for high affinity binding to the type 1 IGF receptor. J Biol Chem 264(19):11004–11008.
83. Denley A, et al. (2004) Structural determinants for high-affinity binding of insulin-like
growth factor II to insulin receptor (IR)-A, the exon 11 minus isoform of the IR. Mol
Endocrinol 18(10):2502–2512.
84. Seino S, Bell GI (1989) Alternative splicing of human insulin receptor messenger RNA.
Biochem Biophys Res Commun 159(1):312–316.
85. Epa VC, Ward CW (2006) Model for the complex between the insulin-like growth
factor I and its receptor: Towards designing antagonists for the IGF-1 receptor. Protein Eng Des Sel 19(8):377–384.
86. Castoe TA, Jiang ZJ, Gu W, Wang ZO, Pollock DD (2008) Adaptive evolution and
functional redesign of core metabolic proteins in snakes. PLoS ONE 3(5):e2201.
87. Castoe TA, et al. (2009) Evidence for an ancient adaptive episode of convergent
molecular evolution. Proc Natl Acad Sci USA 106(22):8986–8991.
88. Oldham S (2011) Obesity and nutrient sensing TOR pathway in flies and vertebrates:
Functional conservation of genetic mechanisms. Trends Endocrinol Metab 22(2):45–52.
89. Tatar M, Bartke A, Antebi A (2003) The endocrine regulation of aging by insulin-like
signals. Science 299(5611):1346–1351.
[Fig. S1 network diagram. Extracellular components: INS, IGF1, IGF2, IGFBP1–IGFBP6, INSR, and IGF1R. Intracellular nodes include PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R5, AKT, mTOR (with Raptor, Rictor, and MLST8), TSC1, TSC2, RHEB, and FOXO1. Labeled outputs: degradation of ligands; lipogenesis; survival, growth, and proliferation; apoptosis; glycogenesis; autophagy; protein synthesis; gene expression; and proliferation/differentiation.]
Fig. S1. The IIS/TOR signaling network. Proteins not included in this study due to lack of sequence data across species are in gray. Gene names correspond to
Tables S1 and S3. Genes in yellow were identified as having highly divergent Ka/Ks in reptiles relative to the rest of the tree by the CMCreptiles model (last column
of Table S3), significant after correction for multiple comparisons. *IRS4 and *IGFBP6 were analyzed manually due to their exceptional divergence in sequence
and length between reptiles and mammals (Table S5 and Fig. S3). Figure modified from ProteinLounge.com, SABiosciences.
Fig. S2. A rooted cladogram showing the phylogenetic relationships among the species included in this study.
Fig. S3. Annotated amino acid alignment of IGFBP6. The human sequence is set as a reference at the top of the alignment, and sequence differences from the
reference sequence are highlighted. We provide functional annotation on the human sequence. The N- and C-terminal domains are in red; the cysteine
residues are in dark blue. IGF binding sites that are conserved across all binding proteins are marked in cyan (excepting two snake species, for which only one of
these is conserved). IGF binding sites specific to IGFBP6 are marked in green, and the sites with different function (e.g., integrin binding) are marked in gray.
Table S1. IIS/TOR genes used in this study and their estimates of divergence between reptiles and mammals
Function
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Symbol
EntrezID
IGF1
3479
IGF1R
100500937
IGF2
3481
IGF2R
3482
IGFBP2
3485
IGFBP3
3486
IGFBP4
3487
IGFBP5
3488
INS
3630
INSR
3643
AKT1S1
84335
CALM1
801
EIF4E
1977
EIF4E2
9470
FOXO1
2308
GRB2
2885
IKBKB
3551
INPPL1
3636
IRS1
3667
IRS2
8660
KRAS
3845
MAPK10
5602
MKNK1
8569
MLST8
64223
MTOR
2475
NRAS
4893
PDE3B
5140
PDPK1
5170
PHKB
5257
PHKG1
5260
PIK3CA
5290
PIK3CB
5291
PIK3CD
5293
PIK3CG
5294
PIK3R5
23533
PPARGC1A
10891
PPP1R3C
5507
PPP1R3D
5509
PRKAA2
5563
PRKCG
5582
PTEN
5728
PTPN1
5770
RHEB
6009
RICTOR
253260
RPS6
6194
RPS6KA6
27330
SGK1
6446
SH2B2
10603
SHC1
6464
SHC2
25759
SHC3
53358
SOCS1
8651
SOCS3
9021
SOCS4
122809
SOS1
6654
STK11
6794
STRADA
92335
TSC1
7248
TSC2
7249
ULK2
9706
ULK3
25989
Betweenness Degree Mammal Gator Lizard Bird Turtle Snake Total Length
2.19E-04
4.10E-04
0
5.55E-05
0
4.45E-04
5.40E-07
4.42E-05
1.98E-06
7.80E-04
9.10E-07
1.90E-04
1.81E-04
2.36E-05
4.39E-05
8.00E-08
2.96E-05
0
0
4.50E-05
7.19E-05
1.06E-04
3.73E-06
9.28E-05
4.00E-08
3.20E-07
1.43E-05
8.32E-05
2.21E-06
0
2.67E-04
9.32E-06
0
6.84E-06
5.51E-06
1.95E-04
5.60E-06
2.40E-07
3.10E-04
2.54E-04
0.00141419
4.04E-05
2.03E-06
3.27E-04
3.91E-05
1.08E-04
2.63E-04
0
6.00E-08
2.00E-08
3.42E-06
0
0
0
3.03E-04
0
4.00E-08
0
6.56E-05
7.04E-06
0
20
88
1
34
1
36
5
12
6
76
10
26
60
13
63
2
3
1
2
40
34
28
21
30
3
3
4
69
6
1
59
14
1
27
7
91
7
3
52
19
114
30
36
86
44
28
70
2
2
5
10
8
6
6
114
1
12
1
120
9
1
32
31
27
25
31
29
29
30
20
32
24
31
32
32
31
31
32
32
29
24
26
29
30
32
31
27
32
32
32
31
32
32
30
32
32
32
32
29
31
31
31
30
30
32
32
4
31
30
31
28
30
29
30
29
31
31
31
32
32
32
32
2
1
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
1
2
1
1
2
2
1
2
2
2
1
2
2
2
2
1
2
2
1
1
2
2
2
2
2
2
1
2
2
2
2
1
1
0
2
2
2
2
2
2
2
2
3
4
7
7
7
5
7
7
2
6
7
7
7
7
7
6
7
6
6
0
7
2
6
7
7
6
7
7
7
1
7
6
6
7
5
7
7
7
3
5
7
7
7
7
6
6
7
6
7
1
0
7
6
5
8
7
7
7
7
7
7
10
10
8
10
10
10
0
9
10
10
0
10
10
10
10
10
10
6
10
9
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
9
0
10
10
9
10
10
0
10
10
10
7
10
10
8
9
10
9
10
10
10
10
10
5
5
7
8
6
7
7
4
1
8
4
8
8
7
7
8
8
7
6
6
8
3
8
8
8
7
8
8
8
2
8
6
8
8
5
7
6
6
4
5
8
8
8
8
8
2
6
6
6
2
2
7
4
8
8
8
8
8
8
6
7
7
5
7
7
7
5
5
7
0
7
7
7
7
7
6
7
7
7
0
0
7
0
4
7
7
7
7
7
7
0
7
4
7
7
7
1
7
6
0
4
7
6
6
7
6
6
6
6
7
0
0
4
5
3
7
7
7
7
7
7
7
59
56
58
59
63
58
50
59
34
65
44
65
66
65
63
64
66
60
53
40
60
45
59
66
65
58
66
66
66
45
66
60
63
66
60
59
64
59
48
47
65
63
62
66
64
19
62
60
63
40
43
58
53
56
66
64
65
66
66
64
65
0.54
0.98
0.69
0.81
0.67
0.61
1
0.40
0.96
0.96
0.87
0.99
0.87
0.89
0.59
1
0.79
0.67
0.61
0.20
0.98
0.92
0.89
0.99
0.99
0.99
0.82
0.98
0.98
0.92
0.52
0.99
0.97
1
0.89
0.85
0.99
0.80
0.94
0.64
0.94
0.95
0.99
0.83
0.99
0.99
0.80
0.13
0.78
0.70
0.70
0.92
0.92
0.95
0.97
1
0.49
0.96
0.36
0.28
0.90
N
ω
Ka
Ks
864
772
837
850
992
841
609
869
280
1,041
480
1,054
1,088
1,055
992
1,023
1,087
895
696
383
884
464
870
1,088
1,041
837
1,084
1,085
1,087
434
1,088
895
989
1,082
841
863
1,024
870
527
489
1,054
990
960
1,079
1,024
60
961
880
992
336
390
841
688
783
1,042
1,023
1,054
1,084
1,063
1,017
1,054
0.12
0.05
0.15
0.11
0.14
0.06
0.12
0.10
0.24
0.05
0.18
0.00
0.06
0.02
0.07
0.01
0.06
0.07
0.07
0.19
0.02
0.01
0.05
0.03
0.01
0.01
0.15
0.03
0.05
0.13
0.03
0.05
0.07
0.03
0.14
0.12
0.09
0.12
0.02
0.06
0.05
0.04
0.01
0.05
0.01
0.06
0.02
0.14
0.06
0.15
0.06
0.14
0.09
0.07
0.04
0.03
0.04
0.11
0.06
0.05
0.11
0.17
0.09
0.31
0.31
0.18
0.15
0.17
0.11
0.31
0.12
0.34
0.01
0.03
0.03
0.12
0.02
0.12
0.09
0.11
0.19
0.05
0.02
0.10
0.04
0.02
0.02
0.18
0.06
0.09
0.13
0.04
0.08
0.11
0.10
0.24
0.08
0.21
0.28
0.04
0.09
0.03
0.11
0.01
0.07
0.02
0.09
0.04
0.18
0.09
0.13
0.16
0.23
0.07
0.10
0.05
0.07
0.09
0.15
0.11
0.10
0.14
1.30
1.66
2.17
3.00
1.35
3.00
1.51
1.23
1.32
2.88
1.91
2.98
0.55
1.41
1.76
1.15
1.88
1.20
1.40
0.97
3.00
0.95
2.14
1.43
1.62
2.64
1.30
1.96
1.68
1.05
1.70
1.49
1.54
3.00
1.57
0.68
2.23
2.37
2.43
1.99
0.57
2.62
0.89
1.22
1.87
1.62
1.42
1.41
1.30
0.93
2.98
1.63
0.75
1.82
1.15
2.44
2.88
1.34
1.83
1.60
1.23
Bold HGNC gene symbols are genes classified as extracellular; not bold are intracellular. Betweenness is the amount of influence a node exerts on the
interactions of the other nodes (range 0–1). Degree is a measure of connectivity and is the number of edges or interactions that gene has with other genes
or proteins based on BioGrid human reactome 3.2.95 (1) (including protein-protein and protein-gene interactions). The numbers below each taxon represent the
number of sequences from that group represented in the alignment. Total is the number of sequences in alignment; N = total pairwise comparisons between
reptiles and mammals used to calculate divergence measures. Divergence measures (Ka, nonsynonymous divergence; Ks, synonymous; ω, nonsynonymous/
synonymous) are the median of the pairwise comparisons calculated in PAML between reptiles and mammals. Length is the median length of sequences in the
multiple species alignment given as a proportion of the longest human isoform.
1. Stark C, et al. (2006) BioGRID: A general repository for interaction datasets. Nucleic Acids Res 34(Database issue, suppl 1):D535–D539.
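The summary step described in the Table S1 legend (medians over all reptile–mammal pairwise comparisons) can be sketched as follows. This is only an illustration of the summarizing arithmetic, not the authors' pipeline: the pairwise Ka and Ks estimates themselves come from PAML, the function name `summarize_divergence` is hypothetical, and skipping comparisons with Ks = 0 when computing ω is an assumption made here to avoid division by zero.

```python
from statistics import median


def summarize_divergence(pairs):
    """Summarize pairwise divergence estimates for one gene.

    pairs: list of (ka, ks) tuples, one per reptile-mammal comparison.
    Returns N (number of comparisons) and the medians of Ka, Ks, and
    the per-comparison ratio omega = Ka / Ks.
    """
    if not pairs:
        raise ValueError("no pairwise comparisons supplied")
    ka_values = [ka for ka, _ in pairs]
    ks_values = [ks for _, ks in pairs]
    # Assumption: comparisons with Ks == 0 give an undefined ratio and are skipped.
    omega_values = [ka / ks for ka, ks in pairs if ks > 0]
    return {
        "N": len(pairs),
        "Ka": median(ka_values),
        "Ks": median(ks_values),
        "omega": median(omega_values) if omega_values else None,
    }
```

Note that ω is taken here as the median of the per-comparison ratios, which in general differs from the ratio of the median Ka to the median Ks.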
Table S2. Genomic and transcriptomic datasets used in this study
Common name
Species name
Tissue
Green anole*
Red-eared slider turtle
Anolis carolinensis
Trachemys scripta
Painted turtle†
Galápagos tortoise†
Chinese softshell turtle*
Chinese alligator†
Pigeon†
Darwin finch†
Budgerigar◇
Saker falcon†
Peregrine falcon†
Collared flycatcher*
Turkey*
Chicken*
Zebrafinch*
Duck*
Tenrec†
Elephant*
Rat*
Mouse*
Shrew†
Vole†
Chrysemys picta
Chelonoidis nigra
Pelodiscus sinensis
Alligator sinensis
Columba livia
Geospiza fortis
Melopsittacus undulatus
Falco cherrug
Falco peregrinus
Ficedula albicollis
Meleagris gallopavo
Gallus gallus
Taeniopygia guttata
Anas platyrhynchos
Echinops telfairi
Loxodonta africana
Rattus norvegicus
Mus musculus
Sorex araneus
Microtus ochrogaster
Multiple
Brain
Embryonic
stage 14, 17
Multiple
Blood
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Ground squirrel*
Pika†
European rabbit*
Naked mole rat†
Guinea pig*
Bush baby*
Macaque*
White-cheeked gibbon*
Ictidomys tridecemlineatus
Ochotona princeps
Oryctolagus cuniculus
Heterocephalus glaber
Cavia porcellus
Otolemur garnettii
Macaca mulatta
Nomascus leucogenys
Orangutan*
Gorilla gorilla*
Chimpanzee*
Human*
Pig*
Cow*
Dolphin†
Horse*
Little brown bat*
Brandt’s bat†
Cat*
Dog*
Giant Panda*
Ferret*
Armadillo†
Opossum*
Platypus*
Tasmanian devil*
Alligator
Anolis lizard
Alligator lizard
Fence lizard
Bearded dragon
Skink
Gecko
African house snake
Cottonmouth
Sunbeam snake
Total contigs
Mean (bp) N50 (bp)
n:N50
19,177
55,456
included above
1,589
767
2,094
1,074
25,802
19,668
20,668
38,114
31,132
28,607
26,145
26,628
27,810
15,893
16,496
16,354
18,204
16,353
38,810
25,635
25,725
50,718
40,099
46,900
1,646
615
1,588
1,104
118
1,140
1,179
1,207
1,206
1,635
1,596
1,669
1,347
1,494
1,097
1,623
1,532
1,358
1,125
1,042
2,091
687
2,013
1,686
1,737
1,749
1,818
1,875
1,869
2,202
2,148
2,223
1,911
2,142
1,605
2,109
2,043
2,013
1,590
1,620
5,881
5,265
4,770
6,732
5,435
5,017
4,610
4,689
4,797
3,430
3,634
3,537
3,644
3,265
7,135
5,771
5,571
9,740
7,676
8,080
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
20,000
40,749
20,588
69,635
19,774
19,986
36,384
19,988
1,542
1,092
1,602
1,046
1,567
1,619
1,442
1,626
1,932
1,632
2,100
1,578
2,058
2,085
1,920
2,133
4,560
7,378
4,533
12,738
4,357
4,505
7,979
4,435
Pongo abelii
Gorilla gorilla
Pan troglodytes
Homo sapiens
Sus scrofa
Bos taurus
Tursiops truncatus
Equus caballus
Myotis lucifugus
Myotis brandtii
Felis catus
Canis familiaris
Ailuropoda melanoleuca
Mustela putorius
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
Multiple
21,414
27,473
19,907
102,156
25,883
22,118
38,169
22,654
20,719
47,102
20,259
25,160
21,136
20,062
1,507
1,608
1,582
1,147
1,354
1,605
979
1,688
1,535
1,023
1,587
1,734
1,618
1,606
2,040
2,166
2,094
1,839
1,824
2,082
1,377
2,319
2,037
1,557
2,112
2,298
2,154
2,127
4,562
5,842
4,327
17,747
5,574
4,830
7,665
4,641
4,466
8,315
4,354
5,395
4,520
4,295
Dasypus novemcinctus
Monodelphis domestica
Ornithorhynchus anatinus
Sarcophilus harrisii
Alligator mississippiensis
Anolis sagrei
Elgaria multicarinata
Sceloporus undulatus
Pogona vitticeps
Scincella lateralis
Eublepharis macularius
Lamprophis fuliginosus
Agkistrodon piscivorus
Xenopeltis unicolor
Multiple
Multiple
Multiple
Multiple
Liver, f‡, juvenile
Liver, m, adult
Liver, u, juvenile
Liver, m, adult
Liver, u, juvenile
Liver, u, adult
Liver, m, adult
Liver, f, adult
Liver, f, adult
Liver, f, adult
57,911
22,310
23,584
22,404
47,884
23,392
24,018
32,046
38,739
50,129
37,488
32,952
25,220
27,211
991
1,592
1,166
1,604
868
891
888
1,000
933
945
931
818
903
956
1,407
2,049
1,593
2,091
1,206
1,227
1,242
1,479
1,323
1,359
1,338
1,077
1,257
1,359
11,113
4,975
4,777
4,987
9,548
4,843
4,978
6,178
7,910
9,867
7,508
7,149
5,353
5,606
GC
4,305 48.42
10,920 50.88
Citation
(1, 2)
(3)
(4)
48.83
(5)
45.89
(6)
48.46
(7)
49.70
(8)
50.56
(9)
51.04
(10)
49.28
(11)
49.37
(12)
49.63
(12)
52.17
(13)
48.61
(14)
50.32
(15, 16)
50.75
(17)
49.25
(18)
54.18
(19)
51.76
(19)
51.77
(20)
51.95
(19)
55.44
(19)
52.22 Unpublished
Broad Institute
51.83
(19)
54.49
(19)
53.88
(19)
53.87
(21)
52.56
(19)
51.55
(19)
51.54
(22)
51.52 Baylor College
of Medicine
52.03
(23)
52.15
(24)
51.96
(25)
52.24
(19)
53.25
(26)
53.33
(19)
53.63
(19)
51.57
(19)
53.16
(19)
53.04
(27)
52.67
(19)
52.77
(28)
52.89
(29)
53.35 Unpublished
Broad Institute
54.18
(19)
48.32
(30)
54.07
(31)
47.91
(32)
49.35 This study, SM07
47.77 This study, SM02
48.76 This study, SM03
47.48 This study, SM08
49.44 This study, SM09
51.22 This study, SM12
48.76 This study, SM15
47.69 This study, SM04
47.57 This study, SM05
47.63 This study, SM06
Table S2. Cont.
Common name              Species name           Tissue              Total contigs  Mean (bp)  N50 (bp)  n:N50  GC     Citation
Viper boa                Candoia aspera         Liver, f, adult     34,984         947        1,332     7,215  48.56  This study, SM14
W. aquatic garter snake  Thamnophis couchii     Liver, f, adult     38,648         986        1,410     7,666  47.77  This study, TC
Garter snake-lake        Thamnophis elegans     Liver, m, juvenile  37,723         1,013      1,443     7,635  47.83  This study, HS08
Garter snake-meadow      Thamnophis elegans     Liver, f, juvenile  36,090         1,053      1,566     6,963  47.64  This study, HS11
Snapping turtle          Chelydra serpentina    Liver, m, juvenile  26,251         835        1,119     5,688  50.45  This study, SM01
Stinkpot turtle          Sternotherus odoratus  Liver, f, juvenile  43,717         971        1,413     8,652  50.97  This study, SM10
Sideneck turtle          Pelusios castaneus     Liver, f, juvenile  40,755         984        1,434     7,943  49.70  This study, SM11
Box turtle               Terrapene ornata       Liver, u, juvenile  43,109         959        1,401     8,207  50.44  This study, SM13
Contigs shorter than 200 bp were not included in our study. n:N50 is defined here as the number of contigs that add up to 50% of the total assembly size when
sorted longest to shortest, and N50 refers to the contig length such that half of all bases in the assembly are in contigs of equal or
longer length. Liver transcriptome was sequenced for all individuals in our study, and the sex and stage are given. The individual identifier abbreviation of the raw
sequence data for the liver transcriptome data generated in this study can be found under Citation. U, unknown.
*Sequence was downloaded from Ensembl and thus is also annotated using the genomic sequence.
†RNA sequence was downloaded from NCBI’s genome FTP site.
‡Sex: f, female; m, male; u, unknown.
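The two assembly statistics defined in the Table S2 legend can be illustrated with a short sketch. It assumes the conventional N50 definition (the length of the contig at which the cumulative sum of lengths, sorted longest to shortest, first reaches half of the assembly) and applies the 200-bp minimum contig length stated in the legend. The function name `n50_stats` and its parameters are hypothetical, and this is not the software used to produce the table.

```python
def n50_stats(contig_lengths, min_len=200):
    """Return (N50, n:N50) for a list of contig lengths in bp.

    Contigs shorter than min_len are discarded first, mirroring the
    200-bp cutoff stated in the table legend.
    """
    lengths = sorted((n for n in contig_lengths if n >= min_len), reverse=True)
    if not lengths:
        raise ValueError("no contigs at or above the minimum length")
    half_assembly = sum(lengths) / 2
    running_total = 0
    # Walk from the longest contig down until half of all bases are covered.
    for count, length in enumerate(lengths, start=1):
        running_total += length
        if running_total >= half_assembly:
            return length, count
```

For example, contigs of 500, 400, 300, and 200 bp total 1,400 bp; the two longest already cover 900 bp, which exceeds half the assembly, so N50 is 400 bp and n:N50 is 2.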
1. Alföldi J, et al. (2011) The genome of the green anole lizard and a comparative analysis with birds and mammals. Nature 477(7366):587–591.
2. Eckalbar WL, et al. (2013) Genome reannotation of the lizard Anolis carolinensis based on 14 adult and embryonic deep transcriptomes. BMC Genomics 14(1):49.
3. Tzika AC, Helaers R, Schramm G, Milinkovitch MC (2011) Reptilian-transcriptome v1.0, a glimpse in the brain transcriptome of five divergent Sauropsida lineages and the phylogenetic
position of turtles. Evodevo 2(1):19.
4. Kaplinsky NJ, et al. (2013) The embryonic transcriptome of the red-eared slider turtle (Trachemys scripta). PLoS ONE 8(6):e66357.
5. Shaffer HB, et al. (2013) The western painted turtle genome, a model for the evolution of extreme physiological adaptations in a slowly evolving lineage. Genome Biol 14(3):R28.
6. Chiari Y, Cahais V, Galtier N, Delsuc F (2012) Phylogenomic analyses support the position of turtles as the sister group of birds and crocodiles (Archosauria). BMC Biol 10(1):65.
7. Wang Z, et al. (2013) The draft genomes of soft-shell turtle and green sea turtle yield insights into the development and evolution of the turtle-specific body plan. Nat Genet 45(6):
701–706.
8. Wan Q-H, et al. (2013) Genome analysis and signature discovery for diving and sensory properties of the endangered Chinese alligator. Cell Res 23(9):1091–1105.
9. Shapiro MD, et al. (2013) Genomic diversity and evolution of the head crest in the rock pigeon. Science 339(6123):1063–1067.
10. Parker P, Li B, Li H, Wang J (2012) The genome of Darwin’s Finch (Geospiza fortis). GigaScience. Available at dx.doi.org/10.5524/100040. Accessed September 10, 2013.
11. Bradnam KR, et al. (2013) Assemblathon 2: Evaluating de novo methods of genome assembly in three vertebrate species. GigaScience 2(1):10.
12. Zhan X, et al. (2013) Peregrine and saker falcon genome sequences provide insights into evolution of a predatory lifestyle. Nat Genet 45(5):563–566.
13. Ellegren H, et al. (2012) The genomic landscape of species divergence in Ficedula flycatchers. Nature 491(7426):756–760.
14. Dalloul RA, et al. (2010) Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): Genome assembly and analysis. PLoS Biol 8(9):e1000475.
15. Rubin C-J, et al. (2010) Whole-genome resequencing reveals loci under selection during chicken domestication. Nature 464(7288):587–591.
16. Hillier LW, et al.; International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate
evolution. Nature 432(7018):695–716.
17. Warren WC, et al. (2010) The genome of a songbird. Nature 464(7289):757–762.
18. Huang Y, et al. (2013) The duck genome and transcriptome provide insight into an avian influenza virus reservoir species. Nat Genet 45(7):776–783.
19. Lindblad-Toh K, et al.; Broad Institute Sequencing Platform and Whole Genome Assembly Team; Baylor College of Medicine Human Genome Sequencing Center Sequencing Team;
Genome Institute at Washington University (2011) A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478(7370):476–482.
20. Gibbs RA, et al.; Rat Genome Sequencing Project Consortium (2004) Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428(6982):493–521.
21. Kim EB, et al. (2011) Genome sequencing reveals insights into physiology and longevity of the naked mole rat. Nature 479(7372):223–227.
22. Gibbs RA, et al.; Rhesus Macaque Genome Sequencing and Analysis Consortium (2007) Evolutionary and biomedical insights from the rhesus macaque genome. Science 316(5822):
222–234.
23. Locke DP, et al. (2011) Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533.
24. Scally A, et al. (2012) Insights into hominid evolution from the gorilla genome sequence. Nature 483(7388):169–175.
25. Chimpanzee Sequencing and Analysis Consortium (2005) Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437(7055):69–87.
26. Groenen MA, et al. (2012) Analyses of pig genomes provide insight into porcine demography and evolution. Nature 491(7424):393–398.
27. Seim I, et al. (2013) Genome analysis reveals insights into physiology and longevity of the Brandt’s bat Myotis brandtii. Nat Commun 4:2212.
28. Lindblad-Toh K, et al. (2005) Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438(7069):803–819.
29. Li R, et al. (2010) The sequence and de novo assembly of the giant panda genome. Nature 463(7279):311–317.
30. Mikkelsen TS, et al.; Broad Institute Genome Sequencing Platform; Broad Institute Whole Genome Assembly Team (2007) Genome of the marsupial Monodelphis domestica reveals
innovation in non-coding sequences. Nature 447(7141):167–177.
31. Warren WC, et al. (2008) Genome analysis of the platypus reveals unique signatures of evolution. Nature 453(7192):175–183.
32. Murchison EP, et al. (2012) Genome sequencing and analysis of the Tasmanian devil and its transmissible cancer. Cell 148(4):780–791.
Table S3. Results from tests for positive selection on each IIS/TOR gene
Classification
HGNC
symbol
bs_reptilesC
bs_reptiles
bs_mammal
bs_croc
bs_bird
bs_turtle
bs_squamata
CMC_squamata
CMCreptiles
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
Intracellular
IGF1
IGF1R
IGF2
IGF2R
IGFBP2
IGFBP3
IGFBP4
IGFBP5
INS
INSR
AKT1S1
CALM1
EIF4E
EIF4E2
FOXO1
GRB2
IKBKB
INPPL1
IRS1
IRS2
KRAS
MAPK10
MKNK1
MLST8
MTOR
NRAS
PDE3B
PDPK1
PHKB
PHKG1
PIK3CA
PIK3CB
PIK3CD
PIK3CG
PIK3R5
PPARGC1A
PPP1R3C
PPP1R3D
PRKAA2
PRKCG
PTEN
PTPN1
RHEB
RICTOR
RPS6
RPS6KA6
SGK1
SH2B2
SHC1
SHC2
SHC3
SOCS1
SOCS3
SOCS4
SOS1
STK11
STRADA
TSC1
TSC2
ULK2
ULK3
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
−6.08
0.90
0.00
−25.65
0.00
0.00
0.00
0.00
0.00
6.58
0.00
24.72
0.02
0.00
0.00
−21.23
0.00
0.00
−538.65
−121.84
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
−0.21
0.00
−0.02
0.00
0.00
23.84
0.00
0.00
6.30
0.00
0.00
0.00
0.00
0.00
−38.49
−0.03
0.00
0.00
0.00
0.00
0.00
4.19
6.54
0.85
38.16
8.30
0.00
2.21
7.33
3.59
9.40
0.70
0.00
0.00
0.00
0.00
0.00
1.78
14.01
66.81
14.35
0.99
0.00
0.00
0.00
2.18
0.00
5.80
0.00
7.50
0.05
0.00
2.98
6.86
0.00
50.68
0.17
0.00
0.48
0.57
12.88
0.00
0.00
0.00
4.15
0.00
0.00
0.00
0.00
0.15
1.43
0.00
0.00
0.03
0.23
0.00
0.00
0.01
0.00
3.36
0.00
0.00
0
0.00
1.54
29.44
0.00
0.00
4.45
3.29
3.02
15.62
8.31
0.0
0.0
0.0
1.64
0.0
3.45
0.0
20.51
14.34
0.0
0.0
6.47
2.03
4.16
0.0
0.0
0.0
8.48
0.37
0.0
8.87
14.65
4.18
24.83
0.0
0.56
0.59
0.0
27.81
0.0
0.0
0.0
12.67
0.0
0.0
0.0
2.11
0.05
6.62
2.06
0.01
3.50
0.0
0.25
0.0
1.98
2.85
17.23
0.0
3.09
0.00
12.16
0.00
8.77
0.00
0.00
10.55
11.87
0.00
24.17
0.00
0.00
0.00
0.00
0.60
0.00
0.00
0.00
6.15
10.38
0.00
6.93
0.00
0.00
NA
−0.01
0.00
0.00
0.00
0.55
0.00
1.23
0.77
1.19
5.14
0.00
0.00
0.00
0.00
4.74
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
3.32
0.00
NA
1.63
0.00
0.00
0.00
12.95
−509.79
7.62
0.61
0.00
0.00
0.00
17.32
0.00
0.00
NA
3.75
0.00
−245.07
NA
0.07
0.00
−34.75
0.00
0.00
7.08
7.41
0.00
0.00
0.00
0.00
0.00
0.00
0.00
−0.29
0.00
0.00
−1.13
2.40
0.00
2.40
1.27
0.00
5.66
0.00
0.00
4.73
0.00
NA
0.00
3.02
0.00
3.40
0.00
NA
0.00
0.00
0.00
0.00
0.00
5.60
0.00
0.00
0.00
0.00
0.00
2.79
7.95
0.00
2.28
0.00
0.00
0.00
25.59
1.88
0.00
0.00
10.94
0.00
5.77
0.00
0.00
0.00
0.00
0.00
0.00
0.00
6.37
34.06
7.13
0.00
0.00
0.00
0.00
NA
0.07
0.00
0.00
−3.03
0.01
61.91
0.39
−0.01
0.00
0.00
0.00
0.00
13.59
0.00
5.82
0.00
0.00
0.00
−0.01
0.00
0.00
0.00
0.00
0.00
8.13
0.47
0.00
0.00
0.00
1.06
−0.22
0.00
8.12
2.17
0.00
1.17
4.16
10.44
1.75
31.03
5.84
0.00
2.61
2.59
0.00
0.84
3.65
0.00
0.00
0.00
0.00
0.00
72.16
1.36
21.81
14.35
0.99
0.00
0.00
0.00
0.79
0.00
6.52
0.00
0.00
6.67
−3.61
1.63
1.90
0.00
13.56
0.65
0.00
0.00
3.03
0.00
0.00
0.00
0.00
5.32
0.00
0.00
0.00
0.00
2.43
6.64
NA
0.00
0.02
0.50
0.00
0.00
0.18
0.00
177.56
0.00
0.00
94.02
2.09
21.80
156.70
109.35
0.02
16.76
0.89
−99.60
183.93
3.90
0.00
10.83
−9.30
51.56
−41.64
7.36
0.00
12.80
1.21
−90.99
2.90
0.06
17.46
16.02
0.92
3.83
3.99
11.47
12.86
27.03
8.19
4.54
0.90
19.40
27.20
0.44
0.13
4.37
39.70
−67.73
−453.77
125.64
9.08
0.48
51.16
−33.02
16.07
0.16
0.01
NA
9.92
8.01
1.76
−0.03
1.11
9.60
14.98
2.41
0.75
0.35
−33.35
2.57
63.69
372.49
23.43
−209.79
24.01
3.54
−99.60
31.49
14.55
−0.34
0.46
−9.30
48.00
−41.67
31.28
76.69
24.89
1.21
−90.99
−13.98
0.65
3.26
17.98
1.29
0.58
7.73
0.85
0.00
97.43
2.57
29.62
3.35
35.65
22.46
7.17
11.89
0.18
24.75
−67.73
−436.67
0.44
43.17
22.79
61.63
4.07
69.51
31.10
0.02
64.93
39.29
2.22
−543.68
−0.18
6.11
0.87
25.44
2.37
54.64
1.39
χ² values from likelihood ratio tests in PAML, where significant values suggest evidence for positive selection at the gene level for the specified phylogenetic
clade or branch. Italic and bold = significant at P < 0.05 before multiple test correction. Bold and underlined = significant at P < 0.05 after multiple test correction.
The CMCs used the entire clade as the foreground. bs, branch-site test; bs_reptilesC, branch-site test with the entire reptile clade as the foreground; all other
branch-site tests used only the branch leading to the specific taxon as the foreground branch; CMC, clade model; NA, not applicable for the specific gene.
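How a likelihood ratio statistic of the kind reported in Table S3 maps to a P value can be sketched as follows. The sketch assumes a χ² reference distribution with one degree of freedom, a common choice for branch-site tests; the legend does not state the degrees of freedom or the multiple-test correction used, so neither should be read as the authors' settings. The function names and the treatment of nonpositive statistics as P = 1 are assumptions made here for illustration.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio statistic: twice the log-likelihood difference."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_pvalue_df1(stat):
    """Upper-tail P value of a chi-square distribution with 1 degree of freedom.

    For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    Assumption: zero or negative statistics (which can arise when the
    alternative model fails to reach a higher likelihood) are reported
    as P = 1, that is, no evidence against the null model.
    """
    if stat <= 0:
        return 1.0
    return math.erfc(math.sqrt(stat / 2.0))
```

Under these assumptions, a statistic near 3.84 corresponds to P ≈ 0.05 before any correction for multiple tests.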
Table S4. Positively selected amino acid sites in hormones and binding domains of the receptors
Human
mature protein
Mammal
clade
Snake
proto-protein
Reptile
clade
Signal peptide
C-domain
C-domain
C-domain
C-domain
Signal peptide
Signal peptide
Propeptide
Propeptide
B-domain
C-domain
C-domain
C-domain
R6
D60
L80
L82
Q87
0.97
0.99
0.95
0.97
P2
S33
S34
R37
R6
Q60
Q78
Q80
V85
A17
V18
I22
F37
Q54
G85
S86
S89
0.99
0.99
0.94
0.98
1.00
1.00
0.93
1.00
IGF1
IGF1
IGF1
C-domain
C-domain
A-domain
A38
Q40
R55
S90
T92
I107
0.99
1.00
0.99
IGF1
IGF1
IGF1
IGF1
IGF1
IGF1
IGF1
IGF2
IGF2
IGF2
IGF2
IGF2
IGF2
IGF2
IGF2
IGF2
IGF2
IGF2
IGF2
IGF2
IGF2
IGF2
IGF2
IGF2
IGF2
INSR
INSR
INSR
INSR
INSR
INSR
INSR
INSR
INSR
INSR
INSR
INSR
INSR
INSR
INSR
INSR
INSR
INSR
INSR
D-domain
E peptide
E peptide
E peptide
E peptide
E peptide
E peptide
Signal peptide
Signal peptide
C-domain
C-domain
C-domain
Protopeptide
Protopeptide
Protopeptide
Protopeptide
Protopeptide
Protopeptide
Protopeptide
Protopeptide
Protopeptide
Protopeptide
Protopeptide
Protopeptide
Protopeptide
L1 domain
L1 domain
L1 domain
L1 domain
CR domain
CR domain
CR domain
CR domain
CR domain
CR domain
CR domain
L2 domain
FnIII-1
FnIII-1
FnIII-2
FnIII-2
FnIII-2
FnIII-2
FnIII-2
L64
Y87
Q88
S91
K94
K97
K102
V
I
A32
V35
S36
P74
F81
R83
Y92
V117
K120
E123
F125
R126
K129
A136
T139
Q140
V1
P3
R13
D68
Q171
S180
T188
Y226
R230
Q266
P280
G311
P537
Q540
S658
G735
V737
V744
A746
V116
V140
H141
N144
R147
T150
Y155
L3
V15
V48
N51
R52
L91
F102
K104
Y113
W139
E142
Q145
S147
E148
K151
V158
T161
H162
V1
P3
N13
K68
D170
S179
A187
V225
R229
S265
P277
E307
S533
K536
NA
A719
S721
T728
G730
0.97
Gene
Protein domain
INS
INS
INS
INS
INS
IGF1
IGF1
IGF1
IGF1
IGF1
IGF1
IGF1
IGF1
Mammal
branch
0.92
0.96
0.99
1.00
0.92
1.00
0.98
0.97
0.96
1.00
1.00
1.00
0.99
1.00
1.00
0.99
1.00
0.99
1.00
0.92
0.92
0.92
0.96
1.00
1.00
0.99
0.92
0.91
0.97
0.91
1.00
0.97
0.91
Reptile
branch
Functional annotations
1.00
Affects binding affinity to
IGF1R and INSR (1, 2)
0.99
0.98
Affects binding affinity
to IGF2R (3)
0.99
0.99
1.00
0.95
1.00
0.98
0.95
1.00
1.00
0.99
0.91
Table S4. Cont.
Gene
Protein domain
INSR
INSR
INSR
INSR
INSR
INSR
INSR
INSR
INSR
FnIII-2
FnIII-2
FnIII-2
FnIII-2
FnIII-2
FnIII-3
FnIII-3
FnIII-3
Transmembrane
region
Transmembrane
region
Transmembrane
region
Transmembrane
region
INSR
INSR
INSR
Human
mature protein
Mammal
clade
T757
S758
V769
N770
T796
L886
L865
S884
K923
1.00
Mammal
branch
Snake
proto-protein
0.98
E740
V741
V752
F753
A779
Q846
S848
Q867
A904
1.00
0.97
Reptile
clade
1.00
1.00
1.00
1.00
1.00
I916
1.00
V938
F918
0.96
G922
1.00
R1241
S12
W14
L16
S29
K31
E44
A175
Y222
T278
N281
0.91
0.91
INSR
IGF1R
IGF1R
IGF1R
IGF1R
IGF1R
IGF1R
IGF1R
IGF1R
IGF1R
IGF1R
Signal peptide
Signal peptide
Signal peptide
Signal peptide
L1 domain
L1 domain
CR domain
CR domain
CR domain
CR domain
P1266
*
*
*
*
E1
Q14
P145
R192
D248
F251
IGF1R
IGF1R
IGF1R
IGF1R
IGF1R
IGF2R
IGF2R
CR domain
CR domain
CR domain
L2 domain
L2 domain
Domain 11
Domain 11
E259
D262
Q275
M319
L379
A541
Y1542
P289
L292
Q306
S349
N409
Y1456
F1458
0.96
0.97
1.00
1.00
1.00
IGF2R
Domain 11
E1544
N1460
0.98
IGF2R
IGF2R
IGF2R
Domain 11
Domain 11
Domain 11
K1545
Y1549
N1558
Q1461
Q1641
T1474
1.00
0.95
0.90
IGF2R
IGF2R
IGF2R
Domain 11
Domain 11
Domain 11
P1561
G1568
Q1569
G1478
G1487
H1488
0.98
IGF2R
IGF2R
IGF2R
IGF2R
IGF2R
IGF2R
Domain
Domain
Domain
Domain
Domain
Domain
11
11
11
11
11
11
T1570
R1571
A1577
K1593
D1594
G1603
Q1489
P1490
L1497
K1512
E1513
A1522
0.94
0.99
0.96
1.00
0.91
0.97
IGF2R
Domain 11
V1609
IGF2R
IGF2R
Domain 11
Domain 11
R1623
I1627
Q1542
I1546
0.98
1.00
IGF2R
Domain 11
Q1632
K1551
0.98
IGF2R
IGF2R
IGF2R
Domain 11
Domain 11
Domain 11
P1643
−1648
R1655
V1562
R1569
T1576
0.99
0.92
0.97
1.00
0.94
0.94
1.00
Functional annotations
1.00
S936
I942
Reptile
branch
1.00
0.98
0.96
0.99
0.96
0.93
0.95
0.98
0.92
0.96
Interacts with IGF1
C-domain (not IGF2) (4)
0.98
1.00
Y1528
Predicted to affect IGF2 binding based on
substitution in Chicken/Monotreme (5)
Predicted to affect IGF2 binding based on
substitution in Chicken/Monotreme (5)
Predicted to affect IGF2 binding based on
substitution in Chicken/Monotreme (5)
Predicted to affect IGF2 binding based on
substitution in Chicken/Monotreme (5)
Predicted to affect IGF2 binding based on
substitution in Chicken/Monotreme (5)
Predicted to affect IGF2 binding based on
substitution in Chicken/Monotreme (5)
Predicted to affect IGF2 binding based on
substitution in Chicken/Monotreme (5)
Predicted to affect IGF2 binding based on
substitution in Chicken/Monotreme (5)
1.00
Listed are the sites with a posterior probability > 0.9 of being under positive selection in the PAML branch-site model, using either the branch leading to the
clade or the entire clade as the foreground. The amino acid sites in the “human mature protein” sequence correspond to the expanded amino acids in Fig. 2.
For the “human mature protein,” the amino acid listed is the human variant. For the “snake proto-protein,” the amino acid listed is the snake variant.
The “Functional annotations” column lists studies that have assigned functional significance to particular sites based on mutagenesis, antibody binding, and
crystal structures of complexes (not an exhaustive list). NA, not applicable.
1. Denley A, Cosgrove LJ, Booker GW, Wallace JC, Forbes BE (2005) Molecular interactions of the IGF system. Cytokine Growth Factor Rev 16(4-5):421–439.
2. Zhang W, Gustafson TA, Rutter WJ, Johnson JD (1994) Positively charged side chains in the insulin-like growth factor-1 C- and D-regions determine receptor binding specificity. J Biol
Chem 269(14):10609–10613.
3. Sakano K, et al. (1991) The design, expression, and characterization of human insulin-like growth factor II (IGF-II) mutants specific for either the IGF-II/cation-independent mannose
6-phosphate receptor or IGF-I receptor. J Biol Chem 266(31):20626–20635.
4. Keyhanfar M, Booker GW, Whittaker J, Wallace JC, Forbes BE (2007) Precise mapping of an IGF-I-binding site on the IGF-1R. Biochem J 401(1):269–277.
5. Brown J, Jones EY, Forbes BE (2009) Keeping IGF-II under control: Lessons from the IGF-II-IGF2R crystal structure. Trends Biochem Sci 34(12):612–619.
Table S5. Variation in the sequence and presence of the IGF binding domain in IGF binding
proteins 2–6 (% is the amino acid percent identity over the complete alignments)
Taxon
Reptiles
Archosaurs
Turtles
Squamates
Mammal
Primates
Other placental mammals
Monotreme/marsupials
BP2 (71%)
BP3 (75%)
BP4 (80%)
BP5 (83%)
BP6 (56%)
G: 5/5 M
T: 7/7 M
G: 0
T: 2/6 F
3/6 R
1/6 M
G: 1/1 M
T: 9/12 F
1/12 R
2/12 M
G: 1/5 F
3/5 R
1/5 M
T: 2/7 R
5/7 M
G: 1/1 F
T: 3/6 F
2/6 R
1/6 M
G: 1/1 F
T: 3/8 F
2/8 R
3/8 M
G: 3/5 R
2/5 M
T: 2/6 F
2/6 R
2/6 M
G: 1/1 F
T: 6/6 F
G: 2/4 F
2/4 M
T: 1/7 F
4/7 R
2/7 M
G: 1/1 M
T: 2/4 F
2/4 R
G: 0
T: 0
Suspect
Gene
Lost
G: 1/1 M
T: 2/2 R
G: 1/1 F
T: 10/11 F*
1/11 R
G: 1/1 M
T: 10/12 F
2/12 R
G: 1/1 M
T: 5/8 F
3/8 R
G: 6/7 F
1/7 M
G: 6/14 F
4/14 R
4/14 M
T: 6/9 F
2/9 R
1/9 M
G: 1/2 F
1/2 M
G: 6/7 F
1/7 M
G: 6/14 F
4/14 R
4/14 M
T: 1/6 F
1/6 R
5/6 M
G: 2/3 F
1/3 M
G: 7/7 F
G: 7/7 F
G: 6/6 F
G: 12/14 F
2/14 R
T: 5/7 F
1/1 R
1/1 M
G: 14/14 F
T: 7/7 F
G: 14/14 F
T: 7/7 F
G: 2/3 F
1/3 M
G: 1/2 F
1/2 R
G: 1/1 F
Within reptiles and mammals, for each specified group of species, we report the proportion of sequences
from genomic data (G) and/or transcriptomic data (T) that have the full N-terminal domain (F), a truncated
N-terminal domain (R), or a missing binding domain (M).
*Two species of snakes also showed an isoform with a missing IGF domain.