Porter et al Supplemental Methods

Dataset (Supplemental)
The supplemental dataset lists, for each sequence, its source, the major lineage and subgroup
designation based on our phylogenetic analysis, the G-protein binding partner, cell type, tissue
expression, and counter-ion location, as well as the associated references for each piece of
information. All of the sequences acquired from genome data are manually curated sequences,
which can be found on the UCSC genome browser
(http://genomewiki.ucsc.edu/index.php/Opsin_evolution:_update_blog) or in the supplemental
alignment. For records without an associated GenBank accession number (i.e. 123 sequences),
the source has been listed as ‘Genome’ or ‘EST library’. Because genome assemblies are
unstable and change as improvements are made, the accessions used at the time of making the
gene models are transient. These labels therefore indicate that the gene model was obtained by a
BLAST of close homologs against GenBank genomic sequences or ESTs, rather than against trace
reads or transcripts, and was not extracted from published literature. To find the source for
these gene models, BLAST the sequence supplied in our supplemental alignment against the raw
genomic and EST data at GenBank for the species in question to find the current assembly,
scaffold, contig, trace or transcript number. To validate the model for consistency, (1) compare
it visually to known orthologs in our compilation of manually curated sequences
(http://genomewiki.ucsc.edu/index.php/Opsin_evolution:_update_blog) for exon phase and
length matching, (2) perform a simple multiple alignment with orthologs using Multalin
(http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_multalin.html), and
(3) examine the gene model for conservation at known conserved sequence motifs. For more
details on the methods used, see below.
METHODS (Supplemental)
Sequence Acquisition
Opsin sequences were acquired from GenBank using one of two methods. To include sequences
for which expression data are available, opsin transcript sequences were mined from GenBank for
major taxonomic and known opsin groups using queries of either the nucleotide or the
‘transcriptome shotgun assembly’ databases. To generate the dataset of genomically derived
opsins, we used conventional transcript-derived entries from GenBank as tBlastn queries against
the wgs division of GenBank (where all genome assemblies are stored). Because opsins have a
minimum of ~20% protein identity to bovine rhodopsin, as determined by the degree of
conservation across all GPCRs, and because other opsins in the collection will outperform
bovine rhodopsin as a query in specific cases, this method has more than enough sensitivity to
detect even short homologous exons in the bovine rhodopsin K296 region. Thus the complete
opsin repertoire, including new homology classes, can be recovered from each species provided
it is present in the assembly. Sequences recovered with this methodology often need curation in
regions of weak alignment against the original data for a particular species and in the case of
misassembly stutter. Missing exons were often locatable at the NCBI trace archive in the form of
isolated reads that were omitted from the assembly, taking into account the risk of gathering
inappropriate exons from unsuspected gene duplications. Recovered opsin sequences were
quality checked for the presence of a lysine in the 7th transmembrane helix (bovine rhodopsin site
296), conservation of invariant opsin residues and motifs, and a better back-BLAST to opsins than
to any annotated non-opsin GPCR at the GenBank nucleotide division. With the exception of
species too closely related to one already represented in the dataset, all metazoan genomes at
GenBank as of 1 Dec 2010 are represented in the dataset used in this study. No species
diverging earlier than the ctenophore (notably neither the sponge Amphimedon queenslandica
nor the choanoflagellate Monosiga brevicollis) contains a GPCR with a lysine in a position
alignable with the K296 motif of bovine rhodopsin. A curated set of recovered opsins from
genome trace files, updated monthly, is available at the UCSC genome browser
(http://genomewiki.ucsc.edu/index.php/Opsin_evolution:_update_blog).
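
The lysine quality check described above can be illustrated with a minimal Python sketch. It is not the procedure actually used for curation; it assumes the candidate sequences are already aligned to a reference (bovine rhodopsin in practice), and the function names, sequence names, and toy alignment are hypothetical. The toy reference is only a few residues long, so residue 4 stands in for site 296.

```python
def reference_column(aligned_ref: str, residue_number: int) -> int:
    """Return the 0-based alignment column that holds the given
    1-based residue of the (gapped) reference sequence."""
    seen = 0
    for col, char in enumerate(aligned_ref):
        if char != "-":
            seen += 1
            if seen == residue_number:
                return col
    raise ValueError("reference has fewer residues than requested")


def has_lysine_at(aligned_seqs: dict[str, str], ref_name: str,
                  residue_number: int) -> dict[str, bool]:
    """For every aligned sequence, report whether it carries a lysine (K)
    in the column aligned with the chosen reference residue."""
    col = reference_column(aligned_seqs[ref_name], residue_number)
    return {name: seq[col] == "K" for name, seq in aligned_seqs.items()}


# Hypothetical toy alignment; all rows have equal length.
toy_alignment = {
    "ref":     "MA-CKL",   # residue 4 of the reference is K
    "opsin_a": "MAGCKL",   # lysine conserved
    "gpcr_x":  "MA-CRL",   # arginine instead of lysine
}

if __name__ == "__main__":
    print(has_lysine_at(toy_alignment, "ref", 4))
```

With a real alignment, the call would use the name of the bovine rhodopsin row and residue number 296; sequences reported as False would be candidates for exclusion or re-curation.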

For each opsin extracted from a genome project, the location and phase of each intron
within the coding region were determined by alignment to genomic contigs. All opsins to date
utilize standard GT-AG splice donors and acceptors. We parsimoniously resolved each
significant sequence change (i.e. intron changes and indels) as a gain or loss by determining
its ancestral status via multiple outgroups. In bilaterians, notably the tunicate Ciona and insects
including Drosophila, very rapid turnover of introns (both gain and loss) occurs; however, this
turnover does not interfere with opsin classification because enough species with conservatively
evolving introns are available to reconstruct the evolutionary history.
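
As an illustration of the two intron properties mentioned above, the following Python sketch computes intron phase from coding exon lengths and tests for canonical GT-AG boundaries. It is a sketch under stated assumptions rather than the curation workflow itself: the function names and the example exon lengths and intron sequences are hypothetical, and exon lengths are assumed to count coding bases only.

```python
def intron_phases(coding_exon_lengths: list[int]) -> list[int]:
    """Phase of each intron = number of coding bases upstream of it, modulo 3.
    Phase 0 falls between codons; phases 1 and 2 split a codon after its
    first or second base, respectively."""
    phases = []
    upstream = 0
    for length in coding_exon_lengths[:-1]:  # no intron follows the last exon
        upstream += length
        phases.append(upstream % 3)
    return phases


def is_canonical_intron(intron_seq: str) -> bool:
    """True if the intron starts with a GT donor and ends with an AG acceptor."""
    seq = intron_seq.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


if __name__ == "__main__":
    # Hypothetical gene model with three coding exons of 10, 5 and 9 bases.
    print(intron_phases([10, 5, 9]))
    print(is_canonical_intron("gtaagtccttttcag"))
    print(is_canonical_intron("gcaagtccttttcag"))
```

Comparing the phase list of a new gene model with that of its orthologs is one way to flag a misplaced exon boundary before visual inspection.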
Phylogenetic Analyses
For phylogenetic analyses, sequences that spanned less than half of the transmembrane regions
of the protein were discarded, leaving 889 opsin sequences from transcripts and genome
traces. To root our opsin phylogenetic analyses, 22 non-opsin GPCRs from the
human genome were used as outgroups: somatostatin receptor, opioid receptor mu 3, galanin
receptor, chemokine (C-C motif) receptor, bradykinin receptor 1, uracil/cys-leukotriene dual
receptor, cys-leukotriene receptor 1, purinergic receptor, orexin receptor, tachykinin receptor,
neuromedin U receptor, pyroglutamylated RFamide peptide receptor, human orphan receptor 19,
pancreatic polypeptide receptor, neuropeptide Y receptor, prolactin releasing hormone receptor,
human orphan receptor 161, alpha-1D-adrenergic receptor, thyrotropin-releasing hormone receptor,
thyrotropin receptor, adenosine A3 receptor, and opiate receptor-like 1. This set of sequences
was selected as outgroups based on previous phylogenetic studies of opsin and GPCR evolution
(Davies et al. 2010; Fredriksson et al. 2003; Plachetzki et al. 2010; Porter et al. 2007; Suga et al.
2008) as well as on a rigorous procedure of BLASTing human opsins against all other
human GPCRs (for a detailed description of the BLAST procedure see:
http://genomewiki.ucsc.edu/index.php/Opsin_evolution#GPCR_outgroup_sequences).
Furthermore, rerunning the phylogeny reconstruction without outgroup sequences does not
significantly change the sequences within each of the major clades nor the relationships among
them, with the exception of three sequences (Platynereis dumerilii TMT1 and TMT2;
Strongylocentrotus purpuratus encephalopsin) that are placed at the base of the ‘C-type’ clade
when rooted but at the base of the ‘Cnidops’ clade when unrooted (data not shown).
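
The length filter at the start of this section can be sketched in Python as follows. This is one possible reading, not the exact criterion applied: it assumes that "spanning half of the transmembrane regions" is measured as the fraction of transmembrane alignment columns occupied by a residue, and the column ranges, function names, and toy sequences are hypothetical.

```python
def tm_coverage(aligned_seq: str, tm_columns: list[tuple[int, int]]) -> float:
    """Fraction of transmembrane alignment columns in which the sequence
    has a residue rather than a gap. Ranges are 0-based, end-exclusive."""
    total = sum(end - start for start, end in tm_columns)
    covered = sum(
        1
        for start, end in tm_columns
        for char in aligned_seq[start:end]
        if char != "-"
    )
    return covered / total


def filter_by_tm_coverage(aligned_seqs: dict[str, str],
                          tm_columns: list[tuple[int, int]],
                          min_fraction: float = 0.5) -> dict[str, str]:
    """Keep sequences that cover at least min_fraction of the TM columns."""
    return {
        name: seq
        for name, seq in aligned_seqs.items()
        if tm_coverage(seq, tm_columns) >= min_fraction
    }


# Hypothetical toy alignment with two "transmembrane" blocks of 4 columns each.
toy_tm_columns = [(2, 6), (8, 12)]
toy_seqs = {
    "full": "MAAAAAGGLLLLKK",   # covers both blocks
    "half": "--AAAA--------",   # covers exactly half of the TM columns
    "frag": "------GGLL----",   # covers a quarter of the TM columns
}

if __name__ == "__main__":
    print(sorted(filter_by_tm_coverage(toy_seqs, toy_tm_columns)))
```

Because only sequences spanning less than half were discarded, the sketch keeps a sequence that covers exactly half of the columns.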

Amino acid sequences of the 889 opsins mined from GenBank and the 22 human
outgroup GPCR sequences were aligned using the online MAFFT v6.0 server
(http://mafft.cbrc.jp/alignment/server/) (Katoh et al. 2005a; Katoh et al. 2005b; Katoh et al.
2002). The aligned dataset was then trimmed to remove the N- and C-terminal sequences,
leaving only the transmembrane and loop regions of the protein for further phylogenetic
analyses. The resulting alignment has been provided as a supplemental FASTA data file.
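
The trimming step can be sketched in Python as below. The sketch assumes that the trim boundaries are given as residue numbers of a reference row in the alignment (for example the first residue of the first transmembrane helix and the last residue of the seventh); how the boundaries were actually chosen is not stated here, and the function names and toy data are hypothetical.

```python
def trim_to_reference_span(aligned_seqs: dict[str, str], ref_name: str,
                           first_residue: int, last_residue: int) -> dict[str, str]:
    """Keep only the alignment columns between two 1-based residues of the
    reference sequence (inclusive), dropping N- and C-terminal columns."""
    columns = {}
    seen = 0
    for col, char in enumerate(aligned_seqs[ref_name]):
        if char != "-":
            seen += 1
            columns[seen] = col  # residue number -> alignment column
    start, end = columns[first_residue], columns[last_residue]
    return {name: seq[start:end + 1] for name, seq in aligned_seqs.items()}


def to_fasta(seqs: dict[str, str], width: int = 60) -> str:
    """Serialise sequences as FASTA text with wrapped sequence lines."""
    lines = []
    for name, seq in seqs.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical toy alignment; reference residues 3 to 7 are retained.
toy = {
    "ref": "MN-PTTTTQC",
    "a":   "MNGPTSTT-C",
}
trimmed = trim_to_reference_span(toy, "ref", 3, 7)

if __name__ == "__main__":
    print(to_fasta(trimmed), end="")
```

Trimming by reference residue rather than by raw column index keeps the boundaries stable if the alignment is regenerated and gap positions shift.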

The aligned and trimmed dataset was used to reconstruct a maximum likelihood
phylogeny using Randomized Axelerated Maximum Likelihood (RAxML) v.7.2.7 with rapid
bootstrapping as implemented on the Cyberinfrastructure for Phylogenetic Research (CIPRES)
Portal v.3.1 (Miller et al. 2010; Stamatakis 2006; Stamatakis et al. 2008; Stamatakis et al. 2005).
Using the resulting phylogeny, character mapping of amino acid residues at particular counterion
sites was accomplished in Mesquite v2.72 (Maddison & Maddison 2010) using unordered
parsimony reconstruction.
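
To illustrate what unordered parsimony reconstruction does, the sketch below implements the bottom-up pass of Fitch's algorithm, the textbook method for unordered character states on a binary tree. It is not Mesquite's code, and the tree shape, taxon names, and residue states are hypothetical.

```python
def fitch_state_sets(tree, tip_states: dict[str, str]) -> tuple[set[str], int]:
    """Bottom-up Fitch pass on a rooted binary tree given as nested 2-tuples
    with tip names as strings. Returns the candidate state set at the root
    of `tree` and the minimum number of state changes within it. All changes
    between states cost the same (unordered states)."""
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_state_sets(left, tip_states)
    right_set, right_changes = fitch_state_sets(right, tip_states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical residues (E or Y) observed at one counterion site in five taxa.
toy_tree = ((("A", "B"), "C"), ("D", "E"))
toy_states = {"A": "E", "B": "E", "C": "Y", "D": "Y", "E": "Y"}

if __name__ == "__main__":
    root_set, n_changes = fitch_state_sets(toy_tree, toy_states)
    print(sorted(root_set), n_changes)
```

In this toy case the root is reconstructed as Y with a single change to E on the branch leading to taxa A and B. A full reconstruction would add a top-down pass to assign states to every internal node.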
References
Davies, W. L., Hankins, M. W. & Foster, R. G. 2010. Vertebrate ancient opsin and melanopsin:
divergent irradiance detectors. Photochemical & Photobiological Sciences 9, 1444-1457.
Fredriksson, R., Lagerstrom, M. C., Lundin, L. G. & Schioth, H. B. 2003. The G-protein-coupled
receptors in the human genome form five main families. Phylogenetic analysis, paralogon
groups, and fingerprints. Molecular Pharmacology 63, 1256-1272.
Katoh, K., Kuma, K., Miyata, T. & Toh, H. 2005a. Improvement in the accuracy of multiple
sequence alignment program MAFFT. Genome Inform 16, 22-33.
Katoh, K., Kuma, K., Toh, H. & Miyata, T. 2005b. MAFFT version 5: improvement in accuracy
of multiple sequence alignment. Nucleic Acids Res 33, 511-8.
Katoh, K., Misawa, K., Kuma, K. & Miyata, T. 2002. MAFFT: a novel method for rapid multiple
sequence alignment based on fast Fourier transform. Nucleic Acids Res 30, 3059-66.
Maddison, W. P. & Maddison, D. R. 2010. Mesquite: a modular system for evolutionary
analysis. Version 2.72.
Miller, M. A., Holder, M. T., Vos, R., Midford, P. E., Liebowitz, T., Chan, L., Hoover, P. &
Warnow, T. The CIPRES Portals. In CIPRES, vol. 2010.
Plachetzki, D. C., Fong, C. R. & Oakley, T. H. 2010. The evolution of phototransduction from
an ancestral cyclic nucleotide gated pathway. Proceedings of the Royal Society B 277,
1963-1969.
Porter, M. L., Cronin, T. W., McClellan, D. A. & Crandall, K. A. 2007. Molecular
characterization of crustacean visual pigments and the evolution of pancrustacean
opsins. Molecular Biology and Evolution 24, 253-268.
Stamatakis, A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with
thousands of taxa and mixed models. Bioinformatics 22, 2688-90.
Stamatakis, A., Hoover, P. & Rougemont, J. 2008. A rapid bootstrap algorithm for the RAxML
Web servers. Syst Biol 57, 758-71.
Stamatakis, A., Ludwig, T. & Meier, H. 2005. RAxML-III: a fast program for maximum
likelihood-based inference of large phylogenetic trees. Bioinformatics 21, 456-63.
Suga, H., Schmid, V. & Gehring, W. J. 2008. Evolution and functional diversity of jellyfish
opsins. Current Biology 18, 51-55.