Determining Convergent Evolution with
Amino Acid Sequences of Hemoglobin and Prestin Genes
Instructor’s edition
Lauren Kennedy*, Jackie Doucette*, Trisha Buchanan*, and David Marcey, CLU Biology Department, 2012
* CLU Students in Biology 426, Molecular Biology
Objectives:
1. To understand the basic principles of convergent evolution.
2. To construct a cladogram based on hemoglobin protein sequences.
3. To construct a cladogram based on prestin protein amino acid sequences and to identify
signatures of convergent evolution at the molecular level.
Overview:
Convergent evolution (CE) is the independent evolution of similar biological traits in different evolutionary
lineages (clades). CE often results from similar selective pressures that drive the evolution of the specific trait in
distantly related species.
One of the more interesting cases of convergent evolution at the molecular level is the evolution of
echolocation in bats and dolphins and other cetaceans. The motor protein prestin endows sensitive and selective
hearing in mammals. The prestin amino-acid sequences of echolocating dolphins have converged to resemble
those of echolocating bats. As we know, bats and cetaceans are in completely different lineages. Based on
BLAST searches that are done comparing hemoglobin sequences, students will construct a phylogenetic
cladogram that places the hemoglobins of bats, dolphins, and other mammals in relative positions that reflect
cladograms built on other derived traits. However, when BLAST searches are done on the prestin amino acid
sequence, and a cladogram is constructed, students may deduce the prestin gene of bats and cetaceans shared
a “common ancestor” more recently than other mammals, a result that contradicts their previous conclusion.
Dolphins and porpoises share at least 14 derived amino acid sites in prestin with echolocating bats. This
is one of the best examples of convergent molecular evolution discovered to date because it is adaptive
and the rapid convergence indicates positive selection. One potential selective pressure is the
necessity to hear very high frequencies. These prestin studies identify probable selective pressures driving
molecular convergence and emphasize the necessity of using caution when employing single character traits to
derive evolutionary histories.
I. Retrieving HEMOGLOBIN Sequences from a Bioinformatics Database:
You will first begin by utilizing the data in this table. All protein sequences are given a unique identifier
in the database known as an “Accession Number”. You will first obtain the Accession Number and protein
sequence of the species listed in the following table:
Table 1: Accession Numbers
Organism            Scientific Name              Accession Number
Horseshoe Bat       Rhinolophus ferrumequinum    ACC62118.1
Sperm Whale         Physeter catodon             P09904.1
Bottlenose Dolphin  Tursiops truncatus           P18990.1
Cow                 Bos taurus                   NP_001070890.2
Mole-rat            Heterocephalus glaber        EHB14142.1
Human               Homo sapiens                 NP_000549.1
Dog                 Canis lupus                  NP_001257814
1. Go to the NCBI: http://www.ncbi.nlm.nih.gov/guide/. NCBI stands for the National Center for
Biotechnology Information.
2. You will be looking for the hemoglobin sequence of all 7 species. You will do the following instructions one at a
time, repeating them 7 times, individually for each species.
3. Enter your query (search terms) in the following format. Using the Horseshoe bat as an example, use the
scientific name:
hemoglobin AND Rhinolophus ferrumequinum
AND is a Boolean operator; it restricts the search to entries that contain all of the words. In your search results,
note that each entry has an Accession number, a unique identifier that specifies the databank and the number of
the entry in that databank.
4. For each species, once you locate the appropriate search result, enter its Accession number in Table 1.
Then click on the FASTA link below the Accession number. This will open a page that provides the amino
acid sequence of the alpha hemoglobin protein from that species in a standard format, called FASTA. In
bioinformatics, FASTA (“fast-all”) is a text-based format for representing either nucleotide sequences or protein
sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows
a header line with the sequence name and comments to precede the sequence; the header begins with a “>”. For
each species, copy the chosen FASTA sequence and paste it into NOTEPAD (not WORD!) or a
similar text-only editor, e.g.:
>gi|183396443|gb|ACC62118.1| hemoglobin subunit alpha (predicted) [Rhinolophus ferrumequinum]
MVLSPSDKSNVKAAWDKVGGNAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQIKGHGKKVG
DALTKAVGSIDDLAGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLACHNPGEFTPAIHASLDKFL
ASVSTVLTSKYR
In the above example, note that the lines after the header are the actual amino acid sequence, which should be on a
different line than the header. For safety, remember to save your file after each sequence is pasted. It is suggested
that you use the Horseshoe bat hemoglobin sequence as your first entered sequence. Our goal in this experiment
is to determine which species’ hemoglobin amino acid sequences are most closely related to that of the horseshoe
bat, and make a cladogram based on this information.
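The header and sequence layout described above can be checked with a short script. This is a minimal sketch, not part of the original exercise: the function name `parse_fasta` is hypothetical, and it assumes only what the text states, namely that a header line starts with ">" and that the sequence follows on separate lines.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without line breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The horseshoe bat hemoglobin entry shown above.
example = """>gi|183396443|gb|ACC62118.1| hemoglobin subunit alpha (predicted) [Rhinolophus ferrumequinum]
MVLSPSDKSNVKAAWDKVGGNAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQIKGHGKKVG
DALTKAVGSIDDLAGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLACHNPGEFTPAIHASLDKFL
ASVSTVLTSKYR
"""

records = parse_fasta(example)
header, sequence = records[0]
print(header.split("|")[3])  # the accession field of the header
print(len(sequence))         # number of amino acids in the sequence
```

Running this on your own text file is a quick way to confirm that every sequence has its own header and that no header text was pasted into a sequence line.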
To make your analysis easier, replace the information in the header line with the common name of each species.
Using the above examples:
>Horseshoe bat
MVLSPSDKSNVKAAWDKVGGNAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQIKGHGKKVG
DALTKAVGSIDDLAGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLACHNPGEFTPAIHASLDKFL
ASVSTVLTSKYR
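The header replacement can also be done programmatically. The sketch below is an illustration only: the helper `relabel_header` is hypothetical, the name mapping is taken from Table 1, and it assumes the species name appears in square brackets at the end of the NCBI header, as in the horseshoe bat example.

```python
import re

# Scientific name -> common name, taken from Table 1.
COMMON_NAMES = {
    "Rhinolophus ferrumequinum": "Horseshoe bat",
    "Physeter catodon": "Sperm whale",
    "Tursiops truncatus": "Bottlenose dolphin",
    "Bos taurus": "Cow",
    "Heterocephalus glaber": "Mole-rat",
    "Homo sapiens": "Human",
    "Canis lupus": "Dog",
}


def relabel_header(header, names=COMMON_NAMES):
    """Replace a FASTA header with '>Common name' based on its [Species] tag."""
    match = re.search(r"\[([^\]]+)\]\s*$", header)
    if match:
        species = match.group(1)
        for scientific, common in names.items():
            # Prefix match, so a subspecies name still maps to its species.
            if species.startswith(scientific):
                return ">" + common
    return header  # unknown headers are left untouched


original = (">gi|183396443|gb|ACC62118.1| hemoglobin subunit alpha "
            "(predicted) [Rhinolophus ferrumequinum]")
print(relabel_header(original))
```

Only the header line changes; the sequence lines below it must be left exactly as they were copied.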
II. Using ClustalW2 to compare HEMOGLOBIN protein sequences
1. Go to the following European site which provides many bioinformatics tools for protein and DNA alignment
analysis. EBI stands for the European Bioinformatics Institute.
http://www.ebi.ac.uk/
2. Click on Tools in the black bar.
3. In the drop down menu, click on Sequence Analysis.
4. On the side menu, click on ClustalW2. ClustalW2 is an alignment tool. In the drop-down menu above the field
where you will enter sequences, choose the “Protein” option.
5. You may choose to do one or both of the following methods of data entry for subsequent
analysis.
1) Select all 7 protein sequences at once and paste them into the ClustalW2 sequence field.
2) Select the horseshoe bat hemoglobin sequence and one of the other species’ sequences and
paste them into the ClustalW2 sequence field.
6. Click run. The alignment can take a few minutes.
7. Scroll down to the protein sequence alignments.
8. Your alignment will look like this (except that you will have either 7 sequences aligned or two
sequences aligned, depending on which method you chose in step 5):
Horseshoe  MVLSPSDKSNVKAAWDKVGGNAGEYGAEALERMFLSFPTTKTYFPHF-DLSHGSAQIKG- 58
Sperm      VHLTGEEKSGLTALWAKV--NVEEIGGEALGRLLVVYPWTQRFFEHFGDLSTADAVMKNP 58
           : *: .:**.:.* * **  *. * *.*** *::: :* *: :* ** *** ..* :*.

Horseshoe  ----HGKKVGDALTKAVGSIDDLAGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLACHN 114
Sperm      KVKKHGQKVLASFGEGLKHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVVVLARHF 118
               **:**  :: :.:  :*:* *::::**:**..**:*** **:**.: *:*.** *

Horseshoe  PGEFTPAIHASLDKFLASVSTVLTSKYR 142
Sperm      GKEFTPELQTAYQKVVAGVANALAHKYH 146
             **** :::: :*.:*.*:..*: **:

9. Click “Show Colors”. Sample Result:

Horseshoe  MVLSPSDKSNVKAAWDKVGGNAGEYGAEALERMFLSFPTTKTYFPHF-DLSHGSAQIKG- 58
Sperm      VHLTGEEKSGLTALWAKV--NVEEIGGEALGRLLVVYPWTQRFFEHFGDLSTADAVMKNP 58
           : *: .:**.:.* * **  *. * *.*** *::: :* *: :* ** *** ..* :*.

Horseshoe  ----HGKKVGDALTKAVGSIDDLAGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLACHN 114
Sperm      KVKKHGQKVLASFGEGLKHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVVVLARHF 118
               **:**  :: :.:  :*:* *::::**:**..**:*** **:**.: *:*.** *

Horseshoe  PGEFTPAIHASLDKFLASVSTVLTSKYR 142
Sperm      GKEFTPELQTAYQKVVAGVANALAHKYH 146
             **** :::: :*.:*.*:..*: **:
From the ClustalW2 Help section, this is what the colors mean:
Residues   Color     Property
AVFPMILW   RED       Small (small + hydrophobic (incl. aromatic - Y))
DE         BLUE      Acidic
RK         MAGENTA   Basic
STYHCNGQ   GREEN     Hydroxyl + Amine + Basic - Q
Others     GRAY
CONSENSUS SYMBOLS:
An alignment will display by default the following symbols denoting the degree of conservation observed in
each column:
An "*" (asterisk) means that the residues or nucleotides in that column are identical in all sequences in the
alignment.
A ":" (colon) means that conserved substitutions have been observed, according to the COLOUR table above.
Conserved means that the amino acid has been replaced by one that is chemically similar, i.e. one in the same
classification.
A "." (period) means that semi-conserved substitutions are observed.
A "-" (dash) within a sequence indicates a gap, i.e. a missing amino acid.
10. You may choose to save your results by printing your alignments for your lab notebook. For example, either
use a screen capture tool or use the “Print Screen” key and paste the image into a word processing document.
11. We are interested in those columns that have a “.”, a “:”, or a “ ” (space) in the bottom line, or a “-” in a
sequence. They mark the places where differences in amino acid identity have evolved. Subtle differences between
the species have accumulated due to mutations of the DNA sequence. For convenience, we will note changes in
protein sequence relative to the HORSESHOE BAT hemoglobin sequence. By examining the alignments you have
obtained, fill in the table on the next page by comparing each organism’s sequence to the horseshoe bat sequence
at the top. Determine the number of amino acids that do not exactly match the horseshoe bat sequence.
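Counting matching and non-matching positions by eye is error-prone, so the tally can be cross-checked with a few lines of code. This is a minimal sketch under stated assumptions: the function name `compare_aligned` is hypothetical, both sequences must already be aligned (equal length, with "-" for gaps), and gap columns are reported separately so you can decide whether to add them to the "not matching" count. The example uses only the last block of the sample alignment above, not the full sequences.

```python
def compare_aligned(ref, other, gap="-"):
    """Count identical, different, and gap columns in a pairwise alignment."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(ref, other):
        if a == gap or b == gap:
            gaps += 1        # one sequence has no amino acid in this column
        elif a == b:
            matches += 1     # an '*' column in the ClustalW2 output
        else:
            mismatches += 1  # a ':', '.' or ' ' column
    return matches, mismatches, gaps


# Last block of the horseshoe bat vs. sperm whale sample alignment.
horseshoe = "PGEFTPAIHASLDKFLASVSTVLTSKYR"
sperm = "GKEFTPELQTAYQKVVAGVANALAHKYH"
print(compare_aligned(horseshoe, sperm))  # (10, 18, 0)
```

To analyze a whole alignment, join the blocks of each sequence into one string before calling the function.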
Table 2: Similarities and differences in the amino acid sequences of hemoglobin and percentage of conserved
amino acids of select organisms compared to the Horseshoe bat.
Comparative Organism   A. Number of amino acids   B. Number of amino acids     Percentage of Identical Amino Acids
(Horseshoe Bat vs.)    matching Horseshoe Bat     not matching Horseshoe Bat   (Column A / total amino acids * 100%)
Sperm Whale            54                         88                           38.03
Bottlenose Dolphin     55                         87                           38.73
Cow                    121                        21                           85.21
Mole-rat               14                         128                          9.86
Human                  124                        18                           87.32
Dog                    116                        26                           81.69
12. Complete Table 2 by using the numbers gathered above and then calculating the percent conservation, that is,
the percentage of amino acids that are the same and fall in the same position in the sequence for each organism
compared to the horseshoe bat.
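The percentage in the last column of Table 2 is Column A divided by the total number of amino acids (Column A plus Column B), times 100. As a hedged illustration, the sketch below recomputes it from the counts listed in Table 2; the function name `percent_identity` is hypothetical.

```python
def percent_identity(matching, not_matching):
    """Percentage of positions that are identical to the reference sequence."""
    total = matching + not_matching
    return 100.0 * matching / total


# (Column A, Column B) for each organism, copied from Table 2.
table2 = {
    "Sperm Whale": (54, 88),
    "Bottlenose Dolphin": (55, 87),
    "Cow": (121, 21),
    "Mole-rat": (14, 128),
    "Human": (124, 18),
    "Dog": (116, 26),
}

for organism, (a, b) in table2.items():
    print(f"{organism}: {percent_identity(a, b):.2f}%")
```

For the sperm whale, for example, 54 / (54 + 88) * 100 gives 38.03%.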
Examine the table above. From the data gathered in this alignment and before proceeding to the next page, can
you determine which species are most likely related? Which organisms are not? Explain your reasoning.
MAKE A CLADOGRAM.
MAKING A CLADOGRAM FOR PHYLOGENY OF COMPARED SPECIES
From the ClustalW2 Help Section:
“A Phylogram is a branching diagram (tree) assumed to be an estimate of a phylogeny, branch lengths are
proportional to the amount of inferred evolutionary change.
A Cladogram is a branching diagram (tree) assumed to be an estimate of a phylogeny where the branches are of
equal length, thus cladograms show common ancestry, but do not indicate the amount of evolutionary "time"
separating taxa.”
1. Under CLUSTALW Results, Click on “Guide Tree” and “Show as Cladogram Tree”
III. Retrieving PRESTIN Sequences from a Bioinformatics Database:
You will first begin by utilizing the data in this table. All protein sequences are given a unique identifier
in the database known as an “Accession Number”. You will first obtain the Accession Number and protein
sequence of the species listed in the following table:
Table 3: Accession Numbers
Organism            Scientific Name              Accession Number
Horseshoe Bat       Rhinolophus ferrumequinum    ACI02071.1
Sperm Whale         Physeter catodon             ADE75013.1
Bottlenose Dolphin  Tursiops truncatus           ADI59756.1
Humpback Whale      Megaptera novaeangliae       ADE75011.1
Cow                 Bos taurus                   NP_001179807.1
Mole-rat            Heterocephalus glaber        EHB08960.1
Human               Homo sapiens                 AAP31417.1
Dog                 Canis lupus                  XP_540393.2
1. Go to the NCBI: http://www.ncbi.nlm.nih.gov/guide/. NCBI stands for the National Center for Biotechnology
Information.
2. You will be looking for the prestin sequence of every species in Table 3. You will do the following
instructions one at a time, repeating them individually for each species.
3. Enter your query (search terms) in the following format. Using the Horseshoe bat as an example, use the
scientific name:
prestin AND Rhinolophus ferrumequinum
AND is a Boolean operator; it restricts the search to entries that contain all of the words. In your search results,
note that each entry has an Accession number, a unique identifier that specifies the databank and the number of
the entry in that databank. Look for entries that are 741-744 amino acids (aa) long.
4. For each species, once you locate the appropriate search result, enter its Accession number in Table 3.
Then click on the FASTA link below the Accession number. This will open a page that provides the amino
acid sequence of the prestin protein from that species in a standard format, called FASTA. In
bioinformatics, FASTA (“fast-all”) is a text-based format for representing either nucleotide sequences or protein
sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows
a header line with the sequence name and comments to precede the sequence; the header begins with a “>”. For
each species, copy the chosen FASTA sequence and paste it into NOTEPAD (not WORD!) or a
similar text-only editor, e.g.:
>gi|205277608|gb|ACI02071.1| prestin [Rhinolophus ferrumequinum]
MDHAEETEILAATERYYVERPIFSHLVLQERLHKKDKISDSIGDKLKQAFTCTPKKIRNIIYMFLPITKW
LPAYNFKEYVLGDLVSGISTGVLQLPQGLAFAMLAAVPPVFGLYSSFYPVIMYCFFGTSRHISIGPFAVI
SLMIGGVAVRLVPDDIAVPGGVNATNGTEFRDALRVKVAMSVTLLAGIIQFCLGVCRFGFVAIYLTEPLV
RGFTTAAAVHVFTSMLKYLFGVKTKRYSGIFSVVYSTVAVLQNVKNLNVCSLGVGLMVFGLLLGGKEFNE
RFKEKLPAPIPLEFFAVVMGTGISAGFGLHESYNVDVVGTLPLGLLPPANPDTSLFHLVYVDAIAIAIVG
FSVTISMAKTLANKHGYQVDGNQELIALGLCNSTGSLFQTFAISCSLSRSLVQEGIGGKTQLAGCLASLM
ILMVILATGFLFESLPQAVLSAIVIVNLKGMFMQFSDLPFFWRTSKIELVIWLSTFVSSLFLGLDYGLIT
AVIIALMTVIYRTQSPTYTVLGQLPDTDVYIDIDAYEEVKEIPGIKIFQINAPIYYANSDLYSNALKRKT
GVNPSFILGARRKAMKKYAKEGGNINIANATDVKADAEVDAEDGTKPEEEEDEVKYPPVVIKSTFPEELQ
RFMPPLENVHTIILDFTQVNFIDSVGVKTLQGIVKEYGDVGIYVYLAGCSAQVVSDLTRNRFFENPALLD
LLFHSIHDAVLGSLVREALEEKEAAAATPQEDSEPNATPDV
In the above example, note that the lines after the header are the actual amino acid sequence, which should be on a
different line than the header. For safety, remember to save your file after each sequence is pasted. It is suggested
that you use the Horseshoe bat prestin sequence as your first entered sequence. Our goal in this experiment is to
determine which species’ prestin amino acid sequences are most closely related to that of the horseshoe bat.
5. To make your analysis easier, replace the information in the header line with the common name of each
species. Using the above example:
>Horseshoe Bat
MDHAEETEILAATERYYVERPIFSHLVLQERLHKKDKISDSIGDKLKQAFTCTPKKIRNIIYMFLPITKW
LPAYNFKEYVLGDLVSGISTGVLQLPQGLAFAMLAAVPPVFGLYSSFYPVIMYCFFGTSRHISIGPFAVI
SLMIGGVAVRLVPDDIAVPGGVNATNGTEFRDALRVKVAMSVTLLAGIIQFCLGVCRFGFVAIYLTEPLV
RGFTTAAAVHVFTSMLKYLFGVKTKRYSGIFSVVYSTVAVLQNVKNLNVCSLGVGLMVFGLLLGGKEFNE
RFKEKLPAPIPLEFFAVVMGTGISAGFGLHESYNVDVVGTLPLGLLPPANPDTSLFHLVYVDAIAIAIVG
FSVTISMAKTLANKHGYQVDGNQELIALGLCNSTGSLFQTFAISCSLSRSLVQEGIGGKTQLAGCLASLM
ILMVILATGFLFESLPQAVLSAIVIVNLKGMFMQFSDLPFFWRTSKIELVIWLSTFVSSLFLGLDYGLIT
AVIIALMTVIYRTQSPTYTVLGQLPDTDVYIDIDAYEEVKEIPGIKIFQINAPIYYANSDLYSNALKRKT
GVNPSFILGARRKAMKKYAKEGGNINIANATDVKADAEVDAEDGTKPEEEEDEVKYPPVVIKSTFPEELQ
RFMPPLENVHTIILDFTQVNFIDSVGVKTLQGIVKEYGDVGIYVYLAGCSAQVVSDLTRNRFFENPALLD
LLFHSIHDAVLGSLVREALEEKEAAAATPQEDSEPNATPDV
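Long sequences such as prestin often pick up stray line breaks or spaces when copied, and the search step asks for entries of 741-744 amino acids. The sketch below is one way to tidy a pasted sequence and check its length; the function names are hypothetical, and the 741-744 range is the one given in the search instructions above.

```python
def clean_sequence(lines):
    """Join pasted sequence lines, dropping any spaces or line breaks."""
    return "".join("".join(line.split()) for line in lines)


def rewrap(sequence, width=70):
    """Split a sequence into lines of at most `width` characters."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


def length_in_range(sequence, low=741, high=744):
    """True if the sequence length falls within the expected prestin range."""
    return low <= len(sequence) <= high


# A fragment with a stray break after copy and paste: the final 'V' wrapped.
pasted = [
    "SLMIGGVAVRLVPDDIAVPGGVNATNGTEFRDALRVKVAMSVTLLAGIIQFCLGVCRFGFVAIYLTEPL",
    "V",
]
fragment = clean_sequence(pasted)
print(rewrap(fragment))           # a single tidy line again
print(length_in_range(fragment))  # False: this is only a fragment
```

ClustalW2 ignores line breaks inside a sequence, so this clean-up is for readability and for catching incomplete pastes rather than a requirement of the tool.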
IV. Using ClustalW2 to compare protein sequences
1. Go to the following European site which provides many bioinformatics tools for protein and DNA alignment
analysis. EBI stands for the European Bioinformatics Institute.
http://www.ebi.ac.uk/
2. Click on Tools in the black bar.
3. In the drop down menu, click on Sequence Analysis.
4. On the side menu, click on ClustalW2. ClustalW2 is an alignment tool. In the drop-down menu above the field
where you will enter sequences, choose the “Protein” option.
5. You may choose to do one or both of the following methods of data entry for subsequent
analysis.
1) Select all of the protein sequences from Table 3 at once and paste them into the ClustalW2 sequence field.
2) Select the horseshoe bat prestin sequence and one of the other species’ sequences and paste them into
the ClustalW2 sequence field.
6. Click run. The alignment can take a few minutes depending on how many people all over the world are doing
alignments at that time.
7. Scroll down to the protein sequence alignments.
8. Your alignment will look like this (except that you will have either all of your sequences aligned or two
sequences aligned, depending on which method you chose in step 5):

Horseshoe   MDHAEETEILAATERYYVERPIFSHLVLQERLHKKDKISDSIGDKLKQAFTCTPKKIRNI 60
Bottlenose  MDHVEETEILAATQRYYVERPIFSHPVLQERLHKKDKISESIGDKLKQAFTCTPKKIRNI 60
            ***.*********:*********** *************:********************

9. Click “Show Colors”. Sample Result:

Horseshoe   MDHAEETEILAATERYYVERPIFSHLVLQERLHKKDKISDSIGDKLKQAFTCTPKKIRNI 60
Bottlenose  MDHVEETEILAATQRYYVERPIFSHPVLQERLHKKDKISESIGDKLKQAFTCTPKKIRNI 60
            ***.*********:*********** *************:********************
From the ClustalW2 Help section, this is what the colors mean:
Residues   Color     Property
AVFPMILW   RED       Small (small + hydrophobic (incl. aromatic - Y))
DE         BLUE      Acidic
RK         MAGENTA   Basic
STYHCNGQ   GREEN     Hydroxyl + Amine + Basic - Q
Others     GRAY
CONSENSUS SYMBOLS:
An alignment will display by default the following symbols denoting the degree of conservation observed in
each column:
An "*" (asterisk) means that the residues or nucleotides in that column are identical in all sequences in the
alignment.
A ":" (colon) means that conserved substitutions have been observed, according to the COLOUR table above.
Conserved means that the amino acid has been replaced by one that is chemically similar, i.e. one in the same
classification.
A "." (period) means that semi-conserved substitutions are observed.
A "-" (dash) within a sequence indicates a gap, i.e. a missing amino acid.
10. You may choose to save your results by printing your alignments for your lab notebook. For example, either
use a screen capture tool or use the “Print Screen” key and paste the image into a word processing document.
11. We are interested in those columns that have a “.”, a “:”, or a “ ” (space) in the bottom line, or a “-” in a
sequence. They mark the places where differences in amino acid identity have evolved. Subtle
differences between the species have accumulated due to mutations of the DNA sequence. For convenience, we will
note changes in protein sequence relative to the HORSESHOE BAT prestin sequence. By examining the
alignments you have obtained, fill in the table on the next page by comparing each organism’s sequence to the
horseshoe bat sequence at the top. Determine the number of amino acids that do not exactly match the
horseshoe bat sequence.
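Besides counting differences, it can help to list where they occur, since the convergence discussed in the Overview concerns specific amino acid sites. The following is a sketch only: `substitutions` is a hypothetical helper, positions are 1-based and follow the horseshoe bat (reference) sequence, and the input is the 60-column sample alignment shown above rather than the full protein.

```python
def substitutions(ref, other, gap="-"):
    """List (position, ref_residue, other_residue) for each differing column.

    Positions are 1-based and count residues of the reference sequence only,
    so a gap in the reference does not advance the position counter.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have the same length")
    position = 0
    differences = []
    for a, b in zip(ref, other):
        if a != gap:
            position += 1
        if a != b:
            differences.append((position, a, b))
    return differences


horseshoe = "MDHAEETEILAATERYYVERPIFSHLVLQERLHKKDKISDSIGDKLKQAFTCTPKKIRNI"
bottlenose = "MDHVEETEILAATQRYYVERPIFSHPVLQERLHKKDKISESIGDKLKQAFTCTPKKIRNI"
for position, bat, dolphin in substitutions(horseshoe, bottlenose):
    print(f"{bat}{position}{dolphin}")  # prints A4V, E14Q, L26P, D40E
```

The four listed sites correspond to the four non-asterisk columns in the consensus line of the sample alignment.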
Table 4: Similarities and differences in the amino acid sequences of prestin and percentage of conserved amino
acids of select organisms compared to the Horseshoe bat.
Comparative Organism   A. Number of amino acids   B. Number of amino acids     Percentage of Identical Amino Acids
(Horseshoe bat vs.)    matching Horseshoe bat     not matching Horseshoe bat   (Column A / total amino acids * 100%)
Sperm Whale            695                        46                           93.8%
Bottlenose Dolphin     699                        42                           94.3%
Humpback Whale         694                        47                           93.7%
Cow                    691                        50                           93.3%
Mole-rat               665                        76                           89.7%
Human                  680                        61                           91.8%
Dog                    685                        56                           92.4%
12. Complete Table 4 by using the numbers gathered above and then calculating the percent conservation, that is,
the percentage of amino acids that are the same and fall in the same position in the sequence for each
organism compared to the horseshoe bat.
Examine the table above. From the data gathered in this alignment and before proceeding to the next page, can
you determine which species most likely convergently evolved with the horseshoe bat? Which organism clearly
did not? Explain your reasoning.
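One way to approach this question is to rank the organisms by percent identity to the horseshoe bat for each protein and compare the two orderings. The sketch below does this with the counts from Table 2 and Table 4; the function name `rank_by_identity` is hypothetical, and percent identity is only a rough proxy for the relationships a cladogram shows.

```python
# (matching, not matching) counts versus the horseshoe bat, from Tables 2 and 4.
hemoglobin = {
    "Sperm Whale": (54, 88),
    "Bottlenose Dolphin": (55, 87),
    "Cow": (121, 21),
    "Mole-rat": (14, 128),
    "Human": (124, 18),
    "Dog": (116, 26),
}
prestin = {
    "Sperm Whale": (695, 46),
    "Bottlenose Dolphin": (699, 42),
    "Humpback Whale": (694, 47),
    "Cow": (691, 50),
    "Mole-rat": (665, 76),
    "Human": (680, 61),
    "Dog": (685, 56),
}


def rank_by_identity(table):
    """Return organism names sorted from most to least similar."""
    scored = [(100.0 * a / (a + b), name) for name, (a, b) in table.items()]
    return [name for _, name in sorted(scored, reverse=True)]


print("Hemoglobin:", rank_by_identity(hemoglobin))
print("Prestin:   ", rank_by_identity(prestin))
```

With these numbers the cetaceans sit near the bottom of the hemoglobin ranking but at the top of the prestin ranking, which is the contrast the question asks you to explain.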
MAKING A CLADOGRAM FOR CONVERGENT EVOLUTION OF SPECIES
From the ClustalW2 Help Section:
“A Phylogram is a branching diagram (tree) assumed to be an estimate of a phylogeny, branch lengths are
proportional to the amount of inferred evolutionary change.
A Cladogram is a branching diagram (tree) assumed to be an estimate of a phylogeny where the branches are of
equal length, thus cladograms show common ancestry, but do not indicate the amount of evolutionary "time"
separating taxa.”
Under CLUSTALW Results, Click on “Guide Tree” and “Show as Cladogram Tree”
Conclusion:
Convergent evolution is clearly shown by this example of prestin evolution in both bats and cetaceans. Comparison of
hemoglobin sequences, present in all mammals, yields a cladogram consistent with known evolutionary phylogenies.
However, comparison of prestin sequences reveals molecular homoplasy, because it yields a cladogram inconsistent with the
broad empirical foundation of mammalian phylogeny. See: http://www.umich.edu/~zhanglab/publications/2010/Comment_Li_2010_CurrBiol.pdf