Determining Convergent Evolution with Amino Acid Sequences of Hemoglobin and Prestin Genes
Instructor’s edition
Lauren Kennedy*, Jackie Doucette*, Trisha Buchanan*, and David Marcey, CLU Biology Department, 2012
* CLU Students in Biology 426, Molecular Biology

Objectives:
1. To understand the basic principles of convergent evolution.
2. To construct a cladogram based on hemoglobin protein sequences.
3. To construct a cladogram based on prestin protein amino acid sequences and to identify signatures of convergent evolution at the molecular level.

Overview:
Convergent evolution (CE) is the independent evolution of similar biological traits in different evolutionary lineages (clades). CE often results from similar selective pressures that drive the evolution of the specific trait in distantly related species. One of the more interesting cases of convergent evolution at the molecular level is the evolution of echolocation in bats and dolphins and other cetaceans. The motor protein prestin endows sensitive and selective hearing in mammals. The prestin amino-acid sequences of echolocating dolphins have converged to resemble those of echolocating bats. As we know, bats and cetaceans are in completely different lineages.

Based on BLAST searches that are done comparing hemoglobin sequences, students will construct a phylogenetic cladogram that places the hemoglobins of bats, dolphins, and other mammals in relative positions that reflect cladograms built on other derived traits. However, when BLAST searches are done on the prestin amino acid sequence, and a cladogram is constructed, students may deduce the prestin gene of bats and cetaceans shared a “common ancestor” more recently than other mammals, a result that contradicts their previous conclusion. Dolphins and porpoises share at least 14 derived amino acid sites in prestin with echolocating bats.
This is one of the best examples of convergent molecular evolution discovered to date because the convergence is adaptive and its rapidity indicates positive selection. One potential selective pressure is the need to hear very high frequencies. These prestin studies identify probable selective pressures driving molecular convergence and emphasize the need for caution when using single character traits to derive evolutionary histories.

I. Retrieving HEMOGLOBIN Sequences from a Bioinformatics Database:

You will begin by using the data in this table. All protein sequences are given a unique identifier in the database known as an “Accession Number”. You will first obtain the Accession Number and protein sequence of the species listed in the following table:

Table 1: Accession Numbers
Organism            Scientific Name              Accession Number
Horseshoe Bat       Rhinolophus ferrumequinum    ACC62118.1
Sperm Whale         Physeter catodon             P09904.1
Bottlenose Dolphin  Tursiops truncatus           P18990.1
Cows                Bos taurus                   NP_001070890.2
Mole-rat            Heterocephalus glaber        EHB14142.1
Humans              Homo sapiens                 NP_000549.1
Dogs                Canis lupus                  NP_001257814

1. Go to the NCBI: http://www.ncbi.nlm.nih.gov/guide/. NCBI stands for the National Center for Biotechnology Information.
2. You will be looking for the hemoglobin sequence of all 7 species. You will follow the instructions below one species at a time, repeating them 7 times.
3. Enter your query (search terms) in the following format. Using the Horseshoe bat as an example, use the scientific name:
   hemoglobin AND Rhinolophus ferrumequinum
   AND is called a Boolean operator; it requires each search result to contain all of the words. In your search results note that each entry has an Accession number, a unique identifier that specifies the databank and number of the entry in that databank.
4. For each species, once you locate the appropriate search result, enter its Accession number in Table 1.
Then click on the FASTA link below the Accession number. This will open a page that provides the amino acid sequence of the alpha hemoglobin protein from a particular species in a standard format, called FASTA. In bioinformatics, FASTA (“fast-all”) is a text-based format for representing either nucleotide sequences or protein sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences, on a header line that begins with “>”. For each species, copy the chosen FASTA sequence using your copy function and paste it into NOTEPAD (not WORD!) or a similar text-only word processing document, e.g.:

>gi|183396443|gb|ACC62118.1| hemoglobin subunit alpha (predicted) [Rhinolophus ferrumequinum]
MVLSPSDKSNVKAAWDKVGGNAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQIKGHGKKVG
DALTKAVGSIDDLAGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLACHNPGEFTPAIHASLDKFL
ASVSTVLTSKYR

In the above example, note that the text below the header line is the actual amino acid sequence, which should be on a different line than the header. For safety, remember to save your file after each sequence is pasted. It is suggested that you use the Horseshoe bat hemoglobin sequence as your first entered sequence. Our goal in this experiment is to determine which species’ hemoglobin amino acid sequences are most closely related to that of the horseshoe bat, and to make a cladogram based on this information.

To make your analysis easier, replace the information in the header line with the common name of each species. Using the above example:

>Horseshoe bat
MVLSPSDKSNVKAAWDKVGGNAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQIKGHGKKVG
DALTKAVGSIDDLAGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLACHNPGEFTPAIHASLDKFL
ASVSTVLTSKYR

II. Using ClustalW2 to compare HEMOGLOBIN protein sequences

1. Go to the following European site, which provides many bioinformatics tools for protein and DNA alignment analysis. EBI stands for the European Bioinformatics Institute. http://www.ebi.ac.uk/
2.
Click on Tools in the black bar.
3. In the drop-down menu, click on Sequence Analysis.
4. On the side menu, click on ClustalW2. ClustalW2 is an alignment tool. In the drop-down menu above the field where you will enter sequences, choose the “Protein” option.
5. You may choose to do one or both of the following methods of data entry for subsequent analysis.
   1) Select all 7 protein sequences at once and paste them into the ClustalW2 sequence field.
   2) Select the horseshoe bat hemoglobin sequence and one of the other species’ sequences and paste them into the ClustalW2 sequence field.
6. Click run. The alignment can take a few minutes.
7. Scroll down to the protein sequence alignments.
8. Your alignment will look like this (except that you will have either 7 sequences aligned or two sequences aligned, depending on which method you chose in step 5):

Horseshoe  MVLSPSDKSNVKAAWDKVGGNAGEYGAEALERMFLSFPTTKTYFPHF-DLSHGSAQIKG- 58
Sperm      VHLTGEEKSGLTALWAKV--NVEEIGGEALGRLLVVYPWTQRFFEHFGDLSTADAVMKNP 58
           : *: .:**.:.* * ** *. * *.*** *::: :* *: :* ** *** ..* :*.

Horseshoe  ----HGKKVGDALTKAVGSIDDLAGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLACHN 114
Sperm      KVKKHGQKVLASFGEGLKHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVVVLARHF 118
           **:** :: :.: :*:* *::::**:**..**:*** **:**.: *:*.** *

Horseshoe  PGEFTPAIHASLDKFLASVSTVLTSKYR 142
Sperm      GKEFTPELQTAYQKVVAGVANALAHKYH 146
           **** :::: :*.:*.*:..*: **:

9. Click “Show Colors”. The same alignment is then displayed with each residue colored by chemical class. From the ClustalW2 Help section, this is what the colors mean:

Residue    Colour    Property
AVFPMILW   RED       Small (small + hydrophobic (incl. aromatic - Y))
DE         BLUE      Acidic
RK         MAGENTA   Basic
STYHCNGQ   GREEN     Hydroxyl + Amine + Basic - Q
Others     Gray

CONSENSUS SYMBOLS: An alignment will display by default the following symbols denoting the degree of conservation observed in each column:
An "*" (asterisk) means that the residues or nucleotides in that column are identical in all sequences in the alignment.
A ":" (colon) means that conserved substitutions have been observed, according to the COLOUR table above. Conserved means that the change is to an amino acid which is chemically similar, i.e., in the same classification.
A "." (period) means that semi-conserved substitutions are observed.
A “-” (dash) indicates a missing amino acid.

10. You may choose to save your results by printing your alignments for your lab notebook. For example, either use a screen capture tool or use the “Print Screen” key and paste the image into a word processing document.
11. We are interested in those areas which have a “.” or “:” or “ ” (space) at the bottom or a “-” in the sequence. They represent the places where differences in amino acid identity have evolved. Subtle differences among the species have accumulated due to mutations of the DNA sequence. For convenience, we will note changes in protein sequence relative to the HORSESHOE BAT hemoglobin sequence. By examining the alignments you have achieved, fill in the table on the next page by comparing each organism’s sequence to the horseshoe bat sequence at the top. Determine the number of amino acids that do not exactly match the horseshoe bat sequence.
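The counting described above, and the percentage in the last column of Table 2, can be checked with a short script. The following is a minimal Python sketch and is not part of ClustalW2: the function names `count_matches` and `percent_identity` are hypothetical, and the sketch assumes the two rows were copied from the same alignment so that they have equal length, with “-” marking gaps.

```python
def count_matches(reference_row, other_row):
    """Count aligned columns where both rows carry the same amino acid."""
    if len(reference_row) != len(other_row):
        raise ValueError("aligned rows must have the same length")
    return sum(
        1
        for ref, other in zip(reference_row, other_row)
        if ref == other and ref != "-"  # a gap never counts as a match
    )


def percent_identity(matches, total_amino_acids):
    """Column A / total amino acids * 100, as in the last column of Table 2."""
    return 100.0 * matches / total_amino_acids


# Last block of the sample horseshoe bat vs. sperm whale alignment.
bat = "PGEFTPAIHASLDKFLASVSTVLTSKYR"
whale = "GKEFTPELQTAYQKVVAGVANALAHKYH"
print(count_matches(bat, whale), "of", len(bat), "positions match in this block")

# Whole-protein percentage from the sperm whale counts in Table 2 (54 of 142).
print(round(percent_identity(54, 142), 2))
```

To fill in a row of the table, paste the full aligned rows for the horseshoe bat and one other species in place of the short block used here, then pass the match count and the total number of amino acids to `percent_identity`.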
Table 2: Similarities and differences in the amino acid sequences of hemoglobin and percentage of conserved amino acids of select organisms compared to the Horseshoe bat.

Comparative Organism   A. Number of amino   B. Number of amino   Percentage of Identical
(Horseshoe Bat vs.)    acids matching       acids not matching   Amino Acids (Column A /
                       Horseshoe Bat        Horseshoe Bat        total amino acids * 100%)
Sperm Whale            54                   88                   38.03
Bottlenose Dolphin     55                   87                   38.73
Cow                    121                  21                   85.21
Mole-rat               14                   128                  9.86
Human                  124                  18                   87.32
Dog                    116                  26                   81.69

12. Complete Table 2 by using the numbers gathered above and then calculating the percent conservation, i.e., the percentage of amino acids that are the same and fall in the same position in the sequence for each organism compared to the horseshoe bat.

Examine the table above. From the data gathered in this alignment and before proceeding to the next page, can you determine which species are most closely related? Which organisms are not? Explain your reasoning. MAKE A CLADOGRAM.

MAKING A CLADOGRAM FOR PHYLOGENY OF COMPARED SPECIES

From the ClustalW2 Help Section: “A Phylogram is a branching diagram (tree) assumed to be an estimate of a phylogeny, branch lengths are proportional to the amount of inferred evolutionary change. A Cladogram is a branching diagram (tree) assumed to be an estimate of a phylogeny where the branches are of equal length, thus cladograms show common ancestry, but do not indicate the amount of evolutionary "time" separating taxa.”

1. Under CLUSTALW Results, click on “Guide Tree” and “Show as Cladogram Tree”.

III. Retrieving PRESTIN Sequences from a Bioinformatics Database:

You will begin by using the data in this table. All protein sequences are given a unique identifier in the database known as an “Accession Number”.
You will first obtain the Accession Number and protein sequence of the species listed in the following table:

Table 3: Accession Numbers
Organism            Scientific Name              Accession Number
Horseshoe bat       Rhinolophus ferrumequinum    ACI02071.1
Sperm Whale         Physeter catodon             ADE75013.1
Bottlenose Dolphin  Tursiops truncatus           ADI59756.1
Humpback Whales     Megaptera novaeangliae       ADE75011.1
Cows                Bos taurus                   NP_001179807.1
Mole-rat            Heterocephalus glaber        EHB08960.1
Humans              Homo sapiens                 AAP31417.1
Dogs                Canis lupus                  XP_540393.2

1. Go to the NCBI: http://www.ncbi.nlm.nih.gov/guide/. NCBI stands for the National Center for Biotechnology Information.
2. You will be looking for the prestin sequence of all 8 species. You will follow the instructions below one species at a time, repeating them 8 times.
3. Enter your query (search terms) in the following format. Using the Horseshoe bat as an example, use the scientific name:
   prestin AND Rhinolophus ferrumequinum
   AND is called a Boolean operator; it requires each search result to contain all of the words. In your search results note that each entry has an Accession number, a unique identifier that specifies the databank and number of the entry in that databank. Look for entries that are 741-744 amino acids (aa) long.
4. For each species, once you locate the appropriate search result, enter its Accession number in Table 3. Then click on the FASTA link below the Accession number. This will open a page that provides the amino acid sequence of the prestin protein from a particular species in a standard format, called FASTA. In bioinformatics, FASTA (“fast-all”) is a text-based format for representing either nucleotide sequences or protein sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences, on a header line that begins with “>”. For each species, copy the chosen FASTA sequence using your copy function and paste it into NOTEPAD (not WORD!)
or a similar text-only word processing document, e.g.:

>gi|205277608|gb|ACI02071.1| prestin [Rhinolophus ferrumequinum]
MDHAEETEILAATERYYVERPIFSHLVLQERLHKKDKISDSIGDKLKQAFTCTPKKIRNIIYMFLPITKW
LPAYNFKEYVLGDLVSGISTGVLQLPQGLAFAMLAAVPPVFGLYSSFYPVIMYCFFGTSRHISIGPFAVI
SLMIGGVAVRLVPDDIAVPGGVNATNGTEFRDALRVKVAMSVTLLAGIIQFCLGVCRFGFVAIYLTEPLV
RGFTTAAAVHVFTSMLKYLFGVKTKRYSGIFSVVYSTVAVLQNVKNLNVCSLGVGLMVFGLLLGGKEFNE
RFKEKLPAPIPLEFFAVVMGTGISAGFGLHESYNVDVVGTLPLGLLPPANPDTSLFHLVYVDAIAIAIVG
FSVTISMAKTLANKHGYQVDGNQELIALGLCNSTGSLFQTFAISCSLSRSLVQEGIGGKTQLAGCLASLM
ILMVILATGFLFESLPQAVLSAIVIVNLKGMFMQFSDLPFFWRTSKIELVIWLSTFVSSLFLGLDYGLIT
AVIIALMTVIYRTQSPTYTVLGQLPDTDVYIDIDAYEEVKEIPGIKIFQINAPIYYANSDLYSNALKRKT
GVNPSFILGARRKAMKKYAKEGGNINIANATDVKADAEVDAEDGTKPEEEEDEVKYPPVVIKSTFPEELQ
RFMPPLENVHTIILDFTQVNFIDSVGVKTLQGIVKEYGDVGIYVYLAGCSAQVVSDLTRNRFFENPALLD
LLFHSIHDAVLGSLVREALEEKEAAAATPQEDSEPNATPDV

In the above example, note that the text below the header line is the actual amino acid sequence, which should be on a different line than the header. For safety, remember to save your file after each sequence is pasted. It is suggested that you use the Horseshoe bat prestin sequence as your first entered sequence. Our goal in this experiment is to determine which species’ prestin amino acid sequences are most closely related to that of the horseshoe bat.

5. To make your analysis easier, replace the information in the header line with the common name of each species.
Using the above example:

>Horseshoe Bat
MDHAEETEILAATERYYVERPIFSHLVLQERLHKKDKISDSIGDKLKQAFTCTPKKIRNIIYMFLPITKW
LPAYNFKEYVLGDLVSGISTGVLQLPQGLAFAMLAAVPPVFGLYSSFYPVIMYCFFGTSRHISIGPFAVI
SLMIGGVAVRLVPDDIAVPGGVNATNGTEFRDALRVKVAMSVTLLAGIIQFCLGVCRFGFVAIYLTEPLV
RGFTTAAAVHVFTSMLKYLFGVKTKRYSGIFSVVYSTVAVLQNVKNLNVCSLGVGLMVFGLLLGGKEFNE
RFKEKLPAPIPLEFFAVVMGTGISAGFGLHESYNVDVVGTLPLGLLPPANPDTSLFHLVYVDAIAIAIVG
FSVTISMAKTLANKHGYQVDGNQELIALGLCNSTGSLFQTFAISCSLSRSLVQEGIGGKTQLAGCLASLM
ILMVILATGFLFESLPQAVLSAIVIVNLKGMFMQFSDLPFFWRTSKIELVIWLSTFVSSLFLGLDYGLIT
AVIIALMTVIYRTQSPTYTVLGQLPDTDVYIDIDAYEEVKEIPGIKIFQINAPIYYANSDLYSNALKRKT
GVNPSFILGARRKAMKKYAKEGGNINIANATDVKADAEVDAEDGTKPEEEEDEVKYPPVVIKSTFPEELQ
RFMPPLENVHTIILDFTQVNFIDSVGVKTLQGIVKEYGDVGIYVYLAGCSAQVVSDLTRNRFFENPALLD
LLFHSIHDAVLGSLVREALEEKEAAAATPQEDSEPNATPDV

IV. Using ClustalW2 to compare PRESTIN protein sequences

1. Go to the following European site, which provides many bioinformatics tools for protein and DNA alignment analysis. EBI stands for the European Bioinformatics Institute. http://www.ebi.ac.uk/
2. Click on Tools in the black bar.
3. In the drop-down menu, click on Sequence Analysis.
4. On the side menu, click on ClustalW2. ClustalW2 is an alignment tool. In the drop-down menu above the field where you will enter sequences, choose the “Protein” option.
5. You may choose to do one or both of the following methods of data entry for subsequent analysis.
   1) Select all 8 protein sequences at once and paste them into the ClustalW2 sequence field.
   2) Select the horseshoe bat prestin sequence and one of the other species’ sequences and paste them into the ClustalW2 sequence field.
6. Click run. The alignment can take a few minutes, depending on how many people all over the world are doing alignments at that time.
7. Scroll down to the protein sequence alignments.
8.
Your alignment will look like this (except that you will have either 8 sequences aligned or two sequences aligned, depending on which method you chose in step 5):

Horseshoe   MDHAEETEILAATERYYVERPIFSHLVLQERLHKKDKISDSIGDKLKQAFTCTPKKIRNI 60
Bottlenose  MDHVEETEILAATQRYYVERPIFSHPVLQERLHKKDKISESIGDKLKQAFTCTPKKIRNI 60
            ***.*********:*********** *************:********************

9. Click “Show Colors”. The same alignment is then displayed with each residue colored by chemical class. From the ClustalW2 Help section, this is what the colors mean:

Residue    Colour    Property
AVFPMILW   RED       Small (small + hydrophobic (incl. aromatic - Y))
DE         BLUE      Acidic
RK         MAGENTA   Basic
STYHCNGQ   GREEN     Hydroxyl + Amine + Basic - Q
Others     Gray

CONSENSUS SYMBOLS: An alignment will display by default the following symbols denoting the degree of conservation observed in each column:
An "*" (asterisk) means that the residues or nucleotides in that column are identical in all sequences in the alignment.
A ":" (colon) means that conserved substitutions have been observed, according to the COLOUR table above. Conserved means that the change is to an amino acid which is chemically similar, i.e., in the same classification.
A "." (period) means that semi-conserved substitutions are observed.
A “-” (dash) indicates a missing amino acid.

10. You may choose to save your results by printing your alignments for your lab notebook. For example, either use a screen capture tool or use the “Print Screen” key and paste the image into a word processing document.
11. We are interested in those areas which have a “.” or “:” or “ ” (space) at the bottom or a “-” in the sequence. They represent the places where differences in amino acid identity have evolved. Subtle differences among the species have accumulated due to mutations of the DNA sequence.
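The positions singled out above (columns not marked “*”) can also be listed programmatically. The sketch below is written in Python for illustration only; `list_differences` is a hypothetical helper, not a ClustalW2 feature, and it assumes the two rows were copied from the same alignment block. Positions are 1-based within that block.

```python
def list_differences(reference_row, other_row):
    """Return (position, reference residue, other residue) for each mismatch."""
    if len(reference_row) != len(other_row):
        raise ValueError("aligned rows must have the same length")
    return [
        (position, ref, other)
        for position, (ref, other) in enumerate(zip(reference_row, other_row), start=1)
        if ref != other  # covers substitutions and gaps ("-")
    ]


# First 60 columns of the sample horseshoe bat vs. bottlenose dolphin alignment.
bat = "MDHAEETEILAATERYYVERPIFSHLVLQERLHKKDKISDSIGDKLKQAFTCTPKKIRNI"
dolphin = "MDHVEETEILAATQRYYVERPIFSHPVLQERLHKKDKISESIGDKLKQAFTCTPKKIRNI"

for position, ref, other in list_differences(bat, dolphin):
    print(f"position {position}: {ref} in bat, {other} in dolphin")
```

For this block the helper reports four differing columns, which line up with the four non-asterisk symbols in the consensus line of the sample alignment.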
For convenience, we will note changes in protein sequence relative to the HORSESHOE BAT prestin sequence. By examining the alignments you have achieved, fill in the table on the next page by comparing each organism’s sequence to the horseshoe bat sequence at the top. Determine the number of amino acids that do not exactly match the horseshoe bat sequence.

Table 4: Similarities and differences in the amino acid sequences of prestin and percentage of conserved amino acids of select organisms compared to the Horseshoe bat.

Comparative Organism   A. Number of amino   B. Number of amino   Percentage of Identical
(Horseshoe bat vs.)    acids matching       acids not matching   Amino Acids (Column A /
                       Horseshoe bat        Horseshoe bat        total amino acids * 100%)
Sperm Whale            695                  46                   93.8%
Bottlenose Dolphin     699                  42                   94.3%
Humpback whale         694                  47                   93.7%
Cow                    691                  50                   93.3%
Mole-rat               665                  76                   89.7%
Human                  680                  61                   91.8%
Dog                    685                  56                   92.4%

12. Complete Table 4 by using the numbers gathered above and then calculating the percent conservation, i.e., the percentage of amino acids that are the same and fall in the same position in the sequence for each organism compared to the horseshoe bat.

Examine the table above. From the data gathered in this alignment and before proceeding to the next page, can you determine which species most likely convergently evolved with the horseshoe bat? Which organism clearly did not? Explain your reasoning.

MAKING A CLADOGRAM FOR CONVERGENT EVOLUTION OF SPECIES

From the ClustalW2 Help Section: “A Phylogram is a branching diagram (tree) assumed to be an estimate of a phylogeny, branch lengths are proportional to the amount of inferred evolutionary change.
A Cladogram is a branching diagram (tree) assumed to be an estimate of a phylogeny where the branches are of equal length, thus cladograms show common ancestry, but do not indicate the amount of evolutionary "time" separating taxa.”

Under CLUSTALW Results, click on “Guide Tree” and “Show as Cladogram Tree”.

Conclusion:
Convergent evolution is clearly shown by this example of prestin evolution in both bats and cetaceans. Comparison of hemoglobin sequences, present in all mammals, yields a cladogram consistent with known evolutionary phylogenies. However, comparison of prestin sequences yields a cladogram that is inconsistent with phylogenies supported by a broad base of empirical evidence, which reveals molecular homoplasy.

See: http://www.umich.edu/~zhanglab/publications/2010/Comment_Li_2010_CurrBiol.pdf