Investigating Evolutionary Questions Using Online Molecular Databases

Adapted from M. Puterbaugh and NABT, from Puterbaugh, M.N. & Burleigh, J.G. (2001).

How does an evolutionary biologist decide how closely related two different species are? The simplest way is to compare the physical features of the species (their “morphologies”). This method is very similar to comparing two people to determine how closely related they are. We generally expect that brothers and sisters will look more similar to each other than two cousins do. If you make a family tree, you find that brothers and sisters share a common parent, but you must look further back in the tree to find which ancestor the two cousins share. Cousins do not share the same parents; rather, they share some of the same grandparents. In other words, the common ancestor of two brothers is more recent (their parents) than the common ancestor of two cousins (their grandparents), and in an evolutionary sense, this is why we say that two brothers are more closely related than two cousins.

Similarly, evolutionary biologists might compare frogs with salamanders and frogs with fish. More physical features are shared between frogs and salamanders than between frogs and fish, and an evolutionary biologist might use this information to infer that frogs and salamanders had a more recent common ancestor than did frogs and fish.

This methodology certainly has problems. Two very similar-looking people are not necessarily related, and two species that have similar features also may not be closely related. Comparing morphology can also be difficult if there are too few morphological characteristics to compare. Imagine that you were responsible for determining which two of three salamander species were most closely related. What physical features would you compare? When you ran out of physical features, is there anything else you could compare? Many biologists turn next to comparing genes and proteins.
Genes and proteins are not necessarily better evidence than morphological features, except that differences in morphology can result from environmental conditions rather than genetics, whereas differences in gene sequences are always genetic. Also, there are sometimes more molecules to compare than physical features.

In the following exercises, you will use data from a public protein database of gene products (proteins) to evaluate evolutionary relationships. You will work with the hemoglobin beta chain, although you could just as easily compare the amino acid sequences of any of the thousands of proteins coded for by thousands of genes from thousands of different organisms.

Hemoglobin, the molecule that carries oxygen in our bloodstream, is composed of four subunits. In adult hemoglobin, two of these subunits are identical and coded for by the alpha-hemoglobin gene. The other two are identical and coded for by the beta-hemoglobin gene. The hemoglobin genes are worthy of study themselves, but today we will just use the protein sequences as a set of traits to compare among species.

PART I: “Are Bats Birds?”

Calvin, from the “Calvin and Hobbes” cartoon series, failed his first school report, "Bats are Bugs". He is trying to repair his report and has changed the title to "Bats are Birds". Help convince Calvin that having wings does not make bats birds, and that bats in fact share more features with mammals than with birds.
http://mimosasonthefrontlawn.blogspot.com/2009/02/bats-arent-bugs.html

Procedure Part I:

1. What morphological features do bats share with mammals? With birds? Fill in Table 1 by placing X's in the appropriate places. Use Wikipedia or http://www.ucmp.berkeley.edu/vertebrates/tetrapods/tetraintro.html as needed.

Table 1
Features   hair   feathers   mammary glands   wings   endothermy   4-chambered heart
Birds
Bats
Mammals

2. How similar are the amino acid sequences of the hemoglobin molecules from different organisms?
To find out, you will generate a distance matrix for the beta-hemoglobin chain for two bird species, two bat species, and two non-bat mammal species. Remember that similarity in amino acid sequence reflects similarity in the underlying DNA base sequence (although not perfectly, because several different codons can code for the same amino acid). Follow the steps below:

Step 1: Begin by going to the ExPASy server at http://ca.expasy.org. In the top left pull-down menu that says Query All Databases, select UniProtKB (or go directly to http://www.uniprot.org/).

Step 2: In the adjacent open box, type the phrase hemoglobin beta and click on “search.” The computer will retrieve many entries and display them.

Step 3: Scroll through the names of the entries to find one for a bird, a bat, or some other mammal. When you find one, check that it is the hemoglobin beta chain (preferably without a number after it) and not the alpha, gamma, or another hemoglobin subunit. If the sequence is for the beta chain and it is for an appropriate species, click on it and the computer will retrieve the sequence.

Step 4: The next screen contains a lot of information. Copy the following information only for the first organism that you are studying:
- Gene name
- Protein name
- Taxonomic classification of your organism
- Sequence length (how many amino acids long?)
- Function
- Subunit structure of the entire protein (is it made of more than one polypeptide?)
- Tissue specificity (in which cells are the genes expressed?)
- Secondary structure (color coded): Does it have alpha helices? Beta pleated sheets (strands)? Turns due to H bonds?

The protein sequence is near the bottom of the information sheet in the "Sequences" section. The amino acids are indicated with their single-letter symbols, and every 10th amino acid is marked with its position.
What are the first two amino acids that make up the hemoglobin beta chain?
What are the last two amino acids that make up the hemoglobin beta chain?
The amino acids are listed in a single-letter code:

Letter   Amino acid      Abbreviation
G        Glycine         Gly
A        Alanine         Ala
L        Leucine         Leu
M        Methionine      Met
F        Phenylalanine   Phe
W        Tryptophan      Trp
K        Lysine          Lys
Q        Glutamine       Gln
E        Glutamic Acid   Glu
S        Serine          Ser
P        Proline         Pro
V        Valine          Val
I        Isoleucine      Ile
C        Cysteine        Cys
Y        Tyrosine        Tyr
H        Histidine       His
R        Arginine        Arg
N        Asparagine      Asn
D        Aspartic Acid   Asp
T        Threonine       Thr

(Figure label: amino acid #60 out of 147)

Step 5: In the center, click on “FASTA format.” This will provide you with the condensed sequence for that species, along with the species identification. Here is a sample of that species information and its beta-hemoglobin sequence:

>sp|P02067|HBB_PIG Hemoglobin subunit beta OS=Sus scrofa GN=HBB PE=1 SV=3
MVHLSAEEKEAVLGLWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSNADAVMGNPK
VKAHGKKVLQSFSDGLKHLDNLKGTFAKLSELHCDQLHVDPENFRLLGNVIVVVLARRLG
HDFNPNVQAAFQKVVAGVANALAHKYH

Step 6: Select and copy this FASTA-formatted information.

Step 7: Repeat the above steps until you have the FASTA-formatted sequences for six different species: two bird species, two bat species, and two non-bat mammal species.
**The only information that you need for the other organisms is their FASTA-formatted amino acid sequence, not the information in Step 4!

Bird1 - name:
Bird2 - name:
Bat1 - name:
Bat2 - name:
non-bat Mammal1 - name:
non-bat Mammal2 - name:

Step 8: To align the sequences and determine how similar they are, go to an internet alignment program, e.g., “LALIGN” at: http://fasta.bioch.virginia.edu/fasta/lalign.htm

Step 9: Select, copy, and paste one sequence into the first (query) sequence box and another sequence into the second sequence box. For simplicity’s sake, copy just the protein sequence and not any of the identification information. However, make sure you keep track of which two species' data you have entered.

Step 10: Click on “Align Sequences.” (You may need to wait a short time for the alignment to finish.)
Step 11: The computer will return a set of information including the “percent identity in the 146 aa overlap”. Record that value in the distance matrix table below. It is essentially the percent of amino acids that are identical in the two sequences. If all the amino acids were the same, the percent identity would be 100%. LALIGN also shows you the actual alignment of the two sequences. Identical amino acids are marked with two dots between them (:). If there is one dot (.), the change in amino acid is conservative (both amino acids have similar properties and charge; this is included in the % similar value). If there are no dots, the two amino acids have different biochemical properties.

Step 12: A distance matrix is a table that shows all the pairwise comparisons between species. Here each cell records a percent identity, so higher values mean more similar sequences. Continue to make pairwise comparisons until the table is filled. For each comparison, use the percent identity for the overlap of all 146 amino acids.

          Bat 1     Bat 2     Bird 1    Bird 2    Mammal 1  Mammal 2
Bat 1     100%
Bat 2               100%
Bird 1                        100%
Bird 2                                  100%
Mammal 1                                          100%
Mammal 2                                                    100%

Should bats be classified as birds or mammals? Use the distance matrix table to defend your answer.

To help answer this further, produce a cladogram of your results at: http://www.phylogeny.fr/version2_cgi/simple_phylogeny.cgi

USE STEPS 1-3 ONLY IF "ONE CLICK" MODE IS NOT WORKING
1. Select Phylogeny Analysis / Advanced
2. Select the last 2 options in the Workflow Settings (PhyML and TreeDyn)
3. Select Create Workflow
4. Upload your sequences in FASTA format by pasting them in the entry box as shown below. Click Submit at the bottom left of the page.
SAMPLE

>crocodile
ASFDPHEKQLIGDLWHKVDVAHCGGEALSRMLIVYPWKRRYFENFGDISNAQAIMHNEKV
QAHGKKVLASFGEAVCHLDGIRAHFANLSKLHCEKLHVDPENFKLLGDIIIIVLAAHYPK
DFGLECHAAYQKLVRQVAAALAAEYH
>alligator
ASFDAHERKFIVDLWAKVDVAQCGADALSRMLIVYPWKRRYFEHFGKMCNAHDILHNSKV
QEHGKKVLASFGEAVKHLDNIKGHFANLSKLHCEKFHVDPENFKLLGDIIIIVLAAHHPE
DFSVECHAAFQKLVRQVAAALAAEYH
>sparrow
VQWTAEEKQLITGLWGKVNVAECGGEALARLLIVYPWTQRFFASFGNLSSPTAVLGNPKV
QAHGKKVLTSFGEAVKNLDSIKNTFSQLSELHCDKLHVDPENFRLLGDILVVVLAAHFGK
DFTPDCQAAWQKLVRVVAHALARKYH
>condor
VHWSAEEKQLITGLWGKVNVAECGAEALARLLIVYPWTQRFFASFGNLSSPTAIIGNPMV
RAHGKKVLTSFGEAVKNLDNIKNTFAQLSELHCEKLHVDPENFRLLGDILIIVLAAHFAK
DFTPDCQAAWQKLVRAVAHALARKYH
>mallard
VHWTAEEKQLITGLWGKVNVADCGAEALARLLIVYPWTQRFFASFGNLSSPTAILGNPMV
RAHGKKVLTSFGDAVKNLDNIKNTFAQLSELHCDKLHVDPENFRLLGDILIIVLAAHFPK
EFTPECQAAWQKLVRVVAHALARKYH
>dove
VHWSAEEKQLITSIWGKVNVADCGAEALARLLIVYPWTQRFFASFGNLSSATAISGNPNV
KAHGKKVLTSFGDAVKNLDNIKGTFAQLSELHCNKLHVDPENFKLLGDILVIVLAAHFGK
DFTPECQAAWQKLVRVVAHALARKYH

Explanation: crocodiles are most closely related to alligators; sparrows are most closely related to condors; doves are most closely related to non-bird reptiles. This is supported by the distance matrix percentages, since the crocodile had the highest percent identity with the alligator and the lowest percent identity with the sparrow.

Copy, paste, and explain your cladogram below. Be sure to indicate a clade, a common ancestor, the 2 most closely related organisms, and the 2 least closely related organisms. How does your cladogram support the percent identities of the hemoglobin amino acid sequences in your distance matrix?

Part II: “Reptiles With Feathers?”

The vertebrate class Reptilia now includes birds so that it forms a correct "clade": a group containing all of the species that arose from the group's most recent common ancestor. Just how similar are non-bird reptiles and birds in terms of the beta-hemoglobin chain? You will evaluate this question in this exercise using a BLAST (Basic Local Alignment Search Tool) search.
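To see what "a list ordered by similarity" means before running a real search, the idea can be sketched in a few lines of Python. This is only an illustration and is not how BLAST works internally: it ranks sequences by simple percent identity and assumes all sequences are the same length with no gaps. The function names and the short toy sequences are hypothetical and invented for this example.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with the same amino acid.

    Assumption: both sequences have the same length and need no gaps.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def rank_by_identity(query, database):
    """Return (name, percent identity) pairs, most similar first."""
    scored = [(name, percent_identity(query, seq))
              for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical 10-residue toy sequences, not real hemoglobin data.
toy_database = {
    "species_A": "MVHLTAEEKS",
    "species_B": "MVHWTAEEKQ",
    "species_C": "ASFDPHEKQL",
}
query = "MVHLTPEEKS"

ranking = rank_by_identity(query, toy_database)
for name, pct in ranking:
    print(name, pct)
```

The real BLAST results list is ordered in the same spirit (best match first), but it scores local alignments rather than counting identical positions.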
Procedure Part II:

Step 1: Begin by going to the ExPASy server at http://ca.expasy.org. In the top left pull-down menu that says Query All Databases, select UniProtKB (or go directly to http://www.uniprot.org/).

Step 2: Find a beta-hemoglobin chain for any type of crocodile. You can do this by typing in hemoglobin beta crocodile.

Step 3: A BLAST (Basic Local Alignment Search Tool) search takes a particular sequence and then locates the most similar sequences in the entire database. A BLAST search returns a list of sequences, with the first sequence being the most similar to the one entered and the last sequences being the least similar. The easiest way for us to run a BLAST search is to click “go” next to the BLAST pull-down menu on the right. It may take a few minutes for the results to appear.

Step 4: On the screen that appears next, you will see the crocodile sequence pasted into the BLAST search field at the top and a list of sequences in order of similarity. Click on each of the first ten sequences to determine which species it belongs to (look under Organism), or copy the scientific name into your browser search field and use Wikipedia to determine what type of organism it is.

Step 5: List those species in the table below, beginning with the most similar species that is not a crocodile.

Similarity   Scientific and common names
1st
2
3
4
5
6
7
8
9
10th

So, how similar are birds to other reptiles, especially crocodiles?

OK, it is your job to prove the findings in the article below, or some other evolutionary relationship that you might be interested in (dogs and wolves, house cats and lions, etc.), using the bioinformatics tools that you have just used. Take screenshots of all relevant data!

Cladogram for Part II courtesy of Liu Volpe '13
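The pairwise table from Part I, Step 12 can also be filled in programmatically as a cross-check on the values copied from LALIGN. The following is a minimal sketch, not a replacement for an alignment program: it assumes every sequence has the same length and needs no gaps (roughly true for the 146-residue beta chains used here, but not in general). The function names and the short toy sequences are hypothetical.

```python
from itertools import combinations


def parse_fasta(text):
    """Return {name: sequence}; the name is the header line without '>'."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line  # join sequences split over several lines
    return records


def identity_matrix(records):
    """Percent identity for every pair (assumes equal-length, ungapped sequences)."""
    matrix = {name: {name: 100.0} for name in records}
    for a, b in combinations(records, 2):
        seq_a, seq_b = records[a], records[b]
        if len(seq_a) != len(seq_b) or not seq_a:
            raise ValueError(f"{a} and {b} must be non-empty and of equal length")
        same = sum(x == y for x, y in zip(seq_a, seq_b))
        pct = round(100.0 * same / len(seq_a), 1)
        matrix[a][b] = matrix[b][a] = pct
    return matrix


# Hypothetical toy sequences; paste your six real FASTA records here instead.
toy_fasta = """>bat_1
MVHLT
AEEKS
>bat_2
MVHLTPEEKS
>bird_1
VHWTAEEKQL
"""

records = parse_fasta(toy_fasta)
matrix = identity_matrix(records)
for row in records:
    print(row, [matrix[row][col] for col in records])
```

Because percent identity is symmetric, only half of the table needs to be computed; the sketch fills in both halves so that any row can be read directly.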