Bioinformatics: Investigating Molecular/Biochemical Evidence for Evolution

Background

How does an evolutionary biologist decide how closely related two different species are? The simplest way is to compare physical features of the species (as we did with the wild and domestic canines). We generally expect that brothers and sisters will look more similar to each other than two cousins might. If you make a family tree, you find that brothers and sisters share a common parent, but you must look harder at the tree to find which ancestor the two cousins share. Cousins do not share the same parents; rather, they share some of the same grandparents. In other words, the common ancestor of two brothers is more recent (their parents) than the common ancestor of two cousins (their grandparents), and in an evolutionary sense, this is why we say that two brothers are more closely related than two cousins.

Similarly, evolutionary biologists might compare frogs with salamanders and frogs with fish. More physical features are shared between frogs and salamanders than between frogs and fish, and an evolutionary biologist might use this information to infer that frogs and salamanders had a more recent common ancestor than did frogs and fish.

But no process is without problems. Two very similar-looking people are not necessarily related, and two species that have similar features also may not be closely related. Comparing morphology can also be difficult if it is hard to find enough morphological characteristics to compare. Remember the problems with the canids!

Imagine that you were responsible for determining which two of three salamander species were most closely related. What physical features would you compare? When you run out of physical features, is there anything else you could compare? Many biologists turn next to comparing genes and proteins.
Genes and proteins are not necessarily better than morphological features, except in the sense that differences in morphology can result from environmental conditions rather than genetics, whereas differences in genes are by definition genetic. Also, there are sometimes more molecules to compare than physical features.

In the three exercises that follow, you will use protein databases that are in the public domain. You will be able to investigate gene products (which are proteins) and evaluate evolutionary relationships. The protein you will work with is the hemoglobin beta chain. You will obtain your data from public online databases that contain the amino-acid sequences of proteins coded for by many different organisms.

Hemoglobin, the molecule that carries oxygen in our bloodstream, is composed of four subunits. In adult hemoglobin, two of these subunits are identical and coded for by the alpha-hemoglobin gene located on chromosome 16. The other two are identical and coded for by the beta-hemoglobin gene found on chromosome 11. We will use the protein sequence of beta hemoglobin as a set of traits to compare among species.

Part I “Are Bats Birds?”

In Parts I and II, there are no hypotheses, so you are not collecting data to determine the validity of a hypothesis. You are learning a new skill: the mining of molecular information available to all on the internet. In Part III, you will use your skills to test a hypothesis. For that part, you will need to determine the Experimental Design Questions (we’ll discuss these in class) and collect data to determine the validity of your hypothesis.

Procedure Part I

To be completed as a team. Answer questions in the spaces provided on the Data Tables.

Table 1: Morphological comparison of birds, bats, and other mammals

Feature                       Birds     Bats      Other mammals
Presence of hair
Presence of feathers
Presence of mammary glands
Presence of wings
Homeothermy
Four-chambered hearts

1a.
What morphological features do bats share with mammals?

1b. Based on morphology, are bats more similar to birds or mammals?

A distance matrix is a table that shows all the pairwise comparisons between species. Continue to make all pairwise comparisons until Table 2 is filled. For each comparison, use the percent identity for the overlap of the full sequences (about 146 amino acids; the UniProt entries shown below list 147 for human and chicken).

Table 2: The distance matrix for Part I

            Bat 1    Bat 2    Bird 1   Bird 2   Mammal 1   Mammal 2
Bat 1       100%
Bat 2                100%
Bird 1                        100%
Bird 2                                 100%
Mammal 1                                        100%
Mammal 2                                                   100%

Shaded boxes (the cells on one side of the diagonal) are simply repeat data.

Use your internet access to complete Part I.

2. Generate a distance matrix for the beta-hemoglobin chain for two bird species, two bat species, and two non-bat mammal species, and record it in a word-processing worksheet. Follow the steps below to do this.

Step 1. Begin by going to www.uniprot.org
Step 2. In the Search In dialogue box, use the down arrow to select Protein Knowledgebase (UniProt). In the Query box, type “hemoglobin beta”. Click “Search”.
Step 3. This will take you to the beginning of the search results. http://www.uniprot.org/uniprot/?query=hemoglobin&sort=score
Step 4. Use the right-hand scroll bar to scroll through the names of the many entries. Find one for either a bird, a bat, or some other mammal. (I selected chicken.) When you find one, check to make sure that it is the hemoglobin beta chain (preferably one without a number after it) and not the alpha, gamma, or other hemoglobin subunit. If the sequence is for the beta chain and it is for an appropriate species, click on it and the computer will retrieve the sequence.
Step 5. Once you have selected your organisms, click Retrieve at the bottom of the page.
Step 6. Once you are at the next page, you will see the UniProt identifiers.
Step 7. Click Align at the bottom of the page.
Step 8. Go to the FASTA option (“Sequence data in FASTA format”) and click Open.
9.
You will then see the FASTA data using the single-letter identifiers for amino acids.

>sp|P68871|HBB_HUMAN Hemoglobin subunit beta OS=Homo sapiens GN=HBB PE=1 SV=2
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK
VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG
KEFTPPVQAAYQKVVAGVANALAHKYH
>sp|P02112|HBB_CHICK Hemoglobin subunit beta OS=Gallus gallus GN=HBB PE=1 SV=2
MVHWTAEEKQLITGLWGKVNVAECGAEALARLLIVYPWTQRFFASFGNLSSPTAILGNPM
VRAHGKKVLTSFGDAVKNLDNIKNTFSQLSELHCDKLHVDPENFRLLGDILIIVLAAHFS
KDFTPECQAAWQKLVRVVAHALARKYH

10. You do not actually need to use this FASTA format, but you now know what it looks like.

11. Instead, click Align, which is found at the bottom of the page next to Retrieve. Scroll down to see what was found in the UniProt database.

Alignment

  1 --MLTAEEKAAVTAFWGKVKVDEVGGEALGRLLVVYPWTQRFFESFGDLSTADAVMNNPK  58  P02070 HBB_BOVIN
  1 MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK  60  P68871 HBB_HUMAN
    *********.:**:*: ***:******:****************:******* ***::**

 59 VKAHGKKVLDSFSNGMKHLDDLKGTFAALSELHCDKLHVDPENFKLLGNVLVVVLARNFG 118  P02070 HBB_BOVIN
 61 VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG 120  P68871 HBB_HUMAN
    *********.:**:*: ***:******:****************:******* ***::**

119 KEFTPVLQADFQKVVAGVANALAHRYH 145  P02070 HBB_BOVIN
121 KEFTPPVQAAYQKVVAGVANALAHKYH 147  P68871 HBB_HUMAN
    ***** :** :*************:**

Here are some items to note in this “alignment”:

Query
>sp|P02112|HBB_CHICK Hemoglobin subunit beta OS=Gallus gallus GN=HBB PE=1 SV=2
MVHWTAEEKQLITGLWGKVNVAECGAEALARLLIVYPWTQRFFASFGNLSSPTAILGNPM
VRAHGKKVLTSFGDAVKNLDNIKNTFSQLSELHCDKLHVDPENFRLLGDILIIVLAAHFS
KDFTPECQAAWQKLVRVVAHALARKYH
>sp|P68871|HBB_HUMAN Hemoglobin subunit beta OS=Homo sapiens GN=HBB PE=1 SV=2
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK
VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG
KEFTPPVQAAYQKVVAGVANALAHKYH

Date run: Sept 2, 2011
Running time: 9.2 seconds
Identical positions: 102
Identity:
69.388%
Similar positions: 29
Algorithm: clustalw

147 amino acids in each HBB
* indicates identical amino acids
: and . indicate similarities between two amino acids (conserved properties)
A blank space indicates no similarity at all between two amino acids.

12. Continue scrolling down until you see Job Information. Some analysis of this information:
102/147 amino acids are identical = 69.388%
29/147 are similar
16/147 are neither identical nor similar

13. Now go back and do a search, retrieval, and alignment for human HBB and bovine HBB. What would you expect the % identity to be? What do your results tell you?

14. Of course, you could compare all three, or any number of organisms, at once.

At this point, you should have the tools to complete Parts I, II and III.

Table 3. Amino Acid Symbols

Procedure Part II “Reptiles with Feathers?”

You may complete as a team.

Some phylogenetic systematists (scientists who work to make the classification of organisms match their evolutionary history) complain that the vertebrate class Reptilia is improper because it should include birds. In technical terms, the vertebrate class Reptilia is paraphyletic because it contains some but not all of the descendants of the most recent common ancestor of the group. Just how similar are reptiles and birds in terms of the beta-hemoglobin chain? Should birds be considered a type of reptile? Evaluate this question using a BLAST (Basic Local Alignment Search Tool) search.

A BLAST search takes a particular sequence and then locates the most similar sequences in the entire database. It returns a list of sequences, with the first sequence being the most similar to the one entered and the last sequence being the least similar.

Repeat the steps in Procedure Part I to gather data to answer the questions posed above.
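The percent identity arithmetic from Part I, and the idea of listing sequences from most to least similar, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `percent_identity` and `rank_by_identity` are made up for this example, the sequences are copied from the FASTA records and the bovine/human alignment shown in Part I, and identity is taken over all alignment columns. A real BLAST search scores local alignments and does not simply count matching positions.

```python
# Aligned rows copied from Part I; "-" marks a gap in the alignment.
HUMAN = ("MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK"
         "VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG"
         "KEFTPPVQAAYQKVVAGVANALAHKYH")
CHICKEN = ("MVHWTAEEKQLITGLWGKVNVAECGAEALARLLIVYPWTQRFFASFGNLSSPTAILGNPM"
           "VRAHGKKVLTSFGDAVKNLDNIKNTFSQLSELHCDKLHVDPENFRLLGDILIIVLAAHFS"
           "KDFTPECQAAWQKLVRVVAHALARKYH")
BOVINE = ("--MLTAEEKAAVTAFWGKVKVDEVGGEALGRLLVVYPWTQRFFESFGDLSTADAVMNNPK"
          "VKAHGKKVLDSFSNGMKHLDDLKGTFAALSELHCDKLHVDPENFKLLGNVLVVVLARNFG"
          "KEFTPVLQADFQKVVAGVANALAHRYH")


def percent_identity(row_a, row_b):
    """Return (identical_count, percent) for two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Count columns where both rows hold the same residue; gaps never match.
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    # Assumption: the denominator is the number of alignment columns.
    return identical, 100.0 * identical / len(row_a)


def rank_by_identity(query, candidates):
    """Sort candidate rows from most to least identical to the query row."""
    scored = [(name,) + percent_identity(query, row)
              for name, row in candidates.items()]
    return sorted(scored, key=lambda item: item[2], reverse=True)


identical, pct = percent_identity(HUMAN, CHICKEN)
print(identical, f"{pct:.3f}")  # 102 69.388

# List the candidates from most to least similar to the human sequence.
for name, count, pct in rank_by_identity(HUMAN, {"chicken": CHICKEN, "bovine": BOVINE}):
    print(name, count, f"{pct:.1f}%")
```

The human/chicken result reproduces the 102 identical positions reported in the Job Information from Part I. The ranking loop mimics, in a very simplified way, how a results list places the most similar sequence first.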
Table 4: Results of a BLAST search on the crocodile beta-hemoglobin sequence

Similarity                                   Species name & name of protein
First most similar (do not use crocodile)
Second most similar
Third most similar
Fourth most similar
Fifth most similar
Sixth most similar
Seventh most similar
Eighth most similar
Ninth most similar
Tenth most similar

1. Were any of those species birds?
2. One unusual reptile is the tuatara, whose scientific name is Sphenodon punctatus. How similar is the tuatara to the crocodile?
3. Does the tuatara appear in your list of ten? If not, how far down the BLAST search list does it occur: fifteenth, twentieth?
4. Most importantly, which species are more similar to the crocodile: birds or other reptiles?
5. Do the molecular data suggest that Reptilia is paraphyletic or monophyletic? Explain.

Part III

Problem: The first two parts of this exercise gave you background and tools to consider phylogenetic relationships among a variety of organisms. Do you ever wonder how closely related certain organisms are to each other? Well, now you have the background and tools to develop a working hypothesis. Select a group of 5-7 organisms (plants, animals, fungi, protists, bacteria, viruses) and develop a cladogram that depicts the phylogenetic relationships among the organisms. You should have data to back up your cladogram arrangement.

Hypothesis:

Materials:

Procedure: See Parts I and II above.

Data:

Analysis:

Conclusion: Do your data support or not support your hypothesis? Cite specific data.

Error Analysis:

Further Question/s to Investigate:
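For Part III, one simple way to turn a filled-in distance matrix into a cladogram is average-linkage clustering (often called UPGMA). The sketch below is only one possible approach, not a method prescribed by this exercise, and it assumes roughly equal rates of change in all lineages. The function names, the taxon labels A to D, and the percent identity values are all hypothetical placeholders; substitute your own organisms and your own measured values.

```python
from itertools import combinations


def upgma(names, identity):
    """Cluster taxa by average linkage and return a nested-tuple tree.

    `identity` maps frozenset({a, b}) to the percent identity of that pair.
    """
    # Convert percent identity to a distance: more identical means closer.
    dist = {pair: 100.0 - pct for pair, pct in identity.items()}
    # Each cluster is keyed by its (sub)tree and stores its leaf names.
    clusters = {name: [name] for name in names}

    def cluster_distance(x, y):
        # Average of all leaf-to-leaf distances between the two clusters.
        pairs = [(a, b) for a in clusters[x] for b in clusters[y]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two clusters that are currently closest to each other.
        x, y = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        merged = clusters.pop(x) + clusters.pop(y)
        clusters[(x, y)] = merged
    return next(iter(clusters))


def to_newick(tree):
    """Write the nested tuples as a parenthesized (Newick-style) string."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(branch) for branch in tree) + ")"


# Hypothetical placeholder values, NOT real measurements.
example_identity = {
    frozenset({"A", "B"}): 90.0,
    frozenset({"A", "C"}): 70.0,
    frozenset({"A", "D"}): 68.0,
    frozenset({"B", "C"}): 71.0,
    frozenset({"B", "D"}): 69.0,
    frozenset({"C", "D"}): 85.0,
}

tree = upgma(["A", "B", "C", "D"], example_identity)
print(to_newick(tree) + ";")  # ((A,B),(C,D));
```

In the parenthesized output, taxa grouped inside the same innermost parentheses are the ones that share the most recent common ancestor, which is the grouping you would draw first on a cladogram.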