SYSTEMATICS AND MOLECULAR PHYLOGENETICS
Prelab Reading and Questions

Classifying Organisms
Have you ever noticed that when you see an insect or a bird, there is real satisfaction in giving it a name, and an uncomfortable uncertainty when you can't? Along these same lines, consider the bewildering number and variety of organisms that live, or have lived, on this earth. If we did not know what to call these organisms, how could we communicate ideas about them, let alone about the history of life? Thanks to taxonomy, the field of science that classifies life into groups, we can discuss just about any organism, from bacteria to man.

Carolus Linnaeus pioneered the practice of naming organisms with Latin scientific names. His system of giving each organism a scientific name of two parts (sometimes more) is called binomial nomenclature, or "two-word naming". His scheme was based on physical similarities and differences, referred to as characters. Today, taxonomic classification is much more complex and takes into account cell types and cellular organization as well as biochemical, proteomic, and genetic similarities. Taxonomy is but one aspect of a much larger field called systematics.

Taxonomic ranks approximate evolutionary distances among groups of organisms. For example, species belonging to two different superkingdoms are the most distantly related (their lineages diverged in the distant past), with progressively more exclusive groups indicated by phylum, class, and so on, down to species. Taxonomists, scientists who classify living organisms, define a species as a group of closely related organisms that can interbreed and produce fertile offspring. Two organisms are more closely related the lower the rank they share (the same genus rather than only the same family, the same family rather than only the same order, and so on); such organisms generally have more genes in common.
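The idea that two organisms are more closely related the lower the rank they share can be sketched in a few lines of Python. This is a hypothetical illustration, not part of the lab procedure: each lineage is written as a list running from superkingdom down to species, in the same rank order as the classification of man that follows, and the function walks both lists until they first disagree. The names RANKS, FRUIT_BAT, CHICKEN and most_exclusive_shared_rank are made up for the example, and the bat and chicken lineages are filled in from standard classifications.

```python
# Ranks from most inclusive to most exclusive (superkingdom -> species).
RANKS = ["superkingdom", "kingdom", "phylum", "class",
         "order", "family", "genus", "species"]

# Example lineages, one name per rank, in the order of RANKS.
HUMAN = ["Eukaryota", "Metazoa", "Chordata", "Mammalia",
         "Primates", "Hominidae", "Homo", "sapiens"]
FRUIT_BAT = ["Eukaryota", "Metazoa", "Chordata", "Mammalia",
             "Chiroptera", "Pteropodidae", "Cynopterus", "sphinx"]
CHICKEN = ["Eukaryota", "Metazoa", "Chordata", "Aves",
           "Galliformes", "Phasianidae", "Gallus", "gallus"]


def most_exclusive_shared_rank(a, b):
    """Return (rank, name) for the lowest rank two lineages share,
    or None if they already differ at the superkingdom level."""
    shared = None
    for rank, x, y in zip(RANKS, a, b):
        if x != y:  # the lineages split here, so no lower rank is shared
            break
        shared = (rank, x)
    return shared


print(most_exclusive_shared_rank(HUMAN, FRUIT_BAT))    # ('class', 'Mammalia')
print(most_exclusive_shared_rank(FRUIT_BAT, CHICKEN))  # ('phylum', 'Chordata')
```

Stopping at the first disagreement matters: two species epithets that happen to match in different genera are not a shared rank.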
Taxonomic Classification of Man (Homo sapiens)
Superkingdom: Eukaryota
Kingdom: Metazoa
Phylum: Chordata
Class: Mammalia
Order: Primates
Family: Hominidae
Genus: Homo
Species: sapiens

Carolus Linnaeus is also credited with pioneering systematics, the field of science dealing with the diversity of life and the relationships among its components. Systematics reaches beyond taxonomy to develop new methods and theories for classifying species based on similarity of traits and on possible mechanisms of evolution (a change in the gene pool of a population over time). Phylogenetic systematics is the field of biology that identifies and seeks to understand the evolutionary relationships among the many different kinds of life on earth, both living (extant) and dead (extinct). Evolutionary theory states that similarity among individuals or species is attributable to common descent, or inheritance from a common ancestor. Thus, the relationships established by phylogenetic systematics often describe a species' evolutionary history and, hence, its phylogeny: the evolutionary relationships among organisms.

In phylogenetic studies, scientists used to rely on physical and biochemical characteristics to draw conclusions about the evolutionary relatedness of organisms. Nowadays, physical, biochemical, genomic, and proteomic data are all used to draw these conclusions. Scientists then show these evolutionary relationships among organisms in illustrations called phylogenetic trees, which are described with the following terms:
Node: represents a taxonomic unit. This can be either an existing species or an ancestor.
Branch: defines the relationship between the taxa in terms of descent and ancestry.
Topology: the branching pattern of the tree.
Branch length: represents the number of changes that have occurred along the branch.
Root: the common ancestor of all taxa.
Clade: a group of two or more taxa or DNA sequences that includes both their common ancestor and all of their descendants.

Questions
1) What is taxonomy?
What is systematics?
2) What is binomial nomenclature?
3) What is phylogenetic systematics?
4) What do phylogenetic trees show?
5) What data were used in the past to construct phylogenetic trees? What data are used nowadays? Why are these data more reliable?

Mining Biological Databases on the Internet

Lab Objectives
Your performance will be satisfactory when you are able to:
Locate scientific publications and biological databases of DNA and protein sequences on the Internet
Retrieve and compare sequence information from databases
Compare evolutionary relatedness and draw phylogenetic trees from sequence comparisons

Procedure
Part A: Determine the evolutionary relatedness of species through comparisons of protein sequences
1. You will compare the sequences of the protein hemoglobin from bats, birds, and mammals. Decide whether you want to work with α-hemoglobin or β-hemoglobin. These are the two protein chains that carry oxygen in the circulatory systems of animals. Both proteins have been studied extensively in a large number of species and should work equally well. Alternatively, you may collaborate with a partner and do companion searches, one partner searching with α-hemoglobin and the other with β-hemoglobin. At the end of the exercise, you can compare your results to determine whether your two proteins show the same evolutionary relationships among the species of bats, birds, and mammals.
2. Go to http://www.uniprot.org/
3. In the search keyword box, type “alpha hemoglobin” and click Submit. The results of this search will come up on your screen. How many protein sequences were reported from this query?
4. You may scroll through this long list of α-hemoglobin sequences for one from a bat species, but it may be faster to narrow your search. Go back to the search keyword box, type “bat alpha hemoglobin”, and click Submit.
When you get the results of this search, how many α-hemoglobin sequences did you get for bat species?
NOTE: Check the species names and common names for each of the α-hemoglobins in this report to make sure that they are bat sequences. Sometimes a search won't recognize the difference, for example, between “bat” and some other word that contains it, such as “wombat”!
5. Select one bat α-hemoglobin sequence to save to a Word document by clicking on its entry (the accession code link, located to the left of the entry name). An accession code is how protein sequences are identified and archived in databases. The entry names of all α-hemoglobin sequences start with the letters “HBA”.
NOTE: If you are working with a partner who is doing a companion study with β-hemoglobin, you will have to agree on which bat species to use, so that your α-hemoglobin and your partner's β-hemoglobin sequences come from the same species.
6. The page that opens will contain information about the sequence, such as the taxonomy of the organism from which it came. Scroll down to the Sequence section and click on the link for FASTA format. This brings up a page showing the protein sequence written with single-letter abbreviations for the amino acids. FASTA is the best way to save sequence information, because it is a sequence format that nearly all sequence analysis programs can read.
7. Copy and paste this sequence into a Word document. Rename the sequence with an appropriate name (edit the text that follows the “>” symbol).
8. Return to the web page with the list of bat α-hemoglobin sequences (clicking the Back button on your web browser twice will get you there). Identify another bat α-hemoglobin sequence and repeat the process of copying its FASTA-formatted amino acid sequence into your Word document.
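The “bat” versus “wombat” problem in the NOTE above comes from matching a search term anywhere inside a word. The Python sketch below is a hypothetical illustration: it does not query UniProt, and the three result descriptions are made up for the example. A plain substring test keeps the wombat entry, whereas a regular expression with word boundaries keeps only whole-word matches.

```python
import re

# Made-up result descriptions, for illustration only (not real UniProt output).
HITS = [
    "Hemoglobin subunit alpha - Cynopterus sphinx (fruit bat)",
    "Hemoglobin subunit alpha - Vombatus ursinus (common wombat)",
    "Hemoglobin subunit alpha - Gallus gallus (chicken)",
]


def substring_matches(term, texts):
    """Naive search: the term may appear anywhere, even inside a longer word."""
    term = term.lower()
    return [t for t in texts if term in t.lower()]


def whole_word_matches(term, texts):
    """The term must stand alone, with a word boundary on each side."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [t for t in texts if pattern.search(t)]


print(len(substring_matches("bat", HITS)))   # 2: the wombat entry sneaks in
print(len(whole_word_matches("bat", HITS)))  # 1: only the fruit bat
print(whole_word_matches("bird", HITS))      # []: the chicken entry never says "bird"
```

Whole-word matching only fixes the “wombat” kind of error. It still cannot find entries that never use the keyword at all, which is why the species and common names in the results must be checked by eye.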
9. When you have saved two α-hemoglobin sequences from two bat species, repeat steps 3-8 to get 2 sequences from bird species and 2 sequences from mammalian species. It doesn't matter which species you choose, as long as 2 are from birds and 2 are from mammals. You might want to choose species that you think are related to bats. If you are collaborating with a partner searching for β-hemoglobin sequences, your partner should search for the same species that you have chosen. (Note: make sure that all of your sequences are comparable; in other words, they must all be the same chain, HBA-1 or alpha subunit 1, and all approximately the same length.)
=> IMPORTANT: Be aware that if you limit your search for bird α-hemoglobin sequences with the keyword “bird,” the search will only locate protein entries in which the word “bird” appears. If an entry was archived under another description, such as “hawk,” “eagle,” or “penguin,” you will not find it with the keyword “bird.”
10. When you have saved six α-hemoglobin sequences to your Word document (two from bats, two from birds, and two from mammals), go to http://clustalw.genome.ad.jp . CLUSTALW is a computer program that aligns many sequences at a time, finding the similarities among them and displaying the regions of alignment.
11. Copy all of your sequences from the Word document, paste them into the entry box in CLUSTALW (make sure that protein sequence is selected), and click Submit. Note that the sequence descriptions preceded by the “>” mark will be copied in with the protein sequences; this will not cause a problem.
12. Without changing any of the default settings, click on the blue Execute Multiple Alignment bar.
13. The next page will show the alignment of the amino acid sequences of the 6 proteins that you retrieved from the UniProt (Swiss-Prot) database, using the single-letter abbreviations for the amino acids.
An asterisk (*) will appear in the bottom row of the alignment at each position where the same amino acid is found in all 6 proteins. These amino acids are said to be highly conserved, since they appear not to have changed since these species diverged from a common ancestor.
a. How many amino acid positions are identical in all 6 α-hemoglobin sequences in your alignment?
b. What percentage of all the α-hemoglobin amino acid positions are conserved in all 6 proteins?
c. Are there any specific regions of the α-hemoglobin sequences that are especially conserved? Is one end of the molecule more conserved than the other? Describe your observations.
d. Are there any amino acids that appear more frequently in conserved regions of the protein than in nonconserved regions? If so, which amino acids are they? Use the table at the end of this Lab Exercise to decode the single-letter abbreviations for the amino acids.

At the top of your CLUSTALW report, you will find the exact percentage of amino acids that are identical when only two sequences are compared at a time. For example, if your report says “Sequences (1:2) Aligned. Score: 87.2”, this means that when the first two sequences in your input were aligned, 87.2% of the amino acids were identical in both sequences. Transfer these percentages into a table in which the species whose sequences you have aligned serve as headers for both the columns and the rows. Your table should look similar to this:

Table 1: Percent identity in amino acid alignment
SPECIES             (bat #1)    (bat #2)    (bird #1)   (bird #2)   (mammal #1) (mammal #2)
(bat #1 name)       100
(bat #2 name)       87.2        100
(bird #1 name)                              100
(bird #2 name)                                          100
(mammal #1 name)                                                    100
(mammal #2 name)                                                                100

Notice that you need not fill out both halves of this table, since the information is redundant.
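Questions a and b, the pairwise percentages in Table 1, and the tree-building idea can be checked with a short Python sketch that works on aligned rows copied from a CLUSTALW report (gaps shown as “-”). It is a simplified model under stated assumptions: only the “*” mark is reproduced (CLUSTALW also prints “:” and “.” for partially conserved positions), percent identity is counted over columns where neither row has a gap, and the toy fragments are made up, not real hemoglobin data. Because CLUSTALW's pairwise scores come from its own pairwise alignments, real numbers may differ slightly from its report. The upgma function groups a distance table (100 minus percent identity, an assumed rough distance) into a rooted tree with UPGMA, one common clustering method; the tree program behind the CLUSTALW page may use a different method, such as neighbor-joining. All function names are hypothetical.

```python
from itertools import combinations


def conservation_line(aligned):
    """'*' under columns where every row has the same residue, ' ' elsewhere.
    A gap ('-') never counts as conserved. Only '*' is modeled here."""
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("aligned rows must all have the same length")
    marks = []
    for column in zip(*aligned):
        first = column[0]
        conserved = first != "-" and all(r == first for r in column)
        marks.append("*" if conserved else " ")
    return "".join(marks)


def percent_identity(a, b):
    """Identical residues as a percentage of columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def upgma(names, dist):
    """Rooted tree built by UPGMA from dist(a, b); returns a Newick string."""
    # label -> (Newick text, number of taxa, height of this cluster's root)
    clusters = {n: (n, 1, 0.0) for n in names}
    d = {frozenset((a, b)): dist(a, b)
         for i, a in enumerate(names) for b in names[i + 1:]}
    while len(clusters) > 1:
        pair = min(d, key=d.get)      # the two closest clusters
        a, b = sorted(pair)
        ta, na, ha = clusters.pop(a)
        tb, nb, hb = clusters.pop(b)
        height = d.pop(pair) / 2      # the join sits halfway between them
        label = f"{a}+{b}"
        for c in clusters:            # size-weighted average distance to the rest
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((label, c))] = (na * dac + nb * dbc) / (na + nb)
        clusters[label] = (f"({ta}:{height - ha:.1f},{tb}:{height - hb:.1f})",
                           na + nb, height)
    return next(iter(clusters.values()))[0] + ";"


# Toy, already-aligned fragments (made up, not real hemoglobin data).
names = ["batA", "batB", "birdA"]
aligned = ["VLSPADKTNV", "VLSAADKTNV", "VLSAEDKANI"]

line = conservation_line(aligned)
conserved = line.count("*")
print(f"{conserved} of {len(line)} positions conserved "
      f"({100 * conserved / len(line):.0f}%)")

# Lower half of Table 1: (row species, column species) -> percent identity.
identity = {(y, x): percent_identity(aligned[i], aligned[j])
            for (i, x), (j, y) in combinations(enumerate(names), 2)}


def distance(x, y):
    """Assumed distance: 100 minus percent identity."""
    key = (x, y) if (x, y) in identity else (y, x)
    return 100 - identity[key]


print(upgma(names, distance))  # ((batA:5.0,batB:5.0):12.5,birdA:17.5);
```

The two most similar rows join first on short branches and the third joins higher up, which is the same pairing-by-similarity logic described for drawing the tree by hand.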
From this table, can you tell whether the α-hemoglobin sequences are more similar between bats and birds or between bats and mammals? What does this suggest about the evolutionary relatedness of these species? Which species diverged from each other most recently and have the most recent common ancestor? Which species have been diverging from each other the longest and have the most ancient common ancestor? From the information in this table, you should be able to predict whether bats are more closely related to birds or to mammals.
14. A phylogenetic tree can present the relatedness of species from sequence similarity data such as those in your Table 1. These trees join more closely related species on nearby branches, and branch lengths represent evolutionary distance. You can draw a phylogenetic tree from your amino acid alignment report by pairing the species that have the most sequence similarity on short branches. Species that have less sequence similarity will branch from each other farther apart on the tree. CLUSTALW will also draw a phylogenetic tree for you automatically on the page where your report appears. At the bottom of the page, click on the select tree drop-down menu and choose one of the rooted phylogenetic tree options (check out the unrooted tree too). Print out or copy the tree that appears on the screen. Does the information on this tree agree with your analysis of the percent identities for α-hemoglobins in Table 1? Explain.
15. One way to evaluate the validity of the phylogenetic tree that you drew for bats, birds, and mammals is to compare it with trees constructed from sequences of other proteins. Compare your tree with the tree constructed by your partner from the β-hemoglobin sequences. Does your tree agree with theirs? Are the relative lengths of the branches the same?
16. Repeat the comparisons that you made (steps 1-15 above) with other species, such as the following:
a. Compare whales to mammals and fish.
b. Compare reptiles to birds and mammals.
Print out or draw the phylogenetic tree for each of these, and write a brief conclusion about your findings.

Systematics and molecular phylogenetics lab helpful hints
“Accession code” = Entry
Scroll down to “Sequence” and click on FASTA. The file will look similar to this:
>sp|P11753|HBA_CYNSP Hemoglobin subunit alpha OS=Cynopterus sphinx GN=HBA PE=1 SV=1
VLSPADKTNVKAAWDKVGGNAGEYGAEALERMFLSFPTTKTYFPHFDLAHGSPQVKGHGK
KVGDALTNAVSHIDDLPGALSALSDLHAYKLRVDPVNFKLLSHCLLVTLANHLPSDFTPA
VHASLDKFLASVSTVLTSKYR
Paste here
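A FASTA record like the one in the hints has a simple structure: one header line starting with “>”, followed by one or more sequence lines. The Python sketch below reads such text, splits a UniProt-style header into the database tag, accession code, and entry name, and writes a record back out under a new name (as in step 7). The function names are hypothetical, and the header split assumes the “db|accession|entry_name description” layout shown in the example.

```python
EXAMPLE = """>sp|P11753|HBA_CYNSP Hemoglobin subunit alpha OS=Cynopterus sphinx GN=HBA PE=1 SV=1
VLSPADKTNVKAAWDKVGGNAGEYGAEALERMFLSFPTTKTYFPHFDLAHGSPQVKGHGK
KVGDALTNAVSHIDDLPGALSALSDLHAYKLRVDPVNFKLLSHCLLVTLANHLPSDFTPA
VHASLDKFLASVSTVLTSKYR
"""


def parse_fasta(text):
    """Return a list of (header, sequence) pairs; header excludes the '>'."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # finish the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # a sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_uniprot_header(header):
    """Split 'db|accession|entry_name description' into its four parts."""
    ids, _, description = header.partition(" ")
    db, accession, entry_name = ids.split("|")
    return db, accession, entry_name, description


def to_fasta(name, sequence, width=60):
    """Write one record under a new name, wrapping the sequence at `width`."""
    lines = [">" + name]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)


header, seq = parse_fasta(EXAMPLE)[0]
print(split_uniprot_header(header)[:3])  # ('sp', 'P11753', 'HBA_CYNSP')
print(len(seq))                          # 141
print(to_fasta("bat1_Cynopterus_sphinx", seq))
```

Joining the sequence lines before counting is what gives the full chain length; counting only the first line would report 60 residues.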