April 5, 2006
Bio/CS-251
Creating Phylogenetic Trees With NCBI BLAST and ClustalW
Laboratory 9

Objective: This laboratory serves as an introduction to phylogenetic analysis, the scientific procedure that allows you to make informed hypotheses about the evolutionary history of a group of organisms or sequences. We will begin by identifying one gene and then look for related orthologs using BLAST. After choosing a group of orthologous sequences, we will use ClustalW to compute ‘distances’ between the sequences and, using a neighbor-joining technique, create a possible phylogenetic tree. This lab is also meant to encourage you to develop independence in your investigative techniques, in preparation for the larger project that will be assigned for the end of the course.

Laboratory manual: For this lab, you will refer to Bioinformatics for Dummies (BFD), pp. 382-397.

The target gene: In this investigation we will consider various homologs of the pyruvate kinase gene. As usual, we will begin with the E. coli pyruvate kinase gene.

Activity 1: Go to the PubMed portion of the ncbi.nlm.nih.gov web site, look up a reference on pyruvate kinase, and give a brief, one-sentence description of the function of this gene.

Assembling our cohort of genes: This will be the most time-consuming part of the lab. We will differ from the procedure followed in Lab 6 in how we accumulate our file of FASTA sequences. Go to www.ncbi.nlm.nih.gov/entrez and choose “gene” in the Search box. So that we all have the same E. coli pyruvate kinase, enter the accession number AAC74746 in the “Search for” box at the top of the page. When you arrive at the default view of the information about this gene, which contains the protein sequence, temporarily switch the view to FASTA to display the amino acid sequence in FASTA format. Open a WORD document and paste all lines of this display into the WORD document.
You will be pasting 16 more sequences into this document.

Activity 2: Erase the description line of the amino acid sequence and replace it with:

>E. coli

Press the Back button to return to your data page about the gene. Copy the protein sequence and proceed to www.ncbi.nlm.nih.gov/BLAST. Choose protein-to-protein BLAST and paste the sequence in the window. Go down to the options portion of the window and choose to display 1000 sequences and 500 alignments. Now BLAST away!

Activity 3: Complete the following table using the results from BLAST. HINT: To search these results, you can use the “Find” option under Edit in the browser's pull-down menu. Enter the first part of the Latin name. In a few cases you may have to use “Find Next”, i.e., repeat the search.

Species                     Type                   Accession #   E-value   %ID   % Similarity
Salmonella typhimurium      Bacterium
Yersinia pestis             Bacterium
Bacillus anthracis Ames*    Bacterium
Nostoc sp.                  Blue-green Alga
Glycine max                 Soybean
Solanum tuberosum           Potato
Aspergillus niger           Filamentous Fungus
Aspergillus nidulans        Filamentous Fungus
Agaricus bisporus           Mushroom Fungus
Mus musculus                Mouse
Rattus norvegicus           Rat
Homo sapiens                Human
Xenopus laevis              African Clawed Frog
Anopheles gambiae           Mosquito
Drosophila melanogaster     Fruit Fly
Gallus gallus               Chicken

* This is the strain of anthrax that infected individuals via the mail after 9/11.

Activity 4: Use either the accession number or the reference number in the listing of sequences in the BLAST report to go to the appropriate page for each of the above pyruvate kinase genes, and paste the FASTA sequence into the WORD document. Once again, remove the first line(s) and type in substitute lines such as >Frog, >Mouse, etc. For the bacteria, the alga, and some fungi, use the Latin names to distinguish them. ClustalW will pick up only the first word of each description, so if you want to label a sequence as Fruit Fly, write Fruit_Fly. The same applies to two-word Latin names: write Aspergillus_niger and Aspergillus_nidulans so that the two remain distinct, and consider changing >E. coli to >E_coli for the same reason.
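If you would rather relabel the sequences with a script than by hand, the first-word rule described above can be sketched in Python as follows. This is only an illustration: the function name relabel_fasta is hypothetical, and the headers and sequences in the sample are made-up placeholders, not real pyruvate kinase data.

```python
def relabel_fasta(fasta_text, labels):
    """Replace each FASTA description line with a short one-word label.

    Spaces inside a label become underscores, so a program that keeps
    only the first word of the description line still sees the whole label.
    Labels are used in the order in which the sequences appear.
    """
    remaining = iter(labels)
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # New sequence: discard the old description, write the next label.
            label = next(remaining).strip().replace(" ", "_")
            out.append(">" + label)
        elif line.strip():
            # Sequence line: keep it, dropping stray whitespace.
            out.append(line.strip())
    return "\n".join(out) + "\n"


# Made-up placeholder records (NOT real pyruvate kinase sequences).
sample = (
    ">placeholder description one\n"
    "MKKT\n"
    "AAGG\n"
    ">placeholder description two\n"
    "MSKP\n"
)

print(relabel_fasta(sample, ["Fruit Fly", "Aspergillus niger"]))
```

The script assumes you supply one label per sequence, in the same order as the sequences in the file; it stops with an error if there are fewer labels than sequences.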
Save the WORD document in your Lab 8 folder, but keep it handy for the next phase of the lab.

Now you are ready to use ClustalW to create a phylogenetic tree. The steps listed at the bottom of page 394 and on pages 395-397 of BFD are very explicit. Follow these steps. If you are unfamiliar with using ClustalW, refer to Chapter 9 in your manual and last week’s lab.

Activity 5: Save a copy of the multiple sequence alignment in your Lab 8 folder as a Web Page (Complete).

Activity 6: Save a copy of the ClustalW guide tree and also the phylogenetic tree by pasting them into a WORD document. For the phylogenetic tree, use the Phylogram view of the tree. This gives you a better feeling for distance from the root of the tree. To get a copy of the tree, center the part of the web page containing the tree in your browser window, then press the Alt and Print Screen keys simultaneously. This copies that window to the clipboard. Go to the WORD document and paste the image into the document. If you want to get fancier, you can eliminate the web page borders by first pasting the screen capture into Paint, cutting out just the tree, and pasting that into the WORD document.

Discussion 7: Analyzing your phylogenetic tree.

Answer the following questions about phylogenetic trees:
1. What is a node? What does it represent?
2. What does it mean to say that the branch length is scaled in your tree?
3. Is your tree rooted or unrooted? What does this mean?
4. From your phylogenetic tree, identify two clades, and list the leaves, or OTUs, that belong to each of the clades that you have chosen.

Answer the following questions about your tree:
5. Generally, how many different clade groups (groups of closely related sequences) appear to occur in this dataset? List the species in each clade group. Do these clade groups appear to make evolutionary sense?
6. Green plants versus fungi: Which group appears to be more closely related to animals (flies, mosquitoes, chickens, frogs, mammals)?
   How can you tell?
7. Do any of the sequences appear to represent an outgroup? What is meant by this term?
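Supplement: what a ‘distance’ between two aligned sequences can look like. The Objective mentions that ClustalW computes ‘distances’ between sequences before a tree is built. One simple measure is the fraction of aligned positions at which two sequences differ, often called the p-distance. The Python sketch below illustrates that idea only; ClustalW's own calculation may differ (for example, in how it treats gaps or corrects for multiple substitutions), and the function names and the toy alignment are assumptions made up for this example.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_table(aligned):
    """Map each pair of labels to the p-distance between their aligned sequences."""
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for x, y in combinations(aligned, 2)
    }


# Toy alignment with made-up labels and residues (not real data).
toy = {
    "Seq_A": "MKT-LLA",
    "Seq_B": "MKTALLA",
    "Seq_C": "MRTAL-G",
}

for pair, dist in distance_table(toy).items():
    print(pair, round(dist, 3))
```

A neighbor-joining method takes a table of pairwise distances like this one as its input and repeatedly joins the pair of sequences (or clusters) it judges to be nearest neighbors, which is why sequences separated by small distances tend to end up in the same clade.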