EXPERIMENT: PHYLOGENETIC ANALYSIS

Traditional phylogeny has been based on morphological comparison and anatomy:
- Presence/absence of true tissues
- Diploblastic/triploblastic (2 or 3 germ layers)
- Type of body symmetry
- Presence/absence of a body cavity (coelom)
- Body cavity enclosed/not enclosed in mesoderm
- Pattern of coelom development (acoelomate, pseudocoelomate or true coelomate)

Modern phylogeny is based on genetic data and DNA sequence comparison. Advances in DNA sequencing techniques have made large-scale sequencing practical and more affordable, allowing a huge accumulation of sequence data for any organism of interest. Sequences of genes that are highly conserved across all organisms are used in such studies. The sequences most used for phylogenetic and evolutionary studies have been those of ribosomal RNA (rRNA), which changes extremely slowly over time. It was the comparison of rRNA sequences from many organisms that led to a new classification into eubacteria, archaea and eukarya, and to a new, much more precise phylogenetic tree of life.

In this exercise we will determine the evolutionary relationships between the elongation factors EF-Tu and EF-1α of a number of organisms based on sequence homology, and construct a phylogenetic tree that shows the evolutionary distances between the various organisms.

GOAL: The goal of this exercise is to familiarize you with the basic bioinformatics tools, freely available on the internet, for searching, retrieving and analyzing DNA and protein sequences in public databases such as GenBank or Swiss-Prot. You will retrieve protein sequences from public databases, make sequence alignments, determine the % homology between the sequences and create a phylogenetic tree based on the alignment result. All of these steps are done on the computer and require no more than two hours.
As you become familiar with this type of analysis, the time required to construct a phylogenetic tree for any gene available in the public sequence databases will decrease.

2007 MIT Teacher Workshop

EXERCISE:
1) Retrieve the DNA or protein sequences from GenBank and save them in a Word file.
2) Align the protein sequences using ClustalW and find the conserved motifs and the pairwise homology.
3) Present a table of pairwise % homology between the various sequences.
4) Create a phylogenetic tree based on the results of the protein or DNA alignment.
5) Analyze the results.

1) Retrieve protein sequences from the public database
Go to the NCBI site: http://www.ncbi.nlm.nih.gov/
Click on PubMed at the top left and click on Go at the top right. Click on Protein in the black row. You will get to a page that looks like this:
[Screenshot: NCBI Protein search page]
You want to search the Protein database for elongation factor protein sequences (search box at the top of the screen). You will find over 1000 sequences for various organisms. You will retrieve the sequences of EF-Tu and EF-1α for specific organisms.

Type EF-Tu E. coli (or any protein of interest) in the blank space and click Go. Scroll down the page until you see the first mention of Elongation factor Tu (EF-Tu) (P0A6N1). Click on the accession number. Scroll to the very end of the page: you will see the amino acid sequence of the protein (in one-letter code).

This is not the right format for alignments; you need the FASTA format. At the top left of the screen you can select the display format you want. Select Display: FASTA and press Return.

You now have the protein sequence displayed in the capital one-letter amino acid code. Copy and paste the sequences into a Word document one after the other, including the header line at the beginning (>gi…..).

Select the FASTA report for each of the following (feel free to add more) and paste it into the Word document.

Two Eubacteria:
- E. coli (Gram-) P0A6N1
- Bacillus subtilis (Gram+) P33166

Two Archaebacteria:
- Methanocaldococcus jannaschii AAB98308
- Pyrobaculum calidifontis YP_001056002

Chloroplast:
- Pisum sativum (pea) CAA74893

Unicellular eukaryotes:
- Candida albicans XP_717581
- Tetrahymena XP_001032213

Many multicellular eukaryotes:
- Porifera: Ephydatia cooperensis (sponge) AAT06177
- Cnidaria: Hydra magnipapillata BAA11471
- Platyhelminthes: Girardia tigrina CAB89808, Dugesia japonica BAA08663
- Nematodes: C. elegans AAA81688
- Mollusks: Mytilus edulis (bivalve) AAD21859
- Annelids: Eunice yamamotoi BAA25733
- Arthropoda: Drosophila melanogaster NP_996316, Raphia abrupta (yellowmarked caterpillar) AAC47605, Australobius scabrior AAQ77068
- Echinodermata: Eucidaris tribuloides AAT06181
- Chordata (frog, zebrafish, chicken, cow, human):
  - Xenopus laevis (African clawed frog) CAA39027
  - Danio rerio (zebrafish) AAY85516
  - Pelodiscus sinensis (Chinese softshell turtle) AB124568.1
  - Gallus gallus (chicken) NP_989488
  - Bos taurus (cattle) BAB60846
  - Rattus norvegicus (Norway rat) AAI11708
  - Sus scrofa (pig) ABG65696
  - Canis lupus familiaris (dog) XP_850819
  - Homo sapiens (human) NP_001393

2) Alignment
Once you have retrieved all of the protein sequences and stored them in FASTA format in a Word document, you need to compare the sequences to each other by aligning them one under the other. To do that you need a program that can do sequence alignments, such as:
Multiple Sequence Alignment by CLUSTALW
Go to: http://align.genome.jp/
Paste the entire document with the various elongation factor sequences into the ClustalW window. The sequences must be in FASTA format, and each must begin with its description line. Make a multiple alignment of all the proteins. ClustalW lets you choose some features; you can also use the ClustalW help to learn about parameter settings and other options.

3) Table of % homology
ClustalW can calculate the pairwise homology, in %, between any two sequences.
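If you want to cross-check the ClustalW numbers, or see how a % table turns into a tree, the Python sketch below illustrates steps 3 and 4. It is not part of the original exercise and rests on several assumptions: (1) the multiple alignment has been saved as aligned FASTA text with "-" for gaps (check whether your ClustalW server offers a FASTA output option); (2) "% homology" is taken to mean % identity over alignment columns where neither sequence has a gap, so values may differ slightly from ClustalW's own pairwise scores, which come from separate pairwise alignments; (3) distance is simply 100 minus % identity; (4) the tree is built with UPGMA, a simpler method than the neighbor-joining method ClustalW uses for its trees, and UPGMA assumes a roughly constant rate of change across lineages. All function names are hypothetical, and the four sequences are made-up toy data, not real elongation factors.

```python
from itertools import combinations

# Hypothetical helpers for illustration; not part of ClustalW or any library.


def parse_fasta(text):
    """Parse FASTA text into {name: sequence}, keeping input order.

    The name is the first word of each '>' header line; sequence lines
    (which may contain '-' gap characters in an alignment) are joined.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def percent_identity(a, b):
    """% identical residues over alignment columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def identity_table(seqs):
    """Symmetric {(name1, name2): % identity} for every pair, 100 on the diagonal."""
    table = {(n, n): 100.0 for n in seqs}
    for p, q in combinations(seqs, 2):
        table[(p, q)] = table[(q, p)] = percent_identity(seqs[p], seqs[q])
    return table


def print_table(names, table):
    """Print the table with each sequence listed down the side and across the top."""
    print("".ljust(8) + "".join(n.ljust(8) for n in names))
    for r in names:
        print(r.ljust(8) + "".join(f"{table[(r, c)]:<8.1f}" for c in names))


def upgma(names, dist):
    """Build a tree topology (Newick string, no branch lengths) with UPGMA."""
    size = {n: 1 for n in names}  # cluster label -> number of leaves in it
    d = {tuple(sorted(p)): dist[p] for p in combinations(names, 2)}
    while len(size) > 1:
        # Merge the closest pair; ties are broken alphabetically for repeatability.
        (a, b), _ = min(d.items(), key=lambda kv: (kv[1], kv[0]))
        merged = f"({a},{b})"
        for k in size:
            if k not in (a, b):
                # New distance = size-weighted average of the two old distances.
                dak = d[tuple(sorted((a, k)))]
                dbk = d[tuple(sorted((b, k)))]
                d[tuple(sorted((merged, k)))] = (dak * size[a] + dbk * size[b]) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        # Drop every distance that involves one of the two merged clusters.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
    return next(iter(size)) + ";"


# Toy, made-up aligned sequences (NOT real EF-Tu / EF-1α data).
aligned = """
>seqA hypothetical
MKV-LAT
>seqB hypothetical
MKVQLAS
>seqC hypothetical
MRIQLGS
>seqD hypothetical
MRIQLGT
"""
seqs = parse_fasta(aligned)
names = list(seqs)
table = identity_table(seqs)
print_table(names, table)
dist = {pair: 100.0 - pct for pair, pct in table.items()}  # assumed distance measure
print(upgma(names, dist))
```

For the toy data the last line prints ((seqA,seqB),(seqC,seqD)); in Newick format, which tree-drawing programs that read Newick trees (such as those in Phylip) can display.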
You should get those numbers and present them as a table, with each sequence listed both down the side and across the top, and fill each entry with the % homology for that pair.

4) Phylogenetic tree
The ClustalW program can also generate a phylogenetic tree. You enter the protein alignment and get back the pairwise distances between the EF-Tu and EF-1α sequences. You then have to produce your own tree, either by hand or with a phylogenetic tree program. Other programs that can generate phylogenetic trees:
- http://www.genebee.msu.su/services/phtree_reduced.html (Phylogenetic tree)
- Phylip

5) DATA ANALYSIS:
What does the tree tell you about the evolution of prokaryotes, eukaryotes and Archaebacteria? Where do flatworms stand in the evolution of animals? What is the closest phylum to Platyhelminthes? The most divergent? Does the tree fit the current classification of animals?

APPENDIX A: LAB SET UP
At each bench:
- A dissecting microscope (for two people)
- A phase contrast compound microscope (for 4 people)
- One heat block with holes for large (1.5 ml) Eppendorf tubes at 37 ºC for everyone
- One heat block with holes for large (1.5 ml) Eppendorf tubes at 65 ºC for everyone
- Plastic Tupperware to hold planaria colonies
- One sleeve of small Petri dishes per team (to hold individual planaria and cut pieces)
- Scissors and Saran wrap (to put over ice)
- Gloves
- Ice bucket
- Microcentrifuge (one for 4 people, i.e. 2 teams)
- Sterile Pasteur pipettes (to spool DNA)
- 20 µl, 200 µl and 1000 µl Pipetmans (for DNA work)
- Sterile tips for P20, P200 and P1000 (for DNA work)
- Labeling tape (color coded for each bench)
- Marker (color coded for each bench)
- Tweezers (to move filter discs)
- One rack for microfuge tubes (color coded for each bench)
- One multitube rack
- Kimwipes
- 70% EtOH in squirt bottle
- Disposable plastic transfer pipettes (to suck up planaria and add/remove water)
- Razor blades (to cut planaria)
- Scintillation vials (to hold planaria in the process of regeneration)
- Timer
- A flask containing Poland Spring water
- One pipette aid
- 10 ml disposable pipettes
- 60 cc syringe (for conditioning experiment)
- Plastic cutting board with training trough
- Waste bins for tips and sharps
- Waste beakers for water
- 1.5 ml microfuge tubes

At the back of the room:
- Two gel boxes and power supplies for DNA agarose gels
- Film to take gel pictures, UV box and hand-held camera
- Sterile 250 ml and 1 liter flasks

Reagents:
- Sterile water
- Poland Spring water
- TBE for agarose gel, DNA sample buffer
- Phenol:chloroform:isoamyl alcohol (25:24:1)
- RNase A
- Proteinase K
- Ethidium bromide
- Agarose
- 1 kb MW markers
- 6X DNA loading dye
- Reagents for immunohistochemistry
- Reagents for whole mount and staining