Dangerous Ideas, Spring 2005
Name: _________________

Lab 5: Alignment and Phylogenetic Analysis of DNA Sequences

OBJECTIVES:
To understand how DNA can be used to study evolutionary history
To become familiar with the process of aligning sequences and constructing phylogenetic trees
To explore on-line resources including GenBank and BLAST

MATERIALS:
Access to a high-speed internet connection

INTRODUCTION:
Our collection of known DNA sequences has increased dramatically in the last few years due to recent advances in the field of molecular biology. The DNA sequence of an individual contains information that can be used in a wide variety of applications, from forensics to the study of evolution.

Evolutionary biologists view DNA as a “document” of evolutionary history. Comparing the DNA sequences of genes from different organisms can reveal evolutionary relationships that might not otherwise be inferred from their morphology. Since genomes acquire mutations gradually, the amount of sequence difference found between two organisms should tell us something about how recently these two organisms shared a common ancestor. In other words, two organisms that share a relatively recent common ancestor should have more similar DNA sequences than two organisms that diverged earlier.

Molecular phylogenetics is the field of study that attempts to determine the rates and/or patterns of change occurring in DNA (and other macromolecules) and to reconstruct the evolutionary history of genes and organisms. The evolutionary history revealed by the sequence data is frequently presented in a phylogenetic tree. Phylogenetic trees are branching diagrams depicting the evolutionary relationships of organisms. It is important to note that our current understanding of most evolutionary relationships comes from a variety of data, including both traditional morphological approaches and molecular data.

Researchers attempting to construct phylogenetic trees must go through a series of steps:

Step 1: Acquire the DNA sequences
DNA sequences may be determined directly, by sequencing a region of DNA, or indirectly, by acquiring the sequence from a public database or published source. (DNA sequencing will be discussed in lecture; we will use public databases in our exploration today.)

Step 2: Align the DNA sequences
Once accurate DNA sequences have been obtained, they must be properly aligned to reveal their evolutionary relationships. Consider the following example:

Organism 1: A T G G G C T G T C A A
Organism 2: A T G G G T G T C A A T

At first glance, organisms 1 and 2 appear to have dramatically different DNA sequences. In fact, they seem to share only 6 of the 12 bases being examined (50% sequence identity). Now examine these sequences properly aligned:

Organism 1: A T G G G C T G T C A A
Organism 2: A T G G G - T G T C A A

With a gap correctly inserted, it is now apparent that the two organisms share 11 of the 12 bases being examined (92% sequence identity). Correct alignment is difficult and is usually done with software such as CLUSTAL.

Step 3: Construct a Phylogenetic Tree
With the sequences correctly aligned, a phylogenetic tree can now be constructed. Consider the following aligned sequences:

Organism 1: A T G G G C T G T C A A
Organism 2: A T G G G - T G T C A A
Organism 3: A T G G G - T G T C A A
Organism 4: A T G G G C T G T C A A

These organisms seem to share some evolutionary history, as they all have similar DNA sequences. Organisms 2 and 3, however, are both “missing” the C at position 6. Their evolutionary relationships, as predicted by this data set, could be presented as:

[Tree diagram with leaves in the order 1, 4, 2, 3; organisms 2 and 3 are grouped together.]

As the DNA sequence under consideration gets longer and more complicated, so, too, does the process of constructing an appropriate tree. Again, most of this work is done by using one of several software packages.
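
The arithmetic in Steps 2 and 3 can be checked with a short script. The following is a minimal Python sketch, not the method used by CLUSTAL or any other alignment package: it assumes the sequences have already been aligned by hand (exactly as printed above), counts a gap as a mismatch, and uses illustrative function names (percent_identity, count_differences) that are not part of any library.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def count_differences(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ.

    Assumption for this sketch: a gap ('-') opposite a base counts as one difference.
    """
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Step 2: the same two sequences compared before and after inserting a gap.
unaligned_1 = "ATGGGCTGTCAA"
unaligned_2 = "ATGGGTGTCAAT"
aligned_1 = "ATGGGCTGTCAA"
aligned_2 = "ATGGG-TGTCAA"

identity_unaligned = percent_identity(unaligned_1, unaligned_2)
identity_aligned = percent_identity(aligned_1, aligned_2)

# Step 3: pairwise differences among the four aligned sequences.
aligned = {
    "Organism 1": "ATGGGCTGTCAA",
    "Organism 2": "ATGGG-TGTCAA",
    "Organism 3": "ATGGG-TGTCAA",
    "Organism 4": "ATGGGCTGTCAA",
}
differences = {
    (a, b): count_differences(aligned[a], aligned[b])
    for a, b in combinations(aligned, 2)
}

# Pairs with the fewest differences are the ones a tree would group together.
fewest = min(differences.values())
closest_pairs = sorted(pair for pair, d in differences.items() if d == fewest)

print(f"Unaligned: {identity_unaligned:.0f}% of positions match")
print(f"Aligned:   {identity_aligned:.0f}% of positions match")
for pair, d in differences.items():
    print(pair, d)
print("Closest pairs:", closest_pairs)
```

On this toy data set the closest pairs are organisms 1 and 4 and organisms 2 and 3, matching the grouping described in Step 3. Real tree-building software uses far more elaborate distance measures and clustering methods than this simple count.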

DNA SEQUENCE RESOURCES:

The National Center for Biotechnology Information (NCBI)
Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease. You can explore NCBI at http://www.ncbi.nlm.nih.gov.

Two especially useful services provided at the NCBI website are PubMed and BLAST. (Click the links in the upper header.) PubMed is a searchable database of published scientific papers in the fields of medicine and biotechnology. BLAST is a software program (actually a suite of algorithms) that allows one to search GenBank for similar sequences. This makes it possible to identify unknown sequences and to compare similar sequences.

GenBank
GenBank® is the National Institutes of Health’s (NIH) genetic sequence database, an annotated collection of all publicly available DNA sequences. As of August 2002, it contained approximately 22,617,000,000 bases in 18,197,000 sequence records. A new release is made every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis. In other words, this is a global, cooperative effort to share DNA sequence information as it is acquired. This is the database that BLAST searches to identify a sample sequence and/or find similar sequences.

YOUR EXPLORATION:
Today we will access several DNA sequences from a public database, align them, and construct a phylogenetic tree. The sequences we will analyze today are from human mitochondrial DNA.
(Remember that mitochondria contain their own DNA, and that this DNA is always maternal in origin.) Mitochondrial DNA has been extensively studied in an attempt to understand human evolution and prehistoric migratory patterns.

Some anthropologists have argued that people evolved at least partly from the Neanderthals. The opposing theory is that modern humans evolved in Africa, then spread outward, overwhelming earlier hominids including Neanderthals. The short, squat Neanderthals inhabited much of Europe from about 100,000 years ago until dying out about 28,000 years ago. Analyzing mitochondrial DNA has provided data with which to evaluate these two different hypotheses.

Acquiring Sequences:
To access the sequence information for this exercise, you will need to follow these steps:
1. Open an Internet browser and go to http://www.bioservers.org.
2. Go to the butler labeled “Sequence Server” and click the “Enter” button below it.
3. Click the “Manage Groups” button in the top center of your screen.
4. From the pull-down menu under “Sequence Sources”, select “Prehistoric Human mtDNA”.
5. Eight different entries will appear in your window. Note that you can view these sequences by clicking on the red “View” button next to each.
6. Select all eight sequences by clicking in the box to the left of each. Click on “OK” after all are selected.

Aligning the Sequences:
We will now ask the server to align all eight of our sequences using a program called Clustal.
1. Select all eight of your sequences by clicking in the box to the left of each. With all the sequences selected, click on the “Compare” box directly above.
2. You will now be shown an alignment. The yellow color indicates regions where the sequences do not all match. Scroll through the alignment and note the high levels of variation!

Constructing a Phylogenetic Tree:
1. Return to the previous screen by clicking on “Done”.
2. Be sure all eight of your sequences are selected. (The boxes to their left should be checked.)
3. Click on the pull-down menu that currently says “CLUSTAL W”. Select “Phylogenetic Tree” and click on the “Compare” button.
4. A window will open containing a phylogenetic tree based on the mtDNA sequences provided.

TO TURN IN:
Using the tree you just created, and the bioserver database, answer the questions on the following page.

Lab 5: EXPLORING PHYLOGENETICS
Names of Group Members:

1. What is the hypothesis being tested in this analysis? (Hint: There are two conflicting hypotheses; you’ll have to pick one!)

2. What do you predict you’ll see in the phylogenetic tree if your hypothesis is correct?

3. In the space below (or on a separate sheet), draw the tree generated from the mitochondrial sequences analyzed:

4. Does this tree support your hypothesis? Explain.

5. To further clarify your data, return to bioserver. Close the window containing your tree. Click on “Manage Groups” again to import another set of sequences. This time select “modern human mtDNA”. Both sets of sequences will now appear in your window. Select one or two of the modern sequences, together with the eight prehistoric sequences, and generate another phylogenetic tree. Draw this tree in the space below (or on a separate sheet).

6. Does this tree support your hypothesis? Explain.