HUMAN ANCESTRY AND MIGRATION USING YOUR MITOCHONDRIAL DNA

OBJECTIVES
- To learn how to isolate human genomic DNA
- To learn how to carry out enzymatic reactions such as PCR
- To learn how to design primers for PCR
- To understand the PCR reaction and its uses
- To learn how to use gel electrophoresis to analyze PCR reactions
- To understand how DNA is sequenced
- To learn how to “read” a DNA sequence
- To learn how to use software such as ClustalW to compare DNA sequences
- To understand what DNA haplotypes are and what they can be used for
- To learn how to analyze human mitochondrial control region polymorphisms
- To understand how single nucleotide polymorphisms can be used to build evolutionary or ancestral trees
- To learn how to use DNA sequences to study human migrations and human ancestry

Every human cell has a second genome, found in the cell’s energy-generating organelle, the mitochondrion. Each mitochondrion contains several copies of its own genome, and there are several hundred to several thousand mitochondria per cell, so the mitochondrial genome is present in very high copy number. The entire sequence of the human mitochondrial (mt) genome, all 16,569 nucleotides, was determined in 1981, well before the completion of the Human Genome Project.

The mt genome contains 37 genes, all of which are involved in the production of energy and its storage in ATP. Thirteen genes encode proteins involved in oxidative phosphorylation; these proteins are translated in the mitochondrion using mt-encoded transfer RNAs (22 genes) and ribosomal RNAs (2 genes). Mammalian mt genes use a variant genetic code in which UGA = tryptophan, AUA = methionine, and AGA and AGG = stop.

Genes take up most of the mt genome. However, a noncoding region of about 1,200 nucleotides spans either side of the arbitrary “0” position of the mt genome and goes by three potentially confusing names: control region, D-loop, and hypervariable region. “Control region” refers to the fact that this region contains the signals that control RNA and DNA synthesis.
A single promoter on each DNA strand initiates transcription in each direction, and a single origin initiates replication of each strand. “D-loop” refers to the early phase of replication, when the first newly synthesized strand displaces one of the parental strands, forming a bubble or loop. The DNA sequence of the control region is termed “hypervariable” because it accumulates point mutations at approximately 10 times the rate of nuclear DNA. This high mutation rate produces distinctive patterns of single nucleotide polymorphisms (SNPs), which are inherited maternally rather than in a Mendelian fashion. The control region is relatively tolerant of the high mutation rate because the binding sites for DNA and RNA polymerases are defined by only short nucleotide sequences.

The high mutation rate of mt DNA is probably due to two factors. First, the mt genome is replicated by its own DNA polymerase (polymerase gamma), and mitochondria lack some of the DNA repair pathways that correct mutations in nuclear DNA. Second, the mt genome is located close to the oxygen free radicals that are generated as byproducts of respiration. These reactive oxygen molecules damage DNA, for example by attacking the carbon-hydrogen (C-H) bonds in deoxyribose. DNA damage by oxygen free radicals suggests that mitochondrial function may degrade at an accelerating rate over time: accumulating mutations in the genes encoding components of the electron transport chain (subunits of NADH dehydrogenase, cytochrome b, and cytochrome c oxidase) decrease the efficiency of electron transfer, which in turn leads to higher production of superoxide and hydroxyl free radicals. Mutations in ribosomal and transfer RNA genes lead to inefficient or faulty translation of the proteins encoded by the mt genome. There is growing evidence for this “mitochondrial theory of aging.”
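The variant mitochondrial genetic code mentioned at the start of this section can be made concrete with a short Python sketch. It builds the standard codon table from the conventional UCAG ordering, then overlays the three mammalian mitochondrial reassignments listed in the text (UGA = Trp, AUA = Met, AGA/AGG = stop) and translates the same RNA string with both tables. The function name and the example RNA are hypothetical, chosen only for illustration.

```python
from itertools import product

# Standard genetic code in the conventional UCAG order:
# first base varies slowest, third base fastest.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reassignments in the mammalian mitochondrial code, as listed in the text.
MITO_CHANGES = {"UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}
MAMMALIAN_MITO = {**STANDARD, **MITO_CHANGES}


def translate(rna: str, table: dict[str, str]) -> str:
    """Translate RNA codon by codon, stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = table[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


example = "AUGAUAUGAAGA"  # codons: AUG AUA UGA AGA
print(translate(example, STANDARD))        # MI  (UGA read as stop)
print(translate(example, MAMMALIAN_MITO))  # MMW (AGA read as stop)
```

The same four codons give a different protein under each table, which is why mitochondrial genes must be translated with the mitochondrial code rather than the standard one.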
Since 1988, mutations in mitochondrial genes have been implicated in a number of degenerative diseases, including Alzheimer disease, mitochondrial myopathy, Kearns-Sayre syndrome, CPEO (chronic progressive external ophthalmoplegia), Leigh syndrome, Pearson syndrome, dystonia, and diabetes. Not surprisingly, most of these diseases affect organs and tissues with a high demand for energy. Mt mutations also accumulate in tumor cells and, because they are so highly amplified, can be detected in bodily fluids.

In the 1980s, Alan Wilson and coworkers at the University of California at Berkeley used mt DNA polymorphisms to create a human family tree showing ancestral relationships between modern populations. Reasoning that all human populations arose from a common ancestor in the distant evolutionary past, Wilson’s group calculated how long it would take to accumulate the pattern of mutations observed in modern populations. They concluded that the ancestor of all modern humans arose in Africa about 150,000 years ago. This common ancestor was widely reported as the “mitochondrial Eve.” This confusing simplification (which appeared to leave out Adam) is due to the peculiar inheritance of mt DNA.

Mitochondria are inherited almost exclusively from the mother. Although a sperm cell has many mitochondria, they are located in the midpiece at the base of the flagellum, and those that enter the egg at fertilization are normally destroyed. The sperm contributes the male pronucleus (with 23 chromosomes); the egg contributes 23 nuclear chromosomes plus essentially all of the cytoplasm and organelles of the zygote. Thus hundreds of maternal mitochondria are passed on to each daughter cell when the cytoplasm divides at the end of each mitotic division. There is some evidence for paternal transmission of mitochondrial DNA, although any such contribution appears to be small.

Mt DNA polymorphisms are also used in forensic biology and are especially important in cases where the tissue samples are very old, very small, or badly degraded.
Whereas there are only two copies of each chromosomal DNA sequence, there are hundreds to thousands of copies of a mt DNA sequence in each cell. Because of this high copy number, it is possible to obtain a mt DNA type from the equivalent of one cell’s worth of mt DNA. For example, control region polymorphisms have been used to:
- Identify the remains of the Unknown Soldier killed in the Vietnam War.
- Identify the remains of the Romanov royal family killed in the Russian Revolution.
- Determine the relationship of Neandertal remains (about 30,000 years old) to modern humans.

This experiment examines a 460-nucleotide sequence within the mt control region. Because of the large number of mt DNA molecules in any cell sample, this is one of the simplest human DNA sequences to visualize by PCR amplification.

PERIOD 1: DNA ISOLATION AND PCR

During this lab period you will isolate total DNA from your cheek cells and use PCR to amplify the D-loop region.

MATERIALS
- Saline solution (sterile, 0.9% NaCl) in bottle with 10 mL aliquot dispenser
- 1% SDS
- 3 M potassium acetate
- Isopropanol
- Sterile water
- Paper cup (1 per student)
- Plastic test tube (1 per student; 15 mL plastic disposable screw-top Falcon tube)
- 1.5 mL tubes (2 per student)
- 100-1000 uL micropipette and tips
- 1-10 uL micropipette and tips
- Clinical centrifuges
- Vortexes
- Microcentrifuges
- Boiling water bath
- 65 C water bath with 1.5 mL tube float or rack
- Ready-To-Go PCR Beads (1 tube per student)
- Primer/loading buffer mix (each PCR/student needs 22.5 uL; supply sterile stocks):

    Final concentration       Stock solution   For 1 rxn   For 26 rxn
    0.5 uM 5’ mtDNA primer    25 uM            0.5 uL      13 uL
    0.5 uM 3’ mtDNA primer    25 uM            0.5 uL      13 uL
    13.9% Sucrose             20%              15.6 uL     405.6 uL
    0.0082% Cresol red        0.1%             1.4 uL      36.4 uL
    10 mM Tris pH 8.0         100 mM           2.25 uL     58.5 uL
    0.1 mM EDTA               1 mM             2.25 uL     58.5 uL

- Thermocycler

PROCEDURE: ISOLATION OF YOUR DNA
1. Obtain 10 mL of saline solution in a paper cup.
2. Pour the saline solution into your mouth, swish vigorously for 10 seconds, then expel it back into the paper cup.
3. Pour your sample into a labeled plastic test tube and, with the tubes in a balanced configuration, spin it in a clinical centrifuge for 10 min at high speed. Save your cup for later use.
4. Pour the supernatant back into the paper cup, being careful not to disturb the cell pellet at the bottom of the test tube.
5. Take the paper cup to the bathroom and pour its contents into the toilet. Place the cup in the bathroom trash bin.
6. Use a pipette to transfer 350 uL of 1% SDS to the tube containing your cell pellet.
7. Mix the cells and 1% SDS by vortexing for 15 seconds. Incubate for 5 min at room temperature to lyse the cells. The cell lysate should become viscous.
8. Transfer 300 uL of the cell suspension to a 1.5 mL tube containing 100 uL of 3 M potassium acetate (you will have to add the potassium acetate using a micropipette). Mix by inverting the tube several times. A precipitate should form.
9. Place your tube, along with other students’ tubes, in a balanced configuration in a microcentrifuge and spin for 2 min.
10. Use a clean tip on a micropipette to transfer 300 uL of the supernatant to a new 1.5 mL tube containing 300 uL of isopropanol (you will have to add the isopropanol). Mix by rapidly inverting the tube several times.
11. Let stand at room temperature for 2 min to precipitate the DNA. A small fluff of DNA should be visible in the tube once the DNA has precipitated.
12. Place the tube in a microcentrifuge and spin for 3 min to pellet the DNA. The pellet should appear as a tiny, teardrop-shaped smear at the bottom of the tube. Do not be concerned if you see no pellet.
13. Pour off the supernatant completely, and air dry the tube for 4 minutes.
14. Resuspend the DNA in 100 uL of sterile distilled water. Incubate in a 65 C water bath for 5 min to speed up resuspension. Vortex, or mix vigorously by flicking with a finger, for 15 seconds.
15. Keep your sample on ice from now on, and store it in the freezer until next week if needed.

PROCEDURE: AMPLIFICATION OF YOUR MITOCHONDRIAL DNA
1. Choose one PCR tube and record its number. Do not forget your number: no one else will know the identity of your tube, and your DNA sample is anonymous from this point on. Your instructor has the PCR tubes on ice, eight tubes to a strip. Each tube holds a small bead containing some of the components required for PCR, including the DNA polymerase, magnesium chloride, and deoxynucleotide triphosphates. The strips of tubes will be labeled only on the sides of the first and last tubes.
2. Use a micropipette to add 22.5 uL of the primer/loading buffer mix to your PCR tube containing a Ready-To-Go PCR Bead. Tap the tube with your finger to dissolve the bead.
   a. The primers in the primer/loading mix are:
      5’ mt D-loop primer: 5’-ttaactccaccattagcacc-3’
      3’ mt D-loop primer: 5’-gaggatggtggtcaagggac-3’
3. Use a fresh tip to add 2.5 uL of your DNA to your PCR tube. Pool the reagents at the bottom of the tube by sharply tapping the tube bottom on the lab bench or by pulse-spinning in a microcentrifuge.
4. Place your PCR tube in the thermocycler and begin the amplification. The thermocycler will run 30 cycles, each consisting of 30 sec at 94 C, 30 sec at 58 C, and 30 sec at 72 C.
5. After amplification, your samples will be placed in the freezer.

PERIOD 2: GEL ELECTROPHORESIS

In this exercise you will carry out agarose gel electrophoresis and examine your PCR results to verify that a product was made before sequencing. Only amplifications that produced a single sharp band of the expected size can be used in sequencing reactions.
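Deciding whether a band has “the expected size” means reading its length off the marker lane. Over the useful range of a gel, the logarithm of fragment length falls roughly linearly with migration distance, so a common approach is to fit a straight line to the ladder and interpolate. The Python sketch below does that with an ordinary least-squares fit. The ladder sizes and distances are made-up numbers for illustration, not the markers supplied in this lab, and the straight-line relation is only an approximation that breaks down for very large or very small fragments.

```python
import math


def fit_log_size(distances_mm: list[float], sizes_bp: list[float]) -> tuple[float, float]:
    """Least-squares fit of log10(size) = slope * distance + intercept."""
    n = len(distances_mm)
    logs = [math.log10(size) for size in sizes_bp]
    mean_x = sum(distances_mm) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_mm, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(distance_mm: float, slope: float, intercept: float) -> float:
    """Convert a band's migration distance (mm) into an estimated length in bp."""
    return 10 ** (slope * distance_mm + intercept)


# Hypothetical ladder: fragment sizes (bp) and how far each band migrated (mm).
ladder_sizes = [800, 400, 200, 100]
ladder_distances = [20.0, 30.0, 40.0, 50.0]

slope, intercept = fit_log_size(ladder_distances, ladder_sizes)
band = estimate_size(28.0, slope, intercept)  # a hypothetical PCR band at 28 mm
print(f"Estimated PCR product: {band:.0f} bp")  # about 459 bp with these made-up numbers
```

Because size is on a log scale, a fixed distance on the gel corresponds to a larger difference in base pairs for long fragments than for short ones, which is worth keeping in mind when judging whether two bands really differ in length.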
MATERIALS
- Agarose (0.4 g pre-weighed; 3 per class)
- 1X TAE buffer
- 25 mL graduated cylinder
- Microwave
- Gel boxes and combs
- EtBr solution (1 mg/L) in tray for gel staining
- Water in tray for gel destaining
- Rapid agarose gel electrophoresis rigs and power supplies (3 per class)
- 10-well combs (3 per class)
- DNA size markers near 400 bp in loading buffer, in aliquots
- 10 uL micropipettors and tips
- 1.5 mL microcentrifuge tubes and racks
- Latex or non-latex gloves
- Transilluminator
- Ethidium-stained gel waste container

PROCEDURE: VERIFICATION OF PCR PRODUCT BEFORE SEQUENCING
1. Three students will prepare agarose gels for the class. Add 20 mL 1X TAE to the agarose, swirl, and cover with a plastic beaker. Microwave carefully, just until the agarose is completely melted. Let cool 5 min, then pour into the casting tray. Add the 10-well comb. Let the gel cool and set.
2. Obtain and thaw your PCR sample, and keep it in the instructor’s ice bucket, but do not separate your tube from the strip. You will use only a small portion of the PCR for gel electrophoresis to see whether your reaction was successful. The remainder of only the successful reactions will be used for sequencing.
3. After the gel has solidified, assemble the electrophoresis station as demonstrated by your instructor. Pour 250 mL of 1X TAE in the bottom, place the gel inside, and add 250 mL of cooling water. Once the cooling water has been poured over the gels, load your samples quickly, as the TAE buffer will eventually start to leach out of the gel.
4. Eight student PCRs will be loaded on each gel. Skip lanes 1 and 10, which will be used for the ladder and the negative control. Load 5 uL of your sample into one lane of a gel. Use a separate pipette tip for each sample, and load carefully to avoid air bubbles.
5. After all 8 student samples are loaded, load 5 uL of the ladder in lane 1 and 5 uL of the negative control in lane 10.
6. Draw a map of the gel in your notebook with the lanes labeled.
7. Place the cover on the gel box, connect the electrophoresis station to a power supply, and run the electrophoresis at 210 V for 9 min or until the dye has moved about 2/3 of the way down the gel. Do not run the dye off the gel!
8. Stain the gel for 3 min (no longer) in 1 mg/L ethidium bromide solution, then either observe it directly or destain in water for 10 min before observing. Use gloved hands to handle the gel during staining and at all times afterwards.
9. Place the gel tray on the transilluminator and view the results. Create a figure of your gel in your notebook, complete with a descriptive figure title.
10. Wearing gloves, dispose of the gel in the ethidium waste.
11. During the next few days, your PCR product or those of some of your classmates will be sequenced using the dideoxy chain-termination method, and the sequences will be provided to you next period.
   a. For an animated tutorial on dideoxy sequencing, visit www.dnalc.org and click on the box in the upper right called DNA. On the DNA Interactive page, click on “manipulation” in the blue bar across the top. On the Manipulation page, click on “techniques”, and after the new page opens click on “sorting and sequencing” near the top of the page. Finally, view both the “Early DNA sequencing” and the “Cycle sequencing” animations.
12. Determine your own maternal lineage as far back as possible before the next lab. This will allow you to test hypotheses about your ancestry with your sequence data.

COMPLETE / ANSWER THE FOLLOWING
1. On your gel figure:
   a. Label the sizes of the DNA size markers.
   b. Identify and label the band(s) likely to result from amplification of a portion of your mtDNA. Why did you select this band? What feature of PCR amplification allows you to pick this band with confidence?
2. Are the bands from other students similar in size to yours or different?
3. Approximately how large a size difference, in base pairs, would two hypothetical bands on your gel need to have before you could say with some confidence that they are different lengths?
4. We will be sequencing the PCR products displayed in your gel and will therefore be able to determine exactly how long the products are. Would you predict differences in length among the different samples? Why or why not?
5. Would you predict differences in the actual sequence? Why or why not?
6. The high mutability of the mitochondrial genome means that it evolves more quickly than the nuclear genome. This makes the mitochondrial D-loop region a natural laboratory for the study of DNA evolution. However, can you think of any drawbacks to this high mutation rate?
7. There are numerous insertions of mitochondrial DNA into nuclear chromosomes. Notably, scientists recently discovered a 540 bp fragment of the mitochondrial D-loop region that inserted into chromosome 11 approximately 350,000 years ago. Would you expect any difference in the mutation rate of the D-loop sequence in the mitochondrial genome versus the chromosome 11 insertion? What implication does this have for the study of human evolution?
8. Answer the following questions using the 40 bases of sequence information for 6 students:

           1                                        40
Student 1  GCGCGGGGGC ATTATCGTAA ATGCGCGCGA CTGAATTTTC
Student 2  GCGCGGGGGC ATTATCGTAA ATGCGCGCGA CTGAATTTTT
Student 3  GCGCGGGGGC ATAATCGTTA ATGCGCGCGC CTGAATTTTA
Student 4  GCGCGGGGGC ATAATCGTTA ATGCGCGCGC CTGAATTTTG
Student 5  GCCCGGGCGC ATCATCGTCA ATTCGCGCTA CTGAATTTTC
Student 6  GCCCGGGCGC ATCATCGTCA ATTCGCGCTA CTGAATTTTT

   a. A position in the sequence where the identity of the base varies between individuals is called a single nucleotide polymorphism, or SNP (“snip”). How many SNPs are present in the data set above?
   b. What are the base positions of the SNPs? Use the scale at the top of the sequence data. There are 40 bases to a line, arranged in blocks of 10 to make counting easy.
   c. Use the SNPs to put the students into groups. Students who share the same base at a SNP are considered more closely related to each other than to students who have a different base at that SNP. Put students with the most similar SNP patterns in common groups. How many groups do you have? Which students are in each group?
   d. Use your SNP data to devise a way to quantify relatedness between groups. Which two groups are most similar?
   e. Which two groups are most dissimilar?
   f. How much greater is the difference between the most distant pair than between the closest pair?
   g. Draw a branching diagram (dendrogram, tree-of-life style diagram, etc.) to illustrate relatedness among the 6 students.
9. Answer the following questions using the 40 bases of sequence information for 6 students:

           1                                        40
Student 1  CCGCGGGGGC ATTATCGTAA ATGCGCGCGA CTGAATTTTC
Student 2  GCGCGGGGGT ATTATCGTAA ATGCGCGCGA CTGAATTTTC
Student 3  GCGCGGGGGC TTTATCGTAA ATGCGCGCGC CTGAATTTTC
Student 4  GCGCGGGGGC ATTATCGTAC ATGCGCGCGC CTGAATTTTC
Student 5  GCGCGGGGGC ATTATCGTAA TTGCGCGCGA CTGAATTTTC
Student 6  GCGCGGGGGC ATTATCGTAA ATGCGCGCGG CTGAATTTTC

   a. How many SNPs are present in the data set above?
   b. What are their positions?
   c. Put the students with common SNPs into groups. How many groups do you have?
   d. Were all the SNPs in the data informative in creating your groups? Why or why not?

PERIOD 3: HUMAN MITOCHONDRIAL D-LOOP SEQUENCE ANALYSIS

In these exercises you will analyze human mtDNA D-loop sequences for single nucleotide polymorphisms and use these data to investigate the evolutionary origins of modern humans. You will then use similar methods to carry out your own student-designed investigation using student mitochondrial D-loop sequence data.
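The SNP counts in this period follow a simple rule, stated in step 8 of Exercise 1: positions where either sequence has an N are not counted, but dashes (deletions) are. A minimal Python sketch of that rule is shown below, assuming the two sequences are already aligned to the same length, as they are in ClustalW output. The function name and the short example sequences are hypothetical and are not real student data.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[int]:
    """Return 1-based positions where two aligned sequences differ.

    Positions where either sequence has an N (undetermined base) are skipped;
    a dash (-) is treated like a base, so deletions count as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    positions = []
    for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1):
        if "N" in (a, b):
            continue  # undetermined base: not evidence of a real difference
        if a != b:
            positions.append(pos)
    return positions


# Hypothetical aligned fragments (not real student data).
student_x = "ACGTTAGNCA-T"
student_y = "ACATTAGCCAGT"
print(snp_positions(student_x, student_y))       # [3, 11]
print(len(snp_positions(student_x, student_y)))  # 2
```

Position 8 is skipped because one sequence has an N there, while position 11 is counted because a deletion is a real difference.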
MATERIALS
- Human mtDNA sequences
- Computer room

EXERCISE 1: HUMAN ANCESTRY
Since their discovery in the Neander Valley of Germany in 1856, the heavy-set bones of Neandertal have fascinated scientists as well as the general public. Neandertal was an archaic member of the genus Homo that lived in Europe beginning about 300,000 years ago and became extinct about 30,000 years ago. Clearly, during part of its time on earth, Neandertal shared its European habitat with modern humans (Homo sapiens). There has long been controversy about whether Neandertal was a direct ancestor of modern humans. Alternatively, if Neandertal and Homo sapiens were separate, was there any significant exchange of genes between the two populations?

According to the multiregional model, modern humans developed concurrently from several different archaic populations living in different parts of the world. Under this model, Neandertal was the ancestor of modern Europeans, while Java man (Homo erectus) was the ancestor of modern Asians. According to the displacement model, better known as “Out of Africa,” Homo sapiens arose from a single founding population that emerged from Africa in the last 100,000 to 200,000 years. This group migrated successively to Europe and Asia, displacing archaic hominids.

In 1997, an international research team headed by Svante Pääbo extracted DNA from the humerus of the original Neandertal specimen, amplified the sample by PCR, and cloned the resulting products in E. coli. The cloned fragments were then used to reconstruct a 379 bp stretch of the mitochondrial D-loop. You will now use the DNA Sequence Server at the DNA Learning Center website (http://www.bioservers.org/bioserver/) to recreate this study, and answer the questions that follow to learn methods that will be helpful in completing your student investigation.

PROCEDURE
1. Open a browser, go to http://www.bioservers.org/bioserver/, and enter the “Sequence Server” workspace.
2. Click “Manage Groups” to open a new window in which you can view the mitochondrial (mt) D-loop region sequences currently in the DNA Sequence Server database.
3. In the new window, use the pull-down menu in the upper right-hand corner under “Sequence sources” to select and view a list of groups of mt D-loop sequences. Select “Public” under “Sequence sources”, then scroll to find the CSU Chico Biology 6a file. When you have found this file, check the box to its left and click “OK”. This will bring the entire set of CSU Chico Biology 6a sequence files into your workspace.
4. The CSU Chico Biology 6a files should now be present in your workspace. Use the scroll button to locate your sequence and leave it selected in the window. Beneath this window is a second window with “none” selected in it. Use its scroll button to highlight another student’s sequence.
5. Back in the Sequence Server workspace, visually gauge the quality of the sequences as follows:
   a. Click the “View” button to the right of the name of the sequence you wish to check. Every sequence will begin with nucleotides (A, T, C, G) interspersed with Ns, which indicate positions where the nucleotide could not be determined. In “good” sequences, where experimental conditions were near optimal, the Ns at the beginning of the sequence end abruptly, and the remaining sequence has very few, if any, internal Ns. Then, at the end of the “read”, the sequence abruptly changes to an uninterrupted string of Ns. In non-optimal cases, many Ns are interspersed throughout the sequence. When possible, use good sequences in all subsequent comparisons.
6. Back in the Sequence Server workspace, check the two boxes to the left of the two student sequences. Set the pull-down menu next to the “Compare” button on the upper left of the screen to “ClustalW”, then click “Compare”. A new window should open displaying a multiple sequence comparison.
7. Click the “trimmed” radio button and then click “redraw”.
8. In the new window, count the number of SNPs between the two CSU Chico sequences. A yellow box indicates a position with a nucleotide difference. A gray box indicates a position with an “N”, where the nucleotide could not be determined. Do not count any internal Ns. Do count dashes (-), which indicate a deletion, that is, a nucleotide that is absent at that position in the sequence.
9. How many SNPs exist between the CSU Chico students? When finished, click “done” at the bottom of this window. The CSU Chico Biology 6a files should still be present in your workspace. Use the scroll button of the “none” window to highlight a third student sequence.
10. Click “Manage Groups” again to return to “Sequence sources”. Use the pull-down menu in the upper right-hand corner to select the “Prehistoric human mtDNA” sequences, check the boxes next to all three “Neandertal” sequences, then click “OK” to import these sequences into your workspace.
11. Back in the Sequence Server workspace, check the boxes to the left of the three student sequences and the three Neandertal sequences. Set the pull-down menu next to the “Compare” button on the upper left of the screen to “ClustalW”, then click “Compare”. A new window should open displaying a multiple sequence comparison.
12. Click the “trimmed” radio button and then click “redraw”.
13. In the new window, count the number of SNPs between the group of Neandertal sequences and the group of CSU Chico sequences. As before, a yellow box indicates a position with a nucleotide difference, and a gray box indicates a position with an “N”, where the nucleotide could not be determined. Do not count any internal Ns, but do count dashes (-), which indicate a deletion. This time, also do not count any SNPs that are unique to a single individual; count only those SNPs that are useful in distinguishing the students as a group from the Neandertals.
14. How many SNPs exist between the Neandertal group and the CSU Chico group? When finished, click “done” at the bottom of this window.
15. Now import 3 chimpanzee sequences into the workspace. Click “Manage Groups” again, then under “Sequence sources” scroll to and select “Non-human mtDNA”, check the box to the left of the “Primate mtDNA” sequences, and click “OK”. In the workspace, find the “Primate mtDNA” file and use the scrolling windows to bring in three chimpanzee sequences.
16. Back in the Sequence Server workspace, check the boxes to the left of the three student sequences, the three Neandertal sequences, and the three chimpanzee sequences. Set the pull-down menu next to the “Compare” button on the upper left of the screen to “ClustalW”, then click “Compare”. A new window should open displaying a multiple sequence comparison.
17. Click the “trimmed” radio button and then click “redraw”.
18. Count the number of SNPs between the group of chimpanzee sequences and the group of CSU Chico sequences.
19. How many SNPs exist between the chimpanzee group and the CSU Chico group? When finished, click “done” at the bottom of this window.
20. Based on your data, which group, Neandertals or chimpanzees, are CSU Chico students more similar to?

EXERCISE 2: HAPLOTYPE ANALYSIS
A problem with individual SNPs is that, because there are only four bases, the same mutation can occur independently in different lineages. This can cause confusion when comparing sequences, so it is better to use multiple SNPs to form groups, as you did in the earlier assignment.
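Step 13 of Exercise 1 asks you to count only SNPs that separate one group from another and to ignore differences unique to a single individual. One way to express that rule, sketched below under the assumption that all sequences are already aligned to the same length, is to keep a position only when every determined (non-N) base in the first group is identical, every determined base in the second group is identical, and the two shared bases differ. The function name and the example sequences are hypothetical; stricter or looser definitions are possible.

```python
def fixed_differences(group_1: list[str], group_2: list[str]) -> list[int]:
    """Return 1-based positions where two groups of aligned sequences differ consistently.

    A position qualifies only if all non-N bases in group_1 are identical, all
    non-N bases in group_2 are identical, and the two shared bases differ.
    Dashes (-) count as bases, so shared deletions are included.
    """
    length = len(group_1[0])
    if any(len(s) != length for s in group_1 + group_2):
        raise ValueError("all sequences must be aligned to the same length")
    positions = []
    for i in range(length):
        bases_1 = {s[i].upper() for s in group_1} - {"N"}
        bases_2 = {s[i].upper() for s in group_2} - {"N"}
        if len(bases_1) == 1 and len(bases_2) == 1 and bases_1 != bases_2:
            positions.append(i + 1)
    return positions


# Hypothetical aligned sequences: three "students" and two "outgroup" sequences.
students = ["ACGTACGT", "ACGTACGA", "ACGNACGT"]
outgroup = ["ATGTACCT", "ATGTTCCT"]
print(fixed_differences(students, outgroup))  # [2, 7]
```

Positions 5 and 8 are excluded because each varies within one group (a difference unique to a single individual), so they do not help separate the groups.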
The term for a group of sequences that all share the same specific SNPs is “haplotype.” When describing haplotypes, you usually leave out all of the bases that are the same in every sequence and show only the positions where there are SNPs. Thus, if an alignment of two sequences gave this result,

                 1   5   10   15
    Sequence 1   AACGTTTGGCTGGGATCC
    Sequence 2   AACATTCGGCTTGGATCC

then the haplotypes would be:

                  4  7 12
    Haplotype 1   G  T  G
    Haplotype 2   A  C  T

In the haplotypes above, only the fourth, seventh, and twelfth positions are shown, because they are the only positions that differ.

Researchers have defined several different haplogroups for human mitochondria (a haplogroup is a collection of closely related haplotypes) and use these to follow human migrations and evolution. Because mitochondria are inherited only from the mother, mitochondrial haplotypes trace the maternal lineage only. Barring a rare mutation, you should have the same mitochondrial haplotype as your mother, and as her mother, and so on. Over time, mutations arise in the mitochondrial DNA of different women and new variant haplotypes appear. At the same time, haplotypes are lost when all of the women carrying a particular haplotype fail to have any daughters.

Knowing the mutation rate of mitochondrial DNA and the current diversity of mitochondrial sequences, it is possible to infer the haplotype of the last common female ancestor of a particular haplogroup and to estimate how long ago she lived. This can also be done for the last common ancestor of all of the haplogroups; this woman is referred to as the “Mitochondrial Eve.” Current analyses suggest that she lived in Africa approximately 150,000 years ago. Although there were almost certainly other women alive at that time, all of their mitochondrial lineages have since been lost, so we are all her direct maternal-line descendants.
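The haplotype example above can be reproduced mechanically: find the columns of an alignment where not all sequences agree, then report each sequence’s bases at only those columns. The Python sketch below does this for the two example sequences from the text. The function name is hypothetical, and the sketch assumes the sequences are already aligned to equal length and contain no Ns (an N would otherwise be reported as a variable position).

```python
def haplotypes(aligned: dict[str, str]) -> tuple[list[int], dict[str, str]]:
    """Return the variable (SNP) positions and each sequence's bases at those positions.

    Positions are 1-based. The sequences are assumed to be aligned to equal length.
    """
    seqs = list(aligned.values())
    variable = [
        pos for pos, column in enumerate(zip(*seqs), start=1)
        if len(set(column)) > 1  # not every sequence has the same base here
    ]
    calls = {
        name: "".join(seq[pos - 1] for pos in variable)
        for name, seq in aligned.items()
    }
    return variable, calls


# The two-sequence example from the text.
example = {
    "Sequence 1": "AACGTTTGGCTGGGATCC",
    "Sequence 2": "AACATTCGGCTTGGATCC",
}
positions, calls = haplotypes(example)
print(positions)  # [4, 7, 12]
print(calls)      # {'Sequence 1': 'GTG', 'Sequence 2': 'ACT'}
```

With more sequences, identical strings in `calls` identify members of the same haplotype.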
Over time, as humans migrated to different parts of the world, new mutations appeared, creating various haplogroups. Each haplogroup is given a letter designation, such as A, L, or X. Related haplogroups may also get a numerical designation, such as U1, U2, etc. Studies of the native people of a region can reveal the haplogroups that evolved there, although frequent migrations and mixing of people can make this difficult. For instance, seven haplogroups are known to have originated in Europe, yet these haplotypes can now be found anywhere in the world.

Below is a map of the current distribution and presumed migrations of some of the more common haplogroups.
Figure from http://www.familytreedna.com/default.asp

Within a particular region, the frequencies of the different haplogroups can vary greatly; for instance, haplogroup H makes up 47% of all the haplotypes found in Europe, while K makes up only 6%. In the next exercise, you will determine your haplotype and compare it to the known haplotypes and to the class haplotypes.

PROCEDURE
1. Click “Manage Groups” to open a new window in which you can view the mitochondrial (mt) D-loop region sequences currently in the DNA Sequence Server database.
2. In the new window, use the pull-down menu in the upper right-hand corner under “Sequence sources” to select and view a list of groups of mt D-loop sequences. Select “Public” under “Sequence sources”, then scroll to find the “Human Haplotype” file. When you have found this file, check the box to its left and click “OK”. This will bring a set of mitochondrial sequences of known haplotype into your workspace. You should still have the set of CSU Chico Biology 6a sequence files in your workspace; if not, use the same procedure to add them.
3. Select your sequence and several of the known haplotypes as above.
   Use the map above to pick the ones most likely to be similar to yours, but be sure to also include haplotype H, because all of the standard haplotype descriptions are based on differences from that haplotype.
4. After checking the boxes next to the sequences you will use, set the pull-down menu next to the “Compare” button on the upper left of the screen to “ClustalW”, then click “Compare”. A new window should open displaying a multiple sequence comparison.
5. Do not trim the sequences: you need to find the correct locations in the sequence, so you must use the haplotype sequences from the beginning. Check all of the SNP positions in the table below and fill in the bottom row with the bases in your sequence at each of these locations.

Mitochondrial hypervariable region I SNP locations and sequences, numbered from base 16000 of the mitochondrial DNA sequence (add 16000 to the numbers below to get the position in the mitochondrial genome).

Haplotype 51  69  124 126 129 187 189 223 224 230 249 264 270 278 290 294 298 311 319 327 352 360 362 390
H         A   C   T   T   G   C   T   C   T   A   T   C   C   C   C   C   T   T   G   C   T   C   T   G
K         A   C   T   T   G   C   T   C   C   A   T   C   C   C   C   C   T   C   G   C   T   C   T   G
J         A   T   T   C   G   C   T   C   T   A   T   C   C   C   C   C   T   T   G   C   T   C   T   G
T         A   C   T   C   G   C   T   C   T   A   T   C   C   C   C   T   T   T   G   C   T   C   T   G
U2        G   C   T   T   C   C   T   C   T   A   T   C   C   C   C   C   T   T   G   C   T   C   T   G
U5        A   C   T   T   G   C   T   C   T   A   T   C   T   C   C   C   T   T   G   C   T   C   T   G
V         A   C   T   T   G   C   T   C   T   A   T   C   C   C   C   C   C   T   G   C   T   C   T   G
X         A   C   T   T   G   C   C   T   T   A   T   C   C   T   C   C   T   T   G   C   T   C   T   G
I         A   C   T   C   G   C   T   T   T   A   T   C   C   C   C   C   T   T   G   C   T   C   T   G
A         A   C   T   T   G   C   T   T   T   A   T   C   C   C   T   C   T   T   A   C   T   C   T   G
B         A   C   T   T   G   C   C   C   T   A   T   C   C   C   C   C   T   T   G   C   T   C   T   G
C         A   C   T   T   G   C   T   T   T   A   T   C   C   C   C   C   C   T   G   T   T   C   T   G
D         A   C   T   T   G   C   T   T   T   A   T   C   C   C   C   C   T   T   G   C   C   C   T   G
M         A   C   T   T   G   C   T   T   T   A   T   C   C   C   C   C   T   T   G   C   T   C   T   G
M1        A   C   T   T   G   C   C   T   T   A   C   C   C   C   C   C   T   C   G   C   T   C   T   G
L3d       A   C   C   T   G   C   T   T   T   A   T   C   C   C   C   C   T   T   G   C   T   C   T   G
L3b       A   C   C   T   G   C   T   T   T   A   T   C   C   T   C   C   T   T   G   C   T   C   C   G
L2        A   C   T   T   G   C   T   T   T   A   T   C   C   T   C   C   T   T   G   C   T   C   T   A
mt-Eve    A   C   T   T   G   T   C   T   T   G   T   C   C   T   C   C   T   C   G   C   T   C   T   G
L1b       A   C   T   C   G   T   C   T   T   A   T   T   T   T   C   C   T   C   G   C   T   C   T   G
L1c       A   C   T   T   A   T   C   T   T   A   T   C   C   T   C   T   T   C   G   C   T   T   T   G
Yours     _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _

While these are some of the major haplotypes, there are other haplotypes that are also members of the same haplogroups, and several haplotypes are not included in the table for reasons of space. Check your sequence for any other SNPs relative to the H haplotype and write their locations here. Because this region lies in the 16000s of the mitochondrial sequence, you need to add 16000 to the locations you find in the alignment.

Other SNPs: ______________________

6. Below is a figure that shows many of the other haplotypes. If you have any SNPs besides the ones in the table above, start with the closest haplotype from the table, find that haplotype in the figure, and look at the nearby haplotypes for SNPs that match yours. The locations of the SNPs that differ from one haplotype to the next are written on the line connecting the haplotypes.
Only look at SNPs numbered in the 16000s, as that is the only region we have sequenced.

7. If no previously described haplotype matches yours, add your haplotype to the figure wherever you think it belongs, based on the sequence. A copy of this figure is on the wall; write the number of your sequence next to the correct haplotype (or add a new haplotype if you needed to). Next week you can look at the distribution of sequences from the class.

QUESTIONS:

The number of differences between mt sequences provides a measure of the genetic distance between populations, that is, of the time that has elapsed since they diverged from a common ancestor. Before mt mutations can be used as a molecular clock, the clock must be set against some reference. The reference for hominid evolution is the estimated divergence of humans and chimpanzees 4 million years ago.

1. Looking at the haplotype tree above and the map showing where the haplotypes are found, which continent has the most genetic diversity? Why do you think that is? Look at the location of the H haplotype in the tree above. Does it seem like the logical reference against which to compare all of the SNPs? If not, why do you think the H haplotype was used as the reference sequence?

2. Assuming that mt mutations occur at a constant rate, use the human-chimp divergence estimate and the average number of human-chimp sequence differences to calculate the average time span between mutations. Hint: the units of your answer will be years/mutation.

3. Use this value to calculate a divergence time for the Neandertal-modern human divergence.

4. Using the data from the figure above on diverse groups of modern humans, calculate divergence times for an African-Native American divergence, for a CSU Chico student (yours, if available)-African divergence, and for a CSU Chico student-Native American divergence.

5. Scientists have used both mt and chromosomal DNA mutations to estimate that divergence among groups of modern humans began about 150,000 years ago. Why is this number 3- to 4-fold smaller than your calculation? Hint: in additional comparisons between !Kung and Yoruba sequences (both African groups) and between !Kung and Algerian sequences (again, both African groups), 3 and 4 SNPs can be detected, respectively. At 111,000 years/mutation, this would give a divergence dating back 333,000 to 444,000 years.

6. What does this tell you about the relationship between Neandertals and diverse groups of modern humans?

EXERCISE 3: D-LOOP POLYMORPHISM STUDENT DESIGNED INVESTIGATION

Use your new skills to start your own D-loop polymorphism investigation, using your student sequences and others from the database. Make sure your investigation is question- or hypothesis-driven. Some examples of questions you could ask with the available sequences are: Which human haplotypes are closest to the ancestral haplotype (compare human sequences to Neandertal or chimpanzee)? Where did the Native Americans come from? Australian Aborigines? African Americans? Are Germans different from the English or Spanish? Are humans more closely related to chimpanzees or gorillas? Etc.

Your group will orally present your findings during the next period, before our lab final exam. Be sure to present the question, the hypothesis, the sequences you used, a summary data table of SNP differences, and your conclusion. You should be able to defend your choice of particular sequences as good choices for testing your hypothesis. Your presentation should take about 10 minutes and use 2 to 3 well-designed overheads. The presentation is worth 20 points, and each of your collaborators will give your instructor their own estimate of your contribution to the project, which will be used in determining your grade on this project.
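Counting SNP differences by hand is error-prone, both for the summary table of SNP differences in Exercise 3 and for the molecular-clock arithmetic in questions 2 to 5. The Python sketch below illustrates one way to do the counting; it is not part of the lab protocol, and it rests on several assumptions. The helper names (`snps`, `count_differences`, `years_per_mutation`, `divergence_years`, `difference_table`) are hypothetical. The six haplotype strings are transcribed from the H, K, J, T, L2 and mt-Eve rows of the table, so only the table positions are compared and any other SNPs are ignored. Bases written N, - or ? are treated as unreadable and skipped. As in the questions, the mutation rate is assumed to be constant. The 111,000 years/mutation value is the one quoted in question 5; substitute your own answer to question 2.

```python
"""Hypothetical helpers for comparing mtDNA control-region SNPs.

Assumptions (not part of the lab protocol): only the positions listed in the
haplotype table are compared, and mutations accumulate at a constant rate.
"""
from itertools import combinations

# Table positions (16000 + the numbers in the table header).
POSITIONS = [16051, 16069, 16124, 16127, 16129, 16187, 16189, 16223,
             16224, 16230, 16249, 16264, 16270, 16278, 16290, 16294,
             16298, 16311, 16319, 16327, 16352, 16360, 16362, 16390]

# Bases at those positions, transcribed from six rows of the table.
HAPLOTYPES = {
    "H":      "ACTTGCTCTATCCCCCTTGCTCTG",
    "K":      "ACTTGCTCCATCCCCCTCGCTCTG",
    "J":      "ATTCGCTCTATCCCCCTTGCTCTG",
    "T":      "ACTCGCTCTATCCCCTTTGCTCTG",
    "L2":     "ACTTGCTTTATCCTCCTTGCTCTA",
    "mt-Eve": "ACTTGTCTTGTCCTCCTCGCTCTG",
}

# Symbols treated as "no reliable base call" (an assumption).
UNREADABLE = {"N", "-", "?"}


def snps(seq, ref):
    """Return differences from ref in the usual '16224C' notation.

    Positions where either sequence is unreadable are skipped.
    """
    return [f"{pos}{b}" for pos, a, b in zip(POSITIONS, ref, seq)
            if a != b and a not in UNREADABLE and b not in UNREADABLE]


def count_differences(seq_a, seq_b):
    """Number of table positions at which two sequences differ."""
    return len(snps(seq_b, seq_a))


def years_per_mutation(divergence_years_ago, mean_differences):
    """Question 2: average time span between mutations (years/mutation)."""
    return divergence_years_ago / mean_differences


def divergence_years(differences, rate):
    """Questions 3-5: divergence time = differences x years/mutation."""
    return differences * rate


def difference_table(names):
    """Pairwise SNP counts for a summary table (e.g., Exercise 3)."""
    return {(a, b): count_differences(HAPLOTYPES[a], HAPLOTYPES[b])
            for a, b in combinations(names, 2)}


if __name__ == "__main__":
    print(snps(HAPLOTYPES["K"], HAPLOTYPES["H"]))  # ['16224C', '16311C']
    rate = 111_000  # years/mutation, the value quoted in question 5
    for (a, b), d in difference_table(["H", "K", "L2", "mt-Eve"]).items():
        print(f"{a} vs {b}: {d} SNPs, about {divergence_years(d, rate):,.0f} years")
```

To check your own sequence, add it as another 24-letter string (one base per table position, N where unreadable) and call `snps(your_string, HAPLOTYPES["H"])`. SNPs that lie outside the table positions still have to be read from the ClustalW alignment.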