Jones 1

THE STUDY OF THE EVOLUTION OF MODERN HUMANS THROUGH THE ISOLATION, AMPLIFICATION, AND EXAMINATION OF THE D-LOOP SEQUENCE

Debra Jones

Abstract: During this experiment, the genomic DNA of an entire class population of 54 students was isolated, and the D-loop sequence of the mitochondrial DNA was amplified by PCR and separated by gel electrophoresis. These DNA sequences were then compared to sequences from a chimpanzee, a Neanderthal, other humans in the class population, and humans from around the world. The global sequences were obtained from the NCBI GenBank database using the BLAST search tool. The average proportional divergence between humans and Neanderthals was found to be 0.058508182, and the divergence time was calculated to be 761,811.2802 years. The average proportional divergence between chimpanzees and humans was found to be 0.16562, and the divergence time was given as 5,000,000 years. The average proportional divergence among modern humans (the class population) was found to be 0.015242915, and the last common ancestor of modern humans was calculated to have lived 424,161.7184 years ago. This divergence time matches the displacement model of human evolution.

Introduction: Mitochondrial DNA (mtDNA) is a circular genome found in the mitochondria. It was chosen for this experiment because it can be amplified quickly and to high levels. Hundreds of mitochondria can be found in each cell of the body, and each mitochondrion has several copies of its own genetic material (Genetics Home Reference). These high levels of mitochondrial DNA allow it to be easily isolated from small samples of cells.

The section of DNA that was sequenced was the D-loop region. The D-loop region is a non-coding region of mtDNA. It is approximately 1,200 nucleotides in length and lies on either side of the initial position of the mitochondrial genome. During replication, this segment of DNA creates a loop that can be easily isolated.
It is for this reason that the D-loop sequence was chosen to be isolated. The D-loop sequence also mutates frequently and is highly variable. The mutations that occur in the D-loop DNA are inherited by an organism's offspring, so they can be traced back over thousands of years (Olivo et al., 1983). From this, evolutionary patterns can be followed, and a common ancestor to which all of the observed organisms are related can be found.

During this experiment, a 400-nucleotide sequence of mitochondrial DNA was obtained from my own cheek cells and from the cheek cells of the entire class population. The extracted DNA was amplified using PCR in a thermocycler. PCR allows for high levels of amplification in a very short amount of time. The DNA obtained from the cheek cells was used as the template. Primers annealed to the target sequence, and DNA polymerase added nucleotides complementary to the template strand, starting at the 3' end of each primer (Shi and Chiang, 2005). This creates a complementary strand that is identical to the other strand of the original template. Through this process, the desired region of DNA was greatly amplified. The amplified segments of DNA were then run through gel electrophoresis, the DNA strands were separated by size, and the DNA was sequenced.

The results were compared to the DNA sequences of the rest of the class. Most of the mtDNA samples from the class population had different sequences. These differences are called single nucleotide polymorphisms, or SNPs. An SNP is a type of variation in DNA in which a single nucleotide differs between organisms (Altshuler et al., 2000). These SNPs were used to analyze the differences between human DNA from both the class population and the global population. The DNA sequences from the global population were obtained from the GenBank database of the National Center for Biotechnology Information (NCBI).
The BLAST function was used to find DNA sequences that were similar to mine. BLAST compared my DNA sequence to all of those stored in the database and provided a list of the 100 sequences most similar to the query. Each hit included a summary listing the size of the matched region, the Max score, the E value, and the % identity. The differences between the DNA sequences allow patterns of evolution to be observed.

The DNA isolated from my lab partner and myself was also compared to determine the proportional divergence, the average number of substitutions, and the divergence time between the two of us. She and I are twins, so we hypothesized that we would have zero divergence, zero nucleotide substitutions, and a divergence time of zero.

The mutations found within the D-loop DNA can be traced back to a common ancestor. Our DNA differs from that of our common ancestor because, once a lineage splits off from that ancestor, it begins to accumulate mutations. The molecular clock arose from this idea. The molecular clock theory claims that a linear relationship exists between the number of mutations an organism has accumulated and the time since it diverged from its common ancestor. This theory can be used to compare two groups of organisms, so long as the number of mutations is proportional to the time since the two groups shared a common ancestor (Gojobori et al., 1990).

From the divergence time of two organisms since their last common ancestor, a model of how they evolved can be hypothesized. One theory, the multiregional theory, states that several different global archaic populations contributed to the evolution of modern humans. The divergence time for this model is 1.5 million to 3 million years. The displacement model's divergence time is much shorter, at 200,000-500,000 years.
This theory states that a single population located in Africa evolved to form modern humans.

Materials and Methods

DNA Extraction:

To collect the cells needed for PCR amplification, 10 mL of 0.9% NaCl saline solution was swished around in the mouth for thirty seconds. This was done to collect cells that detach from the walls of the cheek. 1.5 mL of this cell suspension was transferred to a 1.5 mL tube. The sample was then placed in a microcentrifuge and spun for five minutes at maximum speed. Most of the supernatant was removed, and the remaining pellet, which contained the cells, was resuspended. The resuspended cells were then transferred to a microcentrifuge tube that contained 100 µL of Chelex solution. The Chelex resin binds metal ions that DNA-degrading enzymes require, which protects the DNA from degradation. This solution was heated in a thermocycler for ten minutes at 100°C. The heat denatured the DNA and caused the double helix to unwind, creating single-stranded DNA. After the denaturing step, the sample was spun in a microcentrifuge for one minute. This leaves the DNA in the supernatant, while the unwanted cell debris and the Chelex beads pellet at the bottom of the tube. 50 µL of the DNA-containing supernatant was transferred to a fresh 1.5 mL tube, and the DNA was stored in a freezer for one week.

DNA Amplification by PCR

22.5 µL of a mixture containing the primers and ddH2O was added to a tube that contained a ready-to-go PCR bead. The mixture contained the forward primer HVIF15971: 5'-TTAACTCCACCATTAGCACC-3', the reverse primer HVIR16410: 5'-GAGGATGGTGGTCAAGGGAC-3', and ddH2O, while the ready-to-go PCR bead contained 1.5 units of Taq DNA polymerase, 10 mM Tris-HCl, 50 mM KCl, 1.5 mM MgCl2, and 200 µM of each dNTP. 2.5 µL of my genomic DNA was added to this mixture.
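As a quick sanity check on the two primers listed above, their length, GC content, and a rough melting temperature can be computed. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the Wallace rule (2°C per A or T, 4°C per G or C) is a textbook rule of thumb for short oligonucleotides that the protocol itself does not mention.

```python
# Primer sequences copied from the protocol text (5' to 3').
PRIMERS = {
    "HVIF15971": "TTAACTCCACCATTAGCACC",
    "HVIR16410": "GAGGATGGTGGTCAAGGGAC",
}


def gc_fraction(primer: str) -> float:
    """Return the fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule.

    Assumption: Tm ~ 2*(A+T) + 4*(G+C), a rule of thumb for short
    oligos; it is not a method stated in the source protocol.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: length={len(seq)} nt, "
              f"GC={gc_fraction(seq):.0%}, "
              f"Wallace Tm ~ {wallace_tm(seq)} C")
```

For the forward primer the rule of thumb gives 58°C, which coincides with the annealing temperature in the cycling program; the text does not say how that temperature was chosen, so this should be read as a plausibility check only.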
The whole solution was mixed, then run on a thermal cycler set to the following program: (1) initial denaturation for 2 minutes at 94°C, (2) denaturing for 30 seconds at 94°C, (3) annealing for 30 seconds at 58°C, and (4) extending for 30 seconds at 72°C. Steps 2-4 were repeated for thirty cycles, followed by a final extension for 6 minutes at 72°C and an indefinite hold at 4°C.

The primers are short nucleic acids that are added to the DNA to create a starting point for DNA synthesis. Taq DNA polymerase extends from the 3' end of each of the two primers, adding bases to build a strand complementary to each single-stranded template. All other components of the PCR mixture aid in this process, as do the steps of the thermocycler program. The denaturing steps separate the double helix into single strands. In the annealing step, the primers bind to the single strands of DNA. In the extending step, the Taq polymerase adds bases to the single strands of DNA, starting at the 3' end of each primer. Repeating this process allows for efficient amplification of the section of DNA being studied.

Agarose Gel Electrophoresis:

A 1.0% agarose gel was prepared by American University graduate students. 25 µL of the PCR-amplified DNA was added to a single well of the agarose gel. A marker containing 1 Kb Plus DNA Ladder was added to another well. The gel was run for an hour to allow the DNA to separate.

Sequencing of the Gel-Purified D-loop PCR Product and Electrophoresis:

3.4 µL of the column-purified PCR template was added to the DNA sequencing master mix, which contained dNTPs, sequencing primers, ddH2O, DNA polymerase, and sequencing reaction buffer. Next, 1.8 µL of this mixture was added to each of the four ddNTP tubes (one contained ddATP, another ddCTP, another ddGTP, and the last ddTTP). Each ddNTP lacks the 3'-OH group that is necessary for elongation. Therefore, the ddNTP acts as a chain terminator.
Each ddNTP pairs with a different base on the template DNA strand, so the four reactions create copied fragments of different lengths. After the master mix, template, and ddNTP had been combined, the mixture was kept on ice until it was placed in the thermocycler. This was done to prevent degradation of the enzymes. The mixture was put through 30 cycles in the thermocycler. The program was as follows: 2 minutes at 92°C, 30 seconds at 92°C, 30 seconds at 53°C, and 1 minute at 70°C. Steps two through four were repeated an additional 29 times. After the 30 cycles were complete, the reaction was held at 4°C. This process allowed the ddNTPs to be incorporated opposite the single-stranded templates, creating many copies of these shortened DNA strands.

After the DNA fragments were amplified, they were first run through a polyacrylamide gel and then run on an automated sequencer. Running the fragments through the polyacrylamide gel allowed the different fragments to separate by length. The longest ones stayed towards the top, while the shorter ones travelled through the gel towards the bottom. By reading the gel from the bottom to the top, the sequence of the DNA fragment being studied could be ascertained. However, this method is very time consuming, so the automated sequencer was used to speed up the process.

Nucleotide Sequence Searches in the NCBI Database:

For this process, the National Center for Biotechnology Information's genome database, GenBank, was searched with the Basic Local Alignment Search Tool (BLAST). My D-loop nucleotide sequence, the query sequence, was entered into the tool to be compared to the other sequences stored within the database. The database used was the Nucleotide collection (nr/nt), the organism was Homo sapiens, and the search was optimized for somewhat similar sequences (blastn).
Once all of the data was entered, the database returned sequences that were similar or identical to my query sequence. Each matched sequence included an accession number that linked the sequence to the database, a Max score (S), and an E value. A higher Max score indicated a more significant match, while an E value closer to 0 indicated a more significant match. For this part of the experiment, the sequences most similar to my query sequence were examined, and the accession number, the max score, the identity, and the associated nucleotide positions were recorded.

Calculating Divergence:

The average proportional divergence between humans and chimpanzees and between humans and Neanderthals was calculated in Excel by averaging the divergence values of the human sequences when compared with the chimpanzee sequence and with the Neanderthal sequence, respectively. From the average proportional divergence (Pd), the number of substitutions per site (Kn) was calculated using the following equation: Kn = -ln(1 - Pd). The average proportional divergence among the humans in the class was then found by averaging all of the divergence values, and the same Kn equation was used to find the average number of substitutions per site.

Next, the rate of nucleotide substitution was calculated for humans and chimpanzees, for humans and Neanderthals, and for my lab partner and myself. The human and chimpanzee nucleotide substitution rate was found by dividing the Kn of humans and chimpanzees by the human-chimpanzee divergence time of 5,000,000 years. For the human and Neanderthal nucleotide substitution rate, the Kn value was divided by the modern human divergence time of 200,000 years. For my lab partner and myself, the previously calculated rate of substitution between Neanderthals and modern humans was used.
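The conversion and rate steps above, together with the divergence-time division, can be sketched in a few lines of Python. This is a minimal sketch, not the spreadsheet that was actually used. The function names are hypothetical. `proportional_divergence` assumes Pd is the fraction of mismatched sites between two aligned, equal-length sequences, which the text does not spell out. Applying the chimpanzee-calibrated rate to the class population is also an assumption, though it is consistent with the rate listed for the class population in Table 2. The input values are those reported in the Results.

```python
import math


def proportional_divergence(seq_a: str, seq_b: str) -> float:
    """Pd: fraction of aligned sites at which two sequences differ.

    Assumption: sequences are already aligned and of equal length.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length, non-empty")
    mismatches = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return mismatches / len(seq_a)


def substitutions_per_site(pd: float) -> float:
    """Kn = -ln(1 - Pd), the conversion given in the text."""
    return -math.log(1.0 - pd)


def substitution_rate(kn: float, calibration_years: float) -> float:
    """Substitutions per site per year, from a known divergence time."""
    return kn / calibration_years


def divergence_time(kn: float, rate: float) -> float:
    """Years since divergence: Kn divided by the substitution rate."""
    return kn / rate


# Values reported in the Results section.
PD_HUMAN_CHIMP = 0.16562
PD_CLASS = 0.015242915
HUMAN_CHIMP_YEARS = 5_000_000  # given calibration point

kn_chimp = substitutions_per_site(PD_HUMAN_CHIMP)
rate_chimp = substitution_rate(kn_chimp, HUMAN_CHIMP_YEARS)
kn_class = substitutions_per_site(PD_CLASS)
t_class = divergence_time(kn_class, rate_chimp)

if __name__ == "__main__":
    print(f"Kn (human v chimp): {kn_chimp:.9f} substitutions per site")
    print(f"Rate (chimp-calibrated): {rate_chimp:.9e} per site per year")
    print(f"Kn (class): {kn_class:.9f} substitutions per site")
    print(f"Class divergence time: {t_class:,.1f} years")
```

Run as written, the script reproduces the reported human-chimpanzee Kn of about 0.181 substitutions per site and a class divergence time of roughly 424,000 years.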
From these calculated rates of substitution, the divergence time was calculated. For the divergence between chimpanzees and humans, this was done by dividing the Kn of humans by the rate of nucleotide substitution for humans and chimpanzees. The divergence between Neanderthals and humans was calculated by dividing the Kn of humans and Neanderthals by the rate of substitution between humans and Neanderthals. The same calculation was used to determine the divergence time between my lab partner and myself: the Kn value for her and myself was divided by the rate of substitution between Neanderthals and humans.

Results:

My D-loop DNA was successfully sequenced, and the resulting 400 base pair sequence was compared through BLAST to other sequences stored in the GenBank database. Ten separate sequences were found to be 100% identical to my DNA in the observed region. The accession numbers for the ten matching sequences can be found in Table 1. For each of the ten sequences, the Max score was 722 and the E value was zero. Each of the ten database sequences matched my DNA at 400 of 400 base pairs. For the first match (KC878724.1), the nucleotide positioning was 15992-16392. The remaining nine matches all had a nucleotide positioning of 15991-16390. All of the matches and my DNA sequence were of the K1a1b1a haplogroup, which is commonly found in Ashkenazi Jews (Table 1).
Hit    Accession number   Max score   E value   % identity   Nucleotide positioning              Haplogroup   Match
1st    KC878724.1         722         0         100%         Query: 1-400   Subj: 15992-16392    K1a1b1a      400/400
2nd    KC914580.1         722         0         100%         Query: 1-400   Subj: 15991-16390    K1a1b1a      400/400
3rd    JQ706006.1         722         0         100%         Query: 1-400   Subj: 15991-16390    K1a1b1a      400/400
4th    JQ705745.1         722         0         100%         Query: 1-400   Subj: 15991-16390    K1a1b1a      400/400
5th    JQ705628.1         722         0         100%         Query: 1-400   Subj: 15991-16390    K1a1b1a      400/400
6th    JQ705204.1         722         0         100%         Query: 1-400   Subj: 15991-16390    K1a1b1a      400/400
7th    JQ704654.1         722         0         100%         Query: 1-400   Subj: 15991-16390    K1a1b1a      400/400
8th    JQ703855.1         722         0         100%         Query: 1-400   Subj: 15991-16390    K1a1b1a      400/400
9th    JQ703485.1         722         0         100%         Query: 1-400   Subj: 15991-16390    K1a1b1a      400/400
10th   JQ703069.1         722         0         100%         Query: 1-400   Subj: 15991-16390    K1a1b1a      400/400

Table 1. Data collected from the GenBank database using BLAST. For each of the first ten BLAST hits, the accession number, max score, E value, % identity, nucleotide positioning, haplogroup, and match number were recorded.

My D-loop sequence was compared not only with this database, but also with the sequences of the members of the class and with two different species: chimpanzees and Neanderthals. The proportional divergence, number of nucleotide substitutions, rate of nucleotide substitution, and divergence time were calculated. The greatest divergence was observed between the human and chimpanzee DNA. The proportional divergence was calculated to be 0.16562, the number of substitutions was 0.181066345 per site, and the divergence time was given as 5,000,000 years (Tables 2 and 3). The second greatest divergence was between humans and Neanderthals.
The proportional divergence was 0.058508182, and the number of nucleotide substitutions was 0.060289621 substitutions per site (Table 2). From the rate of substitution of humans and Neanderthals (7.6801412x10^-8 substitutions per site per year), the divergence time was found to be 761,811.2802 years (Tables 2 and 3). The divergence observed in the classroom population was closer to the human versus Neanderthal divergence than to the human versus chimpanzee divergence. The proportional divergence of the classroom population was 0.015242915, and the number of nucleotide substitutions was 0.015360282 substitutions per site (Table 2). The divergence time, that is, the time since the class population last shared a common ancestor, was 424,161.7184 years (Table 3).

The divergence between my DNA and that of my lab partner, Julia, was also calculated. It should be noted that Julia and I are twin sisters. The proportional divergence was zero, and the number of substitutions was zero substitutions per site (Table 2). The divergence time was therefore calculated to be zero years (Table 3). Our D-loop sequences were completely identical.

                                                                    Human v Chimp       Human v Neanderthal   Human v Human       Julia v Me
Average proportional divergence                                     0.16562             0.058508182           0.015242915         0
Average number of substitutions per site                            0.181066345         0.060289621           0.015360282         0
Rate of nucleotide substitution (substitutions per site per year)   3.621326898x10^-8   7.6801412x10^-8       3.621326898x10^-8   7.6801412x10^-8

Table 2. The calculated values of average proportional divergence, average number of nucleotide substitutions per site, and rate of nucleotide substitution for human DNA versus chimpanzee DNA, human DNA versus Neanderthal DNA, the DNA of the classroom population, and Julia's DNA versus my DNA.
                          Human v Chimp   Human v Neanderthal   Human v Human   Julia v Me
Divergence time (years)   5000000         761811.2802           424161.7184     0

Table 3. The divergence time, in years, calculated for humans versus chimpanzees, humans versus Neanderthals, the humans in the classroom population, and Julia versus myself.

Discussion:

From the data collected, several patterns of evolution can be observed. The easiest pattern to observe is the divergence time between the observed organisms. The divergence time is the point in the past at which the organisms had a common ancestor. The longest divergence time observed was between chimpanzees and humans, which last shared a common ancestor 5,000,000 years ago. The second longest divergence time was between Neanderthals and humans. Their last common ancestor was alive 761,811.2802 years ago. This is also the point in time at which the modern human and Neanderthal lineages split. The common ancestor of all modern humans was alive 424,161.7184 years ago. This is the shortest divergence time. The data indicate that the closest relative to humans is the Neanderthal, followed by the chimpanzee. In fact, other studies have shown that the Neanderthal is the closest known relative to modern humans (Noonan, 2010).

Population geneticists have calculated that modern humans diverged 200,000 years ago. The divergence time calculated from the data is about twice as long. This could be due to an over-representation or under-representation of certain populations and DNA types. For example, the chimpanzee and Neanderthal data each come from a single DNA sample, while the human DNA came from the entire class population of 54 humans. This could cause some fluctuations in the calculations and averages.

The theory that the Neanderthal is the closest known relative to modern humans is again supported by the average proportional divergence and the average number of substitutions per site.
The average proportional divergence between Neanderthals and humans is 0.058508182, while the number of substitutions is 0.0602896213 per site. Between humans and chimpanzees, the average proportional divergence was much larger, at 0.16562, while the number of substitutions was 0.1810663449 per site. Measured against the class population's average proportional divergence of 0.015242915 and average of 0.0153602824 substitutions per site, the Neanderthal DNA sequence diverged much less than the chimpanzee sequence; that is, it was much closer to that of modern humans.

Based on the divergence time calculated from the average number of substitutions per site, the model of human evolution was determined to be the displacement model. The displacement model of evolution, also known as the single origin model, suggests that Homo sapiens came from a single starting population in Africa about 200,000-500,000 years ago (Hansen et al., 2000). This model is the best fit because the calculated divergence time falls within the estimated time range in which the single starting human population arose. The data do not support the multiregional model, because the estimated time of evolution for that theory is much longer.

Similar calculations were done to determine the point of divergence between my lab partner and me. As previously mentioned, my lab partner and I are twins. From this, we had hypothesized that we would have an identical D-loop sequence. This hypothesis was supported by the data obtained. The DNA sequences that were isolated were identical. This means that the proportional divergence and the number of substitutions between us are zero. Therefore, the divergence time is also zero. Both my partner and I used BLAST to compare our D-loop sequence with those stored in the GenBank database. We both had the same ten matches. All of them had a max score of 722, an E value of 0, and a % identity of 100%.
The high max score and the low E value indicate a significant match. For all ten observed sequences, all 400 of the bases were identical. This sequence belongs to the K1a1b1a haplogroup. This haplogroup is found in Ashkenazi Jews and Europeans. 10% of Europeans and 45% of Ashkenazi Jews fit into this haplogroup. Also, about 19% of Ashkenazi Jews that fit into the K1a1b1a haplogroup are of Polish descent (Grzybowski et al., 2007). This haplogroup matches both my lab partner's and my heritage. We are both Ashkenazi, and distantly of Polish descent on our maternal grandmother's side. The data obtained from the comparison of the D-loop sequence match the personal data of my lab partner and myself.

References:

Altshuler D, Pollara VJ, Cowles CR, Van Etten WJ, Baldwin J, Linton L, Lander ES. An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407: 513-516, 2000.

Gojobori T, Moriyama EN, Kimura M. Molecular clock of viral evolution, and the neutral theory. Proceedings of the National Academy of Sciences 87(24): 10015-10018, 1990.

Grzybowski T, Malyarchuk BA, Derenko MV, Perkova MA, Bednarek J, Wozniak M. Complex interactions of the Eastern and Western Slavic populations with other European groups as revealed by mitochondrial DNA analysis. Forensic Science International: Genetics 1(2): 141-147, 2007.

Hansen TF, Armbruster WS, Antonsen L. Comparative Analysis of Character Displacement and Spatial Adaptations as Illustrated by the Evolution of Dalechampia Blossoms. The American Naturalist 156(S4): S17-S34, 2000.

Noonan JP. Neanderthal genomics and the evolution of modern humans. Genome Research 20: 547-553, 2010.

Olivo PD, Van de Walle MJ, Laipis PJ, Hauswirth WW. Nucleotide sequence evidence for rapid genotypic shifts in the bovine mitochondrial DNA D-loop. Nature 306: 400-402, 1983.

Shi R, Chiang VL. Facile means for quantifying microRNA expression by real-time PCR. BioTechniques 39: 519-525, 2005.

"Mitochondrial DNA." Genetics Home Reference. N.p., n.d. Web. 01 Apr. 2014.