DNA Profiling using Short Tandem Repeats

This slide show includes information from the following website: http://www.cstl.nist.gov/biotech/strbase/
WWW.CRIMESCENE.COM thanks John M. Butler, Ph.D., at the NIST Biotechnology Division for his help and permission to include information and graphics in this presentation.

DNA Profiling using STRs: An Overview
• STR analysis is a DNA profiling technique that uses PCR (polymerase chain reaction) to copy DNA at distinct locations and then measures the size of each copied piece; the size of the DNA piece is the basis of comparison between samples.
• STRs are Short Tandem Repeats: short patterns of nucleotides, repeated back to back, that are spread throughout our DNA.

[Figure: a DNA molecule carrying 7 short tandem (back-to-back) repeats of the nucleotide sequence AATG: AATG AATG AATG AATG AATG AATG AATG]

• The number of repeats at a given region (locus, plural loci) of DNA varies widely from person to person, which is what allows STRs to be used in human identity testing.
• The repeated unit can be 9 to 80 nucleotides long (variable number tandem repeats, VNTRs, or minisatellites) or 2 to 5 nucleotides long (microsatellites, or SHORT tandem repeats, STRs).
• Several loci along our DNA have been identified as possessing STRs (thanks in part to the Human Genome Project), and the DNA profiling community has selected 13 of these regions for identity analysis.
• These 13 loci ALL contain 4-nucleotide (tetrameric) repeats.
• Through population studies, the numbers and types (nucleotides involved) of the repeats at these loci have been analyzed, yielding probability estimates for particular ethnic groups.

13 STR Loci for DNA Profiling
• 13 STR loci have been officially chosen for use in the Combined DNA Index System (CODIS); they are scattered among our 23 chromosome pairs.
• CSF1PO, FGA, TH01, TPOX, vWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, and D21S11
• While not an STR, the AMEL (amelogenin) locus on the sex chromosomes (X and Y, the 23rd pair) is included for gender
determination

[Figure: positions of the 13 CODIS STR loci (TPOX, D3S1358, D8S1179, TH01, D5S818, vWA, FGA, CSF1PO, D7S820, D13S317, D16S539, D18S51, D21S11) and of AMEL on the chromosomes]

DNA Profiling using STR: An Overview
• Because we have 23 pairs of chromosomes (23 chromosomes from our mother, 23 from our father), every locus is present in two copies: each of the 13 CODIS loci exists on one chromosome from our mother and one from our father.
• Therefore, when we count the repeats at a given locus, we generally get either 1 answer (both copies carry the same number of repeats) or 2 answers (the copies differ). These variants are known as alleles. We get one allele from each parent.
• If the number of repeats is the same on both copies, we are “homozygous” at that locus. If the number of repeats is different, we are “heterozygous” at that locus.
• Any combination of homozygous and heterozygous loci can be present across the 13 loci. Because one allele at each locus comes from each parent, STRs can also be used for maternity and paternity testing.

[Figure: DNA molecule from mother with 7 short tandem repeats of the nucleotide sequence AATG (AATG AATG AATG AATG AATG AATG AATG); DNA molecule from father with 8 short tandem repeats of AATG (AATG AATG AATG AATG AATG AATG AATG AATG)]

Advantages of STR Analysis
• The allelic variation (number of repeats) of STRs is more easily discernible than with other techniques: a difference of just one repeat, or 4 nucleotides, can be seen with current methods.
• The number of repeats at an STR locus is discrete, meaning that, based on current studies, there is a finite set of possible answers, which facilitates interlaboratory comparisons.
– E.g., at locus TH01, which is found on chromosome 11, studies have shown that the tetramer AATG repeats anywhere from 3 to 14 times. If the chromosome from your mother carries 3 repeats and the one from your father carries 5 repeats, the profile is dubbed “3,5”. This profile is then used for comparison to other DNA samples.
• Because PCR (polymerase chain reaction) is used to amplify the STR loci, only very small quantities of DNA are needed (a blood droplet the size of the head of a pin).
• Because STR loci are relatively small, they are more likely to remain completely intact, and therefore available for analysis, in a degraded DNA sample (DNA that has already been partly broken down after being deposited in an environment where decomposition can occur).
• Technology allows complete analysis in a matter of hours.

Sources of Biological Evidence
• Blood
• Semen
• Saliva
• Urine
• Hair
• Teeth
• Bone
• Tissue

DNA in the Cell
[Figure: a cell, its nucleus, a chromosome, and the double-stranded DNA molecule with the target region for PCR and its individual nucleotides (A, T, C, G). Color key: orange = A, green = T, purple = C, yellow = G.]

Steps in DNA Sample Processing
• Sample obtained from crime scene or paternity investigation
• Biology
– DNA extraction
– DNA quantitation
– PCR amplification of multiple STR markers
• Technology
– Separation and detection of PCR products (STR alleles)
– Sample genotype determination
• Genetics
– Comparison of sample genotype to other sample results
– If match occurs, comparison of DNA profile to population databases
– Generation of case report with probability of random match

DNA 101
• It has been mentioned that the repeats of STRs are composed of nucleotides. Amazingly, our genetic material (the DNA in all 23 pairs of chromosomes) is composed of only four nucleotides in a string: Adenine (A), Thymine (T), Cytosine (C) and Guanine (G). These are the sole letters of the genetic alphabet.
• Nucleotides are often referred to by their nitrogenous bases, or just “bases”.
• Adenine and guanine are known as the purine bases, while cytosine and thymine are called the pyrimidine bases; adenine binds only to thymine, and cytosine binds only to guanine.
• In a DNA molecule (on just one chromosome), the structure looks like a twisted ladder, with the rungs representing the pairs of nitrogenous bases. Paired nucleotides are therefore called base pairs (bp) when talking about the “double stranded” DNA molecule.
• The pattern of these letters constructs genes, which in turn act as templates for proteins, which in turn help to construct and operate the human body. Yet there is enough variation to make us all unique. Mind boggling.

[Figure: the DNA molecule on one chromosome, drawn as paired individual nucleotides. Color key: orange = A, green = T, purple = C, yellow = G.]

The Polymerase Chain Reaction
• The amount of DNA in a pure sample dwarfs the amount contained in just the 13 CODIS loci. Therefore these regions, and only these regions, need to be amplified for analysis, and the polymerase chain reaction (PCR) serves as a molecular Xerox machine for exactly this purpose.
• PCR uses primers: short pieces of single-stranded DNA that are complementary to the areas flanking the piece of DNA you want to amplify (e.g. the TH01 locus).
• Using the rule from biology 101 that A binds to T and C binds to G, primers are designed (and then synthesized in a laboratory very easily and quickly) to bind just BEFORE the STR region on one DNA strand and just AFTER the STR region on the complementary strand of the double-stranded DNA molecule on each chromosome (e.g. chromosome 11 for the TH01 locus).
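
The base-pairing rule behind primer design can be sketched in a few lines of Python. This is only an illustration: the 30-nucleotide template, the flank lengths, and the function names are assumptions made for the example, and real primers are longer and chosen with many more constraints.

```python
# Toy illustration of the A-T / C-G pairing rule used to design primers.
# The sequences below are hypothetical and far shorter than real ones.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq):
    """Return the base that pairs with each position of seq."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """Return the complementary strand, read in the opposite direction."""
    return complement(seq)[::-1]


def toy_primers(upstream_flank, downstream_flank):
    """Pick one primer on each side of the STR region.

    The forward primer anneals to the bottom strand just before the STR,
    so it reads the same as the top strand there. The reverse primer
    anneals to the top strand just after the STR, so it is the reverse
    complement of the top-strand flank.
    """
    forward = upstream_flank
    reverse = reverse_complement(downstream_flank)
    return forward, reverse


upstream = "AGCATAATTC"     # region before the STR (hypothetical)
str_region = "AATG" * 3     # 3 tandem repeats of AATG
downstream = "CGTACCTA"     # region after the STR (hypothetical)

top_strand = upstream + str_region + downstream
forward_primer, reverse_primer = toy_primers(upstream, downstream)

print("top strand:    ", top_strand)
print("bottom strand: ", complement(top_strand))
print("forward primer:", forward_primer)
print("reverse primer:", reverse_primer)
```

Because both primers sit outside the repeats, the length of the copied piece changes by 4 nucleotides for every extra AATG repeat, which is what later allows alleles to be told apart by size.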
[Figure: double-stranded DNA on chromosome 11 from one parent, with a region for primer binding on either side of the STR region (3 AATG repeats, shown in red) and the rest of the chromosome's DNA continuing beyond. Top strand as drawn: AGCATAATTC AATG AATG AATG CGTACCTA]

DNA Amplification with the Polymerase Chain Reaction (PCR)
• The DNA sample is heated so that the double-stranded DNA “denatures”, or becomes single stranded, and the single-stranded primers can get in. The reaction is then cooled so that the primers can bind (anneal) to their complementary regions of the sample.
• An enzyme (DNA polymerase, which polymerizes the nucleotides onto the primers), along with spare nucleotides, is added to this mixture, and the enzyme begins to extend the primers along the regions of interest (e.g. TH01) in complementary fashion, basically copying the TH01 region and however many tetrameric repeats it contains.
• This reaction (denaturing, annealing, extension) is repeated many times in a thermocycler, amplifying that DNA locus, and ONLY that locus, many times over.

[Figure: the two single-stranded (denatured) DNA templates with the forward primer and reverse primer annealed, DNA polymerase, and spare nucleotides (A, T, C, G)]

DNA Amplification with the Polymerase Chain Reaction (PCR)
• Because the newly synthesized fragments of DNA are used as templates in the second round of synthesis, and each has one finite end marking the beginning of a primer, subsequent cycles produce an excess of ONLY the region of interest (from the start of the forward primer to the end of the reverse primer).
• With 32 cycles, over 1 billion (2^32) molecules of DNA representing a specific locus (and only that locus) are synthesized and now dwarf the rest of the DNA in the sample.
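
The doubling arithmetic can be checked with a short Python sketch. It assumes ideal PCR, in which every cycle exactly doubles the number of copies; real reactions are less than perfectly efficient, so this is an upper bound. The function name and the single starting molecule are assumptions made for the example.

```python
# Idealized PCR: each cycle doubles the number of copies of the target.
def ideal_copies(cycles, starting_copies=1):
    """Copies after a number of cycles, assuming perfect doubling."""
    return starting_copies * 2 ** cycles


# Print the theoretical yield at a few cycle counts.
for cycles in (10, 20, 30, 32):
    print(f"{cycles} cycles: {ideal_copies(cycles):,} copies")

# First cycle at which one starting molecule exceeds 1 billion copies.
first_over_billion = next(
    n for n in range(1, 64) if ideal_copies(n) > 1_000_000_000
)
print("first cycle over 1 billion:", first_over_billion)
```

Under this assumption, 32 cycles give 4,294,967,296 copies from a single molecule, and the 1 billion mark is already passed at cycle 30.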
[Figure: each single-stranded (denatured) original template paired with its newly synthesized complementary DNA strand]

Multiplex STR Analysis
[Figure: original DNA template and the resulting PCR products]
• Because the primers for each locus have been stringently designed to be specific for only the regions before and after their locus, over 10 loci can be copied at once in one tube.
• Sensitivities to levels of less than 1 billionth of a gram of DNA are possible.
• Different fluorescent dyes are used to distinguish STR loci with overlapping size ranges.
• Generally, for a pure sample typed at 13 STR loci, the result is a mixture of as few as 13 and as many as 26 PCR products: 13 products if the individual is homozygous at EVERY locus (the same number of repeats, and so the same size, on both chromosomes), or 26 products if the individual is heterozygous at every locus (a different number of repeats, and so a different size, on each chromosome).
• If the sample contains a mixture, more pieces will be seen.

Available Kits for STR Analysis
• Kits make it easy for labs to just add DNA samples to a pre-made mix
• 13 CODIS core loci
– Profiler Plus and COfiler (PE Applied Biosystems)
– PowerPlex 1.1 and 2.1 (Promega Corporation)
• Increased power of discrimination
– CTT (1994): 1 in 410
– SGM Plus™ (1999): 1 in 3 trillion
– PowerPlex™ 16 (2000): 1 in 2 x 10^17

STR Analysis
• Once PCR has been completed successfully using any of the available kits, the products must be analyzed.
• One method used is capillary electrophoresis (CE), which involves injecting the PCR products into a thin capillary.
• Smaller fragments move faster and thus reach the fluorescence detector first.
• The wavelength emitted by each fluorescent dye is different and can be monitored.
• Because it is known which fluorescent dye is used for each locus, and loci that produce similar-sized fragments are deliberately labeled with DIFFERENT dyes, each product can be identified as it is detected.
• Standards are included that contain the known sizes produced at the various loci.
• The fluorescence detection results in peaks of different sizes and intensities.
• The amelogenin locus, while not an STR, is included for gender determination.
• A female is homozygous at the AMEL locus (X, X) and thus will display one peak.
• A male is heterozygous at the AMEL locus (X, Y) and thus will display two peaks.

An Example Forensic STR Multiplex Kit
[Figure: AmpFlSTR® Profiler Plus™ Kit, available from PE Biosystems (Foster City, CA). Loci are separated by size (100 bp to 400 bp) and by color: 5-FAM (blue) dye, JOE (green) dye, NED (yellow) dye, with the ROX (red) GS500 internal lane standard. Loci shown: D3, vWA, FGA, amelogenin (A), D8, D21, D18, D5, D13, D7. Caption: 9 STRs amplified along with sex-typing marker amelogenin in a single PCR reaction.]

Human Identity Testing with Multiplex STRs
[Figure: profiles of two different individuals typed with the AmpFlSTR® SGM Plus™ kit, plotted by DNA size (base pairs) from smaller to larger fragments. Loci shown: amelogenin, D3, D8, D19, VWA, TH01, D16, D21, D18, D2, FGA. Annotations: homozygous at TH01 (1 peak); heterozygous at D16 (2 peaks); amelogenin, male (2 peaks) and female (1 peak). Caption: Simultaneous Analysis of 10 STRs and Gender ID.]

Example of STR Allele Frequencies
[Figure: bar chart for the TH01 marker. Vertical axis: frequency (0 to 45). Horizontal axis: number of repeats (6, 7, 8, 9, 9.3, 10). Series: Caucasians (N=427), Blacks (N=414), Hispanics (N=414). *Proc. Int. Sym. Hum. ID (Promega) 1997, p. 34]

Probability Estimates
• Referring to the previous slide, the graph represents the frequency of each number of repeats at the TH01 locus.
• While it is known that the number of repeats (of the tetrameric sequence AATG) varies from 3 to 14, only the repeats of 6 to 10 are represented here.
• Generally, if this graph alone were the basis for probability estimates, the number of times each allele (repeat number) was observed, divided by the total number of samples (427 for Caucasians, 414 each for African Americans and Hispanics), would give the probability estimate of THAT allele at THAT locus in THAT specific population.
• This is repeated for all other alleles in each population, thus building probability estimates of specific alleles for each of the 13 CODIS loci for each ethnicity.
• **NOTE** How is a repeat of 9.3 possible???
– It has been observed that at some loci the repeats are not always complete; in a stretch of DNA containing an STR, most of the repeats are tetramers (4 nucleotides), but a partial repeat (e.g. 2 or 3 of the 4 nucleotides) may be present among them:
AATG|AATG|AATG|AATG|AATG|AATG|ATG|AATG|AATG|AATG
– Because the 3-nucleotide fragment occurs within the stretch of 4-nucleotide repeats, it is included as part of the STR, but the notation is different.
– In the above example, the number of repeats is written as 9.3, because there are 9 intact tetrameric repeats plus one partial repeat of 3 nucleotides (the “.3”).

Probability Estimates
• Databases of the frequency of each number of repeats at each locus in a given population are used to calculate probabilities.
• Probability estimates at each locus are multiplied by the estimates at the other loci to give an overall probability estimate.
• Because as many as 13 loci are analyzed and compared to questioned samples, the overall estimate can be rarer than 1 in a trillion, eliminating basically everyone else on the globe and underscoring the power of STR analysis.
• Therefore, the more loci examined, the more powerful the analysis.
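
The multiplication across loci can be sketched in Python. Everything numeric here is invented for illustration: the allele frequencies are hypothetical, only 3 loci are used instead of 13, and the per-locus genotype frequencies use the Hardy-Weinberg formulas (p x p for a homozygote, 2 x p x q for a heterozygote), which is an assumption not stated in the slides.

```python
from math import prod

# Hypothetical allele frequencies for one population (not real data).
allele_freq = {
    "TH01": {"6": 0.20, "9.3": 0.30},
    "D3S1358": {"15": 0.25},
    "FGA": {"21": 0.10, "24": 0.15},
}

# Hypothetical profile: the two alleles observed at each locus.
profile = {
    "TH01": ("6", "9.3"),     # heterozygous
    "D3S1358": ("15", "15"),  # homozygous
    "FGA": ("21", "24"),      # heterozygous
}


def genotype_frequency(locus, allele_a, allele_b):
    """Expected genotype frequency, assuming Hardy-Weinberg proportions."""
    p = allele_freq[locus][allele_a]
    q = allele_freq[locus][allele_b]
    return p * p if allele_a == allele_b else 2 * p * q


# One estimate per locus, then multiply them together.
per_locus = {
    locus: genotype_frequency(locus, a, b) for locus, (a, b) in profile.items()
}
random_match_probability = prod(per_locus.values())

for locus, freq in per_locus.items():
    print(f"{locus}: {freq:.4f}")
print(f"overall: about 1 in {1 / random_match_probability:,.0f}")
```

With these made-up numbers, three loci already give roughly 1 in 4,400; each additional locus multiplies in another small fraction, which is why the estimate shrinks so quickly as more loci are examined.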