Biology of STRs

Artifacts in Genotyping STRs
• A number of artifacts are possible:
  – Stuttering
  – Non-template additions
  – Microvariants
  – Three peaks
  – Allele dropouts
  – Mutations
• All interfere with reading a DNA profile accurately and consistently

Stuttering
• Stuttering is caused by the very structure of STRs that makes them good markers:
  – They are repeats
  – They are highly polymorphic
• A stutter product is a band that has the wrong number of repeats
• Either one repeat more or one repeat less
• Caused by strand slippage

Strand Slippage
[Figure: strand slippage in a GT repeat, shown in three stages: DNA replication or PCR, misalignment (a GT unit loops out), and elongation.]

Strand Slippage
• Occurs during the extension step of PCR
• The newly formed strand of DNA skips one repeat unit
  – It starts complementary base pairing with the next repeat
• This pushes out a non-base-paired loop from the template strand of DNA
• Usually causes a deletion of one repeat unit
  – Therefore the band will be one unit smaller than the true genotype

Strand Slippage
• In nature, this is the mechanism that makes repeats polymorphic
• When it happens during PCR it can produce a band that is not real:
  – Genotype will be wrong
  – One repeat unit lower or higher than reality
• Rarer in tetranucleotides than in di- or trinucleotide repeats
  – which is why tetranucleotides are used

Amount of Stutter Product
• Stutter is usually rare
• Therefore it might show as a small bump that can usually be differentiated from a true band
• The earlier in the PCR reaction the strand slips, the more stutter product will be produced
• Or, if the genotyping protocol doesn’t work well, the true band may be very low
  – Difficult to separate stutter band from true band

Stutter Products
[Figure: electropherogram with three peaks labeled “Stutter” and one labeled “?”. Exercise: call these genotypes.]

Calling Alleles
• Biggest problem with stutter bands:
  – They are the same size as a real allele!
• Especially difficult if you know the DNA sample is mixed
• Or if you are unsure whether the sample has been contaminated
• Difficult to distinguish:
  – Stutter band
  – Minor allele (because less DNA)

13 CODIS STR Loci
• All produce some stutter products
• Longer alleles produce more stuttering
  – Why does this make sense?
• Stutter percentages for tetranucleotides:
  – From less than 1%
  – Up to 15% of the true allele’s peak height
  – Therefore always calculate the small band’s peak height as a percentage of the large band’s
  – Be sure it is < 15% of the height of the large band

Reducing Stutter Products
• Changing PCR conditions
• Faster DNA polymerase
  – The faster it works, the less chance for slippage
• STRs with longer repeats (> 4 bp)
  – More difficult to “skip” past a repeat
• STRs with imperfect repeat units
  – Complex and compound repeats
  – More difficult to skip past a repeat if the next repeat unit’s sequence is different

Summary of Stutter Products
• One repeat unit more or less than real allele peaks
• Less than 15% of the real allele’s peak height
• Quantity of stutter band depends on:
  – When in the PCR reaction the first slippage occurs
  – Allele size (bigger alleles, more stutter)
  – PCR conditions
  – Polymerase used
  – Repeat length and sequence

Non-Template Additions
• Polymerase often adds an extra adenosine to the end of the newly formed sequence
• Not a part of the template sequence
• Makes the PCR product one base longer than the actual sequence
• If your PCR reaction forms both +A and -A products then your band will be wide

Non-Template Additions
• Want to have peaks as clear as possible
• Therefore want all PCR products to be identical
• Either all +A or all -A
• Imagine a case where you were genotyping a dinucleotide, with stutter, and half the products were +A and half were -A
• Impossible to separate genotypes

Non-Template Additions
• Set up PCR conditions so that every product will be +A
• Conditions:
  – Final extension for 10 mins
  – Allows all products to be fully adenylated
  – Primer ends in a guanosine
• Commercially available kits turn every allele (and ladder) into +A

Overloading Sample
• Signal on gel is too strong, so it will be difficult to call
• May result in a split peak
• Or a peak that is off scale
• Caused by:
  – Too much DNA sample in the PCR reaction
  – Primer concentrations too high
• This is why DNA quantification is so important

Non-Template Additions and Overloading Samples
[Figure 6.5, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press. Electropherograms of D3S1358, VWA, and FGA plotting relative fluorescence (RFUs) against DNA size (bp). Panels: 10 ng template (overloaded) and 2 ng template (suggested level). Peak labels: off-scale, -A, +A.]

Microvariants
• Remember these are variants of the repeat that are not a full repeat unit
• Example: TH01 9.3 allele
• Unlike stutter products, microvariants are not the same size as an expected allele
• The problem is determining whether there is a true microvariant in the person
• Or whether you are seeing a normal band shifted over for some genotyping reason

Microvariants
1. True microvariants must be validated as occurring in multiple samples
   • Even if a variant is rare, it must show up in more than one individual to be considered a true microvariant
2. Exact distance in base pairs should be calculated
   • 9.3 means 9 repeats plus 3 bases
   • Always calculate in bases exactly how “off” the microvariant is

Sequence Microvariants
• Sometimes there are also sequence differences in these polymorphisms as well as length differences
• The only way to genotype a sequence variant is to sequence the PCR product
• Not necessary for forensics because you are simply matching genotypes
• These variants are not important for forensic analysis

Peaks Outside of the Ladder
• Sometimes you will see a peak that is outside of the expected range for any marker (between markers?)
• What could cause this?
  – Unsuccessful PCR product
  – Primer dimers, etc.
  – Person really has a new allele
• Check with a different set of primers
• Sequence the new allele and region

Three Peaks
• Sometimes three bands may be seen
• What could cause three bands?
  – Stuttering
  – Mixed or contaminated samples
  – Genotyping error
  – True duplication or extra chromosome in the individual
• Need to validate what is seen in the gel

Three Peaks
1. Check other markers in the panel:
   – Is there evidence of mixed or contaminated samples in any other markers?
2. Check database information for this marker:
   – More than 50 tri-allelic patterns have been reported for the 13 CODIS loci
3. Sequence or genotype this region:
   – Is there truly a duplication or extra chromosome in this person?

Allele Dropout
• Most worrisome problem
• May call a person homozygous when really they are heterozygous
• What can this be caused by?
  – Larger allele is not amplified successfully
  – Primer site mutation
• Rare with the chosen tetranucleotides:
  – Alleles are very similar in size
  – Primers have been optimized and chosen in regions that are very stable

Avoiding Allele Dropout
• Choose primers carefully
• Work with polymorphisms that have alleles of similar size
• Always check genotypes with the Hardy-Weinberg equation
  – Make sure you see the expected number of heterozygotes population wide
• Most commercial kits have taken care of all these issues

“Fixing” Allele Dropouts
• Add a “degenerate” primer
  – Extra primer that matches the known polymorphism
  – Three primers total will be in the reaction
• Lower the annealing temperature
  – Reduces the stringency of primer binding
• Remember that in forensics what matters is matching genotypes
  – As long as the allele always drops out, there is no problem

Mutations
• STRs do mutate at an expected mutation rate over time
• Mutation may cause:
  – New alleles
  – Changes in primer binding regions
  – Sequence changes (less important)
• Very rare events
• Can be validated by examining families

Mendelization of Alleles
• Using family members to determine which alleles are possible
• If you know the parents’ alleles then there are only so many genotypes possible for the children
• Mendel’s law of segregation
• All STRs have been genotyped on CEPH families
  – Huge family sets from Utah

Mendelization of Alleles
[Figure: pedigree with genotypes 8/12, 3/8, 3/14, 8/14, 3/12, 5/11, 2/9, 2/11, 10/11]
• As always, must validate the mutation
• By sequencing or regenotyping

Mutation Rates
• Mutation rates of the 13 CODIS loci have been calculated over thousands of meioses
• All 13 are between 1 and 5 per 1,000 generational events
• Highest mutation rates:
  – Markers that are most polymorphic
• Lowest mutation rates:
  – Markers that are least informative

Impact of Mutations
• Paternity testing
  – Can cause problems
  – Because the father may not match his true child if the genotype has changed in the child
  – Compare many STR loci
• Identity matching
  – Will not cause a problem
  – Because the mutation will be consistent over a person’s lifetime and in all tissues

Genotyping Errors
• All the previous were artifacts that can be explained
• However, the problems you really worry about are unexplained errors
• Especially if the sample may be:
  – Contaminated
  – Mixed
• Need to always validate any artifact
• Be sure it’s not a genotyping error

Any Questions?
• Review Chapters 1 – 6
• Email me at least 2 questions you have about the first 6 chapters
• Next class will be review for Exam
• Exam One – February 5th
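
As a worked illustration of the rule from the stutter slides (a stutter peak sits one repeat unit above or below a real allele and is less than 15% of that allele’s peak height), here is a minimal Python sketch. The `Peak` class, the function names, and the example peak heights are hypothetical and are not taken from any genotyping software. The 15% figure is simply the upper bound quoted in the slides; a real laboratory would presumably use its own validated, locus-specific cutoffs.

```python
import math
from dataclasses import dataclass

# Upper bound from the slides: stutter is < 15% of the true allele's peak height.
# Assumption: one cutoff for every locus; real cutoffs may differ by locus and kit.
STUTTER_THRESHOLD = 0.15


@dataclass
class Peak:
    """Hypothetical data model for one called peak."""
    allele: float  # allele name as repeat count, e.g. 14
    height: float  # peak height in relative fluorescence units (RFU)


def stutter_ratio(candidate: Peak, parent: Peak) -> float:
    """Return the candidate's height as a fraction of the parent peak's height."""
    if parent.height <= 0:
        raise ValueError("parent peak height must be positive")
    return candidate.height / parent.height


def is_possible_stutter(candidate: Peak, parent: Peak,
                        threshold: float = STUTTER_THRESHOLD) -> bool:
    """True if the candidate is one repeat unit away from the parent
    and its height is below the threshold fraction of the parent's height."""
    one_repeat_apart = math.isclose(abs(candidate.allele - parent.allele), 1.0,
                                    abs_tol=1e-9)
    return one_repeat_apart and stutter_ratio(candidate, parent) < threshold


if __name__ == "__main__":
    parent = Peak(allele=14, height=1000.0)  # made-up example values
    for candidate in (Peak(13, 80.0), Peak(13, 400.0), Peak(12, 80.0)):
        print(candidate, is_possible_stutter(candidate, parent))
```

A peak that passes this check is only consistent with stutter. As the Calling Alleles slide notes, a minor allele in a mixed or contaminated sample can be the same size, so the check cannot rule that out on its own.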