A Short Course on DNA Forensic Statistics
(Organized by CWAG, April 12-16, 2010, Mexico City, Mexico)

DNA Forensic Statistics
(A 15-Lecture Short Course on Statistical Principles of DNA Forensics)

Organized and Delivered by
Ranajit Chakraborty, PhD
Director, Center for Computational Genomics, Institute of Investigative Genetics
Professor, Department of Forensic and Investigative Genetics
University of North Texas Health Science Center
Fort Worth, Texas 76107, USA
Tel. (817) 735-2421; Fax (817) 735-2424
e-mail: rchakrab@hsc.unt.edu

A. Learning Objectives and Organization of the Course:
• The course is designed to prepare students to understand the basic statistical and population genetic principles needed to develop statistical protocols for interpreting DNA forensic data.
• Population data, casework (DNA evidence) data, and their statistical interpretation will be discussed.
• Successful completion of this course should ensure that students are familiar with the assumptions underlying DNA forensic statistical computations and their interpretation.
• Reference materials supplied in the course should also equip students to understand the scope of general acceptance of such statistical interpretations in the legal use of DNA forensic statistics.
• Prerequisites: Students should be familiar with DNA forensic laboratory data and have some knowledge of DNA forensic scenarios and of how DNA results are used in forensic investigations.
• The course is given as 15 lectures over 5 days, each lasting 90 minutes (e.g., two lectures in the morning and one in the early afternoon).
• The instructor will be available for additional time to review the topics discussed each day. The review sessions after Parts III and IV may be devoted to using software to analyze real cases, so participating students may bring their own cases to apply the principles learned in the course.
• For each student, successful completion of the course will be judged on: (i) class attendance and participation in the discussions; and (ii) satisfactory performance in a take-home examination given at the end of the course.
• Each student must complete the take-home examination individually. Students may consult the lecture notes and reading materials, or search the internet on their own, to complete the examination, but copying from other students is strictly forbidden; if detected, it will count against certification of successful completion of the course.
• PowerPoint slides, a list of reference reading materials, and copies of key publications discussed in the course will be made available to the students for future reference.

B. Course Syllabus:

B.1. (Part I: Statistical Principles Used in DNA Forensic Statistics - Day 1)
• Lecture 1: Elements of Probability Theory (Concepts of random experiment, random variable, and probability distribution; Examples of standard distributions relevant to forensics, such as the binomial, multinomial, Poisson, normal, and chi-square; Measures of central value and variability of random variables: mean and variance), all illustrated with real examples relevant to DNA forensics
• Lecture 2: Estimation and Hypothesis Testing (Concepts of, and differences between, a parameter and a statistic; Features of good estimators; Precision vs. bias of estimation; Methods of estimation, such as the method of moments and the maximum likelihood method; Concepts of simple and composite hypotheses; Testing procedures and the two types of error (Type I and Type II); Methods of hypothesis testing, such as the likelihood ratio test; Concept and use of confidence intervals), all illustrated with examples relevant to DNA forensics
• Lecture 3: Nature of DNA Forensic Data (Discrete multinomial, bivariate, and multivariate distributions; Concepts of pairwise and mutual dependence; Conditional and marginal probabilities; Population databases; Data on relatives and their modeling based on population genetic models), with real-life example data provided

B.2. (Part II: Population Genetic Principles Used in DNA Forensic Statistics - Day 2)
• Lecture 4: Estimation of Allele and Genotype Frequencies at One or More Loci (Gene count method; Hardy-Weinberg equilibrium (HWE); Causes and nature of deviations from HWE; Linkage disequilibrium (LD) and its estimation; Effects of deviations from HWE and of LD on genotype frequencies; Effect of the presence of relatives in population databases)
• Lecture 5: Population Substructure and Genetic Distance (Theory of population substructure and its impact; Empirical data on human genetic variation in global populations, including data based on DNA forensic loci; Extent of population substructure in human populations)
• Lecture 6: Evolutionary Forces Underlying the Maintenance of Genetic Variation at DNA Forensic Loci (Concepts of genetic drift, mutation, and natural selection; Features and rates of mutation for autosomal STR, Y-STR, and mtDNA markers; Evidence of null and multiple alleles; Possible limitations of mutation rate estimates and their differences across types of loci; Effect of drift on these loci, as seen in population variability in large cosmopolitan versus small isolated populations)

B.3. (Part III: DNA Forensic Statistics Protocols – Standard Cases - Day 3)
• Lecture 7: DNA Forensic Issues and Their Statistical Assessment (Brief review of the history of DNA forensics and of changes in the DNA markers used in forensics; Currently employed sets of markers: autosomal STRs, mtDNA, and Y-STR haplotypes; Three generic types of forensic issues (transfer evidence, mixture analysis, and kinship analysis) and the questions associated with each; Three types of approaches to these issues (frequency-based, likelihood-based, and Bayesian logic); Assumptions and data requirements underlying these approaches)
• Lecture 8: Current Paradigm for Solving Transfer Evidence Cases (Issues for DNA from a single source; Discussion of the NRC-I and NRC-II rules and their adequacies and inadequacies; Changes of approach depending on the question asked; Random match probability (RMP) and conditional random match probability; Likelihoods, likelihood ratios, and their interpretation, including the concept of the prosecutor's fallacy; Database search and its impact on RMP calculation)
• Lecture 9: Statistics for DNA Mixture Analysis (Methods of deconvolution of DNA mixtures and their attendant assumptions; Concept of exclusion probability; Likelihood calculations based on mixture hypotheses; Effects of population substructure and of relationships between contributors to a mixture)

B.4. (Part IV: DNA Forensic Statistics Protocols – Complex Cases - Day 4)
• Lecture 10: Statistics for Kinship Analysis (Logic of parentage testing; Exclusion probability and Paternity Index; Generalization to reverse parentage; Pedigree-based likelihoods; Applications to missing person identification; Impact of population substructure and mutations)
• Lecture 11: Database Search Issues and Familial Search (Pairwise comparison of profiles and their I-T-O expectations; Multilocus allele and genotype sharing; Empirical data on observations and expectations; Effects of population subdivision and of the presence of relatives; Concept of familial search and the limitations of its use in large databases; Safeguards when using familial search results; Impact of mutations)
• Lecture 12: Lineage Markers and Relevant Statistics (mtDNA and Y-STR markers; Some features of mtDNA and Y-STR population databases; Genetic differences between populations based on mtDNA and Y-STR variation; FST for mtDNA and Y-STR haplotypes; Counting method for using mtDNA and Y-STR in forensic cases; Conservativeness achieved by a confidence interval versus by accounting for the substructure effect; Effects of mutation on matches based on lineage markers)

B.5. (Part V: DNA Forensic Statistics – More Recent Approaches and Limitations – Day 5)
• Lecture 13: SNPs and Their Utility in DNA Forensics (Brief outline of SNPs in the human genome, including the effects of mutation and genetic drift on SNPs; Different SNP typing platforms; Concept of haploblock organization of SNPs and the portability of haploblocks across populations; Number of SNPs needed for efficient forensic use; Possible approaches for increasing the efficiency of SNPs in DNA forensics)
• Lecture 14: Statistics for Low Copy Number (LCN) DNA Evidence (Brief biological background of LCN; Current practices for interpreting LCN data and concerns about such approaches; Concerns in interpreting LCN data from DNA mixtures; Possible approaches to refinement)
• Lecture 15: Joint Match Probabilities for mtDNA, Y-Chromosome and Autosomal Markers (Mendelian genetics of Y-chromosome and mtDNA inheritance; Meaning of a Y-STR/SNP and/or mtDNA match vs. a match based on autosomal loci; Match probability computations for these types of systems (i.e., Y-linked and mtDNA vs. autosomal); How to define the population and the parameters of substructure adjustments; Can or should the evidence from autosomal, Y-, and mtDNA markers be combined? Can the values be multiplied to obtain a combined value? Some recent results; Current recommendations)
• General Comments and Take-Home Examination (Experiences from presenting evidence and testimony in court; Addressing the issue of “right” vs. “wrong” questions; Explaining that multiple questions require multiple answers, which does not indicate a lack of general acceptance; Examples of using analogies to answer technical questions, e.g., the everyday use of sampling and estimation from small sample sizes; Non-technical interpretation of complicated statistics, e.g., the Np rule for a cold-hit statistic does not alter the RMP statistic for the rarity of a DNA profile)

Take-Home Examination and What Is Expected from It
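The General Comments point that the Np rule for a cold-hit (database search) statistic does not alter the RMP can be made concrete with a short Python sketch. It assumes the product rule across independent loci, the NRC-II genotype formulas (2pq for a heterozygote; p^2 + p(1 - p)θ for a homozygote, with θ = 0.01 in general and 0.03 for small isolated populations), and the NRC-II Np rule of multiplying the RMP by the number N of profiles searched. The function names, allele frequencies, and database size are hypothetical illustration choices, not course material.

```python
from math import prod

# Hypothetical helpers for illustration only; the formulas follow the
# NRC-II (1996) recommendations summarized in the lead-in.


def genotype_frequency(p, q=None, theta=0.01):
    """Single-locus genotype frequency.

    Heterozygote with allele frequencies p and q: 2pq.
    Homozygote (q omitted): p^2 + p(1 - p) * theta, the NRC-II
    substructure adjustment (theta = 0.01 in general, 0.03 for
    small isolated populations).
    """
    if q is None:
        return p * p + p * (1 - p) * theta
    return 2 * p * q


def random_match_probability(profile, theta=0.01):
    """Product rule over loci, assuming independence between loci.

    `profile` is a list of tuples: (p, q) for a heterozygous locus,
    (p,) for a homozygous locus.
    """
    return prod(genotype_frequency(*alleles, theta=theta) for alleles in profile)


def np_statistic(rmp, database_size):
    """NRC-II Np rule: the RMP multiplied by the number of profiles searched.

    This is a separate statistic for a database search; it does not
    modify the RMP, which still measures the rarity of the profile.
    """
    return rmp * database_size


# Hypothetical three-locus profile: heterozygote, homozygote, heterozygote.
profile = [(0.10, 0.20), (0.05,), (0.15, 0.25)]
rmp = random_match_probability(profile)
np_value = np_statistic(rmp, 100_000)
print(f"RMP: {rmp:.3e}")
print(f"Np for a 100,000-profile search: {np_value:.4f}")
```

With these made-up frequencies the RMP is about 8.9 × 10⁻⁶ whether or not a database was searched; the Np value (about 0.89) answers a different question, approximately how many coincidental matches a search of that size would be expected to produce. Keeping the two as separate functions mirrors the point that applying the Np rule leaves the rarity statistic itself unchanged.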