The new Y Chromosome Haplotype Reference Database (YHRD) and optimized approaches for forensic Y-STR analysis

Sascha Willuweit & Lutz Roewer
Institut für Rechtsmedizin und Forensische Wissenschaften
Charité – Universitätsmedizin Berlin

2000, 2004, 2008, 2014

©Charité – Universitätsmedizin Berlin, Dept. Forensic Genetics 2015

Workshop schedule
September 1st, 2015, 2:30 pm – 6:30 pm
• Different frequency estimation methods implemented in the YHRD
• Mixture analysis using the YHRD
• Kinship analysis using the YHRD
• Ancestry information retrievable from the YHRD
• Subpopulation analysis (AMOVA) using the YHRD
• Discussion of casework examples

YHRD - Increasing numbers

Frequency estimation

Frequency estimation methods
Constant estimators, variable estimators
• Augmented counting (1/(n + 1))
• Counting with database inflation (Brenner's κ)
• Surveying method (Krawczak)
• Coalescence-based estimation (Caliebe)
• Discrete Laplace method (Andersen)
Enabled in YHRD

Frequency estimation for Y-STR profiles

Frequency estimation for rare haplotypes with "kappa inflation" (0 observations)

Loci      Count (N)   Singletons   With kappa
23 loci   19,593      19,593       n.a.
17 loci   71,246      55,675       κ = 0.78: 3.0 × 10⁻⁶ (1.4 × 10⁻⁵)*
9 loci    125,700     30,450       κ = 0.24: 6.0 × 10⁻⁶ (7.9 × 10⁻⁶)*

* counting

P̂(T = h0 | S = h0) = (1 − κ) / (N + 1)
κ: proportion of singletons, an estimator of the proportion of not sampled rare haplotypes in the database

Comparison of estimators for rare haplotypes
Discrete Laplace vs.
counting, kappa and surveying methods, using a simulated population of 1 million, a database size of 1,000 and a proportion of singletons of κ = 0.864
Courtesy of M.M. Andersen (Copenhagen)

Fig. 1 Comparison of (1) the relative frequency of a haplotype (number of times it has been observed divided by the database size) and (2) the estimated haplotype frequency using the discrete Laplace method. Note that for frequently observed haplotypes, t...
Mikkel Meyer Andersen, Poul Svante Eriksen, Niels Morling. Cluster analysis of European Y-chromosomal STR haplotypes using the discrete Laplace method. Forensic Science International: Genetics, Volume 11, 2014, 182-194

Interpretation tools implemented in YHRD
• Mixture analysis (frequency and LR based)
• Kinship calculation (frequency and LR based)
• Population substructure (AMOVA, Fst/Rst, MDS)
• Ancestry information (AI)

Mixture analysis

Casework example
Offence: sexual assault
Evidence: contact stain on clothing
Autosomal analysis
• only ♀ component
• no ♂ admixture in AMELOGENIN
Y-chromosomal analysis
• male mixture (major, minor component)

Analysis with the mixture analysis tool (partial Y23 profiles)

Result for PowerPlex Y23 (20 loci)

Reanalysis using reduced PPY12 profiles

Result for PowerPlex Y12 (10 loci)

Reanalysis using further reduced 9-locus minHt profiles

Result for minHt (7 loci)

Kinship

The Y chromosome: a linearly inherited, haploid marker system

For which cases?

Likelihood calculation (LR) / Brotherhood
(Probability of observing the haplotypes given the same father vs. probability of observing the haplotypes given different fathers)
L(X) = µ/2 × [f(A) + f(B)]
L(Y) = f(A) × f(B)
µ = mutation rate
f = haplotype frequency (YHRD)
A, B: same or different fathers?
• Locus-specific µ for one-step mutations, see YHRD
• For hypothesis X, the probability of "non-mutation" (1 − µ) at each locus is also considered
• Rolf et al. (Int. J. Legal Med. 2001); Buckleton et al. (CRC Press, 2005)

Brothers?
A, B: same or different fathers?
µ = 3.6 × 10⁻³ *
A: 14, 13, 31, 24, 11, 13, 14, 11-11, 14, 13   f(A) = 1.4 × 10⁻⁴ *
B: 14, 13, 31, 25, 11, 13, 14, 11-11, 14, 13   f(B) = 2.3 × 10⁻⁵ *
* YHRD
Related: L(X) = 1.4 × 10⁻⁴ × 1 × µ/2 + 2.3 × 10⁻⁵ × 1 × µ/2 = 2.9 × 10⁻⁷ (the factor 1 is the number of meioses)
Unrelated: L(Y) = 1.4 × 10⁻⁴ × 2.3 × 10⁻⁵ = 3.2 × 10⁻⁹
LR (X/Y) = 91

Influence of the local mutation rate on LR
A, B: father–son or unrelated?
µ = 2.1 × 10⁻³ (moderate) *
14, 13, 31, 24, 11, 13, 14, 11-11, 14, 13   f(B) = 1.4 × 10⁻⁴ *
14, 13, 31, 24, 11, 14, 14, 11-11, 14, 13   f(A) = 2.3 × 10⁻⁵ *
* YHRD
L(X) = 1.4 × 10⁻⁴ × 1 × µ/2 = 1.5 × 10⁻⁷
L(Y) = 1.4 × 10⁻⁴ × 2.3 × 10⁻⁵ = 3.2 × 10⁻⁹
LR (X/Y) = 46.8

Influence of the local mutation rate on LR
A, B: father–son or unrelated?
µ = 1.2 × 10⁻² (rapid) *
14, 13, 30, 24, 11, 13, 14, 11-12, 14, 13   f(B) = 1.4 × 10⁻⁴ *
14, 13, 30, 24, 11, 14, 14, 11-12, 14, 13   f(A) = 2.3 × 10⁻⁵ *
* YHRD
L(X) = 1.4 × 10⁻⁴ × 1 × µ/2 = 8.4 × 10⁻⁷
L(Y) = 1.4 × 10⁻⁴ × 2.3 × 10⁻⁵ = 3.2 × 10⁻⁹
LR (X/Y) = 262.5

Common ancestor?
Meioses: 5 and 7
µ = 2.1 × 10⁻³ (moderate) *
A: 14, 13, 31, 24, 11, 13, 13, 11-12, 14, 13   f(obs) = 1.4 × 10⁻⁴ *
B: 14, 13, 31, 24, 11, 14, 13, 11-12, 14, 13   f(obs) = 2.3 × 10⁻⁵ *
* YHRD
L(X) = 1.4 × 10⁻⁴ × 7 × µ/2 + 2.3 × 10⁻⁵ × 5 × µ/2 = 1.1 × 10⁻⁶
L(Y) = 1.4 × 10⁻⁴ × 2.3 × 10⁻⁵ = 3.2 × 10⁻⁹
LR (X/Y) = 343

Common ancestor?
Meioses: 5 and 7
µ = 1.2 × 10⁻² (rapid) *
A: 14, 13, 31, 24, 11, 13, 13, 11-12, 14, 13   f(obs) = 1.4 × 10⁻⁴ *
B: 14, 13, 31, 24, 11, 14, 13, 11-12, 14, 13   f(obs) = 2.3 × 10⁻⁵ *
* YHRD
L(X) = 1.4 × 10⁻⁴ × 7 × µ/2 + 2.3 × 10⁻⁵ × 5 × µ/2 = 6.6 × 10⁻⁶
L(Y) = 1.4 × 10⁻⁴ × 2.3 × 10⁻⁵ = 3.2 × 10⁻⁹
LR (X/Y) = 2053

Ranking of Y-STR mutation rates (mc = multi-copy locus)

Position  Locus            Mutation rate  Meioses  Group
1         DYS438           2.96E-04       10122    slow
2         DYS392           4.04E-04       14867    slow
3         DYS393           1.09E-03       13713    slow
4         DYS437           1.19E-03       10101    slow
5         DYS448           1.65E-03       6678     slow
6         DYS390           2.06E-03       15061    medium
7         DYS385 (mc)      2.30E-03       25620    medium
8         DYS19            2.32E-03       15539    medium
9         YGATAH4          2.47E-03       7709     medium
10        DYS391           2.54E-03       14935    medium
11        DYS389I          2.68E-03       13788    medium
12        DYS635           3.72E-03       7525     medium
13        DYS389II         3.78E-03       13759    medium
14        DYS456           4.19E-03       6678     medium
15        DYS481           4.97E-03       1744     medium
16        DYS533           5.01E-03       1730     medium
17        DYS439           5.35E-03       10096    medium
18        DYS460           6.22E-03       1717     medium
19        DYS458           6.74E-03       6677     medium
20        DYS518           1.84E-02       1556     fast
21        DYF387S1ab (mc)  1.59E-02       1804     fast
22        DYS576           1.43E-02       1727     fast
23        DYS570           1.24E-02       1426     fast
24        DYS627           1.23E-02       1766     fast
25        DYS449           1.22E-02       1617     fast
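
The kinship likelihood ratios worked through on the preceding slides can be recomputed with a short Python sketch. It follows the slide formulas only: L(X) is the sum of f × meioses × µ/2 over the haplotypes considered, and L(Y) is the product of the two haplotype frequencies. The function names are hypothetical and are not part of any YHRD interface, and the per-locus "non-mutation" factor (1 − µ) mentioned on the formula slide is left out, as it is in the worked examples. Because the slides round intermediate values, the unrounded results differ slightly from the printed LRs (for example about 45.7 instead of 46.8 for the moderate father–son case).

```python
from typing import Iterable, Tuple


def likelihood_related(terms: Iterable[Tuple[float, int]], mu: float) -> float:
    """L(X): sum of frequency * meioses * mu/2 over the haplotypes considered."""
    return sum(freq * meioses * mu / 2 for freq, meioses in terms)


def likelihood_unrelated(f_a: float, f_b: float) -> float:
    """L(Y): product of the two haplotype frequencies."""
    return f_a * f_b


def likelihood_ratio(terms, f_a: float, f_b: float, mu: float) -> float:
    """LR (X/Y) = L(X) / L(Y)."""
    return likelihood_related(terms, mu) / likelihood_unrelated(f_a, f_b)


# Haplotype frequencies used on the slides (taken from YHRD).
F_1 = 1.4e-4
F_2 = 2.3e-5

# Brothers: both haplotypes contribute, one meiosis each, mu = 3.6e-3.
lr_brothers = likelihood_ratio([(F_1, 1), (F_2, 1)], F_1, F_2, mu=3.6e-3)

# Father-son, moderate locus: a single term with one meiosis, mu = 2.1e-3.
lr_father_son = likelihood_ratio([(F_1, 1)], F_1, F_2, mu=2.1e-3)

# Common ancestor, moderate locus: 7 and 5 meioses as on the slide.
lr_ancestor = likelihood_ratio([(F_1, 7), (F_2, 5)], F_1, F_2, mu=2.1e-3)

print(f"brothers:        LR = {lr_brothers:.1f}")
print(f"father-son:      LR = {lr_father_son:.1f}")
print(f"common ancestor: LR = {lr_ancestor:.1f}")
```

Passing the terms as (frequency, meioses) pairs keeps one function usable for the brother, father–son and common-ancestor cases; swapping in a rapid mutation rate (µ = 1.2 × 10⁻²) shows how strongly the LR depends on the locus that differs.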
Likelihood Ratio (LR, KI) calculation for Y-STRs
A: f(A) = 1/123 *
D: f(D) = 1/388 **
* Program uses counting (discrete Laplace extrapolation: 1/311)
** Program uses counting (discrete Laplace extrapolation: 1/821)

Population analysis (AMOVA)

YHRD: Test on population substructure (Fst, Rst)
(Example: 17,278 Chinese individuals in 52 populations)

Ancestry information

Fast and slowly mutating Y markers
• Y-SNPs: µ = 10⁻⁹ to 10⁻¹², irreversible, stable phylogeny
• Y-STRs: µ = 10⁻³, recurrent, networks
Example Y-SNP sequences:
TCGAGGTATTAAC
TCTAGGTATTAAC
TCGAGGCATTAAC
TCTAGGTGTTAAC
TCGAGGTATTAGC
TCTAGGTATCAAC
Example Y-STR haplotypes:
17,13,30,25,10,11,13,10-14
16,13,30,25,10,11,13,10-14
17,13,31,25,10,11,13,10-14
17,13,30,24,10,11,13,10-14
17,13,30,25,10,11,13,11-14
17,13,29,25,10,11,13,10-15
17,13,30,26,10,11,13,10-14
17,13,30,25,10,11,14,10-14
17,13,30,25,11,11,13,10-14

Y-STR gradients (7 loci)
Roewer et al., Hum Genet 2005

Y-SNP gradients (R1a)
Fechner et al., Am J Phys Anthropol 2008

Is ancestry prediction possible?
• Biogeographical analysis using Y markers does not predict nationality, residency or phenotype
• Y markers provide very useful information on the deep ancestry of a paternal lineage and its proliferation (radiation) over time until today
Haplogroup J2a: Semino et al. 2004 (n = 2400)
37 Y marker analysis (Geppert et al. 2010)
Skeleton in a trolley, 5 g femur extracted
Haplotype: 14,13,30,22,10,11,12,13-16,...

Unknown skeletonized person – extract, type, search and add "ancestry information"

Ancestry information – three features and heat map

Heat map (searched haplotypes are reduced to the most representatively sampled minHt)

Searched haplotype is compared with a database of STR+SNP typed samples

Hg prediction is prone to IBS errors (as evidenced by YHRD)!
Mandatory: Y-SNP analysis using (mini)sequencing
• SNaPshot method (Hierarchical Multiplex Analysis)
Geppert M & Roewer L (2012) SNaPshot® minisequencing analysis of multiple ancestry-informative Y-SNPs using capillary electrophoresis. Methods Mol Biol. 830:127-40.
J2a: Turkey, Fertile Crescent, Caucasus, Mediterranean

Caution!
15,13,30,22,10,11,12,15-16 – 2 matches in YHRD
"Most frequent neighbour": 15,13,29,22,10,11,12,15-16 – 22 matches
Legend: each dot is one population sample (on average 120 individuals), with matching populations marked in red

But: SNaPshot analysis
• Haplogroup E-M2
• Highest frequency in West Africa (~80%) and Central Africa (~60%), not India
• Discrepancy between Y-STR and Y-SNP distribution!

Part II: Casework examples
• Frequency estimation
• Mixture
• Kinship
• Ancestry
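
For the frequency estimation casework item, the numbers on the earlier "kappa inflation" slide for a haplotype with 0 observations can be recomputed with a minimal Python sketch. It rests on two assumptions that the slide does not spell out: κ is taken as singletons divided by database size, and the kappa estimate is taken as (1 − κ)/(N + 1). The function names are hypothetical and do not correspond to any YHRD interface.

```python
def kappa(n: int, singletons: int) -> float:
    """Proportion of singletons in a database of n haplotypes (assumed definition)."""
    return singletons / n


def augmented_counting(n: int, observations: int = 0) -> float:
    """Augmented counting estimate: (observations + 1) / (n + 1)."""
    return (observations + 1) / (n + 1)


def kappa_estimate(n: int, singletons: int) -> float:
    """Kappa-inflated estimate for an unobserved haplotype: (1 - kappa) / (n + 1).

    Assumed form; it returns 0.0 when every haplotype is a singleton,
    which is the case the slide marks as "n.a.".
    """
    return (1 - kappa(n, singletons)) / (n + 1)


# Database sizes and singleton counts as shown on the slide.
datasets = {
    "23 loci": (19593, 19593),
    "17 loci": (71246, 55675),
    "9 loci": (125700, 30450),
}

for label, (n, singletons) in datasets.items():
    print(
        f"{label}: kappa = {kappa(n, singletons):.2f}, "
        f"kappa estimate = {kappa_estimate(n, singletons):.2e}, "
        f"counting = {augmented_counting(n):.2e}"
    )
```

With all haplotypes being singletons, as in the 23-locus set, κ equals 1 and the estimate collapses to zero, which is one way to see why the slide gives no kappa value for that marker set.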