Haplotypes and imputed genotypes in diverse human populations
Noah Rosenberg
April 29, 2009

Human Genome Diversity Cell Line Panel
525,910 single-nucleotide polymorphisms in 29 populations
M Jakobsson et al. (2008) Nature 451:998-1003

Overview
• How do we measure and compare haplotype diversity across populations?
• Imputation in diverse populations

Which populations and genomic sites have more haplotype diversity?
[Figure: haplotypes from Population 1 and Population 2, one row per haplotype, with each site marked 0 or X]

Which populations and genomic sites have more haplotype diversity?
[Figure: Population 1 and Population 2 haplotype diagram]
P Scheet, M Stephens (2006) AJHG 78:629-644

Which populations and genomic sites have more haplotype diversity?
[Figure: Population 1 haplotypes annotated with haplotype clusters, built up one cluster per slide: Blue, Green, Orange, Pink]

Which populations and genomic sites have more haplotype diversity?
[Figure: Population 1 haplotypes annotated with all five haplotype clusters: Blue, Green, Orange, Pink, Yellow]

Which populations and genomic sites have more haplotype diversity?
[Figure: heterozygosity (y-axis, 0 to 0.8) by site number (x-axis, 1 to 25) for Population 1 and Population 2]

Which populations and genomic sites have more haplotype diversity?
[Figure: Population 1 and Population 2 haplotypes, with regions marked "Less diversity" and "More diversity"]

Haplotype cluster frequencies for a “typical” genomic region
M Jakobsson et al. (2008) Nature 451:998-1003

More haplotype diversity in Africa
[Figure: panels labeled C Asia, Europe, America, Oceania, Africa, Middle East, East Asia]
M Jakobsson et al. (2008) Nature 451:998-1003

Less haplotype homozygosity and more haplotype diversity in Africa
M Jakobsson et al. (2008) Nature 451:998-1003

Genetic diversity declines with distance from Africa
[Figure: axis label "Haplotype heterozygosity"]

Haplotype clusters recover population structure
[Figure: groups labeled Europe, Middle East, Central/South Asia, America, East Asia, Oceania, Africa]
M Jakobsson et al. (2008) Nature 451:998-1003

Haplotype clusters recover population structure
M Jakobsson et al. (2008) Nature 451:998-1003

Low haplotype diversity in the lactase region in Europe
[Figure: panels labeled C Asia, Europe, America, Oceania, Africa, Middle East, East Asia]
M Jakobsson et al. (2008) Nature 451:998-1003

Haplotype cluster homozygosity as a test for selection
[Figure: panels labeled "Random region" and "Lactase region"]
M Jakobsson et al. (2008) Nature 451:998-1003

Haplotype diversity – summary
• Haplotype clusters can be used to encode haplotypes pointwise for measurement of diversity
• Haplotype cluster diversity is greatest in Africa
• Low haplotype cluster diversity can potentially be used to detect selection

Overview
• Measuring haplotype diversity using haplotype clusters
• Imputation in diverse populations

Imputed genotypes can be tested for disease association
[Figure: labels "Genotyped positions", "Study sample", "Reference panel"]

Genotypes can be imputed using a reference panel – but imperfectly

Evaluating imputation accuracy in worldwide populations
• 443 individuals in 29 populations from the Human Genome Diversity Panel (HGDP)
• Genotypes at >500,000 SNPs (M Jakobsson et al. (2008) Nature 451:998-1003)
• 420 HapMap reference haplotypes of ~2,000,000 SNPs, omitting offspring in trios
• Randomly hide 15% of genotypes in HGDP individuals and impute them with MACH
• Measure the proportion of alleles imputed correctly

Imputation accuracy

Imputation accuracy is predicted by haplotype diversity
L Huang et al. (2009) AJHG 84:235-250

Imputation accuracy is greatest with a close reference panel
L Huang et al. (2009) AJHG 84:235-250

Highest-accuracy reference panels match geographic locations
[Figure: groups labeled Africa, Europe/W Asia, E Asia/Oceania/Americas]
L Huang et al. (2009) AJHG 84:235-250

Imputation accuracy can be increased using HapMap mixtures
• Instead of imputing based on separate HapMap panels, impute from mixtures
• Choose mixtures to have optimal size given specified ratios
L Huang et al. (2009) AJHG 84:235-250

Imputation accuracy can be increased using HapMap mixtures
L Huang et al. (2009) AJHG 84:235-250

Summary – imputation accuracy
• Strategies to improve imputation studies
  - Increased sample size
  - Improved imputation algorithms
  - Improved use of reference panels
  - Development of additional reference panels
  - Improved haplotyping
  - Use of additional data from relatives

Imputation – summary
• Imputation error and sample size inflation are greatest in Africa
• Several strategies may be available for improving imputation, including use of mixtures

Rosenberg lab
James Degnan, Mike DeGiorgio, Lucy Huang, Mattias Jakobsson, Trevor Pemberton, Paul Scheet, Zach Szpiech, Jenna VanLiere, Chaolong Wang

Collaborators
Goncalo Abecasis (Michigan), Raph Gibbs (NIA), John Hardy (UCL), Yun Li (Michigan), Sonja Scholz (NIA), Andy Singleton (NIA)

Funding
• Alfred P. Sloan Foundation
• Burroughs Wellcome Fund
• National Institutes of Health
• U of M Rackham Graduate School [M DeGiorgio]
• U of M Center for Genetics in Health and Medicine [M Jakobsson]
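
Illustrative sketch (not from the talk): the two measurements used in these slides, per-site heterozygosity of haplotype cluster labels and the proportion of hidden alleles imputed correctly, can be mimicked on toy data as below. Everything here is an assumption made for illustration: the function names are hypothetical, heterozygosity is taken as the plain 1 - sum(p_k^2) with no sample-size correction, genotypes are coded as 0/1/2 copies of one allele, and a most-common-genotype fill stands in for MACH, which this sketch does not call.

```python
import random
from collections import Counter


def heterozygosity(labels):
    """Expected heterozygosity, 1 - sum(p_k^2), of the labels at one site."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def choose_hidden(genotypes, fraction, rng):
    """Randomly choose a fraction of (individual, SNP) cells to hide."""
    cells = [(i, j) for i, row in enumerate(genotypes) for j in range(len(row))]
    return rng.sample(cells, round(fraction * len(cells)))


def impute_by_mode(genotypes, hidden):
    """Placeholder imputer (NOT MACH): fill each hidden cell with the most
    common genotype still observed at that SNP, or 0 if none is observed."""
    hidden_set = set(hidden)
    imputed = [row[:] for row in genotypes]  # copy so the truth is untouched
    for j in range(len(genotypes[0])):
        observed = [row[j] for i, row in enumerate(genotypes)
                    if (i, j) not in hidden_set]
        fill = Counter(observed).most_common(1)[0][0] if observed else 0
        for i in range(len(genotypes)):
            if (i, j) in hidden_set:
                imputed[i][j] = fill
    return imputed


def allele_accuracy(truth, imputed, hidden):
    """Proportion of hidden alleles imputed correctly.

    Assumes 0/1/2 allele-count coding, so a genotype that is off by d
    copies has 2 - d of its two alleles right.
    """
    correct = sum(2 - abs(truth[i][j] - imputed[i][j]) for i, j in hidden)
    return correct / (2 * len(hidden))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data only: 20 individuals by 10 SNPs, genotypes drawn at random.
    genotypes = [[rng.choice([0, 1, 2]) for _ in range(10)] for _ in range(20)]
    hidden = choose_hidden(genotypes, 0.15, rng)  # hide 15%, as in the talk
    imputed = impute_by_mode(genotypes, hidden)
    print("cluster heterozygosity:",
          heterozygosity(["blue", "blue", "green", "pink"]))
    print("allele accuracy:", allele_accuracy(genotypes, imputed, hidden))
```

Counting alleles rather than whole genotypes gives partial credit when a heterozygote is imputed as a homozygote, which is one natural reading of "proportion of alleles imputed correctly"; a real evaluation would replace `impute_by_mode` with output from an actual imputation program and a reference panel.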