Haplotypes and imputed genotypes in diverse human populations

Noah Rosenberg
April 29, 2009
Human Genome Diversity Cell Line Panel
525,910 single-nucleotide polymorphisms in 29 populations
M Jakobsson et al. (2008) Nature 451:998-1003
Overview
• How do we measure and compare haplotype diversity across populations?
• Imputation in diverse populations
Which populations and genomic sites have more haplotype diversity?
[Figure: haplotypes in Population 1 and Population 2, drawn as rows of X and 0 marks]
Which populations and genomic sites have more haplotype diversity?
[Figure: haplotypes in Population 1 and Population 2, drawn with X marks only]
P Scheet, M Stephens (2006) AJHG 78:629-644
Which populations and genomic sites have more haplotype diversity?
[Figure, built up over five slides: Population 1 haplotypes are assigned along their length to haplotype clusters labeled Blue, Green, Orange, Pink, and Yellow, with numeric labels alongside the haplotypes]
Which populations and genomic sites have more haplotype diversity?
[Figure: heterozygosity (y-axis, 0 to 0.8) by site number (x-axis, 1 to 25) for Population 1 and Population 2]
Which populations and genomic sites have more haplotype diversity?
[Figure: haplotypes in Population 1 and Population 2, with annotations "Less diversity" and "More diversity"]
Haplotype cluster frequencies for a “typical” genomic region
M Jakobsson et al. (2008) Nature 451:998-1003
More haplotype diversity in Africa
[Figure panels: C Asia, Europe, America, Oceania, Africa, Middle East, East Asia]
M Jakobsson et al. (2008) Nature 451:998-1003
Less haplotype homozygosity and more haplotype diversity in Africa
M Jakobsson et al. (2008) Nature 451:998-1003
Genetic diversity declines with distance from Africa
[Figure axis: haplotype heterozygosity]
Haplotype clusters recover population structure
[Figure legend: Europe, Middle East, Central/South Asia, America, East Asia, Oceania, Africa]
M Jakobsson et al. (2008) Nature 451:998-1003
Haplotype clusters recover population structure
M Jakobsson et al. (2008) Nature 451:998-1003
Low haplotype diversity in the lactase region in Europe
[Figure panels: C Asia, Europe, America, Oceania, Africa, Middle East, East Asia]
M Jakobsson et al. (2008) Nature 451:998-1003
Haplotype cluster homozygosity as a test for selection
[Figure panels: Random region, Lactase region]
M Jakobsson et al. (2008) Nature 451:998-1003
Haplotype diversity – summary
• Haplotype clusters can be used to encode haplotypes pointwise for measurement of diversity
• Haplotype cluster diversity is greatest in Africa
• Low haplotype cluster diversity can potentially be used to detect selection
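A minimal sketch of the first bullet, assuming each haplotype has already been given a single cluster label at every site (for example by a clustering method in the style of Scheet and Stephens 2006). Heterozygosity at a site is taken here as 1 minus the sum of squared cluster frequencies. The function names and the toy labels are hypothetical, and the published analysis may treat cluster memberships differently.

```python
from collections import Counter


def cluster_heterozygosity(labels):
    """Return 1 - sum(p_k^2) over the cluster labels observed at one site."""
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one haplotype")
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def per_site_heterozygosity(cluster_matrix):
    """cluster_matrix[h][s] is the cluster label of haplotype h at site s."""
    n_sites = len(cluster_matrix[0])
    return [
        cluster_heterozygosity([hap[s] for hap in cluster_matrix])
        for s in range(n_sites)
    ]


# Toy data (hypothetical): 4 haplotypes encoded at 3 sites.
toy = [
    ["Blue", "Blue", "Green"],
    ["Blue", "Green", "Orange"],
    ["Blue", "Green", "Pink"],
    ["Blue", "Blue", "Yellow"],
]
het = per_site_heterozygosity(toy)
print(het)
```

A site where every haplotype falls in one cluster scores 0, and the score rises as haplotypes spread over more clusters, which is the sense in which low values could flag a candidate region for selection.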
Overview
• Measuring haplotype diversity using haplotype clusters
• Imputation in diverse populations
Imputed genotypes can be tested for disease association
[Figure: genotyped positions in the study sample shown against the reference panel]
Genotypes can be imputed using a reference panel – but imperfectly
Evaluating imputation accuracy in worldwide populations
• 443 individuals in 29 populations from the Human Genome Diversity Panel
• Genotypes at >500,000 SNPs (Jakobsson et al. Nature 451:998-1003, 2008)
• 420 HapMap reference haplotypes of ~2,000,000 SNPs, omitting offspring in trios
• Randomly hide 15% of genotypes in HGDP individuals and impute them with MACH
• Measure the proportion of alleles imputed correctly
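The last two bullets can be sketched as a mask-and-score loop. This is an illustration under stated assumptions, not the study's pipeline: genotypes are coded as allele counts (0, 1, 2), the helper names are hypothetical, and `impute_by_mode` is a deliberately naive stand-in for MACH that fills each gap with the most common observed genotype at that SNP.

```python
import random


def mask_genotypes(genotypes, fraction, rng):
    """Hide a random fraction of entries; return a masked copy and the hidden positions."""
    positions = [(i, j) for i, row in enumerate(genotypes) for j in range(len(row))]
    hidden = rng.sample(positions, round(fraction * len(positions)))
    masked = [list(row) for row in genotypes]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_by_mode(masked):
    """Placeholder imputer (not MACH): fill gaps with each SNP's most common genotype."""
    filled = [list(row) for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        mode = max(set(observed), key=observed.count) if observed else 0
        for row in filled:
            if row[j] is None:
                row[j] = mode
    return filled


def allele_accuracy(truth, imputed, hidden):
    """Proportion of hidden alleles imputed correctly, with genotypes as 0/1/2 counts."""
    correct = sum(2 - abs(truth[i][j] - imputed[i][j]) for i, j in hidden)
    return correct / (2 * len(hidden))


# Toy data (hypothetical): 20 individuals by 10 SNPs, with 15% of genotypes hidden.
rng = random.Random(0)
truth = [[rng.choice([0, 1, 2]) for _ in range(10)] for _ in range(20)]
masked, hidden = mask_genotypes(truth, 0.15, rng)
imputed = impute_by_mode(masked)
accuracy = allele_accuracy(truth, imputed, hidden)
print(f"allele-level accuracy on hidden genotypes: {accuracy:.3f}")
```

Scoring per allele rather than per genotype gives half credit when a heterozygote is imputed as a homozygote or the reverse, which is one way to read "proportion of alleles imputed correctly".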
Imputation accuracy
Imputation accuracy is predicted by haplotype diversity
L Huang et al. (2009) AJHG 84:235-250
Imputation accuracy is greatest with a close reference panel
L Huang et al. (2009) AJHG 84:235-250
Highest-accuracy reference panels match geographic locations
[Figure groups: Africa; Europe/W Asia; E Asia/Oceania/Americas]
L Huang et al. (2009) AJHG 84:235-250
Imputation accuracy can be increased using HapMap mixtures
• Instead of imputing based on separate HapMap panels, impute from mixtures
• Choose mixtures to have optimal size given specified ratios
L Huang et al. (2009) AJHG 84:235-250
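One reading of "optimal size given specified ratios" is the largest mixture that keeps the requested ratio without using more haplotypes than a panel has. The sketch below follows that reading only. The function name is hypothetical, ratios are restricted to non-negative integers, and the per-panel counts (120 CEU, 120 YRI, 180 CHB+JPT) are an assumption chosen to match the 420 reference haplotypes mentioned earlier.

```python
def mixture_sizes(available, ratios):
    """Largest per-panel haplotype counts in the given integer ratio.

    available: haplotypes on hand per panel, e.g. {"CEU": 120, ...}
    ratios:    non-negative integer weights per panel, e.g. {"CEU": 1, ...}
    """
    if set(available) != set(ratios):
        raise ValueError("available and ratios must name the same panels")
    if any(r < 0 for r in ratios.values()) or not any(ratios.values()):
        raise ValueError("ratios must be non-negative with at least one positive")
    # The panel that runs out first limits how far the ratio can be scaled up.
    scale = min(available[name] // r for name, r in ratios.items() if r > 0)
    return {name: scale * r for name, r in ratios.items()}


# Assumed panel sizes (haplotypes), summing to 420.
HAPMAP = {"CEU": 120, "YRI": 120, "CHB+JPT": 180}

equal_mix = mixture_sizes(HAPMAP, {"CEU": 1, "YRI": 1, "CHB+JPT": 1})
asia_heavy = mixture_sizes(HAPMAP, {"CEU": 1, "YRI": 1, "CHB+JPT": 2})
print(equal_mix)
print(asia_heavy)
```

Haplotypes for each panel would then be drawn at random up to the returned counts before running the imputation.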
Imputation accuracy can be increased using HapMap mixtures
L Huang et al. (2009) AJHG 84:235-250
Summary – imputation accuracy
• Strategies to improve imputation studies
  - Increased sample size
  - Improved imputation algorithms
  - Improved use of reference panels
  - Development of additional reference panels
  - Improved haplotyping
  - Use of additional data from relatives
Imputation – summary
• Imputation error and sample size inflation are greatest in Africa
• Several strategies may be available for improving imputation, including use of mixtures
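"Sample size inflation" in the first bullet can be illustrated with a standard approximation from association-study power calculations. It is offered here as an assumption about what the slide means: $r^2$ denotes the squared correlation between imputed and true genotypes at a site, and $N_{\text{direct}}$ and $N_{\text{imputed}}$ are notation introduced for this example, not symbols from the talk.

```latex
N_{\text{imputed}} \approx \frac{N_{\text{direct}}}{r^{2}},
\qquad
\text{e.g. } r^{2} = 0.8 \;\Rightarrow\; N_{\text{imputed}} \approx 1.25\, N_{\text{direct}}
```

Under this approximation, a population with lower imputation accuracy needs a proportionally larger sample to reach the power that direct genotyping would give.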
Rosenberg lab
James Degnan
Mike DeGiorgio
Lucy Huang
Mattias Jakobsson
Trevor Pemberton
Paul Scheet
Zach Szpiech
Jenna VanLiere
Chaolong Wang
Collaborators
Goncalo Abecasis (Michigan)
Raph Gibbs (NIA)
John Hardy (UCL)
Yun Li (Michigan)
Sonja Scholz (NIA)
Andy Singleton (NIA)
Funding
Alfred P. Sloan Foundation
Burroughs Wellcome Fund
National Institutes of Health
U of M Rackham Graduate School [M DeGiorgio]
U of M Center for Genetics in Health and Medicine [M Jakobsson]