Analysis of imputed rare variants

Andrew Morris
Advanced Topics in GWAS
Toronto, 30 May 2012
Introduction
• GWAS have been successful in detecting novel
loci for complex traits:
• typically characterised by common variants of
modest effect;
• together explain relatively little of the heritability.
• Low-frequency and rare variation may
contribute to the “missing” genetic
component of complex traits:
• IFIH1 and type 1 diabetes;
• MYH6 and sick sinus syndrome.
Rare variants and complex disease
• Rare variants are likely to have arisen from
founder effects in the last few generations.
• Rare variants are expected to have larger
effects on complex traits than common
variants.
• Statistical methods focus on the accumulation
of minor alleles at rare variants (mutational
load) within the same functional unit.
GRANVIL
• Test of association of phenotype with proportion of rare
variants at which individuals carry minor alleles.
Example: minor allele carrier status at 10 rare variants: 1 0 0 0 0 1 0 0 0 1
pi = 3/10
• Model disease phenotype via regression on pi and any
other covariates in GLM framework.
Reedik Magi
http://www.well.ox.ac.uk/GRANVIL/
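The proportion pi from the example above can be sketched in a few lines. This is a minimal illustration of the definition on the slide, not GRANVIL's own code; the function name is hypothetical.

```python
def rare_allele_proportion(carrier_flags):
    """Proportion of rare variants at which an individual carries a minor allele.

    carrier_flags: one entry per rare variant in the functional unit,
    1 if the individual carries at least one minor allele, else 0.
    """
    if not carrier_flags:
        raise ValueError("need at least one rare variant")
    return sum(carrier_flags) / len(carrier_flags)


# Carrier status at 10 rare variants, as in the slide example.
example = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
print(rare_allele_proportion(example))  # 0.3
```

The resulting pi would then enter the regression model as a single covariate per individual.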
Assaying rare genetic variation
• Gold-standard approach to assaying rare genetic
variation is through re-sequencing, which is
expensive on the scale of the whole genome.
• GWAS genotyping arrays are inexpensive, but are
not designed to capture rare genetic variation.
• Increasing availability of large-scale reference
panels of whole-genome re-sequencing data:
1000 Genomes Project and the UK10K Project.
• Impute into GWAS scaffolds up to these reference
panels to recover genotypes at rare variants at no
additional cost, other than computing.
GRANVIL: imputed variants
• Test of association of phenotype with proportion of rare
variants at which individuals carry minor alleles.
Example: posterior probability of carrying a minor allele at 10 rare variants: 0.9 0.1 0.2 0.1 0.1 0.8 0.1 0.1 0.1 0.6
pi = 3.0/10
• Replace direct genotypes with posterior probability of
heterozygous or rare homozygous call from imputation.
• Model disease phenotype via regression on pi and any
other covariates in GLM framework.
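A rough sketch of the imputed version: each hard genotype call is replaced by the posterior probability of a heterozygous or rare homozygous call, and the trait is regressed on the resulting pi. The genotype probabilities and trait values below are made up for illustration, the function names are hypothetical, and a plain least-squares fit with no other covariates stands in for the GLM.

```python
def expected_carrier_proportion(genotype_posteriors):
    """Expected proportion of rare variants at which a minor allele is carried.

    genotype_posteriors: one (P(common hom), P(het), P(rare hom)) triple
    per rare variant, as produced by imputation.
    """
    if not genotype_posteriors:
        raise ValueError("need at least one rare variant")
    total = 0.0
    for _p_common_hom, p_het, p_rare_hom in genotype_posteriors:
        total += p_het + p_rare_hom  # probability of carrying a minor allele
    return total / len(genotype_posteriors)


def simple_linear_regression(x, y):
    """Least-squares intercept and slope of y on a single covariate x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical posteriors for one individual at four rare variants.
posteriors = [(0.10, 0.80, 0.10), (0.90, 0.10, 0.00),
              (0.95, 0.05, 0.00), (0.80, 0.20, 0.00)]
p_i = expected_carrier_proportion(posteriors)

# Hypothetical pi values and quantitative trait values for four individuals.
intercept, slope = simple_linear_regression([0.0, 0.1, 0.2, 0.3],
                                            [1.0, 1.2, 1.4, 1.6])
```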
Study question
• Can we make use of imputation into GWAS
scaffolds up to re-sequencing reference panels
to detect rare variant associations with
complex traits?
• Simulation study performed to compare
power to detect association using GRANVIL for
four alternative strategies for assaying rare
genetic variation.
Design and analysis strategies
• Strategy 1. Re-sequence analysis cohort.
• Strategy 2. Genotype analysis cohort for variants in reference panel. Variants not present in reference panel will be missed.
• Strategy 3. Genotype analysis cohort with GWAS chip.
• Strategy 4. Genotype analysis cohort with GWAS chip and impute variants on reference panel. Recovery of rare variants on reference panel by imputation.
[Figure: schematic of the phased reference panel and analysis cohort for each strategy.]
Simulation study
• Simulate 1050kb region of genome containing
50kb gene in phased reference panel (120,
500 or 4000 individuals) and analysis cohort
(2000 individuals).
• Select causal variants within gene subject to
maximum MAF and total MAF.
• Simulate quantitative trait for analysis cohort
given causal variants and contribution to
overall trait variance.
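The trait simulation step could look roughly like the sketch below. It assumes an additive model in which every causal minor allele shifts the trait by the same amount, and it fixes the share of trait variance explained by scaling the genetic component. The slides do not give the exact simulation model, so the function name, the ten causal variants and the MAF of 0.5% each are all illustrative choices.

```python
import random
import statistics


def simulate_trait(burden, variance_explained, rng):
    """Simulate a quantitative trait with unit total variance (approximately).

    burden: per-individual count of minor alleles at the causal variants.
    variance_explained: fraction of trait variance due to the causal variants.
    """
    if not 0 < variance_explained < 1:
        raise ValueError("variance_explained must lie strictly between 0 and 1")
    mean_b = statistics.mean(burden)
    sd_b = statistics.pstdev(burden)
    if sd_b == 0:
        raise ValueError("no variation in causal allele burden")
    genetic_sd = variance_explained ** 0.5
    noise_sd = (1 - variance_explained) ** 0.5
    # Standardised burden scaled to the target variance, plus Gaussian noise.
    return [genetic_sd * (b - mean_b) / sd_b + rng.gauss(0, noise_sd)
            for b in burden]


rng = random.Random(2012)
# Hypothetical cohort: 2000 individuals, 10 causal variants with MAF 0.5% each
# (total MAF 5%), so each individual has 20 chances of carrying a minor allele.
burden = [sum(rng.random() < 0.005 for _ in range(20)) for _ in range(2000)]
trait = simulate_trait(burden, 0.05, rng)
```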
Simulation study
• Apply each strategy and test for association of
rare variants (MAF<1% in analysis cohort) with
quantitative trait using GRANVIL.
• Strategies 3 and 4: GWAS Illumina 660K chip.
• Strategy 4: Imputation performed using IMPUTEv2
allowing a “buffer” of 500kb, with low quality
imputed variants (info score < 0.4) excluded from
analysis.
• Assess power to detect association at a
nominal 5% significance threshold.
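Power in a simulation study of this kind is usually estimated as the fraction of replicates in which the test p-value falls below the threshold. A minimal sketch, with a hypothetical function name and made-up p-values:

```python
def empirical_power(p_values, alpha=0.05):
    """Fraction of simulation replicates with p-value below alpha."""
    if not p_values:
        raise ValueError("need at least one replicate")
    return sum(p < alpha for p in p_values) / len(p_values)


# Four hypothetical replicates, two of which reach nominal significance.
print(empirical_power([0.001, 0.2, 0.04, 0.5]))  # 0.5
```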
[Figure: power at nominal 5% significance threshold, assuming 5% contribution to trait variance. Maximum MAF of causal variant: 1%; total MAF of causal variants: 5%.]
[Figure: power at nominal 5% significance threshold, assuming 5% contribution to trait variance. Maximum MAF of causal variant: 0.5%; total MAF of causal variants: 2%.]
Comments
• We can recover up to 80% of the power to
detect rare variant associations attained through
re-sequencing by imputation into GWAS data.
• Essential to include a “buffer” for imputation.
• As the MAF of causal variants decreases, larger
reference panels offer greater power.
• Limiting assumptions of simulation study:
• No re-sequencing or phasing errors in the reference
panel, and no miscalled or missing genotypes in the
analysis cohort.
• Reference panel ascertained from same population as
analysis cohort.
Application to WTCCC
• GWAS of seven complex human diseases from
the UK (2000 cases each and 3000 shared
controls from 1958 British Birth Cohort and
National Blood Service):
• bipolar disorder (BD), coronary artery disease
(CAD), Crohn’s disease (CD), hypertension (HT),
rheumatoid arthritis (RA), type 1 diabetes (T1D)
and type 2 diabetes (T2D).
• Individuals genotyped using the Affymetrix
GeneChip 500K Mapping Array Set.
Quality control
• Samples excluded on the basis of
mismatch with external data, low
call rate, outlying heterozygosity,
duplication, relatedness, and non-European ancestry.
• SNPs excluded on the basis of:
• call rate <95% (<99% if MAF <5%);
• extreme deviation from HWE (exact
p<5.7x10^-7);
• MAF <1%.
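The three SNP exclusion rules above can be written as a single predicate. This is a sketch only: the function name is hypothetical, and per-SNP call rate, MAF and exact HWE p-value are assumed to have been computed already.

```python
def snp_passes_qc(call_rate, maf, hwe_p):
    """Apply the SNP exclusion criteria listed on the slide.

    call_rate and maf are proportions in [0, 1]; hwe_p is the exact
    Hardy-Weinberg equilibrium test p-value.
    """
    # Stricter call-rate requirement for lower-frequency SNPs.
    min_call_rate = 0.99 if maf < 0.05 else 0.95
    if call_rate < min_call_rate:
        return False
    if hwe_p < 5.7e-7:
        return False
    if maf < 0.01:
        return False
    return True


print(snp_passes_qc(call_rate=0.97, maf=0.20, hwe_p=0.3))  # True
print(snp_passes_qc(call_rate=0.97, maf=0.03, hwe_p=0.3))  # False
```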
Cohort      Samples passing QC
Controls    2,938
BD          1,868
CAD         1,926
CD          1,748
HT          1,952
RA          1,860
T1D         1,963
T2D         1,924
A total of 16,179 samples and 391,060 high-quality autosomal
SNPs carried forward for analysis
Fine-scale UK population structure
• Fine-scale population structure may have
greater impact on rare variants than on
common SNPs because of recent founder
effects.
• Utilised EIGENSTRAT to construct principal
components to represent axes of genetic
variation across the UK: 27,770 high-quality
LD pruned (r^2<0.2) common autosomal SNPs
(MAF>5%).
Imputation
• SNPs mapped to NCBI build 37 of human
genome.
• Samples imputed up to 1000 Genomes Phase 1
cosmopolitan reference panel (June 2011 interim
release).
• 8.23M imputed autosomal rare variants
(MAF<1%) polymorphic in WTCCC.
• 5.38M (65.3%) were “well-imputed” (i.e. info
score > 0.4) and carried forward for analysis.
• Mean info score was 0.618, and 17.3% had info
score > 0.8.
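The selection of variants carried forward can be sketched as a filter on MAF and info score. The record type, field names and example values are hypothetical; in practice the MAF and info values would be read from the imputation output.

```python
from typing import NamedTuple


class ImputedVariant(NamedTuple):
    variant_id: str
    maf: float   # minor allele frequency in the analysis cohort
    info: float  # imputation info score


def select_well_imputed_rare(variants, max_maf=0.01, min_info=0.4):
    """Keep polymorphic rare variants (0 < MAF < max_maf) with info > min_info."""
    return [v for v in variants if 0 < v.maf < max_maf and v.info > min_info]


# Made-up variants: only the first is rare, polymorphic and well imputed.
variants = [
    ImputedVariant("var1", 0.004, 0.82),
    ImputedVariant("var2", 0.004, 0.31),
    ImputedVariant("var3", 0.150, 0.95),
    ImputedVariant("var4", 0.000, 0.90),
]
kept = select_well_imputed_rare(variants)
```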
Rare variant analysis
• Test for association of each disease with
accumulation of rare variants (MAF<1%) within
genes using GRANVIL.
• Gene boundaries defined from UCSC human
genome database (build 37).
• Analyses adjusted for three principal components
to account for fine-scale UK population structure.
• Genome-wide significance threshold p<1.7x10^-6:
Bonferroni adjustment for 30,000 genes.
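The threshold follows from dividing an overall error rate by the number of genes tested. The sketch below assumes an overall 5% significance level, which is not stated on the slide but reproduces the quoted figure after rounding.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 30000)  # assumed overall alpha of 5%
print(f"{threshold:.2e}")  # 1.67e-06
```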
No evidence of residual
population structure
Rare variant association with CAD
• Genome-wide significant evidence of association of CAD
with rare variants in the gene PRDM10 (p=4.9x10^-8).
• Gene contains 122 well imputed rare variants with mean
MAF of 0.23%.
• Accumulations of minor alleles across these variants
were associated with decreased risk of disease: odds
ratio 0.828 (0.774-0.886) per minor allele.
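As a rough consistency check, a reported odds ratio and its interval can be converted back into a test statistic. The sketch assumes that the interval is a 95% Wald confidence interval on the log-odds scale, which the slide does not state. Because the inputs are rounded, the recovered p-value should match the reported one only to within an order of magnitude.

```python
import math


def wald_from_odds_ratio(odds_ratio, ci_lower, ci_upper, z_crit=1.959964):
    """Recover log odds ratio, standard error, z and two-sided p-value.

    Assumes (ci_lower, ci_upper) is a Wald interval of half-width
    z_crit standard errors on the log-odds scale.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return log_or, se, z, p_value


# Odds ratio and interval reported for PRDM10 and CAD.
log_or, se, z, p_value = wald_from_odds_ratio(0.828, 0.774, 0.886)
print(f"z = {z:.2f}, p = {p_value:.1e}")
```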
Rare variant association with T1D
• Genome-wide significant evidence of association of T1D
with rare variants in multiple genes from the MHC.
• Strongest signal of association observed for HLA-DRA
(p=2.0x10^-13).
• Gene contains 23 well imputed rare variants with mean
MAF of 0.32%.
• Accumulations of minor alleles across these variants were
associated with decreased risk of disease: odds ratio 0.556
(0.476-0.650) per minor allele.
T1D association across the MHC
• Ten genes achieve genome-wide significant evidence of
rare variant association with T1D.
[Figure: genes labelled: HLA-DRA, PBMUCL2, NCR3, SLC44A4, HLA-DRB5, PBX2, TNXA, EHMT2, AGPAT1, C6orf10.]
T1D association across the MHC
• After additional adjustment for additive effect of lead
GWAS common variant from the MHC (rs9268645).
[Figure: genes labelled: PBX2, SLC44A4, PBMUCL2, EHMT2, HLA-DRA, SKIVL2, TNXB, AGPAT1, HLA-DRB5, HLA-DMA.]
Comments
• GRANVIL assumes the same direction of effect on
the trait of all rare variants within the functional
unit.
• Methods allowing for different directions of
effect of rare variants are well established for
re-sequencing data, and are being generalised to
allow for imputation.
• The most powerful rare variant test will depend
on the underlying genetic architecture of the
trait.
Summary
• Simulations suggest that we can recover up to 80%
of the power to detect rare variant associations
attained through re-sequencing by imputation into
GWAS data.
• Requires no additional cost, other than
computation, which is not trivial!
• Imputation up to 1000 Genomes reference panel
into GWAS data from WTCCC highlighted:
• novel association of rare genetic variation in PRDM10
with CAD;
• complex genetic architecture underlying T1D association
across the MHC involving multiple genes.
Lab practical
• Use GRANVIL to test for association of T1D
with imputed rare variants within genes across
the MHC, using data from the WTCCC.
• Investigate the impact on results of:
• the MAF threshold for inclusion of rare variants in
the analysis;
• filtering rare variants on the basis of annotation;
• gene boundary definition.
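The three choices in the practical (MAF threshold, annotation filter, gene boundaries) all come down to which variants are passed to the test for a given gene. The sketch below shows one way to make that selection explicit before running the analysis. It is not GRANVIL's interface: the record type, the function, the flank parameter, the annotation labels and the coordinates are all hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    position: int     # base-pair position on the chromosome
    maf: float        # minor allele frequency
    annotation: str   # e.g. "missense", "intronic" (illustrative labels)


def variants_for_gene(variants, gene_start, gene_end, max_maf=0.01,
                      flank=0, allowed_annotations=None):
    """Select variants for one gene under the chosen analysis settings.

    flank extends the gene boundary on both sides (in base pairs);
    allowed_annotations=None means no annotation filter is applied.
    """
    lower = gene_start - flank
    upper = gene_end + flank
    selected = []
    for v in variants:
        if not lower <= v.position <= upper:
            continue
        if not 0 < v.maf < max_maf:
            continue
        if allowed_annotations is not None and v.annotation not in allowed_annotations:
            continue
        selected.append(v)
    return selected


# Made-up variants around a made-up gene spanning positions 1000 to 2000.
variants = [
    Variant(1200, 0.004, "missense"),
    Variant(1500, 0.008, "intronic"),
    Variant(1800, 0.030, "missense"),
    Variant(2400, 0.002, "intronic"),
]
default_set = variants_for_gene(variants, 1000, 2000)
wider_set = variants_for_gene(variants, 1000, 2000, max_maf=0.05, flank=500)
coding_set = variants_for_gene(variants, 1000, 2000,
                               allowed_annotations={"missense"})
```

Comparing the results across such settings shows how sensitive each gene-level signal is to the analysis choices.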