Sequential Kernel Association Tests for the
Combined Effect of Rare and Common Variants
Journal club (Nov/13)
SH Lee
Introduction
• Sequence data
– Rare and unidentified variants
• Groupwise association tests
– Omnibus tests
– Burden test, CMC test, SKAT test
• Up-weighting rare variants, down-weighting common variants
• Rare/common variants tested separately
Introduction
• This study develops a joint test of rare/common
– Combining burden/SKAT test for rare/common
• Can be applied to
– whole exome sequencing + GWAS
– Deep resequencing of GWAS loci
• Basically can analyse all variants including rare,
low-frequency and common variants
• Simulation (type 1 error, power)
• Real data, CD and Autism
Materials and Methods
Definition of rare/common
• MAF < 0.01: rare
• MAF 0.01-0.05: low frequency
• MAF > 0.05: common
Or
• <1/sqrt(2*n) rare
• >1/sqrt(2*n) common
– n = 500, rare MAF < 0.031
– n = 10000, rare MAF < 0.007
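The sample-size-dependent cut-off above can be reproduced with a few lines of Python. This is a minimal sketch; the function names `rare_maf_cutoff` and `classify_variant` are hypothetical and only illustrate the 1/sqrt(2n) rule stated on the slide.

```python
import math


def rare_maf_cutoff(n_individuals: int) -> float:
    """Return the MAF threshold 1/sqrt(2n) separating rare from common variants."""
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    return 1.0 / math.sqrt(2 * n_individuals)


def classify_variant(maf: float, n_individuals: int) -> str:
    """Label a variant 'rare' if its MAF is below the cut-off, else 'common'."""
    return "rare" if maf < rare_maf_cutoff(n_individuals) else "common"


if __name__ == "__main__":
    # The two sample sizes quoted on the slide.
    for n in (500, 10000):
        print(n, rare_maf_cutoff(n))
```

With this rule the same variant can change class with sample size: a variant with MAF 0.02 counts as rare when n = 500 but as common when n = 10000.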
Materials and Methods
• Testing for the overall effect of rare and
common variants
– Rare for Burden test
– Common for SKAT test
Weighted-sum statistics
Fisher's method of combining the p-values
Weighted-sum statistics
• Within a region (e.g. a gene) having m variants
– g(*) is a linear or logistic link function
– Alpha is for covariates
– X is n x m matrix
– Beta is regression coefficient and random variable
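The model these bullets describe is not shown in the extracted text. A sketch of the usual form, assuming C denotes the covariate matrix (as in the likelihood on the next slide), w_j is a per-variant weight, and τ is the variance component being tested, would be:

```latex
g\big(\mathrm{E}[y_i]\big) = C_i\,\alpha + X_i\,\beta,
\qquad
\mathrm{E}[\beta_j] = 0,\quad \mathrm{Var}(\beta_j) = w_j^{2}\,\tau,
\qquad
H_0:\ \tau = 0 .
```

Treating β as random turns the test of m regression coefficients into a test of the single variance component τ, which is why a score test is used.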
Weighted sum score test
(Variance component score test)
L(\alpha, V \mid C, y) = -\tfrac{1}{2}\ln|V| - \tfrac{1}{2}\ln|C^{\top}V^{-1}C| - \tfrac{1}{2}(y - C\alpha)^{\top}V^{-1}(y - C\alpha)
Taking the first derivative of the log-likelihood with respect
to the variance component τ
P-value from the scaled chi-square distribution κχ²_ν
κ is the scale parameter, ν is the degrees of freedom
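One common way to obtain κ and ν is moment matching (a Satterthwaite-type approximation). The sketch below assumes the null distribution of the score statistic is a mixture Q = Σ λ_j χ²_1 with known mixture coefficients λ_j; that assumption, and the function name `scaled_chi2_params`, are illustrative and not taken from the slides.

```python
def scaled_chi2_params(eigenvalues):
    """Match the mean and variance of Q = sum_j lambda_j * chi2_1 to kappa * chi2_nu.

    E[Q] = sum(lambda) and Var[Q] = 2 * sum(lambda^2), while kappa * chi2_nu has
    mean kappa * nu and variance 2 * kappa^2 * nu. Solving the two equations gives
    kappa = sum(lambda^2) / sum(lambda) and nu = sum(lambda)^2 / sum(lambda^2).
    """
    lam = [float(x) for x in eigenvalues]
    s1 = sum(lam)                 # E[Q]
    s2 = sum(x * x for x in lam)  # Var[Q] / 2
    if s1 <= 0 or s2 <= 0:
        raise ValueError("mixture coefficients must have positive sum")
    kappa = s2 / s1
    nu = s1 * s1 / s2
    return kappa, nu


if __name__ == "__main__":
    # Hypothetical mixture coefficients, for illustration only.
    print(scaled_chi2_params([3.0, 1.0]))
```

The p-value is then the upper tail probability of χ²_ν evaluated at Q/κ; since ν is generally not an integer, that step needs a statistics library.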
Weighted sum score test
(Variance component score test)
Wu et al (2010) AJHG 86: 929;
Liu et al (2008) BMC Bioinformatics 9: 292;
Lin (1997) Biometrika 84: 309;
White (1982) Econometrica 50: 1
Weighted sum score test
(Variance component score test)
• ρ : the correlation between regression coefficients
• If perfectly correlated (ρ = 1), they will be all the same
after weighting, and one should collapse the variants
first before running regression, i.e., the burden test
• If the regression coefficients are unrelated to each
other, one should use SKAT
Lee et al. (2012) AJHG 91: 224
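The two extremes of ρ can be illustrated numerically. The sketch below assumes the statistics are built from weighted per-variant scores S_j = w_j Σ_i g_ij (y_i − μ_i), with Q_SKAT = Σ S_j², Q_burden = (Σ S_j)², and Q_ρ = (1 − ρ) Q_SKAT + ρ Q_burden. Constant factors differ between papers, and all function names here are hypothetical.

```python
def weighted_scores(genotypes, residuals, weights):
    """Per-variant weighted scores S_j = w_j * sum_i g_ij * (y_i - mu_i).

    genotypes: list of rows (one per individual), each with m allele counts.
    residuals: y_i - mu_i under the null model (covariates only).
    weights:   one weight per variant.
    """
    m = len(weights)
    return [
        weights[j] * sum(row[j] * r for row, r in zip(genotypes, residuals))
        for j in range(m)
    ]


def q_skat(scores):
    """Sum of squared scores: effects may differ in size and direction (rho = 0)."""
    return sum(s * s for s in scores)


def q_burden(scores):
    """Square of the summed scores: variants collapsed before testing (rho = 1)."""
    return sum(scores) ** 2


def q_rho(scores, rho):
    """Interpolate between SKAT (rho = 0) and the burden test (rho = 1)."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return (1.0 - rho) * q_skat(scores) + rho * q_burden(scores)


if __name__ == "__main__":
    # Toy data: 3 individuals, 2 variants, unit weights (illustration only).
    g = [[0, 1], [1, 0], [2, 1]]
    res = [0.5, -0.5, 0.5]
    s = weighted_scores(g, res, [1.0, 1.0])
    print(q_skat(s), q_burden(s), q_rho(s, 0.5))
```

When scores share a sign, the burden statistic accumulates them before squaring. When signs are mixed they cancel in the burden statistic but not in SKAT, which is the intuition behind the two bullets above.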
Burden-C, SKAT-C
• Partitioning rare and common variants
• Combined test statistic for rare and common
– Weighting beta(p,1,25) for rare,
– beta(p,0.5,0.5) for common
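The two weighting schemes can be computed directly from the beta density. This is a minimal sketch using only the standard library; it assumes the rare/common split uses the 1/sqrt(2n) cut-off and that the weight is the density itself (some implementations use it as the square root of the weight). The function names are hypothetical.

```python
import math


def beta_density(p: float, a: float, b: float) -> float:
    """Beta(a, b) density at p, for 0 < p < 1, computed on the log scale."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(p) + (b - 1.0) * math.log1p(-p))


def variant_weight(maf: float, n_individuals: int) -> float:
    """Beta(1, 25) weight for rare variants, Beta(0.5, 0.5) for common ones."""
    cutoff = 1.0 / math.sqrt(2 * n_individuals)  # assumed separation rule
    if maf < cutoff:
        return beta_density(maf, 1.0, 25.0)
    return beta_density(maf, 0.5, 0.5)


if __name__ == "__main__":
    # Illustrative MAF values with n = 1000 individuals.
    for maf in (0.001, 0.01, 0.1, 0.3):
        print(maf, variant_weight(maf, 1000))
```

Beta(1, 25) decays steeply with MAF, so the rarest variants dominate the rare-variant statistic, whereas Beta(0.5, 0.5) is much flatter across the common range.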
Other methods
• Burden-A, SKAT-A
– Adaptive combining rare/common
– Searching φ for the minimum p-value
• Burden-F, SKAT-F
– Fisher’s combination method
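Fisher's combination of the rare-variant and common-variant p-values can be sketched as follows. The statistic −2 Σ ln p_i is referred to a chi-square distribution with 2k degrees of freedom, which assumes the k p-values are independent. The closed-form tail used here is valid for even degrees of freedom only, and the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper tail of a chi-square distribution with an even number of df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values) -> float:
    """Combine independent p-values with Fisher's method."""
    ps = list(p_values)
    if not ps or any(not 0.0 < p <= 1.0 for p in ps):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in ps)
    return chi2_sf_even_df(stat, 2 * len(ps))


if __name__ == "__main__":
    # Hypothetical rare-variant and common-variant p-values.
    print(fisher_combine([0.01, 0.20]))
```

For two p-values the combined value reduces to p1·p2·(1 − ln(p1·p2)), which is a convenient hand check.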
Simulation
• Sequence data on 10,000 haplotypes on 1 Mb
region
• Calibrated model for the European pop
• Random sample of a region of 5 or 25 kb and
simulated data with 1000-5000 individuals
• Proportion of cases in the sample is 0.5
Disease model
Methods
Type I error
• The type I error rates of the proposed methods agree with
the expected levels
Power (separation cut-off)
• Using burden-C test
• Power with different separation cut-offs
• 1/sqrt(2n) will be used further
Power (proposed methods)
• Power for 8 different tests
• The proposed combination tests outperform the other tests
Power
• Rare/common causal variants (model 1, 2, 3, 6)
– The combination methods perform better
Power
• Common causal variants (model 5)
– The combination methods perform better
• Rare causal variants (model 4)
– The combination methods perform similarly
Power (proposed methods)
• The proposed combination methods outperform CMC for
all 6 disease models
• The proposed combination methods outperform the
original SKAT for all 6 disease models
Power
• For models 1-4, which include only risk variants
– SKAT is better than Burden when the proportion of risk variants is small (10%)
– Burden is better than SKAT when the proportion of risk variants is large (30%)
Power
• Models 1-3, which include both rare and common risk variants
– SKAT-F is better than Burden-F regardless of the proportion of risk variants
• Model 5, which includes only common risk variants
– SKAT is better than Burden regardless of the proportion of risk variants
Power
• Adaptive test (SKAT-A, Burden-A)
– Perform worse than SKAT-C and Burden-C
• Results for a region of size 5 kb were similar
Real data
• CD NOD2 sequence data
– 453 cases, 103 controls
– 60 single nucleotide variants (9 of them have
MAF > 0.05)
– Because only pooled frequency counts were available
for each variant, individual-level sequencing data
were simulated.
• Autism LRP2 sequencing data
– 430 cases, 379 controls
Real data
• The combination methods are more powerful than the others
Discussion
• The proposed combination methods
– Partitioning rare/common
– Powerful approach
– Better than CMC (rare/common partitioning)
– Better than original Burden and SKAT test
– Extend to family-based designs
Discussion
• T1D HLA region
– SKAT (2.7e-43)
– Wald test (6.7e-49)
– Likelihood ratio test (8.9e-221)
• LD between regions
• Multiple different components within a region
• Thanks
Linear SKAT vs individual variant test
statistics
• Linear SKAT (lower) and the individual variant test
(upper) are equivalent
• Three disease models for power comparison