Sequence Kernel Association Tests for the Combined Effect of Rare and Common Variants
Journal club (Nov/13)
SH Lee

Introduction
• Sequence data
  – Rare and unidentified variants
• Groupwise association tests
  – Omnibus tests
  – Burden test, CMC test, SKAT test
• Up-weighting for rare variants, down-weighting for common variants
• Rare and common variants are tested separately

Introduction
• This study develops a joint test of rare and common variants
  – Combines burden/SKAT tests for rare/common variants
• Can be applied to
  – Whole exome sequencing + GWAS
  – Deep resequencing of GWAS loci
• Basically, it can analyse all variants, including rare, low-frequency and common variants
• Simulation (type I error, power)
• Real data: CD and autism

Materials and Methods
Definition of rare/common
• MAF < 0.01: rare
• MAF 0.01-0.05: low frequency
• MAF > 0.05: common
Or
• MAF < 1/sqrt(2n): rare
• MAF > 1/sqrt(2n): common
  – n = 500: rare if MAF < 0.031
  – n = 10000: rare if MAF < 0.007

Materials and Methods
• Testing for the overall effect of rare and common variants
  – Rare variants: burden test
  – Common variants: SKAT test
• Weighted-sum statistics
• Fisher's method of combining the p-values

Weighted-sum statistics
• Within a region (e.g. a gene) having m variants
  – g(·) is a linear or logistic link function
  – α: regression coefficients for the covariates
  – X is the n × m genotype matrix
  – β: regression coefficients for the variants, treated as random variables

Weighted sum score test (Variance component score test)
$$L(\alpha, V \mid C, y) = -\frac{1}{2}\ln|V| - \frac{1}{2}\ln|C'V^{-1}C| - \frac{1}{2}(y - C\alpha)'V^{-1}(y - C\alpha)$$
• Take the first derivative of the log-likelihood with respect to the variance component τ
• P-value from $\kappa\chi^2_\nu$, where κ is the scale parameter and ν is the degrees of freedom

Weighted sum score test (Variance component score test)
Wu et al. (2010) AJHG 86: 929; Liu et al. (2008) BMC Bioinformatics 9: 292; Lin (1997) Biometrika 84: 309; White (1982) Econometrica 50: 1

Weighted sum score test (Variance component score test)
• ρ: the correlation between regression coefficients
• If the coefficients are perfectly correlated (ρ = 1), they are all the same after weighting, so one should collapse the variants before running the regression, i.e., the burden test
• If the regression coefficients are unrelated to each other (ρ = 0), one should use SKAT
Lee et al. (2012) AJHG 91: 224

Burden-C, SKAT-C
• Partitioning rare and common variants
• Combined test statistic for rare and common variants
  – Weighting beta(p, 1, 25) for rare
  – Weighting beta(p, 0.5, 0.5) for common

Other methods
• Burden-A, SKAT-A
  – Adaptive combination of rare/common
  – Searching φ for the minimum p-value
• Burden-F, SKAT-F
  – Fisher's combination method

Simulation
• Sequence data on 10,000 haplotypes for a 1 Mb region
• Calibrated model for the European population
• Randomly sampled a region of 5 or 25 kb and simulated data for 1000-5000 individuals
• Proportion of cases in the sample is 0.5

Disease model

Methods

Type I error
• The proposed methods agree with the expectation

Power (separation cut-off)
• Using the Burden-C test
• Power with different separation cut-offs
• 1/sqrt(2n) is used from here on

Power (proposed methods)
• Power for 8 different tests
• The proposed combination tests outperform the other tests

Power
• Rare/common causal variants (models 1, 2, 3, 6)
  – The combination methods perform better

Power
• Common causal variants (model 5)
  – The combination methods perform better
• Rare causal variants (model 4)
  – The combination methods perform similarly

Power (proposed methods)
• The proposed combination methods outperform CMC for all 6 disease models
• The proposed combination methods outperform the original SKAT for all 6 disease models

Power
• For models 1-4, which include only risk variants
  – SKAT is better than Burden when the proportion of risk variants is small (10%)
  – Burden is better than SKAT when the proportion of risk variants is large (30%)

Power
• Models 1-3, which include both rare and common risk variants
  – SKAT-F is better than Burden-F regardless of the proportion of risk variants
• Model 5, which includes only common risk variants
  – SKAT is better than Burden regardless of the proportion of risk variants

Power
• Adaptive tests (SKAT-A, Burden-A)
  – Perform worse than SKAT-C and Burden-C
• Results for a region of size 5 kb were similar

Real data
• CD (Crohn's disease) NOD2 sequence data
  – 453 cases, 103 controls
  – 60 single nucleotide variations (9 of them have MAF > 0.05)
  – Because only pooled frequency counts were available for each variant, the sequencing data were simulated
• Autism LRP2 sequencing data
  – 430 cases, 379 controls

Real data
• The combination methods are more powerful than the others

Discussion
• The proposed combination methods
  – Partitioning rare/common
  – Powerful approach
  – Better than CMC (rare/common partitioning)
  – Better than the original Burden and SKAT tests
  – Extension to family-based designs

Discussion
• T1D HLA region
  – SKAT (2.7e-43)
  – Wald test (6.7e-49)
  – Likelihood ratio test (8.9e-221)
• LD between regions
• Multiple different components within a region
• Thanks

Linear SKAT vs individual variant test statistics
• Linear SKAT (lower) and the individual variant test (upper) are equivalent
• Three disease models for power comparison
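
Supplementary sketch (not from the paper): the 1/sqrt(2n) rare/common cut-off and Fisher's combination method used by Burden-F/SKAT-F can be illustrated in a few lines of Python. The function names below are hypothetical, variants exactly at the cut-off are assigned to the common group by assumption, and Fisher's method is applied as if the rare and common p-values were independent. It is meant as a minimal illustration, not as the authors' implementation.

```python
import math


def rare_maf_cutoff(n):
    """MAF threshold 1/sqrt(2n) separating rare from common, for n individuals."""
    return 1.0 / math.sqrt(2 * n)


def split_by_maf(mafs, n):
    """Return (rare_indices, common_indices) for a list of minor allele frequencies.

    Assumption: a variant exactly at the cut-off is treated as common.
    """
    cutoff = rare_maf_cutoff(n)
    rare = [i for i, p in enumerate(mafs) if p < cutoff]
    common = [i for i, p in enumerate(mafs) if p >= cutoff]
    return rare, common


def fisher_combine(p_values):
    """Fisher's method for k independent p-values, each in (0, 1].

    The statistic -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    upper tail has a closed form:
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # equals statistic / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Cut-offs for the two sample sizes quoted on the definition slide
    print(rare_maf_cutoff(500), rare_maf_cutoff(10000))
    # Hypothetical p-values from a rare-variant test and a common-variant test
    print(fisher_combine([0.04, 0.20]))
```

With n = 500 the cut-off is about 0.0316 and with n = 10000 about 0.0071, in line with the values on the definition slide. With a single p-value, `fisher_combine` returns that p-value unchanged, which is a quick sanity check on the tail formula.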