Sebastian Zöllner
University of Michigan
Keng-Han Lin, Matthew Zawistowski, Mark Reppell

GWAS have been successful.
Only some heritability is explained by common variants.
Uncommon coding variants (MAF 0.5%-5%) explain less.
Rare variants could explain some of the 'missing' heritability.
◦ Better risk prediction.
◦ Rare variants may identify new genes.
◦ Rare exonic variants may be easier to annotate functionally and interpret.

Testing individual variants is unfeasible.
◦ Limited power due to small number of observations.
◦ Multiple testing correction.
Alternative: joint test.
◦ Burden test (CMAT, Collapsing, WSS)
◦ Dispersion test (SKAT, C-alpha)

Gene-based tests have low power.
◦ Nelson et al. (2010) estimated that 10,000 cases and 10,000 controls are required for 80% power in half of the genes.
Large sample size required.
More heterogeneous sample => danger of stratification.
Stratification may differ from common variants in magnitude and pattern.

(202 genes, n=900/900, MAF < 1%, nonsense/nonsynonymous variants)
[Figure: expected number of variants per kb and sample size, by population: African-American, Southern Asia, South-Eastern Europe, South-Western Europe, Western Europe, Central Europe, North-Western Europe, Eastern Europe, Northern Europe, Finland.]
A gradient in diversity from Southern to Northern Europe.

• Measure of rare variant diversity.
• Probability of two carriers of the minor alleles being from different populations (normalized).
[Figure: median EU-EU values of 0.71, 0.86, and 0.98.]

1. Select 2 populations.
2. Select mixing parameter r.
3. Sample 30 variants from the 202 genes.
4. Calculate inflation based on observed frequency differences.
(Zawistowski et al. 2014)

If multiple affected family members are collected, it may be more powerful to sequence all family members.
Family-based tests can be robust against stratification.
TDT-type tests are potentially inefficient.
How to leverage low frequency?
◦ Low frequency risk variants should be more common in cases.
◦ And even more common on chromosomes shared among many cases.

• Consider affected sibpairs.
• Estimate IBD sharing.
• Compare the number of rare variants on shared (solid) and non-shared (blank) chromosomes.
• Any aggregate test can be applied.
[Diagram: sibpair chromosomes for IBD sharing S=0, S=1, S=2.]
Twice as many non-shared as shared chromosomes.

Null hypothesis determines test:
◦ Shared alleles : Non-shared alleles = 1 : 2. Test for linkage or association.
◦ Shared alleles : Non-shared alleles = Shared chromosomes : Non-shared chromosomes. Test for association only.

IBD sharing is known.
Individuals don't need phase to identify shared variants.
Except one configuration: IBD 1 and both sibs are heterozygous.
◦ Configuration 1: +1 shared
◦ Configuration 2: +2 non-shared
Under the null, the probability of configuration 2 is the allele frequency.
Under the alternative, we need to use multiple imputation.

Assume chromosome sharing status (S=0, S=1, S=2) is known for each sibpair.
Count rare variants; impute sharing status for double-heterozygotes.
Compare the number of rare variants between shared and non-shared chromosomes with a chi-squared test (burden style).

Designs compared: Classic Case-Control, Internal Control, Selected Cases (diagram labels: S=0, S=1, S=2).

Consider 2 populations.
p=0.01 in pop1, p=0.05 in pop2.
1000 sibpairs for internal control design.
1000 cases, 1000 controls for selected cases.
1000 cases and 1000 controls for case-control.
Sample cases from pop1 with a varying proportion.
Test for association with α=0.05.
[Figure: Type I error rate against proportion (0.0 to 1.0) for Internal Control, Selected Cases, and Conventional designs.]

Realistic rare variant models are unknown:
◦ Typical allele frequency
◦ Number of risk variants/gene
◦ Typical effect size
◦ Distribution of effect sizes
◦ Identifiability of risk variants
Goal: create a model that summarizes these unknowns into
◦ Summed allele frequency
◦ Mean effect size
◦ Variance of effect size

Assume many loci carrying risk variants.
Risk alleles at multiple loci each increase the risk by a factor independently.
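The burden-style comparison of rare variants on shared and non-shared chromosomes can be illustrated with a minimal Python sketch. It assumes IBD sharing is known for every sibpair, skips the imputation step for double-heterozygotes, and uses a one-degree-of-freedom goodness-of-fit chi-squared statistic for the association-only null (shared alleles : non-shared alleles = shared chromosomes : non-shared chromosomes). The function names are hypothetical and this is not the authors' implementation.

```python
import math


def chromosome_counts(sharing):
    """Count distinct shared and non-shared chromosomes over sibpairs.

    `sharing` is a list of IBD states (0, 1, or 2), one per sibpair.
    A pair with state S has S shared chromosomes (each counted once)
    and 2 * (2 - S) non-shared chromosomes.
    """
    shared = sum(sharing)
    non_shared = sum(2 * (2 - s) for s in sharing)
    return shared, non_shared


def shared_vs_nonshared_test(alleles_shared, alleles_non_shared,
                             n_shared, n_non_shared):
    """Goodness-of-fit chi-squared test (1 df) for the association-only null.

    Assumption of this sketch: under the null, rare alleles fall on shared
    and non-shared chromosomes in proportion to the chromosome counts.
    Returns the statistic and its p-value.
    """
    total = alleles_shared + alleles_non_shared
    p_shared = n_shared / (n_shared + n_non_shared)
    expected_shared = total * p_shared
    expected_non_shared = total - expected_shared
    chi2 = ((alleles_shared - expected_shared) ** 2 / expected_shared
            + (alleles_non_shared - expected_non_shared) ** 2
            / expected_non_shared)
    # Survival function of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Illustrative (made-up) data: 1000 sibpairs at the expected 1:2:1 IBD split.
sharing = [0] * 250 + [1] * 500 + [2] * 250
n_shared, n_non_shared = chromosome_counts(sharing)
chi2, p_value = shared_vs_nonshared_test(60, 60, n_shared, n_non_shared)
```

With the expected 1:2:1 split of IBD states, the chromosome counts reproduce the "twice as many non-shared as shared chromosomes" ratio, so an equal number of rare alleles on both classes indicates enrichment on shared chromosomes.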
Frequency of risk variant:
◦ Independent cases: $P(R \mid A) \propto P(A \mid R)\,P(R)$
◦ On shared chromosome: $P(R \mid AA) \propto P(AA \mid R)\,P(R)$
Legend: A = affected; AA = affected relative pair; R = risk locus genotype.

Relative risk is sampled from distribution f with mean μ, variance σ².
Simplifications:
◦ Each risk variant occurs only once in the population.
◦ Each risk variant on its own haplotype.
Then the risk in a random case is

$$P(A \mid r_1, r_2) \propto \iint m_1^{r_1} m_2^{r_2}\, f(m_1)\, f(m_2)\, dm_1\, dm_2 = \mu^{r_1 + r_2}$$

Legend: A = affected; r1, r2 = carrier status of chromosome 1, 2; m1, m2 = relative risk of risk variants on 1, 2; μ = mean effect size; σ² = variance of effect size.

To calculate the probability of having an affected sib-pair we condition on sharing S.
For S>0, the probability depends on σ². E.g. (S=2):

$$P(AA \mid r_1, r_2, S=2) \propto \iint m_1^{2 r_1} m_2^{2 r_2}\, f(m_1)\, f(m_2)\, dm_1\, dm_2 = E(f^2)^{r_1}\, E(f^2)^{r_2} = (\mu^2 + \sigma^2)^{r_1 + r_2}$$

Legend: AA = affected relative pair; ri = carrier status of chromosome i; mi = relative risk of variant on i; f = distribution of RR; μ = mean RR; σ² = variance of RR; S = sharing status.

Select μ, σ² and cumulative frequency f.
Calculate allele frequency in cases/controls P(R|A).
Calculate allele frequency in shared/non-shared chromosomes.
=> Non-centrality parameter of χ² distribution.

[Figure: sMAF against mean relative risk (1 to 5) for Conventional Case-Control, Internal Control, and Selected Cases; panel labels: f=0.01, f=0.2.]

[Figure: power against mean relative risk (1.0 to 4.0) for Internal Control, Selected Cases, and Conventional; panels: f=0.01, f=0.05, f=0.2.]

[Figure: power against variance of relative risk (0 to 4) for Internal Control, Selected Cases, and Conventional; panels: f=0.01, f=0.05, f=0.2.]

Gene-gene interaction affects power in families.
For a broad range of interaction models, consider a two-locus model.
G now has alleles g1, g2.
The joint effect is

$$P(A \mid r_1, r_2, g_1, g_2) \propto \mu_L^{r_1 + r_2}\, \mu_G^{g_1 + g_2}\, \iota^{(r_1 + r_2)(g_1 + g_2)}$$

writing $\iota$ for the interaction coefficient.
We compare the effect of $\iota$ while adjusting $\mu_L$ and $\mu_G$ to maintain marginal risk.

[Figure: power against interaction coefficient (0.2 to 1.0) for IC SRR=2, IC SRR=8, and Conventional.]

[Figure: power against interaction coefficient (1.0 to 2.0) for IC SRR=2, IC SRR=8, and Conventional.]

Stratification is a strong confounder for rare variant tests.
Family-based association methods are robust to stratification.
Comparing rare variants between shared and non-shared chromosomes is substantially more powerful than case-control designs.
All family-based methods/samples depend on the model of gene-gene interaction.
Under antagonistic interaction, power can be lower than in a population sample.

Thank you for your attention.
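The power calculations in the talk go from carrier frequencies to a non-centrality parameter of a χ² distribution. The following Python sketch shows the two ends of that chain under simplifying assumptions that are mine, not the authors': f is treated as the probability that a single chromosome carries a risk variant, the risk weights are μ per carried variant in a random case and μ² + σ² on a chromosome shared by an affected pair, and the test has one degree of freedom. All function names are hypothetical.

```python
from statistics import NormalDist


def carrier_freq_cases(f, mu):
    """Carrier frequency on a chromosome of a random case.

    Bayes' rule with P(A | carrier) proportional to mu and
    P(A | non-carrier) proportional to 1 (assumption of this sketch).
    """
    return f * mu / (f * mu + (1.0 - f))


def carrier_freq_shared(f, mu, sigma2):
    """Carrier frequency on a chromosome shared by an affected pair.

    Uses the second moment mu**2 + sigma2 as the risk weight, because
    both relatives carry the same variant with the same relative risk.
    """
    second_moment = mu ** 2 + sigma2
    return f * second_moment / (f * second_moment + (1.0 - f))


def power_chi2_1df(ncp, alpha=0.05):
    """Power of a 1-df chi-squared test with non-centrality parameter ncp.

    A non-central chi-squared variable with 1 df is (Z + sqrt(ncp))**2
    for standard normal Z, so the power follows from normal tails.
    """
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


# Illustrative values only: enrichment is stronger on shared chromosomes.
freq_cases = carrier_freq_cases(0.01, 2.0)
freq_shared = carrier_freq_shared(0.01, 2.0, 1.0)
power_at_ncp = power_chi2_1df(7.85)
```

With a non-centrality parameter of zero the function returns the significance level, which is a quick sanity check on any such calculation.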