BANFF14Zoellner

Sebastian Zöllner
University of Michigan
Keng-Han Lin
Matthew Zawistowski
Mark Reppell

GWAS have been successful.

Only some heritability is explained by common variants.

Uncommon coding variants (MAF 0.5%-5%) explain less.

Rare variants could explain some ‘missing’ heritability.
◦ Better risk prediction.
◦ Rare variants may identify new genes.
◦ Rare exonic variants may be easier to annotate functionally and
interpret.

Testing individual variants is infeasible.
◦ Limited power due to small number of
observations.
◦ Multiple testing correction.

Alternative: Joint test.
◦ Burden test (CMAT, Collapsing, WSS)
◦ Dispersion test (SKAT, C-alpha)

Gene-based tests have low power.
◦ Nelson et al. (2010) estimated that 10,000 cases and
10,000 controls are required for 80% power in
half of the genes.



Large sample size required.
More heterogeneous sample => danger of stratification.
Stratification may differ from common variants in magnitude and pattern.


[Figure: expected number of variants per kb against sample size (202 genes, n=900/900, MAF < 1%, nonsense/nonsynonymous variants), one curve per population: African-American, Southern Asia, South-Eastern Europe, South-Western Europe, Western Europe, Central Europe, North-Western Europe, Eastern Europe, Northern Europe, Finland.]
A gradient in diversity from Southern to Northern Europe.
• Measure of rare variant diversity.
• Probability of two carriers of the minor alleles being
from different populations (normalized).
Median EU-EU: 0.71
Median EU-EU: 0.86
Median EU-EU: 0.98
1. Select 2 populations.
2. Select mixing parameter r.
3. Sample 30 variants from the 202 genes.
4. Calculate inflation based on observed frequency differences.
Zawistowski et al. 2014

If multiple affected family members are collected, it may
be more powerful to sequence all family members.

Family-based tests can be robust against stratification.

TDT-Type tests are potentially inefficient.

How to leverage low frequency?
◦ Low frequency risk variants should be more common in cases.
◦ And even more common on chromosomes shared among
many cases.
• Consider affected sibpairs.
• Estimate IBD sharing.
• Compare the number of rare variants on shared (solid) and non-shared (blank) chromosomes.
• Any aggregate test can be applied.
[Figure: sibpair chromosomes for IBD sharing S=0, S=1 and S=2.]


Twice as many non-shared as shared chromosomes.
Null hypothesis determines test:
◦ Shared alleles : Non-shared alleles = 1:2
  Test for linkage or association.
◦ Shared alleles : Non-shared alleles = Shared chromosomes : Non-shared chromosomes
  Test for association only.
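The 1:2 ratio can be checked with a short calculation. This is a minimal sketch, assuming the standard sibpair IBD probabilities (1/4, 1/2, 1/4 for S=0, 1, 2) and assuming that a shared chromosome is counted once even though both sibs carry it. The names `IBD_STATES` and `expected_chromosomes` are illustrative, not from the talk.

```python
from fractions import Fraction

# IBD state S -> (probability for a sibpair under no linkage,
#                 number of shared chromosomes, number of non-shared chromosomes).
# Assumption: a chromosome shared IBD is counted once for the pair.
IBD_STATES = {
    0: (Fraction(1, 4), 0, 4),
    1: (Fraction(1, 2), 1, 2),
    2: (Fraction(1, 4), 2, 0),
}


def expected_chromosomes(states=IBD_STATES):
    """Return the expected (shared, non-shared) chromosome counts per sibpair."""
    shared = sum(prob * n_shared for prob, n_shared, _ in states.values())
    non_shared = sum(prob * n_non for prob, _, n_non in states.values())
    return shared, non_shared


shared, non_shared = expected_chromosomes()
print(shared, non_shared)  # 1 2
```

In a sample enriched for linkage the IBD distribution shifts towards S=2, which is why the fixed 1:2 null also picks up linkage while the observed chromosome ratio does not.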

IBD sharing is known.

Phase is not needed to identify shared variants.

Exception: one configuration, IBD 1 with both sibs heterozygous.


Configuration 1: variant on the shared chromosome (+1 shared).
Configuration 2: variant on both non-shared chromosomes (+2 non-shared).
Under the null, the probability of configuration 2 is the allele frequency.
Under the alternative, we need to use multiple imputation.
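The statement about the null probability can be verified numerically. The sketch below assumes an IBD 1 sibpair with three distinct chromosomes (one shared, one non-shared per sib), each carrying the variant independently with allele frequency p. The function name is illustrative.

```python
def prob_configuration_2(p):
    """P(variant on both non-shared chromosomes | IBD 1, both sibs heterozygous).

    Assumes the three distinct chromosomes carry the variant independently
    with probability p (the null hypothesis).
    """
    # Shared chromosome carries the variant, both non-shared ones do not.
    on_shared = p * (1 - p) ** 2
    # Shared chromosome does not carry it, both non-shared ones do.
    on_non_shared = (1 - p) * p ** 2
    return on_non_shared / (on_shared + on_non_shared)


for p in (0.001, 0.01, 0.05):
    print(p, prob_configuration_2(p))
```

Algebraically the ratio simplifies to p, so for rare variants a double-heterozygous IBD 1 pair almost always carries the variant on the shared chromosome.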

Assume chromosome sharing status (S=0, S=1, S=2) is known for each sibpair.

Count rare variants; impute sharing status for double-heterozygotes.

Compare number of rare variants between shared and non-shared chromosomes with chi-squared test (burden style).
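One way such a comparison could look is a one-degree-of-freedom goodness-of-fit test on the aggregated counts. This is a hedged sketch, not the authors' implementation: it assumes the association-only null (variants split in proportion to the observed chromosome counts), and the function name and example counts are hypothetical.

```python
import math


def shared_vs_nonshared_chi2(variants_shared, variants_non_shared,
                             chrom_shared, chrom_non_shared):
    """Chi-squared test (1 df) of rare variant counts on shared vs non-shared chromosomes.

    Null hypothesis (assumed here): variants fall on shared and non-shared
    chromosomes in proportion to the number of chromosomes of each kind.
    """
    total_variants = variants_shared + variants_non_shared
    total_chrom = chrom_shared + chrom_non_shared
    expected_shared = total_variants * chrom_shared / total_chrom
    expected_non_shared = total_variants - expected_shared
    stat = ((variants_shared - expected_shared) ** 2 / expected_shared
            + (variants_non_shared - expected_non_shared) ** 2 / expected_non_shared)
    # Upper tail of a chi-squared distribution with 1 df: P(|Z| > sqrt(stat)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 60 variants on 1000 shared, 90 on 2000 non-shared chromosomes.
stat, p_value = shared_vs_nonshared_chi2(60, 90, 1000, 2000)
print(stat, p_value)
```

Replacing the chromosome counts with a fixed 1:2 ratio would give the variant of the test that is also sensitive to linkage.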
[Figure: study designs compared: Classic Case-Control, Internal Control, Selected Cases; sibpair sharing states S=0, S=1, S=2.]

Consider 2 populations.

p=0.01 in pop1, p=0.05 in pop2.

1000 sibpairs for internal control design.

1000 cases, 1000 controls for selected cases.

1000 cases and 1000 controls for case-control.

Sample cases from pop1 with a varying proportion.

Test for association with α=0.05.
[Figure: Type I error rate (0.0 to 0.8) against proportion (0.0 to 1.0) for the Internal Control, Selected Cases and Conventional designs.]

Realistic rare variant models are unknown:
◦ Typical allele frequency
◦ Number of risk variants per gene
◦ Typical effect size
◦ Distribution of effect sizes
◦ Identifiability of risk variants
Goal: Create a model that summarizes these unknowns into
◦ Summed allele frequency
◦ Mean effect size
◦ Variance of effect size



Assume many loci carrying risk variants.
Risk alleles at multiple loci each increase the
risk by a factor independently.
Frequency of risk variant:
◦ Independent cases
$P(R \mid A) \propto P(A \mid R)\,P(R)$
◦ On shared chromosome
$P(R \mid AA) \propto P(AA \mid R)\,P(R)$
Notation: A = affected; AA = affected relative pair; R = risk locus genotype.


Relative risk is sampled from distribution f with mean μ, variance σ².
Simplifications:
◦ Each risk variant occurs only once in the population.
◦ Each risk variant on its own haplotype.

Then the risk in a random case is
$P(A \mid r_1, r_2) \propto \iint m_1^{r_1} m_2^{r_2} f(m_1) f(m_2)\, dm_1\, dm_2 = \mu^{r_1 + r_2}$
Notation: A = affected; r1, r2 = carrier status of chromosome 1, 2; m1, m2 = relative risk of risk variants on chromosome 1, 2; μ = mean effect size; σ² = variance of effect size.
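The averaging over the effect-size distribution can be illustrated with a small discrete example. This is a sketch under assumptions: f is replaced by a hypothetical two-point distribution, carrier status is 0 or 1, and the helper names are illustrative. It also computes the second moment σ² + μ², the quantity that appears once a variant sits on a chromosome shared by both affected relatives.

```python
def moment(values, probs, power):
    """E[m**power] for a discrete relative-risk distribution."""
    return sum(p * v ** power for v, p in zip(values, probs))


def single_case_factor(r1, r2, values, probs):
    """Average of m1**r1 * m2**r2 over independent draws m1, m2 from f."""
    return moment(values, probs, r1) * moment(values, probs, r2)


# Hypothetical effect-size distribution: relative risk 1 or 3 with equal weight.
values, probs = [1.0, 3.0], [0.5, 0.5]
mu = moment(values, probs, 1)
sigma2 = moment(values, probs, 2) - mu ** 2

for r1 in (0, 1):
    for r2 in (0, 1):
        # For a single case only the mean matters: the factor equals mu**(r1 + r2).
        print(r1, r2, single_case_factor(r1, r2, values, probs), mu ** (r1 + r2))

# Second moment of f, which depends on the variance as well as the mean.
print(moment(values, probs, 2), sigma2 + mu ** 2)
```

For a single case the variance of the effect size drops out, which is why it only starts to matter when relatives share the chromosome carrying the variant.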


To calculate the probability of having an affected sib-pair we condition on sharing S.
For S>0, the probability depends on σ². E.g. (S=2):
$P(AA \mid r_1, r_2, S = 2) \propto \iint m_1^{2 r_1} m_2^{2 r_2} f(m_1) f(m_2)\, dm_1\, dm_2 = E(f^2)^{r_1} E(f^2)^{r_2} = (\sigma^2 + \mu^2)^{r_1 + r_2}$
Here $E(f^2)$ is the second moment of f.
Notation: AA = affected relative pair; ri = carrier status of chromosome i; mi = relative risk of variant on chromosome i; f = distribution of RR; μ = mean RR; σ² = variance of RR; S = sharing status.



Select μ, σ² and cumulative frequency f.
Calculate allele frequency in cases/controls, P(R|A).
Calculate allele frequency in shared/non-shared chromosomes.
=> Non-centrality parameter of χ² distribution.
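The first steps of this recipe can be sketched as follows. The sketch assumes that f here is the per-chromosome probability of carrying a risk variant, that a chromosome carried by a single affected individual is weighted by μ, and that a chromosome shared by two affected relatives is weighted by σ² + μ². The function name and the example values are hypothetical, and the final step (non-centrality parameter and power) is not attempted.

```python
def carrier_frequencies(f, mu, sigma2):
    """Risk-variant carrier frequency per chromosome under the multiplicative model.

    Applies P(R | A) proportional to P(A | R) P(R) with two possible states
    per chromosome: carrier (prior f) and non-carrier (prior 1 - f, weight 1).
    """
    def reweight(weight):
        return f * weight / (f * weight + (1.0 - f))

    return {
        "population": f,
        # Chromosome of an independent case: carrier weight is the mean RR.
        "case": reweight(mu),
        # Non-shared chromosome in an affected pair: carried by one relative only.
        "non_shared": reweight(mu),
        # Shared chromosome: the variant acts in both relatives, weight E[m^2].
        "shared": reweight(sigma2 + mu ** 2),
    }


# Hypothetical parameter values.
freqs = carrier_frequencies(f=0.05, mu=2.0, sigma2=1.0)
for name, value in freqs.items():
    print(name, round(value, 4))
```

With these illustrative values the contrast between shared and non-shared chromosomes is larger than the contrast between cases and the population, which is the intuition behind the internal control design.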
[Figure: sMAF (0.0 to 0.6) against mean relative risk (1 to 5) for the Conventional Case-Control, Internal Control and Selected Cases designs; panels f=0.01 and f=0.2.]
[Figure: power (0.0 to 0.8) against mean relative risk (1.0 to 4.0) for the Internal Control, Selected Cases and Conventional designs; panels f=0.01, f=0.05 and f=0.2.]
[Figure: power (0.0 to 0.8) against variance of relative risk (0 to 4) for the Internal Control, Selected Cases and Conventional designs; panels f=0.01, f=0.05 and f=0.2.]



Gene-gene interaction affects power in families.
For broad range of interaction models, consider
two-locus model.
A second locus G now has alleles g1, g2. The joint effect is
$P(A \mid r_1, r_2, g_1, g_2) \propto L^{r_1 + r_2}\, G^{g_1 + g_2}\, \gamma^{(r_1 + r_2)(g_1 + g_2)}$
where γ is the interaction coefficient.
We compare the effect of γ while adjusting L and
G to maintain marginal risk.
[Figure: power (0.0 to 0.8) against interaction coefficient for IC SRR=2, IC SRR=8 and Conventional; one panel for coefficients 0.2 to 1.0, one for 1.0 to 2.0.]

Stratification is a strong confounder for rare variant
tests.

Family-based association methods are robust to
stratification.


Comparing rare variants between shared and non-shared chromosomes is substantially more powerful
than case-control designs.

All family-based methods and samples depend on the
model of gene-gene interaction. Under antagonistic
interaction, power can be lower than in a population
sample.
Thank you for your attention