Genome-Wide Association Studies Xiaole Shirley Liu Stat 115/215

Association Studies
• Association between genetic markers and
phenotype
• In particular, find disease genes and
SNP/haplotype markers for susceptibility
prediction and diagnosis
• Influences individual decisions on lifestyle,
prevention, screening, and treatment
Mike Snyder’s iPOP Reveals Diabetes
Warfarin and CYP2C9:
SNPs in Pharmacogenomics
• Warfarin is an anticoagulant drug; the CYP2C9
gene product metabolizes warfarin.
• A patient requiring a low warfarin dose,
compared with the general population, has an
odds ratio of 6.21 for having ≥ 1 variant allele
• The subgroup of patients who are poor
metabolisers of warfarin is potentially at
higher risk of bleeding
Aithal et al., 1999, Lancet.
Genome-Wide Association Studies
• Two strategies:
– Family-based association studies
– Population-based case-control association
studies
• Quality Control
– Unusual similarity between individuals
– Wrong sex
– Trio has non-Mendelian inheritance
– Genotyping quality
Quality Control: SNP calls
Good calls!
Bad calls!
Family-based Association Studies
TDT: Transmission Disequilibrium Test
• Look at allele transmission in unrelated
families, each with one affected child
Aa
92

 2.11
A a
92
~  2 , 1 df
ZTDT 
2
ZTDT
• Could also compare
allele frequency
between affected vs
unaffected children
in the same family
Like a coin toss: with no association, an Aa parent
transmits A or a with equal probability
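As a rough illustration of the TDT statistic, the sketch below computes Z, its square, and a two-sided p-value from transmission counts. The function name `tdt` and the counts 60 and 40 are hypothetical and are not taken from the slide's example; the p-value uses the fact that a chi-square tail with 1 df equals a two-sided normal tail.

```python
import math

def tdt(n_transmit_A, n_transmit_a):
    """Return (Z, Z^2, p) for a transmission disequilibrium test.

    n_transmit_A / n_transmit_a: how often heterozygous (Aa) parents
    passed A or a to the affected child.
    """
    total = n_transmit_A + n_transmit_a
    if total == 0:
        raise ValueError("need at least one informative transmission")
    z = (n_transmit_A - n_transmit_a) / math.sqrt(total)
    chi2 = z * z                              # ~ chi-square, 1 df under the null
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail = chi-square(1 df) tail
    return z, chi2, p

# Hypothetical counts, chosen only for illustration
z, chi2, p = tdt(60, 40)
print(f"Z = {z:.2f}, chi2 = {chi2:.2f}, p = {p:.4f}")
```

Only heterozygous parents contribute, since a homozygous parent always transmits the same allele and carries no information about preferential transmission.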
Case Control Studies
• SNP/haplotype marker frequency in a sample
of affected cases compared to that in an
age/sex/population-matched sample of
unaffected controls
• Size matters
Visscher, AJHG 2012
From Genotyping to Allele Counts
Test Significant Associations
• Expected:
– (24 + 278) * (24 + 86) / (24 + 278 + 86 + 296) = 49
– (278+296) * (86+296) / (24 + 278 + 86 + 296) = 321
• $\chi^2 = \sum_{i,j} \frac{(e_{ij} - o_{ij})^2}{e_{ij}}$
$\chi^2$ = 27.5, 1 df, p < 0.001
• Multiple hypotheses testing?
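The expected counts and the chi-square statistic can be recomputed with a short sketch. The 2x2 layout [[24, 278], [86, 296]] is inferred from the margins used in the "Expected" lines; which axis is case/control and which is allele is not stated, and the test does not depend on it. The function name `chi_square_2x2` is hypothetical. With unrounded expected counts the statistic comes out near 26.5, slightly below the 27.5 quoted above, which appears to be based on expected counts rounded to integers.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (1 df) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_sums = [a + b, c + d]
    col_sums = [a + c, b + d]
    # Expected count under independence: row total * column total / grand total
    expected = [[row_sums[i] * col_sums[j] / n for j in range(2)]
                for i in range(2)]
    chi2 = sum((expected[i][j] - table[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    # For 1 df, the chi-square tail equals a two-sided normal tail
    p = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p

observed = [[24, 278], [86, 296]]
expected, chi2, p = chi_square_2x2(observed)
print([[round(e, 1) for e in row] for row in expected])
print(f"chi2 = {chi2:.1f}, p = {p:.1e}")
```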
GWAS P-values
GWAS P-values for Type II Diabetes
• Bonferroni correction: most common, typically
p < 10^-7 or 10^-8
• Split samples to improve power
McCarthy et al, Nat Rev Genetics, 2008
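To see where thresholds of this size come from, the sketch below divides a family-wise error rate by the number of tests. The figure of one million tested SNPs, the 0.05 error rate, and the SNP names and p-values are hypothetical values chosen for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

n_snps = 1_000_000          # hypothetical number of SNPs tested
threshold = bonferroni_threshold(0.05, n_snps)

# Hypothetical per-SNP p-values
p_values = {"snp_a": 3e-9, "snp_b": 2e-6, "snp_c": 0.04}
significant = [name for name, p in p_values.items() if p < threshold]

print(f"threshold = {threshold:.0e}")
print("genome-wide significant:", significant)
```

A SNP with p = 0.04 would pass a single test at the 0.05 level but is far from significant once a million tests are accounted for.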
Association of Alleles and Genotypes of
rs1333049 (‘3049) with Myocardial Infarction
            C, N (%)        G, N (%)        χ² (1 df)   P-value
Cases       2,132 (55.4)    1,716 (44.6)    55.1        1.2 x 10^-13
Controls    2,783 (47.4)    3,089 (52.6)
Allelic Odds Ratio = 1.38
• OR = 1: no disease association
• OR > 1: allele increases risk of disease
• OR < 1: allele decreases risk of disease
Samani N et al, N Engl J Med 2007; 357:443-453.
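The allelic odds ratio in the table can be reproduced from the four allele counts. The sketch below also adds an approximate 95% confidence interval using the standard log-odds (Woolf) method; that interval is an addition for illustration and is not reported on the slide. The function name `allelic_odds_ratio` is hypothetical.

```python
import math

def allelic_odds_ratio(case_c, case_g, ctrl_c, ctrl_g):
    """Odds ratio for the C allele (cases vs controls) with an approximate 95% CI."""
    odds_ratio = (case_c * ctrl_g) / (case_g * ctrl_c)
    # Standard error of log(OR) from the four cell counts (Woolf method)
    se_log_or = math.sqrt(1 / case_c + 1 / case_g + 1 / ctrl_c + 1 / ctrl_g)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper

# Allele counts for rs1333049 from the table
odds_ratio, lower, upper = allelic_odds_ratio(2132, 1716, 2783, 3089)
print(f"OR = {odds_ratio:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 is consistent with the C allele increasing risk of myocardial infarction in this sample.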
Manolio et al., Clin Invest 2008
Pitfalls of Association Studies
• Not very predictive
• Explain little heritability
• Poor reproducibility
• Poor penetrance (fraction of people with the
marker who show the trait) and expressivity
(severity of the effect)
• Focus on common variation
• Difficult when several genes affect a
quantitative trait
• Many associated variants are not causal
• No available intervention for many disease risks
Reproducibility of Association Studies
• Most reported associations have not been
consistently reproduced
• Hirschhorn et al, Genetics in Medicine, 2002,
review of association studies
– 603 associations of polymorphisms and disease
– 166 studied in at least three populations
– Only 6 seen in > 75% of studies
Cause for Inconsistency
• What explains the lack of reproducibility?
• False positives
– Multiple hypothesis testing
– Ethnic admixture / stratification
• False negatives
– Lack of power for weak effects
• Population differences
– Variable LD with causal SNP
– Population-specific modifiers
Population Stratification
• Population stratification
– e.g. some SNP unique to ethnic group
– Need to make sure sample groups match
– Hidden environmental structure
● Two populations have different disease
frequencies and different allele frequencies.
● The association picks up that they are
different populations!
Balding, Nature Reviews Genetics 2010
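A small numeric sketch of this effect: in the hypothetical allele counts below, the allele has no association with disease inside either population (odds ratio 1 in each), yet pooling the two populations produces a strong apparent association, because population 1 has both a higher allele frequency and a higher share of cases. All counts and names are invented for illustration.

```python
def odds_ratio(cases, controls):
    """Allelic odds ratio from (allele_A, allele_a) counts in cases and controls."""
    case_A, case_a = cases
    ctrl_A, ctrl_a = controls
    return (case_A * ctrl_a) / (case_a * ctrl_A)

# Hypothetical allele counts as (A, a)
pop1_cases, pop1_controls = (800, 200), (800, 200)    # A frequency 0.8, many cases
pop2_cases, pop2_controls = (40, 160), (360, 1440)    # A frequency 0.2, few cases

or_pop1 = odds_ratio(pop1_cases, pop1_controls)
or_pop2 = odds_ratio(pop2_cases, pop2_controls)

# Pool the two populations, ignoring ancestry
pooled_cases = (pop1_cases[0] + pop2_cases[0], pop1_cases[1] + pop2_cases[1])
pooled_controls = (pop1_controls[0] + pop2_controls[0],
                   pop1_controls[1] + pop2_controls[1])
or_pooled = odds_ratio(pooled_cases, pooled_controls)

print(f"within pop 1: {or_pop1:.2f}, within pop 2: {or_pop2:.2f}, "
      f"pooled: {or_pooled:.2f}")
```

Analysing each population separately, or matching cases and controls on ancestry, removes the spurious signal in this example.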
Genotyping Principal Components (PCs)
Can Model Population Stratification
• Li et al., Science 2008
Causes for Inconsistency
• A sizable fraction (but less than half) of
reported associations are likely correct
• Genetic effects are generally modest
– Beware the winner’s curse (auction theory)
– In association studies, the first positive
report is equivalent to the winning bid
• Large study sizes are
needed to detect these
reliably
Should we Believe
Association Study Results?
• Initial skepticism is warranted
• Replication, especially with low p values, is
encouraging
• Large sample sizes are crucial
• E.g. PPARγ Pro12Ala & diabetes
Replication, Replication, Replication
• Meta-analysis of multiple studies to
increase GWAS power
• Combine data from different platforms /
studies
• Impute unmeasured or missing genotypes
based on LD (e.g. HapMap haplotypes or
1000 Genomes)
• Analyze all studies together to increase
GWAS power
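One common way to combine studies is a fixed-effect, inverse-variance-weighted meta-analysis; the slide does not name a specific method, so this choice is an assumption. The sketch below pools per-study log odds ratios for a single SNP. The effect sizes and standard errors are hypothetical, as is the function name `fixed_effect_meta`.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted pooled effect, its SE, Z score and two-sided p."""
    weights = [1 / se ** 2 for se in ses]          # more precise studies weigh more
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))           # two-sided normal p-value
    return pooled_beta, pooled_se, z, p

# Hypothetical log odds ratios and standard errors from three studies
betas = [0.10, 0.12, 0.08]
ses = [0.05, 0.06, 0.04]

pooled_beta, pooled_se, z, p = fixed_effect_meta(betas, ses)
print(f"pooled log OR = {pooled_beta:.3f} (SE {pooled_se:.3f}), "
      f"Z = {z:.2f}, p = {p:.1e}")
```

Each study alone has a Z score of about 2, while the pooled estimate has a smaller standard error and a much smaller p-value, which is the gain in power the bullets above refer to.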
Missing Heritability?
Visscher, AJHG 2011
Detection Power of GWAS
Acknowledgement
• Tim Niu
• Kenneth Kidd, Judith Kidd and Glenys
Thomson
• Joel Hirschhorn
• Greg Gibson & Spencer Muse
• Jim Stankovich
• Teri Manolio