Genetics for Epidemiologists
Lecture 5: Analysis of Genetic Association Studies
U.S. Department of Health and Human Services
National Institutes of Health
National Human Genome Research Institute
Teri A. Manolio, M.D., Ph.D.
Director, Office of Population Genomics and
Senior Advisor to the Director, NHGRI,
for Population Genomics
Topics to be Covered
• Discrete traits and quantitative traits
• Measures of association
• Detecting/correcting for false positives
• Genotyping quality control
• Quantile-quantile (Q-Q) plots
• Odds ratios: allelic and genotypic
• Models of genetic transmission
• Interactions: gene-gene, gene-environment
Larson, G. The Complete Far Side. 2003.
Quantitative Genetics
“…concerned with the inheritance of those
differences between individuals that are of
degree rather than of kind…”
Quantitative | Qualitative
Continuous gradation among individuals from one extreme to other | Sharply demarcated types with little connection by intermediates
Effects of genes are small | Effects of genes are large
Usually many genes | Single genes inherited in Mendelian ratios?
Falconer and Mackay, Quantitative Genetics 1996.
Inheritance Models in Single Gene Trait
Alleles: A, a
Models: A is Dominant, A is Recessive, A is Co-Dominant
Genotype groups: AA, Aa, aa
[Figure: phenotype of each genotype group under each model]
Inheritance Models in Quantitative Trait
A: x increase in height
a: x decrease in height
Models: A is Completely Dominant, A is Partially Dominant, A is Not (Co-)Dominant, A is OverDominant
[Figure: positions of genotypes aa, Aa, and AA under each model on a scale from -x through 0 (Population Mean) to +x]
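The four quantitative-trait models can be sketched numerically. This is a minimal illustration, assuming the usual Falconer-style parameterization in which the homozygotes sit at -x and +x and the heterozygote sits at a dominance value d. The function name and the specific values 0.5x and 1.5x for partial dominance and overdominance are illustrative assumptions, not values taken from the slide.

```python
def genotypic_values(x, d):
    """Trait deviations for aa, Aa, AA, with the heterozygote placed at d."""
    return {"aa": -x, "Aa": d, "AA": +x}


x = 1.0  # effect size in arbitrary units (assumption for illustration)

models = {
    "completely dominant": genotypic_values(x, d=x),        # Aa equals AA
    "partially dominant": genotypic_values(x, d=0.5 * x),   # Aa between 0 and +x
    "not (co-)dominant": genotypic_values(x, d=0.0),        # Aa at the midpoint
    "overdominant": genotypic_values(x, d=1.5 * x),         # Aa beyond AA
}

for name, values in models.items():
    print(f"{name:20s} aa={values['aa']:+.1f} Aa={values['Aa']:+.1f} AA={values['AA']:+.1f}")
```

Only the position of Aa changes between models; the homozygotes stay fixed at -x and +x.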
Quantitative Traits with Published GWA
Studies (16 - 34)
• QT interval
• Lipids and lipoproteins
• Memory
• Nicotine dependence
• ORMDL3 expression
• YKL-40 levels
• Obesity, BMI, waist
• Insulin resistance
• Height
• Bone mineral density
• F-cell distribution
• Fetal hemoglobin levels
• C-Reactive protein
• 18 groups of
Framingham traits
• Pigmentation
• Uric Acid Levels
• Recombination Rate
Association of Alleles and Genotypes of rs1333049 (‘3049) with Myocardial Infarction
Group | C, N (%) | G, N (%)
Cases | 2,132 (55.4) | 1,716 (44.6)
Controls | 2,783 (47.4) | 3,089 (52.6)
χ² (1df) = 55.1; P-value = 1.2 x 10^-13
Allelic Odds Ratio = 1.38
Samani N et al, N Engl J Med 2007; 357:443-453.
Association of Alleles and Genotypes of rs1333049 (‘3049) with Myocardial Infarction
Group | CC, N (%) | CG, N (%) | GG, N (%)
Cases | 586 (30.5) | 960 (49.9) | 378 (19.6)
Controls | 676 (23.0) | 1,431 (48.7) | 829 (28.2)
χ² (2df) = 59.7; P-value = 1.1 x 10^-14
Heterozygote Odds Ratio = 1.47
Homozygote Odds Ratio = 1.90
Samani N et al, N Engl J Med 2007; 357:443-453.
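The allelic and genotypic odds ratios can be recomputed from the counts on these slides. The sketch below uses the slide's counts, with G allele and GG genotype as the reference categories; the function names are illustrative. It also computes a plain Pearson chi-square for the 2 x 3 genotype table, which should come out close to the slide's 2-df value. The exact test behind the slide's 1-df statistic is not stated, so it is not reproduced here.

```python
def odds_ratio(case_exposed, case_ref, control_exposed, control_ref):
    """Cross-product odds ratio for a 2 x 2 table."""
    return (case_exposed * control_ref) / (case_ref * control_exposed)


def pearson_chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Allele counts: C vs. G (reference)
allelic_or = odds_ratio(2132, 1716, 2783, 3089)

# Genotype counts: CG vs. GG and CC vs. GG (GG is the reference)
heterozygote_or = odds_ratio(960, 378, 1431, 829)
homozygote_or = odds_ratio(586, 378, 676, 829)

# Rows are cases and controls; columns are CC, CG, GG
genotype_chi2 = pearson_chi_square([[586, 960, 378], [676, 1431, 829]])

print(f"Allelic OR      = {allelic_or:.2f}")
print(f"Heterozygote OR = {heterozygote_or:.2f}")
print(f"Homozygote OR   = {homozygote_or:.2f}")
print(f"Genotype chi-square (2 df) = {genotype_chi2:.1f}")
```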
-Log10 P Values for SNP Associations with
Myocardial Infarction
Samani N et al, N Engl J Med 2007; 357:443-453.
Genome-Wide Scan for Type 2 Diabetes in a
Scandinavian Cohort
http://www.broad.mit.edu/diabetes/scandinavs/type2.html
GWA Study of Serum Uric Acid Levels
• Linear regression of inverse normalized levels against number of alleles
• Additive model
• Sex, age, age² as covariates
Li S et al, PLoS Genet 2007; 3:e194.
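The additive-model analysis described above can be sketched in two steps: a rank-based inverse normal transform of the trait, then a regression of the transformed values on allele count. This is a simplified illustration, not the authors' pipeline. Covariate adjustment (sex, age, age²) is omitted, the rank offset of 0.5 is one common convention chosen here as an assumption, and the data are made up.

```python
from statistics import NormalDist, mean


def inverse_normal_transform(values, offset=0.5):
    """Map values to standard-normal quantiles by rank (ties share the average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


def additive_slope(allele_counts, trait):
    """Least-squares slope of trait on allele count (0, 1, or 2 copies)."""
    mean_x, mean_y = mean(allele_counts), mean(trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(allele_counts, trait))
    sxx = sum((x - mean_x) ** 2 for x in allele_counts)
    return sxy / sxx


# Hypothetical uric acid levels (mg/dl) and counts of the tested allele
levels = [4.1, 4.9, 5.3, 3.8, 4.4, 5.0]
allele_counts = [2, 1, 0, 2, 1, 0]

transformed = inverse_normal_transform(levels)
print(f"Additive effect per allele: {additive_slope(allele_counts, transformed):.3f}")
```

The slope is in standard-deviation units of the transformed trait, so it is not directly comparable to means reported in mg/dl.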
Association of rs6855911 and Uric Acid Levels
Cohort | Additive Effect | AA mean (mg/dl) | AG mean (mg/dl) | GG mean (mg/dl)
SardiNIA | -0.317 | 4.66 (1.51) | 4.48 (1.59) | 4.02 (1.63)
InCHIANTI | -0.397 | 5.27 (1.44) | 4.94 (1.31) | 4.33 (1.37)
Li S et al, PLoS Genet 2007; 3:e194.
Association Methods for Quantitative Traits
• Linear regression of multivariable adjusted residual against number of alleles (Kathiresan, Nat Genet 2008; 40:189-97)
• Linear regression of log transformed or centralized BMI against genotype (Frayling, Science 2007; 316:889-94)
• Variance components based Z-score analysis of quantile normalized height (Sanna, Nat Genet 2008; 40:198-203)
Ways of Dealing with Multiple Testing
• Control family-wise error rate (FWER): Bonferroni (α’ = α/n) or Šidák (α’ = 1 - [1 - α]^(1/n))
• False discovery rate: proportion of significant
associations that are actually false positives
• False positive report probability: probability
that the null hypothesis is true, given a
statistically significant finding
• Bayes factors analysis: avoids need for
assessing genome-wide error rates but must
identify reasonable alternative model
Hoggart CJ et al, Genet Epidemiol 2008; 32:179-85.
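The Bonferroni and Šidák thresholds and the false positive report probability can be computed directly. The sketch below is illustrative: the function names are made up, and the FPRP uses the standard form α(1-π) / [α(1-π) + (1-β)π], where the prior probability π of a true association and the power 1-β are inputs the analyst must assume.

```python
import math


def bonferroni(alpha, n_tests):
    """Per-test threshold alpha' = alpha / n."""
    return alpha / n_tests


def sidak(alpha, n_tests):
    """Per-test threshold alpha' = 1 - (1 - alpha)^(1/n), computed stably for large n."""
    return -math.expm1(math.log1p(-alpha) / n_tests)


def false_positive_report_probability(alpha, power, prior):
    """Probability that the null is true given a significant finding."""
    false_positives = alpha * (1 - prior)
    true_positives = power * prior
    return false_positives / (false_positives + true_positives)


n_tests = 1_000_000  # illustrative number of SNPs tested
print(f"Bonferroni: {bonferroni(0.05, n_tests):.3e}")
print(f"Sidak:      {sidak(0.05, n_tests):.3e}")
print(f"FPRP (alpha=0.05, power=0.8, prior=0.01): "
      f"{false_positive_report_probability(0.05, 0.8, 0.01):.2f}")
```

The two FWER thresholds are nearly identical at genome-wide scale; Šidák is slightly less conservative.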
Quality Control of SNP Genotyping: Samples
• Identity with forensic markers (Identifiler)
• Blind duplicates
• Gender checks
• Cryptic relatedness or unsuspected twinning
• Degradation/fragmentation
• Call rate (> 80-90%)
• Heterozygosity: outliers
• Plate/batch calling effects
Chanock et al, Nature 2007; Manolio et al Nat Genet 2007
Quality Control of SNP Genotyping: SNPs
• Duplicate concordance (CEPH samples)
• Mendelian errors (typically < 1)
• Hardy-Weinberg errors (often > 10^-5)
• Heterozygosity (outliers)
• Call rate (typically > 98%)
• Minor allele frequency (often > 1%)
• Validation of most critical results on independent
genotyping platform
Chanock et al, Nature 2007; Manolio et al Nat Genet 2007
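The SNP-level checks listed above amount to a set of threshold filters. The sketch below shows one way to apply them, using the typical cutoffs from the slide as defaults. The `SnpStats` record, its field names, and the example SNP identifiers are hypothetical, and real pipelines tune these thresholds per study.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical record layout)."""
    snp_id: str
    call_rate: float      # fraction of samples with a genotype call
    maf: float            # minor allele frequency
    hwe_p: float          # Hardy-Weinberg test P value
    mendel_errors: int    # Mendelian inconsistencies observed


def passes_qc(snp, min_call_rate=0.98, min_maf=0.01, min_hwe_p=1e-5,
              max_mendel_errors=0):
    """Return True if the SNP clears every threshold."""
    return (snp.call_rate > min_call_rate
            and snp.maf > min_maf
            and snp.hwe_p > min_hwe_p
            and snp.mendel_errors <= max_mendel_errors)


snps = [
    SnpStats("snp_a", 0.995, 0.20, 0.40, 0),   # passes
    SnpStats("snp_b", 0.950, 0.20, 0.40, 0),   # low call rate
    SnpStats("snp_c", 0.995, 0.005, 0.40, 0),  # rare variant
    SnpStats("snp_d", 0.995, 0.20, 1e-8, 0),   # Hardy-Weinberg failure
]

kept = [snp.snp_id for snp in snps if passes_qc(snp)]
print(kept)
```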
Hardy-Weinberg Equilibrium
• Occurrences of the two alleles of a SNP in the same individual are two independent events
• Ideal conditions:
– random mating
– no selection (equal survival)
– no migration
– no mutation
– no inbreeding
– large population sizes
– gene frequencies equal in males and females…
• If alleles A and a of SNP rs1234 have frequencies p and 1-p, expected frequencies of the three genotypes are:
Freq AA = p²
Freq Aa = 2p(1-p)
Freq aa = (1-p)²
After G. Thomas, NCI
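The expected genotype frequencies lead directly to a goodness-of-fit test. The sketch below computes expected counts from the observed allele frequency and a 1-df chi-square statistic. It is a minimal illustration with made-up function and variable names; exact tests are often preferred when counts are small.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Return (expected counts, chi-square, P value) for a Hardy-Weinberg test."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele a
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value


# Control genotype counts for rs1333049 from the earlier slide (CC, CG, GG)
expected, chi2, p_value = hwe_chi_square(676, 1431, 829)
print([round(e, 1) for e in expected], round(chi2, 2), round(p_value, 3))
```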
Coverage, Call Rates, and Concordance of Perlegen and Affymetrix Platforms on HapMap Phase II
Metric | Perlegen | Affymetrix/Broad
Number of SNPs | 480,744 | 439,249
Coverage, CEU (single marker / multi-marker) | 0.90 / 0.96 | 0.78 / 0.87
Coverage, CHB + JPT (single marker / multi-marker) | 0.87 / 0.93 | 0.78 / 0.86
Coverage, YRI (single marker / multi-marker) | 0.64 / 0.78 | 0.63 / 0.75
Average call rate | 98.9% | 99.3%
Concordance, homozygous genotypes | 99.8% | 99.9%
Concordance, heterozygous genotypes | 99.8% | 99.8%
GAIN Collaborative Group, Nat Genet 2007; 39:1045-51.
Sample and SNP QC Metrics for Affymetrix 5.0 and 6.0 Platforms in GAIN
Metric | 5.0 | % fail | 6.0 | % fail
Total Samples | 1,829 | - | 2,289 | -
Passing QC | 1,817 | 0.44 | 2,192 | 4.24
> 98% call rate | 1,815 | 0.55 | 2,257 | 1.40
Total SNPs | 457,645 | - | 906,660 | -
Passing QC | 429,309 | 6.19 | 845,814 | 6.70
MAF > 1% | 457,466 | 0.04 | 888,234 | 2.03
> 98% call rate | 419,810 | 8.27 | 821,942 | 9.34
> 95% call rate | 439,272 | 4.01 | 873,856 | 3.61
HWE < 10^-6 | 455,899 | 0.38 | 904,275 | 0.26
< 1 Mendel error | 417,722 | 8.72 | 899,721 | 0.01
< 1 Duplicate error | 454,820 | 0.01 | 892,103 | 0.02
Courtesy, J Paschall, NCBI
Sample Heterozygosity in GAIN
[Histogram: frequency (axis 0 to 2,500) by sample heterozygosity (axis 0.20 to 0.40)]
Courtesy, J Paschall, NCBI
Sample Heterozygosity in GAIN
[Histogram, expanded scale: frequency (axis 0 to 100) by sample heterozygosity (axis 0.20 to 0.40)]
Courtesy, J Paschall, NCBI
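Heterozygosity outliers like those in the tails of these histograms can be flagged with a simple rule. The sketch below computes each sample's heterozygosity as the share of called genotypes that are heterozygous, then flags samples more than a chosen number of standard deviations from the mean. The 0/1/2 genotype coding, the use of None for missing calls, the 3-SD default, and the sample names are all assumptions for illustration.

```python
from statistics import mean, stdev


def sample_heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous (coded 1); None = no call."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g == 1) / len(called)


def heterozygosity_outliers(rates, n_sd=3.0):
    """Return sample IDs whose rate lies more than n_sd standard deviations from the mean."""
    center = mean(rates.values())
    spread = stdev(rates.values())
    return [sample_id for sample_id, rate in rates.items()
            if abs(rate - center) > n_sd * spread]


# Hypothetical per-sample rates: nine typical samples and one with no heterozygous calls
rates = {f"sample_{i}": 0.30 for i in range(9)}
rates["sample_9"] = 0.0

print(sample_heterozygosity([0, 1, 2, 1, None]))
print(heterozygosity_outliers(rates, n_sd=2.0))
```

Low heterozygosity can point to inbreeding or poor DNA quality, and high heterozygosity to sample contamination, so flagged samples usually warrant a closer look before exclusion.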
Signal Intensity Plots for rs10801532 in AREDS
http://www.ncbi.nlm.nih.gov/sites/entrez
Signal Intensity Plots for rs4639796 in AREDS
http://www.ncbi.nlm.nih.gov/sites/entrez
Signal Intensity Plots for rs534399 in AREDS
http://www.ncbi.nlm.nih.gov/sites/entrez
Signal Intensity Plots for rs572515 in AREDS
http://www.ncbi.nlm.nih.gov/sites/entrez
Signal Intensity Plots for CD44 SNP
rs9666607
Clayton DG et al, Nat Genet 2005; 37:1243-1246.
Principal Component Analysis of Structured
Population: First to Third Components
Courtesy, G. Thomas, NCI
Principal Component Analysis of Structured
Population: Fourth and Fifth Components
Courtesy, G. Thomas, NCI
Influence of Relatedness on Principal
Component Analysis
Courtesy, G. Thomas, NCI
Summary Points: Genotyping Quality Control
• Sample checks for identity, gender error, cryptic
relatedness
• Sample handling differences can introduce
artifacts but probably can be adjusted for
• Association analysis is often quickest way to
find genotyping errors
• Low MAF SNPs are most difficult to call
• Inspection of genotyping cluster plots is crucial!
Quantile-Quantile Plot for Test Statistics,
390 Breast Cancer Cases, 364 Controls
205,586 SNPs
λ = 1.03
Easton D et al, Nature 2007; 447:1087-1093.
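The λ reported on a Q-Q plot is commonly computed as the median observed 1-df chi-square statistic divided by the median of the chi-square distribution with 1 degree of freedom (about 0.455). The sketch below assumes that definition; the function names and example statistics are illustrative, and the source does not say how its λ = 1.03 was calculated.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Lambda: median observed statistic over the expected median under the null."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_adjust(chi2_stats):
    """Divide each statistic by lambda (only when lambda exceeds 1)."""
    lam = max(genomic_inflation_factor(chi2_stats), 1.0)
    return [stat / lam for stat in chi2_stats]


# Hypothetical test statistics with a mildly inflated median
observed = [0.05, 0.20, 0.50, 1.10, 4.00]
lam = genomic_inflation_factor(observed)
print(f"lambda = {lam:.2f}")
print([round(stat, 2) for stat in genomic_control_adjust(observed)])
```

Values of λ near 1, such as the 1.03 on this slide, suggest little inflation from population structure or other systematic bias.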
Observed and Expected Associations after Stage 2 of Breast Cancer GWA
Significance | Observed | Observed Adjusted | Expected | Ratio
0.01 - 0.05 | 1,239 | 1,162 | 934 | 1.24
10^-3 – 10^-2 | 574 | 517 | 348 | 1.49
10^-4 – 10^-3 | 112 | 88 | 53 | 1.65
10^-5 – 10^-4 | 16 | 12 | 7 | 1.71
< 10^-5 | 15 | 13 | 1 | 13.5
All p < 0.05 | 1,956 | 1,792 | 1,343 | 1.33
Easton D et al, Nature 2007; 447:1087-93.
Q-Q Plot for Multiple Sclerosis; Effect of MHC
Hafler D et al, N Engl J Med 2007; 357:851-862.
Q-Q Plot for Prostate Cancer,
all SNPs
Gudmundsson J et al, Nat Genet 2007; 39:977-983.
Q-Q Plot for Prostate Cancer,
excluding Chromosome 8
Gudmundsson J et al, Nat Genet 2007; 39:977-983.
Q-Q Plot for Myocardial Infarction
[Q-Q plot: observed chi-squared statistic (axis 0 to 60) against expected chi-squared statistic (axis 0 to 25)]
Samani N et al, N Engl J Med 2007; 357:443-453.
-Log10 P Values for SNP Associations with
Myocardial Infarction
Samani N et al, N Engl J Med 2007; 357:443-453.
SNP Associations with 1,928 MI Cases and
2,938 Controls from UK
Samani N et al, N Engl J Med 2007; 357:443-453.
Association Signal for Coronary Artery
Disease on Chromosome 9
’3049
Samani N et al, N Engl J Med 2007; 357:443-453.
Winner’s Curse: Odds Ratios for CHD Associated
with LTA Genotypes in Multiple Studies
Clarke et al, PLoS Genet 2006; 2:e107.
Genome-Wide Scan for Alzheimer’s Disease
in 861 Cases and 550 Controls
Reiman E et al, Neuron 2007; 54:713-20.
Genome-Wide Scan for Alzheimer’s Disease
in APOE*e4 Carriers
Reiman E et al, Neuron 2007; 54:713-20.
LOAD Odds Ratios Associated with rs2373115 GG by APOE*e4 Status
APOE*e4 Group | APOE*e4 OR [95% CI] | rs2373115 OR [95% CI]
APOE*e4 - | - | 1.12 [0.82,1.53]
APOE*e4 + | - | 2.88 [1.90,4.36]
All | 6.07 [4.63,7.95] | 1.34 [1.06,1.70]
Reiman et al, Neuron 2007; 54:713-720.
P Values of GWA Scan for Age-Related
Macular Degeneration
Klein et al, Science 2005; 308:385-389.
Odds Ratios and Population Attributable Risks for AMD
Attribute (SNP) | rs380390 (C/G) | rs1329428 (C/T)
Risk allele | C | C
Allelic association χ² P value | 4.1 x 10^-8 | 1.4 x 10^-6
Odds ratio (dominant) | 4.6 [2.0-11] | 4.7 [1.0-22]
Frequency in HapMap CEU | 0.70 | 0.82
Population Attributable Risk | 70% [42-84%] | 80% [0-96%]
Odds ratio (recessive) | 7.4 [2.9-19] | 6.2 [2.9-13]
Frequency in HapMap CEU | 0.23 | 0.41
Population Attributable Risk | 46% [31-57%] | 61% [43-73%]
Klein et al, Science 2005; 308:385-389.
Risk of Developing AMD by CFH Y402H and Modifiable Risk Factors
Risk Factor | CFH Y402H Genotype YY | YH | HH
BMI < 30 kg/m² | 1.00 | 1.95 [1.42-2.67] | 3.96 [2.69-5.82]
BMI > 30 kg/m² | 1.98 [0.91-4.31] | 2.19 [1.11-4.30] | 12.28 [4.88-30.90]
Non-smoker | 1.00 | 1.95 [1.41-2.71] | 4.23 [2.86-6.27]
Current smoker | 2.34 [1.20-4.55] | 3.20 [1.85-5.55] | 8.69 [3.86-19.57]
Schaumberg DA et al, Arch Ophthalmol 2007; 125:55-62.
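One way to read a joint-effects table like this is to compare the observed joint estimate with what the separate gene and environment effects would predict on a multiplicative or an additive scale. The sketch below does this for the HH genotype and BMI, reading the table as 3.96 for HH with BMI < 30, 1.98 for YY with BMI > 30, and 12.28 for HH with BMI > 30. The function name is illustrative, and the calculation ignores the wide confidence intervals, so it is descriptive rather than a formal test of interaction.

```python
def interaction_contrasts(or_gene, or_env, or_joint):
    """Compare an observed joint effect with multiplicative and additive expectations."""
    expected_multiplicative = or_gene * or_env
    expected_additive = or_gene + or_env - 1
    return {
        "expected_multiplicative": expected_multiplicative,
        "expected_additive": expected_additive,
        # Values above 1 mean the joint effect exceeds the multiplicative expectation
        "ratio_to_multiplicative": or_joint / expected_multiplicative,
        # Relative excess risk due to interaction (additive scale); 0 means no excess
        "reri": or_joint - or_gene - or_env + 1,
    }


result = interaction_contrasts(or_gene=3.96, or_env=1.98, or_joint=12.28)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```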
Interaction: Is LIPC Genotype Related to HDL-C?
[Figure: curves labeled by LIPC genotype (CC, CT, TT)]
Ordovas et al, Circulation 2002; 106:2315-2321.
Inverse Relation between Endotoxin Exposure
and Allergic Sensitization by CD14 Genotype
Simpson A et al, Am J Respir Crit Care Med 2006;174:386-392.
Challenges in Studying Gene-Environment Interactions
Challenge | Genes | Environment
Ease of measure | Pretty easy | Often hard
Variability over time | Low/none | High
Recall bias | None | Possible
Temporal relation to disease | Easy | Hard