Lecture Slides - McMaster University

Analytical challenges in genetic association studies
David Meyre, Associate Professor, McMaster University
(meyred@mcmaster.ca)
HRM 728 Graduate Course: Genetic Epidemiology – November 7th, 2014
Li & Meyre, Int J Obes 2013
The march of technology
Timeline axis: 1980, 1990, 2000, 2006, 2011
-single variant (10^0 SNPs)
-detailed study of individual genes (10^2 SNPs)
-regional studies (10^4 SNPs)
-genome-wide association (5 x 10^5 SNPs; 3.5 x 10^6 SNPs in 2007)
-whole-genome sequencing (3 x 10^7 SNPs)
A storm of data to deal with!
Analytical challenges in genetic association studies
. 426 positive findings in 127 genes, but…
. only 22 genes associated with obesity-related phenotypes in > 5 studies
-> Replication is challenging in genetic epidemiology
-> Skepticism in the medical / scientific community
Rankinen et al., Obesity 2006
Analytical challenges in genetic association studies
I. Analytical challenges to find a true association in a
discovery study (risk of false positive result)
II. Analytical challenges to replicate a true positive association
III. Guidelines for proper discovery and replication association
study designs
Analytical challenges to find a true association in a
discovery study
Are you ready for Episode 1 of the saga?
Analytical challenges in genetic association studies
I. Lack of replication may occur because the original study
reports a false positive result
1) The phenotype is not heritable
Obesity is a heritable disease
. 2 obese parents -> 10-fold increased risk for childhood obesity
. Obesity has a strong genetic component: heritability of 50-85%
(Stunkard et al., NEJM 1986; Wardle et al., AJCN 2008)
2) Insufficient sample size
Statistical power and sample size
Effect sizes for obesity-associated common genetic variants are small (OR < 2)
Statistical power and sample size
Allelic OR | MAF 0.01 | MAF 0.05 | MAF 0.1 | MAF 0.2 | MAF 0.3 | MAF 0.4
1.1        | 443,854  | 92,868   | 49,252  | 27,974  | 21,518  | 19,010
1.2        | 116,354  | 24,434   | 13,018  | 7,460   | 5,792   | 5,162
1.3        | 54,110   | 11,404   | 6,102   | 3,526   | 2,760   | 2,480
1.5        | 21,208   | 4,498    | 2,426   | 1,424   | 1,132   | 1,032
2.0        | 6,386    | 1,374    | 754     | 458     | 376     | 354

Table 1. Sample sizes needed in a case-control design to detect a significant association with a power of 90% and a two-sided P-value of 0.001, by allelic odds ratio (rows) and risk allele frequency (columns, MAF in controls). Calculations assume a multiplicative effect on disease risk. Sample sizes presented are the total number of cases and controls needed, assuming an equal number of cases and controls.
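The slides do not say how Table 1 was computed. The sketch below is one possible reconstruction, assuming a standard two-proportion (allelic) test with a normal approximation, two alleles per individual, and a 1:1 case/control ratio. The function names are illustrative. Under these assumptions the results should land close to the tabulated values.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_allele_freq(maf_controls: float, allelic_or: float) -> float:
    """Risk-allele frequency in cases implied by a multiplicative (allelic) OR."""
    odds_cases = allelic_or * maf_controls / (1.0 - maf_controls)
    return odds_cases / (1.0 + odds_cases)


def total_sample_size(maf_controls: float, allelic_or: float,
                      alpha: float = 0.001, power: float = 0.90) -> int:
    """Total cases + controls (1:1) for a two-sided allelic test.

    Assumption: normal approximation for comparing two allele frequencies.
    """
    p0 = maf_controls
    p1 = case_allele_freq(p0, allelic_or)
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    # Number of alleles (chromosomes) needed in each group.
    alleles_per_group = (
        z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))
    ) ** 2 / (p1 - p0) ** 2
    # Each individual carries two alleles; round up within each group.
    individuals_per_group = ceil(alleles_per_group / 2.0)
    return 2 * individuals_per_group


if __name__ == "__main__":
    mafs = (0.01, 0.05, 0.1, 0.2, 0.3, 0.4)
    for odds_ratio in (1.1, 1.2, 1.3, 1.5, 2.0):
        print(odds_ratio, [total_sample_size(maf, odds_ratio) for maf in mafs])
```

The required sample size grows roughly with the inverse square of the allele frequency difference, which is why an OR of 1.1 at a MAF of 0.01 demands hundreds of thousands of subjects.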
GAD2, or the importance of a well-powered study
. Association of the GAD2 promoter gene variant -243 A>G with morbid obesity (OR=1.05-1.58, P=0.01) using 575 cases and 646 controls
. No prior statistical power calculation in the original study
. Lack of confirmation of the association of the GAD2 promoter gene variant -243 A>G with morbid obesity (OR=0.90-1.36, P=0.28) in a meta-analysis of 1,252 cases and 1,800 controls
Boutin et al., PLOS Biol 2003; Swarbrick et al., PLOS Biol 2005
Statistical power and rare variant analysis
“We identified six highly correlated SNPs that show
strong and comparable associations with risk of type 2
diabetes, but further refinement of these associations
will require large sample sizes (>100,000) or studies in
ethnically diverse populations.”
Fawcett et al., Diabetes 2010
3) Lack of correction for multiple testing
Multiple testing in the post-GWAS era
1 million polymorphisms!
Bonferroni correction: Pcorrected = 0.05 / 1,000,000 = 5 x 10^-8
Pairwise SNP x SNP (gene x gene) interactions: Pcorrected = 0.05 / (5 x 10^11 pairs) = 1 x 10^-13
Multiple testing in the whole-exome/genome sequencing era
30 million polymorphisms
20,000 genes
Bonferroni correction SNPs: Pcorrected = 0.05 / 30,000,000 ≈ 1.7 x 10^-9
Bonferroni correction genes: Pcorrected = 0.05 / 20,000 = 2.5 x 10^-6
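The Bonferroni thresholds on the two slides above can be checked with a few lines of Python. This is a minimal sketch: the function name is illustrative, and the pairwise case assumes every unordered pair of the 1 million SNPs is tested once.

```python
from math import comb


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test P-value threshold keeping the family-wise error rate at alpha."""
    return alpha / n_tests


gwas_snps = 1_000_000        # post-GWAS era
sequencing_snps = 30_000_000  # whole-genome sequencing era
genes = 20_000                # gene-based tests

print(f"GWAS, single SNPs:       {bonferroni_threshold(gwas_snps):.1e}")
# Assumption: all unordered SNP pairs are tested for interaction.
print(f"GWAS, SNP x SNP pairs:   {bonferroni_threshold(comb(gwas_snps, 2)):.1e}")
print(f"Sequencing, single SNPs: {bonferroni_threshold(sequencing_snps):.1e}")
print(f"Sequencing, genes:       {bonferroni_threshold(genes):.1e}")
```

Bonferroni treats all tests as independent, so it is conservative when SNPs are in linkage disequilibrium.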
INSIG2: a GWA false positive association
. Science, April 2006: INSIG2 rs7566605 variant is associated with obesity (meta-analysis OR=1.05-1.42, P=0.008), far from the threshold of significance after multiple testing correction (P=5 x 10^-7)
. Science, January 2007: INSIG2: lack of association with obesity in 3 independent designs (N=22,381)
4) Geographical population substructure
Lactase persistence and population substructure
LCT rs4988235 T allele
frequency in UK
Davey-Smith et al., EJHG 2009
Rare variants and founder effects
-common SNP associated with adiponectin level in Filipinos by GWAS
-exon resequencing identified a rare coding variant (R221S), in LD with the common SNP, strongly associated with adiponectin level
-the mutation is found exclusively in Filipinos
Croteau-Chonka et al., HMG 2012
5) Technological biases, lack of quality control procedure
INS VNTR and association with childhood obesity: a technological bias?
Le Stunff et al., Nat Genet 2000:
. Association of the INS VNTR variant with childhood obesity
. Genotyping by RFLP, a highly subjective method (Peters et al., CCM 2003)
Bouatia-Naji et al., Obesity 2008:
. Lack of association of the INS VNTR variant with childhood obesity
. Genotyping by TaqMan, a highly reliable method
. Family-based design to enable a high-standard quality control procedure
Next generation sequencing and false-positive mutations
. 10% of mutations are technological artifacts in next generation sequencing
. The rate of false positive mutations is higher in ‘old’ DNA libraries
-> Use of pedigrees, confirmation of mutations by Sanger resequencing
-> New methods (RainDance technology)
Bonnefond et al., PLOS One 2012
6) Inappropriate statistical analysis
Association and adjustment for confounding factors
. Association between FTO intron 1 SNP and type 2 diabetes (OR=1.09-1.23, P=5 x 10^-8) after adjustment for sex and age
. Lack of association between FTO intron 1 SNP and type 2 diabetes (OR=0.96-1.10, P=0.44) after adjustment for sex, age and BMI
-> FTO is an obesity gene: its association with type 2 diabetes is explained by its effect on BMI
-> Inappropriate adjustment (or lack of adjustment) can lead to wrong conclusions
Frayling et al., Science 2007
Analytical challenges in genetic association studies
I. Lack of replication may occur because the original study
reports a false positive result
1) The phenotype is not heritable
2) Insufficient sample size
3) Lack of correction for multiple testing
4) Geographical population substructure
5) Technological biases, lack of quality control procedure
6) Inappropriate statistical analysis
Analytical challenges to replicate a true positive association
Now, Episode 2 of the saga!
Analytical challenges in genetic association studies
II. Replication may be challenging even when the original result
is a true positive association
1) Willingness to replicate the original study
Lactase persistence and BMI variation
Despite convincing initial evidence of association between the LCT rs4988235 T variant and BMI (P=8 x 10^-5) in 31,720 European individuals…
Kettunen et al., HMG 2009
…replication studies appeared only 2-4 years later
Corella et al., Obesity 2011
2) Winner’s curse effect and sample size in follow-up studies
Obesity loci from GIANT and replication
. Due to the small effect size of the SNPs on BMI variation, only a fraction of these associations replicate in follow-up studies with limited statistical power (den Hoed et al., Diabetes 2010)
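The winner's curse can be illustrated analytically. The sketch below assumes the effect estimate is normally distributed around the true effect, and that only estimates passing the upper-tail significance threshold get reported. The mean of the reported estimates is then the mean of a truncated normal. The function name and all numbers are hypothetical, not taken from GIANT.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_significant_estimate(true_beta: float, se: float,
                                  alpha: float = 5e-8):
    """Mean of the estimate given that it passed the significance threshold.

    Assumptions: estimate ~ Normal(true_beta, se), true_beta > 0, and only the
    upper tail of a two-sided test at level alpha is considered. Returns the
    conditional mean and the discovery power. Very low power (threshold more
    than ~8 SE above the true effect) underflows and is not handled here.
    """
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    a = z_crit - true_beta / se
    power = 1.0 - STD_NORMAL.cdf(a)
    # Mean of a normal distribution truncated below at z_crit * se.
    conditional_mean = true_beta + se * STD_NORMAL.pdf(a) / power
    return conditional_mean, power


if __name__ == "__main__":
    true_or, se = 1.10, 0.03  # hypothetical true odds ratio and standard error
    mean_beta, power = expected_significant_estimate(log(true_or), se)
    print(f"discovery power: {power:.3f}")
    print(f"true OR: {true_or:.2f}, expected reported OR: {exp(mean_beta):.2f}")
```

A replication study sized on the inflated discovery estimate will be underpowered for the true, smaller effect, which is one reason true associations fail to replicate.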
3) Gene x gene, gene x environment interactions
Interactions between FTO SNP and physical activity
. The effect of the rs9939609 SNP on obesity risk is decreased by 27% in physically active adults
. No genotype x physical activity interaction on obesity risk in children
Kilpelainen et al., PLOS Med 2012
Savage et al., Nat Genet 2002
4) Heterogeneity (ethnic heterogeneity, phenotype heterogeneity)
Ethnicity and linkage disequilibrium blocks
[Figure: LD blocks spanning SNP1 to SNP5 along a distance axis (kb), compared across Icelandic, French, Asian and African populations; legend: causal SNP, proxy SNP, disease-associated LD block]
Ethnicity and SNP allele frequency
. Intronic variation (rs2237892) in a new locus (KCNQ1) was strongly associated with T2D in Asian populations (OR: 1.26-1.42, 10^-40 < P-value < 10^-12)
. The association with T2D was nominally replicated in populations of European descent (DIAGRAM: P=0.01), with a similar OR but a lower risk allele frequency (5-7% in Europeans, 28-40% in Asians)
Obesity, waist and BMI have a partially overlapping
genetic architecture
5) Inheritance model (parent of origin effects, de novo mutations…)
6) Subjective interpretation of data
Subjective interpretation of data
Is this glass half-full or half-empty?
Analytical challenges in genetic association studies
II. Replication may be challenging even when the original result
is a true positive association
1) Willingness to replicate the original study
2) Winner’s curse effect and sample size in follow-up studies
3) Gene x gene, gene x environment interactions
4) Heterogeneity (ethnic heterogeneity, phenotype heterogeneity)
5) Inheritance model
6) Subjective interpretation of data
Guidelines for proper discovery and replication
association study designs
Enough time for Episode 3 of the saga?
Analytical challenges in genetic association studies
III. Guidelines for proper discovery and replication
association study designs
Discovery
1) Study designs
Gene discovery study designs
[Figure: BMI distribution in the general population (N), with lean and obese individuals at the two tails]
1) Case control studies from extremes of the BMI tails
2) Quantitative trait studies in the whole population
-> Correlation genotype / trait at a genetic locus
-> Best approach (GIANT / GIANT extreme): BMI study in the whole population + analysis of the extremes of the BMI tails (genetic variance, effect size…)
Berndt et al., Nat Genet 2013
Gene discovery study designs
3) Family-based association studies: allele transmission from parents to affected offspring (imprinting, haplotypes…)
4) Cohort studies: correlation of a genotype with an incident
disease event (gold standard)
Gene discovery study designs
[Figure: BMI distribution in the general population (N), divided into lean, normal weight and obese groups]
5) The case-control-case design: discovery of gene variants associated with leanness or with obesity (applications in drug design)
The gain-of-function V103I and I251L variants in MC4R
are associated with leanness
[Figure: forest plot of per-study and summary effects (x-axis: exp(Effect), log scale from 0.03 to 31.62) in 16 cohorts (5964 control and 6370 obese patients), including French adults, French children, Italian children and Swiss adults. Study references: Ohshiro et al, 1999; Farooqi et al, 2000; Jacobson et al, 2002 (listed twice); Miraglia del Giudice et al, 2002; Hinney et al, 2003; Marti et al, 2003; Valli-Jaakola et al, 2004; Santini et al, 2004; Buono et al, 2005; Larsen et al, 2005. Summary: OR = 0.53, p-value = 4.26 x 10^-5]
-Meta-analysis in 39,879 subjects confirms an obesity-protective role of the V103I polymorphism (OR = 0.80; p-value = 0.002)
-V103I and I251L are infrequent (0.41-2.24%) and induce a gain-of-function effect on the melanocortin 4 receptor (Xiang et al., Biochemistry 2006)
Stutzmann et al., HMG 2007
Gene discovery study designs
6) Clinical trials, interventional studies: correlation of a
genotype with response to intervention or treatment (lifestyle
intervention, drug, surgery, smoking cessation, antipsychotic
drug administration….)
2) Phenotype
How to choose a relevant obesity phenotype?
Heritability for BMI:
-h² = 0.48 at age 4 y.
-h² = 0.78 at age 11 y.
Heritability for type 2 diabetes:
-h² = 0.69 (onset < 60 y.)
-h² = 0.31 (onset < 75 y.)
Haworth et al., Obesity 2008; Almgren et al., Diabetologia 2011
How to choose a relevant obesity phenotype?
-clinically and biologically relevant
-easy and inexpensive to measure
-relevant in diverse ethnicities
-minimal measurement error
-minimal misclassification and reporting biases
-> value of BMI to estimate the degree of adiposity questionable
-> body fat content, body adiposity index are more relevant
. Genome-wide association study for % fat mass in 36,000
subjects, replication of the best hits in 39,000 subjects
. Three % fat mass-associated loci : FTO, IRS1, SPRY2
. Only one locus (FTO) out of three has been conclusively associated with body mass index (BMI) in the literature
Kilpelainen et al., Nat Genet 2011
3) Gene identification strategies
Gene identification strategies
AGNOSTIC APPROACH
-moderately successful
-novel disease causing mechanisms
-significance thresholds
-lack of biological relevance
CANDIDATE GENE APPROACH
-highly successful
-previously known mechanisms
-strong selection criteria needed
-biological relevance
HIGH-THROUGHPUT CANDIDATE GENE APPROACH
(pathway, expression, evolution…)
4) Genotyping methodology and quality control procedures
Genotyping methodology and quality control
-exclusion of low quality DNA (cases and controls)
-highly reliable genotyping technology
-genotyping call rate (> 95%)
-Hardy-Weinberg equilibrium (P > 0.005)
-double genotyping concordance rate (> 99%)
-MAF comparison with public databases
-confirmation by a second method
-association of SNPs in linkage disequilibrium
-accurate experiments / data management and reporting (bar coding,
automated processes, internal controls, flow charts….)
-sex inconsistencies, hidden relatedness, ethnic outliers….
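Two of the quality-control filters listed above (call rate > 95%, Hardy-Weinberg P > 0.005) can be sketched as follows. The Hardy-Weinberg check here is the simple 1-degree-of-freedom chi-square test. Exact tests are often preferred for rare alleles, and the slides do not say which test was intended. Function names and genotype counts are hypothetical.

```python
from math import erfc, sqrt


def call_rate(n_called: int, n_total: int) -> float:
    """Fraction of samples with a genotype call for this SNP."""
    return n_called / n_total


def hwe_chi_square_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of the 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumes both alleles are observed (a monomorphic SNP would divide by zero).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df.
    return erfc(sqrt(chi2 / 2.0))


def passes_qc(n_aa: int, n_ab: int, n_bb: int, n_total: int,
              min_call_rate: float = 0.95, min_hwe_p: float = 0.005) -> bool:
    """Apply the call-rate and Hardy-Weinberg thresholds from the slide."""
    called = n_aa + n_ab + n_bb
    return (call_rate(called, n_total) > min_call_rate
            and hwe_chi_square_p(n_aa, n_ab, n_bb) > min_hwe_p)


if __name__ == "__main__":
    # Hypothetical genotype counts in 1,000 genotyped controls.
    print(passes_qc(250, 480, 250, 1000))  # close to HWE, 98% call rate
    print(passes_qc(400, 180, 400, 1000))  # strong heterozygote deficit
```

Hardy-Weinberg equilibrium is usually tested in controls only, since a true association can distort genotype proportions in cases.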
5) Statistical analysis
Statistical analysis
-power calculation
-limited number of hypotheses tested
-multiple testing (FDR, Bonferroni…)
-adjustment for confounding factors
-caution with subgroup analyses
-best fitting inheritance model
-conditional analyses
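The checklist mentions FDR as an alternative to Bonferroni. Assuming the Benjamini-Hochberg step-up procedure is meant (the slides do not name one), a minimal sketch is shown below with hypothetical P-values.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per test: True if rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Reject every hypothesis ranked at or below that rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Return one boolean per test: True if rejected after Bonferroni."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


if __name__ == "__main__":
    # Hypothetical P-values from ten candidate SNPs.
    p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
                0.060, 0.074, 0.205, 0.212, 0.216]
    print("Bonferroni rejections:", sum(bonferroni(p_values)))
    print("BH (FDR 5%) rejections:", sum(benjamini_hochberg(p_values)))
```

FDR control tolerates a set proportion of false discoveries among the reported hits, so it is less stringent than Bonferroni and better suited to screening steps that will be followed by replication.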
6) Population stratification
Population stratification
-correction for self-reported ethnicity
-exclusion of ethnic outliers
-genomic control
(Ancestry Informative Markers)
-family-based association tests
-case control matched for age, sex, geography…
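Genomic control, listed above, rescales association statistics by an inflation factor lambda estimated from the genome-wide distribution of test statistics. The sketch below assumes 1-df chi-square statistics and the usual median-based estimate. The function names and input values are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Lambda: observed median chi-square over its expectation under the null."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_correct(chi2_stats):
    """Divide each statistic by lambda (never inflating when lambda < 1)."""
    lam = max(1.0, genomic_inflation_factor(chi2_stats))
    return [x / lam for x in chi2_stats]


if __name__ == "__main__":
    # Hypothetical chi-square statistics from a stratified sample.
    stats = [0.1, 0.5, 0.9098728, 2.0, 5.0]
    print("lambda:", round(genomic_inflation_factor(stats), 3))
    print("corrected:", [round(x, 3) for x in genomic_control_correct(stats)])
```

A lambda close to 1 suggests little stratification. Values clearly above 1 point to population structure, cryptic relatedness or technical artifacts.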
Replication
1) Systematic replication and reporting of promising associations
2) Statistical power (Winner’s curse effect)
3) Heterogeneity
How to lower heterogeneity in replication studies?
-same ethnicity / country
-same study design
-same ascertainment criteria
-same phenotype
-same genetic markers
-same age window, same sex ratio
-same inheritance model
-same statistical analysis
-same covariate adjustments
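Heterogeneity across replication studies can also be quantified once the results are in. The sketch below assumes a fixed-effect, inverse-variance meta-analysis of per-study log odds ratios, with Cochran's Q and the I² statistic as heterogeneity measures. The slides do not prescribe these measures, and the study estimates are hypothetical.

```python
from math import exp, log


def fixed_effect_meta(betas, ses):
    """Inverse-variance pooled estimate, its SE, Cochran's Q and I^2 (in %)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i2


if __name__ == "__main__":
    # Hypothetical per-study odds ratios and standard errors of the log OR.
    odds_ratios = [1.25, 1.10, 1.05, 1.30]
    ses = [0.08, 0.05, 0.06, 0.10]
    pooled, pooled_se, q, i2 = fixed_effect_meta([log(o) for o in odds_ratios], ses)
    print(f"pooled OR: {exp(pooled):.2f}, Q: {q:.2f}, I^2: {i2:.0f}%")
```

A high I² suggests that the studies are not estimating the same effect, in which case harmonizing the designs along the lines listed above matters more than adding sample size.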
4) Meta-analyses
5) Additional studies
Additional studies
-worldwide contribution
-extension to different study designs, ascertainment criteria
-association with obesity endophenotypes
-gene x environment interactions
-fine-mapping, causative gene variants
-functional experiments
-biological insights
FTO in 2007: ‘gene of unknown function in an unknown pathway’
-> 2014: > 740 articles published
1997: first identification of a monogenic obesity gene (LEP)
2007: first gene variant in FTO conclusively associated with
obesity
2012: 40 monogenic (syndromic / non-syndromic) obesity
genes, > 100 common gene variants conclusively
associated with polygenic obesity
ANY QUESTIONS?
The French fair-play!