Genetics for Imagers: How Geneticists Model Quantitative Phenotypes
Nelson Freimer
UCLA Center for Neurobehavioral Genetics
What makes a genetic association significant?
Outline
• The problem of achieving validated findings in psychiatric genetics
• Approaches to genetic mapping and statistical significance
- linkage analysis (+ examples)
- association analysis (+ examples)
Psychiatric genetics: The brains of the family
10 July 2008 | Nature 454, 154-157 (2008)
Does the difficulty in finding the genes responsible for mental illness reflect the complexity of the genetics or the poor definitions of psychiatric disorders?
“The studies so far are statistically underpowered. We need bigger studies.”
— Jonathan Flint
“Geneticists know nothing about psychiatric disease.”
— Daniel Weinberger
WHAT IS THE PROBLEM?
• Psychiatric disorders are highly heritable
• No psychiatric susceptibility genes known
• Studies so far are underpowered
– Phenotypes are of uncertain validity
– Samples are too small and markers too few
– Signal to noise ratio is too low
(etiological heterogeneity: genetic and non-genetic)
“We are just too ignorant of the underlying neurobiology to make guesses about candidate genes.”
— Steven Hyman
This is why geneticists have turned to genome wide mapping
Genome-wide mapping and allelic architecture
Allelic architecture and genetic mapping approaches
[Figure: mapping approaches arranged by disease gene allele frequency, from rare (<1%) to common (>5%). Labels: linkage; copy number variants; association (family-based or case-control); not found to date.]
[Figure: a founder chromosome carrying the disease gene within an IBD region; present-day affected individuals share part of that IBD region. IBD = identical by descent.]
The Principle of Genetic Linkage
If genes are located on different chromosomes they show independent assortment.
However, genes on the same chromosome, especially if they are close to each other, tend to be passed on to offspring in the same configuration as on the parental chromosomes.
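The contrast between independent assortment and linkage can be made concrete with a two-point LOD score. The sketch below is illustrative only: it assumes phase-known, fully informative meioses, and the function name and the counts (2 recombinants in 20 meioses) are hypothetical, not taken from the slides. Independent assortment corresponds to a recombination fraction of 0.5.

```python
import math

def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for recombination fraction theta vs. independent assortment.

    Assumes phase-known, fully informative meioses (an idealization).
    theta = 0.5 is independent assortment; theta < 0.5 indicates linkage.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    # log10 likelihood of the data under linkage at the given theta
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1 - theta))
    # log10 likelihood under independent assortment (theta = 0.5)
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null

# Hypothetical data: 2 recombinants observed in 20 meioses.
lod = two_point_lod(recombinants=2, meioses=20, theta=0.1)
print(f"LOD at theta = 0.1: {lod:.2f}")
```

With these made-up counts the score is a little above 3; the same function returns 0 at theta = 0.5, where the data give no evidence of linkage.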
Genetic markers: SNPs
Detecting Genetic Linkage:
Linkage Analysis vs Association Analysis
• Linkage Analysis
– Using pedigree samples, search for regions of the genome where affected individuals share alleles more often than expected by chance
• Association Analysis
– Compare allele frequency distributions in cases and controls
• Similar principles apply to quantitative traits
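As a rough illustration of comparing allele frequencies in cases and controls, the sketch below runs a Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts. The function name and the counts are hypothetical, and an allelic test is only one of several ways such a comparison is done in practice.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Each argument is a pair: (count of allele G, count of allele T).
    Returns (chi-square statistic, two-sided p value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    # Shortcut formula for the Pearson statistic of a 2x2 table
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: 200 cases and 200 controls, two alleles per person.
chi2, p_value = allelic_chi_square(case_counts=(240, 160),
                                   control_counts=(200, 200))
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

A p value of this size would look convincing for a single test, which is why the thresholds discussed later in the deck matter.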
[Figure: linkage analysis vs association analysis, illustrated with example G,T and T,T marker genotypes: within a pedigree for linkage analysis, and across a set of individuals for association analysis.]
When are two genetic loci significantly linked?
Stringent significance thresholds based on…
• Low prior probability of linkage between any two loci
– Considered when there were few markers
• Multiple tests involved in genotyping studies
– Considered after there were many markers
• Both considerations yielded ~ the same threshold: LOD score (log base 10 of the likelihood ratio) > ~3 (i.e. $p < 10^{-4}$)
• Prior probability of linkage between a given locus and a random genome location: 0.02
• To obtain a posterior probability of linkage of >0.95 (i.e. <0.05 false positive linkages), apply Bayes' theorem:
• Solving for the likelihood ratio Pr(Data | Linkage) / Pr(Data | NoLinkage)…
– ratio must be >1,000, i.e. LOD >3
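The Bayes step above can be written out in odds form. This is a reconstruction from the numbers on the slide (prior 0.02, target posterior 0.95), not a copy of the original formula, which did not survive on the slide.

```latex
\[
\frac{\Pr(\text{Linkage}\mid\text{Data})}{\Pr(\text{NoLinkage}\mid\text{Data})}
= \frac{\Pr(\text{Data}\mid\text{Linkage})}{\Pr(\text{Data}\mid\text{NoLinkage})}
\times \frac{\Pr(\text{Linkage})}{\Pr(\text{NoLinkage})}
\]
\[
\frac{\Pr(\text{Data}\mid\text{Linkage})}{\Pr(\text{Data}\mid\text{NoLinkage})}
> \frac{0.95/0.05}{0.02/0.98} = 19 \times 49 = 931 \approx 10^{3},
\qquad \log_{10} 931 \approx 2.97 \approx 3
\]
```

Rounding 931 up to 1,000 gives the conventional LOD threshold of 3.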
Controlling for multiple testing in linkage
• With complete genome marker sets, the prior probability that some marker is linked is 1
• ~500 fully informative, independent markers cover linkage in all regions of the genome
• To control the global hypothesis of no linkage anywhere in the genome at the 0.05 level: $0.05/500 = 10^{-4}$ for each test, i.e. LOD >3
Significance thresholds for linkage
Lander and Kruglyak, 1996
• Suggestive linkage: a LOD score or p value expected to occur once by chance in a whole genome scan.
LOD >2.2, $p < 7.4 \times 10^{-4}$
• Significant linkage: a LOD score or p value expected to occur by chance 0.05 times in a whole genome scan.
LOD >3.6, $p < 2.2 \times 10^{-5}$
• Highly significant linkage: a LOD score or p value expected to occur by chance 0.001 times in a whole genome scan.
LOD >5.4, $p < 3 \times 10^{-7}$
• Confirmed linkage: a significant linkage observed in one study is confirmed by finding a LOD score or p value expected to occur 0.01 times by chance in a specific search of the candidate region.
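The LOD and p value pairs above can be checked approximately with a short conversion. The sketch assumes the standard large-sample result that 2 ln(10) x LOD follows a chi-square distribution with 1 degree of freedom, halved because linkage is tested one-sided; the function name is hypothetical and the slides do not state which conversion was used.

```python
import math

def lod_to_pointwise_p(lod: float) -> float:
    """Approximate one-sided pointwise p value for a LOD score.

    Assumes 2 * ln(10) * LOD ~ chi-square with 1 df under no linkage,
    and halves the tail area because the test is one-sided.
    """
    chi2 = 2 * math.log(10) * lod
    return 0.5 * math.erfc(math.sqrt(chi2 / 2))

for label, lod in [("classic threshold", 3.0),
                   ("suggestive", 2.2),
                   ("significant", 3.6),
                   ("highly significant", 5.4)]:
    print(f"{label}: LOD {lod} -> p ~ {lod_to_pointwise_p(lod):.1e}")
```

Under this assumption the printed values land close to the p values quoted for each threshold, and LOD 3 maps to roughly $10^{-4}$.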
An example of linkage to a quantitative neurobehavioral trait
Monoamine Neurotransmitters
• Norepinephrine and epinephrine: attention, blood pressure
• Histamine: gastric acid release, immune response
• Dopamine: reward
• Serotonin: appetite, mood, gastrointestinal motility
From David Krantz
Catecholamine Synthesis and Degradation
Genome wide linkage analysis of HVA in a vervet monkey pedigree
Vervet research colony pedigree
Heritability of Monoamine Metabolites in vervet monkeys
[Figure: bar chart for the monoamine metabolites 5-HIAA, HVA and MHPG, showing h2 (genetic) and c2 (maternal) components on a scale from 0 to 0.8.]
HVA level in Vervets on Chromosome 10
Linkage analysis in extended pedigrees may be powerful for structural MRI phenotypes
Brain MRIs in the VRC
• 357 vervets scanned
• Mobile Siemens Symphony 1.5 Tesla scanner
Genetic association analysis
Linkage analysis is not very powerful for mapping complex traits (with many alleles of small effect)
Disease gene discovery methods
[Figure: mapping approaches arranged by disease gene allele frequency, from rare (<1%) to common (>5%). Labels: linkage; copy number variants; association (family-based or case-control); not found to date.]
Significance thresholds for association
Consider a simple Bayesian argument:
- Prior probability that a random gene is associated with the trait: ~1/30,000, assuming 30,000 genes/genome
- Likelihood ratio should be >550,000 for association to be significant (posterior probability >0.95)
- With a χ² test, $p < 2.6 \times 10^{-7}$
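The same odds arithmetic can be sketched in a few lines. The function names are hypothetical, and the conversion from likelihood ratio to p value assumes that 2 ln(LR) behaves as a chi-square with 1 degree of freedom, which the slide does not spell out. Exact odds arithmetic gives a required ratio of about 570,000, the same order as the figure quoted above, and a p value close to $2.6 \times 10^{-7}$.

```python
import math

def required_likelihood_ratio(prior: float, posterior: float) -> float:
    """Likelihood ratio needed to move a prior probability to a posterior."""
    prior_odds = prior / (1 - prior)
    posterior_odds = posterior / (1 - posterior)
    return posterior_odds / prior_odds

def likelihood_ratio_to_p(lr: float) -> float:
    """Two-sided p value, assuming 2 * ln(LR) ~ chi-square with 1 df."""
    chi2 = 2 * math.log(lr)
    return math.erfc(math.sqrt(chi2 / 2))

# Association: prior of 1 in 30,000 genes, target posterior of 0.95.
lr_association = required_likelihood_ratio(prior=1 / 30_000, posterior=0.95)
p_association = likelihood_ratio_to_p(lr_association)
print(f"required likelihood ratio ~ {lr_association:,.0f}")
print(f"corresponding p value ~ {p_association:.1e}")

# Linkage, for comparison: prior of 0.02 gives a ratio near 1,000 (LOD ~ 3).
lr_linkage = required_likelihood_ratio(prior=0.02, posterior=0.95)
print(f"linkage: ratio ~ {lr_linkage:.0f}, LOD ~ {math.log10(lr_linkage):.2f}")
```

The much lower prior for association is what pushes the threshold several orders of magnitude beyond the linkage case.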
A more complete evaluation of significance
$$\text{Posterior odds (for true association)} = \text{Prior odds} \times \frac{\text{Power}}{\text{Significance}}$$
• Strength of evidence depends on likely number of true associations and power to detect them
• These depend on effect sizes and sample sizes
• Less well-powered studies need more stringent thresholds to control false-positive rate
See Wacholder et al., J. National Cancer Institute 2004
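A small sketch of the posterior-odds relation above shows why the threshold matters so much. The function names are hypothetical, and the inputs (a prior of 1/30,000, power of 0.5, and the two significance levels) are illustrative assumptions rather than values from any particular study. The probability that a reported association is a false positive is 1 / (1 + posterior odds).

```python
def posterior_odds(prior: float, power: float, significance: float) -> float:
    """Posterior odds of a true association = prior odds * power / significance."""
    prior_odds = prior / (1 - prior)
    return prior_odds * power / significance

def false_positive_probability(prior: float, power: float,
                               significance: float) -> float:
    """Probability that a significant finding is a false positive."""
    return 1 / (1 + posterior_odds(prior, power, significance))

# Illustrative assumptions: prior 1/30,000 and power 0.5 in both cases.
prior, power = 1 / 30_000, 0.5
lenient = false_positive_probability(prior, power, significance=0.02)
strict = false_positive_probability(prior, power, significance=5e-8)
print(f"p < 0.02:  probability of a false positive ~ {lenient:.4f}")
print(f"p < 5e-8: probability of a false positive ~ {strict:.4f}")
```

Under these assumptions a finding at p < 0.02 is almost certainly false, while one at a genome-wide threshold is very likely true; lowering the power makes both cases worse.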
Genome wide association thresholds
• Controlling for multiple testing
E.g. Bonferroni: 0.05 / (No. of SNPs x No. of traits)
E.g. for a single trait with $10^6$ SNPs, $p < 5 \times 10^{-8}$
• However, more complicated…
– SNPs are not all independent (LD)
– LD varies across genome and populations
– traits are not all independent
• False discovery rate (FDR) increasingly used
(expected proportion of false positives among all positives)
…if 1 out of 20 hits is false, that is not so bad
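The two approaches to multiple testing above can be compared in code. The Bonferroni function follows the arithmetic on the slide; for FDR, the sketch uses the Benjamini-Hochberg step-up procedure, which is one common choice and an assumption here, since the slides do not name a specific method. The function names and the example p values are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_snps: int, n_traits: int = 1) -> float:
    """Per-test threshold that controls the family-wise error rate at alpha."""
    return alpha / (n_snps * n_traits)

def benjamini_hochberg(p_values, fdr: float = 0.05):
    """Indices of tests declared significant by the Benjamini-Hochberg procedure.

    Assumes independent (or positively dependent) tests.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p value is under its step-up threshold
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

threshold = bonferroni_threshold(alpha=0.05, n_snps=10**6)
print(f"Bonferroni threshold for 10^6 SNPs: {threshold:.0e}")

# Hypothetical p values for four tests.
example_p = [0.041, 0.001, 0.205, 0.008]
print("BH significant indices:", benjamini_hochberg(example_p, fdr=0.05))
```

With these four made-up p values, Bonferroni at 0.05/4 would also accept the two smallest; the procedures diverge more as the number of tests and true signals grows.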
Evaluating association in neurobehavioral genetics studies
Serotonin Transporter Promoter Polymorphism Association Studies as of 2002
[Table: number of studies with P<.05 and with P>.05 for each phenotype: schizophrenia, OCD, BP/mood disorder, personality traits, suicide, drug response, anorexia, late-onset Alzheimer's, alcohol related, smoking related, autism, fibromyalgia, panic disorder. Legible rows (P<.05, P>.05): anorexia 0, 2; autism 2, 2; fibromyalgia 1, 0; panic disorder 0, 3.]
Association of Anxiety-Related Traits with a Polymorphism in the Serotonin Transporter Gene Regulatory Region
Lesch et al. Science. 1996;274(5292):1527-31.
• Two samples (N = 221, N = 284)
• Association with P ~ 0.02
In large samples: No association of 5HTTLPR with temperament
Example from Northern Finland Birth Cohort, N ~ 4000
Influence of Life Stress on Depression: Moderation by a Polymorphism in the 5-HTT Gene
Caspi et al. Science 301: 386-389 (2003)
Interaction Between the Serotonin Transporter Gene (5-HTTLPR), Stressful Life Events, and Risk of Depression: A Meta-analysis
Risch et al. JAMA. 2009;301(23):2462-2471.
Logistic Regression Analyses of Risk of Depression for 14 Studies
Genomewide association analysis
Progress in identifying gene variants for common traits
[Figure: timeline from 2000 to 2007 of loci identified for common traits and diseases, including cholesterol, obesity, myocardial infarction, QT interval, atrial fibrillation, type 2 diabetes, prostate, breast and colon cancer, height, age-related macular degeneration, Crohn's disease, type 1 diabetes, systemic lupus erythematosus, asthma, restless leg syndrome, gallstone disease, multiple sclerosis, rheumatoid arthritis and glaucoma. Loci shown include NOD2, CTLA4, PTPN22, CFH, TCF7L2, IL23R, 8q24, FTO and STAT4.]
Slide from David Altshuler
HDL Association at 16q22.1
HDL Association near LIPC
A success story in neuropsychiatry
Genome-wide association in narcolepsy in Japan (222 cases vs 389 controls)
[Figure: genome-wide association plot across chromosomes 1 to 22, vertical axis ticks from 2 to 8, with the peak labeled HLA.]
From Emmanuel Mignot
J. Hallmayer et al., Nature Genetics 41, 708-711 (2009)
Narcolepsy is strongly associated with the T-cell receptor alpha locus
~2000 cases in GWAS + ~2000 cases in replication
Strong genome-wide evidence
Known genes and environment explain little of trait variance
Sequencing: the currently unexplored middle of the allelic spectrum
Whole genome sequencing is coming soon…
But we don’t have very good models for it yet
Summary
• The allelic spectrum of complex traits determines the appropriate genetic mapping approach
• Genetic linkage and association studies require stringent statistical thresholds
• Findings from single candidate gene studies have a very low probability of being true positives
• Genome-wide linkage and association studies are beginning to bear fruit for neurobehavioral traits
• Whole-genome sequencing is just around the corner