Genetics for Imagers: How Geneticists Model Quantitative Phenotypes Nelson Freimer

UCLA Center for Neurobehavioral Genetics

What makes a genetic association significant?

Outline

• The problem of achieving validated findings in psychiatric genetics

• Approaches to genetic mapping and statistical significance

- linkage analysis (+ examples)

- association analysis (+ examples)

Psychiatric genetics: The brains of the family

10 July 2008 | Nature 454, 154-157 (2008)

Does the difficulty in finding the genes responsible for mental illness reflect the complexity of the genetics or the poor definitions of psychiatric disorders?

“The studies so far are statistically underpowered. We need bigger studies.”

— Jonathan Flint

“Geneticists know nothing about psychiatric disease.”

— Daniel Weinberger

WHAT IS THE PROBLEM?

• Psychiatric disorders are highly heritable

• No psychiatric susceptibility genes known

• Studies so far are underpowered

– Phenotypes are of uncertain validity

– Samples are too small and markers too few

– Signal-to-noise ratio is too low (etiological heterogeneity: genetic and non-genetic)

“We are just too ignorant of the underlying neurobiology to make guesses about candidate genes.”

— Steven Hyman

This is why geneticists have turned to genome-wide mapping

Genome-wide mapping and allelic architecture

Allelic architecture and genetic mapping approaches

[Figure: genetic mapping approaches arranged by disease gene allele frequency, from rare (<1%) to common (>5%). Labels: linkage; association (family-based or case-control); copy number variants; not found to date.]

[Figure: a founder chromosome carrying the disease gene within an IBD region, and present-day affected individuals who carry a shared IBD region. IBD = identical by descent.]

The Principle of Genetic Linkage

If genes are located on different chromosomes they show independent assortment.

However, genes on the same chromosome, especially if they are close to each other, tend to be passed on to offspring in the same configuration as on the parental chromosomes.

Genetic markers: SNPs

Detecting Genetic Linkage: Linkage Analysis vs Association Analysis

• Linkage Analysis

– Using pedigree samples, search for regions of the genome where affected individuals share alleles more often than expected by chance

• Association Analysis

– Compare allele frequency distributions in cases and controls

• Similar principles can be applied to quantitative traits
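The case-control comparison of allele frequencies can be sketched as a chi-square test on a 2x2 table of allele counts. This is only an illustration: the function name `allelic_chi_square` and the counts are hypothetical, and a Pearson chi-square with 1 degree of freedom is assumed as the test, which the slides do not specify.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts.

    Each argument is (count of allele 1, count of allele 2).
    Returns (chi-square statistic, two-sided p-value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: 100 cases and 100 controls, two alleles per person.
# G allele frequency is 0.30 in cases and 0.20 in controls.
chi2, p = allelic_chi_square(case_counts=(60, 140), control_counts=(40, 160))
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

A nominal p-value from a single test like this is only a starting point; the thresholds discussed later in the talk determine whether it counts as significant.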

[Figure: two panels, Linkage Analysis and Association Analysis, each showing individuals labeled with the marker genotypes G,T or T,T.]

When are two genetic loci significantly linked?

Stringent significance thresholds based on…

• Low prior probability of linkage between any two loci

– Considered when there were few markers

• Multiple tests involved in genotyping studies

– Considered after there were many markers

• Both considerations yielded approximately the same threshold:

LOD score ($\log_{10}$ of the likelihood ratio) > ~3 (i.e. $p < 10^{-4}$)

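• Prior probability of linkage between a given locus and a random genome location: 0.02

• To obtain a posterior probability of linkage >0.95 (i.e. <0.05 false positive linkages), apply Bayes' theorem:

$$\Pr(\text{Linkage} \mid \text{Data}) = \frac{\Pr(\text{Data} \mid \text{Linkage})\,\Pr(\text{Linkage})}{\Pr(\text{Data} \mid \text{Linkage})\,\Pr(\text{Linkage}) + \Pr(\text{Data} \mid \text{NoLinkage})\,\Pr(\text{NoLinkage})}$$

• Solving for the likelihood ratio Pr(Data | Linkage) / Pr(Data | NoLinkage)…

– ratio must be >1,000, i.e. LOD >3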
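The Bayesian argument can be checked numerically. The sketch below works in odds form (posterior odds = prior odds x likelihood ratio), which is equivalent to Bayes' theorem; the function name `required_likelihood_ratio` is made up for illustration. With the stated prior of 0.02 the exact requirement comes out a little under 1,000, which the slide rounds to LOD 3.

```python
import math

def required_likelihood_ratio(prior, posterior):
    """Likelihood ratio needed to move a prior probability to a target posterior."""
    prior_odds = prior / (1 - prior)
    posterior_odds = posterior / (1 - posterior)
    return posterior_odds / prior_odds

# Values from the slide: prior probability of linkage 0.02, target posterior 0.95.
lr = required_likelihood_ratio(prior=0.02, posterior=0.95)
lod = math.log10(lr)
print(f"required likelihood ratio = {lr:.0f}, LOD = {lod:.2f}")
```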

Controlling for multiple testing in linkage

• With complete genome marker sets, the prior probability that some marker is linked is 1

• ~500 fully informative, independent markers cover linkage in all regions of the genome

• To control the global hypothesis of no linkage anywhere in the genome at the 0.05 level:

$0.05/500 = 10^{-4}$ for each test, i.e. LOD >3

Significance thresholds for linkage

Lander and Kruglyak, 1995

• Suggestive linkage: a LOD score or p value expected to occur once by chance in a whole-genome scan.

LOD >2.2, $p < 7.4 \times 10^{-4}$

• Significant linkage: a LOD score or p value expected to occur by chance 0.05 times in a whole-genome scan.

LOD >3.6, $p < 2.2 \times 10^{-5}$

• Highly significant linkage: a LOD score or p value expected to occur by chance 0.001 times in a whole-genome scan.

LOD >5.4, $p < 3 \times 10^{-7}$

• Confirmed linkage: a significant linkage observed in one study is confirmed by finding a LOD score or p value expected to occur 0.01 times by chance in a specific search of the candidate region.
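The pairing of LOD scores with pointwise p values can be approximated with a short calculation. The sketch assumes the usual large-sample relation, which the slides do not state: $2 \ln(10) \times \text{LOD}$ is treated as a chi-square statistic with 1 degree of freedom and the test is one-sided. The function name `lod_to_pointwise_p` is hypothetical.

```python
import math

def lod_to_pointwise_p(lod):
    """Approximate one-sided pointwise p-value for a LOD score.

    Assumes 2 * ln(10) * LOD follows a chi-square distribution with 1 df
    under the null hypothesis of no linkage.
    """
    chi2 = 2 * math.log(10) * lod
    # One-sided test: half the chi-square (1 df) tail area.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2))

for label, lod in [("classical", 3.0), ("suggestive", 2.2),
                   ("significant", 3.6), ("highly significant", 5.4)]:
    print(f"{label:>18}: LOD {lod} -> p ~ {lod_to_pointwise_p(lod):.1e}")
```

Under this assumption LOD 3 lands close to $10^{-4}$, the same per-test level as 0.05/500, and the other three thresholds come out close to the p values listed above.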

An example of linkage to a quantitative neurobehavioral trait

Monoamine Neurotransmitters

• Norepinephrine and epinephrine: attention, blood pressure

• Histamine: gastric acid release, immune response

• Dopamine: reward

• Serotonin: appetite, mood, gastrointestinal motility

From David Krantz

Catecholamine Synthesis and Degradation

Genome-wide linkage analysis of HVA in a vervet monkey pedigree

Vervet research colony pedigree

Heritability of Monoamine Metabolites in vervet monkeys

[Figure: bar chart for the monoamine metabolites 5-HIAA, HVA, and MHPG showing h2 (genetic) and c2 (maternal) components; vertical axis from 0 to 0.8.]

HVA level in vervets on chromosome 10

Linkage analysis in extended pedigrees may be powerful for structural MRI phenotypes

Brain MRIs in the VRC

357 vervets scanned on a mobile Siemens Symphony 1.5 Tesla scanner

Genetic association analysis

Linkage analysis is not very powerful for mapping complex traits (with many alleles of small effect)

Disease gene discovery methods

[Figure: genetic mapping approaches arranged by disease gene allele frequency, from rare (<1%) to common (>5%). Labels: linkage; association (family-based or case-control); copy number variants; not found to date.]

Significance thresholds for association

Consider a simple Bayesian argument:

- Prior probability that a random gene is associated with the trait: ~1/30,000, assuming 30,000 genes/genome

- Likelihood ratio should be > 550,000 for association to be significant (posterior probability >0.95)

- With a $\chi^2$ test, $p < 2.6 \times 10^{-7}$
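The same odds arithmetic used for linkage can be applied here as a rough check. The sketch below makes one assumption that the slide does not state: the likelihood ratio is converted to a p value by treating $2 \ln(\text{LR})$ as a chi-square statistic with 1 degree of freedom (two-sided). The function names are hypothetical. With a prior of exactly 1/30,000 the required ratio works out to about 570,000, the same order as the slide's figure, and the implied p value is close to $2.6 \times 10^{-7}$.

```python
import math

def required_likelihood_ratio(prior, posterior):
    """Likelihood ratio needed to move a prior probability to a target posterior."""
    prior_odds = prior / (1 - prior)
    posterior_odds = posterior / (1 - posterior)
    return posterior_odds / prior_odds

def likelihood_ratio_to_p(lr):
    """Two-sided p-value, assuming 2 * ln(LR) is chi-square with 1 df."""
    chi2 = 2 * math.log(lr)
    return math.erfc(math.sqrt(chi2 / 2))

# Prior from the slide: one gene in ~30,000; target posterior 0.95.
lr = required_likelihood_ratio(prior=1 / 30_000, posterior=0.95)
print(f"required likelihood ratio ~ {lr:,.0f}")
print(f"implied p-value ~ {likelihood_ratio_to_p(lr):.1e}")
```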

A more complete evaluation of significance

$$\text{Posterior odds (for true association)} = \text{Prior odds} \times \frac{\text{Power}}{\text{Significance}}$$

• Strength of evidence depends on likely number of true associations and power to detect them

• These depend on effect sizes and sample sizes

• Less well-powered studies need more stringent thresholds to control false-positive rate

See Wacholder et al., J. National Cancer Institute 2004
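To see how prior odds, power, and significance level combine, here is a minimal sketch of the posterior-odds relation and the false-positive probability it implies, 1 / (1 + posterior odds). All input values are hypothetical and chosen only to contrast a small candidate-gene study with a well-powered genome-wide study; the function names are also made up for illustration.

```python
def posterior_odds(prior, power, alpha):
    """Posterior odds of a true association = prior odds * power / alpha."""
    prior_odds = prior / (1 - prior)
    return prior_odds * power / alpha

def false_positive_probability(prior, power, alpha):
    """Probability that a finding significant at level alpha is a false positive."""
    return 1 / (1 + posterior_odds(prior, power, alpha))

# Hypothetical candidate-gene study: modest prior, 50% power, alpha = 0.02.
candidate = false_positive_probability(prior=0.001, power=0.5, alpha=0.02)

# Hypothetical genome-wide study: lower prior, 80% power, alpha = 5e-8.
genome_wide = false_positive_probability(prior=1e-4, power=0.8, alpha=5e-8)

print(f"candidate-gene study: {candidate:.3f}")
print(f"genome-wide study:    {genome_wide:.1e}")
```

Under these assumed inputs, a nominally significant result from the small study is far more likely to be false than true, while the stringent genome-wide threshold reverses that.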

Genome-wide association thresholds

• Controlling for multiple testing

E.g. Bonferroni: 0.05 / (No. of SNPs x No. of traits)

E.g. for a single trait with $10^6$ SNPs, $p < 5 \times 10^{-8}$

• However, more complicated…

– SNPs are not all independent (LD)

– LD varies across genome and populations

– traits are not all independent

• False discovery rate (FDR) increasingly used (proportion of false positives among all positives)

…if 1 out of 20 hits is false, that is not so bad
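The two approaches to multiple testing can be compared in a few lines. The slides do not name an FDR procedure; the sketch below uses the Benjamini-Hochberg step-up procedure as one common choice, and the p values are invented for illustration. Both procedures assume independent tests here, which, as noted above, does not hold for SNPs in LD.

```python
def bonferroni_threshold(alpha, n_snps, n_traits=1):
    """Per-test threshold that controls the family-wise error rate at alpha."""
    return alpha / (n_snps * n_traits)

def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests declared significant at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its step-up threshold.
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

print(bonferroni_threshold(0.05, n_snps=10**6))  # single trait, 10^6 SNPs

# Hypothetical p-values from 10 tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonf = [i for i, p in enumerate(p_values)
        if p <= bonferroni_threshold(0.05, n_snps=len(p_values))]
print("Bonferroni hits:", bonf)
print("FDR (BH) hits:  ", benjamini_hochberg(p_values, q=0.05))
```

On these made-up values the FDR procedure accepts one more test than Bonferroni, which is the trade the slide describes: more discoveries in exchange for tolerating a small fraction of false ones.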

Evaluating association in neurobehavioral genetics studies

Serotonin Transporter Promoter Polymorphism Association Studies as of 2002

[Table: number of association studies reporting P<.05 versus P>.05 for each phenotype: Schizo., OCD, BP/mood disorder, personality traits, suicide, drug response, anorexia, late onset Alzheimer's, alcohol related, smoking related, autism, fibromyalgia, panic disorder. Rows that survive intact (P<.05, P>.05): anorexia 0, 2; autism 2, 2; fibromyalgia 1, 0; panic disorder 0, 3.]

Association of Anxiety-Related Traits with Polymorphism in the Serotonin Transporter Gene Regulatory Region

Lesch et al. Science. 1996;274(5292):1527-31.

• Two samples (N = 221, N = 284)

• Association with P ~ 0.02

In large samples: no association of 5HTTLPR with temperament

Example from Northern Finland Birth Cohort, N ~ 4000

Influence of Life Stress on Depression: Moderation by a Polymorphism in the 5-HTT Gene

Caspi et al. Science 301: 386-389 (2003)

Interaction Between the Serotonin Transporter Gene (5-HTTLPR), Stressful Life Events, and Risk of Depression: A Meta-analysis

Risch et al. JAMA. 2009;301(23):2462-2471.

[Figure: logistic regression analyses of risk of depression for 14 studies.]

Genome-wide association analysis

Progress in identifying gene variants for common traits

[Figure: timeline, 2000 to 2007, of gene variants identified for common traits, including cholesterol, obesity, myocardial infarction, QT interval, atrial fibrillation, type 2 diabetes, prostate cancer, breast cancer, colon cancer, height, age-related macular degeneration, Crohn's disease, type 1 diabetes, systemic lupus erythematosus, asthma, restless leg syndrome, gallstone disease, multiple sclerosis, rheumatoid arthritis, and glaucoma. Loci labeled on the figure include PPAR, IBD5, NOD2, CTLA4, KCNJ11, PTPN22, CFH, TCF7L2, IL23R, 8q24, and FTO. Slide from David Altshuler.]

HDL Association at 16q22.1

HDL Association near LIPC

A success story in neuropsychiatry

Genome-wide association in narcolepsy in Japan (222 cases vs 389 controls)

[Figure: genome-wide association results plotted by chromosome (1 to 22), with HLA labeled; vertical axis ticks from 2 to 8. From Emmanuel Mignot.]

J. Hallmayer et al., Nature Genetics 41, 708-711 (2009)

Narcolepsy is strongly associated with the T-cell receptor alpha locus

~2000 cases in GWAS + ~2000 cases in replication

Strong genome-wide evidence

Known genes and environment explain little of trait variance

Sequencing: the currently unexplored middle of the allelic spectrum

Whole genome sequencing is coming soon…

But we don’t have very good models for it yet

Summary

• The allelic spectrum of complex traits determines the appropriate genetic mapping approach

• Genetic linkage and association studies require stringent statistical thresholds

• Single candidate gene studies have very low probability of being true positives

• Genome-wide linkage and association studies are beginning to bear fruit for neurobehavioral traits

• Whole-genome sequencing is just around the corner
