presentation - Vanderbilt Kennedy Center

Statistical Issues in Human Genetics

Jonathan L. Haines Ph.D.

Center for Human Genetics Research

Vanderbilt University Medical Center

COMMON COMPLEX DISEASE

Genes

Environment

Complex Disease

What Can The Genes Tell Us?

• Give us a better understanding of the underlying biology of the trait in question

• Serve as direct targets for better treatments

– Pharmacogenetics

– Interventions

• Give us better predictions of who might develop disease

• Give us better predictions of the course of the disease

• Lead to knowledge that can help find a cure or prevention

Watson and Crick started it all in 1953 with the description of the structure of DNA

The 53-year anniversary of the paper will be in April.

Both won the Nobel Prize

The DNA Between Individuals is 99.9% Identical.

All differences are in the 0.1% of DNA that varies.

TCCAACCGGG

TGCAACCGGG

It’s hard to believe sometimes!

HUMAN CHROMOSOMES

Single-Nucleotide Polymorphisms (SNPs)

One of the most common types of variation

1st Chromosome: GATCCT G TAGCT

2nd Chromosome: GATCCT C TAGCT

Normal

GATCCT G TAGCT

GATCCT G TAGCT

< Normal

< Disease

G/C

Affected

GATCCT C TAGCT

GATCCT C TAGCT

Extremely frequent across the genome (~1/400 bp) -> high resolution

Easy to genotype -> high-throughput techniques
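As a small illustration of the SNP slide, the sketch below compares the two example sequences and reports where they differ. The function name `snp_positions` is made up for this example, and the genome size of 3,000 million base pairs is taken from the needle-in-a-haystack slide later in the deck (reading "MB" as megabases, which is an assumption).

```python
def snp_positions(seq_a, seq_b):
    """Return 0-based positions where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# The two chromosomes from the slide, differing at a single base (G vs. C).
first = "GATCCT" + "G" + "TAGCT"
second = "GATCCT" + "C" + "TAGCT"

positions = snp_positions(first, second)
alleles = [(first[i], second[i]) for i in positions]

# Rough count implied by "~1/400 bp", assuming a 3,000-megabase genome.
genome_bp = 3_000_000_000
expected_snps = genome_bp // 400

print(positions, alleles, expected_snps)
```

Under those assumptions, a density of about one SNP per 400 bp works out to several million SNPs genome-wide, which is where the "high resolution" claim comes from.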

What are We Looking For?

Earth -> City -> Street -> Address

Human Genome -> Chromosome -> Band -> Gene (DNA)

Haystack: 640 cubic yards; needle: 1/100 cubic inch

Genome: 3,000 MB; target: 1 x 10^-6 MB

It really is like finding a needle in a haystack!

(and a very BIG haystack, at that)
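The haystack comparison can be checked with a few lines of arithmetic. This sketch assumes the 640 cubic yards and 1/100 cubic inch figures describe the haystack and the needle, and that 3,000 MB and 1 x 10^-6 MB describe the genome and the single site being sought; the variable names are illustrative only.

```python
# Volume conversion: 1 yard = 36 inches, so 1 cubic yard = 36**3 cubic inches.
CUBIC_INCHES_PER_CUBIC_YARD = 36 ** 3

haystack_in3 = 640 * CUBIC_INCHES_PER_CUBIC_YARD
needle_in3 = 1 / 100
haystack_ratio = haystack_in3 / needle_in3

genome_mb = 3_000
target_mb = 1e-6
genome_ratio = genome_mb / target_mb

print(f"haystack : needle = {haystack_ratio:.2e}")
print(f"genome   : target = {genome_ratio:.2e}")
```

Both ratios come out at roughly three billion to one, so the two halves of the analogy are on the same scale.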

The Genome Sequence is not THE answer!

Disease Gene Discovery In Complex Disease

1. Define Phenotype a. Consistency b. Accuracy

2. Define the Genetic Component a. Twin Studies b. Adoption Studies c. Family Studies d. Heritability e. Segregation Analysis

3. Define Experimental Design

4. Ascertain Families a. Case-Control b. Singleton c. Sib Pairs d. Affected Relative Pairs

5. Collect Data a. Family Histories b. Clinical Results c. Risk Factors d. DNA Samples

6. Perform Genotype Generation a. Genomic Screen b. Candidate Gene

7. Analyze Data a. Model-dependent (lod score) b. Model-independent (sib-pair, relative pair) c. Association studies (case-control, family-based)

8. Identify, Test, and Localize Regions of Interest

9. Bioinformatics and Gene Identification

10. Identify Susceptibility Variation(s)

11. Define Interactions a. Gene-Gene b. Gene-Environment

CLASSES OF HUMAN GENETIC DISEASE

Diseases of Simple Genetic Architecture

– Can tell how trait is passed in a family: follows a recognizable pattern

– One gene per family

– Often called Mendelian disease

– Usually quite rare in population

– “Causative” gene

Diseases of Complex Genetic Architecture

– No clear pattern of inheritance

– Moderate to strong evidence of being inherited

– Common in population: cancer, heart disease, dementia etc.

– Involves many genes or genes and environment

– “Susceptibility” genes

Modes of Inheritance

• Autosomal Dominant

– Huntington disease

• Autosomal Recessive

– Cystic fibrosis

• X-linked

– Duchenne muscular dystrophy

• Mitochondrial

– Leber optic atrophy

• Additive

– HLA-DR in multiple sclerosis

• Combinations of the above

– RP (39 loci), Nonsyndromic deafness

Linkage Analysis

Traces the segregation of the trait through a family

Traces the segregation of the chromosomes through a family

Statistically measures the correlation of the segregation of the trait with the segregation of the chromosome

A SAMPLE PEDIGREE

The RED chromosome is key

Measures of Linkage

Parametric Vs Non-Parametric

Two major approaches toward linkage analysis

Parametric: Defines a genetic model of the action of the trait locus (loci). This allows more complete use of the available data (inheritance patterns and phenotype information).

– The historical approach towards linkage analysis.

Development driven by need to map simple Mendelian diseases

– Quite powerful when model is correctly defined

Non-Parametric: Uses either a partial genetic model or no genetic model. Relies on estimates of allele/haplotype/region sharing across relatives. Makes far fewer assumptions about the action of the underlying trait locus (loci).

Linkage Analysis

• Families

– Affected sibpairs

– Affected relative pairs

– Extended families

• Traits

– Qualitative (affected or not)

– Quantitative (ordinal, continuous)

• There are numerous different methods that can be applied

• These methods differ dramatically depending on the types of families and traits

Recombination: Nature’s way of making new combinations of genetic variants

A. A diploid cell.

B. DNA replication and pairing of homologous chromosomes to form a bivalent.

C. Chiasmata are formed between the chromatids of homologous chromosomes.

D. Recombination is complete by the end of prophase I.

Linkage Analysis in Humans

Measure the rate of recombination between two or more loci on a chromosome

Can be done with any loci, but primary application is to find the location of a trait variant by measuring linkage to known marker variants.

LOD Score Analysis

The likelihood ratio as defined by Morton (1955):

$$\mathrm{L.R.} = \frac{L(\text{pedigree} \mid \theta = x)}{L(\text{pedigree} \mid \theta = 0.50)}$$

where $\theta$ represents the recombination fraction and where $0 \le x \le 0.49$.

When all meioses are “scorable”, the LR is constructed as:

$$\mathrm{L.R.} = \frac{\theta^{R} (1 - \theta)^{NR}}{(0.5)^{N}}$$

The LOD score (z) is the $\log_{10}(\mathrm{L.R.})$

$z(\theta)$ is the lod score at a particular value of the recombination fraction

$z(\hat{\theta})$ is the maximum lod score, which occurs at the MLE of the recombination fraction
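The fully scorable case can be worked through numerically. The sketch below reads R and NR as the counts of recombinant and nonrecombinant meioses with N = R + NR (the usual convention, assumed here), and uses the fact that the likelihood is maximized at R/N, capped at 0.5. The function names and the example counts (1 recombinant in 10 meioses) are illustrative, not taken from the slides.

```python
import math


def lod_score(theta, recombinants, nonrecombinants):
    """z(theta) = log10( theta^R * (1 - theta)^NR / 0.5^N )."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    n = recombinants + nonrecombinants
    if theta == 0.0 and recombinants > 0:
        # Any observed recombinant rules out theta = 0.
        return float("-inf")
    z = n * math.log10(2.0)  # the -log10(0.5^N) term
    if recombinants:
        z += recombinants * math.log10(theta)
    if nonrecombinants:
        z += nonrecombinants * math.log10(1.0 - theta)
    return z


def max_lod(recombinants, nonrecombinants):
    """Return (theta_hat, z(theta_hat)) for fully scorable meioses."""
    n = recombinants + nonrecombinants
    if n == 0:
        raise ValueError("need at least one scorable meiosis")
    theta_hat = min(recombinants / n, 0.5)
    return theta_hat, lod_score(theta_hat, recombinants, nonrecombinants)


theta_hat, z_max = max_lod(recombinants=1, nonrecombinants=9)
print(f"theta_hat = {theta_hat:.2f}, max lod = {z_max:.2f}")
```

The function also accepts theta = 0.5, where the ratio is 1 and the lod score is 0, which is a convenient sanity check.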

Study Designs

• Linkage Analysis

– Large Families

– Small Families

• Association Studies

– Family-Based

– Case-Control

Linkage vs. Association

Linkage

Shared within Families

Association

Shared across Families

TESTING CANDIDATE GENES

Disease: 5/20    Normal: 5/20

Gene is not important

TESTING CANDIDATE GENES

Disease: 10/20    Normal: 5/20

Gene may be important
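One way to put a number on "may be important" is a chi-square test on the 2x2 table of carriers versus non-carriers. The sketch below assumes each fraction means "carriers of the variant / people sampled" (the slides do not say), and uses the 1-df identity p = erfc(sqrt(chi2 / 2)) so that only the standard library is needed. The function name is illustrative.

```python
import math


def chi_square_2x2(case_carriers, case_total, control_carriers, control_total):
    """Pearson chi-square (1 df, no continuity correction) and its p-value."""
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column must contain at least one count")
    stat = n * (a * d - b * c) ** 2 / margins
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


same = chi_square_2x2(5, 20, 5, 20)      # first slide: 5/20 vs. 5/20
differ = chi_square_2x2(10, 20, 5, 20)   # second slide: 10/20 vs. 5/20
print(same, differ)
```

With only 20 people per group, the 10/20 versus 5/20 difference does not reach the conventional 0.05 threshold, which fits the cautious wording "may be important".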

Two Basic Study Designs for Association Analysis

Family-Based

– Parent-child Trio

– Discordant sibpairs

• Advantages

– Use existing samples

– Robustness to assumptions

• Disadvantages

– Ascertainment

– Power

Case-Control

• Advantages

– Power

– Ascertainment

• Disadvantages

– Sensitivity to assumptions

– Matching

METHODS FOR FAMILY-BASED ASSOCIATION STUDIES

• Parent-Child

– AFBAC

– TDT

– HHRR

– QTDT

• Sibship

– SDT

– WSDT

– FBAT

• Sibpair

– S-TDT

– DAT

• Pedigree

– Transmit

– PDT

– FBAT

TRANSMISSION DISEQUILIBRIUM TEST (TDT)

• Examines transmission of alleles to affected individuals

• Requires:

– Linkage (transmission through meioses); and

– Association (specific alleles)

• Test of linkage if association assumed

• Test of association if linkage assumed

• Test of linkage AND association if neither assumed

• Uses the non-transmitted alleles, effectively, as the control group. Can make “pseudocontrol” by creating genotype of the two non-transmitted alleles

• Requires phenotype only for the child

TDT calculation

Parents: 12 x 12; child: 11

                     Transmitted
                     1    2
Not transmitted  1   A    B
                 2   C    D

$\mathrm{TDT} = \frac{(B - C)^2}{B + C}$

With > 5 per cell, this follows a $\chi^2$ distribution with 1 df

TDT

Parents: 12 x 12; child: 11

                     Transmitted
                     1    2
Not transmitted  1   0    0
                 2   2    0

TDT

Parents: 22 x 12; child: 12

                     Transmitted
                     1    2
Not transmitted  1   0    0
                 2   1    1

TDT

Parents: 22 x 11; child: 12

                     Transmitted
                     1    2
Not transmitted  1   1    0
                 2   0    1
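The three trio tables above can be reproduced by counting, for each parent, which allele was transmitted to the child and which was not. This is a minimal sketch for a biallelic marker with complete genotypes; the function names are hypothetical and genotypes are written as tuples such as `(1, 2)`.

```python
from collections import Counter


def _other(genotype, transmitted):
    """Return the allele a parent did NOT transmit."""
    first, second = genotype
    return second if first == transmitted else first


def transmissions(parent_a, parent_b, child):
    """Return one (transmitted, not_transmitted) pair per parent."""
    # Try both ways of assigning the child's two alleles to the parents.
    # If both assignments are valid (12 x 12 -> 12), they yield the same counts.
    for from_a, from_b in (child, child[::-1]):
        if from_a in parent_a and from_b in parent_b:
            return [(from_a, _other(parent_a, from_a)),
                    (from_b, _other(parent_b, from_b))]
    raise ValueError("child genotype is not consistent with the parents")


def tdt_table(trios):
    """Tally (transmitted, not_transmitted) pairs over a list of trios."""
    table = Counter()
    for parent_a, parent_b, child in trios:
        table.update(transmissions(parent_a, parent_b, child))
    return table


slide_1 = tdt_table([((1, 2), (1, 2), (1, 1))])
slide_2 = tdt_table([((2, 2), (1, 2), (1, 2))])
slide_3 = tdt_table([((2, 2), (1, 1), (1, 2))])
print(slide_1, slide_2, slide_3)
```

Each key is read as (column, row) of the slide tables: the first element is the transmitted allele and the second is the allele not transmitted. Only the off-diagonal cells, which come from heterozygous parents, enter the TDT statistic.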

                     Transmitted
                     1    2
Not transmitted  1   A    B
                 2   C    D

$\mathrm{TDT} = \frac{(B - C)^2}{B + C}$

TDT Example

                     Transmitted
                     1    2
Not transmitted  1   25   42
                 2   25   42

$\mathrm{TDT} = \frac{(42 - 25)^2}{42 + 25} = 4.31$
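The example can be recomputed directly, along with an approximate p-value. The sketch below uses the 1-df chi-square identity p = erfc(sqrt(x / 2)); the function name `tdt` is chosen for this illustration only.

```python
import math


def tdt(b, c):
    """TDT statistic (B - C)^2 / (B + C) and its 1-df chi-square p-value."""
    if b + c == 0:
        raise ValueError("no informative transmissions (B + C = 0)")
    stat = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = tdt(b=42, c=25)
print(f"TDT = {stat:.2f}, p = {p_value:.3f}")
```

The statistic exceeds 3.84, the 5% critical value for a chi-square with 1 df, so this example would be called nominally significant.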

Analysis of Case-Control Data

• Standard epidemiological approaches can be used

• Qualitative trait

– Logistic regression

• Quantitative trait

– Linear regression

• The usual concerns about matching apply, but one must also worry about false positives from population substructure
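The population-substructure concern can be illustrated with made-up counts. In the sketch below, two hypothetical subpopulations differ in both carrier frequency and disease frequency; within each one the variant is unrelated to disease (odds ratio of 1), yet pooling them produces an apparent association. All numbers and names are invented for illustration. With a single binary predictor, the odds ratio computed here is the same quantity that logistic regression estimates as exp(coefficient).

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds ratio for carrying the variant, cases vs. controls."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


# Hypothetical strata: (case carriers, case non-carriers,
#                       control carriers, control non-carriers)
population_a = (40, 40, 10, 10)   # variant common, disease common
population_b = (2, 18, 8, 72)     # variant rare, disease rare

or_a = odds_ratio(*population_a)
or_b = odds_ratio(*population_b)

# Ignore ancestry and pool the two samples cell by cell.
pooled = tuple(x + y for x, y in zip(population_a, population_b))
or_pooled = odds_ratio(*pooled)

print(f"OR in A = {or_a:.2f}, OR in B = {or_b:.2f}, pooled OR = {or_pooled:.2f}")
```

Analyzing within strata, or matching cases and controls on ancestry, removes the spurious signal in this toy example; family-based designs such as the TDT avoid it by construction.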

Incorporating Genetics into Your Studies

• Obtain appropriate IRB approval

– DNA studies are quite common

– Template language exists for IRB approval and consent forms

– Genetic Studies Ascertainment Core (GSAC) can help

– Kelly Taylor: ktaylor@chgr.mc.vanderbilt.edu

• Collect family history information

• Obtain DNA sample

– Venipuncture

– Buccal wash/swab

– Finger stick

• Extract/Store DNA

– DNA Resources Core can help

– Cara Sutcliffe: cara@chgr.mc.vanderbilt.edu

• http://chgr.mc.vanderbilt.edu/

