Jonathan L. Haines Ph.D.
Center for Human Genetics Research
Vanderbilt University Medical Center
Genes
Environment
Complex Disease
• Give us a better understanding of the underlying biology of the trait in question
• Serve as direct targets for better treatments
– Pharmacogenetics
– Interventions
• Give us better predictions of who might develop disease
• Give us better predictions of the course of the disease
• Lead to knowledge that can help find a cure or prevention
• Watson and Crick started it all in 1953 with the description of the structure of DNA
• The 53rd anniversary of the paper will be in April.
• Both won the Nobel Prize
The DNA between individuals is 99.9% identical.
All differences are in the 0.1% of DNA that varies.
T C C A A C C G G G
T G C A A C C G G G
It’s hard to believe sometimes!
One of the most common types of variation

            1st Chromosome    2nd Chromosome
Normal      GATCCT G TAGCT    GATCCT G TAGCT
Affected    GATCCT C TAGCT    GATCCT C TAGCT

G/C variant: G = normal allele, C = disease allele
Extremely frequent across the genome (~1/400 bp) -> high resolution
Easy to genotype -> high-throughput techniques
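A minimal sketch of how such a single-base difference can be located by comparing two aligned sequences. It uses the two sequences shown on the slide; the function name `variant_positions` is hypothetical, and real genotyping pipelines work very differently (this only illustrates the idea of a variant site).

```python
def variant_positions(seq_a, seq_b):
    """Return (1-based position, base in seq_a, base in seq_b) for every mismatch.

    Assumes the two sequences are already aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (index + 1, base_a, base_b)
        for index, (base_a, base_b) in enumerate(zip(seq_a, seq_b))
        if base_a != base_b
    ]


# Sequences from the slide, with the spacing around the variant base removed.
normal = "GATCCTGTAGCT"
affected = "GATCCTCTAGCT"

if __name__ == "__main__":
    for position, base_normal, base_affected in variant_positions(normal, affected):
        print(f"position {position}: {base_normal} -> {base_affected}")
```

For the two slide sequences, the only mismatch is the G/C site at position 7.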
Location analogy    Genome equivalent
Earth               Human Genome
City                Chromosome
Street              Band
Address             Gene (DNA)

Size analogy        Genome equivalent
640 cubic yards     3,000 MB
1/100 cubic inch    1 x 10^-6 MB

It really is like finding a needle in a haystack!
(and a very BIG haystack, at that)
The Genome Sequence is not THE answer!
Disease Gene Discovery In Complex Disease
1. Define Phenotype a. Consistency b. Accuracy
2. Define the Genetic Component a. Twin Studies b. Adoption Studies c. Family Studies d. Heritability e. Segregation Analysis
3. Define Experimental Design
4. Ascertain Families a. Case-Control b. Singleton c. Sib Pairs d. Affected Relative Pairs
5. Collect Data a. Family Histories b. Clinical Results c. Risk Factors d. DNA Samples
6. Perform Genotype Generation a. Genomic Screen b. Candidate Gene
7. Analyze data a. Model-dependent (lod score) b. Model-independent (sib-pair, relative pair) c. Association studies (case-control, family-based)
8. Identify, Test, and Localize Regions of Interest
9. Bioinformatics and Gene Identification
10. Identify Susceptibility Variation(s)
11. Define Interactions a. Gene-Gene b. Gene-Environment
• Diseases of Simple Genetic Architecture
– Can tell how trait is passed in a family: follows a recognizable pattern
– One gene per family
– Often called Mendelian disease
– Usually quite rare in population
– “Causative” gene
• Diseases of Complex Genetic Architecture
– No clear pattern of inheritance
– Moderate to strong evidence of being inherited
– Common in population: cancer, heart disease, dementia etc.
– Involves many genes or genes and environment
– “Susceptibility” genes
• Autosomal Dominant
– Huntington disease
• Autosomal Recessive
– Cystic fibrosis
• X-linked
– Duchenne muscular dystrophy
• Mitochondrial
– Leber optic atrophy
• Additive
– HLA-DR in multiple sclerosis
• Combinations of the above
– RP (39 loci), Nonsyndromic deafness
• Traces the segregation of the trait through a family
• Traces the segregation of the chromosomes through a family
• Statistically measures the correlation of the segregation of the trait with the segregation of the chromosome
A SAMPLE PEDIGREE
The RED chromosome is key
Parametric Vs Non-Parametric
• Two major approaches toward linkage analysis
• Parametric: Defines a genetic model of the action of the trait locus (loci). This allows more complete use of the available data (inheritance patterns and phenotype information).
– The historical approach towards linkage analysis. Development driven by need to map simple Mendelian diseases
– Quite powerful when model is correctly defined
• Non-Parametric: Uses either a partial genetic model or no genetic model. Relies on estimates of allele/haplotype/region sharing across relatives. Makes far fewer assumptions about the action of the underlying trait locus (loci).
• Families
– Affected sibpairs
– Affected relative pairs
– Extended families
• Traits
– Qualitative (affected or not)
– Quantitative (ordinal, continuous)
• There are numerous different methods that can be applied
• These methods differ dramatically depending on the types of families and traits
Recombination: Nature’s way of making new combinations of genetic variants
A. A diploid cell.
B. DNA replication and pairing of homologous chromosomes to form a bivalent.
C. Chiasmata are formed between the chromatids of homologous chromosomes.
D. Recombination is complete by the end of prophase I.
• Measure the rate of recombination between two or more loci on a chromosome
• Can be done with any loci, but primary application is to find the location of a trait variant by measuring linkage to known marker variants.
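The likelihood ratio as defined by Morton (1955):
$$\mathrm{L.R.} = \frac{L(\text{pedigree} \mid \theta = x)}{L(\text{pedigree} \mid \theta = 0.50)}$$
where $\theta$ represents the recombination fraction and where $0 \le x \le 0.49$.
When all meioses are “scorable”, the LR is constructed as:
$$\mathrm{L.R.} = \frac{\theta^{R}\,(1-\theta)^{NR}}{(0.5)^{N}}$$
where $R$ is the number of recombinant meioses, $NR$ the number of non-recombinant meioses, and $N = R + NR$.
The LOD score (z) is $\log_{10}(\mathrm{L.R.})$
• $z(\theta)$ is the lod score at a particular value of the recombination fraction
• $z(\hat{\theta})$ is the maximum lod score, which occurs at the MLE of the recombination fraction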
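The fully scorable case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a linkage package: the function names `lod_score` and `max_lod` are hypothetical, the maximum is found by a simple grid search over θ in steps of 0.01, and the counts used in the example (1 recombinant, 9 non-recombinants) are made up for illustration.

```python
import math


def lod_score(theta, recombinants, nonrecombinants):
    """z(theta) = log10( theta^R * (1 - theta)^NR / 0.5^N ), with N = R + NR."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    # Work in log space; start with the denominator term, -N * log10(0.5).
    log_lr = -n * math.log10(0.5)
    if recombinants:
        if theta == 0.0:
            # Any observed recombinant is impossible when theta = 0.
            return -math.inf
        log_lr += recombinants * math.log10(theta)
    if nonrecombinants:
        log_lr += nonrecombinants * math.log10(1.0 - theta)
    return log_lr


def max_lod(recombinants, nonrecombinants):
    """Grid search over theta = 0.00, 0.01, ..., 0.50; returns (z_max, theta_hat)."""
    grid = [i / 100 for i in range(51)]
    return max((lod_score(t, recombinants, nonrecombinants), t) for t in grid)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the calculation.
    z_max, theta_hat = max_lod(recombinants=1, nonrecombinants=9)
    print(f"theta_hat = {theta_hat:.2f}, max lod = {z_max:.2f}")
```

In this fully scorable setting the grid maximum falls at θ = R/N, and the lod score at θ = 0.5 is zero by construction.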
Study Designs
• Large Families: Linkage Analysis
• Small Families: Family-Based Association Studies
• Case-Control: Association Studies

Linkage: Shared within Families
Association: Shared across Families
Disease    Normal    Conclusion
5/20       5/20      Gene is not important
10/20      5/20      Gene may be important
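One way to make the comparison above concrete is a Pearson chi-square test on the 2x2 table. The sketch below is an illustration only: it assumes each fraction means "number carrying the variant out of the number sampled", uses no continuity correction, and the function name `chi_square_2x2` is hypothetical. The p-value uses the identity P(X > x) = erfc(sqrt(x/2)) for a chi-square variable with 1 df.

```python
import math


def chi_square_2x2(case_with, case_total, control_with, control_total):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table.

    Returns (chi_square, p_value).
    """
    a, b = case_with, case_total - case_with            # cases: with / without variant
    c, d = control_with, control_total - control_with   # controls: with / without variant
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero; the test is undefined")
    chi_square = n * (a * d - b * c) ** 2 / denominator
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


if __name__ == "__main__":
    for label, table in (("5/20 vs 5/20", (5, 20, 5, 20)),
                         ("10/20 vs 5/20", (10, 20, 5, 20))):
        chi_square, p_value = chi_square_2x2(*table)
        print(f"{label}: chi-square = {chi_square:.2f}, p = {p_value:.3f}")
```

With only 20 individuals per group the second comparison does not reach conventional significance, which is consistent with the slide's cautious wording that the gene "may be" important.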
Two Basic Study Designs for Association Analysis
• Family-Based
– Parent-child Trio
– Discordant sibpairs
• Advantages
– Use existing samples
– Robustness to assumptions
• Disadvantages
– Ascertainment
– Power
• Case-Control
• Advantages
– Power
– Ascertainment
• Disadvantages
– Sensitivity to assumptions
– Matching
• Parent-Child
– AFBAC
– TDT
– HHRR
– QTDT
• Sibship
– SDT
– WSDT
– FBAT
• Sibpair
– S-TDT
– DAT
• Pedigree
– Transmit
– PDT
– FBAT
TRANSMISSION DISEQUILIBRIUM TEST (TDT)
• Examines transmission of alleles to affected individuals
• Requires:
– Linkage (transmission through meioses); and
– Association (specific alleles)
• Test of linkage if association assumed
• Test of association if linkage assumed
• Test of linkage AND association if neither assumed
• Uses the non-transmitted alleles, effectively, as the control group. Can make “pseudocontrol” by creating genotype of the two non-transmitted alleles
• Requires phenotype only for the child
Parents: 12 × 12; child: 11

                     Transmitted
                     1     2
Not transmitted  1   A     B
                 2   C     D

$$\mathrm{TDT} = \frac{(B-C)^2}{B+C}$$
With > 5 per cell, this follows a $\chi^2$ distribution with 1 df
Parents: 12 × 12; child: 11

                     Transmitted
                     1     2
Not transmitted  1   0     0
                 2   2     0

Parents: 22 × 12; child: 12

                     Transmitted
                     1     2
Not transmitted  1   0     0
                 2   1     1

Parents: 22 × 11; child: 12

                     Transmitted
                     1     2
Not transmitted  1   1     0
                 2   0     1
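The tallying in the three trios above can be sketched as follows. This is an illustrative helper, not code from any TDT package: the function name `tally_trio` is hypothetical, genotypes are written as pairs of allele labels, and the marker is assumed to have two alleles (1 and 2) with no genotyping errors.

```python
from collections import Counter


def tally_trio(parent_a, parent_b, child):
    """Count (not_transmitted, transmitted) allele pairs for one parent-child trio.

    Genotypes are 2-tuples of allele labels, e.g. (1, 2).
    Returns a Counter keyed by (not_transmitted, transmitted).
    """
    first, second = child
    # Try both ways of assigning the child's two alleles to the two parents.
    for from_a, from_b in ((first, second), (second, first)):
        if from_a in parent_a and from_b in parent_b:
            counts = Counter()
            for parent, transmitted in ((parent_a, from_a), (parent_b, from_b)):
                remaining = list(parent)
                remaining.remove(transmitted)  # what is left was not transmitted
                counts[(remaining[0], transmitted)] += 1
            return counts
    raise ValueError("child genotype is not consistent with the parents")


if __name__ == "__main__":
    trios = [((1, 2), (1, 2), (1, 1)),
             ((2, 2), (1, 2), (1, 2)),
             ((2, 2), (1, 1), (1, 2))]
    for parent_a, parent_b, child in trios:
        print(parent_a, parent_b, child, dict(tally_trio(parent_a, parent_b, child)))
```

Summing these counters over all trios gives the A, B, C, D cells of the transmitted / not-transmitted table; only the off-diagonal cells (contributed by heterozygous parents) enter the TDT statistic.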
                     Transmitted
                     1     2
Not transmitted  1   A     B
                 2   C     D

$$\mathrm{TDT} = \frac{(B-C)^2}{B+C}$$

Example:

                     Transmitted
                     1     2
Not transmitted  1   25    42
                 2   25    42

$$\mathrm{TDT} = \frac{(42-25)^2}{42+25} = 4.31$$
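The worked example can be checked with a short script. This is a sketch under the slide's own assumption that the statistic follows a chi-square distribution with 1 df; the function name `tdt_statistic` is hypothetical, and the p-value uses the identity P(X > x) = erfc(sqrt(x/2)) for 1 df.

```python
import math


def tdt_statistic(b, c):
    """TDT = (B - C)^2 / (B + C), with an approximate 1-df chi-square p-value.

    b and c are the two off-diagonal cells of the transmitted / not-transmitted table.
    Returns (statistic, p_value).
    """
    if b + c == 0:
        raise ValueError("no informative transmissions (B + C = 0)")
    statistic = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    statistic, p_value = tdt_statistic(42, 25)
    print(f"TDT = {statistic:.2f}, p = {p_value:.3f}")
```

Note that the diagonal cells (A and D) do not enter the statistic, which is why only B = 42 and C = 25 are passed in.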
• Standard epidemiological approaches can be used
• Qualitative trait
– Logistic regression
• Quantitative trait
– Linear regression
• The usual concerns about matching apply, but false positives from population substructure are an additional concern
• Obtain appropriate IRB approval
– DNA studies are quite common
– Template language exists for IRB approval and consent forms
– Genetic Studies Ascertainment Core (GSAC) can help
– Kelly Taylor: ktaylor@chgr.mc.vanderbilt.edu
• Collect family history information
• Obtain DNA sample
– Venipuncture
– Buccal wash/swab
– Finger stick
• Extract/Store DNA
– DNA Resources Core can help
– Cara Sutcliffe: cara@chgr.mc.vanderbilt.edu
• http://chgr.mc.vanderbilt.edu/