Genetic studies of diabetes

Genetic Epidemiology
Michèle Sale, Ph.D.
Center for Public Health Genomics
msale@virginia.edu
Tel: 982-0368
Genetic epidemiology
• “A science which deals with the
etiology, distribution, and control of
disease in groups of relatives and with
inherited causes of disease in
populations.” Newton E. Morton, 1982
Model for Complex Diseases
[Figure: two pictured components joined by "+", with "=" leading to "Disease Susceptibility"]
Genetics of a Complex Disease
[Figure: diagram connecting Gene 1, Gene 2, Gene 3, Gene 4, Environment 1, Environment 2, Trait 1, Trait 2, and Trait 3 to Disease]
“Monogenic” vs “Complex” Disease
Mendelian                                  Complex
1 or small # of genes                      Many genes
Often etiologic (severe phenotype)         Susceptibility / molecular pathology ?
Highly penetrant                           Modest penetrance
High odds ratio                            Modest/low odds ratio
Strong selection => low frequency/rare     Weak/no selection => high frequency/common
Coding sequence                            Non-coding/regulation (?)
Overall steps for disease gene
identification
• Is there a genetic component?
• Study design
• Measurement of phenotype
• Molecular analysis
• Functional analysis
Is there a genetic component?
• Twin studies
• Familial aggregation
• Segregation analysis
• Race/ethnicity differences
Twin studies
• Comparison of monozygotic (MZ) pairs (who share all of their genome) with dizygotic (DZ) twins (who share half of their genome on average, the same as sibs)
• The greater similarity of MZ twins than DZ twins is considered evidence of genetic factors
• The pairwise concordance (Pr) is the proportion of affected pairs that are concordant for the disease.
– The proportion of twin pairs with both twins affected, out of all ascertained twin pairs with at least one affected
– Pr = C/(C+D), where C is the number of concordant pairs and D is the number of discordant pairs.
• The probandwise concordance (Cc) is the proportion of affected individuals among the co-twins of previously ascertained index cases.
– Allows for double counting of doubly ascertained twin pairs and is interpretable as the recurrence risk in a co-twin of an affected individual
– Cc = 2C/(2C+D)
• In theory, complete genetic determination of a disease would equate to MZ twins having 100% concordance and DZ twins having 50% concordance
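The two concordance formulas above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the pair counts in the usage example are hypothetical and do not come from any study cited in these slides.

```python
def pairwise_concordance(concordant, discordant):
    """Pr = C / (C + D): share of ascertained pairs in which both twins are affected."""
    total = concordant + discordant
    if total == 0:
        raise ValueError("need at least one ascertained twin pair")
    return concordant / total


def probandwise_concordance(concordant, discordant):
    """Cc = 2C / (2C + D): each concordant pair contributes two index cases."""
    total = 2 * concordant + discordant
    if total == 0:
        raise ValueError("need at least one ascertained twin pair")
    return 2 * concordant / total


# Hypothetical counts: 20 concordant and 30 discordant pairs.
pr = pairwise_concordance(20, 30)
cc = probandwise_concordance(20, 30)
print(f"Pr = {pr:.3f}, Cc = {cc:.3f}")
```

With the same counts, Cc is never smaller than Pr, because concordant pairs are counted once per affected twin.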
Twin studies - assumptions
• Random mating
• No interactions between genes and
environment
• Equivalent environments for MZ and
DZ twins
Concordance rates for some traits
Trait                         MZ     DZ
Blood pressure                63%    36%
Pulse rate                    56%    34%
Measles                       95%    87%
Diabetes mellitus             85%    37%
Tuberculosis                  54%    22%
Schizophrenia                 58%    10%
Manic-depressive psychosis    76%    18%
Other types of twin studies
• Twins discordant for disease have
been used to examine possible
environmental causes.
• Adoption studies also permit the
separation of childhood rearing effects
from genetic effects by studying the
similarity of adopted children with
their biological and adoptive parents
Bouchard et al. Science. 1990 Oct 12;250(4978):223-8.
Familial aggregation
• Sibling relative risk ratio (λs)
λs = (risk to a sibling of a person with the disease of interest) / (population risk)
SNP Disease Variant & λs in Diabetes
                               Type 1    Type 2
Prob of Disease (Sibling)      6%        30-40%
Prob of Disease (Unrelated)    0.4%      7%
λs                             15        4-6
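As a quick check of the table, λs is just the sibling risk divided by the population risk. The sketch below recomputes it from the percentages shown; the function name `sibling_risk_ratio` is hypothetical.

```python
def sibling_risk_ratio(sibling_risk, population_risk):
    """lambda_s = risk to a sibling of an affected person / population risk."""
    if population_risk <= 0:
        raise ValueError("population risk must be positive")
    return sibling_risk / population_risk


# Values taken from the table above, expressed as proportions.
type1 = sibling_risk_ratio(0.06, 0.004)
type2_low = sibling_risk_ratio(0.30, 0.07)
type2_high = sibling_risk_ratio(0.40, 0.07)

print(f"Type 1 lambda_s: {type1:.1f}")
print(f"Type 2 lambda_s: {type2_low:.1f} to {type2_high:.1f}")
```

The Type 2 range comes out at roughly 4.3 to 5.7, which the table rounds to 4-6.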
Recurrence risks for multiple
sclerosis in families
Compston and Coles. Lancet. 2002;359(9313):1221-31.
Segregation analysis
• Determines which specific model
(genetic or environmental) best fits the
familial aggregation
• E.g.
– Major gene or many genes (polygenic)?
– Dominant, additive, recessive inheritance?
Differences in prevalence
across race/ethnicities
Time trends in the percentage of
African American adolescents and adults
who were overweight, 1988-94
http://www.niddk.nih.gov/health/diabetes/pubs/afam/afam.htm
Adiposity
• However, rates of type 2 diabetes in
African Americans still higher than
Caucasian Americans after controlling
for age, adiposity, and socio-economic
status
• Other factors must be involved
Epidemiological study designs
http://en.wikipedia.org/wiki/Study_design
Study designs
• Case series: what clinicians see
• Case-control: compare people with and
without a disease
• Cohort: follow people over time to see
who gets the disease
• Randomized controlled trial (RCT)
Other terms
• Retrospective vs. prospective
• Cross-sectional vs. longitudinal
Measurement of Phenotype
• What is the phenotype?
– e.g., diabetes, fasting glucose, oral glucose tolerance test
• How is it diagnosed?
– Physician’s diagnosis, clinical measurements, questionnaire
• How objective are the phenotypes?
– Physician’s diagnosis – somewhat variable
– Clinical measurements – most pretty good
*The better defined the phenotype, the easier it is to find
the gene(s) that control it
Additional consideration in
genetic studies: families or
unrelated individuals?
Patient ascertainment
• Sib-pair:
• Families:
• Case-control:
Molecular/Analytical
approaches
• Linkage
– Families
• Association
– Candidate gene
– Genome wide
– Generally case-control
– There are family-based approaches
Effect and frequency of risk alleles dictate strategies
[Figure: magnitude of effect plotted against frequency in population; regions labeled "Unlikely to exist", "Association studies", "Linkage studies", and "Unlikely to be found"]
Linkage analysis
• Linkage = The proximity of two or more
markers on a chromosome
• Linkage analysis is a statistical method for
detecting linkage between a disease and
markers of known location by following their
inheritance in families
• Uses recombination to define genomic
interval likely to contain gene/s
• Single large pedigree or multiple small
pedigrees
Linkage analysis
• Works well for Mendelian traits and more highly penetrant diseases
• Low resolution = fewer markers needed and resilient to allelic heterogeneity
• Apparently high Type I error rate for complex/non-Mendelian diseases
– more loci, common variants, high phenocopy rate, and lower penetrance
• Large pedigrees better for rare alleles – more likely to segregate the allele
• Large pedigrees increase the probability of parental heterozygosity for frequent alleles
• Most likely to detect intermediate frequency alleles
• Strong pedigree signal may reflect rare Mendelian forms of complex disease
– e.g. BRCA1 & BRCA2 mutations in breast and ovarian cancer
Genotype markers across the
genome
Illumina's Linkage IVb Panel: >6,000 SNPs
LOD score
• LOD score = log of the odds of linkage
= log10 [ (likelihood of linkage) / (likelihood of not being linked) ]
= log10 [ L(θ < 0.5) / L(θ = 0.5) ], where θ is the recombination fraction
• The closer two markers are to each other, the
lower the odds of a recombination (crossing
over event) occurring between them in
meiosis.
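A minimal sketch of a two-point LOD score follows. It assumes the simplest textbook case, which the slide does not spell out: phase-known, fully informative meioses, so the likelihood is L(θ) = θ^r (1-θ)^(n-r) for r recombinants in n meioses. The function names and the counts in the example are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1 - theta))
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_theta - log_l_unlinked


def max_lod(recombinants, meioses, step=0.01):
    """Scan a grid of theta values and return (best theta, LOD at that theta)."""
    thetas = [i * step for i in range(1, int(round(0.5 / step)) + 1)]
    best = max(thetas, key=lambda t: lod_score(recombinants, meioses, t))
    return best, lod_score(recombinants, meioses, best)


# Hypothetical family data: 1 recombinant observed in 10 meioses.
theta_hat, lod = max_lod(1, 10)
print(f"theta = {theta_hat:.2f}, LOD = {lod:.2f}")
```

At θ = 0.5 the score is 0 by construction; fewer recombinants among the same number of meioses push the maximum toward smaller θ and a higher LOD.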
Linkage analysis
Is there cosegregation of a chromosomal region
with the phenotype?
• Add additional markers to region
• Add additional families to study
Association study
• Best power for common variants of modest to low effect size
• Search for specific genetic differences distinguishing cases from
controls
Cases
Controls
Case-control is the most popular study
design for complex disease genetics:

• Cross-sectional - no follow-up
• More efficient recruitment than
families - easy to ascertain and
recruit
• Easy to analyze
• Greater statistical power than
family-based linkage for common variants of modest effect
• Cases and controls must be well matched
- Drawn from same population
- Randomize non-genetic
confounding factors
• At risk of type 1 errors if
matching is incomplete (stratification)
Genome-wide association studies
(GWAS): A paradigm shift in
human genetics
How can we use SNPs to find
diabetes genes?
• Genome-Wide Association Study (GWAS)
– Examination of variation across the entire human genome to
identify genetic correlations with the presence or absence of
diabetes
• Two groups: cases (have diabetes) vs.
controls (don’t have diabetes)
• Each participant’s genome is surveyed for
markers of genetic variation (SNPs)
• Groups compared to determine specific
genetic differences between the two groups
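For a single SNP, the comparison between the two groups can be sketched as an allelic test on a 2x2 table of allele counts. This is only one common way to do it, shown under stated assumptions: the counts are hypothetical, the function name `allelic_association` is invented for illustration, and the p-value uses the identity that a 1-degree-of-freedom chi-square tail equals erfc(sqrt(chi2 / 2)).

```python
import math


def allelic_association(case_risk, case_other, control_risk, control_other):
    """Return (odds ratio, chi-square, p-value) for a 2x2 allele-count table."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table, 1 degree of freedom, no continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical allele counts: 500 cases and 500 controls, two alleles each.
odds_ratio, chi2, p_value = allelic_association(240, 760, 200, 800)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

In a genome-wide scan the same test is repeated for every SNP, so a single-SNP p-value near 0.03 like this one would not be considered significant on its own.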
GWAS approach
• Does not assume knowledge of genes/biology
• Investigate markers evenly spaced along genome
• Investigate association: joint occurrence of two alleles (e.g. disease allele and marker allele) in a population at greater than the expected frequency
http://www.genizon.com/html/gestion/Research_Technology.jpg
Why are GWAS now feasible?
• SNP identification efforts
→ more SNPs in databases
• Understanding of linkage
disequilibrium in the human genome
(HapMap project)
→ fewer “tagSNPs” to genotype
• Lower cost of genotyping platforms
Products now use >1 million SNPs!
Pairwise tagging
SNP            1      2      3      4      5      6
Alleles        A/T    G/A    G/C    T/C    G/C    A/C
Haplotype 1    A      G      G      T      G      A
Haplotype 2    A      G      C      C      C      C
Haplotype 3    T      A      G      C      G      C
Haplotype 4    T      A      C      C      C      C
High r2 between SNPs 1 and 2, SNPs 3 and 5, and SNPs 4 and 6
After Carlson et al. (2004) AJHG 74:106
http://www.hapmap.org/downloads/presentations/2_Daly.ppt
Tags: SNP 1, SNP 3, SNP 6 (3 in total)
Test for association: SNP 1, SNP 3, SNP 6
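The pairwise tagging idea can be sketched with the four example haplotypes from this slide. The code below is an illustration under assumptions: each haplotype is treated as equally frequent, r2 is computed directly from haplotype frequencies, and the greedy selection with a 0.8 threshold is a simple stand-in rather than the exact procedure of Carlson et al. The names `r_squared` and `pick_tags` are hypothetical.

```python
# The four haplotypes across SNPs 1-6, read off the slide.
HAPLOTYPES = ["AGGTGA", "AGCCCC", "TAGCGC", "TACCCC"]


def r_squared(haps, i, j):
    """r2 between two biallelic SNPs, treating each haplotype as equally frequent."""
    n = len(haps)
    x = [1 if h[i] == haps[0][i] else 0 for h in haps]
    y = [1 if h[j] == haps[0][j] else 0 for h in haps]
    p_i, p_j = sum(x) / n, sum(y) / n
    p_ij = sum(a * b for a, b in zip(x, y)) / n
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))


def pick_tags(haps, threshold=0.8):
    """Greedy tagging: repeatedly pick the SNP that captures the most untagged SNPs."""
    untagged = set(range(len(haps[0])))
    tags = []
    while untagged:
        best = max(sorted(untagged),
                   key=lambda s: sum(r_squared(haps, s, t) >= threshold
                                     for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if r_squared(haps, best, t) >= threshold}
    return [t + 1 for t in tags]  # report 1-based SNP numbers


print(pick_tags(HAPLOTYPES))
```

Three tags are needed, one per correlated pair. Because SNPs 4 and 6 are perfectly correlated here, either can serve as the third tag; this sketch breaks ties by position, whereas the slide lists SNP 6.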
Use of haplotypes can improve genotyping efficiency
SNP            1      2      3      4      5      6
Alleles        A/T    G/A    G/C    T/C    G/C    A/C
Haplotype 1    A      G      G      T      G      A
Haplotype 2    A      G      C      C      C      C
Haplotype 3    T      A      G      C      G      C
Haplotype 4    T      A      C      C      C      C
http://www.hapmap.org/downloads/presentations/2_Daly.ppt
Tags: SNP 1, SNP 3 (2 in total)
Test for association:
SNP 1 captures 1+2
SNP 3 captures 3+5
“AG” haplotype captures SNP 4+6
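To see why two tags suffice, the sketch below checks that carrying the "AG" combination at SNPs 1 and 3 marks exactly the same haplotypes as the T allele at SNP 4 and the A allele at SNP 6. It uses the four example haplotypes from the slide; the helper names are hypothetical.

```python
# The four haplotypes across SNPs 1-6, read off the slide.
HAPLOTYPES = ["AGGTGA", "AGCCCC", "TAGCGC", "TACCCC"]


def haplotype_indicator(haps, positions, pattern):
    """1 where the alleles at the given 0-based positions spell out `pattern`."""
    return [int("".join(h[p] for p in positions) == pattern) for h in haps]


def allele_indicator(haps, position, allele):
    """1 where the haplotype carries `allele` at the given 0-based position."""
    return [int(h[position] == allele) for h in haps]


ag = haplotype_indicator(HAPLOTYPES, (0, 2), "AG")  # SNP 1 = A and SNP 3 = G
snp4_t = allele_indicator(HAPLOTYPES, 3, "T")
snp6_a = allele_indicator(HAPLOTYPES, 5, "A")

print(ag, snp4_t, snp6_a)
```

All three indicator vectors are identical, so testing the "AG" haplotype is equivalent, in this example, to genotyping SNP 4 or SNP 6 directly.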
Efficiency and power
[Figure: relative power (%) plotted against average marker density (per kb), comparing tag SNPs with random SNPs]
~300,000 tag SNPs needed to cover common variation in whole genome in CEU
P.I.W. de Bakker et al. (2005) Nat Genet
Genotyping platform: Illumina
Cost:
370 Duo $240-280
650Y $480-520
1M $580-650
http://www.genengnews.com/articles/chtitem.aspx?tid=1862&chid=2
Genotyping platform: Affymetrix
http://gmed.bu.edu/about/genotyping.html
Completeness of dbSNP
• Vast majority of common SNPs are contained in or highly correlated with a SNP in dbSNP
[Figure: panels for YRI, CEU, and CHB+JPT]
Nature 437, 1299-1320. 2005
Comparison of coverage
                            YRI                        CEU                        CHB+JPT
Platform                    % r2 ≥ 0.8   Mean max r2   % r2 ≥ 0.8   Mean max r2   % r2 ≥ 0.8   Mean max r2
Affymetrix GeneChip 500K    46           0.66          68           0.81          67           0.80
Affymetrix SNP Array 6.0    66           0.80          82           0.90          81           0.89
Illumina HumanHap300        33           0.56          77           0.86          63           0.78
Illumina HumanHap550        55           0.73          88           0.92          83           0.89
Illumina HumanHap650Y       66           0.80          89           0.93          84           0.90
Perlegen 600K               47           0.68          92           0.94          84           0.90
Paul de Bakker, pers. comm.
Association Studies
• Detect genes/genomic regions associated with a disease through
allelic associations in case-control studies
– Causal variants are associated with disease phenotype
– Linked neutral variants are associated with the disease phenotype
through LD with the causal variant
• Younger disease variants (rarer variant)
– LD around the variant is stronger = better power
– Associated region containing variant is broad (low genome
resolution)
• Older disease variants (common variant)
– Weaker LD = worse power
– Better association map resolution
Phenotype-genotype
association
Marker associated with disease could be:
1. False positive result (type 1 error)
2. Co-inherited with a true causative
(functional) variant
3. A true functional or causative variant
Replication and follow-up
• Many analytical tests – high probability of false
positives
• Replicate in additional studies (often requires cross-study collaboration)
• Map the causal variant
– Denser marker map
– Evidence for other variants in the same gene with (perhaps smaller)
independent effects (allelic heterogeneity)
– Haplotype analysis
– Resequencing
– Sequence / genome mapping bioinformatics to identify or predict
genome features in the linkage disequilibrium region of the map SNP
– Expression or reporter assays
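The false-positive problem from many tests can be made concrete with simple arithmetic. The sketch below assumes a hypothetical scan of 500,000 independent SNP tests and uses a Bonferroni correction, which is one conservative option chosen for illustration and is not named on the slide.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests passing `alpha` by chance when no SNP is truly associated."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test p-value threshold that keeps the overall error rate near `family_alpha`."""
    return family_alpha / n_tests


n_snps = 500_000  # hypothetical number of SNPs tested
print(expected_false_positives(n_snps, 0.05))
print(bonferroni_threshold(n_snps))
```

At a nominal 0.05 cutoff, tens of thousands of SNPs would pass by chance alone, which is why replication in independent samples carries so much weight.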
Common Disease Common
Variant Hypothesis (CDCV)
• Genetic risk for common diseases (diabetes,
CHD, hypertension, schizophrenia, asthma,..)
results from common variants/polymorphisms
in multiple genes
• The effects for each gene variant must be smaller
than in monogenic disorders otherwise the
prevalence of the diseases would be very high
• Since SNPs are a common mode of variation in
the human genome - and coding SNPs lead to
monogenic diseases - SNPs may be the variants
that are associated with risk for common
diseases
Do the Common Disease
Variants Code ?
Not necessarily (or usually ?)
• Protein coding SNPs (cSNPs) may disrupt
protein fold, structure, activity
• Variants in Mendelian diseases with high
penetrance (50-100% penetrance) often
disrupt proteins
• But common diseases are not penetrant to
the same level - genetic odds ratios:
Prob (Disease | risk allele) / Prob (Not Disease | risk allele) ~ 1.2-1.5
Early successes
• Klein R et al. Complement factor H
polymorphism in age-related macular
degeneration. Science. 2005 Apr 15; 308:385-9.
– 96 cases and 50 controls; 116,204 SNPs
• Maraganore et al. High-resolution whole-genome association study of Parkinson disease.
Am J Hum Genet. 2005 Nov; 77:685-93.
– 198,345 SNPs in 443 sibling pairs discordant for PD
– 1,793 PD-associated SNPs (P<.01 in tier 1) and 300 genomic
control SNPs in 332 matched case-unrelated control pairs
The future
• Identify additional genes in diverse populations
• Identify causal variant/s in these genes
• Determine function of novel genes, and function of
causative variants
• Explore gene x gene interactions (epistasis) and gene x
environment interactions (e.g. physical activity, diet)
• Other technological advances:
– Animal models of disease
– Innovative imaging of target tissues
– Functional approaches to gene expression profiling
– Whole genome sequencing?
• Era of “personalized medicine” and/or prevention
END
Questions?