SNP Applications

statwww.epfl.ch/davison/teaching/Microarrays/snp.ppt
Human Genome and SNPs
• Now that the human genome is (mostly)
sequenced, attention is turning to the
evaluation of variation
• Alterations in DNA involving a single base
pair are called single nucleotide
polymorphisms, or SNPs
• Map of ~1.4 million SNPs (Feb 2001)
• It is estimated that ~60,000 SNPs occur
within exons; 85% of exons within 5 kb
of nearest SNP
SNP Initiatives
• Industrial
– Genset
– Incyte
– Celera
– CuraGen
• Academic – Industry Consortium
• Governmental
– US
– Japan
• Non-industrial scale academic programs
Goals of SNP Initiatives
• Immediate goals:
– Detection/identification of …
– The hundreds of thousands of SNPs
estimated to be present in the
human genome
– Interest also in other organisms, e.g.
potatoes(!)
– Establishment of SNP Database(s)
Longer term goals: Areas of
SNP Application
• Gene discovery and mapping
• Association-based candidate
polymorphism testing
• Diagnostics/risk profiling
• Response prediction
• Homogeneity testing/study design
• Gene function identification
• …etc.
• See Schork, Fallin, Lanchbury 2000
Polymorphism
• Technical definition: most common
variant (allele) occurs with less than 99%
frequency in the population
• Also used as a general term for variation
• Many types of DNA polymorphisms,
including RFLPs, VNTRs, microsatellites
• ‘Highly polymorphic’ = many variants
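The technical definition above can be written as a one-line check. This is a minimal sketch: the function name `is_polymorphic` and the example allele frequencies are made up for illustration and do not come from the slides.

```python
def is_polymorphic(allele_freqs, threshold=0.99):
    """Return True if the most common allele has frequency below `threshold`.

    `allele_freqs` maps allele labels to population frequencies (hypothetical
    input format); the 99% threshold is the one quoted in the definition.
    """
    total = sum(allele_freqs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return max(allele_freqs.values()) < threshold


# Illustrative frequencies only, not real data.
print(is_polymorphic({"A": 0.95, "G": 0.05}))    # common allele at 95%
print(is_polymorphic({"A": 0.995, "G": 0.005}))  # common allele at 99.5%
```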
Use of Polymorphism in
Gene Mapping
• 1980s – RFLP marker maps
• 1990s – microsatellite marker maps
SNPs in Genetic Analysis
• Abundance – lots
• Position – throughout genome
• Haplotype patterns – groups of SNPs
may provide exploitable diversity
• Rapid and efficient to genotype
• Increased stability over other types of
mutation
• Recombination patterns – e.g. ‘hot spots’
Gene Discovery and Mapping
• Linkage Analysis
– Within-family associations between
marker and putative trait loci
• Linkage Disequilibrium (LD)
– Across-family associations
One locus: Founder
genotype probabilities
• Founder: individual whose parents are
not in the pedigree
• Usually obtain genotype probs. assuming
Hardy-Weinberg Equilibrium (HWE):
Say P(D) = p, P(d) = 1-p;
Then P(DD) = p^2, P(Dd) = 2p(1-p), P(dd) = (1-p)^2
• Genotypes of founder couples treated as
independent:
P(Father Dd and Mother DD) = 2p(1-p) x p^2 = 2p^3(1-p)
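The founder calculation can be sketched in a few lines of Python. The function names below are hypothetical, and the allele frequency used in the example (0.01) is only an illustrative value.

```python
def hwe_genotype_probs(p):
    """Genotype probabilities under Hardy-Weinberg equilibrium, with P(D) = p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    return {"DD": p * p, "Dd": 2 * p * q, "dd": q * q}


def founder_couple_prob(father, mother, p):
    """Joint probability of two founder genotypes, treated as independent."""
    probs = hwe_genotype_probs(p)
    return probs[father] * probs[mother]


p = 0.01  # illustrative allele frequency
print(hwe_genotype_probs(p))
print(f"{founder_couple_prob('Dd', 'DD', p):.3e}")  # equals 2 p^3 (1-p)
```

The three genotype probabilities sum to 1 for any p, which is a quick sanity check on the formulas.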
One locus: Transmission
probabilities (I)
• Offspring get their genes according to Mendel’s
rules…
• Independently for different offspring
[Pedigree figure: parents 1 (Dd) and 2 (Dd); offspring 3 (dd)]
P(3 dd | 1 Dd & 2 Dd) = ½ x ½
One locus: Transmission
probabilities (II)
[Pedigree figure: parents 1 (Dd) and 2 (Dd); offspring 3 (dd), 4 (Dd) and 5 (DD)]
P(3 dd & 4 Dd & 5 DD| 1 Dd & 2 Dd)
= (½ x ½) x (2 x ½ x ½) x (½ x ½)
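The transmission probabilities can be checked by enumerating the four equally likely allele pairs that two parents can pass on. This is a minimal sketch: the helper names are hypothetical, and genotypes are assumed to be written as two-character strings such as "Dd".

```python
from itertools import product


def transmission_prob(child, father, mother):
    """P(child genotype | parental genotypes) under Mendel's rules.

    Each parent passes on one of its two alleles with probability 1/2,
    so the four (paternal, maternal) allele pairs are equally likely.
    """
    target = sorted(child)
    hits = sum(1 for a, b in product(father, mother) if sorted((a, b)) == target)
    return hits / 4


def sibship_prob(children, father, mother):
    """Joint probability for several offspring, transmitted independently."""
    prob = 1.0
    for child in children:
        prob *= transmission_prob(child, father, mother)
    return prob


print(transmission_prob("dd", "Dd", "Dd"))             # ½ x ½
print(sibship_prob(["dd", "Dd", "DD"], "Dd", "Dd"))    # (½ x ½) x (2 x ½ x ½) x (½ x ½)
```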
One locus: Penetrance
• Usual to assume that the chance of
having a particular phenotype (being
affected with a disease, say) depends
only on the genotype at one locus
• Complete penetrance:
P(affected|DD) = 1
• Incomplete penetrance:
P(affected|DD) = p (<1)
One locus: putting it all together
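[Pedigree figure: founders 1 (Dd) and 2 (Dd); offspring 3 (dd), 4 (Dd) and 5 (DD). The penetrance factors in the product imply one unaffected founder (.7), one affected founder (.3), unaffected offspring 3 (.9) and 4 (.7), and affected offspring 5 (.8).]
Assume:
P(Aff|dd) = .1
P(Aff|Dd) = .3
P(Aff|DD) = .8
P(D) = .01
P(pedigree) = (2 x .01 x .99 x .7) x (2 x .01 x .99 x .3)
x (½ x ½ x .9) x (2 x ½ x ½ x .7) x (½ x ½ x .8)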
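The product can be reproduced by multiplying, for each person, a genotype term (HWE for founders, Mendelian transmission for offspring) by a penetrance term. The sketch below makes one assumption that the extracted text does not state outright: the affection statuses are read off the penetrance factors in the product (.7, .3, .9, .7, .8). All function names are hypothetical.

```python
from itertools import product

PENETRANCE = {"dd": 0.1, "Dd": 0.3, "DD": 0.8}  # P(Aff | genotype), as assumed above
P_D = 0.01                                      # allele frequency P(D), as assumed above


def founder_prob(genotype, p=P_D):
    """HWE genotype probability for a founder."""
    q = 1 - p
    return {"DD": p * p, "Dd": 2 * p * q, "dd": q * q}[genotype]


def transmission_prob(child, father, mother):
    """Mendelian P(child genotype | parental genotypes)."""
    target = sorted(child)
    hits = sum(1 for a, b in product(father, mother) if sorted((a, b)) == target)
    return hits / 4


def phenotype_prob(genotype, affected):
    """P(observed affection status | genotype)."""
    pen = PENETRANCE[genotype]
    return pen if affected else 1 - pen


def pedigree_likelihood(founders, offspring):
    """Likelihood of a two-founder nuclear family with known genotypes."""
    (father, _), (mother, _) = founders
    like = 1.0
    for genotype, affected in founders:
        like *= founder_prob(genotype) * phenotype_prob(genotype, affected)
    for genotype, affected in offspring:
        like *= transmission_prob(genotype, father, mother) * phenotype_prob(genotype, affected)
    return like


# (genotype, affected?) pairs; affection statuses are an ASSUMPTION inferred
# from the penetrance factors in the slide's product.
founders = [("Dd", False), ("Dd", True)]
offspring = [("dd", False), ("Dd", False), ("DD", True)]

print(f"{pedigree_likelihood(founders, offspring):.3e}")  # about 1.3e-06
```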
Crossing over and Recombination
Two loci: Linkage and
Recombination
[Pedigree figure: parents 1 and 2 (labelled Dd TT and Dd tt); offspring 3 (Dd Tt)]
3 produces gametes in proportions:
        T          t
D    (1-θ)/2    θ/2        ½
d    θ/2        (1-θ)/2    ½
     ½          ½
Recombination Fraction
• θ = ½ : independent assortment (Mendel)
• θ < ½ : linked loci
• θ = 0 : tightly linked loci (no recombination)
• In 3, if the loci are linked then D-T and d-t
are parental haplotypes, D-t and d-T are
recombinant haplotypes
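The effect of the recombination fraction θ on the gametes of individual 3 can be tabulated directly. This is a minimal sketch that assumes the phase described above (D-T and d-t parental, D-t and d-T recombinant); the function name is hypothetical.

```python
def gamete_proportions(theta):
    """Gamete proportions for a double heterozygote with phase D-T / d-t.

    A gamete is recombinant with probability theta; each of the two
    parental (or two recombinant) haplotypes is equally likely.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    return {
        "D-T": (1 - theta) / 2,  # parental
        "d-t": (1 - theta) / 2,  # parental
        "D-t": theta / 2,        # recombinant
        "d-T": theta / 2,        # recombinant
    }


for theta in (0.0, 0.1, 0.5):
    print(theta, gamete_proportions(theta))
```

At θ = ½ all four haplotypes have proportion ¼ (independent assortment); at θ = 0 only the two parental haplotypes appear.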
LOD-score Linkage Analysis
• LOD(θ*) = log10 of the odds ratio L:
L = P(data|θ*)/P(data|θ = ½)
• LOD(θ*) measures the relative strength
of the data for θ = θ* rather than θ = ½
• Can compute LOD(θ) at several values
• Can find the value of θ maximizing the LOD
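As a concrete illustration, consider the simplest textbook case: n phase-known, fully informative meioses of which k are recombinant, so that P(data|θ) is proportional to θ^k (1-θ)^(n-k). That setup, the counts used (1 recombinant in 10 meioses) and the function names are assumptions made for this sketch; real pedigree data generally need a full likelihood calculation.

```python
import math


def _count_times_log10(count, prob):
    """count * log10(prob), with the convention 0 * log10(0) = 0."""
    if count == 0:
        return 0.0
    if prob == 0.0:
        return -math.inf
    return count * math.log10(prob)


def lod(theta, n_recomb, n_meioses):
    """LOD score log10[ L(theta) / L(1/2) ] for phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    log_l_theta = (_count_times_log10(n_recomb, theta)
                   + _count_times_log10(n_meioses - n_recomb, 1 - theta))
    log_l_half = n_meioses * math.log10(0.5)
    return log_l_theta - log_l_half


# Hypothetical data: 1 recombinant among 10 informative meioses.
k, n = 1, 10
grid = [i / 100 for i in range(51)]            # theta = 0.00, 0.01, ..., 0.50
best = max(grid, key=lambda t: lod(t, k, n))   # grid value maximizing the LOD
print(best, round(lod(best, k, n), 3))
```

In this simple setting the maximizing value is the observed recombinant fraction k/n, and LOD(½) is 0 by construction.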
IBD Allele Sharing
Allele-sharing Methods
• Based on number (or proportion) of
alleles shared identical by descent
(IBD) of related individuals
• Can be done either assuming
(likelihood-based) or not assuming
(nonparametric) a genetic mode of
inheritance for a trait
Errors
• Genotyping errors can result in false
positive or false negative findings
• Data checking/cleaning necessary
(although there are approaches which
model error)
• Must be especially careful with SNP
genotypes, because errors often pass
simple Mendelian checks
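One way to see why SNP errors slip through: with only two alleles, many wrong genotype calls are still compatible with the parents. The sketch below counts, for a parent-offspring trio, what fraction of wrong calls for the child would still pass a simple Mendelian consistency check. The function names and the example trios are hypothetical.

```python
from itertools import product

GENOTYPES = ("DD", "Dd", "dd")  # the three genotypes at a biallelic SNP


def mendel_consistent(child, father, mother):
    """True if the child's genotype can arise from the parents' genotypes."""
    return any(sorted(a + b) == sorted(child) for a, b in product(father, mother))


def undetected_error_fraction(father, mother, true_child):
    """Fraction of wrong child genotype calls that still pass the trio check."""
    wrong = [g for g in GENOTYPES if sorted(g) != sorted(true_child)]
    passing = [g for g in wrong if mendel_consistent(g, father, mother)]
    return len(passing) / len(wrong)


print(undetected_error_fraction("Dd", "Dd", "Dd"))  # both parents heterozygous
print(undetected_error_fraction("Dd", "DD", "DD"))  # one heterozygous parent
print(undetected_error_fraction("DD", "dd", "Dd"))  # opposite homozygotes
```

When both parents are heterozygous, every child genotype is Mendelian-consistent, so no miscall of the child can be caught by this check.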
Disease-Marker Association
• A marker locus is associated with a
disease if the distribution of genotypes
at the marker locus in disease-affected
individuals differs from the distribution
in the general population
• A specific allele may be positively
associated (over-represented in
affecteds) or negatively associated
(under-represented)
Examples: Alzheimer’s
• Alzheimer’s disease and ApoE
            E4 present   E4 absent
Patients        58           33
Controls        16           55
The E4 allele appears to be positively associated
with Alzheimer’s disease:
Odds Ratio = (58/16)/(33/55) ≈ 6
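The odds ratio for the ApoE table can be recomputed in a couple of lines. This is a minimal sketch; the function and argument names are hypothetical, and the counts are the ones given in the table.

```python
def odds_ratio(cases_with, cases_without, controls_with, controls_without):
    """Odds ratio for a 2x2 case-control table, arranged as on the slide:
    (cases_with / controls_with) / (cases_without / controls_without)."""
    return (cases_with / controls_with) / (cases_without / controls_without)


# Counts from the table: E4 present / absent in patients and controls.
or_e4 = odds_ratio(cases_with=58, cases_without=33,
                   controls_with=16, controls_without=55)
print(round(or_e4, 2))  # close to 6
```

The same value results from the cross-product form (58 x 55)/(33 x 16).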
Examples: HLA
Disease                        Allele   RR (relative risk)
Ankylosing spondylitis         B27      87
Myasthenia gravis              B8       4.1
Systemic lupus erythematosus   B8       2.1
Hemochromatosis                A3       8.2
(and many more…)
Linkage Disequilibrium
[Diagram: marker locus (alleles M, m) in LD with disease locus (alleles D, d); disease locus linked to disease through penetrance]
Linkage Disequilibrium
• Concept of the ‘historical recombinant’
• Explanations for observed association
between marker and disease:
– Marker locus may be a disease
susceptibility locus
– Marker locus may be linked to disease
susceptibility locus
– Spurious result due to, e.g., admixture,
population stratification, heterogeneity
Linkage and LD
Mutation occurs
Nearby marker
Allele D is created
Allele M was nearby
D and M subsequently transmitted together
Candidate Polymorphism Testing
• Linkage and LD assume markers have
indirect association with the trait
• Large SNP collections may allow
testing for direct, physiologically
relevant associations with trait
Diagnostics/Risk Profiling
• Identified SNP associations can
potentially be used to develop
diagnostic tools
• Applicability will require large-scale
studies, since most diseases of
interest now are influenced by many
genetic and nongenetic factors
Response Prediction
• Related to diagnosis/risk assessment
• Strategy: stratify populations to
improve effectiveness of interventions
• Pharmaceutical companies especially
interested in this:
– Aim to identify those likely to respond
– Predict toxicity reactions in susceptible
individuals
• Response to any kind of substance;
creation of ‘functional foods’
Homogeneity Testing
• Test to protect against false inferences
about the relationship between
endpoints (e.g. disease) and risk factors
• Assess generalizability of results
• Can assess the homogeneity of the
genetic background of study
participants using a panel of randomly
distributed SNPs
Gene Function Identification
• Alternative to other experimental
procedures (e.g. knock-outs, which
cannot be used in humans)
• Studies to compare individuals with
and without naturally occurring
disease predisposing genetic profiles
Haplotype Variation
• The large databases already available
(and increasing in size) should allow
characterization of haplotype
variation across the genome in
different populations
• Can help population geneticists trace
evolution and reveal connections
between populations/ethnic groups