Haplotype Discovery and Modeling

Identification of genes
Identify the phenotype
Map
Clone
QTL Mapping
• A QTL (quantitative trait locus) is a gene that affects a quantitative trait.
• The QTL detected by the markers linked with it is a chromosomal segment.
• The DNA structure of a QTL is unknown.
[Figure: a chromosome carrying Marker 1, Marker 2, Marker 3, ..., Marker k, with a QTL between Marker 1 and Marker 2]
QTL Mapping Based on Linkage
[Figure: three-generation pedigree (generations I, II, III) in which each individual is labeled with its two-locus genotype, e.g. Aabb, aaBb, aabb, AaBb, AABb, AAbb]
Mapping and sequencing
[Figure: mapping resolution, with markers at the 10000 Kb scale and DNA clones at the 100 Kb scale]
SNPs (‘snips’)
• A SNP is a site in the DNA where different
chromosomes differ in the base they have.
SNPs
Paternal allele: CCCGCCTTCTTGGCTTTACA
Maternal allele: CCCGCCTTCTCGGCTTTACA
Paternal allele : CCCGCCTTCTTGGCTTTACA
Maternal allele : CCCGCCTTCTTGGCTTTACA
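The two allele pairs above can be compared base by base. The following is a minimal sketch, assuming the sequences are already aligned and of equal length; the function name `snp_positions` is hypothetical and chosen only for illustration.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[int]:
    """Return the 0-based positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


paternal = "CCCGCCTTCTTGGCTTTACA"
maternal_variant = "CCCGCCTTCTCGGCTTTACA"  # first pair on the slide
maternal_same = "CCCGCCTTCTTGGCTTTACA"     # second pair on the slide

print(snp_positions(paternal, maternal_variant))  # [10]
print(snp_positions(paternal, maternal_same))     # []
```

In the first pair the chromosomes differ at a single base (T versus C), so that site is a SNP; in the second pair the two chromosomes are identical.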
HapMap
Detecting specific DNA sequence variants
that determine complex traits
Single Nucleotide Polymorphisms (SNPs)
Sensitive to drug
Insensitive to drug
The International HapMap Consortium (Nature, 2003, 2005)
Basic concepts
Allele, Haplotype, and Diplotype
Basic concepts
Haplotyping a Phenotype
Quantitative Trait Nucleotide (QTN)
Basic concepts
Risk Haplotype and Composite Diplotype
Consider a QTN composed of two SNPs:
Risk haplotype:
[AB] = R
Non-risk haplotype:
[Ab], [aB], [ab] = r
Composite Diplotype: RR, Rr, rr
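The coding rule above amounts to counting copies of the risk haplotype in a diplotype. A minimal sketch follows, assuming haplotypes are written as two-letter strings such as "AB"; the function name `composite_diplotype` is hypothetical.

```python
def composite_diplotype(diplotype, risk="AB"):
    """Code a diplotype (a pair of haplotypes) as RR, Rr or rr.

    The code depends only on how many copies of the risk haplotype
    the diplotype carries: 2 -> RR, 1 -> Rr, 0 -> rr.
    """
    copies = sum(1 for haplotype in diplotype if haplotype == risk)
    return {2: "RR", 1: "Rr", 0: "rr"}[copies]


print(composite_diplotype(("AB", "AB")))  # RR
print(composite_diplotype(("AB", "ab")))  # Rr
print(composite_diplotype(("Ab", "aB")))  # rr
```

Note that the double heterozygote AaBb can be either [AB][ab] (Rr) or [Ab][aB] (rr), so its composite diplotype cannot be read off the genotype alone.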
Illustrations
[Figure: diplotypes at SNPs A and B grouped by the number of copies of the risk haplotype [AB] they carry: RR (2), Rr (1), rr (0)]
Study design
A random sample of unrelated individuals from a natural population
Group  SNP 1  SNP 2  Diplotype               Obs.        Drug Response Trait
1      AA     BB     [AB][AB]                n_{11/11}   y_1 = (y_{1,1}, …, y_{1,n_{11/11}})^T
2      AA     Bb     [AB][Ab]                n_{11/10}   y_2 = (y_{2,1}, …, y_{2,n_{11/10}})^T
3      AA     bb     [Ab][Ab]                n_{11/00}   y_3 = (y_{3,1}, …, y_{3,n_{11/00}})^T
4      Aa     BB     [AB][aB]                n_{10/11}   y_4 = (y_{4,1}, …, y_{4,n_{10/11}})^T
5      Aa     Bb     [AB][ab] or [Ab][aB]    n_{10/10}   y_5 = (y_{5,1}, …, y_{5,n_{10/10}})^T
6      Aa     bb     [Ab][ab]                n_{10/00}   y_6 = (y_{6,1}, …, y_{6,n_{10/00}})^T
7      aa     BB     [aB][aB]                n_{00/11}   y_7 = (y_{7,1}, …, y_{7,n_{00/11}})^T
8      aa     Bb     [aB][ab]                n_{00/10}   y_8 = (y_{8,1}, …, y_{8,n_{00/10}})^T
9      aa     bb     [ab][ab]                n_{00/00}   y_9 = (y_{9,1}, …, y_{9,n_{00/00}})^T
There are two types of parameters:
- Haplotype frequencies (population genetic parameters, Ω_p)
  [AB]: p11 = pq + D
  [Ab]: p10 = p(1-q) - D
  [aB]: p01 = (1-p)q - D
  [ab]: p00 = (1-p)(1-q) + D
  p = frequency of allele A at SNP 1
  q = frequency of allele B at SNP 2
  D = linkage disequilibrium
- Haplotype effects and variation (quantitative genetic parameters, Ω_q)
  RR: µ2 = µ + a
  Rr: µ1 = µ + d
  rr: µ0 = µ - a
  a = additive effect
  d = dominance effect
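The two parameter sets can be written out directly from these definitions. The sketch below uses hypothetical function names and illustrative input values (p = 0.6, q = 0.5, D = 0.05 and µ = 10, a = 2, d = 1) that are assumptions, not values from the study.

```python
def haplotype_frequencies(p, q, D):
    """Haplotype frequencies from allele frequencies p, q and disequilibrium D."""
    freqs = {
        "AB": p * q + D,
        "Ab": p * (1 - q) - D,
        "aB": (1 - p) * q - D,
        "ab": (1 - p) * (1 - q) + D,
    }
    if min(freqs.values()) < 0:
        raise ValueError("D is outside the range allowed by p and q")
    return freqs


def genotypic_values(mu, a, d):
    """Genotypic values of the composite diplotypes RR, Rr and rr."""
    return {"RR": mu + a, "Rr": mu + d, "rr": mu - a}


freqs = haplotype_frequencies(p=0.6, q=0.5, D=0.05)  # illustrative inputs
print(freqs)
print(sum(freqs.values()))          # the four frequencies sum to 1
print(freqs["AB"] + freqs["Ab"])    # recovers p, the frequency of allele A
print(genotypic_values(mu=10.0, a=2.0, d=1.0))
```

The D terms cancel in every marginal sum, which is why the allele frequencies p and q are recovered whatever the value of D.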
Unifying Likelihood
based on marker (S) and phenotype (y) data
Liu, Johnson, Casella and Wu, 2004, Genetics
Modeling Haplotype Frequencies
Group  SNP 1  SNP 2  Diplotype   Frequency        Obs.
1      AA     BB     [AB][AB]    p_{11}^2         n_{11/11}
2      AA     Bb     [AB][Ab]    2 p_{11} p_{10}  n_{11/10}
3      AA     bb     [Ab][Ab]    p_{10}^2         n_{11/00}
4      Aa     BB     [AB][aB]    2 p_{11} p_{01}  n_{10/11}
5      Aa     Bb     [AB][ab]    2 p_{11} p_{00}  n_{10/10}
                     [Ab][aB]    2 p_{10} p_{01}
6      Aa     bb     [Ab][ab]    2 p_{10} p_{00}  n_{10/00}
7      aa     BB     [aB][aB]    p_{01}^2         n_{00/11}
8      aa     Bb     [aB][ab]    2 p_{01} p_{00}  n_{00/10}
9      aa     bb     [ab][ab]    p_{00}^2         n_{00/00}
EM algorithm
E step
M step
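The E and M steps for the haplotype frequencies can be sketched from the diplotype frequency table: only the double heterozygote (AaBb) is ambiguous, so the E step splits its count between [AB][ab] and [Ab][aB], and the M step counts haplotypes. This is a minimal sketch under those assumptions, not necessarily the authors' implementation; the function name, the dictionary keys (taken from the Obs. subscripts) and the example counts are hypothetical.

```python
def em_haplotype_frequencies(counts, max_iter=1000, tol=1e-12):
    """Estimate frequencies of haplotypes [AB], [Ab], [aB], [ab] by EM.

    counts maps genotype labels such as "11/10" (AA at SNP 1, Bb at SNP 2)
    to the number of individuals observed with that genotype.
    """
    groups = ["11/11", "11/10", "11/00", "10/11", "10/10",
              "10/00", "00/11", "00/10", "00/00"]
    c = {g: counts.get(g, 0) for g in groups}
    total = 2 * sum(c.values())  # each individual carries two haplotypes
    if total == 0:
        raise ValueError("no observations")
    p11 = p10 = p01 = p00 = 0.25  # start from equal frequencies
    for _ in range(max_iter):
        # E step: expected share of AaBb individuals that are [AB][ab]
        cis, trans = p11 * p00, p10 * p01
        phi = cis / (cis + trans) if cis + trans > 0 else 0.5
        n_cis = phi * c["10/10"]
        n_trans = c["10/10"] - n_cis
        # M step: count haplotype copies and divide by the total
        new = (
            (2 * c["11/11"] + c["11/10"] + c["10/11"] + n_cis) / total,
            (2 * c["11/00"] + c["11/10"] + c["10/00"] + n_trans) / total,
            (2 * c["00/11"] + c["10/11"] + c["00/10"] + n_trans) / total,
            (2 * c["00/00"] + c["10/00"] + c["00/10"] + n_cis) / total,
        )
        change = max(abs(x - y) for x, y in zip(new, (p11, p10, p01, p00)))
        p11, p10, p01, p00 = new
        if change < tol:
            break
    return {"AB": p11, "Ab": p10, "aB": p01, "ab": p00}


# Hypothetical genotype counts, for illustration only
example_counts = {"11/11": 20, "11/10": 15, "11/00": 5, "10/11": 18,
                  "10/10": 30, "10/00": 10, "00/11": 4, "00/10": 8,
                  "00/00": 6}
estimates = em_haplotype_frequencies(example_counts)
print(estimates)
```

Allele frequencies p and q and the disequilibrium D then follow from the estimates, for example D = p11 p00 - p10 p01.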
Modeling Haplotype Effects
Group  SNP 1  SNP 2  Diplotype   Composite diplotype ([AB] as risk haplotype)
1      AA     BB     [AB][AB]    RR
2      AA     Bb     [AB][Ab]    Rr
3      AA     bb     [Ab][Ab]    rr
4      Aa     BB     [AB][aB]    Rr
5      Aa     Bb     [AB][ab]    Rr
                     [Ab][aB]    rr
6      Aa     bb     [Ab][ab]    rr
7      aa     BB     [aB][aB]    rr
8      aa     Bb     [aB][ab]    rr
9      aa     bb     [ab][ab]    rr
Likelihood
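Composite diplotype of each diplotype under each assumed risk haplotype:

Diplotype    Risk haplotype
             [AB]  [Ab]  [aB]  [ab]
[AB][AB]     RR    rr    rr    rr
[AB][Ab]     Rr    Rr    rr    rr
[Ab][Ab]     rr    RR    rr    rr
[AB][aB]     Rr    rr    Rr    rr
[AB][ab]     Rr    rr    rr    Rr
[Ab][aB]     rr    Rr    Rr    rr
[Ab][ab]     rr    Rr    rr    Rr
[aB][aB]     rr    rr    RR    rr
[aB][ab]     rr    rr    Rr    Rr
[ab][ab]     rr    rr    rr    RR
Likelihood   L1    L2    L3    L4

Genotypic values of composite diplotypes: RR: µ2, Rr: µ1, rr: µ0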
Mixture Model
assuming that [AB] is the risk haplotype
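EM Algorithm
• E step: for each double heterozygote i (group 5, genotype AaBb), compute the posterior probability that its diplotype is [AB][ab], i.e. composite diplotype Rr:
\[ \Pi_i = \frac{\phi f_1(y_i)}{\phi f_1(y_i) + (1-\phi) f_0(y_i)} \]
where f_1 and f_0 are the phenotype densities of Rr and rr, and \phi = p_{11}p_{00} / (p_{11}p_{00} + p_{10}p_{01}) is the prior proportion of [AB][ab] among double heterozygotes.
• M step: update the genotypic values and the residual variance. Each sum runs over the individuals of the indicated group, with n_{Rr} = n_{11/10} + n_{10/11} (groups 2 and 4) and n_{rr} = n_{11/00} + n_{10/00} + n_{00/11} + n_{00/10} + n_{00/00} (groups 3, 6, 7, 8 and 9):
\[ \hat{\mu}_2 = \frac{\sum_{i=1}^{n_{11/11}} y_i}{n_{11/11}} \]
\[ \hat{\mu}_1 = \frac{\sum_{i=1}^{n_{Rr}} y_i + \sum_{i=1}^{n_{10/10}} \Pi_i y_i}{n_{Rr} + \sum_{i=1}^{n_{10/10}} \Pi_i} \]
\[ \hat{\mu}_0 = \frac{\sum_{i=1}^{n_{rr}} y_i + \sum_{i=1}^{n_{10/10}} (1-\Pi_i) y_i}{n_{rr} + \sum_{i=1}^{n_{10/10}} (1-\Pi_i)} \]
\[ \hat{\sigma}^2 = \frac{1}{n}\left[ \sum_{i=1}^{n_{11/11}} (y_i-\hat{\mu}_2)^2 + \sum_{i=1}^{n_{Rr}} (y_i-\hat{\mu}_1)^2 + \sum_{i=1}^{n_{rr}} (y_i-\hat{\mu}_0)^2 + \sum_{i=1}^{n_{10/10}} \left\{ \Pi_i (y_i-\hat{\mu}_1)^2 + (1-\Pi_i)(y_i-\hat{\mu}_0)^2 \right\} \right] \]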
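The EM iteration for the haplotype effects can be sketched as follows, assuming [AB] is the risk haplotype, normally distributed phenotypes with a common variance, and a known prior proportion `phi` of [AB][ab] among double heterozygotes. All names are hypothetical, the three unambiguous groups are assumed non-empty, and the toy data are illustrative only.

```python
import math


def normal_pdf(y, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_haplotype_effects(y_RR, y_Rr, y_rr, y_dh, phi, max_iter=500, tol=1e-10):
    """Estimate mu2, mu1, mu0 and the variance when AaBb diplotypes are unknown.

    y_RR, y_Rr, y_rr: phenotypes of individuals with known composite diplotype.
    y_dh: phenotypes of double heterozygotes (Rr with prior phi, else rr).
    """
    all_y = list(y_RR) + list(y_Rr) + list(y_rr) + list(y_dh)
    n = len(all_y)
    mu2 = sum(y_RR) / len(y_RR)  # closed form, unaffected by the mixture
    mu1 = sum(y_Rr) / len(y_Rr)  # starting values from unambiguous groups
    mu0 = sum(y_rr) / len(y_rr)
    grand = sum(all_y) / n
    var = sum((y - grand) ** 2 for y in all_y) / n
    post = []
    for _ in range(max_iter):
        # E step: posterior probability that each double heterozygote is Rr
        post = []
        for y in y_dh:
            w1 = phi * normal_pdf(y, mu1, var)
            w0 = (1 - phi) * normal_pdf(y, mu0, var)
            post.append(w1 / (w1 + w0))
        # M step: weighted means and pooled variance
        new_mu1 = (sum(y_Rr) + sum(w * y for w, y in zip(post, y_dh))) / (
            len(y_Rr) + sum(post))
        new_mu0 = (sum(y_rr) + sum((1 - w) * y for w, y in zip(post, y_dh))) / (
            len(y_rr) + sum(1 - w for w in post))
        ss = sum((y - mu2) ** 2 for y in y_RR)
        ss += sum((y - new_mu1) ** 2 for y in y_Rr)
        ss += sum((y - new_mu0) ** 2 for y in y_rr)
        ss += sum(w * (y - new_mu1) ** 2 + (1 - w) * (y - new_mu0) ** 2
                  for w, y in zip(post, y_dh))
        new_var = ss / n
        change = max(abs(new_mu1 - mu1), abs(new_mu0 - mu0), abs(new_var - var))
        mu1, mu0, var = new_mu1, new_mu0, new_var
        if change < tol:
            break
    return {"mu2": mu2, "mu1": mu1, "mu0": mu0, "var": var, "posterior": post}


# Toy phenotypes, for illustration only
fit = em_haplotype_effects(
    y_RR=[1.0, 3.0], y_Rr=[4.0, 6.0], y_rr=[7.0, 9.0],
    y_dh=[4.5, 8.5], phi=0.5)
print(fit)
```

With no double heterozygotes the estimates reduce to the ordinary group means, which is a convenient sanity check of the M step.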
Hypothesis Testing
H0: µ2 = µ1 = µ0 (RR = Rr = rr, equivalently a = d = 0)
H1: At least one of the equalities in H0 does not hold
\[ \mathrm{LR} = -2\left[ \ln L_0(\tilde{\Omega}_q \mid y) - \ln L_1(\hat{\Omega}_q \mid y, S, \hat{\Omega}_p) \right] \]
The threshold is determined empirically by
permutation tests
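A permutation threshold can be sketched by shuffling the phenotypes against the genotypes and recomputing the LR each time. To keep the example short, the LR below is a simplified stand-in: it assumes every individual's composite diplotype is known and phenotypes are normal with a common variance, so it omits the mixture over double heterozygotes. All function names and the simulated data are hypothetical.

```python
import math
import random


def lr_equal_means(y, groups):
    """LR statistic for H0: equal group means, normal errors, common variance.

    With maximum-likelihood variances, -2 ln(L0 / L1) = n ln(RSS0 / RSS1).
    """
    n = len(y)
    grand = sum(y) / n
    rss0 = sum((v - grand) ** 2 for v in y)
    rss1 = 0.0
    for g in set(groups):
        vals = [v for v, k in zip(y, groups) if k == g]
        mean = sum(vals) / len(vals)
        rss1 += sum((v - mean) ** 2 for v in vals)
    return n * math.log(rss0 / rss1)


def permutation_threshold(y, groups, n_perm=500, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the LR under shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(y)
    null_lr = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        null_lr.append(lr_equal_means(shuffled, groups))
    null_lr.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return null_lr[index]


# Simulated data with a strong haplotype effect, for illustration only
rng = random.Random(1)
groups = ["RR"] * 20 + ["Rr"] * 20 + ["rr"] * 20
means = {"RR": 0.0, "Rr": 3.0, "rr": 6.0}
y = [rng.gauss(means[g], 1.0) for g in groups]

observed = lr_equal_means(y, groups)
threshold = permutation_threshold(y, groups)
print(observed, threshold, observed > threshold)
```

In a genome-wide scan the same shuffling would typically be applied to the maximum LR over all SNPs, so that the threshold accounts for multiple testing.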
Genome-wide Scan
[Figure: LR profile across the genome (x-axis: SNPs on the genome; y-axis: LR), with the significance threshold marked]
Structural Variation in the Human Genome
Recombination Hot Spots
Block 1
Block 2
Block 3
Block 4 …
Haplotype Blocks: Nearby SNPs are often distributed in
block-like patterns
Hotspots and Coldspots: SNPs from different blocks have
larger recombination rates than those from within blocks
Tag SNPs: Haplotype diversity within each block can be well
explained by a small portion of SNPs.
A Genetic Study
A candidate gene
for human obesity
SNP A: A, G
SNP B: C, G
Four haplotypes
[AC]
[AG]
[GC]
[GG]
• A total of 155 patients selected from a population
• Typed for the two SNPs
• Measured for body mass index (BMI)
• Question: Which haplotype triggers an effect on BMI?
Testing Risk Haplotype
Haplotype  LR                Status
[AC]       2.32              r
[AG]       1.52              r
[GC]       3.11              r
[GG]       10.35 (p<0.01)    R
RR: µ2 = µ + a = 30.83 – 1.77 = 29.06
Rr: µ1 = µ + d = 30.83 – 3.05 = 27.78
rr: µ0 = µ - a = 30.83 + 1.77 = 32.60
a = additive effect
d = dominance effect
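• A patient who combines haplotype [GG] with any other haplotype (Rr) has the lowest expected BMI, 27.78.
• A patient who combines any two haplotypes from [AC], [AG] and [GC] (rr) has the highest expected BMI, 32.60, and is obese.
• A patient who carries two copies of haplotype [GG] (RR) has an expected BMI of 29.06 and is overweight.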
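The three genotypic values follow from the estimates reported above, µ = 30.83, a = -1.77 and d = -3.05. A short check of the arithmetic, with [GG] as the risk haplotype R:

```python
mu, a, d = 30.83, -1.77, -3.05  # estimates reported for risk haplotype [GG]

values = {"RR": mu + a, "Rr": mu + d, "rr": mu - a}
for composite, bmi in values.items():
    print(f"{composite}: {bmi:.2f}")

# Order the composite diplotypes from lowest to highest expected BMI
ranking = sorted(values, key=values.get)
print(ranking)  # ['Rr', 'RR', 'rr']
```

Because d is more negative than a, the heterozygote Rr lies below both homozygotes rather than between them.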
Model Extensions
• Block-Block Interactions (Lin et al. 2007,
Bioinformatics)
• Haplotype-Environment Interactions (Wang
et al. 2008, Molecular Pain)
• Haplotype Imprinting Effects (Cheng et al., to
be submitted)
• Multivariate high-dimensional drug response
(PK-PD link, efficacy and toxicity…) – A
systems approach
1000 Genomes Project
• This sequencing effort will produce the most detailed map of human genetic variation to support disease studies.
• The results will help to design personalized medication that can optimize drug therapy.