Lecture Notes

“When I walk into the forest, I have the feeling that, although it's very
complex, there are simple rules underlying that complexity,” says Brian
Enquist, an ecologist.
Photo by Whitfield, J. All creatures great and small. Nature 413 (6854): 342-344, Sep 27, 2001.
When I am amazed by an incredible diversity of
biological size and shape, I surprisingly find I can
use a simple QTL model to explain it.
Mixture model-based likelihood function (my favorite statistic)
$$L(\omega, \mu, \sigma^2 \mid y, M) = \prod_{i=1}^{n} \left[ \omega_1 f(y_i; \mu_1, \sigma^2) + \dots + \omega_J f(y_i; \mu_J, \sigma^2) \right]$$
$\omega$ = the relative proportions of J components
f(.) = a normal probability distribution function
$\mu_j$ = the expected means
$\sigma^2$ = the variance
The Lander-Botstein model – estimate 
The linkage disequilibrium model – estimate 
The Wu-Ma model – estimate biology!
Mixture model-based likelihood function
$$L(\omega, \mu, \sigma^2) = \prod_{i=1}^{n} \left[ \omega_1 f(y_i; \mu_1, \sigma^2) + \dots + \omega_k f(y_i; \mu_k, \sigma^2) \right]$$
$\omega$ = the proportions of k components
f(.) = a normal distribution
$\mu$ = the expected means
$\sigma^2$ = the variance
n = sample size
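To make the likelihood concrete, here is a minimal Python sketch that evaluates its logarithm for a univariate normal mixture with a common variance. The function names and the toy numbers are illustrative assumptions, not part of the lecture.

```python
import math

def normal_pdf(y, mu, sigma2):
    """Density of a normal distribution with mean mu and variance sigma2 at y."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def mixture_log_likelihood(y, proportions, means, sigma2):
    """Sum over observations of log( sum_j proportion_j * f(y_i; mean_j, sigma2) )."""
    total = 0.0
    for yi in y:
        density = sum(w * normal_pdf(yi, mu, sigma2)
                      for w, mu in zip(proportions, means))
        total += math.log(density)
    return total

# Hypothetical phenotypes and parameter values, for illustration only.
y = [9.8, 10.1, 12.2, 11.9, 14.0]
print(mixture_log_likelihood(y, [0.25, 0.5, 0.25], [10.0, 12.0, 14.0], 1.0))
```

Working on the log scale turns the product over the n observations into a sum, which avoids numerical underflow for larger samples.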
Data structure (F2)

                  Phenotype                  Marker
         ________________________    ____________________
Sample   y(1)   y(2)   …   y(k)      1     2     …     m
__________________________________________________________
1        y11    y21    …   yk1       2     2     …
2        y12    y22    …   yk2       2     1     …
3        y13    y23    …   yk3       2     0     …
4        y14    y24    …   yk4       1     2     …
5        y15    y25    …   yk5       1     1     …
6        y16    y26    …   yk6       1     0     …
7        y17    y27    …   yk7       0     2     …
8        y18    y28    …   yk8       0     1     …
9        y19    y29    …   yk9       0     0     …

• There are nine groups of two-marker genotypes, 22, 21, 20, 12, 11, 10,
02, 01 and 00, with sample sizes n22, n21, …, n00;
• The conditional probabilities of the QTL genotypes QQ (2), Qq (1) and
qq (0), given these marker genotypes, are $\omega_{2i}$, $\omega_{1i}$ and $\omega_{0i}$.
Univariate interval mapping
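$$L(y) = \prod_{i=1}^{n} \left[ \omega_{2i} f_2(y_i) + \omega_{1i} f_1(y_i) + \omega_{0i} f_0(y_i) \right]$$
$$= \prod_{i=1}^{n_{22}} \left[ \omega_{2i} f_2(y_i) + \omega_{1i} f_1(y_i) + \omega_{0i} f_0(y_i) \right]$$
$$\times \prod_{i=1}^{n_{21}} \left[ \omega_{2i} f_2(y_i) + \omega_{1i} f_1(y_i) + \omega_{0i} f_0(y_i) \right]$$
$$\times \dots \times \prod_{i=1}^{n_{00}} \left[ \omega_{2i} f_2(y_i) + \omega_{1i} f_1(y_i) + \omega_{0i} f_0(y_i) \right]$$
$$f_2(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(y_i - \mu_2)^2}{2\sigma^2} \right\},$$
$$f_1(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(y_i - \mu_1)^2}{2\sigma^2} \right\},$$
$$f_0(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(y_i - \mu_0)^2}{2\sigma^2} \right\}.$$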
The Lander-Botstein model estimates ($\mu_2$, $\mu_1$, $\mu_0$, $\sigma^2$, QTL position) [5 parameters]
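The notes do not say how this likelihood is maximised. One common choice for mixture likelihoods is the EM algorithm, applied at each fixed QTL position on a grid. The sketch below is written under that assumption; `em_step` and its argument layout are hypothetical names introduced here, and the conditional probabilities are taken as already computed from the marker genotypes.

```python
import math

def normal_pdf(y, mu, sigma2):
    """Density of a normal distribution with mean mu and variance sigma2 at y."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def em_step(y, omega, means, sigma2):
    """One EM update of the three genotypic means and the common variance.

    omega[i] holds the conditional probabilities of QQ, Qq and qq for
    individual i given its marker genotype (assumed known at this position).
    Each genotype is assumed to receive positive posterior weight somewhere.
    """
    # E-step: posterior probability of each QTL genotype for each individual.
    posterior = []
    for yi, wi in zip(y, omega):
        joint = [w * normal_pdf(yi, mu, sigma2) for w, mu in zip(wi, means)]
        total = sum(joint)
        posterior.append([p / total for p in joint])

    # M-step: posterior-weighted means, then the pooled weighted variance.
    new_means = []
    for j in range(3):
        weight = sum(p[j] for p in posterior)
        new_means.append(sum(p[j] * yi for p, yi in zip(posterior, y)) / weight)
    new_sigma2 = sum(p[j] * (yi - new_means[j]) ** 2
                     for p, yi in zip(posterior, y) for j in range(3)) / len(y)
    return new_means, new_sigma2

# Hypothetical data: four individuals with made-up conditional probabilities.
y = [14.2, 12.1, 11.8, 9.9]
omega = [(0.8, 0.2, 0.0), (0.1, 0.8, 0.1), (0.1, 0.8, 0.1), (0.0, 0.2, 0.8)]
means, sigma2 = [13.0, 12.0, 11.0], 1.0
for _ in range(20):
    means, sigma2 = em_step(y, omega, means, sigma2)
print(means, sigma2)
```

Repeating the step until the estimates stabilise, and then repeating the whole fit across positions, would give a likelihood profile along the marker interval.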
Multivariate interval mapping
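$$L(\mathbf{y}) = \prod_{i=1}^{n} \left[ \omega_{2i} f_2(\mathbf{y}_i) + \omega_{1i} f_1(\mathbf{y}_i) + \omega_{0i} f_0(\mathbf{y}_i) \right]$$
Vector $\mathbf{y} = (y_1, y_2, \dots, y_k)$
$$f_2(\mathbf{y}_i) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} (\mathbf{y}_i - \mathbf{m}_2)^T \Sigma^{-1} (\mathbf{y}_i - \mathbf{m}_2) \right\},$$
$$f_1(\mathbf{y}_i) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} (\mathbf{y}_i - \mathbf{m}_1)^T \Sigma^{-1} (\mathbf{y}_i - \mathbf{m}_1) \right\},$$
$$f_0(\mathbf{y}_i) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} (\mathbf{y}_i - \mathbf{m}_0)^T \Sigma^{-1} (\mathbf{y}_i - \mathbf{m}_0) \right\},$$
Vectors
$\mathbf{m}_2 = (\mu_{21}, \mu_{22}, \dots, \mu_{2k})$
$\mathbf{m}_1 = (\mu_{11}, \mu_{12}, \dots, \mu_{1k})$
$\mathbf{m}_0 = (\mu_{01}, \mu_{02}, \dots, \mu_{0k})$
Residual variance-covariance matrix
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \cdots & \sigma_{1k} \\ \vdots & \ddots & \vdots \\ \sigma_{k1} & \cdots & \sigma_k^2 \end{pmatrix},$$
The unknown parameters: ($\mathbf{m}_2$, $\mathbf{m}_1$, $\mathbf{m}_0$, $\Sigma$, QTL position) [3k + k(k+1)/2
+ 1 parameters]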
Functional mapping does not estimate ($\mathbf{m}_2$, $\mathbf{m}_1$, $\mathbf{m}_0$, $\Sigma$) directly. Instead, it
estimates the biologically meaningful parameters that model them.
For growth traits, we have
$$g(t) = \frac{a}{1 + b e^{-rt}} \quad \text{(logistic curve)}$$
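As a quick illustration of how the three logistic parameters shape the curve, the following Python sketch evaluates g(t). The function name and the parameter values are assumptions chosen for the example.

```python
import math

def logistic_growth(t, a, b, r):
    """Logistic growth curve g(t) = a / (1 + b * exp(-r * t))."""
    return a / (1.0 + b * math.exp(-r * t))

# Hypothetical parameter values: asymptote a, shape b, relative growth rate r.
a, b, r = 10.0, 4.0, 0.5
for t in range(0, 13, 3):
    print(t, round(logistic_growth(t, a, b, r), 3))
```

The curve starts at a / (1 + b) when t = 0, approaches the asymptote a as t grows, and passes through a / 2 at t = ln(b) / r.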
For the AR(1) model, we have
$$\Sigma = \sigma^2 \begin{pmatrix} 1 & \rho & \cdots & \rho^{k-1} \\ \rho & 1 & \cdots & \rho^{k-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{k-1} & \rho^{k-2} & \cdots & 1 \end{pmatrix},$$
Functional interval mapping
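$$L(\mathbf{y}) = \prod_{i=1}^{n} \left[ \omega_{2i} f_2(\mathbf{y}_i) + \omega_{1i} f_1(\mathbf{y}_i) + \omega_{0i} f_0(\mathbf{y}_i) \right]$$
Vector $\mathbf{y} = (y_1, y_2, \dots, y_k)$
$$f_2(\mathbf{y}_i) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} (\mathbf{y}_i - \mathbf{m}_2)^T \Sigma^{-1} (\mathbf{y}_i - \mathbf{m}_2) \right\},$$
$$f_1(\mathbf{y}_i) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} (\mathbf{y}_i - \mathbf{m}_1)^T \Sigma^{-1} (\mathbf{y}_i - \mathbf{m}_1) \right\},$$
$$f_0(\mathbf{y}_i) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} (\mathbf{y}_i - \mathbf{m}_0)^T \Sigma^{-1} (\mathbf{y}_i - \mathbf{m}_0) \right\},$$
$$\mathbf{m}_2 = \left( \frac{a_2}{1 + b_2 e^{-r_2}}, \frac{a_2}{1 + b_2 e^{-2 r_2}}, \dots, \frac{a_2}{1 + b_2 e^{-k r_2}} \right)$$
$$\mathbf{m}_1 = \left( \frac{a_1}{1 + b_1 e^{-r_1}}, \frac{a_1}{1 + b_1 e^{-2 r_1}}, \dots, \frac{a_1}{1 + b_1 e^{-k r_1}} \right)$$
$$\mathbf{m}_0 = \left( \frac{a_0}{1 + b_0 e^{-r_0}}, \frac{a_0}{1 + b_0 e^{-2 r_0}}, \dots, \frac{a_0}{1 + b_0 e^{-k r_0}} \right)$$
The unknown parameters: ($a_2$, $b_2$, $r_2$, $a_1$, $b_1$, $r_1$, $a_0$, $b_0$, $r_0$, $\rho$, $\sigma^2$, QTL
position) [12 parameters]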
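The pieces of the functional mapping likelihood can be put together in a short Python sketch for a single individual: logistic mean vectors at times 1, …, k, an AR(1) residual covariance, and a three-component mixture. All function names and numbers are hypothetical. The density uses the standard AR(1) factorisation (first residual with the marginal variance, later residuals as innovations with that variance times 1 minus the squared correlation) rather than an explicit matrix inverse; this is a convenience chosen here, not something the notes prescribe.

```python
import math

def logistic_mean(params, k):
    """Mean vector (g(1), ..., g(k)) for curve parameters (a, b, r)."""
    a, b, r = params
    return [a / (1.0 + b * math.exp(-r * t)) for t in range(1, k + 1)]

def ar1_log_density(y, mean, sigma2, rho):
    """Log multivariate normal density with covariance sigma2 * rho ** |s - t|."""
    k = len(y)
    e = [yt - mt for yt, mt in zip(y, mean)]
    innov = sigma2 * (1.0 - rho ** 2)  # variance of the AR(1) innovations
    quad = e[0] ** 2 / sigma2
    quad += sum((e[t] - rho * e[t - 1]) ** 2 for t in range(1, k)) / innov
    log_det = math.log(sigma2) + (k - 1) * math.log(innov)
    return -0.5 * (k * math.log(2.0 * math.pi) + log_det + quad)

def individual_log_likelihood(y, omega, curve_params, sigma2, rho):
    """log( sum_j omega_j * f_j(y) ), computed with the log-sum-exp trick."""
    logs = [math.log(w) + ar1_log_density(y, logistic_mean(p, len(y)), sigma2, rho)
            for w, p in zip(omega, curve_params) if w > 0.0]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

# Hypothetical individual measured at k = 4 time points.
y = [3.1, 4.4, 5.9, 7.2]
omega = (0.25, 0.5, 0.25)  # conditional probabilities of QQ, Qq, qq
curves = [(12.0, 4.0, 0.5), (10.0, 4.0, 0.5), (8.0, 4.0, 0.5)]  # (a, b, r) per genotype
print(individual_log_likelihood(y, omega, curves, sigma2=0.5, rho=0.6))
```

Summing this quantity over all individuals gives the log of the likelihood for the sample at one candidate QTL position.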
The advantages of functional mapping
(1) Increase statistical power to detect QTL (fewer parameters estimated)
(2) Enhance biological relevance of the QTL detected (biological rules considered)
(3) Ask and answer questions at the interface among different biological disciplines
(4) Display tremendous potential in biomedical research!
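The first advantage can be checked with simple arithmetic. The notes give 3k + k(k+1)/2 + 1 unknowns for multivariate interval mapping and 12 for functional mapping with logistic means and an AR(1) covariance. The Python sketch below tabulates the two counts for a few values of k; the function name is a hypothetical label.

```python
def multivariate_parameter_count(k):
    """3k genotypic means + k(k+1)/2 covariance terms + 1 QTL position."""
    return 3 * k + k * (k + 1) // 2 + 1

# 9 curve parameters + rho + sigma^2 + QTL position, independent of k.
FUNCTIONAL_PARAMETER_COUNT = 12

for k in (5, 10, 20):
    print(k, multivariate_parameter_count(k), FUNCTIONAL_PARAMETER_COUNT)
```

With k = 10 time points the multivariate model has 86 unknowns against 12 for functional mapping, and the gap grows quadratically in k.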
Important assumptions:
• Constant variances over time
• Correlations decay exponentially with the time interval
Relaxing the first assumption
(1) Transform-Both-Sides (Ruppert and Carroll, JASA 1984)
$$\log[g(t)] = \log\left[ \frac{a}{1 + b e^{-rt}} \right] \quad \text{(logistic curve)}$$
Wu, R. L., C.-X. Ma, M. Lin, Z. H. Wang and G. Casella, 2004.
Functional Mapping of Quantitative Trait Loci Underlying Growth
Trajectories Using a Transform-Both-Sides Logistic Model. Biometrics.
(2) Modeling time-dependent variances using non-linear functions
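The transform-both-sides idea in item (1) can be sketched in a few lines of Python: the same log transform is applied to the observation and to the logistic mean, so the curve parameters keep their meaning while residuals are measured on the log scale. The function name and values are hypothetical, and observations are assumed to be strictly positive.

```python
import math

def tbs_log_residual(y, t, a, b, r):
    """log(y) - log(g(t)) for the logistic curve g(t) = a / (1 + b * exp(-r * t))."""
    log_mean = math.log(a) - math.log1p(b * math.exp(-r * t))
    return math.log(y) - log_mean

# Hypothetical observation y = 2.3 at time t = 0 with curve (a, b, r) = (10, 4, 0.5).
print(tbs_log_residual(2.3, 0.0, 10.0, 4.0, 0.5))
```

A residual of zero means the observation lies exactly on the curve. A multiplicative error on the original scale becomes an additive one after the transform, which is one reason a log transform can help stabilise variances that grow with the mean.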
Implications of functional mapping
(1) Multivariate allometric scaling
$Y = a X_1^{b} X_2^{c}$
(2) Multiplicative epistasis
Y = X1X2, Different genes affect X1 and X2 and these two genes
epistatically interact to determine Y.
(3) Photosynthesis-light curve
Rectangular hyperbola
(4) Reaction norm
Thermal performance curves
(5) Biomass partitioning
Wu’s (1993) community productivity model
(6) Pharmacokinetics (PK – what the body does to the drug)
(7) Pharmacodynamics (PD – what the drug does to the body)
(8) Human body growth models
(9) Clonal design
(10) HIV pathogenesis (virus dynamics, virus-host interaction)
(11) Cancer invasion: A reaction-diffusion model (mathematical
oncology)
(12) Bird flight – modeling flight mechanics using power functions