“When I walk into the forest, I have the feeling that, although it's very complex, there are simple rules underlying that complexity,” says Brian Enquist, an ecologist. Photo by Whitfield J. All creatures great and small. Nature 413(6854): 342-344, Sep 27, 2001.

Amazed by the incredible diversity of biological size and shape, I was surprised to find that a simple QTL model can explain it.

Mixture model-based likelihood function (my favorite statistic)

$$L(\omega, \mu, \sigma^2 \mid y, M) = \prod_{i=1}^{n} \left[ \omega_1 f(y_i; \mu_1, \sigma^2) + \cdots + \omega_J f(y_i; \mu_J, \sigma^2) \right]$$

where $\omega = (\omega_1, \ldots, \omega_J)$ = the relative proportions of the J components, f(·) = a normal probability density function, $\mu_j$ = the expected mean of component j, $\sigma^2$ = the variance, and n = the sample size.

The Lander-Botstein model: estimate …
The linkage disequilibrium model: estimate …
The Wu-Ma model: estimate biology!

Data structure (F2)

Sample   Phenotype                  Markers
         y(1)   y(2)   …   y(k)     1   2   …   m
1        y11    y21    …   yk1      2   2   …
2        y12    y22    …   yk2      2   1   …
3        y13    y23    …   yk3      2   0   …
4        y14    y24    …   yk4      1   2   …
5        y15    y25    …   yk5      1   1   …
6        y16    y26    …   yk6      1   0   …
7        y17    y27    …   yk7      0   2   …
8        y18    y28    …   yk8      0   1   …
9        y19    y29    …   yk9      0   0   …

There are nine groups of two-marker genotypes, 22, 21, 20, 12, 11, 10, 02, 01 and 00, with sample sizes n22, n21, …, n00. The conditional probabilities of the QTL genotypes QQ (2), Qq (1) and qq (0), given these marker genotypes, are denoted $\omega_{2i}$, $\omega_{1i}$ and $\omega_{0i}$.

Univariate interval mapping

Because individuals in the same marker group share the same conditional probabilities, the likelihood can be written group by group:

$$\begin{aligned} L(y) &= \prod_{i=1}^{n} \left[ \omega_{2i} f_2(y_i) + \omega_{1i} f_1(y_i) + \omega_{0i} f_0(y_i) \right] \\ &= \prod_{i=1}^{n_{22}} \left[ \omega_{2i} f_2(y_i) + \omega_{1i} f_1(y_i) + \omega_{0i} f_0(y_i) \right] \times \prod_{i=1}^{n_{21}} \left[ \omega_{2i} f_2(y_i) + \omega_{1i} f_1(y_i) + \omega_{0i} f_0(y_i) \right] \times \cdots \times \prod_{i=1}^{n_{00}} \left[ \omega_{2i} f_2(y_i) + \omega_{1i} f_1(y_i) + \omega_{0i} f_0(y_i) \right] \end{aligned}$$

where

$$f_j(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(y_i - \mu_j)^2}{2\sigma^2} \right\}, \quad j = 2, 1, 0.$$
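A minimal sketch of how the univariate interval-mapping likelihood can be maximized at one fixed QTL position, assuming the conditional probabilities ω have already been computed from the marker genotypes and recombination fractions (that step is not shown). It uses the EM algorithm, a standard choice for this kind of mixture likelihood; the function names, starting values and the three ω patterns in `simulate_f2` are hypothetical illustrations, not data or code from the original.

```python
import math
import random


def normal_pdf(y, mu, var):
    """Normal density f(y; mu, sigma^2)."""
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(y, omega, mu, var):
    """log L(y) = sum_i log[w2i f2(yi) + w1i f1(yi) + w0i f0(yi)].

    omega[i] = (w2i, w1i, w0i): conditional probabilities of QTL genotypes
    QQ, Qq, qq given individual i's marker genotype (assumed precomputed).
    mu = (mu2, mu1, mu0); var = common residual variance sigma^2.
    """
    return sum(
        math.log(sum(w * normal_pdf(yi, m, var) for w, m in zip(wi, mu)))
        for yi, wi in zip(y, omega)
    )


def em_interval_mapping(y, omega, mu, var, n_iter=500, tol=1e-8):
    """EM estimates of (mu2, mu1, mu0, sigma^2) at one fixed QTL position."""
    n = len(y)
    mu = list(mu)
    history = [log_likelihood(y, omega, mu, var)]
    for _ in range(n_iter):
        # E step: posterior probability of each QTL genotype per individual.
        post = []
        for yi, wi in zip(y, omega):
            terms = [w * normal_pdf(yi, m, var) for w, m in zip(wi, mu)]
            total = sum(terms)
            post.append([t / total for t in terms])
        # M step: posterior-weighted genotype means, then pooled variance.
        for j in range(3):
            weight = sum(p[j] for p in post)
            mu[j] = sum(p[j] * yi for p, yi in zip(post, y)) / weight
        var = sum(
            p[j] * (yi - mu[j]) ** 2 for p, yi in zip(post, y) for j in range(3)
        ) / n
        history.append(log_likelihood(y, omega, mu, var))
        if abs(history[-1] - history[-2]) < tol:
            break
    return mu, var, history


def simulate_f2(n, mu, var, seed=0):
    """Hypothetical data: each individual gets one of three illustrative
    omega vectors (NOT derived from real recombination fractions)."""
    rng = random.Random(seed)
    patterns = [(0.9, 0.1, 0.0), (0.05, 0.9, 0.05), (0.0, 0.1, 0.9)]
    y, omega = [], []
    for _ in range(n):
        w = rng.choice(patterns)
        g = rng.choices(range(3), weights=w)[0]  # latent QTL genotype
        y.append(rng.gauss(mu[g], math.sqrt(var)))
        omega.append(w)
    return y, omega


y, omega = simulate_f2(600, mu=(12.0, 10.0, 8.0), var=1.0, seed=3)
mu_hat, var_hat, history = em_interval_mapping(y, omega, mu=(11.0, 10.5, 9.0), var=2.0)
print([round(m, 2) for m in mu_hat], round(var_hat, 2))
```

Scanning the genome would repeat this fit at each candidate QTL position, recomputing ω each time, and compare the maximized log-likelihoods. EM never decreases the likelihood from one iteration to the next, which makes `history` a convenient sanity check.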
The Lander-Botstein model estimates five parameters: $(\mu_2, \mu_1, \mu_0, \sigma^2, \text{QTL position})$.

Multivariate interval mapping

$$L(\mathbf{y}) = \prod_{i=1}^{n} \left[ \omega_{2i} f_2(\mathbf{y}_i) + \omega_{1i} f_1(\mathbf{y}_i) + \omega_{0i} f_0(\mathbf{y}_i) \right]$$

where the vector $\mathbf{y} = (y_1, y_2, \ldots, y_k)$ holds the k phenotypic values and

$$f_j(\mathbf{y}_i) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} (\mathbf{y}_i - \mathbf{m}_j)^T \Sigma^{-1} (\mathbf{y}_i - \mathbf{m}_j) \right\}, \quad j = 2, 1, 0,$$

with mean vectors $\mathbf{m}_2 = (\mu_{21}, \mu_{22}, \ldots, \mu_{2k})$, $\mathbf{m}_1 = (\mu_{11}, \mu_{12}, \ldots, \mu_{1k})$, $\mathbf{m}_0 = (\mu_{01}, \mu_{02}, \ldots, \mu_{0k})$ and residual variance-covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \cdots & \sigma_{1k} \\ \vdots & \ddots & \vdots \\ \sigma_{k1} & \cdots & \sigma_k^2 \end{pmatrix}.$$

The unknown parameters are $(\mathbf{m}_2, \mathbf{m}_1, \mathbf{m}_0, \Sigma, \text{QTL position})$: 3k + k(k+1)/2 + 1 parameters.

Functional mapping does not estimate $(\mathbf{m}_2, \mathbf{m}_1, \mathbf{m}_0, \Sigma)$ directly; instead, it estimates the biologically meaningful parameters that determine them. For growth traits, we have the logistic curve

$$g(t) = \frac{a}{1 + b e^{-rt}}$$

and, for the AR(1) model,

$$\Sigma = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{k-1} \\ \rho & 1 & \rho & \cdots & \rho^{k-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{k-1} & \rho^{k-2} & \rho^{k-3} & \cdots & 1 \end{pmatrix}.$$

Functional interval mapping

The likelihood $L(\mathbf{y})$ and the densities $f_2$, $f_1$, $f_0$ take the same form as in multivariate interval mapping, but the mean vectors are now logistic curves evaluated at times t = 1, …, k:

$$\mathbf{m}_j = \left( \frac{a_j}{1 + b_j e^{-r_j}}, \frac{a_j}{1 + b_j e^{-2 r_j}}, \ldots, \frac{a_j}{1 + b_j e^{-k r_j}} \right), \quad j = 2, 1, 0,$$

and Σ follows the AR(1) structure. The unknown parameters are $(a_2, b_2, r_2, a_1, b_1, r_1, a_0, b_0, r_0, \rho, \sigma^2, \text{QTL position})$: 12 parameters.

The advantages of functional mapping
(1) Increases the statistical power to detect QTL (fewer parameters are estimated)
(2) Enhances the biological relevance of the detected QTL (biological rules are taken into account)
(3) Asks and answers questions at the interface between different biological disciplines
(4) Displays tremendous potential in biomedical research!

Important assumptions
(1) Constant variance over time
(2) Correlation that decays exponentially with the time interval

Relaxing the first assumption
(1) Transform-Both-Sides (Ruppert and Carroll, JASA 1984):

$$\log[g(t)] = \log\left[ \frac{a}{1 + b e^{-rt}} \right] \quad \text{(logistic curve)}$$

Wu, R. L., C.-X. Ma, M. Lin, Z. H. Wang and G. Casella, 2004. Functional mapping of quantitative trait loci underlying growth trajectories using a transform-both-sides logistic model. Biometrics.
(2) Modeling time-dependent variances using non-linear functions

Implications of functional mapping
(1) Multivariate allometric scaling: $Y = a X_1^{b} X_2^{c}$
(2) Multiplicative epistasis: $Y = X_1 X_2$, where different genes affect $X_1$ and $X_2$, and these two genes interact epistatically to determine Y
(3) Photosynthesis-light curve: rectangular hyperbola
(4) Reaction norm: thermal performance curves
(5) Biomass partitioning: Wu's (1993) community productivity model
(6) Pharmacokinetics (PK: what the body does to a drug)
(7) Pharmacodynamics (PD: what a drug does to the body)
(8) Human body growth models
(9) Clonal design
(10) HIV pathogenesis (virus dynamics, virus-host interaction)
(11) Cancer invasion: a reaction-diffusion model (mathematical oncology)
(12) Bird flight: modeling flight mechanics using power functions
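The functional interval mapping likelihood described above can be evaluated with a short sketch like the following. It assumes equally spaced measurement times t = 1, …, k, the stationary AR(1) covariance, and ω values precomputed from marker genotypes. Rather than inverting Σ, it uses the standard AR(1) factorization of the multivariate normal density. All function names are hypothetical, and maximizing this function over the 11 curve and covariance parameters (for example with a numerical optimizer) at each QTL position is not shown.

```python
import math


def logistic_curve(a, b, r, k):
    """Mean vector (a / (1 + b e^{-r t}) for t = 1..k), i.e. m_j in the text."""
    return [a / (1.0 + b * math.exp(-r * t)) for t in range(1, k + 1)]


def ar1_mvn_logpdf(e, sigma2, rho):
    """log N(e; 0, Sigma) for Sigma[s][t] = sigma2 * rho**|s - t|, |rho| < 1.

    Uses the AR(1) factorization e_1 ~ N(0, sigma2) and
    e_t | e_{t-1} ~ N(rho * e_{t-1}, sigma2 * (1 - rho**2)),
    which avoids forming or inverting the k x k matrix.
    """
    innov_var = sigma2 * (1.0 - rho ** 2)
    logp = -0.5 * (math.log(2.0 * math.pi * sigma2) + e[0] ** 2 / sigma2)
    for t in range(1, len(e)):
        d = e[t] - rho * e[t - 1]
        logp -= 0.5 * (math.log(2.0 * math.pi * innov_var) + d ** 2 / innov_var)
    return logp


def functional_log_likelihood(Y, omega, curves, sigma2, rho):
    """log L = sum_i log sum_j omega_ji f_j(y_i) at one fixed QTL position.

    Y[i]: k repeated measurements of individual i (times t = 1..k).
    omega[i]: (w2i, w1i, w0i), assumed precomputed from marker genotypes.
    curves: [(a2, b2, r2), (a1, b1, r1), (a0, b0, r0)] for QQ, Qq, qq.
    """
    k = len(Y[0])
    means = [logistic_curve(a, b, r, k) for a, b, r in curves]
    total = 0.0
    for yi, wi in zip(Y, omega):
        # log(w_j) + log f_j(y_i) for genotypes with nonzero probability
        logs = [
            math.log(w) + ar1_mvn_logpdf([y - m for y, m in zip(yi, mj)], sigma2, rho)
            for w, mj in zip(wi, means)
            if w > 0
        ]
        top = max(logs)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(v - top) for v in logs))
    return total


def n_params_multivariate(k):
    """Unrestricted model: 3k means + k(k+1)/2 covariances + QTL position."""
    return 3 * k + k * (k + 1) // 2 + 1


# 3 curves x (a, b, r), plus rho, sigma^2 and the QTL position
N_PARAMS_FUNCTIONAL = 3 * 3 + 2 + 1
```

With k = 10 time points, `n_params_multivariate(10)` gives 86 parameters against 12 for the functional model, which is the saving behind advantage (1) above.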
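For implication (1), one common way to estimate the exponents of $Y = a X_1^{b} X_2^{c}$ is ordinary least squares on the log scale, since log Y = log a + b log X1 + c log X2. The sketch below assumes strictly positive data and uses hypothetical function names and noise-free toy values; it illustrates only the log-linearization, not how the authors embed allometric scaling in a QTL model. Multiplicative epistasis, $Y = X_1 X_2$, is the special case a = b = c = 1.

```python
import math


def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [b] for row, b in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_allometry(x1, x2, y):
    """Least-squares fit of Y = a * X1**b * X2**c on the log scale.

    log Y = log a + b log X1 + c log X2; all inputs must be positive.
    Returns (a, b, c).
    """
    rows = [(1.0, math.log(u), math.log(v)) for u, v in zip(x1, x2)]
    z = [math.log(w) for w in y]
    # Normal equations (X^T X) beta = X^T z for beta = (log a, b, c)
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(row[i] * zi for row, zi in zip(rows, z)) for i in range(3)]
    log_a, b, c = solve_linear(xtx, xtz)
    return math.exp(log_a), b, c


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 5.0, 3.0, 7.0, 4.0]
y = [2.0 * u ** 0.75 * v ** 0.25 for u, v in zip(x1, x2)]  # noise-free toy data
print(fit_allometry(x1, x2, y))
```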