Basic Concepts

Gene, Allele, Genotype, and Phenotype

[Figure: a pair of chromosomes, one from the father and one from the mother, carrying gene A with two alleles, A and a]

                       Phenotype
Subject   Genotype   Height   IQ
1         AA         185      100
2         AA         182      104
3         Aa         175      103
4         Aa         171      102
5         aa         155      101
6         aa         152      103

Bad news: It is very hard to detect such a gene directly.

Genetic Mapping

A gene that affects a quantitative trait is called a quantitative trait locus (QTL). A QTL can be detected through the markers linked with it. A detected QTL is a chromosomal segment.

[Figure: linkage map showing Marker 1, QTL, Marker 2, Marker 3, ..., Marker k]

QTL Mapping in Natural Populations

• The basic theory of QTL mapping is derived from linkage analysis in controlled crosses
• In some species it is not possible to make crosses
• QTL mapping in such species must be based on existing populations

Human Chromosomes

                   Male (Xy)
                   X               y
Female (XX)   X    XX (daughter)   Xy (son)

Human Difference

How many genes control human body height?
Discontinuous distribution: due to a single dwarf gene
Continuous distribution: due to many genes?

Continuous Variation due to

• Polygenes: with three genotypes per gene, the number of multi-gene genotypes grows as $3^1 = 3$, $3^2 = 9$, ..., $3^{10} = 59{,}049$
• Environmental modifications
• Gene-environment interactions

Powerful statistical methods are crucial for the identification of human height genes.

Data Structure

          Marker (M)                                    Conditional prob. of QTL genotype
Subject   M1      M2      ...   Mm    Phenotype (y)     QQ(2)            Qq(1)            qq(0)
1         AA(2)   BB(2)   ...   ...   $y_1$             $\omega_{2|1}$   $\omega_{1|1}$   $\omega_{0|1}$
2         AA(2)   BB(2)   ...   ...   $y_2$             $\omega_{2|2}$   $\omega_{1|2}$   $\omega_{0|2}$
3         Aa(1)   Bb(1)   ...   ...   $y_3$             $\omega_{2|3}$   $\omega_{1|3}$   $\omega_{0|3}$
4         Aa(1)   Bb(1)   ...   ...   $y_4$             $\omega_{2|4}$   $\omega_{1|4}$   $\omega_{0|4}$
5         Aa(1)   Bb(1)   ...   ...   $y_5$             $\omega_{2|5}$   $\omega_{1|5}$   $\omega_{0|5}$
6         Aa(1)   bb(0)   ...   ...   $y_6$             $\omega_{2|6}$   $\omega_{1|6}$   $\omega_{0|6}$
7         aa(0)   Bb(1)   ...   ...   $y_7$             $\omega_{2|7}$   $\omega_{1|7}$   $\omega_{0|7}$
8         aa(0)   bb(0)   ...   ...   $y_8$             $\omega_{2|8}$   $\omega_{1|8}$   $\omega_{0|8}$

Linkage disequilibrium mapping – natural population

Association between marker and QTL
- Marker: Prob(M) = $p$, Prob(m) = $1-p$
- QTL: Prob(A) = $q$, Prob(a) = $1-q$

Four haplotypes:
Prob(MA) = $p_{11} = pq + D$
Prob(Ma) = $p_{10} = p(1-q) - D$
Prob(mA) = $p_{01} = (1-p)q - D$
Prob(ma) = $p_{00} = (1-p)(1-q) + D$

$p = p_{11} + p_{10}$
$q = p_{11} + p_{01}$
$D = p_{11}p_{00} - p_{10}p_{01}$

Joint and conditional (j|i) genotype probabilities
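between marker and QTL

Joint probabilities:

      AA                 Aa                                aa                 Obs
MM    $p_{11}^2$         $2p_{11}p_{10}$                   $p_{10}^2$         $n_2$
Mm    $2p_{11}p_{01}$    $2(p_{11}p_{00}+p_{10}p_{01})$    $2p_{10}p_{00}$    $n_1$
mm    $p_{01}^2$         $2p_{01}p_{00}$                   $p_{00}^2$         $n_0$

Conditional probabilities (joint probability divided by the marker genotype frequency):

      AA                             Aa                                          aa                             Obs
MM    $p_{11}^2/p^2$                 $2p_{11}p_{10}/p^2$                         $p_{10}^2/p^2$                 $n_2$
Mm    $2p_{11}p_{01}/[2p(1-p)]$      $2(p_{11}p_{00}+p_{10}p_{01})/[2p(1-p)]$    $2p_{10}p_{00}/[2p(1-p)]$      $n_1$
mm    $p_{01}^2/(1-p)^2$             $2p_{01}p_{00}/(1-p)^2$                     $p_{00}^2/(1-p)^2$             $n_0$

Mixture model-based likelihood with marker information

$L(\Theta \mid y, M) = \prod_{i=1}^{n} [\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)]$

                                                  Prior prob. of QTL genotype
Sample   Height (cm, y)   Marker genotype M   AA               Aa               aa
1        184              MM (2)              $\omega_{2|1}$   $\omega_{1|1}$   $\omega_{0|1}$
2        185              MM (2)              $\omega_{2|2}$   $\omega_{1|2}$   $\omega_{0|2}$
3        180              Mm (1)              $\omega_{2|3}$   $\omega_{1|3}$   $\omega_{0|3}$
4        182              Mm (1)              $\omega_{2|4}$   $\omega_{1|4}$   $\omega_{0|4}$
5        167              Mm (1)              $\omega_{2|5}$   $\omega_{1|5}$   $\omega_{0|5}$
6        169              Mm (1)              $\omega_{2|6}$   $\omega_{1|6}$   $\omega_{0|6}$
7        165              mm (0)              $\omega_{2|7}$   $\omega_{1|7}$   $\omega_{0|7}$
8        166              mm (0)              $\omega_{2|8}$   $\omega_{1|8}$   $\omega_{0|8}$

Conditional probabilities of the QTL genotypes (missing) based on marker genotypes (observed)

$L(\Theta \mid y, M) = \prod_{i=1}^{n} [\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)]$
$\quad = \prod_{i=1}^{n_2} [\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)]$   (conditional on marker genotype 2, $n_2$ samples)
$\quad \times \prod_{i=1}^{n_1} [\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)]$   (conditional on marker genotype 1, $n_1$ samples)
$\quad \times \prod_{i=1}^{n_0} [\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)]$   (conditional on marker genotype 0, $n_0$ samples)

Normal distributions of phenotypic values for each QTL genotype group

$f_2(y_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{(y_i-\mu_2)^2}{2\sigma^2}\right]$,  $\mu_2 = \mu + a$
$f_1(y_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{(y_i-\mu_1)^2}{2\sigma^2}\right]$,  $\mu_1 = \mu + d$
$f_0(y_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{(y_i-\mu_0)^2}{2\sigma^2}\right]$,  $\mu_0 = \mu - a$

Differentiating L with respect to each unknown parameter, setting the derivatives equal to zero, and solving the log-likelihood equations

$L(\Theta \mid y, M) = \prod_{i=1}^{n} [\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)]$
$\log L(\Theta \mid y, M) = \sum_{i=1}^{n} \log [\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)]$

Define the posterior probabilities

$\Pi_{2|i} = \omega_{2|i} f_2(y_i) / [\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)]$   (1)
$\Pi_{1|i} = \omega_{1|i} f_1(y_i) / [\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)]$   (2)
$\Pi_{0|i} = \omega_{0|i} f_0(y_i) / [\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)]$   (3)

Then

$\mu_2 = \sum_{i=1}^{n} \Pi_{2|i} y_i / \sum_{i=1}^{n} \Pi_{2|i}$   (4)
$\mu_1 = \sum_{i=1}^{n} \Pi_{1|i} y_i / \sum_{i=1}^{n} \Pi_{1|i}$   (5)
$\mu_0 = \sum_{i=1}^{n} \Pi_{0|i} y_i / \sum_{i=1}^{n} \Pi_{0|i}$   (6)
$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} [\Pi_{2|i}(y_i-\mu_2)^2 + \Pi_{1|i}(y_i-\mu_1)^2 + \Pi_{0|i}(y_i-\mu_0)^2]$   (7)

Complete data

Prior prob.:

      QQ                 Qq                                qq                 Obs
MM    $p_{11}^2$         $2p_{11}p_{10}$                   $p_{10}^2$         $n_2$
Mm    $2p_{11}p_{01}$    $2(p_{11}p_{00}+p_{10}p_{01})$    $2p_{10}p_{00}$    $n_1$
mm    $p_{01}^2$         $2p_{01}p_{00}$                   $p_{00}^2$         $n_0$

Observed counts:

      QQ         Qq         qq         Obs
MM    $n_{22}$   $n_{21}$   $n_{20}$   $n_2$
Mm    $n_{12}$   $n_{11}$   $n_{10}$   $n_1$
mm    $n_{02}$   $n_{01}$   $n_{00}$   $n_0$

$p_{11} = [2n_{22} + (n_{21}+n_{12}) + \phi n_{11}] / 2n$,
$p_{10} = [2n_{20} + (n_{21}+n_{10}) + (1-\phi) n_{11}] / 2n$,
$p_{01} = [2n_{02} + (n_{12}+n_{01}) + (1-\phi) n_{11}] / 2n$,
$p_{00} = [2n_{00} + (n_{10}+n_{01}) + \phi n_{11}] / 2n$,
$\phi = p_{11}p_{00} / (p_{11}p_{00} + p_{10}p_{01})$

Incomplete (observed) data

Posterior prob.:

      QQ            Qq            qq            Obs
MM    $\Pi_{2|i}$   $\Pi_{1|i}$   $\Pi_{0|i}$   $n_2$
Mm    $\Pi_{2|i}$   $\Pi_{1|i}$   $\Pi_{0|i}$   $n_1$
mm    $\Pi_{2|i}$   $\Pi_{1|i}$   $\Pi_{0|i}$   $n_0$

$p_{11} = [\sum_{i=1}^{n_2} (2\Pi_{2|i} + \Pi_{1|i}) + \sum_{i=1}^{n_1} (\Pi_{2|i} + \phi\Pi_{1|i})] / 2n$   (8)
$p_{10} = \{\sum_{i=1}^{n_2} (2\Pi_{0|i} + \Pi_{1|i}) + \sum_{i=1}^{n_1} [\Pi_{0|i} + (1-\phi)\Pi_{1|i}]\} / 2n$   (9)
$p_{01} = \{\sum_{i=1}^{n_0} (2\Pi_{2|i} + \Pi_{1|i}) + \sum_{i=1}^{n_1} [\Pi_{2|i} + (1-\phi)\Pi_{1|i}]\} / 2n$   (10)
$p_{00} = [\sum_{i=1}^{n_0} (2\Pi_{0|i} + \Pi_{1|i}) + \sum_{i=1}^{n_1} (\Pi_{0|i} + \phi\Pi_{1|i})] / 2n$   (11)

EM algorithm

(1) Give initial values $\Theta^{(0)} = (\mu_2, \mu_1, \mu_0, \sigma^2, p_{11}, p_{10}, p_{01}, p_{00})^{(0)}$,
(2) Calculate $\Pi_{2|i}^{(1)}$, $\Pi_{1|i}^{(1)}$ and $\Pi_{0|i}^{(1)}$ using Eqs. 1-3,
(3) Calculate $\Theta^{(1)}$ using $\Pi_{2|i}^{(1)}$, $\Pi_{1|i}^{(1)}$ and $\Pi_{0|i}^{(1)}$ based on Eqs. 4-11,
(4) Repeat (2) and (3) until convergence.

Hypothesis Tests

• Is there a significant QTL?
H0: $\mu_2 = \mu_1 = \mu_0$
H1: Not H0
$LR_1 = -2[\ln L_0 - \ln L_1]$
Critical threshold determined from permutation tests

Hypothesis Tests

• Can this QTL be detected by the marker?
H0: $D = 0$
H1: Not H0
$LR_2 = -2[\ln L_0 - \ln L_1]$
Critical threshold determined from the chi-square table (df = 1)

A case study from human populations

• 105 black women and 538 white women;
• 10 SNPs genotyped within 5 candidate genes for human obesity;
• Two obesity traits: the amount of body fat (body mass index, BMI) and its distribution throughout the body (waist-to-hip circumference ratio, WHR)

Objective
Detect quantitative trait nucleotides (QTNs) predisposing to the human obesity traits BMI and WHR

Trait   SNP           Chrom.     Population   q      D       a       d       LR
BMI     ADRA1A        8p21       Black        0.20   0.04    11.40   -2.63   3.90*
                                 White        NS
WHR     ADRB1         10q24      Black        0.83   -0.07   -0.15   -0.24   5.91*
                                 White        NS
WHR     ADRB2         5q32-33    Black        0.16   0.07    0.16    -0.20   5.88*
                                 White        NS
WHR     ADRB2-GNAS1   5/20       Black        0.83   0.02    -0.18   -0.10   8.42*
                                 White        0.78   0.03    -0.15   -0.16   8.06*

Shape mapping meets LD mapping

Mapping Body Shape Genes through Shape Mapping
Ningtao Wang, Yaqun Wang, Zhong Wang, Han Hao and Rongling Wu*
Center for Statistical Genetics, The Pennsylvania State University, Hershey, PA 17033, USA
J Biom Biostat 2012, 3:8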
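
Worked sketch of the EM algorithm for LD mapping

The EM steps listed earlier (posterior probabilities from Eqs. 1-3, then parameter updates from Eqs. 4-11) can be sketched in Python with NumPy, using the eight-sample height example as input. This is a minimal illustration, not the authors' implementation. The function names (`haplotype_freqs`, `conditional_probs`, `e_step`, `em_ld_mapping`, `lr_qtl`) are hypothetical, and the starting values and the small clipping guard on the haplotype frequencies are assumptions made here. Genotypes are coded 0/1/2 as the number of M (or Q) alleles, and the marker is assumed to be polymorphic (0 < p < 1).

```python
import numpy as np

def haplotype_freqs(p, q, D):
    """(p11, p10, p01, p00) from marker allele freq p, QTL allele freq q and LD D."""
    return np.array([p*q + D, p*(1 - q) - D, (1 - p)*q - D, (1 - p)*(1 - q) + D])

def conditional_probs(hap):
    """W[m, j] = Prob(QTL genotype j | marker genotype m), with codes 0, 1, 2."""
    p11, p10, p01, p00 = hap
    joint = np.array([
        [p00**2,    2*p01*p00,             p01**2],     # marker mm (0)
        [2*p10*p00, 2*(p11*p00 + p10*p01), 2*p11*p01],  # marker Mm (1)
        [p10**2,    2*p11*p10,             p11**2],     # marker MM (2)
    ])
    # Row sums equal the marker genotype frequencies (1-p)^2, 2p(1-p), p^2.
    return joint / joint.sum(axis=1, keepdims=True)

def normal_pdf(y, mu, sigma2):
    return np.exp(-(y - mu)**2 / (2*sigma2)) / np.sqrt(2*np.pi*sigma2)

def e_step(y, marker, mu, sigma2, hap):
    """Posterior QTL genotype probabilities (Eqs. 1-3) and the mixture log-likelihood."""
    mix = conditional_probs(hap)[marker] * normal_pdf(y[:, None], mu[None, :], sigma2)
    lik = mix.sum(axis=1)
    return mix / lik[:, None], np.log(lik).sum()

def em_ld_mapping(y, marker, n_iter=200, tol=1e-8):
    y, marker = np.asarray(y, float), np.asarray(marker, int)
    n = len(y)
    # Starting values are an arbitrary choice made for this sketch.
    mu = np.quantile(y, [0.1, 0.5, 0.9])           # mu_0, mu_1, mu_2
    sigma2 = y.var() / 4
    hap = haplotype_freqs(marker.mean() / 2, 0.5, 0.0)
    MM, Mm, mm = marker == 2, marker == 1, marker == 0
    old = -np.inf
    for _ in range(n_iter):
        Pi, loglik = e_step(y, marker, mu, sigma2, hap)       # E-step
        P0, P1, P2 = Pi.T
        mu = (Pi * y[:, None]).sum(axis=0) / Pi.sum(axis=0)   # Eqs. 4-6
        sigma2 = (Pi * (y[:, None] - mu)**2).sum() / n        # Eq. 7
        p11, p10, p01, p00 = hap
        phi = p11*p00 / (p11*p00 + p10*p01)
        hap = np.array([                                      # Eqs. 8-11
            (2*P2[MM] + P1[MM]).sum() + (P2[Mm] + phi*P1[Mm]).sum(),
            (2*P0[MM] + P1[MM]).sum() + (P0[Mm] + (1 - phi)*P1[Mm]).sum(),
            (2*P2[mm] + P1[mm]).sum() + (P2[Mm] + (1 - phi)*P1[Mm]).sum(),
            (2*P0[mm] + P1[mm]).sum() + (P0[Mm] + phi*P1[Mm]).sum(),
        ]) / (2*n)
        # Numerical guard (an assumption of this sketch): keep frequencies positive.
        hap = np.clip(hap, 1e-12, None)
        hap /= hap.sum()
        if abs(loglik - old) < tol:
            break
        old = loglik
    # Report the log-likelihood at the final parameter values.
    return mu, sigma2, hap, e_step(y, marker, mu, sigma2, hap)[1]

def lr_qtl(y, loglik1):
    """LR1 = -2[ln L0 - ln L1]; under H0 the mixture collapses to a single normal."""
    y = np.asarray(y, float)
    loglik0 = np.log(normal_pdf(y, y.mean(), y.var())).sum()
    return -2.0 * (loglik0 - loglik1)

# Heights and marker genotypes from the eight-sample example.
y = [184, 185, 180, 182, 167, 169, 165, 166]
marker = [2, 2, 1, 1, 1, 1, 0, 0]
mu, sigma2, hap, loglik = em_ld_mapping(y, marker)
D = hap[0]*hap[3] - hap[1]*hap[2]
print("means (qq, Qq, QQ):", mu, "variance:", sigma2)
print("haplotype freqs (p11, p10, p01, p00):", hap, "D:", D)
print("LR1:", lr_qtl(y, loglik))
```

With only eight samples the estimates are purely illustrative. As stated in the hypothesis-test slides, the threshold for LR1 would come from permutation tests, which could be done by shuffling `y` against `marker` and refitting.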