Bayesian Inference for QTLs in Inbred Lines
Brian S. Yandell
University of Wisconsin-Madison
www.stat.wisc.edu/~yandell
with Jaya M. Satagopan, Sloan-Kettering and Patrick J. Gaffney, Lubrizol
NCSU Statistical Genetics June 2001
June 2001 NCSU QTL II Workshop © Brian S. Yandell

Many Thanks
Michael Newton, Daniel Sorensen, Daniel Gianola, Jaya Satagopan, Patrick Gaffney, Fei Zou, Liang Li, Tom Osborn, David Butruille, Marcio Ferrera, Josh Udahl, Pablo Quijada, Alan Attie, Jonathan Stoehr
USDA Hatch Grants

What is the Goal Today?
• resampling from data
  – permutation tests
  – bootstrap, jackknife
  – MCMC
    • special Markov chain
    • Monte Carlo sampling
• show MCMC ideas
  – Gibbs sampler
  – Metropolis-Hastings
  – reversible jump MCMC
• Bayesian perspective
  – common in animal model
  – use “prior” information
    • previous experiments
    • related genomes
• inbred lines “easy”
  – can check against *IM
  – ready extension
    • multiple experiments
    • pedigrees
    • non-normal data
    • epistasis

Note on Outbred Studies
• Interval Mapping for Outbred Populations
  – Haley, Knott & Elsen (1994) Genetics
  – Thomas & Cortessis (1992) Hum. Hered.
  – Hoeschele & VanRaden (1993ab) Theor. Appl. Genet. (etc.)
  – Guo & Thompson (1994) Biometrics
• Experimental Outbred Crosses (BC, F2, RI)
  – collapse markers from 4 to 2 alleles
• Multiple Cross Pedigrees
  – polygenic effects not modeled here
  – related individuals are correlated (via coancestry)
  – Liu & Zeng (2000) Genetics
  – Zou, Yandell & Fine (2001) Genetics

Overview
• I: Single QTL
• II: Bayesian Idea
  – Bayes rule
  – posterior & likelihood
• III: MCMC Samples
  – Monte Carlo idea
  – study posterior
• IV: Multiple QTL
• V: How many QTL?
  – Reversible Jump
  – analog to regression
• VI: RJ-MCMC Details
• VII: Bayes Factors
• VIII: References
  – Software
  – Articles

Part I: Interval Mapping
• Modelling a trait with a QTL
  – linear model for trait given genotype
  – recombination near loci for genotype
• Likelihoods
• Review Interval Maps & Profile LODs

QTL Components
• observed data on individual
  – trait: field or lab measurement
    • log( days to flowering ), yield, …
  – markers: from wet lab (RFLPs, etc.)
    • linkage map of markers assumed known
• unobserved data on individual
  – geno: genotype (QQ=1/Qq=0/qq=-1)
• unknown model parameters
  – effects: mean, difference, variance
  – locus: quantitative trait locus (QTL)

Single QTL trait Model
• trait = mean + additive + error
• trait = effect_of_geno + error
• prob( trait | geno, effects )
$y_j = \mu + b^* x_j^* + e_j$
$\pi(y_j \mid x_j^*; \mu, b^*, \sigma^2) = N(y_j;\ \mu + b^* x_j^*,\ \sigma^2)$
[Figure: trait densities for genotypes x = -1, x = 0 and x = 1.]

Simulated Data with 1 QTL
[Figure: simulated trait values by genotype (x = -1, x = 1), with a frequency histogram of trait.]

Recombination and Distance
• no interference--easy approximation
  – Haldane map function
  – no interference with recombination
• all computations consistent in approximation
  – rely on given map
    • marker loci assumed known
  – 1-to-1 relation of distance to recombination
  – all map functions are approximate
• assume marker positions along map are known
$r = \frac{1}{2}\left(1 - e^{-2\lambda}\right)$, with distance $\lambda$ in Morgans

markers, QTL & recombination rates
[Figure: markers M1 to M6 along the chromosome, recombination rates r1 to r5 between adjacent markers, and a QTL genotype x* at an unknown position (?); horizontal axis: distance along chromosome.]

Interval Mapping of QT genotype
• can express probabilities in terms of distance
  – locus is distance along linkage map
  – flanking markers sufficient if no missing data
  – could consider more complicated relationship
prob( geno | locus, map ) = prob( geno | locus, flanking markers )
$\pi(x_j^* \mid \lambda) = \pi(x_j^* \mid \lambda, M_{j,k}, M_{j,k+1})$
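The genotype probability given flanking markers can be sketched in code. This is a minimal illustration, not the method of any particular package: it assumes a two-genotype cross coded -1/+1 (for example a backcross or doubled haploid population), the Haldane map function with distances in cM, and fully observed flanking markers. The function names `haldane_r` and `geno_prob` and the marker positions are hypothetical.

```python
import math

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def geno_prob(locus, left_pos, right_pos, left_geno, right_geno):
    """P(x* = +1 | flanking markers) for a two-genotype cross coded -1/+1.

    Without interference, recombination events in the two sub-intervals are
    independent, so each candidate QTL genotype is weighted by the product of
    a left-interval term and a right-interval term.
    """
    r1 = haldane_r(locus - left_pos)
    r2 = haldane_r(right_pos - locus)

    def path(x):
        p_left = (1.0 - r1) if x == left_geno else r1
        p_right = (1.0 - r2) if x == right_geno else r2
        return p_left * p_right

    up, down = path(1), path(-1)
    return up / (up + down)

# Hypothetical example: putative QTL at 15 cM between markers at 10 and 20 cM.
p_same = geno_prob(15.0, 10.0, 20.0, 1, 1)    # both flanking markers are +1
p_split = geno_prob(15.0, 10.0, 20.0, 1, -1)  # flanking markers disagree
```

When both flanking markers agree, the QTL genotype almost surely matches them; when they disagree and the locus sits midway, the two genotypes are equally likely.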

Building trait Likelihood
• likelihood is mixture across possible genotypes
• sum over all possible genotypes at locus
like( effects, locus | trait ) = sum of prob( trait, genos | effects, locus )
$L(\mu, b^*, \sigma^2, \lambda \mid y_j) = \pi(y_j \mid \mu, b^*, \sigma^2; \lambda) = \sum_{x=-1,0,1} \pi(y_j \mid x; \mu, b^*, \sigma^2)\, \pi(x \mid \lambda)$

Likelihood over Individuals
• product of trait probabilities across individuals
  – product of sum across possible genotypes
like( effects, locus | traits, map ) = product of prob( trait | effects, locus, map )
$L(\mu, b^*, \sigma^2; \lambda \mid y) = \prod_{j=1}^{n} \pi(y_j \mid \mu, b^*, \sigma^2; \lambda) = \prod_{j=1}^{n} \sum_{x=-1,0,1} \pi(y_j \mid x; \mu, b^*, \sigma^2)\, \pi(x \mid \lambda)$

Profile LOD for 1 QTL
[Figure: LOD profile against distance (cM); legend: QTL, IM.]

Interval Mapping for Quantitative Trait Loci
• profile likelihood (LOD) across QTL
  – scan whole genome locus by locus
    • use flanking markers for interval mapping
  – maximize likelihood ratio (LOD) at locus
    • best estimates of effects for each locus
    • EM method (Lander & Botstein 1989)
$LOD(\lambda) = (\log_{10} e) \sum_{j=1}^{n} \ln \frac{\sum_{x=-1,0,1} \pi(y_j \mid x; \hat\mu, \hat b^*, \hat\sigma^2)\, \pi(x \mid \lambda)}{\pi(y_j \mid \hat\mu, b^* = 0, \hat\sigma^2)}$

Interval Mapping Tests
• profile LOD across possible loci in genome
  – maximum likelihood estimates of effects at locus
  – LOD is rescaling of L(effects, locus|y)
• test for evidence of QTL at each locus
  – LOD score (LR test)
  – adjust (?) for multiple comparisons

Interval Mapping Estimates
• confidence region for locus
  – based on inverting test of no QTL
  – 2 LODs down gives approximate CI for locus
  – based on chi-square approximation to LR
• confidence region for effects
  – approximate CI for effect based on normal
  – point estimate from profile LOD
locus CI: $\{\lambda \mid LOD(\hat\lambda) - LOD(\lambda) < 2\}$
effect CI: $\hat b^* \pm 1.96\, se(\hat b^*)$
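The 2-LOD rule for the locus can be sketched in a few lines. This is an illustration only: the function name and the LOD profile below are hypothetical, and the sketch returns the outer bounds of all scanned positions within the drop, which need not form a single contiguous interval.

```python
def lod_support_interval(positions, lods, drop=2.0):
    """Outer bounds of the scanned positions whose LOD is within `drop` of the peak."""
    peak = max(lods)
    inside = [pos for pos, lod in zip(positions, lods) if peak - lod < drop]
    return min(inside), max(inside)

# Hypothetical LOD profile scanned every 10 cM along one chromosome.
positions = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
lods = [0.5, 1.0, 3.0, 8.0, 12.0, 11.5, 7.0, 2.0, 1.0, 0.3]
interval = lod_support_interval(positions, lods)
```

A finer scan grid gives a sharper interval; with a coarse grid such as this one, the bounds are only as precise as the spacing between scanned positions.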

Part II: Bayesian Idea
• joint distribution of known & unknown
  – known: trait, markers, linkage map
  – unknown: locus, genotype, effect, variance
• Use Same Likelihood Components
  – trait given genotype
    • follows linear model
    • depends on size of effect, variance
  – genotype given locus, markers & map
    • depends on recombination near locus
• Inference about unknowns
  – Bayes theorem

What is Probability?
• Frequentist Analysis
  – repeat experiment
    • many times
    • hypothetical long term frequency
    • Type I error rate
    • reject null when true
• Bayesian Analysis
  – uncertainty about true value
  – prior
    • uncertainty before examining data
    • incorporate prior knowledge/experience
  – posterior
    • uncertainty after analyzing current data
    • balance prior & current

Bayes Theorem
• posteriors and priors
  – prior: prob( parameters )
  – posterior: prob( parameters | data )
• posterior = likelihood * prior / constant
• posterior distribution is proportional to
  – likelihood of parameters given data
  – prior distribution of parameters
$\pi(b^* \mid y) = \frac{\pi(b^* \text{ and } y)}{\pi(y)} = \frac{\pi(y \mid b^*)\, \pi(b^*)}{\pi(y)}$
$\pi(b^* \mid y) = \pi(y \mid b^*)\, \pi(b^*) / \text{constant}$

Bayesian Prior
• “prior” belief used to infer “posterior” estimates
  – higher weight for more probable parameter values
• based on prior knowledge
  – use previous study to inform current study
    • weather prediction: tomorrow is much like today
    • previous QTL studies on related organisms
  – historical criticism: can get “religious” about priors
• often want negligible effect of prior on posterior
  – pick non-informative priors
    • all parameter values equally likely
    • large variance on priors
  – always check sensitivity to prior
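The rule posterior = likelihood * prior / constant can be checked numerically on a grid. The sketch below is an assumption-laden illustration rather than anything from the workshop software: it uses a Poisson likelihood for three counts (1, 3, 8), a flat prior over a hypothetical grid of parameter values, and made-up function names.

```python
import math

def poisson_loglik(t, counts):
    """Log likelihood of Poisson rate t for independent counts."""
    return sum(k * math.log(t) - t - math.lgamma(k + 1) for k in counts)

def grid_posterior(counts, grid, prior):
    """Posterior weights on a grid: likelihood * prior, divided by their sum."""
    unnorm = [math.exp(poisson_loglik(t, counts)) * prior(t) for t in grid]
    const = sum(unnorm)
    return [u / const for u in unnorm]

counts = [1, 3, 8]
grid = [i / 10.0 for i in range(1, 201)]   # candidate rates 0.1, 0.2, ..., 20.0
flat = lambda t: 1.0                       # non-informative prior on the grid
posterior = grid_posterior(counts, grid, flat)
mode = grid[posterior.index(max(posterior))]
```

With a flat prior the posterior mode coincides with the maximum likelihood estimate, here the sample mean of the counts. Swapping `flat` for an informative prior function is a quick way to check sensitivity to the prior.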

Likelihood & Posterior Example
Poisson data: y = 1, 3, 8
parameter: t = ?
$\text{prob}\{y = k \mid t\} = \frac{t^k e^{-t}}{k!}$
[Figure: Poisson probabilities for k = 0, ..., 10 with parameter = 2, 4 and 6, alongside the likelihood and posterior curves plotted against parameter.]

Bayesian Idea for QTLs
• Model a trait with a QTL
  – linear model for trait given genotype
  – recombination near loci for genotype
• Find Bayesian Posterior
  – compute likelihood using MCMC
• Case Studies
  – simulated single QTL
  – simulated two QTL
  – flowering time data for Brassica napus

Posterior for QTL Effect
• posterior = likelihood * prior / constant
• posterior distribution is proportional to
  – prior distribution of effect
  – likelihood of traits given effect & genos
$\pi(b^* \mid y)$ is proportional to $\pi(b^*) \prod_{j=1}^{n} \pi(y_j \mid x_j^*; \mu, b^*, \sigma^2)$

Full Posterior for QTL
• posterior = likelihood * prior / constant
• posterior( parameters | data ) = prob( genos, effects, loci | trait, map )
$\pi(x^*; \mu, b^*, \sigma^2; \lambda \mid y)$ is proportional to
$\pi(\mu)\, \pi(b^*)\, \pi(\sigma^2)\, \pi(\lambda) \prod_{j=1}^{n} \pi(x_j^* \mid \lambda) \prod_{j=1}^{n} \pi(y_j \mid x_j^*; \mu, b^*, \sigma^2)$

How to Study Posterior?
• exact methods
  – exact if possible
  – can be difficult or impossible to analyze
• approximate methods
  – importance sampling
  – numerical integration
  – Monte Carlo & other
• Monte Carlo methods
  – easy to implement
  – independent samples
• MCMC methods
  – handle hard problems
  – art to efficient use
  – correlated samples

Posterior for locus & effect
[Figure: posterior samples for QTL 1, additive effect against distance (cM), with the marginal posterior density of distance (cM).]

Marginal Posterior Summary
• marginal posterior for locus & effects
• highest probability density (HPD) region
  – smallest region with highest probability
  – credible region for locus & effects
• HPD with 50,80,90,95%
  – range of credible levels can be useful
  – marginal bars and bounding boxes
  – joint regions (harder to draw)

HPD Region for locus & effect
[Figure: HPD regions for additive effect against distance (cM).]

QTL Bayesian Inference
• study posterior distribution of locus & effects
  – sample joint distribution
    • locus, effects & genotypes
  – study marginal distribution of
    • locus
    • effects
      – overall mean, genotype difference, variance
    • locus & effects together
• estimates & confidence regions
  – histograms, boxplots & scatter plots
  – HPD regions

Frequentist or Bayesian?
• Frequentist approach
  – fixed parameters
    • range of values
  – maximize likelihood
    • ML estimates
    • find the peak
  – confidence regions
    • random region
    • invert a test
  – hypothesis testing
    • 2 nested models
• Bayesian approach
  – random parameters
    • distribution
  – posterior distribution
    • posterior mean
    • sample from dist
  – credible sets
    • fixed region given data
    • HPD regions
  – model selection/critique
    • Bayes factors

Frequentist or Bayesian?
• Frequentist approach
  – maximize over mixture of QT genotypes
  – locus profile likelihood
    • max over effects
  – HPD region for locus
    • natural for locus
      – 1-2 LOD drop
    • work to get effects
• Bayesian approach
  – joint distribution over QT genotypes
  – sample distribution
    • joint effects & loci
  – HPD regions for
    • joint locus & effects
    • use density estimator
  – approximate shape of likelihood peak

Basic Idea of Likelihood Use
• build likelihood in steps
  – build from trait & genotypes at locus
  – likelihood for individual j
  – log likelihood over individuals
• maximize likelihood (interval mapping)
  – EM method (Lander & Botstein 1989)
  – MCMC method (Guo & Thompson 1994)
• study whole likelihood as posterior (Bayesian)
  – analytical methods (e.g. Carlin & Louis 1998)
  – MCMC method (Satagopan et al 1996)

Studying the Likelihood
• maximize (*IM)
  – find the peak
  – avoid local maxima
  – profile LOD
    • across locus
    • max for effects
• sample (Bayes)
  – get whole curve
  – summarize later
  – posterior
    • locus & effects together
• EM method
  – always go up
  – steepest ascent
• MCMC method
  – jump around
  – go up if you can
  – sometimes go down
  – cool down to find peak
    • simulated annealing
    • simulated tempering

EM-MCMC duality
• EM approach can be redone with MCMC
  – EM estimates & maximizes
  – MCMC draws random samples
  – both can address same problem
• sometimes EM is hard (impossible) to use
• MCMC is tool of “last resort”
  – use exact methods if you can
  – try other approximate methods
  – be clever!
  – very handy for hard problems in genetics

Simulation Study
• 200 simulation runs
• n = 100, 200; h^2 = 10, 20%
• 1 QTL at 15cM
• markers at 0, 10, 20, 40, 60, 80
• effect = 1
• variance depends on h^2

200 Simulations: Effect
[Figure: four panels (n = 100, h^2 = 10; n = 100, h^2 = 20; n = 200, h^2 = 10; n = 200, h^2 = 20) plotting Bayesian MCMC effect estimates against QTL Cartographer effect estimates.]

200 Simulations: Locus
[Figure: four panels (n = 100, h^2 = 10; n = 100, h^2 = 20; n = 200, h^2 = 10; n = 200, h^2 = 20) plotting Bayesian MCMC locus estimates against QTL Cartographer locus estimates.]

Part III: MCMC Sampling
• Study the Bayesian Posterior
  – use Markov chain to sample
• Markov chain Monte Carlo
• Gibbs sampler for effects
• Metropolis-Hastings for loci
• Brassica data on days to flowering

How to Proceed?
• want to study π(parameters|data)
• run Markov chain with stable pattern π()
• study properties of Markov chain to learn about posterior π(parameters|data)
  – Markov chain Monte Carlo
• summarize results in graphical form
• diagnostics

Markov chain idea
• future given present is independent of past
• update chain based on current value
  – can make chain arbitrarily complicated
  – chain converges to stable pattern π() we wish to study
[Diagram: two states, 0 and 1. From 0 the chain moves to 1 with probability p and stays with probability 1-p; from 1 it moves to 0 with probability q and stays with probability 1-q.]
$\pi(1) = p / (p + q)$

Markov chain idea
[Diagram: the same two-state chain, with $\pi(1) = p / (p + q)$.]
[Figure: two example series of the chain moving between 0 and 1 over steps 1 to 21, labelled Mitch’s and Other.]

Markov chain Monte Carlo
• can study arbitrarily complex models
  – need only specify how parameters affect each other
  – can reduce to specifying full conditionals
• construct Markov chain with “right” model
  – update some parameters given data and others
  – can fudge on “right” (importance sampling)
  – next step depends only on current estimates
• nice Markov chains have nice properties
  – sample summaries make sense
  – consider almost as random sample from distribution
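The two-state chain from the Markov chain idea slides is easy to simulate, which gives a quick check that the long-run fraction of time in state 1 settles near p/(p + q). The sketch below is illustrative: the function name, the values of p and q, and the run length are arbitrary choices, not taken from the slides.

```python
import random

def simulate_two_state(p, q, n_steps, seed=0):
    """Run the two-state chain and return the fraction of steps spent in state 1.

    From state 0 the chain moves to 1 with probability p;
    from state 1 it moves to 0 with probability q.
    """
    rng = random.Random(seed)
    state, visits_to_1 = 0, 0
    for _ in range(n_steps):
        if state == 0:
            state = 1 if rng.random() < p else 0
        else:
            state = 0 if rng.random() < q else 1
        visits_to_1 += state
    return visits_to_1 / n_steps

p, q = 0.3, 0.1
long_run = simulate_two_state(p, q, n_steps=100_000)
stable = p / (p + q)   # stable pattern pi(1)
```

Successive states are correlated, so the long-run fraction converges more slowly than it would for independent draws; this is the sense in which MCMC output is "almost" a random sample.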

MCMC Run for 1 locus Data
[Figure: trace of sampled locus, distance (cM) against MCMC run/100, with a frequency histogram of the sampled distances.]

Why not Ordinary Monte Carlo?
• independent samples of joint distribution
• chaining (or peeling) of effects
• requires numerical integration
  – possible analytically here
  – very messy in general
$\pi(\mu, b^*, \sigma^2 \mid y, x^*) = \pi(\sigma^2 \mid y, x^*; \mu, b^*)\, \pi(b^* \mid y, x^*; \mu)\, \pi(\mu \mid y, x^*)$
$\pi(\mu \mid y, x^*) = \int\!\!\int \pi(\mu, b^*, \sigma^2 \mid y, x^*)\, db^*\, d\sigma^2$

MCMC Idea for QTLs
• construct Markov chain around posterior
  – want posterior as stable distribution of Markov chain
  – in practice, the chain tends toward stable distribution
    • initial values may have low posterior probability
    • burn-in period to get chain mixing well
• update one (or several) components at a time
  – update effects given genotypes & traits
  – update locus given genotypes & traits
  – update genotypes given locus & effects
$(x^*; \mu, b^*, \sigma^2; \lambda) \sim \pi(x^*; \mu, b^*, \sigma^2; \lambda \mid y)$
$(x^*; \mu, b^*, \sigma^2; \lambda)_1 \to (x^*; \mu, b^*, \sigma^2; \lambda)_2 \to \cdots \to (x^*; \mu, b^*, \sigma^2; \lambda)_N$

MCMC Effect Updates
[Diagram: genos and traits feed the updates of mean, additive and variance.]

Gibbs Sampler for effects
• set up Markov chain around posterior for effects
• sample from posterior by sampling from full conditionals
  – conditional posterior of each parameter given the other
  – update parameter by sampling full conditional
update mean: $\pi(\mu \mid y, x^*; b^*, \sigma^2) = \pi(\mu)\, \pi(y \mid x^*; \mu, b^*, \sigma^2) / c$
update additive: $\pi(b^* \mid y, x^*; \mu, \sigma^2) = \pi(b^*)\, \pi(y \mid x^*; \mu, b^*, \sigma^2) / c$
update variance: $\pi(\sigma^2 \mid y, x^*; \mu, b^*) = \pi(\sigma^2)\, \pi(y \mid x^*; \mu, b^*, \sigma^2) / c$

MCMC run of mean & additive
[Figure: traces of mean and additive against MCMC run/100, each with a frequency histogram of the sampled values.]

Full Conditional for mean
• normal prior with large variance $\tau^2$ (prior mean 0)
• leads to normal posterior, approximately $N\!\left(\sum_{j=1}^{n} (y_j - b^* x_j^*)/n,\ \sigma^2/n\right)$
• posterior mean
$E(\mu \mid y, x^*; b^*, \sigma^2) = \frac{\tau^2 \sum_{j=1}^{n} (y_j - b^* x_j^*)}{n\tau^2 + \sigma^2} \approx \frac{\sum_{j=1}^{n} (y_j - b^* x_j^*)}{n}$
• posterior variance
$V(\mu \mid y, x^*; b^*, \sigma^2) = \frac{\sigma^2 \tau^2}{n\tau^2 + \sigma^2} \approx \frac{\sigma^2}{n}$

Full Conditional for additive Effect
• normal prior with large variance $\tau^2$ (prior mean 0)
• leads to normal posterior, approximately $N\!\left(\sum_{j=1}^{n} x_j^* (y_j - \mu) / \sum_{j=1}^{n} (x_j^*)^2,\ \sigma^2 / \sum_{j=1}^{n} (x_j^*)^2\right)$
• posterior mean
$E(b^* \mid y, x^*; \mu, \sigma^2) = \frac{\tau^2 \sum_{j=1}^{n} x_j^* (y_j - \mu)}{\tau^2 \sum_{j=1}^{n} (x_j^*)^2 + \sigma^2} \approx \frac{\sum_{j=1}^{n} x_j^* (y_j - \mu)}{\sum_{j=1}^{n} (x_j^*)^2}$
• posterior variance
$V(b^* \mid y, x^*; \mu, \sigma^2) = \frac{\sigma^2 \tau^2}{\tau^2 \sum_{j=1}^{n} (x_j^*)^2 + \sigma^2} \approx \frac{\sigma^2}{\sum_{j=1}^{n} (x_j^*)^2}$

Full Conditional for variance
• inverse gamma prior with large v/a
$\sigma^2 \sim \text{Inv-}\Gamma(a, v), \quad \pi(\sigma^2) \propto (\sigma^2)^{-(a+1)} e^{-v/\sigma^2}$
• posterior distribution
$\sigma^2 \mid y, x^*; \mu, b^* \sim \text{Inv-}\Gamma\!\left(a + \tfrac{n}{2},\ v + \tfrac{n}{2}\hat\sigma^2\right)$
• posterior mean
$E(\sigma^2 \mid y, x^*; \mu, b^*) = \frac{v + \tfrac{n}{2}\hat\sigma^2}{a + \tfrac{n}{2} - 1} \approx \hat\sigma^2, \quad \hat\sigma^2 = \frac{\sum_{j=1}^{n} (y_j - \mu - b^* x_j^*)^2}{n}$

MCMC run for variance
[Figure: trace of variance against MCMC run, with a frequency histogram of the sampled values.]

Alternative for Variance: use Inverse Chi-square
• inverse chi-square prior with large d, v
$\sigma^2 \sim \text{Inv-}\chi^2(d, v)$, that is $\sigma^2 \sim \frac{vd}{\chi^2_d}$, or $\frac{vd}{\sigma^2} \sim \chi^2_d$
• posterior distribution
$\sigma^2 \mid y, x^*; \mu, b^* \sim \text{Inv-}\chi^2\!\left(d + n,\ \frac{vd + \sum_{j=1}^{n} (y_j - \mu - b^* x_j^*)^2}{d + n}\right)$
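The three full conditionals can be chained into a small Gibbs sampler. The sketch below is a simplified illustration, not the authors' implementation: it treats the QTL genotypes as known, uses the large-prior-variance limits for the mean and additive effect, and picks small inverse gamma constants (`a`, `v`) purely as assumptions. The simulated data and all function names are hypothetical.

```python
import math
import random

def gibbs_effects(y, x, n_iter=2000, a=0.01, v=0.01, seed=1):
    """Gibbs sampler for (mu, b, sigma2) with genotypes x treated as known."""
    rng = random.Random(seed)
    n = len(y)
    sxx = sum(xj * xj for xj in x)
    mu, b, sigma2 = sum(y) / n, 0.0, 1.0
    draws = []
    for _ in range(n_iter):
        # mean: normal centred at the average of y_j - b x_j, variance sigma2 / n
        centre = sum(yj - b * xj for yj, xj in zip(y, x)) / n
        mu = rng.gauss(centre, math.sqrt(sigma2 / n))
        # additive effect: normal centred at sum x (y - mu) / sum x^2
        centre = sum(xj * (yj - mu) for yj, xj in zip(y, x)) / sxx
        b = rng.gauss(centre, math.sqrt(sigma2 / sxx))
        # variance: inverse gamma with shape a + n/2 and rate v + rss/2;
        # random.gammavariate takes (shape, scale), so scale = 1 / rate
        rss = sum((yj - mu - b * xj) ** 2 for yj, xj in zip(y, x))
        sigma2 = 1.0 / rng.gammavariate(a + n / 2.0, 1.0 / (v + rss / 2.0))
        draws.append((mu, b, sigma2))
    return draws

def simulate(n=200, mu=10.0, b=2.0, sigma=1.0, seed=2):
    """Hypothetical data: genotypes coded -1/+1 with a normal trait."""
    rng = random.Random(seed)
    x = [rng.choice((-1, 1)) for _ in range(n)]
    y = [mu + b * xj + rng.gauss(0.0, sigma) for xj in x]
    return y, x

y, x = simulate()
kept = gibbs_effects(y, x)[500:]          # drop a burn-in period
post_mean = [sum(d[i] for d in kept) / len(kept) for i in range(3)]
```

In a full QTL sampler the genotypes and the locus would be updated inside the same loop; this sketch isolates the effect updates so the conjugate steps are easy to follow.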

Quick Review of trait Model
• single QTL details of Gibbs sampler
  – normal priors & likelihoods
    • mean, additive effects
  – inverse gamma prior for variance
    • or inverse chi-square
  – vague priors lead to usual estimates as posterior means
• multiple QTL trait model
  – model with vector notation

MCMC locus & geno updates
[Diagram: locus, effects and traits feed the update of genos; genos feed the update of locus.]

Full Conditional for genos
• full conditional for genotype depends on
  – effects via trait model
  – locus via recombination model
• can explicitly decompose by individual j
  – binomial (or trinomial) probability
$x_j^* = -1, 0, \text{ or } 1$
$P_j(x_j^* \mid y_j; \mu, b^*, \sigma^2; \lambda) = \frac{\pi(y_j \mid x_j^*; \mu, b^*, \sigma^2)\, \pi(x_j^* \mid \lambda)}{\sum_{x=-1,0,1} \pi(y_j \mid x; \mu, b^*, \sigma^2)\, \pi(x \mid \lambda)}$

Missing marker Data
• sample missing marker data a la QT genotypes
• full conditional for missing markers depends on
  – flanking markers
  – possible flanking QTL
• can explicitly decompose by individual j
  – binomial (or trinomial) probability
$M_{kj} = -1, 0, \text{ or } 1$
$\pi(M_{kj} \mid x_j^*, y_j; \mu, b^*, \sigma^2; \lambda; M_j) = \pi(M_{kj} \mid x_j^*; \lambda, M_j)$

Prior for locus
• no prior information on locus
  – uniform prior over genome
  – use framework map
    • choose interval proportional to length
    • then pick uniform position within interval
• prior information from other studies
  – concentrate on credible regions
  – use posterior of previous study as new prior
[Figure: prior density for locus against distance (cM).]

Full Conditional for locus
• cannot easily sample from locus full conditional
• cannot explicitly determine full conditional
  – difficult to normalize
  – need to consider all possible genotypes over entire map
• Gibbs sampler will not work
  – but can get something proportional ...
$\pi(\lambda \mid y, x^*; \mu, b^*, \sigma^2) = \pi(\lambda \mid x^*) = \pi(\lambda) \prod_{j=1}^{n} \pi(x_j^* \mid \lambda) / c$
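The genotype full conditional for one individual is a direct application of Bayes rule over the three possible genotypes, so it can be sketched compactly. The function names are hypothetical, and the prior genotype probabilities, which in the model come from the locus and flanking markers, are simply passed in as assumed numbers.

```python
import math
import random

def normal_pdf(y, mean, sigma2):
    """Normal density with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / sigma2) / math.sqrt(2.0 * math.pi * sigma2)

def geno_full_conditional(y_j, mu, b, sigma2, prior):
    """P(x | y_j; mu, b, sigma2; locus) for each genotype x.

    `prior` maps each genotype to pi(x | locus).
    """
    weights = {x: normal_pdf(y_j, mu + b * x, sigma2) * p for x, p in prior.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

def sample_geno(probs, rng):
    """Draw one genotype from the trinomial (or binomial) probabilities."""
    u, cum = rng.random(), 0.0
    for x, p in probs.items():
        cum += p
        if u < cum:
            return x
    return x  # guard against floating point rounding in the last cell

prior = {-1: 0.25, 0: 0.5, 1: 0.25}            # assumed pi(x | locus)
probs = geno_full_conditional(12.0, mu=10.0, b=2.0, sigma2=1.0, prior=prior)
draw = sample_geno(probs, random.Random(0))
```

A trait value near mu + b pulls the probability toward genotype 1 even though the prior favours 0; with b = 0 the trait carries no information and the full conditional reduces to the prior.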

Metropolis-Hastings Step
• pick new locus based upon current locus
  – propose new locus from distribution q( )
    • pick value near current one?
    • pick uniformly across genome?
  – accept new locus with probability a()
• Gibbs sampler is special case of M-H
  – always accept new proposal
• acceptance ensures right stable distribution
$a(\lambda_{old}, \lambda_{new}) = \min\left(1,\ \frac{\pi(\lambda_{new} \mid x^*)\, q(\lambda_{new}, \lambda_{old})}{\pi(\lambda_{old} \mid x^*)\, q(\lambda_{old}, \lambda_{new})}\right)$

Care & Use of MCMC
• sample chain for long run (100,000-1,000,000)
  – longer for more complicated likelihoods
  – use diagnostic plots to assess “mixing”
• standard error of estimates
  – use histogram of posterior
  – compute variance of posterior--just another summary
• studying the Markov chain
  – Monte Carlo error of series (Geyer 1992)
    • time series estimate based on lagged auto-covariances
  – convergence diagnostics for “proper mixing”

Part IV: Multiple QTL
• Multiple QTL Model
• Sampling from the Posterior
• Issues for 2 QTL
• Bayes factors & Model Selection
• Simulated data for 0,1,2 QTL
• Brassica data on days to flowering
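The Metropolis-Hastings update for the locus can be sketched for a deliberately simple case. Everything specific here is an assumption for illustration: one marker interval from 0 to 20 cM, a two-genotype cross, a flat prior on the locus within the interval, Haldane's map function, and a symmetric random-walk proposal (so the q terms cancel). The data are summarised by hypothetical counts of individuals whose QTL genotype differs from the left or right flanking marker.

```python
import math
import random

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def log_target(locus, left, right, n, n_left_rec, n_right_rec):
    """log of pi(locus) * prod_j pi(x_j | locus), up to a constant (flat prior)."""
    if not (left < locus < right):
        return -math.inf                      # outside the prior's support
    r1 = haldane_r(locus - left)
    r2 = haldane_r(right - locus)
    return (n_left_rec * math.log(r1) + (n - n_left_rec) * math.log(1.0 - r1)
            + n_right_rec * math.log(r2) + (n - n_right_rec) * math.log(1.0 - r2))

def mh_locus(left, right, n, n_left_rec, n_right_rec,
             n_iter=5000, step=2.0, seed=0):
    """Random-walk Metropolis-Hastings chain for the locus."""
    rng = random.Random(seed)
    locus = 0.5 * (left + right)
    current = log_target(locus, left, right, n, n_left_rec, n_right_rec)
    chain = []
    for _ in range(n_iter):
        proposal = locus + rng.uniform(-step, step)
        proposed = log_target(proposal, left, right, n, n_left_rec, n_right_rec)
        log_q_ratio = 0.0                     # symmetric proposal: q terms cancel
        log_a = min(0.0, proposed - current + log_q_ratio)
        if rng.random() < math.exp(log_a):    # accept with probability a
            locus, current = proposal, proposed
        chain.append(locus)
    return chain

# Hypothetical counts, roughly what a QTL near 7 cM would give with n = 2000.
chain = mh_locus(0.0, 20.0, n=2000, n_left_rec=131, n_right_rec=229)
kept = chain[500:]
locus_mean = sum(kept) / len(kept)
```

A proposal that is not symmetric, such as drawing uniformly within a randomly chosen marker interval, would need a non-zero `log_q_ratio`.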

MCMC run: 2 loci assuming only 1
[Figure: trace of sampled locus, distance (cM) against MCMC run/1000, with a frequency histogram of the sampled distances.]

Multiple QTL model
• trait = mean + add1 + add2 + error
• trait = effect_of_genos + error
• prob( trait | genos, effects )
$y_j = \mu + b_1^* x_{j1}^* + b_2^* x_{j2}^* + e_j$
$y_j = \mu + \sum_{r=1}^{m} b_r^* x_{jr}^* + e_j$

Simulated Data with 2 QTL
[Figure: simulated trait values by genotype pair (-1,-1; -1,1; 1,-1; 1,1), with recombinants marked, and a frequency histogram of trait.]

Issues for Multiple QTL
• how many QTL influence a trait?
  – 1, several (oligogenic) or many (polygenic)?
  – how many are supported by the data?
• searching for 2 or more QTL
  – conditional search (IM, CIM)
  – simultaneous search (MIM)
• epistasis (inter-loci interaction)
  – many more parameters to estimate
  – effects of ignored QTL

LOD for 2 QTL
[Figure: LOD profiles against distance (cM); legend: QTL, IM, CIM.]

Interval Mapping Approach
• interval mapping (IM)
  – scan genome for 1 QTL
• composite interval mapping (CIM)
  – scan for 1 QTL while adjusting for others
  – use markers as surrogates for other QTL
• multiple interval mapping (MIM)
  – search for multiple QTL

Multiple QTL model
• trait = mean + add1 + add2 + error
• trait = effect_of_genos + error
• prob( trait | genos, effects )
$y_j = \mu + b_1^* x_{j1}^* + b_2^* x_{j2}^* + e_j$
$y_j = \mu + \sum_{r=1}^{m} b_r^* x_{jr}^* + e_j$

Vector Notation for Trait Model
• inner product for sum
• condense notation
$y = \mu + X^* b^* + e$ or $y_j = \mu + \sum_{r=1}^{m} b_r^* x_{jr}^* + e_j$
$y = (y_1 \cdots y_n)^T$, $e = (e_1 \cdots e_n)^T$, $X^* = (x_{jr}^*)_{j,r=1}^{n,m}$ and $b^* = (b_1^* \cdots b_m^*)^T$

Vector Notation for Multiple loci
• vector of loci across linkage map
• careful bookkeeping during update
  – identifiability & bump hunting
  – possibility of two loci in one marker interval
• ordered loci are sufficient
$\pi(\lambda \mid X^*) = \prod_{r=1}^{m} \pi(\lambda_r \mid X^*)$
$\pi(\lambda_r \mid X^*) \propto \pi(\lambda_r) \prod_{j=1}^{n} \pi(x_{jr}^* \mid \lambda_r)$
QT loci $\lambda = (\lambda_1 \cdots \lambda_m)^T$

Posterior: Multiple QTLs
• posterior = likelihood * prior / constant
• posterior( parameters | data ) = prob( genos, effects, loci | traits, map )
$\pi(X^*; \mu, b^*, \sigma^2; \lambda \mid y)$ is proportional to
$\pi(\mu)\, \pi(\sigma^2) \prod_{r=1}^{m} \left[ \pi(b_r^*)\, \pi(\lambda_r) \prod_{j=1}^{n} \pi(x_{jr}^* \mid \lambda_r) \right] \prod_{j=1}^{n} \pi(y_j \mid x_j^*; \mu, b^*, \sigma^2)$
Yandell 75 MCMC for Multiple QTLs • construct Markov chain around posterior • update one (or several) components at a time – update effects given genotypes & traits – update loci given genotypes & traits – update genotypes give loci & effects • update all terms for each locus at one time? – open questions of efficient mixing (X * ; , b * , 2 ; ) ~ (X * ; , b * , 2 ; | y) 1 2 N June 2001 NCSU QTL II Workshop © Brian S. Yandell 76 70 distance (cM) 60 60 1 1 111111 11 11111111 1 1111111 1 11 1 1111 11 1111111111 111 11 1 1111 1 111 11 11 1 1111 1111111111111 11111 1111 11111111111 1 11111111 11 11111111 111 1111 11 111111 111111111 1111 11 1111 11 1 111 11 111 1 111 111111 11 11111111 1 11 1 111 11 11 11 11 1111 111 111111 11 11111 1111111 1 11 11111 11 111 111111 111 1111 11 11 11 11 1111 1 111111 111 11 111 111 11 111 1 1 11111111 11 1111 1 11111 11 11 11111 1 11 11 1 11111 1 1 111111 1 11111 111 11 11 11 111 1 1 11 1 1 1 111 11 1 111 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 111111 11 1 1 1 1 1 1 1 1 1 1 11111 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11111 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 111111111111111111111 11111111111 1 11111 1 111 1111111 111111 11111111111 11111111 1 1 1111111111111 11111111111 111 1 11 111 1 1 1 1 1 11111111 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 111 11 111 11 111 1 1 111 1111111 111 1 11 11111 1 11111 111 11 11 1 1 1 11111 1 1 1 1 1 1 1 1 1 1 1 11 11 1 11 111 1 1 1 0 200 400 600 800 1000 40 50 50 40 80 2 22 22 2 2 2 2 22 2 2 2 2 2 2 2222 2 2 22 2 222 22 222 222 2 2 2 2 222 22222222 22222222 22 22 22222 2 22222 2 2 222 2 22 222222222 2 2 2 2 2 2 2 2 2 2 2 2 2 222 2 2 2 2 2 2 2 222 2 2222 2222 222 2 2222 22 222 22 2 222222 222 22222222222222 2 222 2 2 2 2 2 2 2 2 2 2 2 2 2 2 22222222222222222 2 22222222222 2222222222222 222222 2 22 22 2 222222222 22 222 22222 22 2 222 222222222222222 2222 2222 2 2222222 222 2 22 2222 22222 222 22222222222 22222 2222222 22222 222222222 222222222 2222 2 222222 22222 22 
[Figure: MCMC run with 2 loci. Left: locus distance (cM) vs. MCMC run/100, plot symbol = locus number; right: frequency histogram of sampled loci]

Bayesian Approach
• simultaneous search for multiple QTL
• use Bayesian paradigm
  – easy to consider joint distributions
  – easy to modify later for other types of data
    • counts, proportions, etc.
  – employ MCMC to estimate posterior distribution
• study estimates of loci & effects
• use Bayes factors for model selection
  – number of QTL

[Figure: Effects for 2 Simulated QTL. Effect vs. distance (cM) for effect 1 and effect 2, with histograms of sampled loci]

Brassica napus Data
• 4-week & 8-week vernalization effect
  – log(days to flower)
• genetic cross of
  – Stellar (annual canola)
  – Major (biennial rapeseed)
• 105 F1-derived doubled haploid (DH) lines
  – homozygous at every locus (QQ or qq)
• 10 molecular markers (RFLPs) on LG9
  – two QTLs inferred on LG9 (now chromosome N2)
  – corroborated by Butruille (1998)
  – exploiting synteny with Arabidopsis thaliana

[Figure: Brassica 4- & 8-week Data. Scatter plot of 8-week vs. 4-week vernalization, log(days to flower), with a histogram for each]
[Figure: Brassica Data LOD Maps. LOD vs. distance (cM) for 8-week (top) and 4-week (bottom); curves: QTL, IM, CIM]

4-week vs 8-week vernalization

4-week vernalization
• longer time to flower
• larger LOD at 40cM
• modest LOD at 80cM
• loci well determined
  cM    add
  40    .30
  80    .16

8-week vernalization
• shorter time to flower
• larger LOD at 80cM
• modest LOD at 40cM
• loci poorly determined
  cM    add
  40    .06
  80    .13

[Figure: Brassica Credible Regions. Additive effect vs. distance (cM); panels: 4-week, 8-week]

Collinearity of QTLs
• multiple QT genotypes are correlated
  – QTL linked on same chromosome
  – difficult to distinguish if close
• estimates of QT effects are correlated
  – poor identifiability of effects parameters
  – correlations give clue of how much to trust
• which QTL to go after in breeding?
  – largest effect?
  – may be biased by nearby QTL

[Figure: Brassica effect Correlations. Additive 2 vs. additive 1; panels: 4-week, 8-week; panel annotations: cor = -0.81, cor = -0.7]

Simulation Study
• 2 linked QTL
  – locus: 15, 65cM
  – effect: 1, 1
• QTL Cart vs. Bayesian QTL estimates
• n = 100, h^2 = 30
• also considered
  – n = 200, h^2 = 25, 30, 40

[Figure: 2 QTL: Loci Estimates. Bayesian QTL Locus vs. QTL Cart Locus; panels: locus 1 and locus 2, n = 100, h^2 = 30]
[Figure: 2 QTL: Effect Estimates. Bayesian QTL Effect vs. QTL Cart Effect; panels: locus 1 and locus 2, n = 100, h^2 = 30]

[Figure: 2 QTL: Loci & Effects. Bayesian QTL vs. QTL Cart locus and effect estimates; panels: locus 1 and locus 2, n = 200, h^2 = 40]

Bayes Factors
Which model (1, 2 or 3 QTLs) is better supported by the data?
  – ratio of posterior odds to prior odds
  – ratio of model likelihoods
$$B_{12} = \frac{\pi(\text{model}_1 \mid y) / \pi(\text{model}_2 \mid y)}{\pi(\text{model}_1) / \pi(\text{model}_2)} = \frac{\pi(y \mid \text{model}_1)}{\pi(y \mid \text{model}_2)}$$

  BF(1:2)      2log(BF)    evidence for 1st
  < 1          < 0         negative
  1 to 3       0 to 2      negligible
  3 to 12      2 to 5      positive
  12 to 150    5 to 10     strong
  > 150        > 10        very strong

Bayes Factors & LR
• equivalent to LR statistic when
  – comparing two nested models
  – simple hypotheses (e.g. 1 vs 2 QTL)
• Bayesian Information Criterion (BIC) in general
  – Schwarz introduced for model selection
  – penalty for different number of parameters p
$$-2\log(B_{12}) \approx -2\log(LR) - (p_2 - p_1)\log(n)$$

Model Determination using Bayes Factors
• pick most plausible model
  – histogram for range of models
  – posterior distribution of models
  – use Bayes theorem
  – often assume flat prior across models
• posterior distribution of number of QTLs

Brassica Bayes Factors
• compare models for 1, 2, 3 QTL
• Bayes factor and -2log(LR)
• large value favors first model
• 8-week vernalization only here

  i vs. j    Bayes Factor    -2log(LR)
  2 vs. 1    2.49            7.82
  3 vs. 1    .005            7.41
  3 vs. 2    .002            4.17

Computing Bayes Factors
• arithmetic mean
  – using samples from prior
  – mean across Monte Carlo or MCMC runs
  – can be inefficient if prior differs from posterior
$$\pi(y \mid \text{model}_k) = \int \pi(y \mid \theta_k; \text{model}_k)\, \pi(\theta_k \mid \text{model}_k)\, d\theta_k$$
• harmonic mean
  – using samples from posterior
  – more efficient but less stable
  – careful choice of weight h() close to posterior
$$\hat\pi(y \mid \text{model}_k) = \left[ \frac{1}{G} \sum_{g=1}^{G} \frac{h(\theta_k^{(g)})}{\pi(y \mid \theta_k^{(g)}; \text{model}_k)\, \pi(\theta_k^{(g)} \mid \text{model}_k)} \right]^{-1}$$

Part V: How many QTLs?
• Reversible Jump MCMC
  – basic idea of Green (1995)
  – model selection in regression
• how many QTLs?
  – number of QTL is random
  – estimate the number m
• RJ-MCMC vs. Bayes factors
• other similar ideas

Jumping the Number of QTL
• model changes with number of QTL
  – almost analogous to stepwise regression
  – use reversible jump MCMC to change number
    • bookkeeping helps in comparing models
    • change of variables between models
• prior on number of QTL
  – uniform over some range
  – Poisson with prior mean $\mu_0$
$$\pi(m \mid \mu_0) = \frac{e^{-\mu_0} \mu_0^{m}}{m!}$$

Posterior: Number of QTL
• posterior = likelihood * prior / constant
• posterior( parameters | data )
  prob( genos, effects, loci, m | traits, map )
$$\pi(X^*; \mu, b^*, \sigma^2; \lambda, m \mid y)$$
is proportional to
$$\pi(m)\,\pi(\mu)\,\pi(\sigma^2) \prod_{r=1}^{m} \left[ \pi(b^*_r)\,\pi(\lambda_r) \prod_{j=1}^{n} \pi(x^*_{jr} \mid \lambda_r) \right] \prod_{j=1}^{n} \pi(y_j \mid x^*_j; \mu, b^*, \sigma^2; m)$$
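Returning to the arithmetic-mean and harmonic-mean estimators from the Computing Bayes Factors slide, the sketch below compares them on a toy model where the marginal likelihood is known exactly. The toy model is an assumption chosen for checkability, not the QTL model: y_i ~ Normal(theta, 1) with prior theta ~ Normal(0, 1). All function names are hypothetical.

```python
import math
import random


def log_lik(y, theta):
    """log pi(y | theta) for independent y_i ~ Normal(theta, 1)."""
    return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * (v - theta) ** 2 for v in y)


def log_normal_pdf(x, mean, var):
    """log density of Normal(mean, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def exact_log_marginal(y):
    """Closed-form log pi(y) under the prior theta ~ Normal(0, 1)."""
    n, s, ss = len(y), sum(y), sum(v * v for v in y)
    return (-0.5 * n * math.log(2.0 * math.pi) - 0.5 * math.log(n + 1)
            - 0.5 * (ss - s * s / (n + 1)))


def arithmetic_mean_estimate(y, draws, rng):
    """Average the likelihood over draws of theta from the prior."""
    liks = [math.exp(log_lik(y, rng.gauss(0.0, 1.0))) for _ in range(draws)]
    return sum(liks) / draws


def harmonic_mean_estimate(y, thetas, log_h):
    """Invert the average of h(theta) / (likelihood * prior) over posterior draws."""
    terms = [math.exp(log_h(t) - log_lik(y, t) - log_normal_pdf(t, 0.0, 1.0))
             for t in thetas]
    return len(terms) / sum(terms)
```

When the weight h() equals the posterior density, every term in the harmonic-mean sum is 1 / pi(y), so the estimator is exact whatever the draws. This is one way to read the slide's advice to choose h() close to the posterior.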
Reversible Jump Choices
action step: draw one of three choices
• update step with probability 1-b(m+1)-d(m)
  – update current model
  – loci, effects, genotypes as before
• add a locus with probability b(m+1)
  – propose a new locus
  – innovate effect and genotypes at new locus
  – decide whether to accept the “birth” of new locus
• drop a locus with probability d(m)
  – pick one of existing loci to drop
  – decide whether to accept the “death” of locus

Markov chain for number m
[Diagram: states 0, 1, ..., m-1, m, m+1 for the number of QTL; moves: add a new locus, drop a locus, update current model]

[Figure: Jumping QTL number & loci. Top: locus distance (cM) vs. MCMC run, plot symbol = locus number; bottom: number of QTL vs. MCMC run]

RJ-MCMC Updates
[Diagram: nodes traits, genos, loci, effects; moves: update with probability 1-b(m+1)-d(m), add locus with probability b(m+1), drop locus with probability d(m)]

Propose to Add a locus
• propose a new locus
  – similar proposal to ordinary update
    • uniform chance over genome
    • easier to avoid interval with another QTL
    $$q_b(\lambda) = 1/D$$
  – need genotypes at locus & model effect
• innovate effect & genotypes at new locus
  – draw genotypes based on recombination (prior)
    • no dependence on trait model yet
  – draw effect as in Green’s reversible jump
    • adjust for collinearity
    • modify other parameters accordingly
• check acceptance ...

Propose to Drop a locus
• choose an existing locus
  $$q_d(r; m) = 1/m$$
  – equal weight for all loci?
  – more weight to loci with small effects?
• “drop” effect & genotypes at old locus
  – adjust effects at other loci for collinearity
  – this is reverse jump of Green (1995)
• check acceptance …
  – do not drop locus, effects & genotypes until move is accepted

Acceptance of Reversible Jump
• accept birth of new locus with probability min(1,A)
• accept death of old locus with probability min(1,1/A)
$$A = \frac{\pi(\theta_{m+1}, m+1 \mid y)}{\pi(\theta_m, m \mid y)} \times \frac{d(m+1)}{b(m)} \times \left[ \frac{q_b(\lambda_{m+1})}{q_d(r; m+1)} \right]^{-1} \times J$$
$$(\theta_m, m) = (X^*; \mu, b^*, \sigma^2; \lambda, m)$$

Acceptance of Reversible Jump
• move probabilities m → m+1: $d(m+1) / b(m)$
• birth & death proposals: $q_b(\lambda_{m+1}) / q_d(r; m+1)$
• Jacobian between models: $J = \sigma / (s_{r|\text{others}} \sqrt{n})$
  – fudge factor
  – see stepwise regression example

[Figure: RJ-MCMC: Number of QTL. Number of QTL (0 to 6) vs. MCMC run; panels thinned at run, run/10, run/100 and run/1000]

Posterior # QTL for 8-week Data
98% credible region for m: (1,3), based on 1 million steps with prior mean of 3
[Figure: histogram of the posterior for the number of QTL, m = 0 to 6]

VI: Reversible Jump Details
• reversible jump MCMC details
  – can update model with m QTL
  – have basic idea of jumping models
  – now: careful bookkeeping between models
• RJ-MCMC & Bayes factors
  – Bayes factors from RJ-MCMC chain
  – components of Bayes factors

RJ-MCMC Updates
[Diagram: nodes traits, genos, loci, effects; moves: update with probability 1-b(m+1)-d(m), add locus with probability b(m+1), drop locus with probability d(m)]
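The move selection and the birth/death acceptance rule described on the Reversible Jump Choices and Acceptance of Reversible Jump slides can be sketched as follows. Everything here is an assumption for illustration: the function names, passing the move probabilities b() and d() as callables evaluated at the current m, the uniform proposals q_b = 1/D and q_d = 1/(m+1), and working on the log scale. A complete Green (1995) ratio would also account for the density of the innovation used to propose the new effect; only the factors named on the slides appear here.

```python
import math
import random


def poisson_log_prior(m, prior_mean):
    """log of a Poisson prior on the number of QTL, m = 0, 1, 2, ..."""
    return -prior_mean + m * math.log(prior_mean) - math.lgamma(m + 1)


def choose_move(m, birth_prob, death_prob, rng):
    """Pick 'birth', 'death' or 'update' given the current number of QTL m."""
    u = rng.random()
    if u < birth_prob(m):
        return "birth"
    if u < birth_prob(m) + death_prob(m):
        return "death"
    return "update"


def log_birth_ratio(log_post_new, log_post_old, m, birth_prob, death_prob,
                    genome_length, jacobian):
    """log A for a birth m -> m+1, assuming q_b = 1/D and q_d = 1/(m+1)."""
    log_moves = math.log(death_prob(m + 1)) - math.log(birth_prob(m))
    log_proposals = math.log(1.0 / (m + 1)) - math.log(1.0 / genome_length)
    return (log_post_new - log_post_old) + log_moves + log_proposals + math.log(jacobian)


def accept_probability(log_ratio, birth=True):
    """min(1, A) for a birth, min(1, 1/A) for the matching death."""
    signed = log_ratio if birth else -log_ratio
    return 1.0 if signed >= 0.0 else math.exp(signed)
```

Keeping everything in logs avoids underflow when the posterior ratio is tiny, which is typical once the sample size grows.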
Reversible Jump Idea
• expand idea of MCMC to compare models
• adjust for parameters in different models
  – augment smaller model with innovations
  – constraints on larger model
• calculus “change of variables” is key
  – add or drop parameter(s)
  – carefully compute the Jacobian
• consider stepwise regression
  – Mallick (1995) & Green (1995)
  – efficient calculation with Householder decomposition

Model Selection in Regression
• known regressors (e.g. markers)
  – models with 1 or 2 regressors
• jump between models
  – centering regressors simplifies calculations
$$m = 1:\quad y_j = \mu + b\,(x_{j1} - \bar{x}_1) + e_j$$
$$m = 2:\quad y_j = \mu + b_1 (x_{j1} - \bar{x}_1) + b_2 (x_{j2} - \bar{x}_2) + e_j$$

Slope Estimate for 1 Regressor
recall least squares estimate of slope
note relation of slope to correlation
$$\hat{b} = \frac{r_{1y}\, s_y}{s_1}, \qquad r_{1y} = \frac{\sum_{j=1}^{n} (x_{j1} - \bar{x}_1)(y_j - \bar{y}) / n}{s_1 s_y}$$
$$s_1^2 = \sum_{j=1}^{n} (x_{j1} - \bar{x}_1)^2 / n, \qquad s_y^2 = \sum_{j=1}^{n} (y_j - \bar{y})^2 / n$$

2 Correlated Regressors
slopes adjusted for other regressors
$$\hat{b}_1 = \frac{(r_{1y} - r_{12} r_{2y})\, s_y}{(1 - r_{12}^2)\, s_1} = \hat{b} - \hat{b}_2\, c_{21}, \qquad c_{21} = \frac{r_{12}\, s_2}{s_1}$$
$$\hat{b}_2 = \frac{(r_{2y} - r_{12} r_{1y})\, s_y}{(1 - r_{12}^2)\, s_2}$$

Gibbs Sampler for Model 1
• mean
$$\mu \sim \mathrm{Normal}\!\left( \frac{n \bar{y}}{n + \sigma^2/\tau^2},\ \frac{\sigma^2}{n + \sigma^2/\tau^2} \right)$$
• slope
$$b \sim \mathrm{Normal}\!\left( \frac{\sum_{j=1}^{n} (x_{j1} - \bar{x}_1)(y_j - \mu)}{n s_1^2 + \sigma^2/\tau^2},\ \frac{\sigma^2}{n s_1^2 + \sigma^2/\tau^2} \right)$$
• variance
$$\sigma^2 \sim \mathrm{Inv\text{-}}\Gamma\!\left( a + \frac{n}{2},\ v + \frac{1}{2} \sum_{j=1}^{n} \big( y_j - \mu - b\,(x_{j1} - \bar{x}_1) \big)^2 \right)$$

Gibbs Sampler for Model 2
• mean
$$\mu \sim \mathrm{Normal}\!\left( \frac{n \bar{y}}{n + \sigma^2/\tau^2},\ \frac{\sigma^2}{n + \sigma^2/\tau^2} \right)$$
• slopes
$$b_2 \sim \mathrm{Normal}\!\left( \frac{\sum_{j=1}^{n} (x_{j2} - \bar{x}_2)\big( y_j - \mu - b_1 (x_{j1} - \bar{x}_1) \big)}{n s_{2|1}^2 + \sigma^2/\tau^2},\ \frac{\sigma^2}{n s_{2|1}^2 + \sigma^2/\tau^2} \right)$$
$$s_{2|1}^2 = \sum_{j=1}^{n} \big( x_{j2} - \bar{x}_2 - c_{21} (x_{j1} - \bar{x}_1) \big)^2 / n$$
• variance
$$\sigma^2 \sim \mathrm{Inv\text{-}}\Gamma\!\left( a + \frac{n}{2},\ v + \frac{1}{2} \sum_{j=1}^{n} \Big( y_j - \mu - \sum_{k=1}^{2} b_k (x_{jk} - \bar{x}_k) \Big)^2 \right)$$

Updates from 2->1
• drop 2nd regressor
• adjust other regressor
$$b = b_1 + b_2 c_{21}, \qquad b_2 = 0$$
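The three-step Gibbs sampler for the one-regressor model can be sketched in pure Python. The priors are assumptions made for this illustration: mu and b each get a Normal(0, tau2) prior and sigma^2 an inverse gamma prior with shape a and scale v. The function name `gibbs_model1` and the default hyperparameter values are hypothetical.

```python
import math
import random


def gibbs_model1(x, y, steps, rng, tau2=100.0, a=1.0, v=1.0):
    """Gibbs sampler for y_j = mu + b (x_j - xbar) + e_j, e_j ~ Normal(0, sigma2)."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    xc = [xi - xbar for xi in x]            # centered regressor
    ns1sq = sum(c * c for c in xc)          # n * s_1^2
    mu, b, sigma2 = ybar, 0.0, 1.0          # starting values
    draws = []
    for _ in range(steps):
        shrink = sigma2 / tau2              # prior contribution to the precision
        # mean: centering makes this draw free of b
        mu = rng.gauss(n * ybar / (n + shrink), math.sqrt(sigma2 / (n + shrink)))
        # slope given mean and variance
        sxy = sum(c * (yi - mu) for c, yi in zip(xc, y))
        b = rng.gauss(sxy / (ns1sq + shrink), math.sqrt(sigma2 / (ns1sq + shrink)))
        # variance: inverse gamma draw as the reciprocal of a gamma draw
        rss = sum((yi - mu - b * c) ** 2 for c, yi in zip(xc, y))
        sigma2 = 1.0 / rng.gammavariate(a + n / 2.0, 1.0 / (v + rss / 2.0))
        draws.append((mu, b, sigma2))
    return draws
```

A typical use is to discard an initial burn-in stretch of `draws` and summarize the rest, for example by averaging each component.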
Updates from 1->2
• add 2nd slope, adjusting for collinearity
• adjust other slope & variance
$$z \sim \mathrm{Normal}(0, 1), \qquad J = \frac{\sigma}{s_{2|1} \sqrt{n}}$$
$$b_2 = \hat{b}_2 + z J, \qquad \hat{b}_2 = \frac{\sum_{j=1}^{n} (x_{j2} - \bar{x}_2)\big( y_j - \hat\mu - \hat{b}_1 (x_{j1} - \bar{x}_1) \big)}{n s_{2|1}^2}$$
$$b_1 = b - b_2 c_{21} = b - z\, c_{21} J - \hat{b}_2 c_{21}$$

Model Selection in Regression
• known regressors (e.g. markers)
  – models with 1 or 2 regressors
• jump between models
  – augment with new innovation z

  m        parameters                       innovations             transformations
  1 → 2    $(\mu, b, \sigma^2; z)$          $z \sim \mathrm{Normal}(0,1)$    $b_2 = \hat{b}_2 + z J$, $b_1 = b - b_2 c_{21}$
  2 → 1    $(\mu, b_1, b_2, \sigma^2)$                              $b = b_1 + b_2 c_{21}$, $z = 0$

Change of Variables
• change variables from model 1 to model 2
• calculus issues for integration
  – need to formally account for change of variables
  – infinitesimal steps in integration (db)
  – involves partial derivatives (next page)
$$\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} 1 & -c_{21} J \\ 0 & J \end{pmatrix} \begin{pmatrix} b \\ z \end{pmatrix} + \begin{pmatrix} -c_{21} \\ 1 \end{pmatrix} \hat{b}_2 = g(b; z \mid y, x_1, x_2)$$
$$\int \pi(b_1, b_2 \mid y, x_1, x_2)\, db_1\, db_2 = \int \pi(b; z \mid y, x_1, x_2)\, J\, db\, dz$$

Jacobian & the Calculus
• Jacobian sorts out change of variables
  – careful: easy to mess up here!
$$g(b; z) = (b_1, b_2), \qquad \frac{\partial g(b; z)}{\partial b\, \partial z} = \begin{pmatrix} 1 & -c_{21} J \\ 0 & J \end{pmatrix}$$
$$\det \begin{pmatrix} 1 & -c_{21} J \\ 0 & J \end{pmatrix} = 1 \cdot J - 0 \cdot (-c_{21} J) = J$$
$$db_1\, db_2 = \left| \det \frac{\partial g(\mu, b, \sigma^2; z)}{\partial b\, \partial z} \right| db\, dz = J\, db\, dz$$

[Figure: Geometry of Reversible Jump. b2 vs. b1; panels: Move Between Models (m=1, m=2, c21 = 0.7) and Reversible Jump Sequence]

[Figure: QT additive Reversible Jump. b2 vs. b1; panels: a short sequence, first 1000 with m<3]

[Figure: Credible Set for additive. b2 vs. b1; 90% & 95% sets based on normal; regression line corresponds to slope of updates]
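The change of variables between the one-slope and two-slope regression models can be checked numerically. The sketch below is an illustration with hypothetical names: `jump_up` applies the 1 → 2 transformation, `jump_down` applies its inverse and also returns the innovation z implied by the current state (an assumption of this sketch, useful for confirming that the map is reversible), and a finite-difference determinant is compared with J.

```python
def jump_up(b, z, b2_hat, c21, jac):
    """Map (b, z) in the 1-regressor model to (b1, b2) in the 2-regressor model."""
    b2 = b2_hat + z * jac
    b1 = b - b2 * c21
    return b1, b2


def jump_down(b1, b2, b2_hat, c21, jac):
    """Inverse map: collapse (b1, b2) to the single slope b and the implied z."""
    b = b1 + b2 * c21
    z = (b2 - b2_hat) / jac
    return b, z


def numerical_jacobian_det(b, z, b2_hat, c21, jac, eps=1e-6):
    """Finite-difference determinant of d(b1, b2) / d(b, z)."""
    base = jump_up(b, z, b2_hat, c21, jac)
    shifted_b = jump_up(b + eps, z, b2_hat, c21, jac)
    shifted_z = jump_up(b, z + eps, b2_hat, c21, jac)
    d11 = (shifted_b[0] - base[0]) / eps   # d b1 / d b
    d21 = (shifted_b[1] - base[1]) / eps   # d b2 / d b
    d12 = (shifted_z[0] - base[0]) / eps   # d b1 / d z
    d22 = (shifted_z[1] - base[1]) / eps   # d b2 / d z
    return d11 * d22 - d12 * d21
```

Because the map is linear in (b, z), the finite-difference determinant agrees with J up to rounding error, matching the hand calculation on the Jacobian slide.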
Multivariate Updating of effects
• more computations when m > 2
• avoid matrix inverse
  – Cholesky decomposition of matrix
• simultaneous updates
  – effects at all loci
• accept new locus based on
  – sampled new genos at locus
  – sampled new effects at all loci

Multivariate Birth Acceptance
• sample effects around least squares estimator
• acceptance of locus involves effects update
• can combine with long distance locus updates
$$A = \frac{\pi(y \mid X^*_{m+1}; \theta_{m+1})\, \pi(\theta_{m+1}) / q(\theta_{m+1})}{\pi(y \mid X^*_m; \theta_m)\, \pi(\theta_m) / q(\theta_m)}$$
$$(\mu, b^*) \sim \mathrm{Normal}(\beta_0, D)$$
$$(\mu, b^*) \mid X^*, y, \sigma^2 \sim \mathrm{Normal}\big( \hat\beta,\ \sigma^2 (X^{*T} X^* + \sigma^2 D^{-1})^{-1} \big)$$

VII: RJMCMC & Bayes Factors
• RJ-MCMC & Bayes factors
  – Bayes factors from RJ-MCMC chain
  – components of Bayes factors

How To Infer loci?
• if m is known, use fixed MCMC
  – histogram of loci
  – issue of bump hunting
• combining loci estimates in RJ-MCMC
  – some steps are from wrong model
    • too few loci (bias)
    • too many loci (variance/identifiability)
  – condition on number of loci
    • subsets of Markov chain
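The advice to avoid a matrix inverse by using a Cholesky decomposition can be sketched for a joint draw of the effects. This is an illustration under stated assumptions: the posterior is Normal with a known precision matrix P (for instance X'X / sigma^2 plus the prior precision) and mean solving P * mean = rhs. The helper names are hypothetical, and the routines are written in pure Python for small matrices rather than calling LAPACK.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low, rhs):
    """Forward substitution: solve L u = rhs."""
    out = []
    for i, row in enumerate(low):
        out.append((rhs[i] - sum(row[k] * out[k] for k in range(i))) / row[i])
    return out


def solve_upper(low, rhs):
    """Back substitution: solve L^T u = rhs using the lower factor L."""
    n = len(low)
    out = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * out[k] for k in range(i + 1, n))
        out[i] = (rhs[i] - s) / low[i][i]
    return out


def posterior_mean(precision, rhs):
    """Solve precision * mean = rhs with two triangular solves."""
    low = cholesky(precision)
    return solve_upper(low, solve_lower(low, rhs))


def draw_effects(precision, rhs, rng):
    """One draw from Normal(mean, precision^{-1}) without forming the inverse."""
    low = cholesky(precision)
    mean = solve_upper(low, solve_lower(low, rhs))
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    noise = solve_upper(low, z)   # covariance L^{-T} L^{-1} = precision^{-1}
    return [m + e for m, e in zip(mean, noise)]
```

The same factor L serves both the mean and the noise, so each simultaneous update of all effects costs one factorization and three triangular solves.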
[Figure: Brassica 8-week Data locus MCMC with m=2. Left: locus distance (cM) vs. MCMC run, plot symbol = locus number; right: frequency histogram of sampled loci]

[Figure: Jumping QTL number & loci. Top: locus distance (cM) vs. MCMC run, plot symbol = locus number; bottom: number of QTL vs. MCMC run]

[Figure: RJ-MCMC loci chain. Locus distance (cM) and number of QTL vs. MCMC run; panels thinned at run, run/10, run/100 and run/1000]

[Figure: Raw Histogram of loci. Density vs. distance (cM)]

[Figure: Conditional Histograms. Density vs. distance (cM); panels: m=1, m=2, m=3, m>3]
Bayes Factors
• ratio of posterior odds to prior odds
  – RJ-MCMC gives posterior on number of QTL
  – prior is Poisson
$$B_{12} = \frac{\pi(\text{model}_1 \mid y) / \pi(\text{model}_2 \mid y)}{\pi(\text{model}_1) / \pi(\text{model}_2)} = \frac{\pi(y \mid \text{model}_1)}{\pi(y \mid \text{model}_2)}$$

  BF(1:2)      2log(BF)    evidence for 1st
  < 1          < 0         negative
  1 to 3       0 to 2      negligible
  3 to 12      2 to 5      positive
  12 to 150    5 to 10     strong
  > 150        > 10        very strong

[Figure: #QTL for Brassica 8-week. Prior and posterior distributions of the number of QTL (0 to 8); panels: prior mean = 1, 2, 3, 4, 6, 10]

Bayes Factors & LODs
• others have tried arithmetic & harmonic mean
• why not geometric mean?
• terms that are averaged are log likelihoods...
$$\hat\pi(y \mid \text{model}_k) = \exp\!\left( \sum_{g=1}^{G} \log \pi(y \mid \theta_k^{(g)}; \text{model}_k) \Big/ G \right), \qquad g = 1, \ldots, G \ \text{MCMC runs}$$

Bayesian LOD
• Bayesian “LOD” computed at each step
  – based on LR given sampled genotypes and effects
  – can be larger or smaller than profile LOD
  – informal diagnostic of fit
  – combine to form geometric-mean estimates of Bayes factors
$$LOD(\lambda) = (\log_{10} e) \sum_{j=1}^{n} \ln \frac{\sum_{x=-1,0,1} \pi(x \mid \lambda)\, \pi(y_j \mid x; \hat\mu, \hat{b}^*, \hat\sigma^2)}{\pi(y_j \mid \hat\mu, b^* = 0, \hat{\hat\sigma}^2)}$$
$$BLOD = (\log_{10} e) \sum_{j=1}^{n} \ln \frac{\pi(y_j \mid x^*_j; \mu, b^*, \sigma^2)}{\pi(y_j \mid \mu, b^* = 0, \sigma^2)}$$

Compare LODs
• 200 simulations (only 100 for some)
• n = 100, 200; h^2 = 25, 30, 40%
• 2 QTL at 15, 65cM
• Bayesian vs. CIM-based LODs
  – Bayesian for simultaneous fit
  – classical for sum of CIM LODs at peaks
• plot symbol is number of CIM peaks
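The geometric-mean estimator and the per-step Bayesian LOD can be sketched directly from their formulas. The sketch assumes normal errors, so the normalizing constants cancel in the likelihood ratio, and uses hypothetical function names; `log_liks` stands for the log likelihoods recorded at each MCMC step.

```python
import math


def geometric_mean_estimate(log_liks):
    """exp of the average log likelihood across MCMC steps."""
    return math.exp(sum(log_liks) / len(log_liks))


def bayesian_lod(traits, genos, mu, effects, sigma2):
    """BLOD at one MCMC step: log10 LR of the sampled model vs. all effects set to 0."""
    log_lr = 0.0
    for y, row in zip(traits, genos):
        fitted = mu + sum(b * x for b, x in zip(effects, row))
        # normal log-density difference; the constants cancel
        log_lr += ((y - mu) ** 2 - (y - fitted) ** 2) / (2.0 * sigma2)
    return math.log10(math.e) * log_lr
```

Averaging `bayesian_lod` over the chain gives a mean LOD of the kind compared against the sum of CIM LODs in the simulation study.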
[Figure: Comparing LODs. RJ-MCMC mean LOD vs. QTL Cart sum of LODs; panels: n = 100, 200 by h^2 = 25, 30, 40; plot symbol = number of CIM peaks]

VIII: RJ-MCMC Software
• General MCMC software
  – U Bristol links
    • http://www.stats.bris.ac.uk/MCMC/pages/links.html
  – BUGS (Bayesian inference Using Gibbs Sampling)
    • http://www.mrc-bsu.cam.ac.uk/bugs/
• Our MCMC software for QTLs
  – C code using LAPACK
    • http://www.stat.wisc.edu/~yandell/statgen
  – coming soon:
    • perl preprocessing (to/from QtlCart format)
    • R post processing
The Art of MCMC
• convergence issues
  – burn-in period & when to stop
• proper mixing of the chain
  – smart proposals & smart updates
• frequentist approach
  – simulated annealing: reaching the peak
  – simulated tempering: heating & cooling the chain
• Bayesian approach
  – influence of priors on posterior
  – Rao-Blackwell smoothing
• bump-hunting for mixtures (e.g. QTL)

Bayes Factor References
• MA Newton & AE Raftery (1994) “Approximate Bayesian inference with the weighted likelihood bootstrap”, J Royal Statist Soc B 56: 3-48.
• RE Kass & AE Raftery (1995) “Bayes factors”, J Amer Statist Assoc 90: 773-795.
• JM Satagopan, MA Newton & AE Raftery (2000) “On the harmonic mean estimator of marginal probability”, ms in prep, mailto:satago@biosta.mskcc.org.

Reversible Jump MCMC References
• PJ Green (1995) “Reversible jump Markov chain Monte Carlo computation and Bayesian model determination”, Biometrika 82: 711-732.
• S Richardson & PJ Green (1997) “On Bayesian analysis of mixtures with an unknown number of components”, J Royal Statist Soc B 59: 731-792.
• BK Mallick (1995) “Bayesian curve estimation by polynomials of random order”, TR 95-19, Math Dept, Imperial College London.
• L Kuo & B Mallick (1996) “Bayesian variable selection for regression models”, ASA Proc Section on Bayesian Statistical Science, 170-175.

QTL Reversible Jump MCMC: Inbred Lines
• JM Satagopan & BS Yandell (1996) “Estimating the number of quantitative trait loci via Bayesian model determination”, Proc JSM Biometrics Section.
• DA Stephens & RD Fisch (1998) “Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo”, Biometrics 54: 1334-1347.
• MJ Sillanpaa & E Arjas (1998) “Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data”, Genetics 148: 1373-1388.
• R Waagepetersen & D Sorensen (1999) “Understanding reversible jump MCMC”, mailto:sorensen@inet.uni2.dk.
• PJ Gaffney (2001) PhD Thesis, UW-Madison.

QTL Reversible Jump MCMC: Pedigrees
• S Heath (1997) “Markov chain Monte Carlo segregation and linkage analysis for oligogenic models”, Am J Hum Genet 61: 748-760.
• I Hoeschele, P Uimari, FE Grignola, Q Zhang & KM Gage (1997) “Advances in statistical methods to map quantitative trait loci in outbred populations”, Genetics 147: 1445-1457.
• P Uimari & I Hoeschele (1997) “Mapping linked quantitative trait loci using Bayesian analysis and Markov chain Monte Carlo algorithms”, Genetics 146: 735-743.
• MJ Sillanpaa & E Arjas (1999) “Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data”, Genetics 151: 1605-1619.

Bayes & MCMC References
• CJ Geyer (1992) “Practical Markov chain Monte Carlo”, Statistical Science 7: 473-511.
• L Tierney (1994) “Markov chains for exploring posterior distributions”, The Annals of Statistics 22: 1701-1728 (with disc: 1728-1762).
• A Gelman, J Carlin, H Stern & D Rubin (1995) Bayesian Data Analysis, CRC/Chapman & Hall.
• BP Carlin & TA Louis (1996) Bayes and Empirical Bayes Methods for Data Analysis, CRC/Chapman & Hall.
• WR Gilks, S Richardson & DJ Spiegelhalter (Eds 1996) Markov Chain Monte Carlo in Practice, CRC/Chapman & Hall.

QTL References
• D Thomas & V Cortessis (1992) “A Gibbs sampling approach to linkage analysis”, Hum. Hered. 42: 63-76.
• I Hoeschele & P VanRaden (1993) “Bayesian analysis of linkage between genetic markers and quantitative trait loci. I. Prior knowledge”, Theor. Appl. Genet. 85: 953-960.
• I Hoeschele & P VanRaden (1993) “Bayesian analysis of linkage between genetic markers and quantitative trait loci. II. Combining prior knowledge with experimental evidence”, Theor. Appl.
Genet. 85: 946-952.
• SW Guo & EA Thompson (1994) “Monte Carlo estimation of mixed models for large complex pedigrees”, Biometrics 50: 417-432.
• JM Satagopan, BS Yandell, MA Newton & TC Osborn (1996) “A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo”, Genetics 144: 805-816.
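
Sketch: burn-in and mixing
The burn-in and mixing issues listed under The Art of MCMC can be illustrated with a small random-walk Metropolis sampler. This is only a sketch under stated assumptions, not the workshop's C software: the target (a standard normal), the step size, the starting value, the burn-in length, and the function names log_target, metropolis and lag1_autocorr are all hypothetical choices made for illustration.

```python
import math
import random


def log_target(theta):
    # Hypothetical target: standard normal log-density, up to a constant.
    return -0.5 * theta * theta


def metropolis(n_iter, step, start, seed=1):
    """Random-walk Metropolis; returns the chain and the acceptance rate."""
    rng = random.Random(seed)
    theta, chain, accepted = start, [], 0
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        # Accept with probability min(1, target(proposal) / target(theta)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        log_u = math.log(1.0 - rng.random())
        if log_u < log_target(proposal) - log_target(theta):
            theta, accepted = proposal, accepted + 1
        chain.append(theta)
    return chain, accepted / n_iter


def lag1_autocorr(xs):
    """Lag-1 autocorrelation; values near 1 suggest slow mixing."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den


# Start far from the mode on purpose, then discard the burn-in period.
chain, rate = metropolis(n_iter=20000, step=2.4, start=10.0)
burn_in = 2000  # assumed length; in practice judged from trace plots
kept = chain[burn_in:]
post_mean = sum(kept) / len(kept)
print(f"acceptance rate {rate:.2f}, posterior mean {post_mean:.2f}, "
      f"lag-1 autocorrelation {lag1_autocorr(kept):.2f}")
```

Rerunning with a much smaller or much larger step shows the trade-off behind "smart proposals": tiny steps are almost always accepted but move slowly, huge steps are mostly rejected, and both raise the lag-1 autocorrelation of the retained samples.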