Bayesian Model Selection for Quantitative Trait Loci with Markov chain Monte Carlo in Experimental Crosses

Brian S. Yandell
University of Wisconsin-Madison
www.stat.wisc.edu/~yandell/statgen
Jackson Laboratory, September 2004
September 2004 Jax Workshop © Brian S. Yandell

outline
1. What is the goal of QTL study?
2. Bayesian priors & posteriors
3. Model search using MCMC
   • Gibbs sampler and Metropolis-Hastings
   • Reversible jump MCMC
   • Fully MCMC approach (loci indicators)
4. Model assessment
   • Bayes factors
   • model selection diagnostics
   • simulation and Brassica napus example

1. what is the goal of QTL study?
• uncover underlying biochemistry
  – identify how networks function, break down
  – find useful candidates for (medical) intervention
  – epistasis may play key role
  – statistical goal: maximize number of correctly identified QTL
• basic science/evolution
  – how is the genome organized?
  – identify units of natural selection
  – additive effects may be most important (Wright/Fisher debate)
  – statistical goal: maximize number of correctly identified QTL
• select “elite” individuals
  – predict phenotype (breeding value) using suite of characteristics (phenotypes) translated into a few QTL
  – statistical goal: minimize prediction error

advantages of multiple QTL approach
• improve statistical power, precision
  – increase number of QTL that can be detected
  – better estimates of loci and effects: less bias, smaller intervals
• improve inference of complex genetic architecture
  – infer number of QTL and their pattern across chromosomes
  – construct “good” estimates of effects
    • gene action (additive, dominance) and epistatic interactions
  – assess relative contributions of different QTL
• improve estimates of genotypic values
  – want less bias (more accurate) and smaller variance (more precise)
  – balance in mean squared error = MSE = (bias)² + variance
• always a compromise…

why worry about multiple QTL?
• many, many QTL may affect most any trait
  – how many QTL are detectable with these data?
    • limits to useful detection (Bernardo 2000)
    • depends on sample size, heritability, environmental variation
  – consider probability that a QTL is in the model
    • avoid sharp in/out dichotomy
    • major QTL usually selected, minor QTL sampled infrequently
• build M = model = genetic architecture into model
  – M = {loci 1,2,…,m, plus interactions 12,13,…}
  – directly allow uncertainty in genetic architecture
  – model selection over number of QTL, genetic architecture
  – use Bayes factors and model averaging
    • to identify “better” models

Pareto diagram of QTL effects
[Figure: additive effect plotted against rank order of QTL, showing major QTL on linkage map, minor QTL, and polygenes]

interval mapping basics
• observed measurements
  – Y = phenotypic trait
  – X = markers & linkage map
    • i = individual index 1,…,n
• missing data
  – missing marker data
  – Q = QT genotypes
    • alleles QQ, Qq, or qq at locus
• unknown quantities
  – λ = QT locus (or loci)
  – θ = phenotype model parameters
  – m = number of QTL
• pr(Q|X,λ,m) genotype model
  – grounded by linkage map, experimental cross
  – recombination yields multinomial for Q given X
• pr(Y|Q,θ,m) phenotype model
  – distribution shape (assumed normal here)
  – unknown parameters θ (could be non-parametric)
[Diagram: observed X and Y, missing Q, unknown λ and θ; after Sen Churchill (2001)]

2. Bayesian priors for QTL
• genomic region = locus λ
  – may be uniform over genome
  – pr(λ | X) = 1 / length of genome
  – or may be restricted based on prior studies
• missing genotypes Q
  – depends on marker map and locus for QTL
  – pr(Q | X, λ)
  – genotype (recombination) model is formally a prior
• genotypic means and variance θ = (Gq, σ²)
  – pr(θ) = pr(Gq | σ²) pr(σ²)
  – use conjugate priors for normal phenotype
    • pr(Gq | σ²) = normal
    • pr(σ²) = inverse chi-square

Bayesian model posterior
• augment data (Y,X) with unknowns Q
• study unknowns (θ,λ,Q) given data (Y,X)
  – properties of posterior pr(θ,λ,Q | Y,X)
• sample from posterior in some clever way
  – multiple imputation or MCMC

$$\mathrm{pr}(\theta,\lambda,Q \mid Y,X) = \frac{\mathrm{pr}(Y \mid Q,\theta)\,\mathrm{pr}(Q \mid X,\lambda)\,\mathrm{pr}(\theta)\,\mathrm{pr}(\lambda \mid X)}{\mathrm{pr}(Y \mid X)}$$

$$\mathrm{pr}(\theta,\lambda \mid Y,X) = \sum_{Q} \mathrm{pr}(\theta,\lambda,Q \mid Y,X)$$

genotype prior model: pr(Q|X,λ)
• locus λ is distance along linkage map
  – map assumed known from earlier study
  – identifies flanking marker interval
• use flanking markers to approximate prior on Q
  – slight inaccuracy by ignoring multipoint map function
  – use next flanking markers if missing data

$$\mathrm{pr}(Q \mid X,\lambda) = \mathrm{pr}(\text{geno} \mid \text{map, locus}) \approx \mathrm{pr}(\text{geno} \mid \text{flanking markers, locus})$$

[Diagram: unknown genotype Q between flanking markers X_k and X_{k+1}]
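
The flanking-marker approximation to pr(Q|X,λ) can be sketched for a backcross as follows. This is a minimal illustration, not the R/bim implementation: it assumes the Haldane map function (no interference), genotypes coded 0/1, and fully observed flanking markers. The function names `haldane_recomb` and `bc_genotype_prior` are hypothetical.

```python
import math

def haldane_recomb(d_cM):
    """Recombination fraction for a map distance in cM (Haldane map function, an assumption)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))

def bc_genotype_prior(left, right, pos_left, pos_qtl, pos_right):
    """pr(Q = 1 | flanking marker genotypes, locus) for one backcross individual.

    left, right: flanking marker genotypes coded 0 or 1.
    pos_*: map positions in cM, with pos_left <= pos_qtl <= pos_right.
    """
    r1 = haldane_recomb(pos_qtl - pos_left)    # left marker to QTL
    r2 = haldane_recomb(pos_right - pos_qtl)   # QTL to right marker

    def weight(q):
        # A recombination event is needed on each side where q differs from the marker.
        w_left = r1 if q != left else 1.0 - r1
        w_right = r2 if q != right else 1.0 - r2
        return w_left * w_right

    w0, w1 = weight(0), weight(1)
    return w1 / (w0 + w1)

# Both flanking markers carry genotype 1 and the putative QTL sits mid-interval.
print(bc_genotype_prior(1, 1, 0.0, 10.0, 20.0))
```

When the flanking markers disagree and the locus is at the midpoint, the two genotypes are equally likely, which is the 1:1 recombinant case that phenotype information can later sharpen.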

how does phenotype Y improve posterior for genotype Q?
• what are probabilities for genotype Q between markers?
• recombinants AA:AB
  – all 1:1 if ignore Y
  – and if we use Y?
[Figure: phenotype (bp, roughly 90 to 120) plotted by genotype at flanking markers D4Mit41 and D4Mit214, with groups AA/AA, AA/AB, AB/AA, AB/AB]

posterior on QTL genotypes
• full conditional of Q given data, parameters
  – proportional to prior pr(Q | Xi, λ)
    • weight toward Q that agrees with flanking markers
  – proportional to likelihood pr(Yi | Q, θ)
    • weight toward Q so that group mean GQ ≈ Yi
• phenotype and flanking markers may conflict
  – posterior recombination balances these two weights
• E step of EM algorithm

$$\mathrm{pr}(Q \mid Y_i, X_i, \theta, \lambda) = \frac{\mathrm{pr}(Q \mid X_i,\lambda)\,\mathrm{pr}(Y_i \mid Q,\theta)}{\mathrm{pr}(Y_i \mid X_i,\theta,\lambda)}$$

idealized phenotype model
• trait = mean + additive + error
• trait = effect_of_geno + error
• pr( trait | geno, effects )

$$Y = G_Q + E, \qquad \mathrm{pr}(Y \mid Q,\theta) = \mathrm{normal}(G_Q, \sigma^2)$$

[Figure: normal phenotype densities for genotypes qq, Qq, QQ over phenotype values 8 to 20]

priors & posteriors: normal data
[Figure: two panels, large prior variance and small prior variance, each showing the prior and the posteriors for n small and n large; x-axis y = phenotype values]

priors & posteriors: normal data
• model: Yi = μ + Ei
• environment: E ~ N(0, σ²), σ² known
• likelihood: Y ~ N(μ, σ²)
• prior: μ ~ N(μ0, κσ²), κ known
• posterior: mean tends to sample mean
  – single individual: μ ~ N(μ0 + B1(Y1 – μ0), B1σ²)
  – sample of n individuals:

$$\mu \sim N\!\left(B_n \bar{Y} + (1 - B_n)\mu_0,\; B_n \frac{\sigma^2}{n}\right) \quad\text{with}\quad \bar{Y} = \sum_{i=1}^{n} \frac{Y_i}{n}$$

• fudge factor (shrinks to 1):

$$B_n = \frac{\kappa n}{\kappa n + 1} \to 1$$

prior & posteriors: genotypic means Gq
[Figure: prior and posteriors (n small, n large) for genotypic means of qq, Qq, QQ; x-axis y = phenotype values]
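
The normal-data posterior can be checked numerically with a short sketch. It assumes the prior μ ~ N(μ0, κσ²) with σ² known and the shrinkage factor Bn = κn/(κn + 1); the function name `normal_posterior` is hypothetical.

```python
def normal_posterior(y, mu0, kappa, sigma2):
    """Posterior mean and variance of mu for Y_i ~ N(mu, sigma2), sigma2 known.

    Assumed prior: mu ~ N(mu0, kappa * sigma2).
    Returns (posterior mean, posterior variance, shrinkage factor B_n).
    """
    n = len(y)
    ybar = sum(y) / n
    b_n = kappa * n / (kappa * n + 1.0)      # tends to 1 as n grows
    post_mean = b_n * ybar + (1.0 - b_n) * mu0
    post_var = b_n * sigma2 / n
    return post_mean, post_var, b_n

# Small sample: the posterior mean sits between the prior mean and the sample mean.
print(normal_posterior([10.0, 12.0], mu0=0.0, kappa=1.0, sigma2=4.0))
```

With a large prior variance (large κ) or a large sample, Bn is close to 1 and the posterior mean is essentially the sample mean; with a small κ and few observations the estimate is pulled toward μ0.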

prior & posteriors: genotypic means Gq
• posterior centered on sample genotypic mean
• but shrunken slightly toward overall mean
• κ is related to heritability
• prior:

$$G_q \sim N(\bar{Y}, \kappa\sigma^2)$$

• posterior:

$$G_q \sim N\!\left(B_q \bar{Y}_q + (1 - B_q)\bar{Y},\; B_q \frac{\sigma^2}{n_q}\right), \qquad n_q = \mathrm{count}\{Q_i = q\}, \quad \bar{Y}_q = \sum_{\{i:\,Q_i = q\}} \frac{Y_i}{n_q}$$

• fudge factor:

$$B_q = \frac{\kappa n_q}{\kappa n_q + 1} \to 1$$

What if variance σ² is unknown?
• sample variance is proportional to chi-square
  – ns² / σ² ~ χ²(n)
  – likelihood of sample variance s² given n, σ²
• conjugate prior is inverse chi-square
  – ντ² / σ² ~ χ²(ν)
  – prior of population variance σ² given ν, τ²
• posterior is weighted average of likelihood and prior
  – (ντ² + ns²) / σ² ~ χ²(ν + n)
  – posterior of population variance σ² given n, s², ν, τ²
• empirical choice of hyper-parameters
  – τ² = s²/3, ν = 6
  – E(σ² | ν,τ²) = s²/2, Var(σ² | ν,τ²) = s⁴/4

multiple QTL phenotype model
• phenotype affected by genotype & environment
  – pr(Y | Q=q, θ) ~ N(Gq, σ²)
  – Y = GQ + environment
• partition genotypic mean into QTL effects
  – Gq = μ + β1q + . . . + βmq + β12q + . . .
  – Gq = mean + main effects + epistatic interactions
• general form of QTL effects for model M
  – Gq = μ + sum over j in M of βjq
  – |M| = number of terms in model M < 2^m
• can partition prior and posterior into effects βjq (details omitted)

prior & posterior on number of QTL
• what prior on number of QTL?
  – uniform over some range
  – Poisson with prior mean
  – geometric with prior mean
• prior influences posterior
  – good: reflects prior belief
    • push data in discovery process
  – bad: skeptic revolts!
    • “answer” depends on “guess”
[Figure: prior probability against m = number of QTL (0 to 10) for exponential (e), Poisson (p), and uniform (u) priors]
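
The three candidate priors on the number of QTL can be tabulated with a few lines of code. This is only a sketch: the parameterizations below (geometric on m = 0, 1, 2, … with success probability 1/(1 + mean); uniform on 0..max_m) are assumptions chosen so that each prior has an explicit mean or range, and the function names are hypothetical.

```python
import math

def poisson_prior(m, mean):
    """Poisson prior pr(m) with the given prior mean."""
    return math.exp(-mean) * mean ** m / math.factorial(m)

def geometric_prior(m, mean):
    """Geometric prior on m = 0, 1, 2, ... with the given prior mean (assumed form)."""
    p = 1.0 / (1.0 + mean)
    return p * (1.0 - p) ** m

def uniform_prior(m, max_m):
    """Uniform prior over 0..max_m (assumed range)."""
    return 1.0 / (max_m + 1) if 0 <= m <= max_m else 0.0

# Compare the priors for m = 0..10 with prior mean 3 and range 0..10.
for m in range(11):
    print(m, round(poisson_prior(m, 3.0), 4),
          round(geometric_prior(m, 3.0), 4),
          round(uniform_prior(m, 10), 4))
```

Printing the three columns side by side makes the point of the slide concrete: the geometric prior favors small m, the Poisson prior peaks near its mean, and the uniform prior is flat, so the posterior on m should always be read relative to the prior used.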
3. QTL Model Search using MCMC
• construct Markov chain around posterior
  – want posterior as stable distribution of Markov chain
  – in practice, the chain tends toward stable distribution
    • initial values may have low posterior probability
    • burn-in period to get chain mixing well
• update m-QTL model components from full conditionals
  – update locus λ given Q,X (using Metropolis-Hastings step)
  – update genotypes Q given λ,θ,Y,X (using Gibbs sampler)
  – update effects θ given Q,Y (using Gibbs sampler)

$$(\lambda, Q, \theta, m) \sim \mathrm{pr}(\lambda, Q, \theta, m \mid Y, X)$$

$$(\lambda, Q, \theta, m)_1 \to (\lambda, Q, \theta, m)_2 \to \cdots \to (\lambda, Q, \theta, m)_N$$

Gibbs sampler idea
• two correlated normals (genotypic means in BC)
  – could draw samples from both together
  – but easier to sample one at a time
• Gibbs sampler:
  – sample each from its full conditional
  – pick order of sampling at random
  – repeat N times

$$G_{QQ} \sim N(0,1), \quad G_{Qq} \sim N(0,1), \quad\text{but}\quad \mathrm{cor}(G_{QQ}, G_{Qq}) = \rho$$

$$G_{QQ} \text{ given } G_{Qq} \sim N(\rho\, G_{Qq},\, 1 - \rho^2), \qquad G_{Qq} \text{ given } G_{QQ} \sim N(\rho\, G_{QQ},\, 1 - \rho^2)$$

Gibbs sampler samples: ρ = 0.6
[Figure: Gibbs draws of mean 1 and mean 2 for N = 50 and N = 200 samples, shown as traces against Markov chain index and as scatterplots of mean 2 against mean 1]
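
The Gibbs sampler idea for two correlated standard normals can be sketched in a few lines. This is an illustrative toy rather than the QTL sampler itself; the function name `gibbs_two_normals` and the starting values are assumptions.

```python
import random

def gibbs_two_normals(rho, n_samples, seed=0):
    """Gibbs sampler for two standard normals with correlation rho.

    Each update draws one coordinate from its full conditional
    N(rho * other, 1 - rho^2); the update order is picked at random.
    """
    rng = random.Random(seed)
    sd = (1.0 - rho ** 2) ** 0.5
    g1, g2 = 0.0, 0.0                      # arbitrary starting values
    samples = []
    for _ in range(n_samples):
        if rng.random() < 0.5:
            g1 = rng.gauss(rho * g2, sd)
            g2 = rng.gauss(rho * g1, sd)
        else:
            g2 = rng.gauss(rho * g1, sd)
            g1 = rng.gauss(rho * g2, sd)
        samples.append((g1, g2))
    return samples

draws = gibbs_two_normals(rho=0.6, n_samples=200)
print(draws[:3])
```

Plotting the draws for N = 50 and N = 200 reproduces the qualitative picture in the slides: short chains cover the joint distribution unevenly, while longer chains fill in the correlated cloud.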
How to sample a locus λ?
• cannot easily sample from locus full conditional
  – pr(λ | Y,X,θ,Q) = pr(λ | X,Q) = pr(λ) pr(Q | X,λ) / constant
• to explicitly determine constant, must average
  – over all possible genotypes
  – over entire map
• Gibbs sampler will not work in general
  – but can use method based on ratios of probabilities
  – Metropolis-Hastings is extension of Gibbs sampler

Metropolis-Hastings idea
• want to study distribution f(θ)
• take Monte Carlo samples
  – unless too complicated
• Metropolis-Hastings samples:
  – current sample value θ
  – propose new value θ*
    • from some distribution g(θ,θ*)
    • Gibbs sampler: g(θ,θ*) = f(θ*)
  – accept new value with prob A
    • Gibbs sampler: A = 1

$$A = \min\!\left(1,\; \frac{f(\theta^*)\, g(\theta^*, \theta)}{f(\theta)\, g(\theta, \theta^*)}\right)$$

[Figure: target density f(θ) and proposal density g(θ–θ*)]

MCMC realization
• added twist: occasionally propose from whole domain
[Figure: mcmc sequence (up to 10000 iterations) plotted against θ, alongside the estimated density pr(θ|Y)]
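
A minimal Metropolis-Hastings sketch shows why only ratios of f are needed. It assumes a symmetric normal random-walk proposal, so g(θ*,θ) = g(θ,θ*) and the acceptance probability reduces to min(1, f(θ*)/f(θ)); the function name and the standard normal target are illustrative choices, not part of the original slides.

```python
import math
import random

def metropolis_hastings(log_f, theta0, step, n_samples, seed=0):
    """Random-walk Metropolis-Hastings for an unnormalized log density log_f.

    Proposal: theta* ~ N(theta, step^2), which is symmetric, so the
    proposal terms cancel in the acceptance ratio.
    """
    rng = random.Random(seed)
    theta = theta0
    draws = []
    for _ in range(n_samples):
        proposal = rng.gauss(theta, step)
        log_ratio = log_f(proposal) - log_f(theta)
        accept_prob = math.exp(min(0.0, log_ratio))   # A = min(1, f*/f)
        if rng.random() < accept_prob:
            theta = proposal                          # accept; otherwise keep current value
        draws.append(theta)
    return draws

# Target known only up to a constant: standard normal, log f = -theta^2 / 2.
draws = metropolis_hastings(lambda t: -0.5 * t * t, theta0=0.0, step=2.0, n_samples=5000)
print(sum(draws) / len(draws))
```

The `step` argument plays the role of the narrow versus wide proposal g: a very small step accepts almost everything but moves slowly, while a very large step is rejected often.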

Metropolis-Hastings samples
[Figure: mcmc sequence and estimated pr(θ|Y) for N = 200 and N = 1000 samples, each with a narrow g and a wide g proposal]

sampling multiple loci
[Diagram: genome from 0 to L with loci λ1, λ2, …, λm and a proposed new locus λm+1]
action steps: draw one of three choices
• update m-QTL model with probability 1-b(m+1)-d(m)
  – update current model using full conditionals
  – sample m QTL loci, effects, and genotypes
• add a locus with probability b(m+1)
  – propose a new locus along genome
  – innovate new genotypes at locus and phenotype effect
  – decide whether to accept the “birth” of new locus
• drop a locus with probability d(m)
  – propose dropping one of existing loci
  – decide whether to accept the “death” of locus

reversible jump MCMC
• consider known genotypes Q at 2 known loci λ
  – models with 1 or 2 QTL
• jump between 1-QTL and 2-QTL models
  – adjust parameters when model changes
  – β and β1 differ due to collinearity of QTL genotypes

$$m = 1:\; Y = \mu + \beta\, Q_1 + e \qquad\qquad m = 2:\; Y = \mu + \beta_1 Q_1 + \beta_2 Q_2 + e$$

geometry of reversible jump
[Figure: two panels, “Move Between Models” and “Reversible Jump Sequence”, plotting b2 (β2) against b1 (β1) for models m=1 and m=2, with c21 = 0.7]

geometry allowing Q and λ to change
[Figure: two panels, “a short sequence” and “first 1000 with m<3”, plotting b2 (β2) against b1 (β1)]
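
Why the 1-QTL effect β differs from β1 in the 2-QTL model can be seen with ordinary least squares on simulated linked loci. This sketch is an illustration, not reversible jump itself: it assumes c21 denotes the regression coefficient of Q2 on Q1 (an interpretation of the figure label), and the simulation settings and function names are hypothetical.

```python
import random

def center(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_one_and_two_qtl(y, q1, q2):
    """Least-squares effects for the 1-QTL (Q1 only) and 2-QTL (Q1, Q2) models."""
    yc, a, b = center(y), center(q1), center(q2)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    beta = s1y / s11                          # m = 1: Y = mu + beta * Q1 + e
    det = s11 * s22 - s12 ** 2
    beta1 = (s22 * s1y - s12 * s2y) / det     # m = 2: effect of Q1
    beta2 = (s11 * s2y - s12 * s1y) / det     # m = 2: effect of Q2
    c21 = s12 / s11                           # regression of Q2 on Q1
    return beta, beta1, beta2, c21

def simulate_linked_backcross(n, recomb, seed=0):
    """Toy backcross with two linked loci; the effects 1.0 and 0.5 are made up."""
    rng = random.Random(seed)
    q1 = [rng.randint(0, 1) for _ in range(n)]
    q2 = [q if rng.random() > recomb else 1 - q for q in q1]
    y = [10.0 + 1.0 * a + 0.5 * b + rng.gauss(0.0, 1.0) for a, b in zip(q1, q2)]
    return y, q1, q2

y, q1, q2 = simulate_linked_backcross(n=500, recomb=0.15)
beta, beta1, beta2, c21 = fit_one_and_two_qtl(y, q1, q2)
print(beta, beta1 + c21 * beta2)   # the two numbers agree
```

The identity β = β1 + c21·β2 holds exactly for least-squares fits, which is the kind of parameter adjustment a jump between the 1-QTL and 2-QTL models has to respect when genotypes are collinear.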

Gibbs sampler with loci indicators
• partition genome into intervals
  – at most one QTL per interval
  – interval = marker interval or large chromosome region
• use loci indicators in each interval
  – γ = 1 if QTL in interval
  – γ = 0 if no QTL
• Gibbs sampler on loci indicators
  – still need to adjust genetic effects for collinearity of Q
  – see work of Nengjun Yi (and earlier work of Ina Hoeschele)

$$Y = \mu + \gamma_1 \beta_1 Q_1 + \gamma_2 \beta_2 Q_2 + e$$

epistatic interactions
• model space issues
  – 2-QTL interactions only?
  – Fisher-Cockerham partition vs. tree-structured?
  – general interactions among multiple QTL
• model search issues
  – epistasis between significant QTL
    • check all possible pairs when QTL included?
    • allow higher order epistasis?
  – epistasis with non-significant QTL
    • whole genome paired with each significant QTL?
    • pairs of non-significant QTL?
• Yi Xu (2000) Genetics; Yi, Xu, Allison (2003) Genetics; Yi (2004)

limits of epistatic inference
• power to detect effects
  – epistatic model size grows exponentially
    • |M| = 3^m for general interactions
  – power depends on ratio of n to model size
    • want n / |M| to be fairly large (say > 5)
    • n = 100, m = 3, n / |M| ≈ 4
• empty cells mess up adjusted (Type 3) tests
  – missing q1Q2 / q1Q2 or q1Q2q3 / q1Q2q3 genotype
  – null hypotheses not what you would expect
  – can confound main effects and interactions
  – can bias AA, AD, DA, DD partition

4. Model Assessment
• balance model fit against model complexity

                  smaller model        bigger model
  model fit       miss key features    fits better
  prediction      may be biased        no bias
  interpretation  easier               more complicated
  parameters      low variance         high variance

• information criteria: penalize L by model size |M|
  – compare IC = – 2 log L( M | Y ) + penalty( M )
• Bayes factors: balance posterior by prior choice
  – compare pr( data Y | model M )
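
The information-criterion form IC = –2 log L(M|Y) + penalty(M) can be made concrete with two standard penalties. This is a sketch under stated assumptions: AIC (penalty 2p) and BIC (penalty p log n) are used as examples of penalty(M), with p taken as the number of model terms; the log-likelihood values in the usage lines are made up for illustration.

```python
import math

def information_criterion(log_lik, n_terms, n_obs, kind="BIC"):
    """IC = -2 log L + penalty, with penalty 2p (AIC) or p log(n) (BIC)."""
    if kind == "AIC":
        penalty = 2.0 * n_terms
    elif kind == "BIC":
        penalty = n_terms * math.log(n_obs)
    else:
        raise ValueError("kind must be 'AIC' or 'BIC'")
    return -2.0 * log_lik + penalty

# Hypothetical fits: the bigger model has the higher likelihood but more terms.
small = information_criterion(log_lik=-150.0, n_terms=2, n_obs=100)
big = information_criterion(log_lik=-147.0, n_terms=5, n_obs=100)
print(small, big)   # the smaller IC is preferred
```

Because the BIC penalty grows with log n, a bigger model has to improve the likelihood by more than (p2 – p1) log(n) / 2 on the log scale before it is preferred.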

QTL Bayes factors
• BF = posterior odds / prior odds
• BF equivalent to BIC
  – simple comparison: 1 vs 2 QTL
    • same as LOD test
  – general comparison of models
  – want Bayes factor >> 1
• m = number of QTL
  – indexes model complexity
  – genetic architecture also important

$$BF_{m,m+1} = \frac{\mathrm{pr}(m \mid \text{data}) / \mathrm{pr}(m)}{\mathrm{pr}(m+1 \mid \text{data}) / \mathrm{pr}(m+1)}$$

[Figure: prior probability against m = number of QTL (0 to 10) for exponential (e), Poisson (p), and uniform (u) priors]

Bayes factors to assess models
• Bayes factor: which model best supports the data?
  – ratio of posterior odds to prior odds
  – ratio of model likelihoods
• equivalent to LR statistic when
  – comparing two nested models
  – simple hypotheses (e.g. 1 vs 2 QTL)
• Bayes Information Criteria (BIC)
  – Schwarz introduced for model selection in general settings
  – penalty to balance model size (p = number of parameters)

$$B_{12} = \frac{\mathrm{pr}(\text{model}_1 \mid Y) / \mathrm{pr}(\text{model}_2 \mid Y)}{\mathrm{pr}(\text{model}_1) / \mathrm{pr}(\text{model}_2)} = \frac{\mathrm{pr}(Y \mid \text{model}_1)}{\mathrm{pr}(Y \mid \text{model}_2)}$$

$$-2\log(B_{12}) = -2\log(LR) - (p_2 - p_1)\log(n)$$

BF sensitivity to fixed prior for effects
[Figure: Bayes factors B12, B23, B34, B45 plotted against hyper-prior heritability h² (0.05 to 50, log scale)]

$$\beta_{jq} \sim N\!\left(0,\; \frac{h^2 s^2}{|M|}\right), \quad h^2 \text{ fixed}$$

BF insensitivity to random effects prior
[Figure: left panel, hyper-prior density 2*Beta(a,b) against hyper-parameter heritability h² for (a,b) = (0.25,9.75), (0.5,9.5), (1,9), (2,10), (1,3), (1,1); right panel, insensitivity to hyper-prior, Bayes factors B12, B23, B34 against E(h²)]

$$\beta_{jq} \sim N\!\left(0,\; \frac{h^2 s^2}{|M|}\right), \quad \frac{h^2}{2} \sim \mathrm{Beta}(a, b)$$
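
Bayes factors for adjacent numbers of QTL can be estimated directly from MCMC output by dividing posterior frequencies by prior probabilities. The sketch below follows the BF(m, m+1) ratio of posterior-to-prior; the function name, the toy sample of m values, and the flat prior in the usage lines are assumptions for illustration.

```python
from collections import Counter

def bayes_factors_adjacent(m_samples, prior):
    """Estimate BF(m, m+1) = [pr(m|data)/pr(m)] / [pr(m+1|data)/pr(m+1)].

    m_samples: number of QTL recorded at each MCMC iteration.
    prior: function returning the prior probability pr(m).
    """
    counts = Counter(m_samples)
    n = len(m_samples)
    # posterior / prior ratio for every sampled m with positive prior mass
    ratio = {m: (counts[m] / n) / prior(m) for m in counts if prior(m) > 0}
    return {m: ratio[m] / ratio[m + 1] for m in ratio if m + 1 in ratio}

# Toy chain: m = 2 is visited most often; a flat prior is assumed over the range.
toy_chain = [1] * 20 + [2] * 60 + [3] * 20
print(bayes_factors_adjacent(toy_chain, prior=lambda m: 0.1))
```

Values of m that are rarely visited give noisy ratios, which is one reason to look at error bars on the estimated Bayes factors rather than point values alone.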

simulations and data studies
• simulated F2 intercross, 8 QTL
  – (Stephens, Fisch 1998)
  – n=200, heritability = 50%
  – detected 3 QTL
• increase to detect all 8
  – n=500, heritability to 97%

  QTL   chr   loci   effect
  1     1     11     -3
  2     1     50     -5
  3     3     62     +2
  4     6     107    -3
  5     6     152    +3
  6     8     32     -4
  7     8     54     +1
  8     9     195    +2

[Figure: posterior for number of QTL (frequency in %, m = 7 to 13) and genetic map of chromosomes ch1 to ch10 (0 to 200)]

loci pattern across genome
• notice which chromosomes have persistent loci
• best pattern found 42% of the time

        Chromosome
  m     1  2  3  4  5  6  7  8  9  10    Count of 8000
  8     2  0  1  0  0  2  0  2  1  0     3371
  9     3  0  1  0  0  2  0  2  1  0     751
  7     2  0  1  0  0  2  0  1  1  0     377
  9     2  0  1  0  0  2  0  2  1  0     218
  9     2  0  1  0  0  3  0  2  1  0     218
  9     2  0  1  0  0  2  0  2  2  0     198

B. napus 8-week vernalization whole genome study
• 108 plants from doubled haploid
  – similar genetics to backcross: follow 1 gamete
  – parents are Major (biennial) and Stellar (annual)
• 300 markers across genome
  – 19 chromosomes
  – average 6cM between markers
    • median 3.8cM, max 34cM
  – 83% markers genotyped
• phenotype is days to flowering
  – after 8 weeks of vernalization (cooling)
  – Major parent requires vernalization to flower
• available in R/bim package
• Ferreira et al. (1994); Kole et al. (2001); Schranz et al. (2002)
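
A loci pattern table of the kind shown for the simulation (count of each chromosome pattern out of all MCMC samples) could be tallied along the following lines. This is a hedged sketch, not the R/bim code: the input format (one list of chromosome numbers per MCMC sample) and the function name `pattern_posterior` are assumptions, and the toy data are made up.

```python
from collections import Counter

def pattern_posterior(chrom_samples, n_chrom):
    """Tally QTL-per-chromosome patterns across MCMC samples.

    chrom_samples: one list per MCMC sample giving the chromosome (1-based)
    of each QTL in the sampled model.
    Returns (pattern, count, frequency) tuples, most frequent first.
    """
    patterns = Counter()
    for chroms in chrom_samples:
        per_chrom = [0] * n_chrom
        for c in chroms:
            per_chrom[c - 1] += 1
        patterns[tuple(per_chrom)] += 1
    total = len(chrom_samples)
    return [(p, k, k / total) for p, k in patterns.most_common()]

# Toy input: four sampled models on a 3-chromosome genome.
toy = [[1, 1, 3], [1, 3, 1], [1, 3], [1, 1, 3]]
for pattern, count, freq in pattern_posterior(toy, n_chrom=3):
    print(pattern, count, freq)
```

The frequency of the top pattern is the quantity summarized as "best pattern found 42% of the time" (3371 of 8000 samples) in the simulation table.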

Markov chain Monte Carlo sequence
• burnin (sets up chain)
• mcmc sequence
• number of QTL
• environmental variance
• h² = heritability (genetic/total variance)
• LOD = likelihood
[Figure: burnin sequence and mcmc sequence (0 to 1e+06) traces for nqtl, envvar, herit, and LOD]

MCMC sampled loci
• subset of chromosomes N2, N3, N16
• points jittered for view
• blue lines at markers
• note concentration on chromosome N2
• includes all models
[Figure: MCMC sampled loci by chromosome (N2, N3, N16), with marker names along each chromosome]

Bayesian model assessment
• row 1: # QTL
• row 2: pattern
• col 1: posterior
• col 2: Bayes factor
• note error bars on bf
• evidence suggests 4-5 QTL
  – N2(2-3), N3, N16
[Figure: QTL posterior by number of QTL and pattern posterior by model index, each paired with Bayes factor ratios (posterior / prior) marked weak, moderate, strong; pattern labels include 2*2; 2:2,12; 3:2*2,12; 2:2,13; 2:2,3; 2:2,16; 2:2,11; 3:2*2,3; 2:2,15; 4:2*2,3,16; 3:2*2,13; 2:2,14]

Bayesian estimates of loci & effects
• model averaging: at least 4 QTL
• histogram of loci
  – blue line is density
  – red lines at estimates
• estimate additive effects (red circles)
  – grey points sampled from posterior
  – blue line is cubic spline
  – dashed line for 2 SD
[Figure: napus8 summaries with pattern 1,1,2,3 and m ≥ 4; loci histogram and additive effects along chromosomes N2, N3, N16]

Bayesian model diagnostics
• pattern: N2(2),N3,N16
• col 1: density
• col 2: boxplots by m
• environmental variance σ² = .008, σ = .09
• heritability h² = 52%
• LOD = 16 (highly significant)
• but note change with m
[Figure: marginal densities (m ≥ 4) of envvar, heritability, and LOD, each paired with boxplots conditional on number of QTL]

Bayesian software for QTLs
• R/bim (Satagopan Yandell 1996; Gaffney 2001)
  – www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl
  – www.r-project.org contributed package
  – version available within WinQTLCart (statgen.ncsu.edu/qtlcart)
• Bayesian IM with epistasis (Nengjun Yi, UAB)
  – separate C++ software (papers with Xu)
  – plans in progress to incorporate into R/bim
• R/qtl (Broman et al. 2003)
  – biosun01.biostat.jhsph.edu/~kbroman/software
  – www.r-project.org contributed package
• Pseudomarker (Sen Churchill 2002)
  – www.jax.org/staff/churchill/labsite/software
• Bayesian QTL / Multimapper
  – Sillanpää Arjas (1998)
  – www.rni.helsinki.fi/~mjs
• Stephens & Fisch (email)

R/bim: our software
• www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl
  – R contributed library (www.r-project.org)
    • library(bim) is cross-compatible with library(qtl)
  – Bayesian module within WinQTLCart
    • WinQTLCart output can be processed using R library
• Software history
  – initially designed by JM Satagopan (1996)
  – major revision and extension by PJ Gaffney (2001)
    • whole genome
    • multivariate update of effects; long range position updates
    • substantial improvements in speed, efficiency
    • pre-burnin: initial prior number of QTL very large
  – upgrade (H Wu, PJ Gaffney, CF Jin, BS Yandell 2003)
  – epistasis in progress (H Wu, BS Yandell, N Yi 2004)

many thanks
Michael Newton, Daniel Sorensen, Daniel Gianola, Liang Li, Hong Lan, Hao Wu, Nengjun Yi, David Allison
Tom Osborn, David Butruille, Marcio Ferrera, Josh Udahl, Pablo Quijada
Alan Attie, Jonathan Stoehr, Gary Churchill
Jaya Satagopan, Fei Zou, Patrick Gaffney, Chunfang Jin, Yang Song, Elias Chaibub Neto, Xiaodan Wei
USDA Hatch, NIH/NIDDK, Jackson Labs