Bayesian Model Selection for Quantitative Trait Loci with Markov chain Monte Carlo

advertisement
Bayesian Model Selection
for Quantitative Trait Loci
with Markov chain Monte Carlo
in Experimental Crosses
Brian S. Yandell
University of Wisconsin-Madison
www.stat.wisc.edu/~yandell/statgen↑
Jackson Laboratory, September 2004
September 2004
Jax Workshop © Brian S. Yandell
1
outline
1. What is the goal of QTL study?
2. Bayesian priors & posteriors
3. Model search using MCMC
•
•
•
Gibbs sampler and Metropolis-Hastings
Reversible jump MCMC
Fully MCMC approach (loci indicators)
4. Model assessment
•
•
•
Bayes factors
model selection diagnostics
simulation and Brassica napus example
September 2004
Jax Workshop © Brian S. Yandell
2
1. what is the goal of QTL study?
• uncover underlying biochemistry
–
–
–
–
identify how networks function, break down
find useful candidates for (medical) intervention
epistasis may play key role
statistical goal: maximize number of correctly identified QTL
• basic science/evolution
–
–
–
–
how is the genome organized?
identify units of natural selection
additive effects may be most important (Wright/Fisher debate)
statistical goal: maximize number of correctly identified QTL
• select “elite” individuals
– predict phenotype (breeding value) using suite of characteristics
(phenotypes) translated into a few QTL
– statistical goal: mimimize prediction error
September 2004
Jax Workshop © Brian S. Yandell
3
advantages of multiple QTL approach
• improve statistical power, precision
– increase number of QTL that can be detected
– better estimates of loci and effects: less bias, smaller intervals
• improve inference of complex genetic architecture
– infer number of QTL and their pattern across chromosomes
– construct “good” estimates of effects
• gene action (additive, dominance) and epistatic interactions
– assess relative contributions of different QTL
• improve estimates of genotypic values
– want less bias (more accurate) and smaller variance (more precise)
– balance in mean squared error = MSE = (bias)2 + variance
• always a compromise…
September 2004
Jax Workshop © Brian S. Yandell
4
why worry about multiple QTL?
• many, many QTL may affect most any trait
– how many QTL are detectable with these data?
• limits to useful detection (Bernardo 2000)
• depends on sample size, heritability, environmental variation
– consider probability that a QTL is in the model
• avoid sharp in/out dichotomy
• major QTL usually selected, minor QTL sampled infrequently
• build M = model = genetic architecture into model
–
–
–
–
M = {loci 1,2,…,m, plus interactions 12,13,…}
directly allow uncertainty in genetic architecture
model selection over number of QTL, genetic architecture
use Bayes factors and model averaging
• to identify “better” models
September 2004
Jax Workshop © Brian S. Yandell
5
minor
QTL
polygenes
1
2
major
QTL
0
3
additive effect
major QTL on
linkage map
3
Pareto diagram of QTL effects
2
1
September 2004
0
4
5
5
10
15
20
25
30
rank order of QTL
Jax Workshop © Brian S. Yandell
6
interval mapping basics
•
observed measurements
– Y = phenotypic trait
– X = markers & linkage map
observed
• i = individual index 1,…,n
•
missing
• alleles QQ, Qq, or qq at locus
•
Q
unknown quantities
–  = QT locus (or loci)
–  = phenotype model parameters
– m = number of QTL
unknown
pr(Q|X,,m) genotype model
– grounded by linkage map, experimental cross
– recombination yields multinomial for Q given X
•
Y
missing data
– missing marker data
– Q = QT genotypes
•
X
after
Sen Churchill (2001)
pr(Y|Q,,m) phenotype model
– distribution shape (assumed normal here)
– unknown parameters  (could be non-parametric)
September 2004
Jax Workshop © Brian S. Yandell
7
2. Bayesian priors for QTL
• genomic region = locus 
– may be uniform over genome
– pr( | X ) = 1 / length of genome
– or may be restricted based on prior studies
• missing genotypes Q
– depends on marker map and locus for QTL
– pr( Q | X,  )
– genotype (recombination) model is formally a prior
• genotypic means and variance  = ( Gq, 2 )
– pr(  ) = pr( Gq | 2 ) pr(2 )
– use conjugate priors for normal phenotype
• pr( Gq | 2 ) = normal
• pr(2 ) = inverse chi-square
September 2004
Jax Workshop © Brian S. Yandell
8
Bayesian model posterior
• augment data (Y,X) with unknowns Q
• study unknowns (,,Q) given data (Y,X)
– properties of posterior pr(,,Q | Y, X )
• sample from posterior in some clever way
– multiple imputation or MCMC
pr (Y | Q, )pr (Q | X ,  )pr ( )pr ( | X )
pr ( ,  , Q | Y , X ) 
pr (Y | X )
pr ( ,  | Y , X )  sum Q pr ( ,  , Q | Y , X )
September 2004
Jax Workshop © Brian S. Yandell
9
genotype prior model: pr(Q|X,)
• locus  is distance along linkage map
– map assumed known from earlier study
–  identifies flanking marker interval
• use flanking markers to approximate prior on Q
– slight inaccuracy by ignoring multipoint map function
– use next flanking markers if missing data
pr(Q|X,) = pr(geno | map, locus) 
pr(geno | flanking markers, locus)

Xk
September 2004
Q?
X k 1
Jax Workshop © Brian S. Yandell
10
how does phenotype Y improve
posterior for genotype Q?
D4Mit41
D4Mit214
what are probabilities
for genotype Q
between markers?
120
bp
110
recombinants AA:AB
100
all 1:1 if ignore Y
and if we use Y?
90
AA
AA
AB
AA
AA
AB
AB
AB
Genotype
September 2004
Jax Workshop © Brian S. Yandell
11
posterior on QTL genotypes
• full conditional of Q given data, parameters
– proportional to prior pr(Q | Xi,  )
• weight toward Q that agrees with flanking markers
– proportional to likelihood pr(Yi|Q,)
• weight toward Q so that group mean GQ  Yi
• phenotype and flanking markers may conflict
– posterior recombination balances these two weightsE
• E step of EM algorithm
pr (Q | X i ,  ) pr (Yi | Q, )
pr (Q | Yi , X i , ,  ) 
pr (Yi | X i , ,  )
September 2004
Jax Workshop © Brian S. Yandell
12
idealized phenotype model
• trait = mean + additive + error
• trait = effect_of_geno + error
• pr( trait | geno, effects )
Y  GQ  E
Qq
pr (Y | Q, ) 
qq
QQ
normal( GQ ,  )
2
8
September 2004
9
10
11
12
13
Jax Workshop © Brian S. Yandell
14
15
16
17
18
19
20
13
priors & posteriors: normal data
large prior variance
n large
n large
n small
prior
n small prior
small prior variance
6
8
10
12
14
16
6
8
y = phenotype values
September 2004
Jax Workshop © Brian S. Yandell
10
12
14
16
y = phenotype values
14
priors & posteriors: normal data
model
environment
likelihood
prior
Yi =  + Ei
E ~ N( 0, 2 ), 2 known
Y ~ N( , 2 )
 ~ N( 0, 2 ),  known
posterior:
single individual
mean tends to sample mean
 ~ N( 0 + B1(Y1 – 0), B12)
sample of n individuals
fudge factor
(shrinks to 1)
September 2004

2 

 ~ N  BnY  (1  Bn ) 0 , Bn
n 

Yi
with Y  sum
n
n
Bn 
1
n  1
Jax Workshop © Brian S. Yandell
15
n large
n small prior
prior & posteriors: genotypic means Gq
6
qq
September 2004
8
10
Qq
12
y = phenotype values
Jax Workshop © Brian S. Yandell
14
16
QQ
16
prior & posteriors: genotypic means Gq
posterior centered on sample genotypic mean
but shrunken slightly toward overall mean
 is related to heritability
prior:
posterior:
fudge factor:
September 2004

Gq ~ N Y ,  2

2 



Gq ~ N  BqYq  (1  Bq )Y , Bq


n
q


Yi
nq  count {Qi  q}, Yq  sum
{i:Qi  q} n
q
Bq 
nq
nq  1
1
Jax Workshop © Brian S. Yandell
17
What if variance 2 is unknown?
• sample variance is proportional to chi-square
– ns2 / 2 ~ 2 ( n )
– likelihood of sample variance s2 given n, 2
• conjugate prior is inverse chi-square
– 2 / 2 ~ 2 (  )
– prior of population variance 2 given , 2
• posterior is weighted average of likelihood and prior
– (2+ns2) / 2 ~ 2 ( +n )
– posterior of population variance 2 given n, s2, , 2
• empirical choice of hyper-parameters
– 2= s2/3, =6
– E(2 | ,2) = s2/2, Var(2 | ,2) = s4/4
September 2004
Jax Workshop © Brian S. Yandell
18
multiple QTL phenotype model
• phenotype affected by genotype & environment
pr(Y|Q=q,) ~ N(Gq , 2)
Y = GQ + environment
• partition genotypic mean into QTL effects
Gq = 
+ 1q + . . . + mq
Gq = mean + main effects
+ 12q + . . .
+ epistatic interactions
• general form of QTL effects for model M
Gq = 
+ sumj in M jq
|M| = number of terms in model M < 2m
• can partition prior and posterior into effects jq
(details omitted)
September 2004
Jax Workshop © Brian S. Yandell
19
prior & posterior on number of QTL
• what prior on number of QTL?
• push data in discovery process
– bad: skeptic revolts!
• “answer” depends on “guess”
e
e
0
September 2004
Jax Workshop © Brian S. Yandell
e
p
u
exponential
Poisson
uniform
p p
e u u
p u u
u u u
p
e
p
e
e p
p
e
0.00
– good: reflects prior belief
prior probability
0.10
0.20
• prior influences posterior
0.30
– uniform over some range
– Poisson with prior mean
– geometric with prior mean
e
p e e
p
u p
u p
u e
u
2
4
6
8
m = number of QTL
20
10
3. QTL Model Search using MCMC
• construct Markov chain around posterior
– want posterior as stable distribution of Markov chain
– in practice, the chain tends toward stable distribution
• initial values may have low posterior probability
• burn-in period to get chain mixing well
• update m-QTL model components from full conditionals
– update locus  given Q,X (using Metropolis-Hastings step)
– update genotypes Q given ,,Y,X (using Gibbs sampler)
– update effects  given Q,Y (using Gibbs sampler)
( , Q, , m) ~ pr ( , Q, , m | Y , X )
( , Q, , m)1  ( , Q, , m)2    ( , Q, , m) N
September 2004
Jax Workshop © Brian S. Yandell
21
Gibbs sampler idea
• two correlated normals (genotypic means in BC)
– could draw samples from both together
– but easier to sample one at a time
• Gibbs sampler:
– sample each from its full conditional
– pick order of sampling at random
– repeat N times
GQQ ~ N (0,1); GQq ~ N (0,1) but cor (GQQ , GQq )  

~ N G

,1   
GQQ given GQq ~ N GQq ,1   2
GQq given GQQ
September 2004
QQ
2
Jax Workshop © Brian S. Yandell
22
Gibbs sampler samples:  = 0.6
N = 200 samples
3
-2
1
0
-2
-1
Gibbs: mean 2
2
1
0
-1
Gibbs: mean 1
2
3
2
1
0
Gibbs: mean 2
-1
1
0
-1
-2
-2
Gibbs: mean 1
2
N = 50 samples
2
0
100
150
200
-2
Gibbs: mean 2
-1
0
1
2
3
Gibbs: mean 1
2
3
2
1
0
-2
-2
Gibbs: mean 2
50
Markov chain index
2
1
1
0
-1
3
2
1
0
-1
-2
Gibbs: mean 2
-1
Gibbs: mean 1
0
-2
-1
50
Gibbs: mean 2
40
-2
30
1
20
0
10
Markov chain index
-1
0
0
10
20
30
40
Markov chain index
September 2004
50
-2
-1
0
1
Gibbs: mean 1
2
0
50
100
150
Markov chain index
Jax Workshop © Brian S. Yandell
200
-2
-1
0
1
2
Gibbs: mean 1
23
3
How to sample a locus ?
• cannot easily sample from locus full conditional
pr( |Y,X,,Q) = pr(  | X,Q)
= pr( ) pr( Q | X,  ) / constant
• to explicitly determine constant, must average
– over all possible genotypes
– over entire map
• Gibbs sampler will not work in general
– but can use method based on ratios of probabilities
– Metropolis-Hastings is extension of Gibbs sampler
September 2004
Jax Workshop © Brian S. Yandell
24
Metropolis-Hastings idea
– unless too complicated
0.4
f()
0.2
• want to study distribution f()
• take Monte Carlo samples
• Gibbs sampler: A = 1
 f ( * ) g ( * , ) 

A  min 1,
* 
 f ( ) g ( , ) 
6
8
10
g(–*)
-4
September 2004
4
0.4
– accept new value with prob A
2
0.2
• from some distribution g(,*)
• Gibbs sampler: g(,*) = f(*)
0
0.0
– current sample value 
– propose new value *
0.0
• Metropolis-Hastings samples:
Jax Workshop © Brian S. Yandell
-2
0
2
4
25
0
0.0
0.1
2000
0.2
pr( |Y)
0.3
0.4
mcmc sequence
4000
6000
0.5
8000
0.6
10000
MCMC realization
0
2
4
6
8
10
2
3
4
5
6
7
8
added twist: occasionally propose from whole domain
September 2004
Jax Workshop © Brian S. Yandell
26
0 2 4 6 8
Jax Workshop © Brian S. Yandell
800
400
0.0 0.2 0.4 0.6
pr( |Y)
1.0
0.0
pr( |Y)
0 2 4 6 8
0
0
400
800
mcmc sequence
0 2 4 6 8
0 2 4 6 8
2.0
0.0 0.4 0.8 1.2
pr( |Y)
6
4
pr( |Y)
2
0
September 2004
N = 1000 samples
narrow g
wide g
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
mcmc sequence
150
0 50
150
mcmc sequence
N = 200 samples
narrow g
wide g
0 50
mcmc sequence
Metropolis-Hastings samples
0 2 4 6 8
27
sampling multiple loci
0
1
m+1 2 … m
L
action steps: draw one of three choices
• update m-QTL model with probability 1-b(m+1)-d(m)
– update current model using full conditionals
– sample m QTL loci, effects, and genotypes
• add a locus with probability b(m+1)
– propose a new locus along genome
– innovate new genotypes at locus and phenotype effect
– decide whether to accept the “birth” of new locus
• drop a locus with probability d(m)
– propose dropping one of existing loci
– decide whether to accept the “death” of locus
September 2004
Jax Workshop © Brian S. Yandell
28
reversible jump MCMC
• consider known genotypes Q at 2 known loci 
– models with 1 or 2 QTL
• jump between 1-QTL and 2-QTL models
– adjust parameters when model changes
–  and 1 differ due to collinearity of QTL genotypes
m  1 : Y     1Q  e
m  2 : Y     1Q   2 Q  e
September 2004
Jax Workshop © Brian S. Yandell
29
geometry of reversible jump
0.6
0.6
0.8
Reversible Jump Sequence
0.8
Move Between Models
b2
0.2 0.4
b2
0.2 0.4
c21 = 0.7
0.0
0.0
m=2
m=1
0.0
0.2
September 2004
0.4
1b1
0.6
0.8
0.0
0.2
0.4
b1
0.6
0.8
1
Jax Workshop © Brian S. Yandell
30
geometry allowing Q and  to change
first 1000 with m<3
0.0
0.0
0.05
b2
0.1 0.2
b2
0.10
0.3
0.15
0.4
a short sequence
0.05
September 2004
0.10
 b1
1
0.15
-0.3 -0.2 -0.1 0.0 0.1 0.2
b1

Jax Workshop © Brian S. Yandell
1
31
Gibbs sampler with loci indicators
• partition genome into intervals
– at most one QTL per interval
– interval = marker interval or large chromosome region
• use loci indicators in each interval
–  = 1 if QTL in interval
–  = 0 if no QTL
• Gibbs sampler on loci indicators
– still need to adjust genetic effects for collinearity of Q
– see work of Nengjun Yi (and earlier work of Ina Hoeschele)
Y     1  1Q   2  2 Q  e
September 2004
Jax Workshop © Brian S. Yandell
32
epistatic interactions
• model space issues
– 2-QTL interactions only?
– Fisher-Cockerham partition vs. tree-structured?
– general interactions among multiple QTL
• model search issues
– epistasis between significant QTL
• check all possible pairs when QTL included?
• allow higher order epistasis?
– epistasis with non-significant QTL
•
• whole genome paired with each significant QTL?
• pairs of non-significant QTL?
Yi Xu (2000) Genetics; Yi, Xu, Allison (2003) Genetics; Yi (2004)
September 2004
Jax Workshop © Brian S. Yandell
33
limits of epistatic inference
• power to detect effects
– epistatic model size grows exponentially
• |M| = 3m for general interactions
– power depends on ratio of n to model size
• want n / |M| to be fairly large (say > 5)
• n = 100, m = 3, n / |M| ≈ 4
• empty cells mess up adjusted (Type 3) tests
–
–
–
–
missing q1Q2 / q1Q2 or q1Q2q3 / q1Q2q3 genotype
null hypotheses not what you would expect
can confound main effects and interactions
can bias AA, AD, DA, DD partition
September 2004
Jax Workshop © Brian S. Yandell
34
4. Model Assessment
• balance model fit against model complexity
model fit
prediction
interpretation
parameters
smaller model
miss key features
may be biased
easier
low variance
bigger model
fits better
no bias
more complicated
high variance
• information criteria: penalize L by model size |M|
– compare IC = – 2 log L( M | Y ) + penalty( M )
• Bayes factors: balance posterior by prior choice
– compare pr( data Y | model M )
September 2004
Jax Workshop © Brian S. Yandell
35
QTL Bayes factors
– general comparison of models
– want Bayes factor >> 1
• m = number of QTL
– indexes model complexity
– genetic architecture also important
prior probability
0.10
0.20
• same as LOD test
e
e
e
p
u
exponential
Poisson
uniform
p p
u u e
u u p
u u u
p
e
p
e
e p
p
e
0.00
– simple comparison: 1 vs 2 QTL
0.30
• BF = posterior odds / prior odds
• BF equivalent to BIC
0
e
p e e
p
u p
u p
u e
u
2
4
6
8
m = number of QTL
pr (m|data ) /pr (m)
BFm,m 1 
pr (m  1|data ) /pr (m  1)
September 2004
Jax Workshop © Brian S. Yandell
36
10
Bayes factors to assess models
•Bayes factor: which model best supports the data?
– ratio of posterior odds to prior odds
– ratio of model likelihoods
•equivalent to LR statistic when
– comparing two nested models
– simple hypotheses (e.g. 1 vs 2 QTL)
•Bayes Information Criteria (BIC)
– Schwartz introduced for model selection in general settings
– penalty to balance model size (p = number of parameters)
pr ( model 1 | Y ) / pr ( model 2 | Y ) pr (Y | model 1 )
B12 

pr ( model 1 ) / pr ( model 2 )
pr (Y | model 2 )
 2 log( B12 )  2 log( LR)  ( p2  p1 ) log( n )
September 2004
Jax Workshop © Brian S. Yandell
37
3 4
BF sensitivity to fixed prior for effects
Bayes factors
0.5
1
2
4
3
2
3
4
2
4
3
2
4
3
2
4
3
2
4
3
2
4
3
2
3
4
2
4
3
2
1
1
1
1
1
0.2
3
4
2
4
3
2
1
1
1
1
0.05
1
0.20
1
0.50
2.00 5.00 2
hyper-prior heritability h
4
3
2
1
B45
B34
B23
B12
20.00 50.00
 h2 s 2  2
, h fixed
 jq ~ N 0,
 |M |
September 2004
Jax Workshop © Brian S. Yandell
38
BF insensitivity to random effects prior
insensitivity to hyper-prior
1.0
3.0
hyper-prior density 2*Beta(a,b)
0.0
0.5
1.0
1.5
2
hyper-parameter heritability h
3
2
3
2
3
2
3
2
Bayes factors
0.2
0.4 0.6
0.0
density
1.0
2.0
0.25,9.75
0.5,9.5
1,9
2,10
1,3
1,1
2.0
3
2
1
1
1
1
0.05
0.10
0.20
2
Eh
1
3
3
2
2
B34
B23
B12
1
1
0.50
1.00
 h2 s 2  h2
,
 jq ~ N 0,
~ Beta (a, b)
 |M | 2
September 2004
Jax Workshop © Brian S. Yandell
39
simulations and data studies
•increase to detect all 8
loci
effect
1
2
3
4
5
6
7
8
11
50
62
107
152
32
54
195
–3
–5
+2
–3
+3
–4
+1
+2
1
1
3
6
6
8
8
9
30
20
8
9
10
11
12
13
Genetic map
ch1
ch2
ch3
ch4
ch5
ch6
ch7
ch8
ch9
ch10
0
September 2004
7
number of QTL
Chromosome
QTL chr
0
– n=500, heritability to 97%
posterior
10
frequency in %
– (Stephens, Fisch 1998)
– n=200, heritability = 50%
– detected 3 QTL
40
•simulated F2 intercross, 8 QTL
50
100
Jax Workshop © Brian S. Yandell
150
200
40
loci pattern across genome
• notice which chromosomes have persistent loci
• best pattern found 42% of the time
m
8
9
7
9
9
9
Chromosome
1 2 3 4
2 0 1 0
3 0 1 0
2 0 1 0
2 0 1 0
2 0 1 0
2 0 1 0
September 2004
5
0
0
0
0
0
0
6
2
2
2
2
3
2
7
0
0
0
0
0
0
8
2
2
1
2
2
2
9
1
1
1
1
1
2
10
0
0
0
0
0
0
Jax Workshop © Brian S. Yandell
Count of 8000
3371
751
377
218
218
198
41
B. napus 8-week vernalization
whole genome study
• 108 plants from double haploid
– similar genetics to backcross: follow 1 gamete
– parents are Major (biennial) and Stellar (annual)
• 300 markers across genome
– 19 chromosomes
– average 6cM between markers
• median 3.8cM, max 34cM
– 83% markers genotyped
• phenotype is days to flowering
– after 8 weeks of vernalization (cooling)
– Stellar parent requires vernalization to flower
• available in R/bim package
• Ferreira et al. (1994); Kole et al. (2001); Schranz et al. (2002)
September 2004
Jax Workshop © Brian S. Yandell
42
-50000
-40000
-30000
-20000
-10000
0
-50000
-40000
-30000
-20000
-10000
0
-50000
-40000
-30000
-20000
-10000
0
-50000
-40000
-30000
-20000
-10000
0
8 10
6
0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+06
mcmc sequence
0.5
herit
mcmc sequence
0.1
0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+06
mcmc sequence
15
0
5
10
10
LOD
15
20
20
burnin sequence
5
LOD
0.010
0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+06
0.3
0.4
burnin sequence
0.2
herit
0.6
0.004
0.006
0.010
envvar
0.014
0.016
burnin sequence
0.0
number of QTL
environmental variance
h2 = heritability
(genetic/total variance)
LOD = likelihood
envvar
0
2
5
4
nqtl
nqtl
burnin (sets up chain)
mcmc sequence
10 15 20 25
Markov chain Monte Carlo sequence
September 2004
burnin sequence
Jax Workshop © Brian S. Yandell
0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+06
mcmc sequence
43
MCMC sampled loci
120
tg2f12
includes all models
100
40
note concentration
on chromosome N2
wg6b2
E35M59.117
wg9c7
E38M50.133
Aca1
E33M62.140
E33M50.183
wg4a4b
tg6a12
wg5a5
wg8g1b
E32M61.218
ec4e7a
wg5b1a
wg6b10
E35M62.117
E33M59.59
wg7f3a
ec3a8
wg7a11
ec4g7b
wg6c6
E35M48.148
wg4f4c
E33M49.165
wg3g11
E32M49.73
wg1g8a
20
points jittered for view
blue lines at markers
isoIdh
wg2e12b
E33M49.211
ec2d1a
MCMC sampled loci
60
80
subset of chromosomes
N2, N3, N16
wg3f7c
tg2h10b
wg1a10
ec2d8a
E33M47.338
wg2d11b
ec3b12
E38M50.159
E33M49.117
E32M50.409
E35M59.85
slg6
tg5d9b
wg4d10
wg7f5b
E32M47.159
ec5e12a
0
ec4h7
wg1g10b
E38M62.229
E33M49.491
ec2e5a
N2
September 2004
Jax Workshop © Brian S. Yandell
N3
chromosome
wg2a3c
wg5a1b
ec3e12b
N16
44
Bayesian model assessment
Bayes factor ratios
posterior / prior
0.3
0.2
50
QTL posterior
moderate
weak
1
0.0
3
number of QTL
Bayes factor ratios
1
September 2004
3
5
7
9
model index
11
13
Jax Workshop © Brian S. Yandell
4
3
moderate
1
2
weak
2
2
strong
2
2
3
1 e+01
3
posterior / prior
5 e-01
0.0
0.1
2*2
2:2,12
3:2*2,12
2:2,13
2:2,3
2:2,16
2:2,11
3:2*2,3
2:2,15
4:2*2,3,16
3:2*2,13
2:2,14
0.2
model posterior
0.3
2
5 e+02
pattern posterior
evidence suggests
4-5 QTL
N2(2-3),N3,N16
11
9
7
5
3
1
11
9
7
5
number of QTL
2
1
2
2
col 1: posterior
col 2: Bayes factor
note error bars on bf
strong
5
0.1
row 1: # QTL
row 2: pattern
500
QTL posterior
1
3
5
7
9
model index
11
13
45
Bayesian estimates of loci & effects
model averaging: at least 4 QTL
loci histogram
0.02
0.04
0.00
histogram of loci
blue line is density
red lines at estimates
4
0.06
napus8 summaries with pattern 1,1,2,3 and m
September 2004
0
50
N2
100
150
N3
200
250
100
150
N3
200
250
N16
additive
-0.02 0.02
50
-0.08
estimate additive effects
(red circles)
grey points sampled
from posterior
blue line is cubic spline
dashed line for 2 SD
0
N2
Jax Workshop © Brian S. Yandell
N16
46
0.008
100
0
0.006
50
density
pattern: N2(2),N3,N16
col 1: density
col 2: boxplots by m
envvar
200
0.010
Bayesian model diagnostics
0.004
0.006
0.008
0.010
0.012
4
4
5
6
7
8
9
11
12
envvar conditional on number of QTL
0.5
0.6
0.7
4
4
5
6
7
8
9
11
12
LOD
14 16
18
20
0.12
heritability conditional on number of QTL
0.08
density
0.50
0.30
0.4
0.04
5
10
15
marginal LOD, m
September 2004
0.40
heritability
4
3
2
density
1
0
0.3
marginal heritability, m
10 12
but note change with m
0.2
0.00
environmental variance
2 = .008,  = .09
heritability
h2 = 52%
LOD = 16
(highly significant)
0.60
5
marginal envvar, m
20
25
4
Jax Workshop © Brian S. Yandell
4
5
6
7
8
9
11
12
LOD conditional on number of QTL
47
Bayesian software for QTLs
• R/bim (Satagopan Yandell 1996; Gaffney 2001)
– www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl
– www.r-project.org contributed package
– version available within WinQTLCart (statgen.ncsu.edu/qtlcart)
• Bayesian IM with epistasis (Nengjun Yi, U AB)
– separate C++ software (papers with Xu)
– plans in progress to incorporate into R/bim
• R/qtl (Broman et al. 2003)
– biosun01.biostat.jhsph.edu/~kbroman/software
– www.r-project.org contributed package
• Pseudomarker (Sen Churchill 2002)
– www.jax.org/staff/churchill/labsite/software
• Bayesian QTL / Multimapper
– Sillanpää Arjas (1998)
– www.rni.helsinki.fi/~mjs
• Stephens & Fisch (email)
September 2004
Jax Workshop © Brian S. Yandell
48
R/bim: our software
• www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl
– R contributed library (www.r-project.org)
• library(bim) is cross-compatible with library(qtl)
– Bayesian module within WinQTLCart
• WinQTLCart output can be processed using R library
• Software history
– initially designed by JM Satagopan (1996)
– major revision and extension by PJ Gaffney (2001)
•
•
•
•
whole genome
multivariate update of effects; long range position updates
substantial improvements in speed, efficiency
pre-burnin: initial prior number of QTL very large
– upgrade (H Wu, PJ Gaffney, CF Jin, BS Yandell 2003)
– epistasis in progress (H Wu, BS Yandell, N Yi 2004)
September 2004
Jax Workshop © Brian S. Yandell
49
many thanks
Michael Newton
Daniel Sorensen
Daniel Gianola
Liang Li
Hong Lan
Hao Wu
Nengjun Yi
David Allison
Tom Osborn
David Butruille
Marcio Ferrera
Josh Udahl
Pablo Quijada
Alan Attie
Jonathan Stoehr
Gary Churchill
Jaya Satagopan
Fei Zou
Patrick Gaffney
Chunfang Jin
Yang Song
Elias Chaibub Neto
Xiaodan Wei
USDA Hatch, NIH/NIDDK, Jackson Labs
September 2004
Jax Workshop © Brian S. Yandell
50
Download