Bayesian Inference for QTLs in Inbred Lines I Brian S Yandell

advertisement
Bayesian Inference for QTLs
in Inbred Lines I
Brian S Yandell
University of Wisconsin-Madison
www.stat.wisc.edu/~yandell
NCSU Statistical Genetics
June 1999
June 1999
NCSU QTL Workshop © Brian S.
Yandell
1
Note on Outbred Studies
• Interval Mapping
–
–
–
–
Haley, Knott & Elsen (1994) Genetics
Thomas & Cortessis (1992) Hum. Hered.
Hoeschele & vanRanden (1993ab) Theor. Appl. Genet. (etc.)
Guo & Thompson (1994) Biometrics
• Nuances -- faking it
– experimental outbred crosses
• collapse markers from 4 to 2 alleles
– pedigrees
• polygenic effects not modeled here
• related individuals are correlated (via coancestry)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
2
Many Thanks
Michael Newton
Daniel Sorensen
Daniel Gianola
Jaya Satagopan
Patrick Gaffney
Fei Zou
Liang Li
Tom Osborn
David Butruille
Marcio Ferrera
Josh Udahl
Pablo Quijada
Alan Attie
Jonathan Stoehr
USDA Hatch Grants
June 1999
NCSU QTL Workshop © Brian S.
Yandell
3
What is the Goal Today?
• show MCMC ideas
• Bayesian perspective
– Gibbs sampler
– Metropolis-Hastings
– common in animal model
– use “prior” information
• handle hard problems
• previous experiments
• related genomes
– image analysis
– genetics
– large dependent data
• resampling our data
– permutation tests
– MCMC
– other (bootstrap,…)
June 1999
• inbred lines “easy”
– can check against *IM
– ready extension
• multiple experiments
• pedigrees
• non-normal data
NCSU QTL Workshop © Brian S.
Yandell
4
Overview
• Part I: Single QTL
–
–
–
–
–
Modeling a trait with a QTL
Likelihoods, Frequentists & Bayesians
Profile LODs & Bayesian Posteriors
Graphical Summaries
Testing & Estimation
• Part II: Multiple QTLs
–
–
–
–
June 1999
Sampling from a Markov Chain
Issues for Multiple QTLs
Bayes Factors & Model Selection
Brassica data analysis
NCSU QTL Workshop © Brian S.
Yandell
5
Part I: Single QTL
• Modelling a trait with a QTL
– linear model for trait given genotype
– recombination near loci for genotype
• Bayesian Posterior
• Likelihoods
– EM & MCMC
– Frequentists & Bayesians
• Review Interval Maps & Profile LODs
• Case Study: Simulated Single QTL
June 1999
NCSU QTL Workshop © Brian S.
Yandell
6
QTL Components
• observed data on individual
– trait: field or lab measurement
• log( days to flowering ) , yield, …
– markers: from wet lab (RFLPs, etc.)
• linkage map of markers assumed known
• unobserved data on individual
– geno: genotype (QQ=1/Qq=0/qq=-1)
• unknown model parameters
– effects: mean, difference, variance
– locus: quantitative trait locus (QTL)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
7
Single QTL trait Model
• trait = mean + additive + error
• trait = effect_of_geno + error
• prob( trait | geno, effects )
y j    b* x*j  e j
x=1
x=-1
x=0
 ( y j | x*j ;  , b* ,  2 )
 y j    b* x*j 

 





June 1999
10
NCSU QTL Workshop © Brian S.
Yandell
12
14
8
6
6
8
8
10
trait
10
12
12
14
14
Simulated Data with 1 QTL
x=-1
June 1999
x=1
0
2
NCSU QTL Workshop © Brian S.
Yandell
4
6
frequency
8
9
Recombination and Distance
• no interference--easy approximation
– Haldane map function
– no interference with recombination
• all computations consistent in approximation
– rely on given map
• marker loci assumed known
– 1-to-1 relation of distance to recombination
– all map functions are approximate
• assume marker positions along map are known
r
June 1999
1
2
1  e 
2 
NCSU QTL Workshop © Brian S.
Yandell
10
markers, QTL & recombination rates
r3
r2
r1
r5
r4
*
x
?
M1 M 2
M3
M4
M5
M6
?

June 1999
distance along chromosome
NCSU QTL Workshop © Brian S.
Yandell
11
Interval Mapping of QT genotype
• can express probabilities in terms of distance
– locus is distance along linkage map
– flanking markers sufficient if no missing data
– could consider more complicated relationship
prob( geno | locus, map )
= prob( geno | locus, flanking markers )
 ( x*j |  )   ( x*j | , M j ,k , M j ,k 1 )
June 1999
NCSU QTL Workshop © Brian S.
Yandell
12
Building trait Likelihood
• likelihood is mixture across possible genotypes
• sum over all possible genotypes at locus
like( effects, locus | trait )
= sum of prob( trait, genos | effects, locus )
L (  , b* ,  2 ,  | y j )   ( y j |  , b* ,  2 ;  )

*
2

(
y
|
x
;

,
b
,

) ( x |  )
 j
x  1, 0 ,1
June 1999
NCSU QTL Workshop © Brian S.
Yandell
13
Likelihood over Individuals
• product of trait probabilities across individuals
– product of sum across possible genotypes
like( effects, locus | traits, map )
= product of prob( trait | effects, locus, map )
n
L(  , b* ,  2 ;  | y )    ( y j |  , b* ,  2 ;  )
j 1
n

*
2

(
y
|
x
;

,
b
,

) ( x |  )
 j
j 1 x  1, 0 ,1
June 1999
NCSU QTL Workshop © Brian S.
Yandell
14
25
Profile LOD for 1 QTL
0
5
10
15
LOD
20
QTL
IM
0
10
20
30
40
50
60
70
80
90
distance (cM)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
15
Interval Mapping for
Quantitative Trait Loci
• profile likelihood (LOD) across QTL
– scan whole genome locus by locus
• use flanking markers for interval mapping
– maximize likelihood ratio (LOD) at locus
• best estimates of effects for each locus
• EM method (Lander & Botstein 1989)

n

LOD ( )  (log 10 e) ln 
j 1


June 1999
  (y
| x; ˆ , bˆ* , ˆ 2 ) ( x |  ) 
x  1, 0 ,1

*
2
ˆ
 ( y j | ˆ , b  0, ˆ )


j
NCSU QTL Workshop © Brian S.
Yandell
16
Interval Mapping Tests
• profile LOD across possible loci in genome
– maximum likelihood estimates of effects at locus
– LOD is rescaling of L(effects, locus|y)
• test for evidence of QTL at each locus
– LOD score (LR test)
– adjust (?) for multiple comparisons
June 1999
NCSU QTL Workshop © Brian S.
Yandell
17
Basic Idea of Likelihood Use
• build likelihood in steps
– build from trait & genotypes at locus
– likelihood for individual i
– log likelihood over individuals
• maximize likelihood (interval mapping)
– EM method (Lander & Botstein 1989)
– MCMC method (Guo & Thompson 1994)
• study whole likelihood as posterior (Bayesian)
– analytical methods (e.g. Carlin & Louis 1998)
– MCMC method (Satagopan et al 1996)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
18
EM-MCMC duality
• EM approach can be redone with MCMC
– EM estimates & maximizes
– MCMC draws random samples
– both can address same problem
• sometimes EM is hard (impossible) to use
• MCMC is tool of “last resort”
–
–
–
–
June 1999
use exact methods if you can
try other approximate methods
be clever!
very handy for hard problems in genetics
NCSU QTL Workshop © Brian S.
Yandell
19
0.2
0.0
0.2
parameter
parameter = 4
012345678910
parameter = 6
012345678910
data : y  1,3,8
posterior
0.3
0
0.0
0.1
0.0002
0.2
likelihood
0 2 4 6 8
June 1999
0.0
parameter = 2
012345678910
0.0004
0.0
0.2
Likelihood & Posterior Example
parameter : t  ?
t k e t
prob{ y  k | t} 
k!
NCSU QTL Workshop © Brian S.
Yandell
20
Studying the Likelihood
• maximize (*IM)
– find the peak
– avoid local maxima
– profile LOD
• across locus
• max for effects
• sample (Bayes)
– get whole curve
– summarize later
– posterior
• EM method
– always go up
– steepest ascent
• MCMC method
–
–
–
–
jump around
go up if you can
sometimes go down
cool down to find peak
• simulated annealing
• simulated tempering
• locus & effects together
June 1999
NCSU QTL Workshop © Brian S.
Yandell
21
Bayes Theorem
• posteriors and priors
– prior:
– posterior:
prob( parameters )
prob( parameters | data )
• posterior = likelihood * prior / constant
• posterior distribution is proportional to
– likelihood of parameters given data
– prior distribution of parameters
 ( param , data)  (data | param)   ( param)
 ( param | data) 

 (data)
 (data)
 ( param | data)   (data | param)   ( param)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
22
Bayesian Prior
• “prior” belief used to infer “posterior” estimates
– higher weight for more probable parameter values
• based on prior knowledge
– use previous study to inform current study
• weather prediction: tomorrow is much like today
• previous QTL studies on related organisms
– historical criticism: can get “religious” about priors
• often want negligible effect of prior on posterior
– pick non-informative priors
• all parameter values equally likely
• large variance on priors
– always check sensitivity to prior
June 1999
NCSU QTL Workshop © Brian S.
Yandell
23
How to Study Posterior?
• exact methods
• Monte Carlo methods
– exact if possible
– can be difficult or
impossible to analyze
• approximate methods
– importance sampling
– numerical integration
– Monte Carlo & other
June 1999
– easy to implement
– independent samples
• MCMC methods
– handle hard problems
– art to efficient use
– correlated samples
NCSU QTL Workshop © Brian S.
Yandell
24
QTL Bayesian Inference
• study posterior distribution of locus & effects
– sample joint distribution
• locus, effects & genotypes
– study marginal distribution of
• locus
• effects
– overall mean, genotype difference, variance
• locus & effects together
• estimates & confidence regions
– histograms, boxplots & scatter plots
– HPD regions
June 1999
NCSU QTL Workshop © Brian S.
Yandell
25
38
40
2.0
2.2
36
1.8
1.8
2.0
additive
2.2
Posterior for locus & effect
42
QTL 1
0.0
0.1
0.2
0.3
distance (cM)
36
38
40
42
distance (cM)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
26
QTL Marginal Posterior
• posterior = likelihood * prior / constant
• posterior distribution is proportional to
– likelihood of traits & genos given effects & locus
– prior distribution of effects & locus
 (b* | y)
is proportional to
n
 (b )   ( y j | x*j ;  , b* ,  2 )
*
j 1
June 1999
NCSU QTL Workshop © Brian S.
Yandell
27
Marginal Posterior Summary
• marginal posterior for locus & effects
• highest probability density (HPD) region
– smallest region with highest probability
– credible region for locus & effects
• HPD with 50,80,90,95%
– range of credible levels can be useful
– marginal bars and bounding boxes
– joint regions (harder to draw)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
28
1.8
2.0
additive
2.2
HPD Region for locus & effect
36
38
40
42
distance (cM)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
29
QTL Bayesian Posterior
• posterior = likelihood * prior / constant
• posterior( paramaters | data )
prob( genos, effects, loci | trait, map )
 ( x * ;  , b* ,  2 ;  | y )
is proportional to
n
*
*
2

(
y
|
x
;

,
b
,

)
 j j
j 1
n
 (  ) (b* ) ( 2 ) ( )   ( x*j |  )
j 1
June 1999
NCSU QTL Workshop © Brian S.
Yandell
30
Frequentist or Bayesian?
• Frequentist approach
– fixed parameters
• range of values
– maximize likelihood
• ML estimates
• find the peak
– confidence regions
• random region
• invert a test
– hypothesis testing
• 2 nested models
June 1999
• Bayesian approach
– random parameters
• distribution
– posterior distribution
• posterior mean
• sample from dist
– credible sets
• fixed region given data
• HPD regions
– model selection/critique
• Bayes factors
NCSU QTL Workshop © Brian S.
Yandell
31
Frequentist or Bayesian?
• Frequentist approach
– maximize over mixture
of QT genotypes
– locus profile likelihood
• max over effects
– HPD region for locus
• natural for locus
– 1-2 LOD drop
• work to get effects
• Bayesian approach
– joint distribution over
QT genotypes
– sample distribution
• joint effects & loci
– HPD regions for
• joint locus & effects
• use density estimator
– approximate shape of
likelihood peak
June 1999
NCSU QTL Workshop © Brian S.
Yandell
32
Interval Mapping for
Quantitative Trait Loci
• profile likelihood (LOD) across QTL
– scan whole genome locus by locus
• use flanking markers for interval mapping
– maximize likelihood ratio (LOD) at locus
• best estimates of effects for each locus
• EM method (Lander & Botstein 1989)

n

LOD ( )  (log 10 e) ln 
j 1


June 1999
  (y
| x; ˆ , bˆ* , ˆ 2 ) ( x |  ) 
x  1, 0 ,1

*
2
ˆ
 ( y j | ˆ , b  0, ˆ )


j
NCSU QTL Workshop © Brian S.
Yandell
33
25
Profile LOD for 1 QTL
0
5
10
15
LOD
20
QTL
IM
0
10
20
30
40
50
60
70
80
90
distance (cM)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
34
Interval Mapping Tests
• profile LOD across possible loci in genome
– maximum likelihood estimates of effects at locus
– LOD is rescaling of L(effects, locus|y)
• test for evidence of QTL at each locus
– LOD score (LR test)
– adjust (?) for multiple comparisons
June 1999
NCSU QTL Workshop © Brian S.
Yandell
35
Interval Mapping Estimates
• confidence region for locus
– based on inverting test of no QTL
– 2 LODs down gives approximate CI for locus
– based on chi-square approximation to LR
• confidence region for effects
– approximate CI for effect based on normal
– point estimate from profile LOD


locus CI   | LOD (ˆ )  LOD ( )  2
effect CI  bˆ*  1.96se(bˆ* )
June 1999
NCSU QTL Workshop © Brian S.
Yandell
36
1.6
1.8
2.0
additive
2.2
2.4
IM Confidence Region
36
38
40
42
distance (cM)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
37
How to Proceed?
• figure out how to construct posterior
– Markov chain Monte Carlo
• run Markov chain
• summarize results
• diagnostics
June 1999
NCSU QTL Workshop © Brian S.
Yandell
38
MCMC Run for 1 locus Data
distance (cM)
40
38
36
42
40
38
36
1
1
1 1
1
1
1
1
1 1
1
1
1
1
1
11 1 11 1
11
1 11 11 11
1
1
1 1
1 11 1 1 1 111 1 11
1
1
1
1
1 1 1 1 11
11
1
1
1
1
1
1
1
1
1
1
1
1 1 11 1
1
1111 1 11 111 1
111
11 1
11 1
11
1 11
11 1 111 11 1 1 1 11111 1 1 1111 1
111 1 1
1
1
1
1
1
1
1
1 1 1 11 11 1 1 11 1 1 11
1
1
11 1 11111111 1 1111 1
11
1 111
1 11 111 11 11
1 11111111
1
1
1
1
11 1 1 11
111 11 1
1 11 1
11 1
1
1 11 111 111 11111
11
1
1 1 1 1111 1 11
11111 111 1111 1 1
1
1
1
11111 1 1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 11 11111111
1
1
1
1
1
1
1
1
1
1
1
11
1 1 1 11 1 1111 1 1 11 1 11 111 1 111 111 1
1 111 11111
11111
1 1 1 1 11 11111 111 111111 11 1
11111 111 1 1
1
1
1
1
1
1
1
1
1
11
111111 11
1 1 1 1 11
11 1 111
1 1 1 1 1 111111 1 1 1 1111111 11 1 11 11 11 1 11
1
111111111 111111111111 1 1 11 1111 1 11111 111 11 111 11 1 1
1 111
1
1
11 111111 1111 11 1 11111 1 1
1 11 1
1
1
1
1
1
1
1
1
1
11
1
1
1 1111
1 1111 11
1
1 1 1 1 11
1 111
111 11 1 11 111 11111 1 111 11 1111 11 11 11 1111111
1
1
1
111 11
1 11
1
1
1
1
1
1
1
1
1
1
1
1
1 1 1 11 1111 11 11 1 1 1 1 1 11 1 1111 1 111111 11
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1 111 111
1
1 1 1 1 111 111 11 11 11 111 1 1 1 11111111111 11 111
1
1 11 11 1
11 1
11 11 1 1 1
111 1 1111
111
1
1 1 11
111 1 11 1
1
11 11 1
1
1
1
1
1
1
1
1
1
11 1 11 1 11 1 1111
1
1 1 1 1 1 11 111
1
1
1
1
1
1
11 11 11 1 1
11 1
1 1 1 11 11 11 1 1 11
1
1 1 11 11 111 1 1 11 1 1 1
1
1
1
1
1
1
1
1 1 1 11
11 1 111 1 1 111
1
1
1
11
1
1 1 1 11
1 11 1
1
1 111
1
11
1 1
1
1
1 11
11
1
1 1 11 1
1
1
1
1
111
1
1 1
1
1
1
1
1
1 11
1
11
1
1
1
1
1
1
1
42
1
1
1
1
1
0
200
400
600
800
1000
0
MCMC run/100
June 1999
NCSU QTL Workshop © Brian S.
Yandell
20
40
60
frequency
39
Part II: Multiple QTL
• Sampling from the Posterior
– Markov chain Monte Carlo
• Gibbs sampler for effects
• Metropolis-Hastings for loci
• Issues for 2 QTL
• Bayes factors & Model Selection
• Simulated data for 0,1,2 QTL
• Brassica data on days to flowering
June 1999
NCSU QTL Workshop © Brian S.
Yandell
40
Posterior for effects
•start with joint distribution of effects
– does not depend on locus given genotypes
– standard linear model problem
 ( , b ,  | y, x )
*
2
*
is proportional to
n
 (  ) (b ) ( )  ( y j | x ;  , b ,  )
*
2
*
j
*
2
j 1
June 1999
NCSU QTL Workshop © Brian S.
Yandell
41
How to Study Posterior?
• exact methods
• Monte Carlo methods
– exact if possible
– can be difficult or
impossible to analyze
• approximate methods
– importance sampling
– numerical integration
– Monte Carlo & other
June 1999
– easy to implement
– independent samples
• MCMC methods
– handle hard problems
– art to efficient use
– correlated samples
NCSU QTL Workshop © Brian S.
Yandell
42
Ordinary Monte Carlo
• independent samples of joint distribution
• chaining (or peeling) of effects
• requires numerical integration
– possible analytically here
– very messy in general
 (  , b* ,  2 | y, x * ) 
 ( 2 | y, x * ;  , b* )   (b* | y, x * ;  )   (  | y, x * )
 (  | y, x * )  E(b , )  (  , b* ,  2 | y, x * ) 
*
June 1999
2
NCSU QTL Workshop © Brian S.
Yandell
43
Gibbs Sampler for effects
• set up Markov chain around posterior for effects
• sample from posterior by sampling from full conditionals
– conditional posterior of each parameter given the other
– update parameter by sampling full conditional
update mean
 (  | y, x * ; b* ,  2 )   (  ) (y | x * ;  , b* ,  2 )
update additive
 (b* | y, x * ;  ,  2 )   (b* ) (y | x * ;  , b* ,  2 )
update variance
 ( 2 | y, x * ;  , b* )   ( 2 ) (y | x * ;  , b* ,  2 )
June 1999
NCSU QTL Workshop © Brian S.
Yandell
44
Markov chain updates
genos
mean
additive
variance
traits
June 1999
NCSU QTL Workshop © Brian S.
Yandell
45
Markov chain idea
• future events given the present state are
independent of past events
• update chain based on current value
• can make chain arbitrarily complicated
• chain converges to stable pattern
 (1)  p /( p  q)
p
1-p
0
1
1-q
q
June 1999
NCSU QTL Workshop © Brian S.
Yandell
46
Markov chain idea
p
1-p
Mitch’s
0
 (1)  p /( p  q)
q
1
1-q
1
Series1
Series2
Other
0
1
June 1999
3
5
7
9 11 13 15 17 19 21
NCSU QTL Workshop © Brian S.
Yandell
47
Markov chain Monte Carlo
• can study arbitrarily complex models
– need only specify how parameters affect each other
– can reduce to specifying full conditionals
• construct Markov chain with “right” model
– update some parameters given data and others
– can fudge on “right” (importance sampling)
– next step depends only on current estimates
• nice Markov chains have nice properties
– sample summaries make sense
– consider almost as random sample from distribution
June 1999
NCSU QTL Workshop © Brian S.
Yandell
48
0
200
400
600
800
10.4
mean
10.0 10.2
1
1
1
1
1
1 11
11
1
1
1
11 11
1
1
1
111
11 11
1111 1111 1 1111 11 1111
1
1
1
1
1
1
1
1
1
1
1
1 111 111 11
1
11 1 1 1 111 1 111 1 1 1 11
1
11
1 1 1 1111111111 1
111
111
11111
11111111111111111111
1111 11 1
1111
11111111
11111111 111
11 1111
11
111111
111 11 1 11
111111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
111
1
1
1
1
1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1111111
1 1111111 111111 1111111
1
1
1
1
1
1
1
1111 11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 11111111111111111111
11111111 1111111
111
11111
1111111 11
1111111 111
11
111
1111111 111
1
1111
11 111111111
11
111
11111
11
1111
1111
11 1111111111
11 111111
11111111
111111111
1111
111111
11
111 11
11
1111111111
1111
11111
111111
1
1
1
1
1 111111
1
1
11
1
1
1
111111 1 11111 1111111
1
1
1
1
1
1
1
1
1
111111111
1
1
1
1
1
1
1
1111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
11
11 1111 1111 111111
111111 1 1111111 1111 11 1111111 1111111
1
11 1111111 1 111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1 11
1 1 1 11 1 1 1111111 111 111
11
11 111
11 1
1
1
11 1 11 11 1 11
1 11 11 11 1 11 111
1
1
1
1 111 1 1 1
1 1
1
1
11
1
1
1
1
9.8
1
9.6
9.6
9.8
10.0
10.2
10.4
MCMC run of mean & effect
1000
0
20
MCMC run/100
40
60
frequency
11
1 1
1
1
1
1
11
111 1
1
1
11
1
1 1 11
11 1 1 1111 1 11 111 1 11 1
1
1
1
1
1
1
1
1
1 11 1111
11111 1 1
1 11 1 1 1 111 11 1 11 11 111 11111111
111 1 1111 1 11 11
111111111111111
1
11 111 111111111 1
1
1
1
1
1
1
1
1
11
11111 1 1 1111111111
11 11111111 1111 111 11111
1111111 11111111111
1 1111111
11
1111 11 111111111
1111 1111111 11111111111111111
11
111 1111111
1 11 111
1111111
111 11111 1111 111111
1111
1111
1
111111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 111
1111
1 111
11111
1111 111111111 11 1 1111
11111111111111111111
1 1 1111111
111111111 111
11111
11111
1111
11111
11
11
1 111111
111
11111 111 111111
1
1
11
111 1111111 1 111
11111
1
11
1
1
1
1
1
1
1
1
1
1
11111 1111
11 111111 11111
11
1 111111
1111111
11111
111 11
11111
1111111 11 11111
1111
1111111
11111111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
11 1 11 1
1111111111 111
1111 1 1 1 1111
11 11 1 1 1
111
1 111
1 1111111111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
111 11 1
1 1111111111 11111
1 11 11 1 1 11 1 11 11
11 111111 11 11
11
11
1 11
1 11111111 111 11 1
11 1 1111111111 11
1
1
1 11
11111111 111 1 1 111 111
11111
1 1 1 11 1 11 1
1 111
1 1 11
1
1
1
1
1 1
1
0
200
400
600
800
1000
additive
2.0
2.2
1.8
1.8
2.0
2.2
1
0
MCMC run/100
June 1999
NCSU QTL Workshop © Brian S.
Yandell
20
40
60
frequency
49
MCMC Idea for QTLs
• construct Markov chain around posterior
– want posterior as stable distribution of Markov chain
– in practice, the chain tends toward stable distribution
• initial values may have low posterior probability
• burn-in period to get chain mixing well
• update one (or several) components at a time
– update effects given genotypes & traits
– update locus given genotypes & traits
– update genotypes give locus & effects
  ( x * ;  , b* ,  2 ;  ) ~  ( x * ;  , b* ,  2 ;  | y )
1   2     N
June 1999
NCSU QTL Workshop © Brian S.
Yandell
50
Markov chain updates
genos
locus
effects
traits
June 1999
NCSU QTL Workshop © Brian S.
Yandell
51
Care & Use of MCMC
• sample chain for long run (100,000-1,000,000)
– longer for more complicated likelihoods
– use diagnostic plots to assess “mixing”
• standard error of estimates
– use histogram of posterior
– compute variance of posterior--just another summary
• studying the Markov chain
– Monte Carlo error of series (Geyer 1992)
• time series estimate based on lagged auto-covariances
– convergence diagnostics for “proper mixing”
June 1999
NCSU QTL Workshop © Brian S.
Yandell
52
Full Conditional for genos
• full conditional for genotype depends on
– effects via trait model
– locus via recombination model
• can explicitly decompose by individual j
– binomial (or trinomial) probability
x*j  1, 0, or 1
*
*
2
*

(
y
|
x
;

,
b
,

)

(
x
j
j
j | )
*
*
2
Pj   ( x j | y j ;  , b ,  ;  ) 
*
2

(
y
|
x
;

,
b
,

) ( x |  )
 j
x  1, 0 ,1
June 1999
NCSU QTL Workshop © Brian S.
Yandell
53
Prior for locus
• prior information from
other studies
– uniform prior over genome
– use framework map
•choose interval proportional
to length
•then pick uniform position
within interval
0.0
0.2
•concentrate on credible
regions
•use posterior of previous
study as new prior
• no prior information on
locus
0
June 1999
20
40
60
distance (cM)
NCSU QTL Workshop © Brian S.
Yandell
80
54
Full Conditional for locus
• cannot easily sample from locus full conditional
• cannot explicitly determine full conditional
– difficult to normalize
– need to consider all possible genotypes over entire map
• Gibbs sampler will not work
– but can get something proportional ...
 ( | y, x * ;  , b * ,  2 )   ( | x * )
n
  ( )   ( x
*
j
| )
j 1
June 1999
NCSU QTL Workshop © Brian S.
Yandell
55
Metropolis-Hastings Step
• pick new locus based upon current locus
– propose new locus from distribution q( )
• pick value near current one?
• pick uniformly across genome?
– accept new locus with probability a()
• Gibbs sampler is special case of M-H
– always accept new proposal
• acceptance insures right stable distribution
  (new | x * )q(new , old ) 

a(old , new )  min 1,
*
  (old | x )q(old , new ) 
June 1999
NCSU QTL Workshop © Brian S.
Yandell
56
MCMC run: 2 loci assuming only 1
1
1
1
1 1 11
1 11 1 1
1
1
11
1 11 1 1 1 1 1 1 1 1
1
1
1
1
1 1
11
1 11 1 1
1 11 1 1 1
1
1 111 1 1 1
1
1 1
1
1
1 111 11
1
11
1
11
1111 11 1 11 11 1 1 1
1
1
1
11 1 11111 1 11 1
1
1
1 11 111 111 11 1 111 1111111 111 11111111
1
111 111
111 111
11
111111
11
111111
11111111111111111111
11111 111111111111111111
1111
1
1
11
111111111
1
1
111
1
11
1
1
1111111
11111
1
1 11
1
111111111111 11
1111111
1
1
1
1
111
1111
111111111
1
1111111
1111111
11
1111
11
1111111
111
11
111
11111111111111111111
11 11111111
1111
111 111111
1111
1
1
1
1
1
111
1
1
1
1
1
1
1
111
1
1
1
1
1
11
111111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 11111111 11111
11111111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
11111111111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
11
11 11111
1111 1 111 11111111111 11111 11111 111111111 1111111111111
1
1
1111111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1111 11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1 11
11 1 1 111 11
1 1 1
1
1
1111
1
1 11111 1 1 11 11111 111 1 1 1 111 1 1 111111111 1
1
1
1
1 11 111 11 1 111
1
1 1 11 11 1
1 111 1 1 1
1 11
1 1 1 11 11
1111111 1 11 11111
1
1 1
1
1
1
1 1 11 1
1
1
11
1
1 1 11 1
1 1 1
1 1 1 1 111
1
1
1 1
1
1
1
70
1
50
distance60(cM)
1
50
60
70
1
1
1
1
11
1
40
40
1
1
11
11
1
0
200
400
600
800
1000
0
50
MCMC run/1000
June 1999
NCSU QTL Workshop © Brian S.
Yandell
100
150
200
250
frequency
57
Multiple QTL model
• trait = mean + add1 + add2 + error
• trait = effect_of_genos + error
• prob( trait | genos, effects )
y j    b x  b x  ej
* *
1 j1
* *
2 j2
m
y j    b x  ej
r 1
June 1999
* *
r jr
NCSU QTL Workshop © Brian S.
Yandell
58
4
4
6
6
8
8
10
trait
10
12
12
14
14
16
16
Simulated Data with 2 QTL
-1,-1 -1,1 1,-1 1,1
recombinants
June 1999
0
1
NCSU QTL Workshop © Brian S.
Yandell
2 3 4
frequency
5
6
59
Issues for Multiple QTL
• how many QTL influence a trait?
– 1, several (oligogenic) or many (polygenic)?
– how many are supported by the data?
• searching for 2 or more QTL
– conditional search (IM, CIM)
– simultaneous search (MIM)
• epistasis (inter-loci interaction)
– many more parameters to estimate
– effects of ignored QTL
June 1999
NCSU QTL Workshop © Brian S.
Yandell
60
30
LOD for 2 QTL
0
5
10
15
LOD
20
25
QTL
IM
CIM
0
10
20
30
40
50
60
70
80
90
distance (cM)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
61
Interval Mapping Approach
• interval mapping (IM)
– scan genome for 1 QTL
• composite interval mapping (CIM)
– scan for 1 QTL while adjusting for others
– use markers as surrogates for other QTL
• multiple interval mapping (MIM)
– search for multiple QTL
June 1999
NCSU QTL Workshop © Brian S.
Yandell
62
70
distance
(cM)
60
60
1
1
111111 11 11111111 1 1111111 1 11 1 1111 11 1111111111 111 11 1 1111
1 111
11
11
1
1111
1111111111111
11111
1111
11111111111
1 11111111
11
11111111
111
1111
11
111111
111111111
1111
11
1111
11
1
111
11
111
1
111
111111
11
11111111
1
11
1
111
11
11
11
11
1111
111
111111
11
11111
1111111
1
11
11111
11
111
111111
111
1111
11
11
11
11
1111
1
111111
111
11
111
111
11
111
1
1
11111111
11
1111
1
11111
11
11
11111
1
11
11
1
11111
1
1
111111
1
11111
111
11
11
11
111
1
1
11
1
1
1
111
11
1
111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
111111
11
1
1
1
1
1
1
1
1
1
1
11111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
11111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
111111111111111111111
11111111111
1
11111
1
111
1111111
111111
11111111111
11111111 1
1 1111111111111
11111111111
111
1
11
111 1
1
1
1
1
11111111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
111 11 111 11 111
1 1 111 1111111 111
1 11 11111
1
11111 111
11
11 1
1
1 11111
1
1
1
1
1
1
1
1
1
1
1
11
11 1
11 111 1
1 1
0
200
400
600
800
1000
40
50
50
40
80
2
22 22 2 2 2
2
22
2
2
2
2
2 2 2222 2
2
22 2
222
22
222 222 2 2
2
2 222 22222222 22222222
22
22 22222 2
22222 2 2 222 2
22 222222222
2
2
2
2
2
2
2
2
2
2
2
2
2
222 2
2
2
2
2
2
2
222
2
2222 2222 222
2 2222 22 222
22 2 222222 222 22222222222222
2 222
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22222222222222222 2
22222222222
2222222222222
222222
2 22 22
2 222222222
22
222
22222
22
2 222
222222222222222
2222
2222
2
2222222
222
2 22
2222
22222
222
22222222222
22222
2222222
22222
222222222
222222222
2222
2
222222
22222 22
22222222
2222
222222
222
222
222
2
2
2222
222222222
22222222222
22
2222222222
22222222222222
2
2
2
222
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
222
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2222
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22 2222222 22222222222222 2222222
22222222 2 22 22 222
22
2
2222222222
2222222222
2
2 222 222222 2222 2222222
2222
2222222
22222222 22222222 22222222 22
2 22 2222
2
222 22
22 22 22222 2222
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2 22
2 22 2
22 22 22
2 2
2
2 2
70
80
MCMC run with 2 loci
0
50
MCMC run/100
June 1999
NCSU QTL Workshop © Brian S.
Yandell
100
150
200
250
300
frequency
63
Bayesian Approach
• simultaneous search for multiple QTL
• use Bayesian paradigm
– easy to consider joint distributions
– easy to modify later for other types of data
• counts, proportions, etc.
– employ MCMC to estimate posterior dist
• study estimates of loci & effects
• use Bayes factors for model selection
– number of QTL
June 1999
NCSU QTL Workshop © Brian S.
Yandell
64
2.6
2.4
2.2
1.8
2.0
2.2
2.0
1.8
effect
2.4
2.6
Effects for 2 Simulated QTL
40
50
60
70
80
effect 1
effect 2
0.0
0.05
0.10
0.15
distance (cM)
40
50
60
70
80
distance (cM)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
65
Brassica napus Data
• 4-week & 8-week vernalization effect
– log(days to flower)
• genetic cross of
– Stellar (annual canola)
– Major (biennial rapeseed)
• 105 F1-derived double haploid (DH) lines
– homozygous at every locus (QQ or qq)
• 10 molecular markers (RFLPs) on LG9
– two QTLs inferred on LG9 (now chromosome N2)
– corroborated by Butruille (1998)
– exploiting synteny with Arabidopsis thaliana
June 1999
NCSU QTL Workshop © Brian S.
Yandell
66
2.5
3.0
2.5
8-week
3.5
3.5
Brassica 4- & 8-week Data
2.5
3.0
3.5
4.0
0
2
4
6
8 10
8-week vernalization
0
2
4
6
8
4-week
2.5 3.0 3.5 4.0
4-week vernalization
June 1999
NCSU QTL Workshop © Brian S.
Yandell
67
8
Brassica Data LOD Maps
8-week
0
2
4
LOD
6
QTL
IM
CIM
0
10
20
30
40
50
60
70
80
90
60
70
80
90
15
distance (cM)
0
5
10
LOD
4-week
QTL
IM
CIM
0
10
20
30
40
50
distance (cM)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
68
4-week vs 8-week vernalization
4-week vernalization
• longer time to flower
• larger LOD at 40cM
• modest LOD at 80cM
• loci well determined
cM
40
80
June 1999
add
.30
.16
•
•
•
•
8-week vernalization
shorter time to flower
larger LOD at 80cM
modest LOD at 40cM
loci poorly determined
cM
40
80
NCSU QTL Workshop © Brian S.
Yandell
add
.06
.13
69
Brassica Credible Regions
8-week
-0.3
-0.6
-0.2
-0.4
-0.2
additive
additive
-0.1
0.0
0.0
0.1
0.2
0.2
4-week
20
40
60
80
20
distance (cM)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
40
60
80
distance (cM)
70
Collinearity of QTLs
• multiple QT genotypes are correlated
– QTL linked on same chromosome
– difficult to distinguish if close
• estimates of QT effects are correlated
– poor identifiability of effects parameters
– correlations give clue of how much to trust
• which QTL to go after in breeding?
– largest effect?
– may be biased by nearby QTL
June 1999
NCSU QTL Workshop © Brian S.
Yandell
71
Brassica effect Correlations
8-week
additive
2
-0.2
-0.1
cor = -0.7
-0.6
-0.3
-0.4
-0.2
additive 2
cor = -0.81
0.0
0.0
4-week
-0.6
-0.4
-0.2
0.0
0.2
-0.2
additive 1
June 1999
-0.1
0.0
0.1
0.2
additive 1
NCSU QTL Workshop © Brian S.
Yandell
72
Bayes Factors
Which model (1 or 2 or 3 QTLs?) has higher
probability of supporting the data?
– ratio of posterior odds to prior odds
– ratio of model likelihoods
 ( model1 | y) /  ( model2 | y)  (y | model1 )
B12 

 ( model1 ) /  ( model2 )
 (y | model2 )
June 1999
BF(1:2)
2log(BF)
evidence for 1st
<1
<0
negative
1 to 3
0 to 2
negligible
3 to 12
2 to 5
positive
12 to 150
5 to 10
strong
> 150
> 10
very strong
NCSU QTL Workshop © Brian S.
Yandell
73
Bayes Factors & LR
• equivalent to LR statistic when
– comparing two nested models
– simple hypotheses (e.g. 1 vs 2 QTL)
• Bayes Information Criteria (BIC) in general
– Schwartz introduced for model selection
– penalty for different number of parameters p
 2 log( B12 )  2 log( LR)  ( p2  p1 ) log( n)
June 1999
NCSU QTL Workshop © Brian S.
Yandell
74
Model Determination
using Bayes Factors
• pick most plausible model
– histogram for range of models
– posterior distribution of models
– use Bayes theorem
– often assume flat prior across models
• posterior distribution of number of QTLs
June 1999
NCSU QTL Workshop © Brian S.
Yandell
75
Brassica Bayes Factors
•compare models for 1, 2, 3 QTL
•Bayes factor and -2log(LR)
•large value favors first model
•8-week vernalization only here
June 1999
i vs. j
Bayes Factor
-log(LR)
2 vs. 1
2.49
7.82
3 vs. 1
.005
7.41
3 vs. 2
.002
4.17
NCSU QTL Workshop © Brian S.
Yandell
76
Computing Bayes Factors
• using samples from prior
– mean across Monte Carlo or MCMC runs
– can be inefficient if prior differs from posterior
 (y | modelk )    (y |  k ; modelk ) ( k | modelk )d k
• using samples from posterior
– weighted harmonic mean more efficient but unstable
– care in choosing weight h() close to posterior
– increase stability: absorb variance (normal to t dist)


h( k )
ˆ (y | modelk )  G 

 g 1  (y | k ; modelk ) ( k | modelk ) 
G
June 1999
NCSU QTL Workshop © Brian S.
Yandell
1
77
MCMC Software
• General MCMC software
– U Bristol links
• http://www.stats.bris.ac.uk/MCMC/pages/links.html
– BUGS (Bayesian inference Using Gibbs Sampling)
• http://www.mrc-bsu.cam.ac.uk/bugs/
• Our MCMC software for QTLs
– C code using LAPACK
• ftp://ftp.stat.wisc.edu/pub/yandell/revjump.tar.gz
– coming soon:
• perl preprocessing (to/from QtlCart format)
• Splus post processing
• Bayes factor computation
June 1999
NCSU QTL Workshop © Brian S.
Yandell
78
Bayes & MCMC References
• CJ Geyer (1992) “Practical Markov chain Monte Carlo”,
Statistical Science 7: 473-511
• L Tierney (1994) “Markov Chains for exploring posterior
distributions”, The Annals of Statistics 22: 1701-1728
(with disc:1728-1762).
• A Gelman, J Carlin, H Stern & D Rubin (1995) Bayesian
Data Analysis, CRC/Chapman & Hall.
• BP Carlin & TA Louis (1996) Bayes and Empirical Bayes
Methods for Data Analysis, CRC/Chapman & Hall.
• WR Gilks, S Richardson, & DJ Spiegelhalter (Ed 1996) Markov
Chain Monte Carlo in Practice, CRC/Chapman & Hall.
June 1999
NCSU QTL Workshop © Brian S.
Yandell
79
QTL References
• D Thomas & V Cortessis (1992) “A Gibbs sampling
approach to linkage analysis”, Hum. Hered. 42: 63-76.
• I Hoeschele & P vanRanden (1993) “Bayesian analysis of
linkage between genetic markers and quantitative trait
loci. I. Prior knowledge”, Theor. Appl. Genet. 85:953-960.
• I Hoeschele & P vanRanden (1993) “Bayesian analysis of
linkage between genetic markers and quantitative trait
loci. II. Combining prior knowledge with experimental
evidence”, Theor. Appl. Genet. 85:946-952.
• SW Guo & EA Thompson (1994) “Monte Carlo estimation
of mixed models for large complex pedigrees”, Biometrics
50: 417-432.
• JM Satagopan, BS Yandell, MA Newton & TC Osborn (1996)
“A Bayesian approach to detect quantitative trait loci using
Markov chain Monte Carlo”, Genetics 144: 805-816.
June 1999
NCSU QTL Workshop © Brian S.
Yandell
80
Download