QTL Mapping, MAS, and
Genomic Selection
Dr. Ben Hayes
Department of Primary Industries
Victoria, Australia
A short-course organized by
Animal Breeding & Genetics
Department of Animal Science
Iowa State University
June 4-8, 2007
With financial support from
Pioneer Hi-bred Int.
USE AND ACKNOWLEDGEMENT OF SHORT COURSE MATERIALS
Materials provided in these notes are copyright of Dr. Ben Hayes and the Animal Breeding and Genetics group at Iowa State University but are available for use with
proper acknowledgement of the author and the short course. Materials that include references to third parties should properly acknowledge the original source.
Linkage Disequilibrium to
Genomic Selection
Course overview
• Day 1
– Linkage disequilibrium in animal and plant genomes
• Day 2
– QTL mapping with LD
• Day 3
– Marker assisted selection using LD
• Day 4
– Genomic selection
• Day 5
– Genomic selection continued
Genomic selection
• Introduction
• Genomic selection with Least Squares
and BLUP
• Introduction to Bayesian methods
• Genomic selection with Bayesian
methods
• Comparison of accuracy of methods
Genomic selection
• Problem with LD-MAS: only a proportion of
the genetic variance is tracked with markers
– E.g. 10 QTL ≈ 50% of the genetic variance
• Alternative is to trace all segments of
the genome with markers
– Divide the genome into chromosome
segments based on marker intervals?
– Capture all QTL = all genetic variance
Genomic selection
[Figure: a chromosome with markers (M) spaced along it; adjacent markers define chromosome segment i, and the marker haplotypes within each segment (1_1, 1_2, 2_1, 2_2) carry the segment effects g_i]
Genomic selection
• Predict genomic breeding values as the
sum of effects over all segments:

GEBV = Σ_{i=1}^{p} X_i ĝ_i

where p is the number of chromosome segments.
Genomic selection
• Genomic selection can be implemented
– with marker haplotypes within
chromosome segments:

GEBV = Σ_{i=1}^{p} X_i ĝ_i

(e.g. haplotype effects within a segment: 1_1 = 0.3, 1_2 = 0.0, 2_1 = −0.2, 2_2 = −0.1)

– with single markers:

GEBV = Σ_{i=1}^{p} X_i ĝ_i

(e.g. a single marker with effect −0.5)
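As a sketch of the prediction step (not from the course notes), the GEBV is simply a matrix-vector product; here with hypothetical single-marker allele counts and invented effect estimates:

```python
import numpy as np

# Hypothetical example: 4 animals, 3 markers.
# X holds the count of the "2" allele at each marker (0, 1 or 2);
# g_hat holds the estimated marker effects (values invented for illustration).
X = np.array([
    [2, 1, 0],
    [1, 1, 1],
    [0, 2, 2],
    [1, 0, 2],
], dtype=float)
g_hat = np.array([0.3, 0.0, -0.5])

# GEBV = sum over markers/segments of X_i * g_i
gebv = X @ g_hat
print(gebv)  # [ 0.6 -0.2 -1.  -0.7]
```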
Genomic selection
• Genomic selection exploits linkage
disequilibrium
– Assumption is that the effect of haplotypes or
markers within chromosome segments will
have the same effect across the whole
population
• Possible with the dense marker maps now
available
Genomic selection
• Genomic selection avoids bias in the
estimation of effects due to multiple
testing, as all effects are fitted
simultaneously
Genomic selection
• First step is to predict the
chromosome segment effects in a
reference population
• Number of effects is much larger than the
number of records
• E.g. 10 000 intervals × 4 haplotypes =
40 000 haplotype effects
• From ~2 000 records?
• Need methods that can deal with this
Genomic selection
• Introduction
• Genomic selection with Least Squares
and BLUP
• Introduction to Bayesian methods
• Genomic selection with Bayesian
methods
• Comparison of accuracy of methods
Genomic selection with least squares
• Two-step procedure
– Test each chromosome segment for presence of a QTL
(fitting haplotypes within the segment); take the significant
effects
– Fit the significant effects simultaneously in multiple
regression
– Predict GEBVs
• Identical to LD-MAS with multiple markers
• Problems remain
– Does not capture all QTL
– Over-estimation of haplotype effects due to the setting of
the significance threshold
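The two-step procedure can be sketched in code. This is a simplified illustration on simulated data, with single markers standing in for chromosome segments; the |t| > 3 selection rule is an arbitrary stand-in for a proper significance test:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated toy data: 500 records, 50 "segments" (single markers here),
# only the first 5 have a real effect.
n, p = 500, 50
X = rng.integers(0, 3, size=(n, p)).astype(float)
true = np.zeros(p)
true[:5] = [1.0, -0.9, 0.8, -0.7, 0.6]
y = X @ true + rng.normal(0.0, 1.0, n)

# Step 1: test each segment one at a time, keep the "significant" ones.
keep = []
for j in range(p):
    xj = X[:, j] - X[:, j].mean()
    b = (xj @ (y - y.mean())) / (xj @ xj)        # single-marker regression
    resid = (y - y.mean()) - b * xj
    se = np.sqrt((resid @ resid) / (n - 2) / (xj @ xj))
    if abs(b / se) > 3:                          # arbitrary threshold
        keep.append(j)

# Step 2: fit the selected effects simultaneously in multiple regression.
Xs = np.column_stack([np.ones(n), X[:, keep]])
coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)

# Step 3: predict GEBVs from the selected effects only.
gebv = X[:, keep] @ coef[1:]
```

Note the two weaknesses named above: segments whose QTL miss the threshold contribute nothing to the GEBV, and the selected effects tend to be over-estimated because they were chosen for being large.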
Genomic selection with BLUP
• BLUP = best linear unbiased prediction
• Model:

y = μ1_n + Σ_{i=1}^{p} X_i g_i + e

• In BLUP we assume the variance of haplotype
effects is equal across all segments, e.g.
g ~ N(0, Iσ_g²), where g = [g_1 g_2 g_3 … g_p]
Genomic selection with BLUP
• BLUP = best linear unbiased prediction
• Then we can estimate the segment effects as:

[μ̂]   [ 1_n'1_n   1_n'X    ]⁻¹ [ 1_n'y ]
[ĝ ] = [ X'1_n     X'X + Iλ ]   [ X'y   ]

• λ = σ_e² / σ_g²
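A minimal NumPy sketch of building and solving these mixed model equations, using the 5-animal example data as reconstructed later in the notes (the X matrix here is derived from the animals' haplotypes; treat it as an assumption of this sketch):

```python
import numpy as np

# X counts the haplotype copies carried by each of the 5 reference animals
# (segment 1 haplotypes 1-4, segment 2 haplotypes 1-3, segment 3 haplotypes 1-2);
# y holds their phenotypes.
X = np.array([
    [2, 0, 0, 0, 0, 2, 0, 2, 0],
    [1, 1, 0, 0, 1, 1, 0, 2, 0],
    [0, 2, 0, 0, 2, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 1, 1, 1],
], dtype=float)
y = np.array([3.41, 2.47, 2.32, 2.32, 1.75])
n, p = X.shape
lam = 1.0  # lambda = sigma_e^2 / sigma_g^2, set to 1 as in the notes

# Assemble the mixed model equations [mu_hat; g_hat] = LHS^-1 RHS
lhs = np.zeros((1 + p, 1 + p))
lhs[0, 0] = n
lhs[0, 1:] = X.sum(axis=0)               # 1n'X
lhs[1:, 0] = X.sum(axis=0)               # X'1n
lhs[1:, 1:] = X.T @ X + lam * np.eye(p)  # X'X + I*lambda
rhs = np.concatenate(([y.sum()], X.T @ y))

sol = np.linalg.solve(lhs, rhs)
mu_hat, g_hat = sol[0], sol[1:]
print(mu_hat, g_hat)
```

The ridge term Iλ is what lets the system be solved even though there are more effects (9) than records (5).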
Genomic selection with BLUP
• Example
• A "simulated" data set
• Single chromosome, with 4 markers defining three
chromosome segments.
– The markers are SNPs, so there are 4 possible haplotypes per segment.
• Phenotypes "simulated" with:
– an overall mean of 2
– an effect of haplotype 1 in the first segment of 1, an
effect of haplotype 1 in the second segment of -0.5, and all
other haplotypes 0 effect
– a normally distributed error term with mean 0 and variance
1.
Genomic selection with BLUP
• Example

Animal | Segment 1 (Pat, Mat) | Segment 2 (Pat, Mat) | Segment 3 (Pat, Mat) | Phenotype
1      | 1, 1                 | 2, 2                 | 1, 1                 | 3.41
2      | 1, 2                 | 1, 2                 | 1, 1                 | 2.47
3      | 2, 2                 | 1, 1                 | 1, 2                 | 2.32
4      | 1, 3                 | 2, 3                 | 2, 1                 | 2.32
5      | 1, 4                 | 1, 3                 | 2, 1                 | 1.75

• 9 haplotypes observed in total
– 4 for the first segment, 3 for the second segment, and 2
for the third segment
• Only 5 phenotypic records.
Genomic selection with BLUP
• Example (haplotype data as on the previous slide)
• X counts the copies of each haplotype carried by each animal:

Animal | Segment 1 haps (1, 2, 3, 4) | Segment 2 haps (1, 2, 3) | Segment 3 haps (1, 2)
1      | 2 0 0 0                     | 0 2 0                    | 2 0
2      | 1 1 0 0                     | 1 1 0                    | 2 0
3      | 0 2 0 0                     | 2 0 0                    | 1 1
4      | 1 0 1 0                     | 0 1 1                    | 1 1
5      | 1 0 0 1                     | 1 0 1                    | 1 1
Genomic selection with BLUP
• Example (haplotype data as on the previous slide)
• Assume a value of 1 for λ
• 1_n = [1 1 1 1 1]'

[μ̂]   [ 1_n'1_n   1_n'X    ]⁻¹ [ 1_n'y ]
[ĝ ] = [ X'1_n     X'X + Iλ ]   [ X'y   ]
Genomic selection with BLUP
• Example

Effect                   Estimate
μ̂                        2.05
Segment 1, haplotype 1    0.20
Segment 1, haplotype 2   -0.06
Segment 1, haplotype 3    0.00
Segment 1, haplotype 4   -0.14
Segment 2, haplotype 1   -0.07
Segment 2, haplotype 2    0.21
Segment 2, haplotype 3   -0.14
Segment 3, haplotype 1    0.19
Segment 3, haplotype 2   -0.19
Genomic selection with BLUP
• Now we want to predict GEBV for a group of
young animals without phenotypes:

GEBV = X ĝ

• We have ĝ, and we can get X from their
haplotypes (after genotyping):

Animal | Segment 1 (Pat, Mat) | Segment 2 (Pat, Mat) | Segment 3 (Pat, Mat)
6      | 1, 2                 | 1, 2                 | 1, 1
7      | 1, 1                 | 2, 2                 | 1, 2
8      | 2, 3                 | 2, 2                 | 1, 2
9      | 1, 4                 | 3, 1                 | 1, 2
10     | 2, 4                 | 2, 2                 | 1, 2
Genomic selection with BLUP
• Haplotypes (as on the previous slide) give X:

Animal | Segment 1 haps (1, 2, 3, 4) | Segment 2 haps (1, 2, 3) | Segment 3 haps (1, 2)
6      | 1 1 0 0                     | 1 1 0                    | 2 0
7      | 2 0 0 0                     | 0 2 0                    | 1 1
8      | 0 1 1 0                     | 0 2 0                    | 1 1
9      | 1 0 0 1                     | 1 0 1                    | 1 1
10     | 0 1 0 1                     | 0 2 0                    | 1 1
Genomic selection with BLUP
• GEBV = X ĝ, with X the young animals' haplotype
counts and ĝ the estimated effects:

GEBV = [0.66, 0.82, 0.36, -0.15, 0.22]' for animals 6-10
Genomic selection with BLUP
• GEBV

Animal | GEBV  | TBV
6      |  0.66 | 0.5
7      |  0.82 | 2
8      |  0.36 | 0
9      | -0.15 | 0.5
10     |  0.22 | 0

• Corr(GEBV, TBV) = 0.6
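The prediction step and the accuracy check can be reproduced in a few lines of NumPy, using the X matrix for animals 6-10 and the estimated haplotype effects from the example:

```python
import numpy as np

# Haplotype counts for young animals 6-10: segment 1 haplotypes 1-4,
# segment 2 haplotypes 1-3, segment 3 haplotypes 1-2.
X = np.array([
    [1, 1, 0, 0, 1, 1, 0, 2, 0],
    [2, 0, 0, 0, 0, 2, 0, 1, 1],
    [0, 1, 1, 0, 0, 2, 0, 1, 1],
    [1, 0, 0, 1, 1, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 2, 0, 1, 1],
], dtype=float)
# BLUP estimates of the nine haplotype effects
g_hat = np.array([0.2, -0.06, 0.0, -0.14, -0.07, 0.21, -0.14, 0.19, -0.19])

gebv = X @ g_hat
print(np.round(gebv, 2))  # [ 0.66  0.82  0.36 -0.15  0.22]

tbv = np.array([0.5, 2.0, 0.0, 0.5, 0.0])        # true breeding values
print(round(np.corrcoef(gebv, tbv)[0, 1], 2))    # 0.6
```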
Genomic selection with BLUP
• Where do we get σ_g² from?
• Can estimate the total additive genetic variance and
divide by the number of segments, e.g. σ_g² = σ_a²/p
• Model with mutation-drift approaches?
– Needs assumptions about finite population size and mutation
rates
Genomic selection
• What should we assume about the variance of
effects across chromosome segments
(σ_gi² across the segments i)?
Genomic selection
[Figure: chromosome with markers M; segments 5 and 10 each carry haplotype effects (g_5, g_10) for haplotypes 1_1, 1_2, 2_1, 2_2, with variances σ_g5² and σ_g10²]
• Option 1: assume σ_g5² = σ_g10², i.e.
g_1, g_2, g_3, g_4, g_5, … ~ N(0, σ_g²)
• Option 2: allow σ_g5² ≠ σ_g10², i.e.
g_5 ~ N(0, σ_g5²), g_10 ~ N(0, σ_g10²)
Bayesian methods
• BLUP assumes equal variance of segment effects
across segments
• Does not incorporate our prior knowledge on the
distribution of QTL effects into the model
[Figure: proportion of QTL against size of QTL (in phenotypic standard deviations); many QTL of small effect, few of large effect]
Bayesian methods
• BLUP assumes equal variance of segment effects
across segments
• Does not incorporate our prior knowledge on the
distribution of QTL effects into the model
• The Bayesian approach does allow us to incorporate
prior knowledge
Genomic selection
• Introduction
• Genomic selection with Least Squares
and BLUP
• Introduction to Bayesian methods
• Genomic selection with Bayesian
methods
• Comparison of accuracy of methods
Bayesian methods
• Bayes theorem:

P(x|y) ∝ P(y|x) P(x)

– P(x|y): probability of the parameters x given the data y (posterior)
– ∝: is proportional to
– P(y|x): probability of the data y given the parameters x (likelihood of the data)
– P(x): prior probability of x
Bayesian methods
• Consider an experiment where we measure the height
of 10 people to estimate average height
• We want to use prior knowledge from many
previous studies that average height is 174 cm

y = average height + e
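For a normal prior combined with a normal likelihood, the posterior mean is a precision-weighted average of the two, which is enough to reproduce the height example numerically. The prior standard deviations below are assumptions chosen for illustration, not values from the notes:

```python
# Posterior mean for a normal prior N(m0, s0^2) combined with a normal
# likelihood summarised by mean xbar and standard error se.
def posterior_mean(m0, s0, xbar, se):
    w0, w1 = 1 / s0**2, 1 / se**2   # precisions of prior and data
    return (w0 * m0 + w1 * xbar) / (w0 + w1)

# Prior: average height 174 cm; data: xbar = 178, s.e. = 5
print(posterior_mean(174, 5, 178, 5))     # 176.0 - prior and data weighted equally
print(posterior_mean(174, 1000, 178, 5))  # ~178  - near-flat prior, data dominates
print(posterior_mean(174, 2, 178, 5))     # ~174.6 - informative prior pulls toward 174
```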
Bayesian methods
• Bayes theorem: P(x|y) ∝ P(y|x) P(x)
[Figure: prior probability of x (average height), a normal density centred on 174 cm]
• From the data: x̄ = 178, s.e. = 5
[Figure: likelihood L(y|x) of the data given height x, most likely x = 178 cm]
• Posterior combines likelihood and prior
[Figure: posterior P(Height|y) lies between prior and likelihood; P(x|y) mean = 176 cm]
Bayesian methods
• Less certainty about prior information? Use a less informative (flat)
prior
[Figure: with a flat prior the posterior follows the likelihood; P(x|y) mean = 178 cm]
Bayesian methods
• More certainty about prior information? Use a more informative prior
[Figure: with a sharply peaked prior the posterior is pulled toward it; P(x|y) mean = 174.5 cm]
Bayesian methods
• Assuming equal variance of effects:

P(g|y) ∝ P(y|g) P(g)

• Consider the previous example:

Animal | Segment 1 (Pat, Mat) | Segment 2 (Pat, Mat) | Segment 3 (Pat, Mat) | Phenotype
1      | 1, 1                 | 2, 2                 | 1, 1                 | 3.41
2      | 1, 2                 | 1, 2                 | 1, 1                 | 2.47
3      | 2, 2                 | 1, 1                 | 1, 2                 | 2.32
4      | 1, 3                 | 2, 3                 | 2, 1                 | 2.32
5      | 1, 4                 | 1, 3                 | 2, 1                 | 1.75

• Let's use the Bayesian approach to estimate
the effect of haplotype 1 in segment 1
Bayesian methods
• First correct the data for the mean and the other
haplotype effects:

y*_j = y_j − μ̂ − Σ_{i=2}^{p} x_ij ĝ_i

y* = [0.61, 0.29, 0.02, 0.24, 0.1]'
Bayesian methods
• Example:

P(g_11 | y*) ∝ P(y* | g_11) P(g_11)

[Figure: prior for g_11, N(0, σ_g1² = 1)]
[Figure: likelihood L(y* | g_11)]
[Figure: posterior = L(y* | g_11) × N(0, σ_g1² = 1); mean = 0.2]
Bayesian methods
• Example
– Mean of posterior = 0.2
– Same as BLUP!
Genomic selection
• Introduction
• Genomic selection with Least Squares
and BLUP
• Introduction to Bayesian methods
• Genomic selection with Bayesian
methods
• Comparison of accuracy of methods
Bayesian methods
• Now let's allow different variances of
chromosome segment effects
• Need two levels of models
– Data:

P(g, μ | y) ∝ P(y | g, μ) P(g, μ)

– Variances of chromosome segments:

P(σ_gi² | g_i) ∝ P(g_i | σ_gi²) P(σ_gi²)
Bayesian methods
• Now let's allow different variances of
chromosome segment effects
• Data:

P(g, μ | y) ∝ P(y | g, μ) P(g, μ)

[μ̂  ]   [ 1_n'1_n   1_n'X_1               …   1_n'X_p               ]⁻¹ [ 1_n'y ]
[ĝ_1]   [ X_1'1_n   X_1'X_1 + Iσ_e²/σ_g1²  …   X_1'X_p               ]   [ X_1'y ]
[ …  ] = [  …         …                    …    …                    ]   [  …    ]
[ĝ_p]   [ X_p'1_n   X_p'X_1               …   X_p'X_p + Iσ_e²/σ_gp² ]   [ X_p'y ]
Bayesian methods
• Variances of chromosome segments:

P(σ_gi² | g_i) ∝ P(g_i | σ_gi²) P(σ_gi²)

• Prior?
– An inverted chi-square is used for variances
– Sampling from an inverted chi-square with v
degrees of freedom, scaled by S: S/χ_v²
– This describes a distribution with mean S/(v−2) and
V(σ_gi²)/[E(σ_gi²)]² = 2/(v−4)
Bayesian methods
• Prior?
– An inverted chi-square is used for variances
– Sampling from an inverted chi-square with v
degrees of freedom, scaled by S: S/χ_v²
– This describes a distribution with mean S/(v−2) and
V(σ_gi²)/[E(σ_gi²)]² = 2/(v−4)
– Larger v gives a more informative prior = stronger belief
about the variance
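Sampling from a scaled inverted chi-square is easy to check empirically. This sketch uses v = 10 and S = 1 (arbitrary illustrative values, not the course's prior) so that the mean S/(v − 2) = 0.125 is well defined and quickly estimated:

```python
import numpy as np

rng = np.random.default_rng(0)

v, S = 10.0, 1.0                              # degrees of freedom and scale
draws = S / rng.chisquare(v, size=200_000)    # scaled inverse chi-square samples

print(draws.mean())   # close to S/(v - 2) = 0.125
# The spread also follows the stated rule: var/mean^2 is about 2/(v - 4).
```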
Bayesian methods
[Figure: inverted chi-square densities for v = 2 and v = 20; the larger v gives a more peaked, more informative prior]
Bayesian methods
• Variances of chromosome segments:

P(σ_gi² | g_i) ∝ P(g_i | σ_gi²) P(σ_gi²)

• Prior? σ_gi² ~ χ⁻²(v, S)
• We can choose v and S so that the prior
reflects our knowledge that there are many
QTL of small effect and few of large effect
Bayesian methods
• E(σ_gi²) = S/(v−2)
• V(σ_gi²)/[E(σ_gi²)]² = 2/(v−4)
• E.g. χ⁻²(4.012, 0.002)
[Figure: proportion of QTL against size of QTL (phenotypic standard deviations) implied by this prior; many QTL of small effect, few of large effect]
Bayesian methods
• Variances of chromosome segments:

P(σ_gi² | g_i) ∝ P(g_i | σ_gi²) P(σ_gi²)

• Posterior?
– An advantage of choosing the inverse chi-square
distribution for the prior is that the posterior will
also be an inverse chi-square distribution
• Degrees of freedom = prior + data
• Scaling factor = sum of squares from the prior (S) + sum of
squares from the data
Bayesian methods
• Variances of chromosome segments
• Posterior, with n_i = number of haplotype effects in segment i:

σ_gi² | g_i ~ χ⁻²(v + n_i, S + g_i'g_i)
Bayesian methods
• Variances of chromosome segments
• Posterior:

σ_gi² | g_i ~ χ⁻²(4.012 + n_i, 0.002 + g_i'g_i)

• But the posterior cannot be sampled in one step for the whole model,
as it depends on g_i!
Bayesian methods
• Solution is to use Gibbs sampling
– Draw samples from the posterior distributions of
parameters conditional on all other effects
– The average of these samples can be used as the
estimates of the parameters
Bayesian methods
• Gibbs sampling scheme
– Parameters to estimate and their full conditional posteriors:
– P(σ_gi² | g_i):  χ⁻²(4.012 + n_i, 0.002 + g_i'g_i)
– P(σ_e² | e, g, μ):  χ⁻²(n − 2, e'e)
– P(μ | e, g, σ_e²):  N( (1_n'y − 1_n'Xg)/n , σ_e²/n )
– P(g_ij | μ, g_≠ij, σ_gi², σ_e²):
  N( (X_ij'y − X_ij'Xg_(ij=0) − X_ij'1_nμ) / (X_ij'X_ij + σ_e²/σ_gi²) ,
     σ_e² / (X_ij'X_ij + σ_e²/σ_gi²) )
Bayesian methods
• The Gibbs chain
– Step 1. Initialise the values of g (e.g. g = 0.01) and μ (e.g. μ = 0.01)
– Step 2. For each i, draw σ_gi² from P(σ_gi² | g_i) =
χ⁻²(4.012 + n_i, 0.002 + g_i'g_i), e.g. σ_g1² = 0.95
– Step 3. Draw a sample from P(σ_e² | e, g, μ). First calculate
e = y − Xg − 1_nμ, then sample σ_e² from χ⁻²(n − 2, e'e), e.g. σ_e² = 0.5
– Step 4. Draw μ from P(μ | g, σ_e²) = N( (1_n'y − 1_n'Xg)/n , σ_e²/n ),
e.g. μ = −0.1
– Step 5. For each g_ij, draw from P(g_ij | μ, g_≠ij, σ_gi², σ_e²),
e.g. g_11 = 0.5
Bayesian methods
• The Gibbs chain
– Repeat steps 2-5 many times to build up
samples from the posterior distributions of the
parameters
– Finally, take the estimates of the parameters as
averages over many cycles
– Discard the first ~100 cycles, as these depend on the
starting values
Bayesian methods
• Example
– Consider a data set with three markers. The data
set was simulated as follows:
– the effect of a 2 allele at the first marker is 3, the
effect of a 2 allele at the second marker is 0, and
the effect of a 2 allele at the third marker is -2
– μ was 3
– σ_e² was 0.23. The data set was:
Bayesian methods
• Example

Animal | Phenotype | Marker 1 alleles | Marker 2 alleles | Marker 3 alleles
1      |  9.68     | 2, 2             | 2, 1             | 1, 1
2      |  5.69     | 2, 2             | 2, 2             | 2, 2
3      |  2.29     | 1, 2             | 2, 2             | 2, 2
4      |  3.42     | 1, 1             | 2, 1             | 1, 1
5      |  5.92     | 2, 1             | 1, 1             | 1, 1
6      |  2.82     | 2, 1             | 2, 1             | 2, 2
7      |  5.07     | 2, 2             | 2, 1             | 2, 2
8      |  8.92     | 2, 2             | 2, 2             | 1, 1
9      |  2.40     | 1, 1             | 2, 2             | 1, 2
10     |  9.01     | 2, 2             | 2, 2             | 1, 1
11     |  4.24     | 1, 2             | 1, 2             | 2, 1
12     |  6.35     | 2, 2             | 1, 1             | 1, 2
13     |  8.92     | 2, 2             | 1, 2             | 1, 1
14     | -0.64     | 1, 1             | 2, 2             | 2, 2
15     |  5.95     | 2, 1             | 1, 1             | 1, 1
16     |  6.13     | 1, 2             | 2, 1             | 1, 1
17     |  6.72     | 2, 1             | 2, 1             | 1, 1
18     |  4.86     | 1, 2             | 2, 1             | 1, 2
19     |  6.36     | 2, 2             | 2, 2             | 2, 2
20     |  0.81     | 1, 1             | 2, 1             | 1, 2
21     |  9.67     | 2, 2             | 1, 2             | 1, 1
22     |  7.74     | 2, 2             | 2, 1             | 1, 2
23     |  1.45     | 1, 1             | 2, 2             | 2, 1
24     |  1.22     | 1, 1             | 2, 1             | 2, 1
25     | -0.52     | 1, 1             | 2, 2             | 2, 2
Bayesian methods
• Example
– The Bayesian approach was applied, fitting
single marker effects
– X matrix
• Number of copies of the 2 allele at each marker for each animal,
e.g. 2 1 0 for animal 1.
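The whole Gibbs chain can be put together in a short script. This is a sketch that simulates its own data (200 animals, 3 markers, effects 3, 0 and -2 as in the example, but with more records so the estimates are stable, and initialised at least-squares values rather than 0.01 to speed convergence); the prior χ⁻²(4.012, 0.002) is taken from the notes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: 200 animals, 3 markers; X counts copies of the "2" allele.
n, p = 200, 3
X = rng.integers(0, 3, size=(n, p)).astype(float)
true_g = np.array([3.0, 0.0, -2.0])
y = 3.0 + X @ true_g + rng.normal(0.0, 1.0, n)

v, S = 4.012, 0.002                 # prior for the effect variances
# Step 1: initialise mu and g (least-squares start, a deviation from the notes)
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)
mu, g = coef[0], coef[1:].copy()

n_iter, burn_in = 2000, 200
samples = np.zeros((n_iter, p))

for it in range(n_iter):
    # Step 2: sigma_gi^2 | g_i ~ inv-chi2(v + n_i, S + g_i'g_i), n_i = 1 here
    sg2 = (S + g**2) / rng.chisquare(v + 1.0, size=p)
    # Step 3: sigma_e^2 | e ~ inv-chi2(n - 2, e'e), with e = y - Xg - 1n*mu
    e = y - X @ g - mu
    se2 = (e @ e) / rng.chisquare(n - 2.0)
    # Step 4: mu | g, sigma_e^2 ~ N((1n'y - 1n'Xg)/n, sigma_e^2/n)
    mu = rng.normal((y.sum() - (X @ g).sum()) / n, np.sqrt(se2 / n))
    # Step 5: each g_i | rest ~ N(conditional mean, conditional variance)
    for i in range(p):
        xi = X[:, i]
        others = X @ g - xi * g[i]          # Xg with g_i set to zero
        denom = xi @ xi + se2 / sg2[i]
        mean_i = (xi @ y - xi @ others - xi.sum() * mu) / denom
        g[i] = rng.normal(mean_i, np.sqrt(se2 / denom))
    samples[it] = g

g_hat = samples[burn_in:].mean(axis=0)   # posterior means after burn-in
print(np.round(g_hat, 1))                # near the simulated effects 3, 0, -2
```

Each cycle updates every parameter from its full conditional distribution, exactly the five steps listed in the notes; the posterior means over the post-burn-in cycles are the effect estimates.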
Bayesian methods
• The Gibbs chain
– Step 1. Initialise the values of g and μ:
g1 = 0.01, g2 = 0.01, g3 = 0.01, μ = 0.1
– Step 2. For i = 1, 2, 3, draw σ_gi² from P(σ_gi² | g_i) =
χ⁻²(4.012 + 1, 0.002 + g_i'g_i)
  • σ_g1² = 0.002, σ_g2² = 0.06, σ_g3² = 0.009
– Step 3. Draw σ_e² from P(σ_e² | e, g, μ) = χ⁻²(n − 2, e'e),
with e = y − Xg − 1_nμ; here χ⁻²(23, 812.031)
  • σ_e² = 53.38
– Step 4. Draw μ from P(μ | g, σ_e²) = N( (1_n'y − 1_n'Xg)/n , σ_e²/n )
  • μ = 3.25
– Step 5. For each g_ij, draw from P(g_ij | μ, g_≠ij, σ_gi², σ_e²) =
N( (X_ij'y − X_ij'Xg_(ij=0) − X_ij'1_nμ) / (X_ij'X_ij + σ_e²/σ_gi²) ,
σ_e² / (X_ij'X_ij + σ_e²/σ_gi²) )
  • g1 = −0.02, g2 = −0.81, g3 = −0.005
Bayesian methods
• Gibbs chain run for 1000 cycles, sampling P(g_1 | μ, g_≠1, σ_g1², σ_e²)
[Figure: trace of the sampled g_1 values over cycles; the first cycles ("burn-in") depend on the starting value]
• Posterior means over the post-burn-in cycles:

ĝ_1 = 2.97
ĝ_2 = 0.002
ĝ_3 = −1.81
Bayesian methods
• Vector of SNP effects for calculating GEBV:

ĝ_1 = 2.97
ĝ_2 = 0.002
ĝ_3 = −1.81
Bayesian methods
• Alternative priors for the variance of
chromosome segment effects
– Meuwissen BayesA: χ⁻²(4.012, 0.002)
– Xu (2003): χ⁻² with v = 1
  • Uninformative
  • Infinite mass at zero?
– Te Braak (2006): χ⁻² with v = 0.998
– Meuwissen BayesB
Genomic selection
• Introduction
• Genomic selection with Least Squares
and BLUP
• Introduction to Bayesian methods
• Genomic selection with Bayesian
methods
• Comparison of accuracy of methods
Genomic selection
• Comparison of accuracy of methods
(Meuwissen et al. 2001)
– Genome of 1000 cM simulated, with a marker
spacing of 1 cM.
– Markers surrounding each 1-cM region were
combined into haplotypes.
– Due to finite population size (Ne = 100),
marker haplotypes were in linkage
disequilibrium with the QTL between the markers.
– Effects of chromosome segments were
predicted in one generation of 2000
animals.
– Breeding values for the progeny of these
animals were predicted based only on their
marker genotypes.
Genomic selection
• Comparison of accuracy of methods
(Meuwissen et al. 2001)

Method | r(TBV;EBV) ± SE | b(TBV.EBV) ± SE
LS     | 0.318 ± 0.018   | 0.285 ± 0.024
BLUP   | 0.732 ± 0.030   | 0.896 ± 0.045
BayesA | 0.798           | 0.827
BayesB | 0.848 ± 0.012   | 0.946 ± 0.018
Genomic selection
• Comparison of accuracy of methods
(Meuwissen et al. 2001)
– The least squares method does very poorly,
primarily because the haplotype effects are
over-estimated.
– The Bayesian approach is more accurate
because the method sets many of the effects of
the chromosome segments close to zero in
BayesA, or to zero in BayesB.
– It also "shrinks" the estimates of the effects of the other
chromosome segments based on a prior
distribution of QTL effects.
– Accuracies were very high, as high as
following progeny testing, for example.