QTL Mapping, MAS, and Genomic Selection
Dr. Ben Hayes, Department of Primary Industries, Victoria, Australia
A short course organized by Animal Breeding & Genetics, Department of Animal Science, Iowa State University, June 4-8, 2007, with financial support from Pioneer Hi-Bred Int.

USE AND ACKNOWLEDGEMENT OF SHORT COURSE MATERIALS
Materials provided in these notes are copyright of Dr. Ben Hayes and the Animal Breeding and Genetics group at Iowa State University but are available for use with proper acknowledgement of the author and the short course. Materials that include references to third parties should properly acknowledge the original source.

Linkage Disequilibrium to Genomic Selection

Course overview
• Day 1 – Linkage disequilibrium in animal and plant genomes
• Day 2 – QTL mapping with LD
• Day 3 – Marker-assisted selection using LD
• Day 4 – Genomic selection
• Day 5 – Genomic selection continued

Genomic selection
• Introduction
• Genomic selection with least squares and BLUP
• Introduction to Bayesian methods
• Genomic selection with Bayesian methods
• Comparison of accuracy of methods

Genomic selection
• The problem with LD-MAS is that only a proportion of the genetic variance is tracked with the markers
– e.g. 10 QTL ≈ 50% of the genetic variance
• The alternative is to trace all segments of the genome with markers
– Divide the genome into chromosome segments based on marker intervals
– Capture all QTL = all the genetic variance

[Figure: a chromosome with markers M spaced along it; each marker interval defines a chromosome segment i, and the haplotypes within segment i (1_1, 2_1, 1_2, 2_2) carry the segment effect gᵢ]

Genomic selection
• Predict genomic breeding values as the sum of effects over all p chromosome segments:
GEBV = Σᵢ₌₁ᵖ Xᵢ ĝᵢ
where p is the number of chromosome segments
• Genomic selection can be implemented
– with marker haplotype effects within chromosome segments, e.g. effects of 0.3, 0.0, -0.2 and -0.1 for haplotypes 1_1, 1_2, 2_1 and 2_2
– or with single-marker effects, e.g. an effect of -0.5 for allele 2
• Genomic selection exploits linkage disequilibrium
– The assumption is that haplotypes or markers within a chromosome segment will have the same effect across the whole population
– Possible with the dense marker maps now available
• Genomic selection avoids bias in the estimation of effects due to multiple testing, as all effects are fitted simultaneously
• The first step is to predict the chromosome segment effects in a reference population
• The number of effects is much larger than the number of records
– e.g. 10 000 intervals × 4 haplotypes = 40 000 haplotype effects, from ~2 000 records?
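The sum above is just a matrix-vector product once the haplotype counts are arranged in a design matrix. A minimal numpy sketch, using the illustrative haplotype effects from the slide (1_1 = 0.3, 1_2 = 0.0, 2_1 = -0.2, 2_2 = -0.1) for a single two-SNP segment and three hypothetical animals:

```python
import numpy as np

# Illustrative haplotype effects from the slide for one two-SNP segment:
# haplotypes 1_1, 1_2, 2_1, 2_2 with effects 0.3, 0.0, -0.2, -0.1.
g_hat = np.array([0.3, 0.0, -0.2, -0.1])

# Each row of X counts how many copies (0, 1 or 2) of each haplotype an
# animal carries; the copies sum to 2 within the segment.
# The three animals here are hypothetical.
X = np.array([
    [1, 0, 1, 0],   # animal carrying haplotypes 1_1 and 2_1
    [0, 1, 0, 1],   # animal carrying haplotypes 1_2 and 2_2
    [2, 0, 0, 0],   # animal homozygous for haplotype 1_1
])

# GEBV = sum over segments of X_i g_i; with one segment, just X @ g.
gebv = X @ g_hat
print(gebv)   # -> [0.1, -0.1, 0.6] (up to float rounding)
```

With several segments the segment design matrices are simply stacked side by side, so the sum over segments stays a single matrix product.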
• We need methods that can deal with this

Genomic selection with least squares
• A two-step procedure:
– Test each chromosome segment for the presence of a QTL (fitting haplotypes within the segment), and keep the significant effects
– Fit the significant effects simultaneously in a multiple regression, then predict GEBVs
• Identical to LD-MAS with multiple markers
• Problems remain:
– Does not capture all QTL
– Over-estimation of haplotype effects due to the setting of a significance threshold

Genomic selection with BLUP
• BLUP = best linear unbiased prediction
• Model:
y = μ1ₙ + Σᵢ₌₁ᵖ Xᵢgᵢ + e
• In BLUP we assume the variance of haplotype effects is equal across all segments, e.g. g ~ N(0, σg²), where g = [g₁ g₂ g₃ … gₚ]
• We can then estimate the segment effects from the mixed model equations:

[μ̂]   [1ₙ'1ₙ   1ₙ'X    ]⁻¹ [1ₙ'y]
[ĝ] = [X'1ₙ    X'X + Iλ]    [X'y ]

with λ = σe²/σg²

Genomic selection with BLUP
• Example: a "simulated" data set
• A single chromosome, with 4 markers defining three chromosome segments
– The markers are SNPs, so there are 4 possible haplotypes per segment
• Phenotypes "simulated" with:
– an overall mean of 2
– an effect of haplotype 1 in the first segment of 1, an effect of haplotype 1 in the second segment of -0.5, and all other haplotypes with 0 effect
– a normally distributed error term with mean 0 and variance 1

Animal   Segment 1 (pat/mat)   Segment 2 (pat/mat)   Segment 3 (pat/mat)   Phenotype
1        1/1                   2/2                   1/1                   3.41
2        1/2                   1/2                   1/1                   2.47
3        2/2                   1/1                   1/2                   2.32
4        1/3                   2/3                   2/1                   2.32
5        1/4                   1/3                   2/1                   1.75

• 9 haplotypes are observed in total – 4 in the first segment, 3 in the second and 2 in the third
• Only 5 phenotypic records

Genomic selection with BLUP
• The design matrix X counts the number of copies of each haplotype carried by each animal:

         Segment 1 haplotypes   Segment 2 haplotypes   Segment 3 haplotypes
Animal   1  2  3  4             1  2  3                1  2
1        2  0  0  0             0  2  0                2  0
2        1  1  0  0             1  1  0                2  0
3        0  2  0  0             2  0  0                1  1
4        1  0  1  0             0  1  1                1  1
5        1  0  0  1             1  0  1                1  1

• Assume a value of 1 for λ, and 1ₙ = [1 1 1 1 1]'
• Solving the mixed model equations gives:

Effect                    Estimate
μ̂                         2.05
Segment 1, haplotype 1     0.2
Segment 1, haplotype 2    -0.06
Segment 1, haplotype 3     0
Segment 1, haplotype 4    -0.14
Segment 2, haplotype 1    -0.07
Segment 2, haplotype 2     0.21
Segment 2, haplotype 3    -0.14
Segment 3, haplotype 1     0.19
Segment 3, haplotype 2    -0.19

Genomic selection with BLUP
• Now we want to predict GEBVs for a group of young animals without phenotypes.
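The mixed model equations above can be set up and solved directly. A minimal numpy sketch for the five-animal example, using the X and y given in the notes and λ = 1; the solution should reproduce the slide's table of estimates (μ̂ ≈ 2.05, segment-1 haplotype 1 ≈ 0.2, and so on):

```python
import numpy as np

# X counts copies (0/1/2) of each haplotype: segment 1 has 4 haplotypes,
# segment 2 has 3, segment 3 has 2 (9 effects in total, 5 records).
X = np.array([
    [2, 0, 0, 0,   0, 2, 0,   2, 0],
    [1, 1, 0, 0,   1, 1, 0,   2, 0],
    [0, 2, 0, 0,   2, 0, 0,   1, 1],
    [1, 0, 1, 0,   0, 1, 1,   1, 1],
    [1, 0, 0, 1,   1, 0, 1,   1, 1],
], dtype=float)
y = np.array([3.41, 2.47, 2.32, 2.32, 1.75])
ones = np.ones(5)
lam = 1.0                       # lambda = sigma_e^2 / sigma_g^2, set to 1

# Mixed model equations:
# [1'1   1'X       ] [mu]   [1'y]
# [X'1   X'X + I*l ] [g ] = [X'y]
lhs = np.block([
    [np.array([[ones @ ones]]), (ones @ X)[None, :]],
    [(X.T @ ones)[:, None],     X.T @ X + lam * np.eye(9)],
])
rhs = np.concatenate([[ones @ y], X.T @ y])
sol = np.linalg.solve(lhs, rhs)
mu_hat, g_hat = sol[0], sol[1:]
print(np.round(mu_hat, 2), np.round(g_hat, 2))
```

Note that the ridge term Iλ makes the system solvable even though there are 10 unknowns and only 5 records, which is exactly why BLUP copes where ordinary least squares cannot.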
GEBV = X ĝ
• We have the ĝ, and we can get X from their haplotypes (after genotyping):

Animal   Segment 1 (pat/mat)   Segment 2 (pat/mat)   Segment 3 (pat/mat)
6        1/2                   1/2                   1/1
7        1/1                   2/2                   1/2
8        2/3                   2/2                   1/2
9        1/4                   3/1                   1/2
10       2/4                   2/2                   1/2

• X (number of copies of each haplotype):

         Segment 1 haplotypes   Segment 2 haplotypes   Segment 3 haplotypes
Animal   1  2  3  4             1  2  3                1  2
6        1  1  0  0             1  1  0                2  0
7        2  0  0  0             0  2  0                1  1
8        0  1  1  0             0  2  0                1  1
9        1  0  0  1             1  0  1                1  1
10       0  1  0  1             0  2  0                1  1

• GEBV = X ĝ gives:

Animal   GEBV    TBV
6         0.66   0.5
7         0.82   2
8         0.36   0
9        -0.15   0.5
10        0.22   0

• Corr(GEBV, TBV) = 0.6

Genomic selection with BLUP
• Where do we get σg² from?
• We can estimate the total additive genetic variance and divide by the number of segments, e.g. σg² = σa²/p
• Or use models with mutation-drift approaches? – These need assumptions about finite population size and mutation rates

Genomic selection
• What should we assume about the variance of effects across chromosome segments (σgi² across the i)?
[Figure: a chromosome with markers M; chromosome segments 5 and 10 carry effects g5 and g10 with variances σg5² and σg10². Either σg5² = σg10², i.e. g1, g2, g3, g4, g5, … ~ N(0, σg²) as in BLUP, or σg5² ≠ σg10², i.e. g5 ~ N(0, σg5²) and g10 ~ N(0, σg10²)]

Bayesian methods
• BLUP assumes an equal variance of segment effects across segments
• It does not incorporate our prior knowledge of the distribution of QTL effects into the model
[Figure: distribution of QTL effect sizes, proportion of QTL against size of QTL in phenotypic standard deviations: many QTL of small effect, few of large effect]
• The Bayesian approach does allow us to incorporate prior knowledge

Bayesian methods
• Bayes theorem:
P(x|y) ∝ P(y|x) P(x)
– P(x|y) is the probability of the parameters x given the data y (the posterior)
– which is proportional to
– P(y|x), the probability of the data y given the x (the likelihood of the data)
– times P(x), the prior probability of x

Bayesian methods
• Consider an experiment where we measure the height of 10 people to estimate average height:
y = average height + e
• We want to use the prior knowledge from many previous studies that average height is 174 cm
• From the data: x̄ = 178, s.e. = 5
[Figure: the prior density of average height, centred on 174 cm; the likelihood L(y|x), most likely at x = 178 cm; and the posterior P(x|y)]
• Multiplying the prior and the likelihood gives a posterior with mean = 176 cm
• Less certainty about the prior information? Use a less informative (flat) prior: the posterior mean is then 178 cm, determined entirely by the data
• More certainty about the prior information? Use a more informative prior: the posterior mean is then 174.5 cm, pulled towards the prior

Bayesian methods
• Assuming equal variance of effects:
P(g|y) ∝ P(y|g) P(g)
• Consider the previous BLUP example, and let's use the Bayesian approach to estimate the effect of haplotype 1 in segment 1
• First correct the data for the mean and the other haplotype effects:
y*ⱼ = yⱼ − μ̂ − Σᵢ₌₂ᵖ xᵢⱼ ĝᵢ
giving y* = [0.61, 0.29, 0.02, 0.24, 0.1]
• Then:
P(g11|y*) ∝ P(y*|g11) P(g11)
with the prior g11 ~ N(0, σg1² = 1)
[Figure: the prior density N(0, σg1² = 1), the likelihood L(y*|g11), and their product, the posterior]
• The mean of the posterior = 0.2 – the same as BLUP!

Bayesian methods
• Now let's allow different variances of the chromosome segment effects
• We need two levels of models:
– Data: P(g, μ|y) ∝ P(y|g, μ) P(g, μ)
– Variances of chromosome segments: P(σgi²|gi) ∝ P(gi|σgi²) P(σgi²)
• For the data, the mixed model equations now carry a segment-specific shrinkage σe²/σgi²:

[μ̂ ]   [1ₙ'1ₙ   1ₙ'X₁               …  1ₙ'Xₚ             ]⁻¹ [1ₙ'y]
[ĝ₁]   [X₁'1ₙ   X₁'X₁ + I σe²/σg₁²  …  X₁'Xₚ             ]    [X₁'y]
[ . ] = [  .        .               …     .               ]    [  . ]
[ĝₚ]   [Xₚ'1ₙ   Xₚ'X₁               …  Xₚ'Xₚ + I σe²/σgₚ²]    [Xₚ'y]

• For the variances of the chromosome segments, what prior?
– The inverted chi-square is used for variances
– Sampling from an inverted chi-square with v degrees of freedom, scaled by S: S/χ²ᵥ
– This describes a distribution with mean E(σgi²) = S/(v−2) and relative variance V(σgi²)/[E(σgi²)]² = 2/(v−4)
– The larger v is, the more informative the prior = more belief about the variance
[Figure: the density of the scaled inverted chi-square for v = 2 and for v = 20]
• We can choose v and S so that the prior reflects our knowledge that there are many QTL of small effect and few of large effect, e.g. χ⁻²(v = 4.012, S = 0.002)
• Posterior?
– An advantage of choosing the inverse chi-square distribution for the prior is that the posterior is also an inverse chi-square distribution
– Degrees of freedom = prior + data
– Scaling factor = sums of squares prior (S) + sums of squares from the data
– With nᵢ the number of haplotype effects in segment i:
σgi²|gi ~ χ⁻²(v + nᵢ, S + gᵢ'gᵢ) = χ⁻²(4.012 + nᵢ, 0.002 + gᵢ'gᵢ)
• But this posterior cannot be sampled directly – it depends on gᵢ!
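The scaled inverse chi-square prior and its conjugate update can be sketched by direct sampling: a draw is simply S divided by a chi-square deviate with v degrees of freedom, and the posterior adds the data degrees of freedom and sums of squares. The v = 6, S = 2 values in the mean check below are illustrative choices (so the mean and variance exist); the update line uses the course's 4.012 and 0.002 hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scaled_inv_chi2(v, S, size):
    """Draw from S / chi2_v, the scaled inverse chi-square used as the
    variance prior; mean S/(v-2), relative variance 2/(v-4)."""
    return S / rng.chisquare(v, size=size)

# With v = 6, S = 2 the theoretical mean is S/(v-2) = 0.5.
draws = sample_scaled_inv_chi2(6.0, 2.0, 200_000)
print(draws.mean())   # close to 0.5

# Conjugate update from the notes: posterior df = v + n_i,
# posterior scale = S + g_i'g_i, with g_i the current segment effects.
g_i = np.array([0.01, 0.01])   # illustrative effect values
post = sample_scaled_inv_chi2(4.012 + g_i.size, 0.002 + g_i @ g_i, 100)
```

Because the posterior scale is S + gᵢ'gᵢ, segments whose current effects are large are assigned large variances (little shrinkage), while segments with effects near zero get tiny variances and are shrunk hard – the mechanism behind BayesA.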
Bayesian methods
• The solution is to use Gibbs sampling
– Draw samples from the posterior distribution of each parameter, conditional on all the other effects
– The average of these samples can be used as the estimates of the parameters

Bayesian methods
• Gibbs sampling scheme: the parameters to estimate, and their conditional posteriors
– P(σgi²|gi):  χ⁻²(4.012 + nᵢ, 0.002 + gᵢ'gᵢ)
– P(σe²|e, g, μ):  χ⁻²(n − 2, e'e)
– P(μ|e, g, σe²):  N( (1ₙ'y − 1ₙ'Xg)/n, σe²/n )
– P(gij|μ, g≠ij, σgi², σe²):  N( (Xij'y − Xij'Xg(ij=0) − Xij'1ₙμ) / (Xij'Xij + σe²/σgi²), σe²/(Xij'Xij + σe²/σgi²) )
where g(ij=0) denotes g with element ij set to zero

Bayesian methods
• The Gibbs chain
– Step 1. Initialise the value of g, e.g. g = 0.01, and μ, e.g. μ = 0.01
– Step 2. For each i, draw from P(σgi²|gi) = χ⁻²(4.012 + nᵢ, 0.002 + gᵢ'gᵢ), e.g. σg1² = 0.95
– Step 3. Draw a sample from P(σe²|e, g, μ). First calculate the residuals e = y − Xg − 1ₙμ, then sample from χ⁻²(n − 2, e'e), e.g. σe² = 0.5
– Step 4. Draw a sample from P(μ|g, σe²) = N( (1ₙ'y − 1ₙ'Xg)/n, σe²/n ), e.g. μ = −0.1
– Step 5. For each gij, draw from P(gij|μ, g≠ij, σgi², σe²), e.g. g11 = 0.5
– Repeat steps 2–5 many times to build up samples from the posterior distributions of the parameters
– Finally, take the estimates of the parameters as the average over many cycles
– Discard the first ~100 cycles as burn-in, since these depend on the starting values

Bayesian methods
• Example: consider a data set with three markers.
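The five steps above can be sketched end to end. This is a minimal Gibbs sampler under stated assumptions, not Meuwissen et al.'s code: the data are simulated here (n = 200 animals, three markers with effects 3, 0 and −2, in the spirit of the example that follows), each "segment" is a single marker (so nᵢ = 1), and the conditional distributions are exactly the four listed above:

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Simulate data (hypothetical; effects echo the three-marker example) ---
n, true_g, mu_true = 200, np.array([3.0, 0.0, -2.0]), 3.0
X = rng.integers(0, 3, size=(n, 3)).astype(float)   # 0/1/2 copies of allele 2
y = mu_true + X @ true_g + rng.normal(0.0, 0.5, n)

# Prior from the notes: sigma_gi^2 ~ inv-chi2(v = 4.012, S = 0.002)
v_prior, S_prior = 4.012, 0.002

# --- Step 1: initialise ---
g, mu = np.full(3, 0.01), 0.01
n_cycles, burn_in = 1000, 100
g_store = np.zeros((n_cycles, 3))

for cycle in range(n_cycles):
    # Step 2: variance of each marker effect, inv-chi2(v + n_i, S + g_i'g_i);
    # each "segment" is one marker here, so n_i = 1 and g_i'g_i = g_i^2.
    sigma_g2 = (S_prior + g**2) / rng.chisquare(v_prior + 1, size=3)
    # Step 3: residual variance, inv-chi2(n - 2, e'e)
    e = y - X @ g - mu
    sigma_e2 = (e @ e) / rng.chisquare(n - 2)
    # Step 4: overall mean, N((1'y - 1'Xg)/n, sigma_e2/n)
    mu = rng.normal(np.sum(y - X @ g) / n, np.sqrt(sigma_e2 / n))
    # Step 5: each effect conditional on all others (set g_j = 0 so the
    # residual excludes marker j, then sample from its normal conditional)
    for j in range(3):
        g[j] = 0.0
        xj = X[:, j]
        denom = xj @ xj + sigma_e2 / sigma_g2[j]
        g[j] = rng.normal((xj @ (y - X @ g - mu)) / denom,
                          np.sqrt(sigma_e2 / denom))
    g_store[cycle] = g

# Posterior means after discarding the burn-in cycles
g_hat = g_store[burn_in:].mean(axis=0)
print(np.round(g_hat, 2))   # close to the simulated effects [3, 0, -2]
```

The per-marker variances drawn in step 2 are what distinguish this from BLUP: markers with small sampled effects get small variances and are shrunk heavily in step 5.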
The data set was simulated as:
– the effect of a 2 allele at the first marker is 3, the effect of a 2 allele at the second marker is 0, and the effect of a 2 allele at the third marker is −2
– μ was 3
– σe² was 0.23
• The data set was:

Animal   Phenotype   Marker 1 (allele 1/allele 2)   Marker 2   Marker 3
1         9.68       2/2                            2/1        1/1
2         5.69       2/2                            2/2        2/2
3         2.29       1/2                            2/2        2/2
4         3.42       1/1                            2/1        1/1
5         5.92       2/1                            1/1        1/1
6         2.82       2/1                            2/1        2/2
7         5.07       2/2                            2/1        2/2
8         8.92       2/2                            2/2        1/1
9         2.4        1/1                            2/2        1/2
10        9.01       2/2                            2/2        1/1
11        4.24       1/2                            1/2        2/1
12        6.35       2/2                            1/1        1/2
13        8.92       2/2                            1/2        1/1
14       -0.64       1/1                            2/2        2/2
15        5.95       2/1                            1/1        1/1
16        6.13       1/2                            2/1        1/1
17        6.72       2/1                            2/1        1/1
18        4.86       1/2                            2/1        1/2
19        6.36       2/2                            2/2        2/2
20        0.81       1/1                            2/1        1/2
21        9.67       2/2                            1/2        1/1
22        7.74       2/2                            2/1        1/2
23        1.45       1/1                            2/2        2/1
24        1.22       1/1                            2/1        2/1
25       -0.52       1/1                            2/2        2/2

Bayesian methods
• The Bayesian approach was applied, fitting single-marker effects
• The X matrix holds the number of copies of the 2 allele at each marker for each animal, e.g. 2 1 0 for animal 1

Bayesian methods
• The Gibbs chain
– Step 1. Initialise g1 = g2 = g3 = 0.01 and μ = 0.1
– Step 2. For i = 1, 2, 3, draw from P(σgi²|gi) = χ⁻²(4.012 + 1, 0.002 + 0.01²); e.g. σg1² = 0.002, σg2² = 0.06, σg3² = 0.009
– Step 3. Draw a sample from P(σe²|e, g, μ), with e = y − Xg − 1ₙμ, i.e. from χ⁻²(23, 812.031); e.g. σe² = 53.38
– Step 4. Draw a sample from P(μ|g, σe²) = N( (1ₙ'y − 1ₙ'Xg)/n, σe²/n ); e.g. μ = 3.25
– Step 5. For each marker, draw a sample from P(gij|μ, g≠ij, σgi², σe²); e.g. g1 = −0.02, g2 = −0.81, g3 = −0.005

Bayesian methods
• The Gibbs chain was run for 1000 cycles
[Figure: trace of the samples of g1 across the cycles; the first cycles, the "burn-in", drift away from the starting value before the chain settles around the posterior]
• Averaging the post-burn-in samples gives ĝ1 = 2.97, ĝ2 = 0.002, ĝ3 = −1.81
• This vector of SNP effects is what is used for calculating GEBV

Bayesian methods
• Alternative priors for the variance of chromosome segment effects
– Meuwissen et al. BayesA: χ⁻²(4.012, 0.002)
– Xu (2003): χ₁⁻², uninformative – but with infinite mass at zero?
– Te Braak (2006): χ⁻² with 0.998 degrees of freedom
– Meuwissen et al. BayesB

Genomic selection
• Comparison of accuracy of methods (Meuwissen et al. 2001)
– A genome of 1000 cM was simulated, with a marker spacing of 1 cM
– The markers surrounding each 1-cM region were combined into haplotypes
– Due to finite population size (Ne = 100), the marker haplotypes were in linkage disequilibrium with the QTL between the markers
– The effects of the chromosome segments were predicted in one generation of 2000 animals
– Breeding values for the progeny of these animals were then predicted based only on their marker genotypes

Method    rTBV;EBV ± SE    bTBV.EBV ± SE
LS        0.318 ± 0.018    0.285 ± 0.024
BLUP      0.732 ± 0.030    0.896 ± 0.045
BayesA    0.798            0.827
BayesB    0.848 ± 0.012    0.946 ± 0.018

– The least squares method does very poorly, primarily because the haplotype effects are over-estimated
– The Bayesian approaches are more accurate because they set many of the chromosome segment effects close to zero (BayesA) or exactly to zero (BayesB)
– They also "shrink" the estimates of the effects of the other chromosome segments based on a prior distribution of QTL effects
– The accuracies were very high: as high as following progeny testing, for example
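The accuracy rTBV;EBV reported in the comparison is simply the correlation between true and estimated breeding values. For the small BLUP example earlier in these notes (animals 6–10), it can be checked directly:

```python
import numpy as np

# GEBV and TBV for animals 6-10 from the worked BLUP example in the notes;
# the accuracy reported there is corr(GEBV, TBV) = 0.6.
gebv = np.array([0.66, 0.82, 0.36, -0.15, 0.22])
tbv = np.array([0.5, 2.0, 0.0, 0.5, 0.0])

accuracy = np.corrcoef(gebv, tbv)[0, 1]
print(round(accuracy, 2))   # -> 0.6
```

In a simulation study the TBVs are known exactly, so this correlation can be computed for each method; in real data it has to be approximated, for example by correlating GEBVs with later progeny-test proofs.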