PowerPoint presentation - The Genome Analysis Centre

Added value of whole-genome sequence data to genomic predictions in dairy cattle

Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van Eeuwijk 2, Roel Veerkamp 1, Marco Bink 2

1 Animal Breeding & Genetics Centre, Wageningen UR (NL)

2 Biometris, Wageningen UR (NL)

3 CRV (cattle breeding company), Arnhem (NL)

Genomic Prediction in agricultural species

Reference population:

1) Estimate effects for each SNP (w)

2) Generate a prediction equation that combines all the marker genotypes with their effects to predict the breeding value of each individual

Apply prediction equation to a group of individuals that have genotypes but not phenotypes

→ Estimated genomic breeding values

→ Select the best individuals for breeding

Each SNP represented by a variable (x), which takes the values

0 [AA]

1 [AB]

2 [BB]
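The prediction equation described above can be sketched in a few lines. This is a minimal illustration, not the authors' software: the genotype matrix `X`, the effect vector `w`, and the helper `predict_gebv` are hypothetical, and the numbers are made up.

```python
import numpy as np

# Hypothetical genotypes: rows = individuals, columns = SNPs,
# coded 0 (AA), 1 (AB), 2 (BB) as on the slide.
X = np.array([
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
])

# Hypothetical SNP effects (w) estimated in a reference population.
w = np.array([0.5, -0.2, 0.1])

def predict_gebv(genotypes, snp_effects):
    """Combine all marker genotypes with their effects into one value per individual."""
    return genotypes @ snp_effects

gebv = predict_gebv(X, w)

# Rank selection candidates from highest to lowest predicted breeding value.
ranking = np.argsort(-gebv)
```

The candidates need genotypes only; phenotypes enter through the estimated effects `w`.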

Advantages:

• Select at early age (before phenotypes available)

• Save costs to phenotype candidates

• Increase accuracy of predicted Breeding Values

Goddard & Hayes (2009)

Nature Reviews Genetics 10:381

One seminal paper on Genomic Prediction

Simulation Study

• Dense marker maps

→ SNP markers at 1 cM density

• Prediction accuracy

Least Squares method: 0.32

Genomic BLUP method: 0.73

Bayesian methods (A, B): 0.85

→ Conclusion:

“selection on genetic values predicted from markers could substantially increase the rate of genetic gain in animals and plants , especially if combined with reproductive techniques to shorten the generation interval”

Another (seminal) paper on Genomic Prediction

“In the case of whole-genome sequence data, the polymorphisms that are causing the genetic differences between the individuals are among those being analyzed.”

Higher accuracy in genomic predictions since causal mutation is included (assumption)

• No dependency on LD

• Persistency across generations

• Genomic prediction across breeds

Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps

T. H. E. Meuwissen, B. J. Hayes and M. E. Goddard

“Only few SNPs were useful for predicting the trait [because they were in linkage disequilibrium (LD) with mutations causing variation in the trait] while many SNPs were not useful.”

Genomic predictions from whole-genome sequence data



• Tremendous increase in number of SNPs (more noise)

• Large (sequence) data are required

Solution

• Sequence core set of individuals (e.g. founders)

• Impute whole-genome sequence genotypes of other individuals

Accuracy of imputation to whole-genome sequence data was generally high for imputation from 777K SNP panel

Van Binsbergen et al. Genet Sel Evol 2014 (in press)

This presentation:

First results of genomic prediction with imputed whole-genome sequence data for 5503 bulls with accurate phenotypes

Dataset: SNP genotypes & trait phenotypes

5503 Holstein Friesian bulls

777K SNP genotypes

(Illumina BovineHD BeadChip)

Imputation - Beagle v4 software

1000 bull genomes project

28M SNP genotypes

429 bulls

(multiple breeds)



MAF > 0.005

Imputation accuracy > 0.05
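The two SNP filters above (MAF > 0.005 and imputation accuracy > 0.05) could be applied as in the sketch below. It assumes a 0/1/2 genotype matrix and a per-SNP accuracy vector; the function names and the tiny data set are hypothetical and not taken from the Beagle workflow.

```python
import numpy as np

def minor_allele_frequency(genotypes):
    """MAF per SNP from a 0/1/2 genotype matrix (rows = animals, columns = SNPs)."""
    freq_b = genotypes.mean(axis=0) / 2.0      # frequency of the B allele
    return np.minimum(freq_b, 1.0 - freq_b)    # the rarer of the two alleles

def filter_snps(genotypes, imputation_accuracy, maf_min=0.005, acc_min=0.05):
    """Keep SNPs that pass both thresholds; returns filtered genotypes and the mask."""
    keep = (minor_allele_frequency(genotypes) > maf_min) & (imputation_accuracy > acc_min)
    return genotypes[:, keep], keep

# Made-up example: 4 animals, 3 SNPs.
genotypes = np.array([
    [0, 0, 2],
    [0, 1, 2],
    [0, 1, 2],
    [0, 2, 1],
])
accuracy = np.array([0.90, 0.90, 0.01])   # hypothetical per-SNP imputation accuracy

filtered, keep = filter_snps(genotypes, accuracy)
```

In this toy example the first SNP is monomorphic and the third has low imputation accuracy, so only the second SNP is kept.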

De-regressed progeny-based proofs (DRP [1]) and associated effective daughter contributions (EDC [2])

• Somatic cell score (SCS)

• Interval between first and last insemination (IFL)

• Protein yield (PY)

[1] VanRaden et al. 2009 (J Dairy Sci)

[2] VanRaden and Wiggans 1991 (J Dairy Sci)

Prediction reliability

= squared correlation between original phenotype (DRP) and estimated genetic values (GEBV)


Genotype sets: 777K SNP genotypes (Illumina BovineHD BeadChip); imputed sequence genotypes (MAF > 0.005, imputation accuracy > 0.05)

Training population: 4322 old bulls

Validation population: 1181 young bulls
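As a small illustration of the definition above, the reliability can be computed as the squared Pearson correlation between DRP and GEBV in the validation bulls. The function name and the four data points are made up; any weighting (for example by EDC) is not shown because the slides do not describe it.

```python
import numpy as np

def prediction_reliability(drp, gebv):
    """Squared Pearson correlation between de-regressed proofs and GEBV."""
    drp = np.asarray(drp, dtype=float)
    gebv = np.asarray(gebv, dtype=float)
    r = np.corrcoef(drp, gebv)[0, 1]
    return r ** 2

# Made-up values for four validation bulls.
drp_validation = [1.2, -0.4, 0.3, 2.1]
gebv_validation = [0.9, -0.1, 0.5, 1.4]

reliability = prediction_reliability(drp_validation, gebv_validation)
```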

Validation population



Youngest bulls with EDC  0



Mainly sons of bulls in training population



Mimics breeding practice

Genomic prediction – 2 methods

GBLUP: genome-enabled best linear unbiased prediction

• Assumes the distribution of QTL effects is close to the infinitesimal model (all SNPs have an equally small effect)

• Builds a genomic relationship matrix to model the variance-covariance structure

BSSVS: Bayes stochastic search variable selection

• Assumes a large number of SNPs with tiny (close to zero) effects and a few SNPs with moderate effects (= mixture of two Normal distributions)

• Implemented via Markov chain Monte Carlo (MCMC) simulation algorithms (computer intensive)

• 3 chains of 60,000 cycles (10,000 cycles burn-in)

Calus M (2014). Right-hand-side updating for fast computing of genomic breeding values. Genetics Selection Evolution 46(1): 24.
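To make the GBLUP idea concrete, here is a minimal sketch. It rests on assumptions the slides do not state: the genomic relationship matrix follows VanRaden's first method (centred genotypes scaled by 2Σp(1−p)), the overall mean is taken as the training average, the variance ratio `lam` (residual over genetic variance) is supplied by the user, and records are unweighted. All function names are hypothetical.

```python
import numpy as np

def genomic_relationship_matrix(genotypes):
    """G = ZZ' / (2 * sum(p(1-p))), with Z the genotypes centred by 2p (assumed method)."""
    p = genotypes.mean(axis=0) / 2.0
    Z = genotypes - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup(G, y, train_idx, lam):
    """Predict values for all animals from the phenotypes of the training animals."""
    G_tt = G[np.ix_(train_idx, train_idx)]
    mu = y[train_idx].mean()                    # simplification: mean as the only fixed effect
    rhs = y[train_idx] - mu
    coef = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), rhs)
    return mu + G[:, train_idx] @ coef

# Made-up data: 6 animals, 50 SNPs, phenotypes known for the first 4 animals only.
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(6, 50)).astype(float)
y = np.array([1.0, 0.2, -0.5, 0.8, np.nan, np.nan])
train_idx = np.array([0, 1, 2, 3])

G = genomic_relationship_matrix(genotypes)
predictions = gblup(G, y, train_idx, lam=1.0)
```

The last two animals play the role of validation bulls: they contribute genotypes to `G` but no phenotype to the solve.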

Computation

GBLUP

• 777K SNP: HPC, 1 node; ~ 3 hours; ~ 32 GB RAM

• 12M SNP: HPC, 12 nodes; ~ 6 hours; ~ 600 GB RAM

BSSVS (per MCMC chain; 3 chains of 60,000 cycles, 10,000 cycles burn-in)

• 777K SNP: Windows, 1 CPU; ~ 5 days; ~ 1.6 GB RAM

• 12M SNP: HPC, 1 node; ~ 50 days; ~ 32 GB RAM

Windows 7 Enterprise desktop pc:

32 CPU – 8 GB RAM/CPU (clock speed 2.60 GHz)

HPC Linux cluster:

Normal nodes – 64 GB/node (2.60 GHz); 2 fat nodes – 1 TB RAM/node (2.20 GHz)
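A rough back-of-envelope calculation shows why sequence data strains memory. The sketch below only estimates the size of a dense genotype matrix for 5503 bulls; the SNP counts are the nominal 777K and 12M figures, the storage widths (1 byte and 8 bytes per genotype) are assumptions, and the result is not meant to reproduce the RAM figures reported above.

```python
def genotype_matrix_gb(n_animals, n_snps, bytes_per_genotype):
    """Size in GB (10^9 bytes) of a dense animals x SNPs genotype matrix."""
    return n_animals * n_snps * bytes_per_genotype / 1e9

n_bulls = 5503
panels = {"777K": 777_000, "12M": 12_000_000}   # nominal SNP counts

for label, n_snps in panels.items():
    for nbytes in (1, 8):   # assumed: 1-byte integer codes vs 8-byte floats
        size = genotype_matrix_gb(n_bulls, n_snps, nbytes)
        print(f"{label} SNP, {nbytes} byte(s) per genotype: {size:.1f} GB")
```

How genotypes are actually stored depends on the software, so these numbers indicate an order of magnitude only.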

Results: Prediction Reliability

[Figure: prediction reliability (y-axis from 0.0 to 0.6) for SCS, IFL and PY. Series: BovineHD GBLUP, BovineHD BSSVS, Sequence GBLUP, Sequence BSSVS*.]

BSSVS: average over 3 chains of 60,000 cycles.

* Based on 45,000 cycles.


BSSVS: Convergence & SNP effects

[Figure: trace of the variance of SNP effects and Bayes factor for SNP effects, shown for 777K SNP and 12M SNP.]

3 chains of 60,000 cycles; sequence: 45,000 cycles.

Suitability of BSSVS model?



• Large number of SNPs with tiny and a few SNPs with moderate effects

    ● Sequence data: Really large number of SNPs with tiny effects

• Captures too much signal?

• Another Bayesian prediction model: Bayes-C

    ● Large number of SNPs with NO effect and a few SNPs with moderate effects
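The difference between the two priors can be illustrated by drawing SNP effects from each. This is a sketch under assumed parameter values (`pi`, `var_large`, `var_small` are hypothetical and not the settings used in the study): BSSVS is drawn as a mixture of two Normal distributions, and Bayes-C as a point mass at zero plus one Normal distribution.

```python
import numpy as np

def sample_bssvs_prior(n_snps, pi, var_large, var_small, rng):
    """Every SNP gets an effect: most from a narrow Normal, a fraction pi from a wide one."""
    in_large = rng.random(n_snps) < pi
    sd = np.where(in_large, np.sqrt(var_large), np.sqrt(var_small))
    return rng.normal(0.0, sd)

def sample_bayes_c_prior(n_snps, pi, var_large, rng):
    """Most SNPs get exactly zero effect; a fraction pi gets a Normal effect."""
    in_model = rng.random(n_snps) < pi
    effects = rng.normal(0.0, np.sqrt(var_large), n_snps)
    return np.where(in_model, effects, 0.0)

rng = np.random.default_rng(42)
n_snps = 10_000
bssvs_effects = sample_bssvs_prior(n_snps, pi=0.01, var_large=1.0, var_small=0.01, rng=rng)
bayes_c_effects = sample_bayes_c_prior(n_snps, pi=0.01, var_large=1.0, rng=rng)

# Under BSSVS every SNP carries a (mostly tiny) effect; under Bayes-C most are exactly zero.
n_nonzero_bssvs = np.count_nonzero(bssvs_effects)
n_nonzero_bayes_c = np.count_nonzero(bayes_c_effects)
```

With millions of sequence SNPs, the many tiny non-zero effects under BSSVS add up, which is the concern raised on this slide.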

Concentrate on single chromosome (BTA 6)

MCMC convergence

[Figure: MCMC convergence for BSSVS and Bayes-C, with 777K SNP and 12M SNP.]

Concentrate on single chromosome (BTA 6)

Signal of QTL effects

[Figure: signal of QTL effects for BSSVS and Bayes-C, with 777K SNP and 12M SNP.]

Reliability estimates

            BSSVS    Bayes-C

BovineHD    0.328    0.328

Sequence    0.324    0.325

Conclusions



• Genomic prediction using sequence data becomes reality

    ● However, sequence data requires intensive computation

• Need for faster algorithms

• Use of sequence data did not improve prediction reliability

    ● Convergence issues with BSSVS

• Longer chains may yield better results

• BSSVS slightly better compared to GBLUP

• Preliminary results for BTA 6 hint that the Bayes-C method may work better (than BSSVS) for sequence data

Next Steps: Did we bet on the wrong horse - named BSSVS?



Review choice of priors in BSSVS model.



Apply Bayes-C model to whole genome sequence data

Thanks!

Acknowledgments

1000 bull genomes project

(www.1000bullgenomes.com)

De-regressed proofs (DRP) and effective daughter contribution (EDC)

\[ DRP = PA + (EBV - PA) \cdot \frac{EDC_{EBV}}{EDC_{prog}} \]

where PA is the parent average and EBV is the estimated breeding value.

\[ EDC_{EBV} = \alpha \, \frac{REL_{EBV}}{1 - REL_{EBV}}, \qquad \alpha = \frac{4 - h^2}{h^2} \]

where REL_{EBV} is the published reliability of the EBV.

\[ EDC_{prog} = EDC_{EBV} - EDC_{PA} \]

VanRaden et al. 2009 (J Dairy Sci)

EDC_{PA} is based on the reliability of the parents: (REL_{sire} + REL_{dam})/4

VanRaden and Wiggans 1991 (J Dairy Sci)
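The de-regression formulas can be turned into a short worked example. One step is an assumption rather than something shown on the slide: EDC_PA is computed from the parental reliability (REL_sire + REL_dam)/4 with the same α·REL/(1 − REL) conversion used for EDC_EBV. Function names and input values are hypothetical.

```python
def edc_from_reliability(rel, h2):
    """EDC = alpha * REL / (1 - REL), with alpha = (4 - h2) / h2."""
    alpha = (4.0 - h2) / h2
    return alpha * rel / (1.0 - rel)

def deregressed_proof(ebv, pa, rel_ebv, rel_sire, rel_dam, h2):
    """Return (DRP, EDC_prog) for one bull."""
    rel_pa = (rel_sire + rel_dam) / 4.0          # reliability of the parent average
    edc_ebv = edc_from_reliability(rel_ebv, h2)
    edc_pa = edc_from_reliability(rel_pa, h2)    # assumed: same conversion as for EDC_EBV
    edc_prog = edc_ebv - edc_pa
    drp = pa + (ebv - pa) * edc_ebv / edc_prog
    return drp, edc_prog

# Made-up bull: EBV 2.0, parent average 1.0, heritability 0.25.
drp, edc_prog = deregressed_proof(
    ebv=2.0, pa=1.0, rel_ebv=0.9, rel_sire=0.9, rel_dam=0.3, h2=0.25
)
```

De-regression moves the proof away from the parent average (here the DRP exceeds the EBV), because the parental contribution is removed from the information behind the EBV.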
