PowerPoint presentation - The Genome Analysis Centre

Added value of whole-genome sequence data to genomic predictions in dairy cattle

Rianne van Binsbergen 1,2 , Mario Calus 1 , Chris Schrooten 3 , Fred van Eeuwijk 2 , Roel Veerkamp 1 , Marco Bink 2

1 Animal Breeding & Genetics Centre, Wageningen UR (NL)

2 Biometris, Wageningen UR (NL)

3 CRV (cattle breeding company) , Arnhem (NL)

Genomic Prediction in agricultural species

Reference population:

1) Estimate effects for each SNP (w)

2) Generate a prediction equation that combines all the marker genotypes with their effects to predict the breeding value of each individual

Apply prediction equation to a group of individuals that have genotypes but not phenotypes

→ Estimated genomic breeding values

→ Select the best individuals for breeding

Each SNP is represented by a variable (x), which takes the values:

0 [A A]

1 [A B]

2 [B B]
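
As a minimal sketch of the genotype coding and prediction equation described above (not the software used in this study): each genotype is coded as the number of B alleles, and the genomic breeding value is the sum of x times w over all SNPs. The function names, the three SNP effects and the candidate's genotypes are hypothetical.

```python
# Illustrative sketch only: function names and numbers are hypothetical.

def encode_genotype(genotype: str) -> int:
    """Return x, the number of B alleles: AA -> 0, AB -> 1, BB -> 2."""
    if len(genotype) != 2 or set(genotype) - {"A", "B"}:
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return genotype.count("B")


def predict_breeding_value(genotypes: list[str], effects: list[float]) -> float:
    """Prediction equation: sum over SNPs of x_j * w_j."""
    if len(genotypes) != len(effects):
        raise ValueError("need exactly one effect per SNP")
    return sum(encode_genotype(g) * w for g, w in zip(genotypes, effects))


# Hypothetical SNP effects (w), as estimated in a reference population.
snp_effects = [0.5, -0.25, 1.0]

# A selection candidate that has genotypes but no phenotype.
candidate = ["AB", "AA", "BB"]

gebv = predict_breeding_value(candidate, snp_effects)
print(gebv)
```

In practice the same equation is applied to every candidate, and the candidates are then ranked on their estimated genomic breeding values.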

Advantages:

• Select at early age (before phenotypes available)

• Save costs to phenotype candidates

• Increase accuracy of predicted Breeding Values

Goddard & Hayes (2009) Nature Reviews Genetics 10:381

One seminal paper on Genomic Prediction

Simulation Study

• Dense marker maps

• SNP markers at 1 cM density

• Prediction accuracy:

Least Squares method: 0.32

Genomic BLUP method: 0.73

Bayesian methods (A, B): 0.85

• Conclusion:

“selection on genetic values predicted from markers could substantially increase the rate of genetic gain in animals and plants , especially if combined with reproductive techniques to shorten the generation interval”

Another (seminal) paper on Genomic Prediction

“In the case of whole-genome sequence data, the polymorphisms that are causing the genetic differences between the individuals are among those being analyzed.”

Higher accuracy in genomic predictions since causal mutation is included (assumption)

No dependency on LD

Persistency across generations

Genomic prediction across breeds

Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps

T. H. E. Meuwissen,* B. J. Hayes† and M. E. Goddard†,‡

“Only few SNPs were useful for predicting the trait [because they were in linkage disequilibrium (LD) with mutations causing variation in the trait] while many SNPs were not useful.”

Genomic predictions from whole-genome sequence data

Tremendous increase in number of SNPs (more noise)

Large (sequence) data are required

Solution

Sequence core set of individuals (e.g. founders)

Impute whole-genome sequence genotypes of other individuals

Accuracy of imputation to whole-genome sequence data was generally high for imputation from 777K SNP panel

Van Binsbergen et al. Genet Sel Evol 2014 (in press)

This presentation:

First results of genomic prediction with imputed whole-genome sequence data for 5503 bulls with accurate phenotypes

Dataset: SNP genotypes & trait phenotypes

5503 Holstein Friesian bulls

777K SNP genotypes

(Illumina BovineHD BeadChip)

Imputation - Beagle v4 software

1000 bull genomes project

28M SNP genotypes

429 bulls

(multiple breeds)

5503 Holstein Friesian bulls

12M SNP genotypes

MAF > 0.005

Imputation accuracy > 0.05
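
The two SNP filters listed above (MAF > 0.005 and imputation accuracy > 0.05) could be applied along these lines. This is a sketch under assumptions: genotypes are taken to be 0/1/2 allele counts, the imputation accuracy is taken to be a per-SNP score supplied by the imputation software, and the SNP names and values are invented for the example.

```python
# Illustrative sketch only: SNP names and data are hypothetical.
MIN_MAF = 0.005                  # threshold from the slide
MIN_IMPUTATION_ACCURACY = 0.05   # threshold from the slide


def minor_allele_frequency(allele_counts):
    """MAF of one SNP from 0/1/2 allele counts across animals."""
    freq = sum(allele_counts) / (2 * len(allele_counts))
    return min(freq, 1.0 - freq)


def passes_filters(allele_counts, imputation_accuracy):
    """True if the SNP exceeds both thresholds."""
    return (minor_allele_frequency(allele_counts) > MIN_MAF
            and imputation_accuracy > MIN_IMPUTATION_ACCURACY)


# SNP name -> (allele counts per animal, assumed per-SNP accuracy score).
snps = {
    "snp_a": ([0, 1, 2, 1], 0.98),  # polymorphic and well imputed
    "snp_b": ([0, 0, 0, 0], 0.99),  # monomorphic, fails the MAF filter
    "snp_c": ([0, 1, 1, 2], 0.01),  # fails the accuracy filter
}

kept = [name for name, (counts, acc) in snps.items()
        if passes_filters(counts, acc)]
print(kept)
```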

De-regressed progeny-based proofs (DRP¹) and associated effective daughter contributions (EDC²)

• Somatic cell score (SCS)

• Interval between first and last insemination (IFL)

• Protein yield (PY)

¹ VanRaden et al. 2009 (J Dairy Sci)

² VanRaden and Wiggans 1991 (J Dairy Sci)

Prediction reliability

= squared correlation between original phenotype (DRP) and estimated genetic values (GEBV)
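
The reliability measure defined above can be computed as in the following sketch. The function name and the toy DRP and GEBV values are hypothetical; only the definition (squared correlation between DRP and GEBV) comes from the slide.

```python
# Illustrative sketch only: the data are invented.

def prediction_reliability(drp, gebv):
    """Squared Pearson correlation between DRP and GEBV."""
    if len(drp) != len(gebv) or len(drp) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    n = len(drp)
    mean_d = sum(drp) / n
    mean_g = sum(gebv) / n
    cov = sum((d - mean_d) * (g - mean_g) for d, g in zip(drp, gebv))
    ss_d = sum((d - mean_d) ** 2 for d in drp)
    ss_g = sum((g - mean_g) ** 2 for g in gebv)
    if ss_d == 0 or ss_g == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov ** 2 / (ss_d * ss_g)


# Toy validation data: phenotypes and two sets of predictions.
drp = [1.0, 2.0, 3.0, 4.0]
perfect_gebv = [2.0, 4.0, 6.0, 8.0]          # exactly proportional to DRP
uninformative_gebv = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with DRP

rel_perfect = prediction_reliability(drp, perfect_gebv)
rel_none = prediction_reliability(drp, uninformative_gebv)
```

Because the correlation is squared, the measure does not distinguish positive from negative association, so the sign of the correlation is worth checking separately.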

Training population: 4322 old bulls

Validation population: 1181 young bulls

Validation population

Youngest bulls with EDC > 0

Mainly sons of bulls in training population

Mimics breeding practice

Genomic prediction – 2 methods

GBLUP: Genome-enabled best linear unbiased prediction

• Assumes the distribution of QTL effects is close to the infinitesimal model (all SNPs have equally small effects)

• Builds a genomic relationship matrix to model the variance-covariance structure

BSSVS: Bayes stochastic search variable selection

• Assumes a large number of SNPs with tiny (close to zero) effects and a few SNPs with moderate effects (a mixture of two Normal distributions)

• Implemented via Markov chain Monte Carlo (MCMC) simulation algorithms (computer intensive)

• 3 chains of 60,000 cycles (10,000 cycles burn-in)

Calus M (2014). Right-hand-side updating for fast computing of genomic breeding values. Genetics Selection Evolution 46(1): 24.
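
To make the genomic relationship matrix used by GBLUP concrete, here is a small sketch. The slides do not say how the matrix was built, so the formula below (genotypes centred by twice the allele frequency, cross-products scaled by 2 Σ p(1 − p), as in VanRaden's first method) is an assumption. The function name and the toy genotypes are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
# Illustrative sketch only: assumes a VanRaden-style relationship matrix.

def genomic_relationship_matrix(genotypes):
    """genotypes: one list of 0/1/2 allele counts per animal."""
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n_animals)
             for j in range(n_snps)]
    # Centre each genotype by its expected value 2p.
    centred = [[row[j] - 2 * freqs[j] for j in range(n_snps)]
               for row in genotypes]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    # G = Z Z' / scale, computed animal by animal.
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in centred]
            for zi in centred]


# Four hypothetical animals genotyped at three SNPs.
toy_genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 1, 1],
    [1, 1, 1],
]
G = genomic_relationship_matrix(toy_genotypes)
```

With 12M SNPs and 5503 animals the matrix itself stays 5503 × 5503; the cost lies in accumulating the cross-products over all SNPs.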

Computation

Method                   SNP set    Platform          Run time     Memory

GBLUP                    777K SNP   HPC – 1 node      ~ 3 hours    ~ 32 GB RAM

GBLUP                    12M SNP    HPC – 12 nodes    ~ 6 hours    ~ 600 GB RAM

BSSVS (per MCMC chain)   777K SNP   Windows – 1 CPU   ~ 5 days     ~ 1.6 GB RAM

BSSVS (per MCMC chain)   12M SNP    HPC – 1 node      ~ 50 days    ~ 32 GB RAM

BSSVS: 3 chains of 60,000 cycles (10,000 cycles burn-in)

Windows 7 Enterprise desktop PC: 32 CPU – 8 GB RAM/CPU (clock speed 2.60 GHz)

HPC Linux cluster: normal nodes – 64 GB/node (2.60 GHz); 2 fat nodes – 1 TB RAM/node (2.20 GHz)

Results: Prediction Reliability

[Figure: prediction reliability (axis from 0.0 to 0.6) for SCS, IFL and PY, comparing BovineHD GBLUP, BovineHD BSSVS, Sequence GBLUP and Sequence BSSVS*]

BSSVS: average over 3 chains of 60,000 cycles (10,000 cycles burn-in)

* Sequence BSSVS based on 45,000 cycles

BSSVS: Convergence & SNP effects

[Figure: trace of the variance of SNP effects and Bayes factors for SNP effects, for 777K SNP and 12M SNP]

3 chains of 60,000 cycles (10,000 cycles burn-in)

Sequence: 45,000 cycles
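
The slides judge convergence from trace plots of the three chains. One common numerical complement, shown here purely as an illustration and not as the diagnostic used in this study, is the Gelman-Rubin potential scale reduction factor, which compares the variance between chains with the variance within chains. The function name and the toy chains are hypothetical.

```python
# Illustrative sketch only: Gelman-Rubin diagnostic on toy chains.
from statistics import fmean, variance


def potential_scale_reduction(chains, burn_in=0):
    """Gelman-Rubin R-hat for equally long chains of one parameter."""
    kept = [chain[burn_in:] for chain in chains]  # discard burn-in cycles
    n = len(kept[0])
    if len(kept) < 2 or n < 2 or any(len(c) != n for c in kept):
        raise ValueError("need >= 2 equally long chains with >= 2 samples")
    within = fmean(variance(c) for c in kept)             # W
    between_over_n = variance([fmean(c) for c in kept])   # B / n
    if within == 0:
        raise ValueError("chains have zero within-chain variance")
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


# Three toy chains sampling the same region of parameter space.
mixed = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], [4.0, 3.0, 2.0, 1.0]]
# Three toy chains stuck at different levels.
stuck = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0],
         [21.0, 22.0, 23.0, 24.0]]

r_mixed = potential_scale_reduction(mixed)
r_stuck = potential_scale_reduction(stuck)
```

Values close to 1 suggest that the chains agree; values well above 1, as for the second toy example, suggest that the chains have not yet converged to the same distribution.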

Suitability of BSSVS model?

Large number of SNPs with tiny and a few SNPs with moderate effects

Sequence data: Really large number of SNPs with tiny effects

Captures too much signal?

Another Bayesian Prediction Model: Bayes-C

Large number of SNPs with NO effect and a few SNPs with moderate effects
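
The difference between the two priors can be sketched by drawing SNP effects from each. Everything numeric here is an assumption made for illustration: the proportion of SNPs with a moderate effect (1%), the variance of moderate effects (1.0), and the factor by which the "tiny" BSSVS variance is smaller (100). The function name is hypothetical.

```python
# Illustrative sketch only: all parameter values are assumptions.
import random


def sample_snp_effect(model, rng, p_moderate=0.01, var_moderate=1.0,
                      shrink=100.0):
    """Draw one SNP effect from the prior of the given model."""
    if model not in ("BSSVS", "Bayes-C"):
        raise ValueError(f"unknown model: {model!r}")
    if rng.random() < p_moderate:
        # Both models: a few SNPs get a moderate effect.
        return rng.gauss(0.0, var_moderate ** 0.5)
    if model == "BSSVS":
        # BSSVS: remaining SNPs get a tiny, but non-zero, effect.
        return rng.gauss(0.0, (var_moderate / shrink) ** 0.5)
    # Bayes-C: remaining SNPs have exactly no effect.
    return 0.0


rng = random.Random(1)  # fixed seed so the example is repeatable
bssvs_effects = [sample_snp_effect("BSSVS", rng) for _ in range(1000)]
bayesc_effects = [sample_snp_effect("Bayes-C", rng) for _ in range(1000)]

n_zero_bssvs = sum(e == 0.0 for e in bssvs_effects)
n_zero_bayesc = sum(e == 0.0 for e in bayesc_effects)
```

Under BSSVS every one of millions of sequence SNPs keeps a small non-zero effect, whereas under Bayes-C most are removed from the model entirely, which is the contrast the slide draws.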

Concentrate on single chromosome (BTA 6)

[Figure: MCMC convergence for BSSVS and Bayes-C, 777K SNP and 12M SNP]

[Figure: Signal of QTL effects for BSSVS and Bayes-C, 777K SNP and 12M SNP]

Reliability estimates

            BSSVS    Bayes-C

BovineHD    0.328    0.328

Sequence    0.324    0.325

Conclusions

Genomic prediction using sequence data becomes reality

However, sequence data requires intensive computation

Need for faster algorithms

Use of Sequence Data did not improve Prediction reliability

Convergence issues with BSSVS

Longer chains may yield better results

BSSVS slightly better compared to GBLUP

Preliminary results BTA6 hint that Bayes-C method may work better (than BSSVS) for sequence data

Next Steps: Did we bet on the wrong horse - named BSSVS?

Review choice of priors in BSSVS model.

Apply Bayes-C model to whole genome sequence data

Thanks!

Acknowledgments

1000 bull genomes project

(www.1000bullgenomes.com)

De-regressed proofs (DRP)

$$DRP = PA + (EBV - PA) \cdot \frac{EDC_{EBV}}{EDC_{prog}}$$

where PA is the parent average and EBV is the estimated breeding value.

Effective daughter contribution (EDC)

$$EDC_{EBV} = \alpha \cdot \frac{REL_{EBV}}{1 - REL_{EBV}}, \qquad \alpha = \frac{4 - h^2}{h^2}$$

where $REL_{EBV}$ is the published reliability of the EBV.

$$EDC_{prog} = EDC_{EBV} - EDC_{PA}$$

where $EDC_{PA}$ is based on the reliability of the parents, $(REL_{sire} + REL_{dam})/4$.

DRP: VanRaden et al. 2009 (J Dairy Sci)

EDC: VanRaden and Wiggans 1991 (J Dairy Sci)
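
A small worked example of the de-regression formulas above. The slide states only that the parent-average EDC is based on the parents' reliability, (REL_sire + REL_dam)/4; converting that reliability to an EDC with the same α · REL/(1 − REL) formula is an assumption of this sketch. Function names and input values are hypothetical.

```python
# Illustrative sketch only: inputs are invented.

def edc_from_reliability(rel, h2):
    """EDC = alpha * REL / (1 - REL), with alpha = (4 - h2) / h2."""
    alpha = (4 - h2) / h2
    return alpha * rel / (1 - rel)


def deregressed_proof(ebv, pa, rel_ebv, rel_sire, rel_dam, h2):
    """Return (DRP, EDC_prog) for one bull."""
    edc_ebv = edc_from_reliability(rel_ebv, h2)
    rel_pa = (rel_sire + rel_dam) / 4
    # Assumption: the same reliability-to-EDC conversion applies to PA.
    edc_pa = edc_from_reliability(rel_pa, h2)
    edc_prog = edc_ebv - edc_pa
    if edc_prog <= 0:
        raise ValueError("no progeny information beyond parent average")
    drp = pa + (ebv - pa) * edc_ebv / edc_prog
    return drp, edc_prog


# Hypothetical bull: h2 = 0.25 gives alpha = 15.
drp, edc_prog = deregressed_proof(
    ebv=100.0, pa=50.0, rel_ebv=0.9, rel_sire=0.99, rel_dam=0.61, h2=0.25
)
```

The de-regression moves the proof further from the parent average than the EBV itself, because the contribution of the parents is removed from the information behind the EBV.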
