Least Squares vs. Genomic Selection - BDPorc

EXCELMEAT Workshop
Biosensing Pork Quality
Lleida, 25 October 2012
Comparing methods for estimating gene effects: least squares vs.
genomic selection.
Hernández-Sánchez J¹, Pong-Wong R², Freyer G³, Vagenas D⁴
¹ Institut de Recerca i Tecnologia Agroalimentàries (IRTA), Animal Breeding and Genetics, Spain.
² The Roslin Institute and The R(D)SVS, Genetics and Genomics, University of Edinburgh, UK.
³ Leibniz Institute for Farm Animal Biology (FBN), Unit Genetics and Biometry, Germany.
⁴ Institute of Health and Biomedical Innovation, Research Methods Group, Queensland University of
Technology, Australia.
Complex traits arise through direct gene effects, multiple interactions between genes and
interactions between genes and the environment. Genome-wide association studies (GWAS) test
thousands of markers, one at a time, to locate those explaining phenotypic variation. Usually,
markers are not the causal mutations but may be in linkage disequilibrium (LD) with them. If many
causal mutations are in LD with a marker, the effect estimated for that marker may be biased. We
call this one-marker-at-a-time model LSR1. The
magnitude of that bias is $a_i' = \sum_{j=1}^{M} a_j D_{ij} / (p_i q_i)$, where i and j denote loci,
M is the number of loci, $a_j$ is the unbiased additive effect of locus j, $p_i = 1 - q_i$ is the
allele frequency at locus i, and $D_{ij}$ is the LD between loci i and j. The bias would disappear
if all causal mutations were included in a single model
simultaneously. We call this model LSRM. However, one does not know M. Moreover, LSRM cannot
handle overparameterised models where M > sample size, and may suffer from collinearity
problems. We explored alternatives such as 1) sampling a fixed number of random markers and
selecting the best model via AIC (AICLSRM), 2) genomic selection (GS) methods that assume random
marker effects and prior distributions with high density of null effects. Three GS models were tested:
Ridge Regression, Bayes-C and Bayesian Lasso. We compared all models in terms of accuracy (bias)
and precision (standard error) of estimated gene effects via simulations. As expected, LSR1 was the
most precise but least accurate method. LSRM was the most accurate method. GS methods ranked
between LSR1 and LSRM in terms of both accuracy and precision. AICLSRM rendered the most
parsimonious model, in which 25% of markers explained most of the genetic variation. More realistic
and complex situations are being investigated, for example testing SNPs rather than genes.
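
The LD-induced bias of single-marker estimates can be checked numerically. The sketch below is not the authors' simulation: the three loci, their allele frequencies, the LD structure, the effect sizes, the sample size and the unit residual variance are all hypothetical values chosen for illustration. It simulates genotypes under random mating, fits each marker alone (as in LSR1) and all markers jointly (as in LSRM), and compares the single-marker slopes with the values given by the formula $a_i' = \sum_{j=1}^{M} a_j D_{ij} / (p_i q_i)$, taking $D_{ii} = p_i q_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000                      # individuals (hypothetical sample size)
a = np.array([1.0, 0.5, 0.0])  # true additive effects (hypothetical)


def sample_haplotypes(size):
    """Draw haplotypes at 3 biallelic loci; loci 0 and 1 are in LD."""
    h0 = rng.random(size) < 0.5
    copy = rng.random(size) < 0.8            # locus 1 copies locus 0 80% of the time
    h1 = np.where(copy, h0, rng.random(size) < 0.5)
    h2 = rng.random(size) < 0.3              # locus 2 is independent of the others
    return np.column_stack([h0, h1, h2]).astype(float)


# Two haplotypes per individual; genotypes are allele counts (0, 1 or 2).
hap = np.vstack([sample_haplotypes(n), sample_haplotypes(n)])
X = hap[:n] + hap[n:]
y = X @ a + rng.normal(size=n)               # phenotype with unit residual variance

# Allele frequencies and LD matrix from the haplotypes; D[i, i] equals p_i * q_i.
p = hap.mean(axis=0)
D = np.cov(hap, rowvar=False, bias=True)
predicted = (D @ a) / (p * (1 - p))          # a_i' = sum_j a_j D_ij / (p_i q_i)

# LSR1: one marker at a time (slope of a simple linear regression).
lsr1 = np.array([np.polyfit(X[:, i], y, 1)[0] for i in range(X.shape[1])])

# LSRM: all markers fitted jointly by ordinary least squares.
design = np.column_stack([np.ones(n), X])
lsrm = np.linalg.lstsq(design, y, rcond=None)[0][1:]

print("true effects     :", a)
print("formula (LSR1)   :", np.round(predicted, 2))
print("LSR1 estimates   :", np.round(lsr1, 2))
print("LSRM estimates   :", np.round(lsrm, 2))
```

Under these assumed settings the population LD between loci 0 and 1 is D = 0.2 with p = q = 0.5 at both, so the formula gives single-marker values of about 1.4 and 1.3 for true effects of 1.0 and 0.5, while the joint fit should recover values close to the true effects.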