Genomic Prediction & Selection: TRAINING & VALIDATION

Agenda
• A bioinformatics system to implement Bayesian methods for genomic prediction
• Results from various methods applied to real and simulated data
• Training on EBVs, deregressed data, and weighting information in training

Bioinformatics Infrastructure
• Research from the analysis of high-density genotypes to predict merit has several objectives
– Determine the predictive ability of
  • same-density panels in validation/target populations closely related to the training population
  • same-density panels in validation/target populations less related or unrelated to the training population
  • low-density panels in populations closely related to the training population
– Motivate other genomic selection research

Bioinformatics Infrastructure
• Identify informative regions for fine-mapping and gene discovery
• Provide a platform for collaborating (beef) researchers to undertake genomic training
– eg US Meat Animal Research Center
• Provide a platform for delivering genomic predictions to (the beef) industry

Various Methods
BayesA:   y = Xb + Σ_i M_i a_i + e         estimate σ²_ai and σ²_e
BayesB:   y = Xb + Σ_i M_i a_i δ_i + e     estimate δ_i, σ²_ai and σ²_e
BayesC:   (same model)                     estimate δ_i, σ²_a and σ²_e
BayesCπ:  (same model)                     estimate π, δ_i, σ²_a and σ²_e

Various Methods (markers in model × treatment of marker effects)
• All markers (π = 0), random effects with individual (normal) variances: "Bayes A" (B0)
• Fraction (1 − π) of markers, random effects with individual variances: "Bayes B"
• All markers (π = 0), random effects with a constant variance: Bayes C0 = "BLUP"
• Fraction (1 − π) of markers, random effects with a constant variance (when in the model): Bayes C
• Fraction (1 − π) estimated from the data: Bayes Cπ
• Fixed marker effects: LS MSE, LS PRESS (StepWise, PRESS)
• Any-π options: Bayes K and Bayes T
• Categorical variants (threshold models); other variants (estimate scale, heavy tails)
Bayes A and B are not truly Bayesian; Bayes C and Bayes Cπ are (Gianola)

Pi Influences Convergence
[Figure slide]

Priors Influence Shrinkage
• Influence varies with method and prior degrees of freedom
[Figure slides: shrinkage of marker effects under Bayes A (df = 4 and df = 3), Bayes B (df = 4, π = 0.99), Bayes C0, and Bayes C (df = 4, π = 0.99)]

BayesB then BayesA (100 markers)
"Heritability" for 100 markers chosen for the trait in the row, applied to the trait in the column:
0.64 0.50 0.23 0.33 0.29 0.22 0.45 0.30 0.24
0.53 0.61 0.24 0.33 0.29 0.23 0.45 0.30 0.26
0.27 0.29 0.57 0.33 0.29 0.22 0.36 0.30 0.25
0.27 0.27 0.23 0.67 0.29 0.26 0.42 0.30 0.29
0.28 0.24 0.23 0.33 0.57 0.25 0.40 0.35 0.27
0.27 0.29 0.26 0.33 0.29 0.53 0.42 0.30 0.25
0.29 0.29 0.23 0.33 0.29 0.25 0.70 0.26 0.25
0.29 0.27 0.24 0.33 0.29 0.22 0.36 0.63 0.24
0.32 0.27 0.26 0.33 0.29 0.25 0.42 0.30 0.65

BayesB then BayesA (100 markers)
Correlation in the training data for markers chosen for the trait in the row, applied to the trait in the column:
0.79 0.68 0.37 0.41 0.42 0.33 0.56 0.46 0.39
0.69 0.76 0.38 0.40 0.44 0.34 0.54 0.42 0.41
0.39 0.41 0.77 0.40 0.39 0.35 0.50 0.40 0.39
0.36 0.36 0.35 0.78 0.41 0.41 0.53 0.45 0.43
0.41 0.40 0.38 0.36 0.79 0.39 0.51 0.51 0.41
0.39 0.40 0.39 0.45 0.41 0.72 0.55 0.41 0.38
0.41 0.40 0.35 0.45 0.40 0.41 0.87 0.40 0.41
0.43 0.41 0.37 0.40 0.48 0.37 0.50 0.79 0.37
0.44 0.40 0.39 0.44 0.38 0.37 0.50 0.45 0.78
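The methods above differ only in how marker effects enter the model and how their variances are handled. As a concrete illustration, here is a minimal single-site Gibbs sampler in the spirit of Bayes C with a fixed π (a common variance for the markers currently in the model); sampling π as well would give Bayes Cπ. This is a generic sketch on simulated data, not the presenters' software; the toy dimensions, starting values and hyper-parameters (ν and scale for the variance priors) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy data: n animals, p SNPs coded 0/1/2, 20 markers with true effects
n, p = 500, 1000
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
true_beta = np.zeros(p)
qtl = rng.choice(p, 20, replace=False)
true_beta[qtl] = rng.normal(0.0, 0.5, 20)
y = X @ true_beta + rng.normal(0.0, 1.0, n)

pi = 0.99                      # prior probability a marker is NOT in the model
nu_a, S2_a = 4.0, 0.01         # assumed prior for the common marker variance
nu_e, S2_e = 4.0, 1.0          # assumed prior for the residual variance
n_iter = 100

alpha = np.zeros(p)            # current marker effects (0 when out of model)
delta = np.zeros(p, dtype=bool)
mu, sigma2_a, sigma2_e = y.mean(), 0.01, 1.0
xtx = (X ** 2).sum(axis=0)
resid = y - mu                 # data corrected for all current solutions
model_freq = np.zeros(p)       # how often each marker is in the model

for it in range(n_iter):
    resid += mu                                     # update the general mean
    mu = rng.normal(resid.mean(), np.sqrt(sigma2_e / n))
    resid -= mu
    for j in range(p):                              # single-site marker updates
        if delta[j]:
            resid += X[:, j] * alpha[j]             # remove marker j's current fit
        rhs = X[:, j] @ resid
        v1 = xtx[j] ** 2 * sigma2_a + xtx[j] * sigma2_e   # var(rhs) if marker is in
        v0 = xtx[j] * sigma2_e                            # var(rhs) if marker is out
        log_odds = (np.log((1 - pi) / pi)
                    - 0.5 * (np.log(v1) - np.log(v0))
                    - 0.5 * rhs ** 2 * (1.0 / v1 - 1.0 / v0))
        delta[j] = rng.random() < 1.0 / (1.0 + np.exp(-np.clip(log_odds, -30, 30)))
        if delta[j]:
            c = xtx[j] + sigma2_e / sigma2_a        # left-hand-side coefficient
            alpha[j] = rng.normal(rhs / c, np.sqrt(sigma2_e / c))
            resid -= X[:, j] * alpha[j]
        else:
            alpha[j] = 0.0
    k = int(delta.sum())                            # markers currently in the model
    sigma2_a = ((alpha[delta] ** 2).sum() + nu_a * S2_a) / rng.chisquare(nu_a + k)
    sigma2_e = (resid @ resid + nu_e * S2_e) / rng.chisquare(nu_e + n)
    model_freq += delta

model_freq /= n_iter
print("corr(true merit, fitted merit):",
      round(np.corrcoef(X @ true_beta, X @ alpha)[0, 1], 3))
```

A real analysis would average marker effects over many post-burn-in samples rather than using the last one; model_freq (the per-marker inclusion frequency) is the "model frequency" referred to later when reduced panels are built.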
1st Attempt at Cross-Validation
• Dataset 1, comprising 8 breeds
• Select the best 100 markers in all the data using BayesB
• Leave-one-breed-out design: each breed B1 to B8 in turn forms the validation set, with the remaining seven breeds forming the training set

BayesB then BayesA (100 markers)
Markers in the row chosen from BayesB on all the data; BayesA trained in cross-validation for the trait in the column, predicting merit in the omitted data:
 0.66  0.53 -0.02  0.09  0.02 -0.06  0.07  0.08 -0.03
 0.53  0.65  0.01  0.03  0.10 -0.02  0.06 -0.02  0.06
 0.01  0.03  0.68  0.02 -0.03 -0.02 -0.04 -0.01 -0.05
-0.05 -0.06  0.01  0.68  0.02  0.04  0.02  0.08  0.11
 0.09  0.07 -0.02  0.00  0.68  0.04  0.00  0.20  0.04
-0.02  0.01  0.06  0.14  0.08  0.58  0.11  0.03 -0.03
-0.01  0.01 -0.04  0.14  0.00  0.10  0.74 -0.07  0.04
 0.06  0.05  0.01  0.05  0.22  0.07  0.06  0.69 -0.05
 0.08 -0.02  0.02  0.15 -0.08 -0.01  0.01  0.14  0.70

[Figure slides: StepWise then BayesA, with successive datasets having the previously best markers removed; StepWise and BayesA]

Proper Cross-Validation
• Marker subset selection and marker-effect estimation are undertaken on each training data subset and used to predict "virgin" data
• The correlation dropped to 0.18 (at best) when properly cross-validated (100-marker subset chosen in the training data)

Training and Validation: Purebred (PB) → Purebred (PB), 50K SNP

Validation
• There are almost always SNPs that spuriously fit the data well
– Having a model that fits the training data well provides relatively little information about how good the prediction will be in new data
• Many "world-changing" research discoveries are announced in news releases and then never heard of again
• Training and validation can be done together to quantify the likely confidence in predictions

Cross-Validation
• Partition the dataset (by sire) into, say, three groups (G1, G2, G3)
• Train on two groups (eg G2 and G3) and derive the g-EPD
• Compute the correlation between predicted genetic merit from the g-EPD and observed performance in the omitted group (G1)
• Rotate the design so that every animal is in exactly one validation set (a code sketch follows after the tables below)

Cross-Validation
• 1800 bulls with EPDs, split into 3 groups
– At random
– By sire ID, with sires of bulls nested within subset
– By sire ID, with sires also fitted as fixed effects
– By time: oldest, middle-aged, youngest

Results (41028m)
Method          Random   Sire   Sire+cg   Time
Bayes A (B0)     0.745   0.726   0.646    0.732
Bayes B (.99)    0.722   0.700   0.618    0.712
Bayes C0         0.746   0.728   0.648    0.730
Bayes C (.50)    0.746   0.728   0.647    0.730
Bayes C (.99)    0.728   0.708   0.625    0.717
C.99/C 100m      0.553   0.567   0.389    0.583
StepWise 100m    0.547   0.558   0.393    0.542
PRESS 100m       0.523   0.539   0.365    0.574

Simulated SNP Results: 1184 QTL, 52,566 markers, π = 0.977
Number of training animals:
Method         1000    2000    3000    4000
B(true)        0.65    0.76    0.82    0.84
C(true)        0.62    0.74    0.80    0.83
B(inflated)    0.63    0.75    0.80    0.83
C(inflated)    0.60    0.71    0.77    0.80
B(0.50)        0.62    0.74    0.79    0.82
C(0.50)        0.60    0.70    0.75    0.78
B(0)           0.64    0.74    0.79    0.81
C(0)           0.59    0.70    0.75    0.78
True = #QTL/#markers; inflated = 0.9 × true; heritability = 0.5 (Christian Stricker for Swiss Cattle Breeders)

Simulated Results: 2000 animals
Number of QTL:
Method          171     493    1184
B(true)        0.88    0.82    0.76
C(true)        0.88    0.81    0.74
B(inflated)    0.84    0.79    0.75
C(inflated)    0.70    0.74    0.71
B(0.50)        0.81    0.78    0.74
C(0.50)        0.65    0.72    0.70
B(0)           0.82    0.77    0.74
C(0)           0.64    0.72    0.70
True = #QTL/#markers; inflated = 0.9 × true; heritability = 0.5 (Christian Stricker for Swiss Cattle Breeders)
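The cross-validation layout described above (partitioning the bulls into groups by sire so that paternal half-sib families do not straddle training and validation, training on the other groups, and correlating genomic predictions with performance in the omitted group) can be sketched as follows. A ridge-regression (SNP-BLUP-like) solve stands in for the Bayesian machinery, and the simulated data and shrinkage parameter lam are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# toy data: 300 bulls from 30 sires, 1000 SNPs, 30 QTL
n_sires, per_sire, p = 30, 10, 1000
sire_id = np.repeat(np.arange(n_sires), per_sire)
X = rng.binomial(2, 0.3, size=(n_sires * per_sire, p)).astype(float)
beta = np.zeros(p)
beta[rng.choice(p, 30, replace=False)] = rng.normal(0.0, 0.4, 30)
y = X @ beta + rng.normal(0.0, 1.0, len(sire_id))

def ridge_fit(Xtr, ytr, lam=500.0):
    """Shrunken marker solutions from (X'X + lam I) a = X'(y - ybar)."""
    m = Xtr.shape[1]
    return np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(m),
                           Xtr.T @ (ytr - ytr.mean()))

# split the sires (not the animals) into three groups, then rotate
sire_groups = np.array_split(rng.permutation(n_sires), 3)
for k, val_sires in enumerate(sire_groups):
    val = np.isin(sire_id, val_sires)          # validation animals
    a = ridge_fit(X[~val], y[~val])            # train on the other two groups
    gepd = X[val] @ a                          # genomic predictions ("g-EPD")
    r = np.corrcoef(gepd, y[val])[0, 1]
    print(f"fold {k + 1}: {val.sum()} animals, corr(g-EPD, performance) = {r:.3f}")
```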
50k Within-Breed Predictions
• These predictions are characterized by correlations between genomic merit and realized performance of 0.5 to 0.7
– They will account for 25% (= 0.5²) to 50% (= 0.7²) of the genetic variation
– Compared with a trait with a heritability of 25%, the genomic predictions would be equivalent to observing 6 to 15 offspring in a progeny test
• Correlations of 0.7 are similar to the performance of genomic predictions in dairy cattle

50k Within-Breed Predictions
• These predictions are not as accurate as can be achieved in a well-designed and well-managed progeny test, say with 100 or more offspring
• However, for many traits they are much more reliable for animals at a young age (eg prior to first selection) than is currently achievable from individual performance

Across-Breed Prediction
• Refers to the process of predicting performance for a breed or cross that was not in the training dataset
• Of critical interest to those selecting breeds that are not well represented in the training populations
• May not be as reliable as within-breed predictions due to complexities associated with non-additive genetic effects (dominance and epistasis)
• Potential can be assessed by simulating the effects of major genes using real SNP genotypes on various populations

Introduction
• Toosi et al. (2008) simulated genotypic and phenotypic data
– Training in crossbred and multi-breed (MB) populations
– Successful selection of purebreds (PB) for MB performance
• Linkage disequilibrium (LD)
– Simulated LD in pure and MB populations may not accurately reflect real LD in beef cattle populations

Objective
Training populations → validation populations, using 50K SNP genotypes:
• Multi-breed (MB) → purebred (PB)
• Purebred (PB) → multi-breed (MB)

50K SNP Datasets
MB population (N = 924): Angus 239, Brahman 10, Charolais 183, Hereford 78, Limousin 45, Maine-Anjou 137, Shorthorn 97, South Devon 135
PB population (N = 1086): Angus 1086

Simulation of Additive Genetic Merit and Phenotypic Performance (sketched in code below)
• From the 50K SNP genotypes (coded 0, 1 or 2), 50, 100, 250 or 500 SNPs are chosen at random to act as QTL (QTL1, QTL2, ..., QTLj)
• Additive genetic merit of animal i is the sum of its QTL covariates times their substitution effects (x_i1 α_1 + x_i2 α_2 + ...)
• Phenotypic performance: y_i = Σ_{j=1}^{N} x_ij α_j + e_i

Marker Panels
[Diagram: panels constructed from the 50K SNP set include the QTL themselves (50, 100, 250 or 500), the highest-LD (HLD) marker for each QTL chosen on r² (HLD 50, 100, 250, 500), and the 50K panel with the QTL removed]

Bayesian Analysis
y = 1μ + Σ_{j=1}^{N} x_j α_j δ_j + e

Simulated Phenotypes / Real 50k Data
• Effect of the number of available markers (50 QTL):
Panel                     MB → PB   PB → MB
Just QTL                    0.953     0.962
QTL + best markers          0.931     0.938
QTL + 50k                   0.766     0.842
Just best markers           0.570     0.489
50k w/o QTL (real life)     0.388     0.422
(MB → PB = train in multibreed, validate in purebred; PB → MB = train in purebred, validate in multibreed; Kizilkaya et al, ASAS, 2009)

Effect of the Number of Available Markers
• Redundant markers reduce accuracy
– Increased type I errors
• Accuracy suffers greatly when the QTL are not on the panel
– Not enough markers of sufficiently high LD to act as good proxies on a one-for-one basis
• The multibreed population is generally inferior to the purebred
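A sketch of the simulation scheme above: QTL are drawn at random from the SNPs, additive effects are sampled, true merit is Σ_j x_ij α_j, residuals are scaled to a chosen heritability, and the highest-LD (HLD) proxy marker is then located for each QTL. The genotype distribution, number of QTL and heritability are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, n_qtl, h2 = 900, 5000, 50, 0.5         # assumed sizes and heritability

# stand-in for real 0/1/2 genotypes, with varying allele frequencies
X = rng.binomial(2, rng.uniform(0.05, 0.5, p), size=(n, p)).astype(float)

qtl = rng.choice(p, n_qtl, replace=False)    # SNPs chosen at random to be QTL
alpha = rng.normal(0.0, 1.0, n_qtl)          # substitution effects

tbv = X[:, qtl] @ alpha                      # additive genetic merit
e = rng.normal(0.0, np.sqrt(tbv.var() * (1 - h2) / h2), n)
y = tbv + e                                  # phenotype with heritability ~h2

# highest-LD (HLD) proxy marker for each QTL among the remaining SNPs
non_qtl = np.setdiff1d(np.arange(p), qtl)
Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardised genotypes
hld = np.empty(n_qtl, dtype=int)
r2_best = np.empty(n_qtl)
for i, q in enumerate(qtl):
    r = (Z[:, non_qtl].T @ Z[:, q]) / n      # correlation of QTL q with every SNP
    best = int(np.argmax(r ** 2))
    hld[i], r2_best[i] = non_qtl[best], r[best] ** 2

print("realised heritability:", round(tbv.var() / y.var(), 2))
print("mean r^2 between QTL and their HLD markers:", round(r2_best.mean(), 3))
```

With real 50K genotypes the QTL-to-HLD r² values reflect genuine LD; in this toy example the SNPs are simulated independently, so those values are small.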
Purebred or Crossbred?
[Scatterplot: LD (r) of the highest-LD marker for each random QTL, training in the purebred. x-axis purebred (r), y-axis multi-breed (r); fitted line y = 1.2018x − 0.371, R² = 0.48309. Mean r: purebred 0.717, multi-breed 0.491. Few QTL have LD < 0.4 in training, but many markers erode in the validation population]

Purebred or Crossbred?
[Scatterplot: LD (r) of the highest-LD marker for each random QTL, training in the crossbred. x-axis multi-breed (r), y-axis purebred (r); fitted line y = 0.8775x + 0.0406, R² = 0.39451. Mean r: multi-breed 0.625, purebred 0.589. Many QTL have LD < 0.4 in training, but most markers remain robust in the validation population]

Effect of the Number of Available Markers
• It is easier to find high-LD markers in purebreds than in multibreed populations because average LD is higher
– Favors the use of purebred populations
– Necessitates higher-density SNP panels in multibreeds
• Markers chosen in purebreds may be less informative in multibreed populations, as they will have less LD there
• Markers that work well in multibreed populations seem to work just as well in purebred populations
• It would be nice to have larger multibreed populations and denser panels

Correlations Between True and Predicted Genetic Merits in the Validation Population
Panel: the QTL themselves
Number of QTL   MB → PB   PB → MB
50               0.953     0.962
100              0.938     0.941
250              0.840     0.853
500              0.720     0.786

Simulated Phenotypes / Real 50k Data
• Effect of the number of QTL (panel: 50k without the QTL)
Number of QTL   MB → PB   PB → MB
50               0.388     0.422
100              0.289     0.308
250              0.247     0.276
500              0.200     0.299
• Identical trends when the panel comprises the QTL only
• These correlations account for less than 20% of the variation at best

Correlations Between True and Predicted Genetic Merits in the Validation Population
Panel: HLD markers
Number of QTL   MB → PB   PB → MB
50               0.570     0.486
100              0.513     0.480
250              0.510     0.429
500              0.372     0.391

Average LD Between QTL and HLD Marker in PB or MB Populations
                                 LD assessed in PB   LD assessed in MB
HLD marker chosen from PB             0.549               0.322
HLD marker chosen from MB             0.412               0.408

Conclusions
• The MB population is
– a good choice in which to carry out genomic selection
– able to give reasonably accurate estimates of the genetic merit of selection candidates in a PB population
• Accuracy of genetic merit in genomic selection
– is higher with fewer QTL
– erodes when more uninformative SNPs are added
• The extent of LD, and hence r², is highly variable
– Lower average r² in MB than in PB populations
– No complete LD for all QTL with SNPs
– Denser markers are needed

Training and Validation: Purebred (PB) → Purebred (PB), Reduced Panel

Reduced-Panel Within-Breed Selection
• Two-stage Bayesian analysis
– Run all 50k markers in each of the three training sets (2&3, 1&3, 1&2)
– Select the best 600 markers on model frequency and genomic coverage (a selection sketch follows below)
– Rerun the training and validation analyses using only the markers on the 600-marker panel

50k Versus 600 Markers
[Figure slide]

384 SNP Panels
• Panels of 600 markers per trait for 8 traits would require a single panel of 4,800 markers
• Technology is moving such that larger panels now cost what smaller panels used to, rather than the cost of smaller panels falling
• Significantly cheaper panels are currently limited to 384 (or fewer) SNPs
– That allows 100 or so of the best SNPs for 3 to 4 key traits

[Figure slides: Even Smaller Panels; Validation in New AI Bulls]
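The reduced-panel step above (select the best 600 markers on model frequency and genomic coverage, then rerun training and validation with only those markers) might look roughly like this. Model frequency here means the proportion of MCMC samples in which a SNP was in the model (the model_freq vector from the earlier BayesC-style sketch); the marker map, window size and per-window cap are assumptions, since the slides specify only "model frequency and genomic coverage".

```python
import numpy as np

rng = np.random.default_rng(4)
p, panel_size, window_kb, max_per_window = 50_000, 600, 1_000, 3

chrom = rng.integers(1, 30, p)                  # toy map: chromosome ...
pos_kb = rng.uniform(0, 150_000, p)             # ... and position in kb
model_freq = rng.beta(0.5, 20, p)               # stand-in for MCMC inclusion freq

order = np.argsort(-model_freq)                 # best markers first
window_counts = {}
panel = []
for j in order:
    key = (chrom[j], int(pos_kb[j] // window_kb))
    if window_counts.get(key, 0) >= max_per_window:
        continue                                # coverage rule: this window is full
    window_counts[key] = window_counts.get(key, 0) + 1
    panel.append(j)
    if len(panel) == panel_size:
        break

panel = np.array(panel)
print(len(panel), "SNPs selected; lowest model frequency kept:",
      round(float(model_freq[panel].min()), 4))
# The training and validation analyses would then be rerun using X[:, panel] only.
```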
Summary: Beef Cattle in the US
• 50k within breed (like 5 to 15 progeny)
• 50k across breed (like 1 individual record or 5 progeny)
• Reduced panel within breed (varies, up to 50k accuracy)

Validation Statistics
• Proportion of additive variation accounted for by the genomic prediction
– The molecular breeding value (MBV) is used as an observation
1/ Multivariate model using the MBV as a trait to estimate the genetic correlation (eg in ASREML)
2/ Reduction in the estimated sire variance when the MBV is included as a fixed effect in the model
3/ Regression of phenotype on the MBV
(Thallman et al, 2009 BIF)

Thallman et al, 2009 BIF: data simulated from an additive model only; 1,000 animals representing 100 sires
heritability    rg     BVN (res cov estd)   BVN (res cov 0)
0.1            0.04         0.11                 0.08
0.1            0.16         0.21                 0.23
0.1            0.36         0.38                 0.44
0.1            0.64         0.54                 0.64
0.3            0.04         0.06                 0.05
0.3            0.16         0.17                 0.19
0.3            0.36         0.35                 0.40
0.3            0.64         0.64                 0.68
0.5            0.04         0.05                 0.05
0.5            0.16         0.16                 0.18
0.5            0.36         0.35                 0.39
0.5            0.64         0.63                 0.66
Reduction and Regression statistics, as listed on the slide: 0.02 0.17 1.40 0.29 0.04 0.15 0.35 0.66 0.04 0.16 0.36 0.63 0.05 0.21 6.62 -0.23 0.05 0.20 0.42 0.83 0.05 0.18 0.39 0.72

Training on EBVs

Ideal Model (Equation)
g = 1μ + Ma + ε
• g is (true) genetic merit (BV)
• M is columns of covariates (genotypes)
• a are substitution effects
• ε is lack of fit (hopefully small)

Ideal Model
g = 1μ + Ma + ε
• var(g) = A? or G?
• var(Ma) = G, the genomic relationships
• var(ε) = Iσ²_ε? or cA? where c = σ²_ε / σ²_g is the fraction of var(g) unaccounted for by the markers

Towards a Practical Model
g + e = 1μ + Ma + (ε + e)
• g is (true) genetic merit (BV), e is the usual residual, and (g + e) is the phenotype (no fixed effects)
• var(ε + e) = cA + Iσ²_e, since cov(ε, e') = 0
• Or include a random polygenic effect explicitly, with var(u) = A, etc

Practical Model
y = Xb + Ma + (ε + e)
• Xb are the usual fixed effects
• var(ε + e) = cA + Iσ²_e
• It is not reasonable to assume var(ε + e) = Iσ²_e unless the markers are fitting very well
• But we do it all the time! And the results are fairly similar
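The "Ideal Model" slides note that var(Ma) = G, the genomic-relationship covariance. One common way to construct G from 0/1/2 genotypes is to centre by twice the allele frequency and scale by 2Σp(1−p), in the style of VanRaden (2008); the snippet below is a generic sketch of that construction on simulated genotypes, not code from the presentation.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 3000
freq_true = rng.uniform(0.05, 0.5, p)
M = rng.binomial(2, freq_true, size=(n, p)).astype(float)   # 0/1/2 genotypes

p_hat = M.mean(axis=0) / 2.0                    # observed allele frequencies
Z = M - 2.0 * p_hat                             # centred genotype covariates
G = (Z @ Z.T) / (2.0 * np.sum(p_hat * (1.0 - p_hat)))

print("mean diagonal of G:", round(float(np.diag(G).mean()), 3))        # near 1
print("mean off-diagonal of G:",
      round(float((G.sum() - np.trace(G)) / (n * (n - 1))), 4))         # near 0
```

The diagonal of G plays the role of an animal's genomic "self-relationship", and var(Ma) = G σ²_g is what replaces A σ²_g when marker information is used in place of pedigree.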
Heterogeneous Residual
If y has correlated or heterogeneous residuals:
y = Xb + Ma + (ε + e),  var(ε + e) = cA + R
• R is the usual value that would be used in the MME (eg when y is a mean of a varying number k of observations)
y_k = Xb + Ma + (ε + e_k)
• Repeatability model: var(e_k) = σ²_e [1 + (k − 1) r_I] / k, with r_I the intraclass correlation of the residuals

Heterogeneous Residual (continued)
y_k = Xb + Ma + (ε + e_k),  var(ε + e_k) = cI + D
• Simplify by ignoring the polygenic relationship structure (cI rather than cA); D is diagonal
• The "weights" are then the diagonals of R⁻¹:
r^ii = 1 / (c + var(e_k))  or (arbitrarily)  c / (c + var(e_k)),
with var(e_k) = σ²_e [1 + (k − 1) r_I] / k
• so the weights approach 1 as k → ∞ (if r_I = 0)

Family Data
• When the means are from relatives, rather than from repeated records on the same individual, genetic relationships can also contribute to the intraclass correlation

EBVs as Data
g = Ma + ε
g + (ĝ − g) = ĝ = Ma + ε + (ĝ − g), with var(ĝ − g) = PEV > 0
• Similar to the previous case of g + e = Ma + ε + e, where var(g + e) > var(g)

EBVs as Data
• Generally, var(ĝ − g) = var(g) + var(ĝ) − 2cov(ĝ, g)
• But BLUP has special shrinkage properties: cov(ĝ, g) = var(ĝ), so that
var(ĝ − g) = var(g) − var(ĝ)
r² = var(ĝ) / var(g) ≤ 1, so 0 ≤ var(ĝ) ≤ var(g)

Other Relevant Properties of BLUP
cov(ĝ, ĝ − g) = var(ĝ) − cov(ĝ, g) = var(ĝ) − var(ĝ) = 0
• So prediction errors are uncorrelated with estimated merit
[Scatterplot: ĝ − g against ĝ]

Other Relevant Properties of BLUP
But cov(g, ĝ − g) = cov(ĝ, g) − var(g) = var(ĝ) − var(g) < 0
• Really good animals are underestimated; really bad animals are overestimated
[Scatterplot: ĝ − g against g]

Genomic Prediction
• The usual linear regression of y on a (fixed) x is β_y.x = cov(y, x) / var(x)
• The (random) regression of EBVs on markers, ĝ on Ma, involves cov(ĝ, Ma) ≈ cov(ĝ, g) for small c
• But cov(ĝ, g) = var(ĝ), which differs for every animal according to its accuracy r²

Need to "Inflate" Observations
g = Ma + ε
g + (kĝ − g) = kĝ = Ma + ε + (kĝ − g)
• Want to choose k so that cov(g, kĝ − g) = 0 and cov(kĝ, g) is constant

Finding k
• Want cov(g, kĝ − g) = 0:
cov(g, kĝ − g) = k cov(g, ĝ) − var(g) = k var(ĝ) − var(g),
so we want k = var(g) / var(ĝ) = 1 / r²
• Want cov(kĝ, g) to be constant:
cov(kĝ, g) = k var(ĝ) = [var(g) / var(ĝ)] var(ĝ) = var(g)

Implications
ĝ / r² = d, a deregressed "observation" with h² = r²
• It is used in the right-hand side of Cĝ = d / σ²_e, where C is such that the PEV is C⁻¹ = (1 − r²)σ²_g
• Proof: d = σ²_e Cĝ = [σ²_e / ((1 − r²)σ²_g)] ĝ = [(1 − h²) / ((1 − r²)h²)] ĝ = ĝ / r²

More Implications
• But deregressed observations have heterogeneous variance: ε + (kĝ − g), with k = r⁻² so that kr² = 1
var(ε + kĝ − g) = var(ε) + var(kĝ − g)
               = var(ε) + k²var(ĝ) + var(g) − 2k cov(ĝ, g)
               = var(ε) + k²r²var(g) + var(g) − 2k r²var(g)
               = var(ε) + (k − 1)var(g),  with k − 1 = (1 − r²) / r²
• Therefore the weights representing the diagonals of R⁻¹ are
w = 1 / (c + (1 − r²)/r²)  or, scaled, w' = c / (c + (1 − r²)/r²),
so w' → 1 as r² → 1
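Putting the last two slides into numbers: the deregressed observation is d = ĝ/r², it is treated as having heritability r², and its weight (a diagonal of R⁻¹, scaled so that it approaches 1 as r² → 1) is w' = c/(c + (1 − r²)/r²). The EBVs, reliabilities and the value of c below are made-up inputs for illustration.

```python
import numpy as np

ghat = np.array([12.0, 3.5, -6.0, 0.8])     # EBVs
r2 = np.array([0.90, 0.55, 0.30, 0.10])     # their reliabilities
c = 0.5                                     # assumed fraction of genetic variance
                                            # not explained by the markers

d = ghat / r2                               # deregressed "observations"
w = c / (c + (1.0 - r2) / r2)               # scaled weights, w -> 1 as r2 -> 1

for g, r, di, wi in zip(ghat, r2, d, w):
    print(f"EBV {g:6.1f}  r2 {r:.2f}  deregressed {di:7.2f}  weight {wi:.3f}")
```

Low-reliability EBVs are expanded the most (d much larger in absolute value than ĝ) and at the same time receive the smallest weights, which is exactly the behaviour the derivation above calls for.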
Removing Parent Average
• During the deregression process, parent-average effects should be removed. Why?
• Animals with own and/or progeny information are shrunk towards the parent average
• Imagine that many bulls had no own or progeny information: they should not contribute anything to training
• Imagine that some parents were segregating a major effect: we do not want this effect shrunk in all the offspring
• Deregression is no problem if the deregressed information is derived directly from animal models during evaluation

Removing Parent Average
• Deregression and removal of parent-average effects can be approximately achieved using only the EBV and r² values from trios of the training animal, its sire and its dam, by setting up mixed-model equations for the parent average and the offspring, reconstructing the left-hand side to reproduce the published reliabilities, and then reconstructing the implied right-hand side to determine the deregressed observation and its appropriate r², ignoring the parental contribution

Approximate Method
First solve for Z'Z_PA and Z'Z_individual in
[ Z'Z_PA + 4λ        −2λ                  ] [ ĝ_PA         ]   [ y_PA         ]
[ −2λ                Z'Z_individual + 2λ  ] [ ĝ_individual ] = [ y_individual ]
where the diagonals C^PA and C^individual of the inverse of the coefficient matrix,
[ Z'Z_PA + 4λ        −2λ                  ]⁻¹   [ C^PA    C^off        ]
[ −2λ                Z'Z_individual + 2λ  ]   = [ C^off   C^individual ]
are used to compute r²_PA = 0.5 − λC^PA and r²_individual = 1.0 − λC^individual.
Then use [Z'Z_individual + λ][ĝ_individual] = [y_individual], so that
r²_deregressed = 1.0 − λ / (Z'Z_individual + λ)

Substitution Effects
• From the animal perspective, Falconer recognizes a and d effects as the allelic contributions to phenotypic performance
• From the parent perspective, the allelic contribution to breeding value is α = a + d(q − p)
• Mixed animal and offspring data will be estimating a mixture of a and α effects
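A numeric sketch of the Approximate Method slide above for a single sire-dam-offspring trio. It assumes λ = (1 − h²)/h² and r²_PA = (r²_sire + r²_dam)/4 (neither is stated on the slide), solves the two reliability equations for Z'Z_PA and Z'Z_individual in closed form (the quadratic below is my own algebra for those two equations), reconstructs the right-hand sides, and takes the deregressed observation excluding the parent average as y_individual / Z'Z_individual, with reliability Z'Z_individual / (Z'Z_individual + λ) = 1 − λ/(Z'Z_individual + λ). The example EBVs and reliabilities are invented.

```python
import numpy as np

def deregress_trio(g_pa, g_ind, r2_sire, r2_dam, r2_ind, h2):
    lam = (1.0 - h2) / h2                     # assumed variance ratio
    r2_pa = (r2_sire + r2_dam) / 4.0          # assumed parent-average reliability
    q1 = (0.5 - r2_pa) / lam                  # target inverse diagonal, PA
    q2 = (1.0 - r2_ind) / lam                 # target inverse diagonal, individual
    # determinant of the 2x2 coefficient matrix from q1*q2*det^2 - det - 4*lam^2 = 0
    det = (1.0 + np.sqrt(1.0 + 16.0 * lam**2 * q1 * q2)) / (2.0 * q1 * q2)
    zpz_pa = q2 * det - 4.0 * lam             # Z'Z for the parent average
    zpz_ind = q1 * det - 2.0 * lam            # Z'Z for the individual
    C = np.array([[zpz_pa + 4.0 * lam, -2.0 * lam],
                  [-2.0 * lam, zpz_ind + 2.0 * lam]])
    y_pa, y_ind = C @ np.array([g_pa, g_ind])    # implied right-hand sides
    d = y_ind / zpz_ind                          # deregressed obs., PA removed
    r2_dereg = zpz_ind / (zpz_ind + lam)         # = 1 - lam/(Z'Z_ind + lam)
    return d, r2_dereg

# example trio: parent-average EBV 8.0, offspring EBV 14.0
d, r2_d = deregress_trio(g_pa=8.0, g_ind=14.0,
                         r2_sire=0.95, r2_dam=0.40, r2_ind=0.70, h2=0.25)
print(f"deregressed observation = {d:.2f}, reliability = {r2_d:.2f}")
```

For these inputs the coefficient matrix reproduces the published reliabilities exactly, and the offspring EBV of 14.0 deregresses to roughly 20.4 with a reliability of about 0.65 once the parental contribution is stripped out.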