BIOS 2063, 2005                                                Ridge Regression

Ridge Regression

Linear regression is sometimes unstable due to approximate collinearity of the predictors.  Manifestations:

  - very large standard errors for the individual coefficients;
  - low standard errors for some linear combinations of the coefficients;
  - det(X^T X) is close to zero;
  - X^T X has a very small eigenvalue;
  - X^T X has a very large "condition number" (ratio of largest to smallest eigenvalue).

The Principle:  Add a "ridge" of size \lambda to the diagonal of X^T X, to stabilize the matrix inverse:

    \hat\beta_{ridge} = (X^T X + \lambda I)^{-1} X^T Y .

Another view: penalized likelihood,

    \hat\beta_{ridge} := \arg\min_\beta \; (Y - X\beta)^T (Y - X\beta) + \lambda \|\beta\|^2 .

Another view: maximizing a Bayesian posterior, where the prior is

    \beta \sim N\!\big(0, \; (\lambda \sigma^{-2})^{-1} I_p \big) .

Implementation: data augmentation,

    X_{aug} = \begin{pmatrix} X \\ \operatorname{diag}(\sqrt{\lambda}) \end{pmatrix},
    \qquad
    Y_{aug} = \begin{pmatrix} Y \\ 0 \end{pmatrix},

where diag(\sqrt{\lambda}) denotes the p-by-p matrix \sqrt{\lambda}\, I_p.  Then OLS on the augmented data will yield \hat\beta_{ridge}.

Yet another view: solve a constrained optimization problem (restricting the model space):

    \hat\beta := \arg\min_\beta \; (Y - X\beta)^T (Y - X\beta)
    restricted to the set \{\beta : \|\beta\|^2 \le K\}

will yield \hat\beta_{ridge} (for a value of K corresponding to \lambda).

##### Ridge regression

## make some data, with a nearly collinear design matrix
sharedfactor = rnorm(50)
x1 = sharedfactor + rnorm(50) * 0.001
x2 = sharedfactor + rnorm(50) * 0.001
b1 = 1
b2 = 1
y = b1*x1 + b2*x2 + rnorm(50)
originaldata = data.frame(y=y, x1=x1, x2=x2)

###### Standard linear model.
original.lm = lm(y ~ x1 + x2, data = originaldata)
original.lm
pairs(originaldata)

[Figure: pairs() scatterplot matrix of y, x1, and x2]

designmatrix = as.matrix(originaldata[, -1])
XTX = t(designmatrix) %*% designmatrix
svd(XTX)$d    #### Eigenvalues: Badly conditioned matrix.

## Ridge regression
lambdalist = 10^(-6:2)
results = matrix(NA, nrow = length(lambdalist) + 1, ncol = 5)
dimnames(results) = list(c("ml", as.character(lambdalist)),
                         c("b1hat", "b2hat", "deviance", "eig1", "eig2"))
results[1, 1:2] = coef(original.lm)[2:3]
results[1, 3]   = deviance(original.lm)
results[1, 4:5] = svd(XTX)$d
for(lambda in lambdalist) {
    ## augment the data: two pseudo-observations with y = 0 and sqrt(lambda)
    ## on the diagonal of the predictor block
    pseudodata = data.frame(y  = rep(0, 2),
                            x1 = c(sqrt(lambda), 0),
                            x2 = c(0, sqrt(lambda)))
    ridgedata = rbind(originaldata, pseudodata)
    ridge.lm = lm(y ~ x1 + x2, data = ridgedata)
    designmatrix = as.matrix(ridgedata[, -1])
    XTX = t(designmatrix) %*% designmatrix
    print(svd(XTX)$d)
    results[as.character(lambda), 1:2] = coef(ridge.lm)[2:3]
    results[as.character(lambda), 3]   = deviance(ridge.lm)
    results[as.character(lambda), 4:5] = svd(XTX)$d
}
results

> results
              b1hat      b2hat   deviance      eig1          eig2
ml      -86.2259534 88.3468528   41.06907  86.67610 5.407039e-005
1e-006  -84.8413923 86.9627434   41.09845  86.67610 5.507039e-005
1e-005  -72.6883121 74.8122337   41.21327  86.67611 6.407039e-005
0.0001  -29.4760827 31.6091417   41.62145  86.67620 1.540704e-004
0.001    -3.3865433  5.5250951   41.87684  86.67710 1.054070e-003
0.01      0.6014741  1.5376947   41.96154  86.68610 1.005407e-002
0.1       1.0210714  1.1160084   42.25016  86.77610 1.000541e-001
1         1.0529404  1.0628409   44.48155  87.67610 1.000054e+000
10        0.9613129  0.9625357   62.95771  96.67610 1.000005e+001
100       0.5010925  0.5013048  148.94035 186.67610 1.000001e+002
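
A small check (not part of the original notes) that the closed-form expression \hat\beta_{ridge} = (X^T X + \lambda I)^{-1} X^T Y and the data-augmentation trick are the same computation.  The sketch below reuses originaldata from the example above, picks an arbitrary illustrative value lambda = 0.1, and fits without an intercept, since the formula as written has no intercept term; the object names X, Y, Xaug, Yaug, beta.direct, and beta.aug are introduced here only for illustration.

## Sketch (not in the original notes): closed-form ridge estimate versus
## OLS on augmented data, with no intercept so the two agree exactly.
X = as.matrix(originaldata[, c("x1", "x2")])    # design matrix from the example
Y = originaldata$y
lambda = 0.1                                    # arbitrary illustrative penalty

## closed form:  (X'X + lambda I)^{-1} X'Y
beta.direct = drop(solve(t(X) %*% X + lambda * diag(2), t(X) %*% Y))

## data augmentation:  stack sqrt(lambda) * I under X and zeros under Y,
## then ordinary least squares without an intercept
Xaug = rbind(X, sqrt(lambda) * diag(2))
Yaug = c(Y, 0, 0)
beta.aug = coef(lm(Yaug ~ Xaug - 1))

cbind(direct = beta.direct, augmented = beta.aug)   # the two columns match

Because the lm() fits in the example above include an intercept, their coefficients will differ slightly from this no-intercept check; the point is only that the sqrt(lambda) pseudo-observations reproduce the matrix formula.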