Topic 19: Remedies

Outline
• Review regression diagnostics
• Remedial measures
  – Weighted regression
  – Ridge regression
  – Robust regression
  – Bootstrapping

Regression Diagnostics Summary
• Check normality of the residuals with a normal quantile plot
• Plot the residuals versus predicted values, versus each of the X's, and (when appropriate) versus time
• Examine the partial regression plots
  – Use the graphics smoother to see if there appears to be a curvilinear pattern

Regression Diagnostics Summary
• Examine
  – the studentized deleted residuals (RSTUDENT in the output)
  – the hat matrix diagonals
  – DFFITS, Cook's D, and the DFBETAS
• Check observations that are extreme on these measures relative to the other observations

Regression Diagnostics Summary
• Examine the tolerance for each X
• If there are variables with low tolerance, you need to do some model building
  – Recode variables
  – Variable selection
• (A proc reg sketch that requests all of these diagnostics appears at the end of this handout)

Remedial measures
• Weighted least squares
• Ridge regression
• Robust regression
• Nonparametric regression
• Bootstrapping

Maximum Likelihood
• Model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, with $\mathrm{Var}(\varepsilon_i) = \sigma_i^2$
• Equivalently, $Y_i \sim N(\beta_0 + \beta_1 X_i,\ \sigma_i^2)$
• Density for case i:
  $f_i = \dfrac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\frac{1}{2}\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma_i}\right)^2}$
• Likelihood function: $L = f_1 f_2 \cdots f_n$

Weighted regression
• Maximization of L with respect to the β's is equivalent to minimization of
  $\sum_i \dfrac{1}{\sigma_i^2}\left(Y_i - \beta_0 - \beta_1 X_{i1} - \cdots - \beta_{p-1} X_{i,p-1}\right)^2$
  (the β's enter $\log L$ only through this weighted sum of squares, so maximizing the one minimizes the other)
• Weight of each observation: $w_i = 1/\sigma_i^2$

Weighted least squares
• The least squares problem is to minimize the sum of $w_i$ times the squared residual for case i
• Computations are easy: use the weight statement in proc reg
• $b_w = (X'WX)^{-1}X'WY$, where W is a diagonal matrix of the weights
• The problem now becomes determining the weights

Determination of weights
• Find a relationship between the absolute residual and another variable, and use this as a model for the standard deviation
• Similarly for the squared residual and another variable
• Use grouped data, or approximately grouped data, to estimate the variance

Determination of weights
• With a model for the standard deviation or the variance, we can approximate the optimal weights
• Optimal weights are proportional to the inverse of the variance

KNNL Example
• KNNL p 427
• Y is diastolic blood pressure
• X is age
• n = 54 healthy adult women, aged 20 to 60 years

Get the data and check it

data a1;
  infile '../data/ch11ta01.txt';
  input age diast;
proc print data=a1;
run;

Plot the relationship

symbol1 v=circle i=sm70;
proc gplot data=a1;
  plot diast*age / frame;
run;

Diastolic bp vs age
[Figure: scatterplot of diast versus age with smooth]
• Strong linear relationship, but nonconstant variance

Run the regression

proc reg data=a1;
  model diast=age;
  output out=a2 r=resid;
run;

Regression output

                         Analysis of Variance
                              Sum of         Mean
Source              DF       Squares       Square   F Value   Pr > F
Model                1    2374.96833   2374.96833     35.79   <.0001
Error               52    3450.36501     66.35317
Corrected Total     53    5825.33333

Root MSE           8.14575    R-Square    0.4077
Dependent Mean    79.11111    Adj R-Sq    0.3963
Coeff Var         10.29659

Regression output

                        Parameter Estimates
                  Parameter    Standard
Variable     DF    Estimate       Error   t Value   Pr > |t|
Intercept     1    56.15693     3.99367     14.06     <.0001
age           1     0.58003     0.09695      5.98     <.0001

• With nonconstant variance, the estimators are still unbiased but no longer have minimum variance
• Prediction interval coverage is often lower or higher than the nominal 95%

Use the output data set to get the absolute and squared residuals

data a2;
  set a2;
  absr=abs(resid);
  sqrr=resid*resid;

Do the plots with a smooth

proc gplot data=a2;
  plot (resid absr sqrr)*age;
run;

Absolute value of the residuals vs age
[Figure: plot of absr versus age with smooth]

Squared residuals vs age
[Figure: plot of sqrr versus age with smooth]
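Aside: weights from the squared residuals (sketch)
The "Determination of weights" slides note that the squared residuals can serve as a model for the variance directly. A minimal sketch of that alternative; the data set a4 and the names vhat and wt2 are hypothetical and not part of topic19.sas. The handout's own example, next, models the standard deviation through the absolute residuals instead.

proc reg data=a2;
  model sqrr=age;          /* model the variance as a function of age */
  output out=a4 p=vhat;    /* vhat: estimated variance for each case */
run;

data a4;
  set a4;
  if vhat > 0 then wt2=1/vhat;  /* weight = inverse of estimated variance;
                                   a case with vhat <= 0 gets a missing
                                   weight and is dropped by proc reg */
run;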
Model the std dev vs age (absolute value of the residual)

proc reg data=a2;
  model absr=age;
  output out=a3 p=shat;

Note that a3 has the predicted standard deviations (shat)

Compute the weights

data a3;
  set a3;
  wt=1/(shat*shat);

Regression with weights

proc reg data=a3;
  model diast=age / clb;
  weight wt;
run;

Output

                         Analysis of Variance
                              Sum of         Mean
Source              DF       Squares       Square   F Value   Pr > F
Model                1      83.34082     83.34082     56.64   <.0001
Error               52      76.51351      1.47141
Corrected Total     53     159.85432

Root MSE           1.21302    R-Square    0.5214
Dependent Mean    73.55134    Adj R-Sq    0.5122
Coeff Var          1.64921

Output

                              Parameter Estimates
                  Parameter    Standard                         95% Confidence
Variable     DF    Estimate       Error   t Value   Pr > |t|         Limits
Intercept     1    55.56577     2.52092     22.04     <.0001   50.5072    60.6244
age           1     0.59634     0.07924      7.53     <.0001   0.43734    0.75534

• Note the reduction in the standard error of the age coefficient: 0.09695 in the unweighted fit versus 0.07924 here

Ridge regression
• Similar to a very old idea in numerical analysis
• If (X'X) is difficult to invert (nearly singular), then approximate by inverting (X'X + kI)
• Estimators of the coefficients are biased but more stable
• For some value of k, the ridge regression estimator has a smaller mean square error than the ordinary least squares estimator
• Can be used to reduce the number of predictors
• In SAS, request it with the ridge= option in proc reg (a sketch appears at the end of this handout)
• Cross-validation is used to determine k

Robust regression
• Basic idea is to have a procedure that is not sensitive to outliers
• Alternatives to least squares: minimize
  – the sum of the absolute values of the residuals
  – the median of the squared residuals
• Or do weighted regression with weights based on the residuals, and iterate (a sketch appears at the end of this handout)

Nonparametric regression
• Several versions
• We have used i=sm70
• Interesting theory
• All versions have some smoothing or penalty parameter, similar to the 70 in i=sm70 (a sketch appears at the end of this handout)

Bootstrap
• A very important theoretical development that has had a major impact on applied statistics
• Based on simulation
• Sample with replacement from the data or the residuals, and repeatedly refit the model to get the distribution of the quantity of interest (a sketch appears at the end of this handout)

Background Reading
• We used the program topic19.sas
• This completes Chapter 11
• This completes the material for the midterm
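Sketch: requesting the diagnostics in proc reg
The diagnostics listed in the summary slides are all available as model statement options in proc reg. A minimal sketch on the blood pressure data from this handout; with a single predictor the tolerance is trivially 1, so this is purely illustrative.

proc reg data=a1;
  model diast=age / r influence tol;
  /* r: residual analysis, including Cook's D           */
  /* influence: hat diagonals, RSTUDENT, DFFITS,        */
  /*   and the DFBETAS                                  */
  /* tol: tolerance for each X                          */
run;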
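Sketch: ridge regression in proc reg
The sketch referenced on the Ridge regression slide. A minimal example on the blood pressure data; the grid of k values and the name ridge_est are arbitrary choices, and with a single predictor the trace is purely illustrative (ridge regression matters when the X's are nearly collinear). The ridge= option goes on the proc reg statement, with outest= collecting the coefficients at each k.

proc reg data=a1 outest=ridge_est ridge=0 to 0.1 by 0.01;
  model diast=age;
run;

proc print data=ridge_est;  /* the estimates at each value of k */
run;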
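Sketch: robust regression
The sketch referenced on the Robust regression slide; neither procedure is part of topic19.sas. Proc robustreg with method=m carries out the iterate-and-reweight fit described on the slide, and proc quantreg at quantile 0.5 minimizes the sum of the absolute values of the residuals.

proc robustreg data=a1 method=m;  /* M estimation via iteratively
                                     reweighted least squares */
  model diast=age;
run;

proc quantreg data=a1;            /* median (L1) regression */
  model diast=age / quantile=0.5;
run;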
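Sketch: nonparametric regression with proc loess
The sketch referenced on the Nonparametric regression slide. A minimal proc loess fit on the same data; smooth=0.7 plays the same role as the 70 in i=sm70.

proc loess data=a1;
  model diast=age / smooth=0.7;  /* fraction of the data used in
                                    each local neighborhood */
run;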
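Sketch: bootstrapping the regression
The sketch referenced on the Bootstrap slide. A minimal case-resampling version on the blood pressure data; the seed and the choice of 1000 replicates are arbitrary.

/* draw 1000 samples of n = 54 cases, with replacement */
proc surveyselect data=a1 out=boot seed=12345
                  method=urs samprate=1 outhits rep=1000;
run;

/* refit the model in every bootstrap sample */
proc reg data=boot outest=est noprint;
  by replicate;
  model diast=age;
run;

/* the 1000 values of the variable age approximate the
   sampling distribution of the slope */
proc univariate data=est;
  var age;
run;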