Producing and Interpreting Residuals Plots in SPSS

Copyright 2007, Karl L. Wuensch - All rights reserved.

In a linear regression analysis it is assumed that the distribution of the residuals, (Y − Ŷ), is, in the population, normal at every level of predicted Y and constant in variance across levels of predicted Y. I shall illustrate how to check that assumption. Although I shall use a bivariate regression, the same technique would work for a multiple regression.

Start by downloading Residual-Skew.dat and Residual-Hetero.dat from my StatData page and ANOVA1.sav from my SPSS data page. Each line of the Residual-Skew data has four scores: X, Y, X2, and Y2. The delimiter is a blank space. Create the new variable Y2_SQRT (the square root of Y2): click Transform, Compute, enter Y2_SQRT as the target variable and SQRT(Y2) as the numeric expression, then click OK.

First, some descriptive statistics on the variables:

Descriptive Statistics
Variable   N     Min     Max      Mean      Std. Dev.   Skewness (SE)   Kurtosis (SE)
X          200   16      74       48.59      9.985      -.053 (.172)     .161 (.342)
Y          200   22      77       49.66      9.847      -.069 (.172)    -.244 (.342)
X2         200    1     144       45.68     27.123      1.046 (.172)    1.038 (.342)
Y2         200    3     163       48.43     27.484       .948 (.172)    1.250 (.342)
Y2_SQRT    200   1.73    12.77     6.6751    1.97299     .171 (.172)    -.196 (.342)
Valid N (listwise) = 200

Notice that variables X and Y are not skewed – I generated them with a normal random number generator. Notice that X2 and Y2 are skewed, and that taking the square root of Y2 reduces its skewness greatly.

Here we predict Y from X, produce a residuals plot, and save the residuals.

Model Summary
Model 1:  R = .450   R Square = .203   Adjusted R Square = .199   Std. Error of the Estimate = 8.815
Predictors: (Constant), X.  Dependent Variable: Y.

Here is a histogram of the residuals with a normal curve superimposed. The residuals look close to normal. Here is a plot of the residuals versus predicted Y.
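The regression and the saved residuals can be mimicked outside SPSS. Below is a minimal Python sketch, assuming made-up normal data in place of the actual Residual-Skew values: it fits Y on X by least squares and computes analogues of SPSS's unstandardized (RES_1) and standardized (ZRE_1) residuals, where a standardized residual is the raw residual divided by the standard error of the estimate.

```python
import math
import random
import statistics

# Hypothetical stand-ins for the Residual-Skew variables X and Y
# (normal data generated here; NOT the actual file values).
random.seed(1)
x = [random.gauss(48.6, 10.0) for _ in range(200)]
y = [0.44 * xi + random.gauss(28.0, 8.8) for xi in x]

# Ordinary least squares fit of Y on X
mx, my = statistics.mean(x), statistics.mean(y)
slope = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx

# RES_1 analogue: raw residuals, Y minus predicted Y
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# ZRE_1 analogue: residual divided by the standard error of the estimate,
# sqrt(SS_residual / (n - 2)) with one predictor
n = len(x)
see = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
zre = [e / see for e in resid]

print(round(statistics.mean(zre), 4), round(statistics.stdev(zre), 4))
```

By construction the standardized residuals have mean zero and standard deviation very near one, which is why values beyond about ±3 are commonly flagged as outliers.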
The pattern shown here indicates no problems with the assumption that the residuals are normally distributed at each level of predicted Y and constant in variance across levels of predicted Y. SPSS does not automatically draw in the regression line (the horizontal line at residual = 0). I double-clicked the chart and then selected Elements, Fit Line at Total to get that line.

SPSS has saved the residuals, unstandardized (RES_1) and standardized (ZRE_1), to the data file. Use Analyze, Descriptive Statistics, Explore on ZRE_1 to get a better picture of the standardized residuals. The plots look fine. As you can see, the skewness and kurtosis of the residuals are about what you would expect if they came from a normal distribution:

Descriptives: ZRE_1
Mean       .0000000
Minimum   -2.55481
Maximum    2.65518
Range      5.20999
Skewness   -.074
Kurtosis   -.264

Now predict Y from the skewed X2. Conduct this analysis with the same plots and saved residuals as above. You will notice that the residuals plots and exploration of the saved residuals indicate no problems for the regression model. The skewness of X2 may be troublesome for the correlation model, but not for the regression model.

Next, predict skewed Y2 from X.

Model Summary
Model 1:  R = .452   R Square = .204   Adjusted R Square = .200   Std. Error of the Estimate = 24.581
Predictors: (Constant), X.  Dependent Variable: Y2.

Notice that the residuals plot shows the residuals not to be normally distributed – they are pulled out (skewed) towards the top of the plot. Explore also shows trouble:

Descriptives: ZRE_1
Mean                  .0000000
Minimum              -1.87474
Maximum               3.61399
Range                 5.48873
Interquartile Range   1.34039
Skewness              .803
Kurtosis              .965

Notice the outliers in the boxplot. Maybe we can solve this problem by taking the square root of Y2. Predict Y2_SQRT from X.

Model Summary
Model 1:  R = .459   R Square = .211   Adjusted R Square = .207   Std. Error of the Estimate = 1.75738
Predictors: (Constant), X.  Dependent Variable: Y2_SQRT.
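Why the square-root transformation helps can be seen numerically. The sketch below uses hypothetical Python-generated data built to be positively skewed like Y2 (not the actual file values): it fits each criterion on X by least squares and compares the skewness of the two sets of residuals.

```python
import math
import random
import statistics

def skewness(v):
    """g1-type skewness: third central moment over the s.d. cubed."""
    m = statistics.mean(v)
    n = len(v)
    m2 = sum((e - m) ** 2 for e in v) / n
    m3 = sum((e - m) ** 3 for e in v) / n
    return m3 / m2 ** 1.5

def ols_residuals(x, y):
    """Raw residuals from a one-predictor least squares fit."""
    mx, my = statistics.mean(x), statistics.mean(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    a0 = my - b * mx
    return [yi - (a0 + b * xi) for xi, yi in zip(x, y)]

# Hypothetical positively skewed criterion built by squaring a normal score,
# mimicking the shape of Y2 (NOT the actual Residual-Skew values)
random.seed(2)
x = [random.gauss(50, 10) for _ in range(500)]
y2 = [(0.05 * xi + random.gauss(3.0, 1.0)) ** 2 for xi in x]
y2_sqrt = [math.sqrt(v) for v in y2]

skew_raw = skewness(ols_residuals(x, y2))
skew_sqrt = skewness(ols_residuals(x, y2_sqrt))
print(round(skew_raw, 2), round(skew_sqrt, 2))
```

The residuals from the raw, skewed criterion come out noticeably more skewed than the residuals from the square-root-transformed criterion, which is the numeric counterpart of the improvement you see in the Explore output.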
Descriptives: ZRE_1
                      Statistic   Std. Error
Mean                   .0000000   .07053279
Minimum               -2.21496
Maximum                2.82660
Range                  5.04156
Interquartile Range    1.42707
Skewness               .133       .172
Kurtosis              -.240       .342

Notice that the transformation did wonders, reducing the skewness of the residuals to a comfortable level.

We are done with the Residual-Skew data set now. Read into SPSS the ANOVA1.sav data file. Conduct a linear regression analysis to predict illness from dose of drug. Save the standardized residuals and obtain the same plots that we produced above.

Model Summary
Model 1:  R = .110   R Square = .012   Adjusted R Square = .002   Std. Error of the Estimate = 12.113
Predictors: (Constant), dose.  Dependent Variable: illness.

Look at the residuals plot. Oh my. Notice that the residuals are not symmetrically distributed about zero. They are mostly positive at low and high values of predicted Y and mostly negative at medium values of predicted Y. If you were to find the mean of the residuals at each level of predicted Y and connect those means with a line, you would get a curve with one bend. This strongly suggests that the relationship between X and Y is not linear and that you should try a nonlinear model. Notice that the problem is not apparent when we look at the marginal distribution of the residuals.

Produce the new variable Dose_SQ: click Transform, Compute, enter Dose_SQ as the target variable and Dose*Dose as the numeric expression, then click OK. Now predict Illness from a combination of Dose and Dose_SQ. Ask for the usual plots and save residuals and predicted scores.

Model Summary
Model 1:  R = .657   R Square = .431   Adjusted R Square = .419   Std. Error of the Estimate = 9.238
Predictors: (Constant), Dose_SQ, dose.  Dependent Variable: illness.

Notice that R has gone up a lot and is now significant, and the residuals plot looks fine.

Let us have a look at the regression line. We saved the predicted scores (PRE_1), so we can plot their means against dose of the drug: click Graphs, Line, Simple, Define.
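The jump in R when Dose_SQ is added is what polynomial regression buys you when the relationship has a bend. Below is a minimal Python sketch, assuming hypothetical U-shaped dose–illness data (not the ANOVA1.sav values): it fits a straight line and then a quadratic by least squares and compares the two R² values.

```python
import random
import statistics

def polyfit(x, y, deg):
    """Least squares polynomial fit via the normal equations (fine for low degrees)."""
    cols = [[xi ** p for xi in x] for p in range(deg + 1)]
    A = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(c * yi for c, yi in zip(ci, y)) for ci in cols]
    n = deg + 1
    for i in range(n):  # Gaussian elimination with partial pivoting
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        rhs[i], rhs[p] = rhs[p], rhs[i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= f * A[i][c]
            rhs[r] -= f * rhs[i]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        beta[i] = (rhs[i] - sum(A[i][c] * beta[c] for c in range(i + 1, n))) / A[i][i]
    return beta

def r_squared(x, y, beta):
    """Proportion of criterion variance explained by the fitted polynomial."""
    pred = [sum(b * xi ** p for p, b in enumerate(beta)) for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    my = statistics.mean(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical U-shaped dose-illness data (NOT the ANOVA1.sav values)
random.seed(3)
dose = [random.uniform(1, 10) for _ in range(100)]
illness = [3.0 * (d - 5.5) ** 2 + random.gauss(20.0, 5.0) for d in dose]

r2_linear = r_squared(dose, illness, polyfit(dose, illness, 1))
r2_quad = r_squared(dose, illness, polyfit(dose, illness, 2))
print(round(r2_linear, 3), round(r2_quad, 3))
```

Because the straight line averages over the bend, its R² stays near zero even though dose and illness are strongly related; the quadratic captures most of the variance, just as adding Dose_SQ did in SPSS.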
Select Line Represents Other statistic and scoot PRE_1 into the variable box. Scoot Dose into the Category Axis box. OK.

Wow, that is certainly no straight line. What we have done here is a polynomial regression, fitting the data with a quadratic line. A quadratic line can have one bend in it. Let us get a scatter plot with the data and the quadratic regression line. Click Graphs, Scatter, Simple Scatter, Define. Scoot Illness into the Y-axis box and Dose into the X-axis box. OK. Double-click the graph to open the chart editor and select Elements, Fit Line at Total. SPSS will draw a nearly flat, straight line. In the Properties box change Fit Method from Linear to Quadratic. Click Apply and then close the chart editor.

We are done with the ANOVA1.sav data for now. Bring into SPSS the Residual-Hetero.dat data. Each case has two scores, X and Y. The delimiter is a blank space. Conduct a regression analysis predicting Y from X. Create residuals plots and save the standardized residuals as we have been doing with each analysis.

As you can see, the residuals plot shows clear evidence of heteroscedasticity. In this case, the error in predicting Y increases as the value of predicted Y increases. I have been told that transforming one of the variables sometimes reduces heteroscedasticity, but in my experience it often does not help.

Return to Wuensch's SPSS Lessons Page
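As a numeric footnote to the heteroscedasticity example: the Python sketch below assumes hypothetical data whose error variance grows with X (not the actual Residual-Hetero values). Splitting the residuals at the median predicted Y and comparing the two spreads makes the fan shape in the residuals plot visible without drawing it.

```python
import random
import statistics

# Hypothetical data with error variance that grows with X,
# mimicking the Residual-Hetero pattern (NOT the actual file values)
random.seed(4)
x = [random.uniform(0, 100) for _ in range(400)]
y = [2.0 * xi + random.gauss(0.0, 1.0 + 0.2 * xi) for xi in x]

# Ordinary least squares fit of Y on X
mx, my = statistics.mean(x), statistics.mean(y)
slope = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
pred = [intercept + slope * xi for xi in x]
resid = [yi - pi for yi, pi in zip(y, pred)]

# Crude spread check: residual s.d. below vs. above the median predicted Y;
# a ratio well above 1 is the numeric face of the fan-shaped residuals plot
cut = statistics.median(pred)
low = [e for e, p in zip(resid, pred) if p <= cut]
high = [e for e, p in zip(resid, pred) if p > cut]
ratio = statistics.stdev(high) / statistics.stdev(low)
print(round(ratio, 2))
```

With homoscedastic residuals this ratio would hover near 1; here it comes out well above 1, the same conclusion the plot gives at a glance.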