Author: Brenda Gunderson, Ph.D., 2014

License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution-NonCommercial-Share Alike 3.0 Unported License: http://creativecommons.org/licenses/by-nc-sa/3.0/

The University of Michigan Open.Michigan initiative has reviewed this material in accordance with U.S. Copyright Law and has tried to maximize your ability to use, share, and adapt it. The attribution key provides information about how you may share and adapt this material. Copyright holders of content included in this material should contact open.michigan@umich.edu with any questions, corrections, or clarifications regarding the use of content. For more information about how to attribute these materials visit: http://open.umich.edu/education/about/terms-of-use. Some materials are used with permission from the copyright holders. You may need to obtain new permission to use those materials for other uses.

This includes all content from: Mind on Statistics, Utts/Heckard, 4th Edition, Cengage Learning, 2012. Text only: ISBN 9781285135984; bundled version: ISBN 9780538733489.

SPSS and its associated programs are trademarks of SPSS Inc. for its proprietary computer software. Other product names mentioned in this resource are used for identification purposes only and may be trademarks of their respective companies.

Attribution Key
For more information see: http://open.umich.edu/wiki/AttributionPolicy

Content the copyright holder, author, or law permits you to use, share and adapt:
- Creative Commons Attribution-NonCommercial-Share Alike License
- Public Domain – Self Dedicated: works that a copyright holder has dedicated to the public domain.

Make Your Own Assessment
Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright:
- Public Domain – Ineligible: works that are ineligible for copyright protection in the U.S. (17 USC § 102(b)). *Laws in your jurisdiction may differ.
Content Open.Michigan has used under a Fair Use determination:
- Fair Use: use of works that is determined to be Fair consistent with the U.S. Copyright Act (17 USC § 107). *Laws in your jurisdiction may differ.

Our determination DOES NOT mean that all uses of this third-party content are Fair Uses, and we DO NOT guarantee that your use of the content is Fair. To use this content you should conduct your own independent analysis to determine whether or not your use will be Fair.

Supplement 8: Regression Output in SPSS

There are four parts to the default regression output. Use the scroll bar at the right edge of the Output Window to scroll up to the top of the regression output.

The first section simply reminds you which variable was entered as the explanatory x variable; for this example, the explanatory variable is DNA.

The second section has the heading Model Summary. The Model Summary starts with the correlation between the two variables, R, which is the absolute value of the correlation coefficient, r. You need to look at the sign of the slope of the regression line to determine whether you need to put a minus sign in front of this value to correctly report the correlation coefficient. (The actual value of the correlation coefficient is also reported in the last section of the regression output, under the column heading Beta.) The correlation coefficient measures the strength of the linear association between the two variables: the closer it is to +1 or -1, the stronger the linear association.

The square of the correlation, the R Square quantity, has a useful interpretation in regression. It is often called the coefficient of determination and measures the proportion of the variation in the response that can be explained by the linear regression of y on x. Thus, it is a measure of how well the linear regression model fits the data. The Std. Error of the Estimate gives the value of s, the estimate of the population standard deviation σ.
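To see where these Model Summary quantities come from, here is a minimal sketch in Python. It uses made-up (x, y) data, since the actual DNA/plaque values are not reproduced in this supplement, and computes r, R Square, and the Std. Error of the Estimate by hand with the standard formulas:

```python
import math

# Hypothetical (x, y) data standing in for the DNA/plaque example
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)   # correlation coefficient (carries its sign)
r_sq = r ** 2                    # R Square: proportion of variation explained

# Std. Error of the Estimate: residual variation around the fitted line,
# divided by n - 2 degrees of freedom
b1 = sxy / sxx
b0 = my - b1 * mx
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

print(round(r, 3), round(r_sq, 3), round(s, 3))  # 0.775 0.6 0.894
```

Note that r here keeps its sign; SPSS's reported R is its absolute value, which is why you must check the sign of the slope when reporting the correlation coefficient.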
Model Summary

  Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
  1       .856a   .732       .699                4.851

  a. Predictors: (Constant), DNA

The third part of the output contains the ANOVA table for regression, used for assessing whether the slope is significantly different from 0 via an F test. The corresponding t-test will be discussed first, and we return to this ANOVA part later.

ANOVAb

  Model 1      Sum of Squares   df   Mean Square   F        Sig.
  Regression   515.141           1   515.141       21.894   .002a
  Residual     188.228           8    23.528
  Total        703.369           9

  a. Predictors: (Constant), DNA
  b. Dependent Variable: PLAQUE

The last portion of the output falls under the heading Coefficients. In this section, the least squares estimates for the regression line are given. These estimated regression coefficients are found under the column labeled B. The estimated slope is next to the independent variable name (in this example it is DNA), and the estimated intercept is next to (Constant). So, b0 is the coefficient for the variable (Constant), and b1 is the coefficient for the independent variable x in the model. The next column heading is Std. Error, which provides the corresponding standard error of each of the least squares estimates. Also produced in this table are the t-test statistics, in the column labeled t, and Sig., which reports the two-sided p-values for these t-test statistics.

Coefficientsa

  Model 1      B (Unstandardized)   Std. Error   Beta (Standardized)   t        Sig.
  (Constant)   -.548                8.193                              -.067    .948
  DNA           .167                 .036         .856                 4.679    .002

  a. Dependent Variable: PLAQUE

The t-statistic for the slope, in the second row, is a test of the significance of the model with x versus the model without x, that is, for testing H0: β1 = 0 versus Ha: β1 ≠ 0. The t-statistic for the y-intercept, in the first row, is a test of whether the y-intercept (β0) is different from zero. This test is not often of interest unless a value of 0 for the y-intercept is meaningful and of interest.
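The entries in the Coefficients table can likewise be reproduced directly from the data. A short sketch, again with made-up data rather than the actual DNA/plaque values, using the usual simple-linear-regression standard-error formulas:

```python
import math

# Hypothetical (x, y) data standing in for the DNA/plaque example
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)

b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx  # slope (B, x row)
b0 = my - b1 * mx                                              # intercept (B, Constant row)

# s = Std. Error of the Estimate, from the residual sum of squares
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

se_b1 = s / math.sqrt(sxx)                      # Std. Error for the slope
se_b0 = s * math.sqrt(1 / n + mx ** 2 / sxx)    # Std. Error for the intercept

t_slope = b1 / se_b1        # t statistic for H0: beta1 = 0
t_intercept = b0 / se_b0    # t statistic for H0: beta0 = 0
# SPSS's Sig. column is the two-sided p-value for these t statistics,
# from a t distribution with n - 2 degrees of freedom.

print(round(b1, 3), round(b0, 3), round(t_slope, 3))  # 0.6 2.2 2.121
```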
For example, if x = amount of soap used and y = height of the suds, then an intercept value of 0 is meaningful, as no soap would lead to no suds. The column labeled Sig. gives the two-sided p-value for the corresponding hypothesis test.

SPSS also provides the information needed to calculate confidence intervals for the parameter estimates. The column labeled Std. Error provides the standard errors (estimated standard deviations) of the parameter estimates; this is the quantity that is multiplied by the appropriate t* value in computing the half-width of the confidence interval. Recall that you can request SPSS to produce these confidence intervals for you using the Statistics button in the Regression dialog box.

Interpretation of the estimated slope b1: According to our regression model, we estimate that increasing DNA by one unit increases the predicted plaque by .167 units.

Interpretation of r2: According to our model, about 73% of the variation in plaque levels can be accounted for by its linear relationship with DNA.

Decision for the test of a significant linear relationship: Since the p-value = .002 is less than the significance level α = .05, we can reject the null hypothesis that the population slope, β1, equals 0.

Conclusion: There is sufficient evidence to conclude that in the linear model for plaque based on DNA, the population slope, β1, does not equal zero. Hence, it appears that DNA is a significant linear predictor of plaque.

Let's return to the ANOVA table in the middle of the regression output.

ANOVAb

  Model 1      Sum of Squares   df   Mean Square   F        Sig.
  Regression   515.141           1   515.141       21.894   .002a
  Residual     188.228           8    23.528
  Total        703.369           9

  a. Predictors: (Constant), DNA
  b. Dependent Variable: PLAQUE

The Regression Sum of Squares corresponds to the portion of the total variation in the data that is accounted for by the regression line. Everything that is left over and not accounted for by the regression line is placed in the Residual Sum of Squares category.
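As a quick check on how the pieces of this table fit together, its arithmetic can be reproduced from the sums of squares (the numbers below are copied from the SPSS output for the DNA/plaque example):

```python
import math

# Values copied from the ANOVA and Coefficients tables for this example
ss_reg, df_reg = 515.141, 1
ss_res, df_res = 188.228, 8
ss_total = 703.369

ms_reg = ss_reg / df_reg    # Mean Square for Regression
ms_res = ss_res / df_res    # Mean Square for Residual (this is s squared)
f_stat = ms_reg / ms_res    # F statistic, about 21.894 as in the table

# The sums of squares add up, SS Regression / SS Total recovers R Square,
# sqrt(MS Residual) recovers s = 4.851, and the slope t statistic (4.679)
# squares to F, up to rounding.
assert math.isclose(ss_reg + ss_res, ss_total, abs_tol=0.001)
assert math.isclose(ss_reg / ss_total, 0.732, abs_tol=0.001)
assert math.isclose(math.sqrt(ms_res), 4.851, abs_tol=0.001)
assert math.isclose(4.679 ** 2, f_stat, abs_tol=0.01)

print(round(f_stat, 2))  # 21.89
```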
Then, dividing each sum of squares by its respective df (degrees of freedom) yields the Mean Squares. Finally, the ratio of the Mean Squares provides the F statistic, which tests whether the slope is significantly different from zero (i.e., whether there is a significant non-zero linear relationship between the two variables): H0: β1 = 0 versus Ha: β1 ≠ 0. The Sig. is the corresponding p-value for the F test of these hypotheses.

In simple linear regression, the t-test in the Coefficients output for the slope is equivalent to the ANOVA F-test. Notice that the square of the t-statistic for testing about the slope is equal to the F-statistic in the ANOVA table, and the corresponding p-values are the same.

Checking the Simple Linear Regression Assumptions

Here is a summary of some graphical procedures that are useful in detecting departures from the assumptions underlying the simple linear regression model.

1. LINEARITY: Make a scatterplot of y versus x. The plot should appear to be roughly linear.

2. STABILITY: Make a sequence plot of the residuals. The plot should show no pattern indicating any trend in the mean or in the variance of the residuals. An example sequence plot is shown below. Remember that it is only appropriate to make sequence plots when there is some ordering present in the data.

3. NORMALITY: Examine a Q-Q plot of the residuals to check the assumption of normality for the population (true) error terms. An example Q-Q plot is shown below.

4. CONSTANT STANDARD DEVIATION of the population (true) error terms: Make a plot of the residuals versus x. This plot is called a residual plot. The residuals represent what is left over after the linear model has been fit. The residual plot should be a random scatter of points in roughly a horizontal band, with no apparent pattern. An example residual plot is shown at the right. Sometimes this plot can also reveal departures from linearity (i.e., that the regression analysis is not appropriate due to lack of a linear relationship).
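All four of these diagnostic plots are built from the residuals, which are simply the observed y values minus the fitted values. A small sketch of computing them (made-up data again; the plots themselves would be produced in SPSS, or with a plotting package such as matplotlib):

```python
# Hypothetical (x, y) data; residuals = observed y minus fitted y
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
b0 = my - b1 * mx

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# A residual plot graphs these against x; least-squares residuals always
# sum to (numerically) zero, so the points center on a horizontal line at 0.
print([round(e, 3) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```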