
Author: Brenda Gunderson, Ph.D., 2014
License: Unless otherwise noted, this material is made available under the terms of the Creative
Commons Attribution-NonCommercial-Share Alike 3.0 Unported License:
http://creativecommons.org/licenses/by-nc-sa/3.0/
The University of Michigan Open.Michigan initiative has reviewed this material in accordance with U.S.
Copyright Law and has tried to maximize your ability to use, share, and adapt it. The attribution key
provides information about how you may share and adapt this material.
Copyright holders of content included in this material should contact open.michigan@umich.edu with any
questions, corrections, or clarification regarding the use of content.
For more information about how to attribute these materials visit:
http://open.umich.edu/education/about/terms-of-use. Some materials are used with permission from the
copyright holders. You may need to obtain new permission to use those materials for other uses. This
includes all content from:
Mind on Statistics
Utts/Heckard, 4th Edition, Cengage L, 2012
Text Only: ISBN 9781285135984
Bundled version: ISBN 9780538733489
SPSS and its associated programs are trademarks of SPSS Inc. for its proprietary
computer software. Other product names mentioned in this resource are used for identification purposes
only and may be trademarks of their respective companies.
Attribution Key
For more information see: http://open.umich.edu/wiki/AttributionPolicy
Content the copyright holder, author, or law permits you to use, share and adapt:
Creative Commons Attribution-NonCommercial-Share Alike License
Public Domain – Self Dedicated: Works that a copyright holder has dedicated to the
public domain.
Make Your Own Assessment
Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright.
Public Domain – Ineligible. Works that are ineligible for copyright protection in the
U.S. (17 USC §102(b)) *laws in your jurisdiction may differ.
Content Open.Michigan has used under a Fair Use determination
Fair Use: Use of works that is determined to be Fair consistent with the U.S.
Copyright Act (17 USC § 107) *laws in your jurisdiction may differ.
Our determination DOES NOT mean that all uses of this third-party content are Fair Uses and we DO
NOT guarantee that your use of the content is Fair. To use this content you should conduct your own
independent analysis to determine whether or not your use will be Fair.
Supplement 8: Regression Output in SPSS
There are four parts to the default regression output. Use the scroll bar at the
right edge of the Output Window to scroll up to the top of the regression output.
The first section just reminds you which variable was entered as the explanatory
x variable; for this example, the explanatory variable is DNA.
The second section has the heading Model Summary. The Model Summary starts
with R, which is the absolute value of the correlation coefficient, r, between
the two variables. You need to look at the sign of the slope of the
regression line to determine if you need to put a minus sign in front of this value to
correctly report the correlation coefficient. (The actual value of the correlation
coefficient is also reported in the last section of regression output, under the
column heading Beta.) The correlation coefficient measures the strength of the
linear association between the two variables. The closer it is to +1 or -1, the
stronger the linear association. The square of the correlation, the R Square
quantity, has a useful interpretation in regression. It is often called the coefficient
of determination and measures the proportion of the variation in the response
that can be explained by the linear regression of y on x. Thus, it is a measure of
how well the linear regression model fits the data. The Std. Error of the Estimate
gives the value of s, the estimate of the population standard deviation σ.
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .856a   .732       .699                4.851

a. Predictors: (Constant), DNA
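The Model Summary values can be checked by hand from the sums of squares in the ANOVA table. The Python sketch below is only an illustration, not how SPSS computes its output; it assumes n = 10 observations (inferred from Total df = 9), takes the sign of the slope from the Coefficients table, and uses made-up variable names.

```python
import math

# Values copied from the SPSS output for the regression of PLAQUE on DNA.
ss_regression = 515.141
ss_residual = 188.228
ss_total = 703.369
n = 10          # assumption: inferred from Total df = n - 1 = 9
slope = 0.167   # estimated slope b1 from the Coefficients table

r_square = ss_regression / ss_total
big_r = math.sqrt(r_square)
# R is never negative; the correlation r takes the sign of the slope.
r = math.copysign(big_r, slope)
# Adjusted R Square for a model with one explanatory variable.
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)
# Std. Error of the Estimate: s = sqrt(SSE / (n - 2)).
s = math.sqrt(ss_residual / (n - 2))

print(round(big_r, 3), round(r_square, 3), round(adj_r_square, 3), round(s, 3))
# prints: 0.856 0.732 0.699 4.851
```

Because the slope here is positive, r = +.856; with a negative slope the same R would be reported as r = -.856.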
The third part of the output contains the ANOVA table for regression, used for
assessing if the slope is significantly different from 0 via an F test. The
corresponding t-test will be discussed first and we return to this ANOVA part later.
ANOVAb

Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    515.141          1    515.141       21.894   .002a
Residual      188.228          8    23.528
Total         703.369          9

a. Predictors: (Constant), DNA
b. Dependent Variable: PLAQUE
The last portion of the output falls under the heading Coefficients. In this section,
the least squares estimates for the regression line are given. These estimated
regression coefficients are found under the column labeled B. The estimated slope
is next to the independent variable name (in this example it is DNA), and the
estimated intercept is next to (Constant). So, b0 is the coefficient for the variable
(Constant), and b1 is the coefficient for the independent variable x in the model.
The next column heading is Std. Error, which provides the corresponding standard
error of each of the least squares estimates. The table also reports the t-test
statistics, in the column labeled t, and the two-sided p-values for these
statistics, in the column labeled Sig.
Coefficientsa

              Unstandardized Coefficients   Standardized Coefficients
Model 1       B        Std. Error           Beta                        t       Sig.
(Constant)    -.548    8.193                                            -.067   .948
DNA           .167     .036                 .856                        4.679   .002

a. Dependent Variable: PLAQUE
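Each t statistic in the Coefficients table is the estimate divided by its standard error. The short Python sketch below illustrates that arithmetic using the rounded values displayed in the table; the dictionary layout and names are hypothetical. Because the displayed values are rounded, the slope ratio comes out near 4.64 rather than the 4.679 that SPSS reports, presumably from its unrounded estimates.

```python
# Estimates and standard errors as displayed (rounded) in the Coefficients table.
coefficients = {
    "(Constant)": {"B": -0.548, "std_error": 8.193},
    "DNA": {"B": 0.167, "std_error": 0.036},
}

# t = estimate / standard error, for each row of the table.
t_stats = {name: row["B"] / row["std_error"] for name, row in coefficients.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
# prints:
# (Constant): t = -0.067
# DNA: t = 4.639
```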
The t-statistic for the slope, in the second row, is a test of the significance of the
model with x versus the model without x, that is, for testing H0: β1 = 0 versus
Ha: β1 ≠ 0. The t-statistic for the y-intercept, in the first row, is a test of whether
the y-intercept (β0) is different from zero. This test is not often of interest unless a
value of 0 for the y-intercept is meaningful and of interest. For example, if
x = amount of soap used and y = height of the suds, then an intercept value of 0 is
meaningful as no soap would lead to no suds. The column labeled Sig. gives the
two-sided p-value for the corresponding hypothesis test.
SPSS also provides the information to calculate confidence intervals for the
parameter estimates. The column labeled Std. Error provides standard errors
(estimated standard deviations) of the parameter estimates and is the quantity
that is multiplied by the appropriate t* value in computing the half-width of the
confidence interval. Recall that you can request SPSS to produce these confidence
intervals for you using the Statistics button in the Regression dialog box.
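As a worked illustration of the half-width calculation, the Python sketch below builds a confidence interval for the slope by hand. It assumes a 95% confidence level and takes t* = 2.306 from a t table with df = n - 2 = 8; neither choice is stated in the output itself, and the variable names are made up.

```python
# Slope estimate and its standard error, from the Coefficients table.
b1 = 0.167
se_b1 = 0.036

# Assumption: 95% confidence, df = 8, so t* = 2.306 (value read from a t table).
t_star = 2.306

half_width = t_star * se_b1
lower, upper = b1 - half_width, b1 + half_width

print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
# prints: 95% CI for the slope: (0.084, 0.250)
```

The interval does not contain 0, which agrees with the small p-value reported for the slope.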
Interpretation of estimated slope b1:
According to our regression model, we estimate that increasing DNA by one unit
has the effect of increasing the predicted plaque by .167 units.
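The slope interpretation can be seen directly from the fitted line, predicted plaque = -.548 + .167(DNA). In the Python sketch below, the function name and the DNA value of 200 are hypothetical choices made only for illustration; the value is not taken from the data set.

```python
# Least squares estimates from the Coefficients table.
b0 = -0.548   # intercept, next to (Constant)
b1 = 0.167    # slope, next to DNA


def predict_plaque(dna):
    """Predicted plaque from the fitted regression line."""
    return b0 + b1 * dna


x = 200.0  # hypothetical DNA value, for illustration only
increase = predict_plaque(x + 1) - predict_plaque(x)

print(f"Predicted plaque at DNA = {x:.0f}: {predict_plaque(x):.3f}")
print(f"Change in prediction for a one-unit increase: {increase:.3f}")
```

Whatever starting value of DNA is used, the predicted plaque changes by the slope, .167, for each one-unit increase.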
Interpretation of r²:
According to our model, about 73% of the variation in plaque levels can be
accounted for by its linear relationship with DNA.
Decision for test of a significant linear relationship:
Since the p-value = .002 is less than the significance level α = .05, we can reject the
null hypothesis that the population slope, β1, equals 0.
Conclusion: There is sufficient evidence to conclude that in the linear model for
plaque based on DNA the population slope, β1, does not equal zero. Hence, it
appears that DNA is a significant linear predictor of plaque.
Let’s return to the ANOVA table in the middle of the regression output.
ANOVAb

Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    515.141          1    515.141       21.894   .002a
Residual      188.228          8    23.528
Total         703.369          9

a. Predictors: (Constant), DNA
b. Dependent Variable: PLAQUE
The Regression Sum of Squares corresponds to the portion of the total variation in
the data that is accounted for by the regression line. Everything that is left over
and not accounted for by the regression line is placed in the Residual Sum of
Squares category. Then, dividing the sum of squares by their respective df
(degrees of freedom) yields the Mean Squares.
Finally, the ratio of the Mean Squares provides the F statistic which tests if the
slope is significantly different from zero (i.e., if there is a significant non-zero linear
relationship between the two variables: H0: β1 = 0 versus Ha: β1 ≠ 0). The Sig. value is
the corresponding p-value for the F test of these hypotheses.
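The arithmetic behind the ANOVA table can be retraced in a few lines. The Python sketch below is an illustration using the sums of squares and degrees of freedom printed in the table; the variable names are made up.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 515.141, 1
ss_residual, df_residual = 188.228, 8
ss_total = 703.369

# The regression and residual sums of squares add up to the total.
assert abs((ss_regression + ss_residual) - ss_total) < 1e-6

# Mean Square = Sum of Squares / df.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F = ratio of the two mean squares.
f_stat = ms_regression / ms_residual

print(f"MS Regression = {ms_regression:.3f}")
print(f"F = {f_stat:.3f}")
```

The residual mean square works out to about 23.528 and F to about 21.894, matching the table.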
In simple linear regression, the t-test in the Coefficients output for the slope is
equivalent to the ANOVA F-test. Notice that the square of the t-statistic for
the slope equals the F-statistic in the ANOVA table, and the
corresponding p-values are the same.
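A quick numerical check of this relationship, written as a small Python sketch with the values reported in the output (the variable names are made up):

```python
t_slope = 4.679   # t statistic for DNA, from the Coefficients table
f_stat = 21.894   # F statistic, from the ANOVA table

t_squared = t_slope ** 2
print(round(t_squared, 3))  # prints: 21.893
```

The small gap between 21.893 and 21.894 comes from rounding in the displayed t statistic.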
Checking the Simple Linear Regression Assumptions
Here is a summary of some graphical procedures that are useful in detecting
departures from the assumptions underlying the simple linear regression model.
1. LINEARITY: Do a scatter plot of y versus x.
The plot should appear to be roughly linear.
2. STABILITY: Do a sequence plot of the residuals.
The plot should show no pattern indicating any trend in the mean or in the
variance of the residuals. An example sequence plot is shown below. Remember
that it is only appropriate to make sequence plots when there is some ordering
present in the data.
3. NORMALITY: Examine a Q-Q plot of the residuals to check on the assumption
of normality for the population (true) error terms. An example Q-Q plot is
shown below.
4. CONSTANT STANDARD DEVIATION of the population (true) error terms: Make
a plot of the residuals versus x. This plot is called a residual plot. The residuals
represent what is left over after the linear model has been fit. The residual
plot should be a random scatter of points in roughly a horizontal band, with no
apparent pattern. An example residual plot is shown at the right. Sometimes
this plot can also reveal departures from linearity (i.e. that the regression
analysis is not appropriate due to lack of a linear relationship).
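The quantities behind these plots can be computed by hand. The Python sketch below fits a least squares line and produces the points for a residual plot and a Q-Q plot. The data are hypothetical (they are not the DNA and PLAQUE values), the names are made up, and the plotting positions (i - 0.5)/n are one common convention that may differ from what SPSS uses.

```python
from statistics import NormalDist, mean

# Hypothetical (x, y) data for illustration only; not the DNA/PLAQUE data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

# Least squares estimates of the slope (b1) and intercept (b0).
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual = observed y minus predicted y. Kept in data order, this list
# is also what a sequence plot would show when the data have an ordering.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Residual plot: residuals versus x; look for a patternless horizontal band.
residual_plot_points = list(zip(x, residuals))

# Q-Q plot: sorted residuals versus standard normal quantiles.
n = len(residuals)
normal_quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
qq_points = list(zip(normal_quantiles, sorted(residuals)))

for xi, ri in residual_plot_points:
    print(f"x = {xi:.0f}, residual = {ri:+.3f}")
```

Feeding `residual_plot_points` and `qq_points` to any plotting tool gives the two graphs; points close to a straight line in the Q-Q plot are consistent with normally distributed errors.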