The Linear Regression Equation

Chapter Eight
Linear Regression
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Linear Regression
Linear regression is a statistical procedure
that uses relationships to predict unknown Y
scores based on the X scores from a
correlated variable.
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 2
Linear Regression Line
The linear regression line is the straight
line that summarizes the linear relationship
in the scatterplot by, on average, passing
through the center of the Y scores at each X.
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 3
Predicted Y Scores
• The symbol Y  stands for a predicted Y
score
• Each Y  is our best prediction of the Y
score at a corresponding X, based on the
linear relationship that is summarized by
the regression line
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 4
Slope and Intercept
• The slope is a number that indicates how
slanted the regression line is and the
direction in which it slants
• The Y intercept is the value of Y at the
point where the regression line intercepts,
or crosses, the Y axis (that is, when X
equals 0)
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 5
Regression Lines Having Different
Slopes and Y Intercepts
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 6
The Linear Regression Equation
• The linear regression equation indicates
the predicted Y values are equal to the
slope (b) times a given X value and this
product then is added to the Y intercept (a)
Y   bX  a
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 7
Computing the Slope
• The formula for the slope (b) is
N (XY )  (X )(Y )
b
2
2
N (X )  (X )
• Since we usually first compute r, the
values of the elements of this formula
already are known
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 8
Computing the Y-Intercept
• The formula for the Y-intercept (a) is
a  Y  (b)( X )
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 9
Summary of Computations for the
Linear Regression Equation
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 10
Describing Errors in Prediction
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 11
The Variance of Y Scores
Y
Around
• The variance of the Y scores around Y 
is the average squared difference between
the actual Y scores and their
corresponding predicted Y  scores
• The formula for defining the variance of
the Y scores around Y  is
S  S (1  r )
2
Y
2
Y
2
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 12
The Standard Error of the Estimate
The standard error of the estimate is the
clearest way to describe the “average” error
when using Y  to predict Y scores.
SY   SY 1  r
2
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 13
Interpreting the Standard
Error of the Estimate
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 14
Assumption 1 of Linear Regression
• The first assumption of linear regression is
that the data are homoscedastic
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 15
Homoscedasticity
• Homoscedasticity occurs when the Y scores are
spread out to the same degree at every X
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 16
Heteroscedasticity
• Heteroscedasticity occurs when the spread in Y
is not equal throughout the relationship
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 17
Assumption 2 of Linear Regression
• The second assumption of linear
regression is that the Y scores at each X
form an approximately normal distribution
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 18
Scatterplot Showing Normal
Distribution of Y Scores at Each X
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 19
The Strength of a Relationship
and Prediction Error
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 20
The Strength of a Relationship
As the strength of the relationship
increases, the actual Y scores are closer
to their corresponding Y  scores,
producing less prediction error and
2
smaller values of SY  and SY  .
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 21
Scatterplot of a Strong Relationship
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 22
Scatterplot of a Weak Relationship
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 23
Computing the Proportion of
Variance Accounted For
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 24
Proportion of Variance
Accounted For
The proportion of variance accounted for is
the proportional improvement in the
accuracy of our predictions produced by
using a relationship to predict Y scores,
compared to our accuracy when we do not
use the relationship.
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 25
Proportion of Variance
Accounted For
• When we do not use the relationship, we
use the overall mean of the Y scores (Y )
as everyone’s predicted Y
• The error here is the difference between
the actual Y scores and the Y we predict
they got (Y  Y )
• When we do not use the relationship to
2
predict scores, our error is SY
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 26
Proportion of Variance
Accounted For
• When we do use the relationship, we use
the corresponding Y  as determined by
the linear regression equation as our
predicted value
• The error here is the difference between
the actual Y scores and the Y  that we
predict they got (Y  Y )
• When we do use the relationship to predict
scores, our error is SY2
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 27
Proportion of Variance
Accounted For
The computational formula for the proportion
of variance in Y that is accounted for by a
2
linear relationship with X is r .
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 28
Proportion of Variance
Not Accounted For
The computational formula for the proportion
of variance in Y that is not accounted for by
a linear relationship with X is 1  r .
2
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 29
Example 1
• For the following data
set, calculate the linear
regression equation.
X
Y
1
8
2
6
3
6
4
5
5
1
6
3
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 30
Example 1
Linear Regression Equation
N (XY )  (X )(Y )
b
2
2
N (X )  (X )
6(81)  (21)( 29)

2
6(91)  (21)
123

  1.171
105
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 31
Example 1
Linear Regression Equation
a  Y  (b)( X )
 4.833  (1.171)(3.500)
 8.933
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 32
Example 1
Linear Regression Equation
Y   bX  a
  1.17 X  8.93
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 33
Example 2
Predicted Y Value
• Using the linear regression equation from
example 1, determine the predicted Y
score for X = 4.
Y    1.17 X  8.93
  1.17(4)  8.93
 4.25
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 34
Example 3
• Using the same data set,
calculate the standard error
of the estimate.
X
Y
1
8
2
6
3
6
4
5
5
1
6
3
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 35
Example 3
Standard Error of the Estimate
SY   SY 1  r
2
2
2
)
29
(
)
Y

(
171 
Y 2 
6  5.139
N 
SY2 
6
N
SY  5.139  2.267
r   0.88
SY   2.267 1  (0.88)  1.08
2
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 36
Example 4
• Using the same data set,
calculate the proportion
of variance accounted for
and the proportion of
variance not accounted
for.
X
Y
1
8
2
6
3
6
4
5
5
1
6
3
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 37
Example 4—Proportion of Variance
Accounted For
The proportion of variance in Y that is accounted
2
for by a linear relationship with X is r . The
proportion of variance in Y that is not accounted
for by a linear relationship with X is 1  r 2.
r   0.88
r 2  0.774
1  r 2  0.226
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 38
Key Terms
• coefficient of alienation
• coefficient of
determination
• criterion variable
• heteroscedasticity
• homoscedasticity
• linear regression
equation
• linear regression line
• multiple correlation
coefficient
• multiple regression
equation
• predicted Y score
• predictor variable
• proportion of the
variance accounted
for
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 39
Key Terms (Cont’d)
• slope
• standard error of the
estimate
• variance of the Y
scores around Y 
• Y intercept
© 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website for classroom use.
Chapter 8 - 40