V10_REGR

A continuation of regression analysis
© Department of ISM, University of Alabama, 1992-2003
M23- Residuals & Minitab 1
Lesson Objectives

• Continue to build on regression analysis.
• Learn how residual plots help identify problems with the analysis.
Example 1, continued …
Sample of n = 5 students,
Y = Weight in pounds,
X = Height in inches.

Case   X    Y
1      73   175
2      68   158
3      67   140
4      72   207
5      62   115

Prediction equation:
Ŵt = – 332.73 + 7.189 Ht
r-square = ?
Std. error = ?
(To be found later.)
Example 1, continued
[Figure: scatterplot of WEIGHT versus HEIGHT with the fitted line Ŷ = – 332.7 + 7.189X.]
Residuals = distance from point to line, measured parallel to the Y-axis.
Calculation: For each case,
residual = observed value – estimated mean
For the ith case,
ei = yi – ŷi
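The residual calculation can be sketched in a few lines of Python for the five students of Example 1. This is only an illustration: the coefficients are the rounded values from the prediction equation, and the names `fitted_value` and `residual` are made up for this sketch.

```python
# Data from Example 1: heights (inches) and weights (pounds) of n = 5 students.
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]

# Rounded coefficients from the prediction equation (assumed exact here).
INTERCEPT = -332.73
SLOPE = 7.189

def fitted_value(x):
    """Estimated mean of Y for a given height x."""
    return INTERCEPT + SLOPE * x

def residual(x, y):
    """Observed value minus estimated mean."""
    return y - fitted_value(x)

residuals = [residual(x, y) for x, y in zip(heights, weights)]

for case, (x, y, e) in enumerate(zip(heights, weights, residuals), start=1):
    print(f"case {case}: fitted = {fitted_value(x):.2f}, residual = {e:+.2f}")
```

A positive residual means the student weighs more than the line predicts for that height; a negative residual means less.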
Example 1, continued
Compute the fitted value and residual for the 4th person in the sample; i.e., X = 72 inches, Y = 207 lbs.
fitted value = ŷ4 = –332.73 + 7.189(      ) = _________
residual = e4 = y4 – ŷ4 = _________ = __________
Scatterplot of residuals vs. the predicted means of Y (Ŷ), or an X-variable.
Example 1, continued
e4 = +22.12.
[Figure: scatterplot of WEIGHT versus HEIGHT with the fitted line Ŷ = – 332.7 + 7.189X.]
Residuals = distance from point to line, measured parallel to the Y-axis.
Example 1, continued
[Figure: Residual Plot, Residuals versus HEIGHT.]
The regression line from the previous plot is rotated to horizontal.
e4 is the residual for the 4th case, = +22.12.
Residual Plot
Scatterplot of residuals versus the predicted means of Y (Ŷ), or an X-variable, or Time.
Expect random dispersion around a horizontal line at zero.
Problems are indicated by:
• Unusual patterns
• Unusual cases
Residuals versus X
[Figure: residuals plotted against X, or time, scattered around zero.]
Good random pattern
Residuals versus X
[Figure: residuals plotted against X, or time, with a few points far from zero.]
Outliers?
Next step: ________ to determine if a recording error has occurred.
Residuals versus X
[Figure: residuals plotted against X, or time, following a curved pattern.]
Nonlinear relationship
Next step: Add a “quadratic term,” or use “______.”
Residuals versus X
[Figure: residuals plotted against X, or time, fanning out as X increases.]
Variance is increasing
Next step: Stabilize variance by using “________.”
Residual Plots help identify
Unusual patterns:
• Possible curvature in the data.
• Variances that are not constant as X changes.
Unusual cases:
• Outliers
• High leverage cases
• Influential cases
Three properties of Residuals, illustrated with some computations.
Property 1.
Y = Weight, X = Height
Ŷ = – 332.73 + 7.189 X

X    Y     Ŷ        Residuals e = Y – Ŷ
73   175   192.07   –17.07
68   158   156.12     1.88
67   140   .          .
72   207   .          .
62   115   .          .
Sum                    .01   (round-off error)

Find the sum of the residuals.

Properties of Least Squares Line
1. Residuals always sum to zero.
   Σei = 0.
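Property 1 can be checked numerically. The sketch below recomputes the least squares slope and intercept from the five data points using the standard formulas (the variable names are illustrative). It then compares the sum of the unrounded residuals with the sum obtained from the rounded equation, which is where the slide's .01 comes from.

```python
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]
n = len(heights)

x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Standard least squares formulas for simple regression.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
s_xx = sum((x - x_bar) ** 2 for x in heights)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residuals from the unrounded least squares line.
exact_residuals = [y - (intercept + slope * x) for x, y in zip(heights, weights)]

# Residuals as tabulated on the slide: rounded equation, values rounded to 2 places.
slide_residuals = [
    round(y - round(-332.73 + 7.189 * x, 2), 2) for x, y in zip(heights, weights)
]

print(f"slope = {slope:.3f}, intercept = {intercept:.2f}")
print(f"sum of exact residuals = {sum(exact_residuals):.10f}")
print(f"sum of slide residuals = {sum(slide_residuals):.2f}")
```

The exact residuals sum to zero up to floating-point error; the small nonzero total on the slide is purely a rounding effect.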
Property 2.
Y = Weight, X = Height
Ŷ = – 332.73 + 7.189 X

X    Y     Ŷ        e = Y – Ŷ   e²
73   175   192.07   –17.07      291.38
68   158   156.12     1.88        3.53
67   140   148.93    –8.93       79.74
72   207   184.88    22.12      489.29
62   115   112.99     2.01        4.04
Sum                    .01      867.98

Find the sum of squares of the residuals.

Properties of Least Squares Line
1. Residuals always sum to zero.
2. This “least squares” line produces a smaller “Sum of squared residuals” than any other straight line can.
   Σei² = SSE = 867.98 < “SSE for any other line”.
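Property 2 can be illustrated by comparing the SSE of the least squares line with the SSE of a few competing lines. This is a sketch, not a proof: the competing lines below are arbitrary choices made for illustration. The computed SSE may also differ slightly from the slide's 867.98, because the slide squares residuals that were already rounded to two decimals.

```python
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]
n = len(heights)

def sse(intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(heights, weights))

# Unrounded least squares fit.
x_bar = sum(heights) / n
y_bar = sum(weights) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
s_xx = sum((x - x_bar) ** 2 for x in heights)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
best = sse(intercept, slope)

# Arbitrary competing lines (hypothetical, for comparison only).
others = {
    "steeper by 1 lb/in": sse(intercept - x_bar, slope + 1.0),
    "flatter by 1 lb/in": sse(intercept + x_bar, slope - 1.0),
    "shifted up 5 lb": sse(intercept + 5.0, slope),
    "flat line at mean weight": sse(y_bar, 0.0),
}

print(f"least squares line: SSE = {best:.2f}")
for label, value in others.items():
    print(f"{label}: SSE = {value:.2f}")
```

Every competing line gives a larger SSE than the least squares line, as Property 2 states.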
Property 3.
[Figure: scatterplot of WEIGHT versus HEIGHT with the point of means marked.]
X̄ = 68.4, Ȳ = 159
Properties of Least Squares Line
1. Residuals always sum to zero.
2. This “least squares” line produces a smaller “Sum of squared residuals” than any other straight line can.
3. Line always passes through the point (x̄, ȳ).
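Property 3 follows from the intercept formula a = ȳ – b·x̄, and it is easy to confirm for Example 1. The short sketch below (variable names are illustrative) evaluates the fitted line at the mean height.

```python
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]
n = len(heights)

x_bar = sum(heights) / n   # mean height
y_bar = sum(weights) / n   # mean weight

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
s_xx = sum((x - x_bar) ** 2 for x in heights)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Evaluate the fitted line at the mean of X.
fitted_at_mean = intercept + slope * x_bar

print(f"x-bar = {x_bar:.1f}, y-bar = {y_bar:.1f}")
print(f"fitted value at x-bar = {fitted_at_mean:.4f}")
```

The fitted value at the mean height equals the mean weight, so the line passes through the point of means.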
Illustration of unusual cases:
• Outliers
• Leverage
• Influential
[Figure: scatterplot of Y versus X with one point labeled “outlier” near the X-mean.]
“Unusual point” does not follow pattern.
It’s near the X-mean; the entire line is pulled toward it.
[Figure: scatterplot of Y versus X with one point labeled “outlier” below the other points.]
“Unusual point” does not follow pattern.
The line is pulled down and twisted slightly.
[Figure: scatterplot of Y versus X with one point labeled “High leverage”.]
“Unusual point” is far from the X-mean, but still follows the pattern.
[Figure: scatterplot of Y versus X with one point labeled “leverage & outlier, influential”.]
“Unusual point” is far from the X-mean, but does not follow the pattern.
Line really twists!
Definitions:
Outlier:
An unusual y-value relative to the pattern of the other cases. Usually has a large residual.
High Leverage Case:
An extreme X value relative to the other X values.
Definitions: continued
Influential Case:
Has an unusually large effect on the slope of the least squares line.
Definitions: continued
Conclusion:
High leverage ⇒ potentially influential.
High leverage & Outlier ⇒ influential!!
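The difference between these cases can be seen by refitting the line with and without one unusual point. The sketch below uses made-up data: five base points lying exactly on y = 2x + 1, plus three hypothetical extra points. The helper `least_squares` is an illustrative name, not a Minitab feature.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Hypothetical base data lying exactly on y = 2x + 1.
base_x = [1, 2, 3, 4, 5]
base_y = [3, 5, 7, 9, 11]
base_intercept, base_slope = least_squares(base_x, base_y)

# One extra point per scenario (all values invented for illustration).
cases = {
    "outlier near the X-mean": (3, 20),
    "high leverage, follows pattern": (15, 31),
    "high leverage and outlier": (15, 5),
}

slope_change = {}
intercept_change = {}
for label, (x, y) in cases.items():
    intercept, slope = least_squares(base_x + [x], base_y + [y])
    slope_change[label] = slope - base_slope
    intercept_change[label] = intercept - base_intercept
    print(f"{label}: slope change = {slope_change[label]:+.3f}, "
          f"intercept change = {intercept_change[label]:+.3f}")
```

In this toy data the outlier at the X-mean shifts the whole line upward without changing its slope, the high leverage point that follows the pattern changes nothing, and the point that is both high leverage and an outlier changes the slope drastically.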
Why do we care about identifying unusual cases?
The least squares regression line is not resistant to unusual cases.
Lesson Objectives

• Learn two ways to use Minitab to run a regression analysis.
• Learn how to read output from Minitab.
Example 3, continued …
Can height be predicted using shoe size?
Step 1?
Example 3, continued …
Can height be predicted using shoe size?
Graph > Plot …
[Figure: Scatterplot of Height versus Shoe Size, with Female and Male plotted as separate groups. “Jitter” added in X-direction.]
The scatter for each subpopulation is about the same; i.e., there is “constant variance.”
Example 3, continued …
Method 1: Stat > Regression > Regression …
Y = a + bX
Example 3, continued …
Can height be predicted using shoe size?
Copied from “Session Window.”

Regression Analysis: Height versus Shoe Size

The regression equation is
Height = 50.5 + 1.87 Shoe Size

Predictor       Coef    SE Coef       T       P
Constant     50.5230     0.5912   85.45   0.000
Shoe Siz     1.87241    0.06033   31.04   0.000

S = 1.947    R-Sq = 79.1%    R-Sq(adj) = 79.0%

Analysis of Variance
Source         DF       SS       MS        F       P
Regression      1   3650.0   3650.0   963.26   0.000
Error         255    966.3      3.8
Total         256   4616.3
Example 3, continued …
Can height be predicted using shoe size?
Reading the Minitab output:
• Coef (50.5230 and 1.87241): least squares estimated coefficients.
• Total “Degrees of Freedom” (256) = Number of cases – 1.
Example 3, continued …
Can height be predicted using shoe size?
Reading the Minitab output:
R-Sq = SSR / TSS = 3650.0 / 4616.3 = 79.1%
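The R-Sq figure can be reproduced from the sums of squares in the Analysis of Variance table. The sketch below uses only the numbers printed in the Minitab output. The adjusted R-Sq formula shown is the usual textbook definition and is assumed here to be what Minitab reports.

```python
# Values copied from the Analysis of Variance table.
ssr = 3650.0   # Regression sum of squares
sse = 966.3    # Error sum of squares
tss = 4616.3   # Total sum of squares
df_error = 255
df_total = 256

# R-Sq: proportion of total variation explained by the regression.
r_sq = ssr / tss

# Adjusted R-Sq (standard definition, assumed to match Minitab's).
r_sq_adj = 1 - (sse / df_error) / (tss / df_total)

print(f"R-Sq      = {100 * r_sq:.1f}%")
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```

Because SSR + SSE = TSS, the same R-Sq is obtained as 1 – SSE / TSS.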
Example 3, continued …
Can height be predicted using shoe size?
Reading the Minitab output:
• S = 1.947: Standard Error of Regression. Measure of variation around the regression line.
  S = √MSE = √3.8
• SS for Error (966.3): Sum of squared residuals.
• MS for Error (3.8): Mean Squared Error, MSE.
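The standard error of regression can be recovered from the Error row of the Analysis of Variance table. This sketch uses only the printed values; small differences from Minitab's S = 1.947 come from the rounding of the printed sums of squares.

```python
import math

# Values copied from the Error row of the Analysis of Variance table.
sse = 966.3       # sum of squared residuals
df_error = 255    # error degrees of freedom

mse = sse / df_error   # Mean Squared Error
s = math.sqrt(mse)     # Standard Error of Regression

print(f"MSE = {mse:.2f}")
print(f"S   = {s:.3f}")
```

Note that taking the square root of the rounded value 3.8 gives a slightly larger answer than Minitab's S, which is computed from the unrounded MSE.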
Example 3, continued …
Can height be predicted using shoe size?
Are there any problems visible in this plot? ___________
No “Jitter” added.
Example 3, continued …
Can height be predicted using shoe size?
Least squares regression equation:
Height = 50.52 + 1.872 Shoe
r-square = 79.1%, Std. error = 1.947 inches
(The two summary measures that should always be given with the equation.)
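Using the equation for prediction is a one-line calculation. The sketch below evaluates it for a few shoe sizes chosen for illustration; the function name `predict_height` is hypothetical.

```python
# Coefficients and standard error from the least squares regression equation above.
INTERCEPT = 50.52
SLOPE = 1.872
STD_ERROR = 1.947  # inches

def predict_height(shoe_size):
    """Estimated mean height (inches) for a given shoe size."""
    return INTERCEPT + SLOPE * shoe_size

for size in (7, 10, 13):  # illustrative shoe sizes
    print(f"shoe size {size}: predicted mean height = {predict_height(size):.2f} in. "
          f"(std. error of regression {STD_ERROR} in.)")
```

Reporting the standard error next to each prediction is a reminder of how much individual heights vary around the line.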
Example 3, continued …
Can height be predicted using shoe size?
Method 2: Stat > Regression > Fitted Line Plot …
Y = a + bX
This program gives a scatterplot with the regression line superimposed on it.
Example 3, continued …
Can height be predicted using shoe size?
Regression Plot
Height = 50.5230 + 1.87241 Shoe Size
S = 1.94659    R-Sq = 79.1 %    R-Sq(adj) = 79.0 %
[Figure: fitted line plot of Height versus Shoe Size.]
The fit looks
Example 3, continued …
Can height be predicted using shoe size?
What information do these values provide?

Regression Analysis: Height versus Shoe Size

The regression equation is
Height = 50.5 + 1.87 Shoe Size

Predictor       Coef    SE Coef       T       P
Constant     50.5230     0.5912   85.45   0.000
Shoe Siz     1.87241    0.06033   31.04   0.000

S = 1.947    R-Sq = 79.1%    R-Sq(adj) = 79.0%

Analysis of Variance
Source         DF       SS       MS        F       P
Regression      1   3650.0   3650.0   963.26   0.000
Error         255    966.3      3.8
Total         256   4616.3
How do you determine if the X-variable is a useful predictor?
Use the “t-statistic” or the F-stat.
“t” measures how many standard errors the estimated coefficient is from “zero.”
“F” = t² for simple regression.
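Both statistics can be recomputed from the printed Minitab output for Example 3. The sketch below uses the rounded values shown in the output, so the results agree with Minitab's T = 31.04 and F = 963.26 only up to rounding.

```python
# Values copied from the Minitab output for the Shoe Size predictor.
coef = 1.87241      # estimated coefficient
se_coef = 0.06033   # standard error of the coefficient

# t = number of standard errors the estimate is from zero.
t = coef / se_coef

# For simple regression, F equals t squared.
f_from_t = t ** 2

# F can also be computed from the Analysis of Variance table as MSR / MSE.
msr = 3650.0 / 1
mse = 966.3 / 255
f_from_anova = msr / mse

print(f"t            = {t:.2f}")
print(f"t squared    = {f_from_t:.2f}")
print(f"F = MSR/MSE  = {f_from_anova:.2f}")
```

The two routes to F agree closely, which illustrates why the t-test and the F-test give the same conclusion when there is a single X-variable.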
How do you determine if the X-variable is a useful predictor?
A “P-value” is associated with “t” and “F”.
The further “t” is from zero (in either direction), and the larger “F” is, the smaller the corresponding P-value will be.
P-value: a measure of the “likelihood that the true coefficient IS ZERO”; more precisely, the probability of getting a “t” at least this far from zero if the true coefficient really were zero.
If the P-value IS SMALL (typically “< 0.10”), then conclude:
1. It is unlikely that the true coefficient is really zero, and therefore,
2. The X variable IS a useful predictor for the Y variable. Keep the variable!

If the P-value is NOT SMALL (i.e., “> 0.10”), then conclude:
1. For all practical purposes the true coefficient MAY BE ZERO; therefore
2. The X variable IS NOT a useful predictor of the Y variable. Don’t use it.
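The decision rule can be sketched in code. One caveat: Minitab computes the P-value from a t distribution, while the sketch below substitutes a normal approximation so that it needs only the standard library. That approximation is an assumption of this example and is reasonable only when the error degrees of freedom are large. The function names are hypothetical.

```python
import math

def two_sided_p_normal(t):
    """Approximate two-sided P-value for a t-statistic (normal approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))

def is_useful_predictor(p_value, cutoff=0.10):
    """Apply the rule above: a small P-value means keep the X-variable."""
    return p_value < cutoff

# Illustrative t-statistics; 31.04 is the value a Shoe Size coefficient of
# 1.87241 with standard error 0.06033 would give.
for t in (0.5, 1.96, 31.04):
    p = two_sided_p_normal(t)
    verdict = "useful predictor" if is_useful_predictor(p) else "not a useful predictor"
    print(f"t = {t:>5}: approximate P-value = {p:.4f} -> {verdict}")
```

The cutoff of 0.10 is the one quoted above; other courses and fields commonly use 0.05.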
Example 3, continued …
Can height be predicted using shoe size?
Could “shoe size” have a true coefficient that is actually = “zero”?

Predictor       Coef    SE Coef       T       P
Constant     50.5230     0.5912   85.45   0.000
Shoe Siz     1.87241    0.06033   31.04   0.000

“t” measures how many standard errors the estimated coefficient is from “zero.”
P-value: a measure of the likelihood that the true coefficient is “zero.”
The P-value for Shoe Size IS SMALL (< 0.10).
Conclusion:
The “shoe size” coefficient is NOT zero!
“Shoe size” IS a useful predictor of the mean of “height”.
The logic just explained is statistical inference.
This will be covered in more detail during the last three weeks of the course.