M23- Residuals & Minitab Handout

ST 260, M23
Residuals & Minitab
Lesson Objectives
• Continue to build on regression analysis.
• Learn how residual plots help identify problems with the analysis.
A continuation of
regression analysis
Example 1: continued …
Sample of n = 5 students,
Y = Weight in pounds,
X = Height in inches.

Case    X     Y
1      73   175
2      68   158
3      67   140
4      72   207
5      62   115

Prediction equation:
Ŵt = –332.73 + 7.189 Ht
r-square = ?    Std. error = ?    (To be found later.)
© Department of ISM, University of Alabama, 1992-2003
Example 1, continued
[Scatterplot: WEIGHT (100 to 220) versus HEIGHT (60 to 76) for the five students, with the fitted line Ŷ = –332.7 + 7.189X.]
Residuals = distance from point to line, measured parallel to Y-axis.
Example 1, continued
Calculation: For each case,
residual = observed value – estimated mean
For the ith case, ei = yi – ŷi
Compute the fitted value and residual for the 4th person in the sample; i.e., X = 72 inches, Y = 207 lbs.
fitted value = ŷ4 = –332.73 + 7.189( ___ ) = _________
residual = e4 = y4 – ŷ4 = _________ = __________
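The fill-in calculation for the 4th case can be checked with a few lines of Python. This is only a sketch that plugs the rounded coefficients printed on the slide into the prediction equation; the variable names are made up for illustration.

```python
# Rounded coefficients from the prediction equation on the slide.
intercept = -332.73
slope = 7.189

# 4th person in the sample.
height = 72    # X, inches
weight = 207   # Y, pounds

fitted = intercept + slope * height   # estimated mean weight at this height
residual = weight - fitted            # observed value minus estimated mean

print(f"fitted value = {fitted:.2f}")
print(f"residual     = {residual:+.2f}")
```

A positive residual means the observed weight lies above the fitted line.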
Example 1, continued
[Scatterplot: WEIGHT versus HEIGHT with the fitted line Ŷ = –332.7 + 7.189X.]
e4 is the residual for the 4th case, = +22.12.

Residual Plots
Scatterplot of residuals vs. the predicted means of Y, Ŷ; or an X-variable.

Example 1, continued
[Residual plot: Residuals (–24 to 24) versus HEIGHT (60 to 76).]
Regression line from previous plot is rotated to horizontal.
e4 = +22.12.

Residual Plot
Scatterplot of residuals versus the predicted means of Y, Ŷ; or an X-variable, or Time.
Expect random dispersion around a horizontal line at zero.
Problems occur if:
• Unusual patterns
• Unusual cases
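The points of a residual plot are just (fitted value, residual) pairs, one per case. A minimal sketch for the five students in Example 1, again assuming the rounded slide equation (the helper name `fitted_weight` is hypothetical):

```python
# Heights (X) and weights (Y) for the five students in Example 1.
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]

def fitted_weight(height):
    """Estimated mean weight from the rounded slide equation."""
    return -332.73 + 7.189 * height

# One (fitted value, residual) pair per case: the points of a residual plot.
plot_points = [(fitted_weight(x), y - fitted_weight(x))
               for x, y in zip(heights, weights)]

for fitted, resid in sorted(plot_points):
    side = "above" if resid > 0 else "below"
    print(f"fitted = {fitted:7.2f}   residual = {resid:+7.2f}   ({side} the zero line)")
```

Plotting the residual against the fitted value (or against X) gives the picture described on the slide, with the regression line lying flat at zero.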
Residuals versus X
[Residual plot: Residuals versus X, or time.]
Good random pattern

Residuals versus X
[Residual plot: Residuals versus X, or time.]
Outliers?
Next step: ________ to determine if a recording error has occurred.
Residuals versus X
[Residual plot: Residuals versus X, or time.]
Nonlinear relationship
Next step: Add a “quadratic term,” or use “______.”

Residuals versus X
[Residual plot: Residuals versus X, or time.]
Variance is increasing
Next step: Stabilize variance by using “________.”
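To see how curvature shows up in residuals, here is a small sketch with made-up data (not from the handout): a straight line is fitted to points that follow y = x² exactly. The helper `fit_line` is a hypothetical name for the usual least squares formulas.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data with a curved (quadratic) relationship: y = x squared.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(f"x = {x}   residual = {e:+.2f}")
```

The residuals are positive at both ends and negative in the middle, the U-shaped pattern that signals a nonlinear relationship.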
Residual Plots help identify
Unusual patterns:
• Possible curvature in the data.
• Variances that are not constant as X changes.
Unusual cases:
• Outliers
• High leverage cases
• Influential cases

Three properties of Residuals
illustrated with some computations.
Property 1.
Y = Weight, X = Height
Ŷ = –332.73 + 7.189 X

 X     Y       Ŷ     e = Y – Ŷ
73   175   192.07    –17.07
68   158   156.12      1.88
67   140      .          .
72   207      .          .
62   115      .          .
              sum:      .01  ← round-off error

Find the sum of the residuals.

Properties of Least Squares Line
1. Residuals always sum to zero. Σ ei = 0.
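Property 1 can be checked numerically. The sketch below refits the line from the five data points with the standard least squares formulas (not shown on the slide, so treat them as an assumption here) and compares the residual sum using unrounded coefficients with the sum of the two-decimal residuals from the slide equation.

```python
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
         / sum((x - x_bar) ** 2 for x in heights))
intercept = y_bar - slope * x_bar

# Residual sum with full-precision coefficients.
exact_sum = sum(y - (intercept + slope * x) for x, y in zip(heights, weights))

# Residual sum with the rounded slide equation and two-decimal residuals.
rounded_sum = sum(round(y - (-332.73 + 7.189 * x), 2)
                  for x, y in zip(heights, weights))

print(f"slope = {slope:.3f}, intercept = {intercept:.1f}")
print(f"sum of residuals, unrounded coefficients: {exact_sum:.6f}")
print(f"sum of residuals, rounded as on the slide: {rounded_sum:.2f}")
```

The small nonzero total from the rounded values is the round-off error noted in the table.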
Property 2.
Y = Weight, X = Height
Ŷ = –332.73 + 7.189 X

 X     Y       Ŷ     e = Y – Ŷ      e²
73   175   192.07    –17.07     291.38
68   158   156.12      1.88       3.53
67   140   148.93     –8.93      79.74
72   207   184.88     22.12     489.29
62   115   112.99      2.01       4.04
              sum:      .01     867.98

Find the sum of squares of the residuals.
Σ ei² = SSE = 867.98 < “SSE for any other line”.

Property 3.
[Scatterplot: WEIGHT versus HEIGHT with the fitted line.]
X̄ = 68.4, Ȳ = 159

Properties of Least Squares Line
1. Residuals always sum to zero.
2. This “least squares” line produces a smaller “Sum of squared residuals” than any other straight line can.
3. Line always passes through the point ( x̄, ȳ ).
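Properties 2 and 3 can be illustrated the same way. This sketch compares the SSE of the least squares line with two arbitrary competitor lines (chosen only for illustration) and evaluates the fitted line at x̄. Because it uses unrounded coefficients, its SSE comes out near 868 rather than exactly the 867.98 obtained from rounded residuals.

```python
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]

def sse(intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(heights, weights))

n = len(heights)
x_bar = sum(heights) / n   # 68.4
y_bar = sum(weights) / n   # 159.0
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
         / sum((x - x_bar) ** 2 for x in heights))
intercept = y_bar - slope * x_bar

best = sse(intercept, slope)
# Two arbitrary competitors, chosen only for illustration.
steeper = sse(intercept - 68.4, slope + 1.0)   # slope + 1, pivoted about x = 68.4
shifted = sse(intercept + 5.0, slope)          # same slope, moved up 5 lb

print(f"SSE, least squares line: {best:.2f}")
print(f"SSE, steeper line:       {steeper:.2f}")
print(f"SSE, shifted line:       {shifted:.2f}")
print(f"line at x-bar: {intercept + slope * x_bar:.2f}  (y-bar = {y_bar:.2f})")
```

Trying other intercepts and slopes in `sse` should never give a value below the least squares one.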
Illustration of unusual cases:
• Outliers
• Leverage
• Influential

[Scatterplot: Y versus X, with one point labeled “outlier”.]
“Unusual point” does not follow pattern.
It’s near the X-mean; the entire line pulled toward it.
[Scatterplot: Y versus X, with one point labeled “outlier”.]
“Unusual point” does not follow pattern. The line is pulled down and twisted slightly.

[Scatterplot: Y versus X, with one point labeled “High leverage”.]
“Unusual point” is far from the X-mean, but still follows the pattern.

[Scatterplot: Y versus X, with one point labeled “leverage & outlier, influential”.]
“Unusual point” is far from the X-mean, but does not follow the pattern.
Line really twists!

Definitions:
Outlier:
An unusual y-value relative to the pattern of the other cases. Usually has a large residual.
High Leverage Case:
An extreme X value relative to the other X values.

Definitions: continued
Influential Case
has an unusually large effect on the slope of the least squares line.

Conclusion:
High leverage → potentially influential.
High leverage & Outlier → influential!!
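The three kinds of unusual case can be tried out on a tiny made-up data set (not from the handout). The sketch starts with five points that follow y = 2x + 1 exactly, then adds one extra point of each kind and refits; `fit_line` is a hypothetical helper for the usual least squares formulas.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data that follow y = 2x + 1 exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

base = fit_line(xs, ys)
# Outlier near the X-mean: unusual y at x = 3.
near_mean = fit_line(xs + [3], ys + [17])
# High leverage only: extreme x, but the point follows the pattern.
leverage_only = fit_line(xs + [15], ys + [31])
# High leverage and outlier: extreme x and off the pattern.
leverage_outlier = fit_line(xs + [15], ys + [11])

for label, (a, b) in [("original data", base),
                      ("outlier near X-mean", near_mean),
                      ("high leverage only", leverage_only),
                      ("leverage & outlier", leverage_outlier)]:
    print(f"{label:22s} intercept = {a:6.2f}   slope = {b:5.2f}")
```

In this example the outlier near the X-mean shifts the line without changing its slope, the high leverage point that follows the pattern changes nothing, and the point that is both twists the slope sharply.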
Why do we care about identifying unusual cases?
The least squares regression line is not resistant to unusual cases.

Regression Analysis in Minitab
Lesson Objectives
• Learn two ways to use Minitab to run a regression analysis.
• Learn how to read output from Minitab.

Example 3, continued …
Can height be predicted using shoe size?
Step 1? DTDP
Example 3, continued …
Can height be predicted using shoe size?
Graph > Plot …
[Scatterplot: Height (56 to 84) versus Shoe Size (5 to 15), with Female and Male as separate groups. “Jitter” added in X-direction.]
The scatter for each subpopulation is about the same; i.e., there is “constant variance.”
Example 3, continued …
Method 1
Stat > Regression > Regression …
Y = a + bX
Example 3, continued …
Can height be predicted using shoe size?
Copied from “Session Window.”

Regression Analysis: Height versus Shoe Size

The regression equation is
Height = 50.5 + 1.87 Shoe Size

Predictor      Coef   SE Coef       T       P
Constant    50.5230    0.5912   85.45   0.000
Shoe Siz    1.87241   0.06033   31.04   0.000

S = 1.947    R-Sq = 79.1%    R-Sq(adj) = 79.0%

Analysis of Variance
Source       DF       SS       MS        F       P
Regression    1   3650.0   3650.0   963.26   0.000
Error       255    966.3      3.8
Total       256   4616.3
Example 3, continued …
Can height be predicted using shoe size?
Callouts on the same Minitab output:
Coef (50.5230 and 1.87241): Least squares estimated coefficients.
Total DF (256): Total “Degrees of Freedom” = Number of cases - 1
Example 3, continued …
Can height be predicted using shoe size?
Callout on the same Minitab output:
R-Sq = SSR / TSS = 3650.0 / 4616.3 = 79.1%
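The R-Sq figure can be reproduced from the Analysis of Variance table. A minimal sketch using the sums of squares as printed (so subject to Minitab's rounding):

```python
ssr = 3650.0   # Regression sum of squares
sse = 966.3    # Error sum of squares
tss = 4616.3   # Total sum of squares

r_sq = ssr / tss

print(f"SSR + SSE = {ssr + sse:.1f}  (Total SS = {tss})")
print(f"R-Sq = {r_sq:.1%}")
```

The check that SSR + SSE equals the Total SS is a quick way to confirm the table was read correctly.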
Example 3, continued …
Can height be predicted using shoe size?
Callouts on the same Minitab output:
S = 1.947: Standard Error of Regression. Measure of variation around the regression line.
S = √MSE = √3.8
Error SS (966.3): Sum of squared residuals.
Error MS (3.8): Mean Squared Error, MSE.
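The link between the Error row and S can be verified directly. This sketch assumes the usual definition MSE = SSE / (Error DF), which matches the MS column of the output, and takes the square root:

```python
import math

sse = 966.3       # Error SS: sum of squared residuals
df_error = 255    # Error degrees of freedom

mse = sse / df_error   # Mean Squared Error
s = math.sqrt(mse)     # Standard Error of Regression

print(f"MSE = {mse:.1f}")
print(f"S   = {s:.3f}")
```

Using the unrounded MSE rather than the printed 3.8 is what recovers S = 1.947 to three decimals.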
Example 3, continued …
Can height be predicted using shoe size?
Residuals Versus Shoe Siz
(response is Height)
[Residual plot: Residual (–5 to 5) versus Shoe Siz (5 to 15). No “Jitter” added.]
Are there any problems visible in this plot? ___________
Example 3, continued …
Can height be predicted using shoe size?
Least squares regression equation:
Height = 50.52 + 1.872 Shoe
r-square = 79.1%, Std. error = 1.947 inches
The two summary measures that should always be given with the equation.
Example 3, continued …
Can height be predicted using shoe size?
Method 2
Stat > Regression > Fitted Line Plot …
Y = a + bX
This program gives a scatterplot with the regression superimposed on it.
Example 3, continued …
Can height be predicted using shoe size?
Regression Plot
Height = 50.5230 + 1.87241 Shoe Size
S = 1.94659    R-Sq = 79.1 %    R-Sq(adj) = 79.0 %
[Fitted line plot: Height (60 to 80) versus Shoe Size (5 to 15).]
The fit looks
Example 3, continued …
Can height be predicted using shoe size?
What information do these values provide?

Regression Analysis: Height versus Shoe Size

The regression equation is
Height = 50.5 + 1.87 Shoe Size

Predictor      Coef   SE Coef       T       P
Constant    50.5230    0.5912   85.45   0.000
Shoe Siz    1.87241   0.06033   31.04   0.000

S = 1.947    R-Sq = 79.1%    R-Sq(adj) = 79.0%

Analysis of Variance
Source       DF       SS       MS        F       P
Regression    1   3650.0   3650.0   963.26   0.000
Error       255    966.3      3.8
Total       256   4616.3
How do you determine if the X-variable is a useful predictor? (1)
Use the “t-statistic” or the F-stat.
“t” measures how many standard errors the estimated coefficient is from “zero.”
“F” = t² for simple regression.
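Both statements can be checked against the shoe-size output. The sketch below assumes the standard relationships t = Coef / SE Coef and F = MS Regression / MS Error, using the values printed by Minitab:

```python
coef = 1.87241      # Shoe Siz coefficient
se_coef = 0.06033   # its standard error

t = coef / se_coef   # standard errors between the estimate and zero
f_from_t = t ** 2    # F for simple regression

# F from the Analysis of Variance table: MS Regression / MS Error.
f_from_anova = 3650.0 / (966.3 / 255)

print(f"t = {t:.2f}")
print(f"t squared = {f_from_t:.1f}")
print(f"F from ANOVA table = {f_from_anova:.1f}")
```

The two F values agree with each other and with the printed F = 963.26 up to rounding in the printed inputs.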
How do you determine if the X-variable is a useful predictor? (2)
A “P-value” is associated with “t” and “F”.
The further “t” is from zero, in either direction (or the larger “F” is), the smaller the corresponding P-value will be.
P-value: a measure of how likely a “t” or “F” this far from zero would be if the true coefficient WERE ZERO.
(3) If the P-value IS SMALL (typically “< 0.10”), then conclude:
1. It is unlikely that the true coefficient is really zero, and therefore,
2. The X variable IS a useful predictor for the Y variable. Keep the variable!
If the P-value is NOT SMALL (i.e., “> 0.10”), then conclude:
1. For all practical purposes the true coefficient MAY BE ZERO; therefore
2. The X variable IS NOT a useful predictor of the Y variable. Don’t use it.
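The decision rule above is simple enough to write as a function. This is only an illustrative sketch: the function name and return strings are made up, and the 0.10 cutoff is the handout's “typical” value, not a fixed standard.

```python
def conclusion(p_value, cutoff=0.10):
    """Apply the handout's rule of thumb to a coefficient's P-value."""
    if p_value < cutoff:
        return "useful predictor: keep the variable"
    return "not a useful predictor: don't use it"

# P-value reported by Minitab for Shoe Siz.
print(conclusion(0.000))
```

A larger P-value, such as a hypothetical 0.35, would fall on the other side of the cutoff.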
Example 3, continued …
Can height be predicted using shoe size?
Could “shoe size” have a true coefficient that is actually “zero”?
“t” measures how many standard errors the estimated coefficient is from “zero.”
Shoe Siz: Coef = 1.87241, SE Coef = 0.06033, T = 31.04, P = 0.000
P-value: a measure of how likely a “t” this far from zero would be if the true coefficient were “zero.”
The P-value for Shoe Size IS SMALL (< 0.10).
Conclusion:
The “shoe size” coefficient is NOT zero!
“Shoe size” IS a useful predictor of the mean of “height”.
The logic just explained is statistical inference.
This will be covered in more detail during the last three weeks of the course.