Stat 301 Regression Diagnostics

Overview
In this lab you will use JMP to calculate and evaluate residuals, leverage, and influence in regression.
Warm-up Exercise
To save residuals, leverage (hat) values h, Cook’s D, and studentized residuals, use JMP’s Fit Model platform. We demonstrate Fit Model for simple linear regression, but it also works for multiple regression.
In class we introduced an example involving the brain weight (g) and gestation period (days) for a random sample of 50 mammals. The response variable, Y, is Gestation (Days). The explanatory variable, X, is BrainWgt (g). The data are available on the course web page:
www.public.iastate.edu/~wrstephe/stat301.html
Go to the course web page and open the data for the mammal example in JMP.
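As a cross-check outside JMP, the steps below can be mimicked in Python. Three quantities are involved: the least-squares fit behind Fit Model (step 1), the saved residuals (step 2), and the RMSE used in step 3. This is a minimal sketch for simple linear regression only. The function name fit_simple_regression and the five data points are made up for illustration; they are not the mammal data.

```python
import math

def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, residuals, rmse)."""
    n = len(x)
    k = 1  # one explanatory variable
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual = observed minus fitted value
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    # RMSE = sqrt(SSE / (n - k - 1)), the root mean square error
    rmse = math.sqrt(sse / (n - k - 1))
    return b0, b1, residuals, rmse

# Made-up illustration data (not the mammal data set)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1, residuals, rmse = fit_simple_regression(x, y)
print(b0, b1, rmse)
```

The divisor n - k - 1 is the error degrees of freedom. The same quantity appears in the F and t formulas of steps 6 and 9.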
1. In JMP, go to Analyze – Fit Model. Cast Gestation (Days) into the role of Y. Add BrainWgt (g) to the Construct Model Effects box. Be sure to change the Emphasis to Minimal Report. Click Run. The output includes a plot of the data; you may wish to rescale the horizontal and vertical axes. You can also hide the parts of the output that you do not need.
2. From the red triangle pull-down menu next to Response Gestation (Days), select Save Columns: Residuals. This creates a new column in your JMP data table called Residual Gestation (Days).
3. Add a new column to your JMP data table and give it the Column Name Std Resid, z. Click on the Std Resid, z column and select Cols – Formula. Enter the formula: Residual Gestation (Days) divided by the RMSE found in your Fit Model output. You can Round the final answer. You can use Analyze – Distribution to look at the distribution of the standardized residuals.
4. Add a new column to your JMP data table and give it the Column Name P-value z. Enter the formula: (1 - Normal Distribution(|Std Resid, z|)) * 2. You can Round the probability.
5. From the red triangle pull-down menu next to Response Gestation (Days), select Save Columns: Hats. This creates a new column in your JMP data table called h Gestation (Days); these are the leverage values. You can use Analyze – Distribution to look at the distribution of the leverage values and include a cutoff value of $2(k+1)/n$ (0.08 for the mammal example, where n = 50 and k = 1).
6. To evaluate the statistical significance of leverage, you need JMP to calculate an F statistic and P-value for each data point. Add two columns to the JMP data table. Name one column F and the second column P-value F. You will need to enter formulas for F and P-value F. The formulas given below are general formulas; you will have to put in the values of n and k that are appropriate for your regression model.
For the column named F, enter the formula:
$$F = \frac{\left(h - \frac{1}{n}\right)/k}{(1 - h)/(n - k - 1)}$$
where h is the value in the h Gestation (Days) column.
For the column named P-value F, enter the formula:
1 - F Distribution(F, k, n - k - 1)
7. From the red triangle pull-down menu next to Response Gestation (Days), select Save Columns: Cook’s D Influence. This creates a new column in your JMP data table called Cook’s D Influence. You can use Analyze – Distribution to look at the distribution of the Cook’s D values and include a cutoff value of 1.
8. From the red triangle pull-down menu next to Response Gestation (Days), select Save Columns: Studentized Residuals. This creates a new column in your JMP data table called Studentized Resid Gestation (Days). You can use Analyze – Distribution to look at the distribution of Studentized Resid Gestation (Days).
9. Add a new column to your JMP data table and name it P-value Studentized Resid. Enter the formula: (1 - t Distribution(|Studentized Resid Gestation (Days)|, n - k - 1)) * 2. You can Round the probability. Remember that you have to put the actual numerical value of n - k - 1 into the formula.
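The formula in step 3 is a single division. Below is a minimal Python sketch, assuming you already have the residuals and the RMSE from the fit. The helper name standardized_residuals and the input values are hypothetical. Rounding to three decimals mirrors the output tables.

```python
def standardized_residuals(residuals, rmse, digits=3):
    """Step 3: divide each residual by the RMSE, then round (like JMP's Round)."""
    return [round(e / rmse, digits) for e in residuals]

# Hypothetical residuals and RMSE, for illustration only
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
rmse = 0.8 ** 0.5
z = standardized_residuals(residuals, rmse)
print(z)
```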
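Step 4’s two-sided P-value can be sketched with the Python standard library’s statistics.NormalDist. Here it plays the role of JMP’s Normal Distribution function (the standard normal CDF). The three z values are copied from the Standardized Residuals table in the output section. The helper name two_sided_normal_p is hypothetical.

```python
from statistics import NormalDist

def two_sided_normal_p(z):
    """Step 4: (1 - Phi(|z|)) * 2, the two-sided standard normal tail area."""
    return (1.0 - NormalDist().cdf(abs(z))) * 2.0

# Standardized residuals copied from the output tables
for species, z in [("Brazilian Tapir", 3.010), ("Man", -2.516), ("Okapi", 2.443)]:
    print(species, round(two_sided_normal_p(z), 4))
```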
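For simple linear regression, the hat values saved in step 5 have a closed form: h_i = 1/n + (x_i - x_bar)^2 / Sxx. In multiple regression they are the diagonal elements of X(X'X)^(-1)X', which this sketch does not cover. The function names and x values below are hypothetical.

```python
def leverages_simple(x):
    """Hat values for simple linear regression: h_i = 1/n + (x_i - x_bar)^2 / Sxx."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

def leverage_cutoff(n, k):
    """Rule-of-thumb leverage cutoff 2(k + 1)/n from step 5."""
    return 2.0 * (k + 1) / n

# Made-up x values, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
h = leverages_simple(x)
print(h)                       # hat values sum to k + 1 (here 2)
print(leverage_cutoff(50, 1))  # mammal example: n = 50, k = 1
```

Points far from x_bar get the largest h, which is why a single extreme BrainWgt can dominate the fit.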
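Step 6 can be reproduced in Python. The F formula is typed straight from the formula in step 6. The standard library has no F distribution, so this sketch computes the upper tail through the regularized incomplete beta function. It uses a standard continued-fraction evaluation, not JMP’s own routine. If SciPy is available, scipy.stats.f.sf(F, k, n - k - 1) should give the same tail area. The helper names are hypothetical. The values n = 50 and k = 1 are the mammal example’s, and the h values come from the leverage table in the output section.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-15, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_upper_tail(f, df1, df2):
    """P(F > f) for F(df1, df2), i.e. 1 - F Distribution(f, df1, df2)."""
    if f <= 0.0:
        return 1.0
    return reg_incomplete_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

def leverage_f(h, n, k):
    """Step 6: F = [(h - 1/n)/k] / [(1 - h)/(n - k - 1)]."""
    return ((h - 1.0 / n) / k) / ((1.0 - h) / (n - k - 1))

n, k = 50, 1  # mammal example: 50 mammals, one explanatory variable
for species, h in [("Man", 0.66120), ("Okapi", 0.08387)]:
    f = leverage_f(h, n, k)
    print(species, round(f, 2), f_upper_tail(f, k, n - k - 1))
```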
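Cook’s D from step 7 can be written in terms of the standardized residual z = e/RMSE and the leverage h: D = z^2/(k + 1) * h/(1 - h)^2. Below is a minimal sketch using the Man row of the output tables (k = 1). The helper name cooks_d is hypothetical.

```python
def cooks_d(z, h, k):
    """Cook's distance from standardized residual z = e/RMSE and leverage h.

    D = z^2 / (k + 1) * h / (1 - h)^2, which equals r^2/(k + 1) * h/(1 - h)
    with r = z / sqrt(1 - h).
    """
    return z ** 2 / (k + 1) * h / (1.0 - h) ** 2

# Man, from the output tables: z = -2.516, h = 0.66120, k = 1
d_man = cooks_d(-2.516, 0.66120, 1)
print(round(d_man, 2), d_man > 1)  # compare with the cutoff of 1
```

The product form shows why Man is so influential: a moderately large residual is multiplied by h/(1 - h)^2, which is close to 6 when h is about 0.66.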
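Step 8’s studentized residuals can be sketched from z and h. This assumes JMP’s Studentized Residuals are internally studentized: r = e/(RMSE * sqrt(1 - h)) = z/sqrt(1 - h). That definition is an assumption here. With the z and h values from the output tables, this gives 3.04, -4.32, and 2.55, which is consistent with the Studentized Resid column. The helper name studentized_residual is hypothetical.

```python
import math

def studentized_residual(z, h):
    """Internally studentized residual: r = e / (RMSE * sqrt(1 - h)) = z / sqrt(1 - h)."""
    return z / math.sqrt(1.0 - h)

# (Std Resid z, h) pairs copied from the output tables
rows = {
    "Brazilian Tapir": (3.010, 0.02166),
    "Man": (-2.516, 0.66120),
    "Okapi": (2.443, 0.08387),
}
for species, (z, h) in rows.items():
    print(species, round(studentized_residual(z, h), 2))
```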
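The two-sided t P-value of step 9 can be sketched with an incomplete-beta approach. For a t variable with df degrees of freedom, P(|T| > t) = I_{df/(df + t^2)}(df/2, 1/2). The helper names are hypothetical, and the continued fraction is a standard numerical method, not JMP’s routine. For the mammal example, n - k - 1 = 50 - 1 - 1 = 48. With SciPy, 2 * scipy.stats.t.sf(abs(t), df) should agree.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-15, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def t_two_sided_p(t, df):
    """Step 9: (1 - t Distribution(|t|, df)) * 2 = I_{df/(df+t^2)}(df/2, 1/2)."""
    return reg_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))

n, k = 50, 1  # mammal example
print(t_two_sided_p(-4.32, n - k - 1))
```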
JMP Output for Regression Diagnostics
Standardized Residuals
Five Number Summary
100.0%   maximum     3.01
75.0%    quartile    0.4315
50.0%    median     -0.366
25.0%    quartile   -0.6808
0.0%     minimum    -2.516
Standardized Residuals
Species           BrainWgt (g)   Gestation (Days)   Residual Gestation (Days)   Std Resid, z   P-value z
Brazilian Tapir   169            392                 256.074                     3.010         0.0026
Man               1320           267                -214.073                    -2.516         0.0119
Okapi             490            440                 207.817                     2.443         0.0146
Leverage, h
Species   BrainWgt (g)   Gestation (Days)   h Gestation (Days)   F       P-value F
Man       1320           267                0.661                90.84   1.190e-12
Okapi     490            440                0.084                3.35    0.07357
Cook’s D Influence
Five Number Summary
100.0%   maximum    18.2338
75.0%    quartile   0.00903
50.0%    median     0.00445
25.0%    quartile   0.00178
0.0%     minimum    1.74e-6
Cook’s D
Species   BrainWgt (g)   Gestation (Days)   Std Resid, z   h Gestation (Days)   Cook’s D
Man       1320           267                -2.516         0.661                18.23
Studentized Residual
Species           BrainWgt (g)   Gestation (Days)   Std Resid, z   h Gestation (Days)   Studentized Resid   P-value
Brazilian Tapir   169            392                 3.010         0.02166               3.04               0.0019
Man               1320           267                -2.516         0.66120              -4.32               0.0000
Okapi             490            440                 2.443         0.08387               2.55               0.0070