Stat 301 Regression Diagnostics

Overview
In this lab you will be introduced to using JMP to calculate and evaluate residuals, leverage, and influence in regression.

Warm-up Exercise
To save residuals, leverage (hat) values h, Cook’s D, and studentized residuals, use JMP’s Fit Model platform. We will demonstrate Fit Model for simple linear regression, but it also works for multiple regression.

In class we introduced an example involving brain weight (g) and gestation (days) for a random sample of 50 mammals. The response variable, Y, is Gestation (Days). The explanatory variable is BrainWgt (g). The data are available on the course web page: www.public.iastate.edu/~wrstephe/stat301.html

Go to the course web page and open the data for the mammal example in JMP.

1. In JMP, go to Analyze – Fit Model. Cast Gestation (Days) into the role of Y. Add BrainWgt (g) to the Construct Model Effects box. Be sure to change the Emphasis to Minimal Report. Click Run. The output includes a plot of the data; you may wish to rescale the horizontal and vertical axes. You can also hide the parts of the output that you do not need.

2. Click the red triangle next to Response Gestation (Days) and select Save Columns: Residuals. This creates a new column in your JMP data table called Residual Gestation (Days).

3. Add a new column to your JMP data table and give it the Column Name Std Resid, z. Click on the Std Resid, z column and select Cols – Formula. Enter the formula: Residual Gestation (Days) divided by the RMSE (root mean square error) found in your Fit Model output. You can round the result. You can use Analyze – Distribution to look at the distribution of the standardized residuals.

4. Add a new column to your JMP data table and give it the Column Name P-value z. Enter the formula: (1 – Normal Distribution( |Std Resid, z| )) * 2. This is the two-sided P-value for z. You can round the probability.

5. Click the red triangle next to Response Gestation (Days).
Select Save Columns: Hats. This creates a new column in your JMP data table called h Gestation (Days). These are the leverage values. You can use Analyze – Distribution to look at the distribution of the leverage values and include a cutoff value of 2(k + 1)/n, where n is the number of observations and k is the number of explanatory variables. For the mammal data, n = 50 and k = 1, so the cutoff is 2(2)/50 = 0.08.

6. To evaluate the statistical significance of leverage, you need JMP to calculate an F statistic and P-value for each data point. Add two columns to the JMP data table. Name one column F and the second column P-value F. The formulas below are general formulas; you will have to put in the values of n and k that are appropriate for your regression model.

For the column named F, enter the formula:

((h Gestation (Days) – 1/n) / k) / ((1 – h Gestation (Days)) / (n – k – 1))

For the column named P-value F, enter the formula: 1 – F Distribution( F, k, n – k – 1 )

7. Click the red triangle next to Response Gestation (Days) and select Save Columns: Cook’s D Influence. This creates a new column in your JMP data table called Cook’s D Influence. You can use Analyze – Distribution to look at the distribution of the Cook’s D values and include a cutoff value of 1.

8. Click the red triangle next to Response Gestation (Days) and select Save Columns: Studentized Residuals. This creates a new column in your JMP data table called Studentized Resid Gestation (Days). You can use Analyze – Distribution to look at the distribution of Studentized Resid Gestation (Days).

9. Add a new column to your JMP data table and name it P-value Studentized Resid. Enter the formula: (1 – t Distribution( |Studentized Resid Gestation (Days)|, n – k – 1 )) * 2. You can round the probability. Remember that you have to put the actual numerical value of n – k – 1 into the formula (48 for the mammal data).
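The calculations in steps 3 to 9 can be cross-checked outside JMP. The Python sketch below is a minimal, hypothetical helper for simple linear regression (k = 1) that uses only the standard library; all function names are invented for illustration and are not part of JMP. It computes hat values as 1/n + (x – x̄)²/Sxx, the standardized residual as residual/RMSE, and the leverage F statistic from step 6. As assumptions not stated in the handout, it takes the studentized residual to be z/sqrt(1 – h) and Cook’s D to be r²/(k + 1) · h/(1 – h). The t and F tail areas come from the regularized incomplete beta function, evaluated by a continued fraction.

```python
import math

# Tail probabilities (standard library only)

def normal_two_sided_p(z):
    """(1 - Normal Distribution(|z|)) * 2, the step 4 formula."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    result = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a + m2 - 1.0) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            result *= d * c
        if abs(d * c - 1.0) < eps:  # last (odd-step) factor is ~1: converged
            break
    return result


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """(1 - t Distribution(|t|, df)) * 2, the step 9 formula."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def f_upper_p(f, df1, df2):
    """1 - F Distribution(f, df1, df2), the step 6 formula."""
    return _betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def simple_regression_diagnostics(x, y):
    """Fit y = b0 + b1*x (so k = 1) and compute the columns from steps 2-9."""
    n, k = len(x), 1
    df_err = n - k - 1
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    rmse = math.sqrt(sum(e * e for e in resid) / df_err)
    rows = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx          # hat value (step 5)
        z = e / rmse                                    # Std Resid, z (step 3)
        f = ((h - 1.0 / n) / k) / ((1.0 - h) / df_err)  # leverage F (step 6)
        r = z / math.sqrt(1.0 - h)                      # assumed studentized residual
        rows.append({
            "resid": e, "z": z, "p_z": normal_two_sided_p(z),
            "h": h, "F": f, "p_F": f_upper_p(f, k, df_err),
            "cooks_d": r * r / (k + 1) * h / (1.0 - h),     # Cook's D (step 7)
            "stud": r, "p_stud": t_two_sided_p(r, df_err),  # step 9
        })
    return rows, 2.0 * (k + 1) / n  # also return the leverage cutoff 2(k + 1)/n


# Hypothetical toy data, not the mammal data set.
x = [1, 2, 3, 4, 5, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 30.0]
rows, cutoff = simple_regression_diagnostics(x, y)
for xi, row in zip(x, rows):
    print(xi, round(row["h"], 3), round(row["cooks_d"], 3), row["h"] > cutoff)
```

In the toy data, the point at x = 20 is the only one above the leverage cutoff. As a spot check against the mammal example, t_two_sided_p(3.04, 48) comes out near 0.004. That is about twice the P-value of 0.0019 printed for the Brazilian Tapir in the Studentized Residual output, so those printed values appear to be one-sided tail areas.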
JMP Output for Regression Diagnostics

Standardized Residuals

Five Number Summary
100.0%  maximum   3.01
 75.0%  quartile  0.4315
 50.0%  median    –0.366
 25.0%  quartile  –0.6808
  0.0%  minimum   –2.516

Species          BrainWgt (g)  Gestation (Days)  Residual Gestation (Days)  Std Resid, z  P-value z
Brazilian Tapir  169           392               256.074                    3.010         0.0026
Man              1320          267               –214.073                   –2.516        0.0119
Okapi            490           440               207.817                    2.443         0.0146

Leverage, h

Species  BrainWgt (g)  Gestation (Days)  h Gestation (Days)  F      P-value F
Man      1320          267               0.661               90.84  1.190e-12
Okapi    490           440               0.084               3.35   0.07357

Cook’s D Influence

Five Number Summary
100.0%  maximum   18.2338
 75.0%  quartile  0.00903
 50.0%  median    0.00445
 25.0%  quartile  0.00178
  0.0%  minimum   1.74e-6

Species  BrainWgt (g)  Gestation (Days)  Std Resid, z  h Gestation (Days)  Cook’s D
Man      1320          267               –2.516        0.661               18.23

Studentized Residual

Species          BrainWgt (g)  Gestation (Days)  Std Resid, z  h Gestation (Days)  Studentized Resid  P-value
Brazilian Tapir  169           392               3.010         0.02166             3.04               0.0019
Man              1320          267               –2.516        0.66120             –4.32              0.0000
Okapi            490           440               2.443         0.08387             2.55               0.0070
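As a consistency check on the output, the short Python sketch below recomputes several derived columns from the z and h values printed in the Studentized Residual table, taking n = 50 and k = 1. It applies the step 4 formula for P-value z, the step 6 formula for F, and the leverage cutoff 2(k + 1)/n. Two relations are assumptions not stated in the handout: studentized residual = z/sqrt(1 – h), and Cook’s D = r²/(k + 1) · h/(1 – h). The dictionary names are illustrative only.

```python
import math

N, K = 50, 1                          # 50 mammals, one explanatory variable (BrainWgt)
DF_ERR = N - K - 1                    # error degrees of freedom, 48
LEVERAGE_CUTOFF = 2.0 * (K + 1) / N   # 2(k + 1)/n = 0.08

# (Std Resid, z; h Gestation (Days)) as printed in the Studentized Residual table.
printed = {
    "Brazilian Tapir": (3.010, 0.02166),
    "Man": (-2.516, 0.66120),
    "Okapi": (2.443, 0.08387),
}

recomputed = {}
for species, (z, h) in printed.items():
    stud = z / math.sqrt(1.0 - h)  # assumption: studentized = z / sqrt(1 - h)
    recomputed[species] = {
        "p_z": math.erfc(abs(z) / math.sqrt(2.0)),          # step 4 formula
        "F": ((h - 1.0 / N) / K) / ((1.0 - h) / DF_ERR),    # step 6 formula
        "stud": stud,
        "cooks_d": stud ** 2 / (K + 1) * h / (1.0 - h),     # assumed Cook's D identity
        "high_leverage": h > LEVERAGE_CUTOFF,
    }

for species, vals in recomputed.items():
    shown = {key: (round(v, 4) if isinstance(v, float) else v) for key, v in vals.items()}
    print(species, shown)
```

With these inputs, each recomputed value agrees with the printed output to the precision shown. Only Man and Okapi exceed the leverage cutoff of 0.08, which is consistent with those two species being the ones listed in the Leverage, h table.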