Influential Observations in Regression

Measurements on Heat Production as a Function of Body Mass and Work Effort.
M. Greenwood (1918). "On the Efficiency of Muscular Work," Proc. Roy. Soc. of London, Series B, Vol. 90, #627, pp. 199-214

Data Description

• Study involved Algerians accustomed to heavy labor. Experiment consisted of several hours on a stationary bicycle.
• Dependent (Response) Variable:
  – Heat Production (Calories)
• Independent (Explanatory/Predictor) Variables:
  – Work Effort (Calories)
  – Body Mass (kg)
• Model:
  – $H = \beta_0 + \beta_1 W + \beta_2 M + \varepsilon$

Raw Data (Table III, p.203)

Obs#   M (kg)   W (Cal)   H (Cal)
 1     76.2     156.8     3398
 2     71.3     114.1     2988
 3     69.6     142.6     3048
 4     58.0     142.6     2781
 5     74.6     142.6     2912
 6     68.9     128.3     3135
 7     69.1     142.6     3261
 8     62.1     156.8     3030
 9     68.7     128.3     3139
10     65.4     142.6     2996
11     70.4     128.3     3248
12     69.1     142.6     3117
13     63.7     142.6     2891
14     62.1     114.1     2667
15     73.5     142.6     3403
16     61.3     129.4     2999
17     70.1     137.5     3318
18     79.8     121.3     2989
19     61.3     129.4     3936
20     64.8     137.5     3020
21     60.2     129.7     2812
22     72.4      97.1     2962
23     68.9     129.4     3236
24     70.1     129.4     3214
25     70.8     161.7     3389
26     66.5     129.4     2908
27     66.7     137.5     3063
28     71.2     129.4     2956
29     72.4     145.6     3023
30     69.3     129.4     3001
31     67.4     145.6     2841
32     69.6     161.7     3117
33     66.2     121.3     2733
34     74.5     121.3     2808
35     67.7      97.1     2813
36     57.5      97.1     2615
37     70.4     113.2     2814

Estimated Regression Coefficients

                 Intercept    W          M
Coefficients     1536.5098    6.1563     10.1409
Standard Error    584.4994    2.3659      7.6826
t Stat              2.6288    2.6022      1.3200
P-value             0.0128    0.0136      0.1957
Lower 95%         348.6642    1.3483     -5.4720
Upper 95%        2724.3554   10.9643     25.7538

$\hat{H} = 1536.51 + 6.16W + 10.14M$

Note that we can conclude, controlling for the other factor:
• As Work Effort increases, Heat Production increases (p = .0136)
• There is no significant evidence that Heat Production increases as Body Mass increases (p = .1957)

Plot of Residuals versus Fitted Values

[Figure: residuals (vertical axis, -400 to 1200) versus fitted values (horizontal axis, 2500 to 3400). One point is annotated "Huge, Positive, Residual".]

Influential Measures (I)

Note: n = 37, p* = 3 parameters

• Standardized and Studentized Residuals: Outlier if
  $|r_i| > t_{.05/(2(37)),\,37-3} = 3.74$ or $|r_i^*| > t_{.05/(2(37)),\,37-4} = 3.75$
• Leverage (Hat) Values: Potentially influential with respect to X-levels if
  $h_{ii} > \frac{2(3)}{37} = 0.162$
• DFFITS: Highly influential on OWN fitted value if
  $|DFFITS_i| > 2\sqrt{\frac{3}{37}} = 0.569$
• DFBETAS (one for each regression coefficient): Highly influential on coefficient if
  $|DFBETAS_{j(i)}| > \frac{2}{\sqrt{37}} = 0.329$
• Cook's D: Highly influential on group of coefficients if
  $D_i > F_{.50,3,35} = 0.804$
• Covariance Ratio: Highly influential on Std Errors of Reg Coeffs if
  $COVR_i$ is outside $1 \pm \frac{3(3)}{37} = (0.76, 1.24)$

Standardized / Studentized Residuals

Obs#   e(i)       r(i)      r*(i)
 1     123.44     0.5856    0.5799
 2      26.01     0.1184    0.1167
 3     -72.21    -0.3222   -0.3179
 4    -221.58    -1.0582   -1.0601
 5    -258.91    -1.1808   -1.1879
 6     109.92     0.4880    0.4824
 7     145.86     0.6505    0.6448
 8    -101.57    -0.4796   -0.4741
 9     115.95     0.5146    0.5090
10     -81.62    -0.3659   -0.3612
11     207.71     0.9247    0.9226
12       1.86     0.0083    0.0082
13    -169.38    -0.7654   -0.7607
14    -201.70    -0.9280   -0.9260
15     243.24     1.1010    1.1045
16      44.22     0.2016    0.1987
17     224.12     0.9968    0.9967
18    -103.52    -0.5067   -0.5011
19     981.22     4.4726    6.8679
20     -20.14    -0.0900   -0.0887
21    -133.47    -0.6145   -0.6087
22      93.51     0.4546    0.4492
23     204.15     0.9059    0.9034
24     169.98     0.7557    0.7509
25     139.03     0.6487    0.6431
26     -99.51    -0.4420   -0.4367
27       3.60     0.0160    0.0158
28     -99.17    -0.4424   -0.4371
29    -144.07    -0.6506   -0.6449
30     -34.90    -0.1549   -0.1527
31    -275.37    -1.2335   -1.2433
32    -120.80    -0.5626   -0.5568
33    -221.60    -0.9905   -0.9903
34    -230.77    -1.0581   -1.0600
35      -7.83    -0.0373   -0.0368
36    -102.39    -0.5215   -0.5158
37    -133.33    -0.6062   -0.6005

Influential Measures (II)

Obs#   df.b0     df.b(M)   df.b(W)   df.fit    cov.r    cook.d     hat      inf
 1    -0.2080    0.1545    0.1423    0.2440   1.2488   2.02E-02   0.1505
 2     0.0008    0.0151   -0.0243    0.0339   1.1846   3.95E-04   0.0779
 3     0.0252   -0.0124   -0.0327   -0.0645   1.1283   1.42E-03   0.0395
 4    -0.2907    0.4070   -0.1609   -0.4655   1.1799   7.20E-02   0.1616
 5     0.2717   -0.2554   -0.1046   -0.3516   1.0491   4.07E-02   0.0806
 6     0.0042    0.0143   -0.0219    0.0843   1.1036   2.43E-03   0.0296
 7    -0.0418    0.0141    0.0674    0.1291   1.0956   5.65E-03   0.0385
 8    -0.0353    0.1168   -0.1394   -0.1931   1.2493   1.27E-02   0.1422
 9     0.0073    0.0116   -0.0228    0.0884   1.1006   2.67E-03   0.0293
10    -0.0153    0.0381   -0.0425   -0.0817   1.1361   2.28E-03   0.0487
11    -0.0319    0.0747   -0.0467    0.1760   1.0501   1.04E-02   0.0351
12    -0.0005    0.0002    0.0009    0.0016   1.1375   9.19E-07   0.0385
13    -0.0704    0.1258   -0.0945   -0.1984   1.1087   1.33E-02   0.0637
14    -0.2602    0.1802    0.1650   -0.3030   1.1211   3.07E-02   0.0967
15    -0.2151    0.1934    0.1007    0.2953   1.0509   2.89E-02   0.0667
16     0.0453   -0.0471   -0.0017    0.0585   1.1841   1.17E-03   0.0797
17    -0.0691    0.0609    0.0471    0.1853   1.0352   1.14E-02   0.0334
18     0.1503   -0.2257    0.0858   -0.2521   1.3397   2.17E-02   0.2020   *
19     1.5659   -1.6273   -0.0604    2.0203   0.0829   5.77E-01   0.0797   *

Influential Measures (III)

Obs#   df.b0     df.b(M)   df.b(W)   df.fit    cov.r    cook.d     hat      inf
20    -0.0074    0.0107   -0.0058   -0.0190   1.1429   1.24E-04   0.0437
21    -0.1593    0.1696    0.0011   -0.2004   1.1723   1.36E-02   0.0978
22     0.0270    0.0890   -0.1893    0.2181   1.3271   1.62E-02   0.1908
23     0.0031    0.0257   -0.0306    0.1555   1.0465   8.10E-03   0.0288
24    -0.0234    0.0521   -0.0285    0.1379   1.0746   6.42E-03   0.0326
25    -0.1374    0.0406    0.2021    0.2392   1.1994   1.94E-02   0.1216
26    -0.0317    0.0233    0.0113   -0.0778   1.109    2.07E-03   0.0307
27     0.0005   -0.0009    0.0009    0.0029   1.1307   2.88E-06   0.0327
28     0.0275   -0.0469    0.0183   -0.0881   1.1186   2.65E-03   0.0391
29     0.1138   -0.0860   -0.0817   -0.1661   1.1232   9.36E-03   0.0622
30     0.0012   -0.0064    0.0054   -0.0267   1.1248   2.45E-04   0.0297
31     0.0372    0.0494   -0.1772   -0.2762   1.0004   2.50E-02   0.0470
32     0.0986   -0.0112   -0.1770   -0.2041   1.2063   1.42E-02   0.1184
33    -0.1189    0.0552    0.1097   -0.2100   1.0468   1.47E-02   0.0430
34     0.1308   -0.2493    0.1507   -0.3341   1.0875   3.71E-02   0.0904
35    -0.0075   -0.0008    0.0146   -0.0160   1.301    8.82E-05   0.1595   *
36    -0.2862    0.1937    0.1984   -0.3081   1.4485   3.23E-02   0.2629   *
37    -0.0225   -0.0592    0.1286   -0.1711   1.1445   9.94E-03   0.0751   *

Diagnosing Influential Observations

• Clearly, Observation #19 exerts a huge influence, even though it has a small hat (leverage) value, so it must lie near the center of the Mass/Work observations.
• Comparing against the author's original calculations provided in the paper, the mean and S.D. are much too high for H (but exactly the same for M and W).
• Could the observation have been a "typo"?
• Try replacing H19 = 3936 with H19 = 2936
• Note: Do not do this arbitrarily; check your data sources in practice.

Analysis with Corrected Data Point

                 Intercept    W          M
Coefficients      977.4254    6.2436     17.7777
Standard Error    376.0531    1.5221      4.9428
t Stat              2.5992    4.1019      3.5967
P-value             0.0137    0.0002      0.0010
Lower 95%         213.1935    3.1503      7.7327
Upper 95%        1741.6572    9.3370     27.8227

$\hat{H} = 977.43 + 6.24W + 17.78M$

Note that both factors are now significant, and that the intercept and body mass coefficients have changed drastically.

Plot of Residuals versus Predicted Values

[Figure: residuals (vertical axis, -300 to 300) versus predicted values (horizontal axis, 2500 to 3400).]
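
Sketch: Reproducing the Two Fits and the Rule-of-Thumb Cutoffs

The following is a minimal sketch, not the software used for the slides. It refits the model on the raw data with H19 = 3936 and again with the trial value H19 = 2936, and recomputes the cutoffs that depend only on n = 37 and p* = 3. The function names fit_ols and cutoffs are illustrative assumptions, and only the Python standard library is used. The t and F critical values need a statistics package and are left out.

```python
import math

# Raw data transcribed from Table III (observations 1-37, in order).
M = [76.2, 71.3, 69.6, 58.0, 74.6, 68.9, 69.1, 62.1, 68.7, 65.4,
     70.4, 69.1, 63.7, 62.1, 73.5, 61.3, 70.1, 79.8, 61.3, 64.8,
     60.2, 72.4, 68.9, 70.1, 70.8, 66.5, 66.7, 71.2, 72.4, 69.3,
     67.4, 69.6, 66.2, 74.5, 67.7, 57.5, 70.4]
W = [156.8, 114.1, 142.6, 142.6, 142.6, 128.3, 142.6, 156.8, 128.3, 142.6,
     128.3, 142.6, 142.6, 114.1, 142.6, 129.4, 137.5, 121.3, 129.4, 137.5,
     129.7, 97.1, 129.4, 129.4, 161.7, 129.4, 137.5, 129.4, 145.6, 129.4,
     145.6, 161.7, 121.3, 121.3, 97.1, 97.1, 113.2]
H = [3398, 2988, 3048, 2781, 2912, 3135, 3261, 3030, 3139, 2996,
     3248, 3117, 2891, 2667, 3403, 2999, 3318, 2989, 3936, 3020,
     2812, 2962, 3236, 3214, 3389, 2908, 3063, 2956, 3023, 3001,
     2841, 3117, 2733, 2808, 2813, 2615, 2814]


def fit_ols(w, m, h):
    """Least-squares fit of h = b0 + b_w*w + b_m*m.

    Hypothetical helper: solves the 2x2 normal equations on centered
    data, then recovers the intercept from the means.
    """
    n = len(h)
    w_bar, m_bar, h_bar = sum(w) / n, sum(m) / n, sum(h) / n
    wc = [x - w_bar for x in w]
    mc = [x - m_bar for x in m]
    hc = [y - h_bar for y in h]
    s_ww = sum(a * a for a in wc)
    s_mm = sum(a * a for a in mc)
    s_wm = sum(a * b for a, b in zip(wc, mc))
    s_wh = sum(a * b for a, b in zip(wc, hc))
    s_mh = sum(a * b for a, b in zip(mc, hc))
    det = s_ww * s_mm - s_wm ** 2
    b_w = (s_mm * s_wh - s_wm * s_mh) / det
    b_m = (s_ww * s_mh - s_wm * s_wh) / det
    b0 = h_bar - b_w * w_bar - b_m * m_bar
    return b0, b_w, b_m


def cutoffs(n, p):
    """Rule-of-thumb cutoffs that depend only on n and p (hypothetical helper)."""
    return {
        "hat": 2 * p / n,                          # leverage
        "dffits": 2 * math.sqrt(p / n),            # |DFFITS|
        "dfbetas": 2 / math.sqrt(n),               # |DFBETAS|
        "covratio": (1 - 3 * p / n, 1 + 3 * p / n),
    }


original = fit_ols(W, M, H)

# Trial correction from the slides: replace H19 = 3936 with 2936.
H_trial = list(H)
H_trial[18] = 2936          # observation 19 is index 18
trial = fit_ols(W, M, H_trial)

limits = cutoffs(len(H), 3)

print("original (b0, b_W, b_M):", [round(b, 4) for b in original])
print("trial    (b0, b_W, b_M):", [round(b, 4) for b in trial])
print("cutoffs:", limits)
```

Comparing Observation #19's tabled values against these cutoffs: its hat value (0.0797) is below the leverage cutoff, while its DFFITS (2.0203) and covariance ratio (0.0829) are far outside theirs. That matches the slides' reading of a point near the center of the X-space with an extreme response.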