Multiple Linear Regression - Efficiency of Muscle Work (Diagnostics)

Influential Observations in Regression
Measurements on Heat Production as
a Function of Body Mass and Work
Effort.
M. Greenwood (1918). “On the Efficiency of Muscular Work,” Proc. Roy. Soc. of London, Series B, Vol. 90, #627, pp. 199-214.
Data Description
• Study involved Algerians accustomed to heavy
labor. Experiment consisted of several hours on
stationary bicycle.
• Dependent (Response) Variable:
– Heat Production (Calories)
• Independent (Explanatory/Predictor) Variables:
– Work Effort (Calories)
– Body Mass (kg)
• Model:
– H = b0 + b1 W + b2 M + e
Raw Data (Table III, p.203)
M = Body Mass (kg), W = Work Effort (Calories), H = Heat Production (Calories)

Obs#    M       W       H
1       76.2    156.8   3398
2       71.3    114.1   2988
3       69.6    142.6   3048
4       58.0    142.6   2781
5       74.6    142.6   2912
6       68.9    128.3   3135
7       69.1    142.6   3261
8       62.1    156.8   3030
9       68.7    128.3   3139
10      65.4    142.6   2996
11      70.4    128.3   3248
12      69.1    142.6   3117
13      63.7    142.6   2891
14      62.1    114.1   2667
15      73.5    142.6   3403
16      61.3    129.4   2999
17      70.1    137.5   3318
18      79.8    121.3   2989
19      61.3    129.4   3936
20      64.8    137.5   3020
21      60.2    129.7   2812
22      72.4    97.1    2962
23      68.9    129.4   3236
24      70.1    129.4   3214
25      70.8    161.7   3389
26      66.5    129.4   2908
27      66.7    137.5   3063
28      71.2    129.4   2956
29      72.4    145.6   3023
30      69.3    129.4   3001
31      67.4    145.6   2841
32      69.6    161.7   3117
33      66.2    121.3   2733
34      74.5    121.3   2808
35      67.7    97.1    2813
36      57.5    97.1    2615
37      70.4    113.2   2814
Estimated Regression Coefficients

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   1536.5098      584.4994         2.6288   0.0128    348.6642    2724.3554
W           6.1563         2.3659           2.6022   0.0136    1.3483      10.9643
M           10.1409        7.6826           1.3200   0.1957    -5.4720     25.7538
$\hat{H} = 1536.51 + 6.16W + 10.14M$
Note that we can conclude, controlling for the other factor:
• As Work Effort increases, Heat Production increases (p = .0136)
• There is no significant evidence that Heat Production increases as Body Mass increases (p = .1957)
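The t statistics and 95% limits in the coefficient table follow directly from each estimate and its standard error. A minimal sketch of that arithmetic is below; the critical value 2.0322 is an assumption taken from a t table for n - p* = 34 degrees of freedom, and the helper name `t_and_ci` is illustrative only.

```python
# Estimates and standard errors copied from the coefficient table.
coefs = {
    "Intercept": (1536.5098, 584.4994),
    "W": (6.1563, 2.3659),
    "M": (10.1409, 7.6826),
}

# Assumed value: upper 2.5% point of the t distribution with 37 - 3 = 34 df.
T_CRIT = 2.0322


def t_and_ci(estimate, std_error, t_crit=T_CRIT):
    """Return (t statistic, lower limit, upper limit) for one coefficient."""
    t_stat = estimate / std_error
    half_width = t_crit * std_error
    return t_stat, estimate - half_width, estimate + half_width


results = {name: t_and_ci(b, se) for name, (b, se) in coefs.items()}
for name, (t_stat, lower, upper) in results.items():
    print(f"{name:9s} t = {t_stat:6.4f}   95% CI = ({lower:9.4f}, {upper:9.4f})")
```

The interval for M contains zero, which is the same conclusion as its p-value of .1957.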
Plot of Residuals versus Fitted Values
[Figure: residuals (vertical axis, -400 to 1200) plotted against fitted values (horizontal axis, 2500 to 3400). One point is annotated "Huge, Positive, Residual".]
Influential Measures (I)
Note: n = 37, p* = 3 parameters
Standardized and Studentized Residuals:
– Outlier if $|r_i| > t_{.05/(2(37)),\,37-3} = 3.74$ or $|r_i^*| > t_{.05/(2(37)),\,37-4} = 3.75$
Leverage (Hat) Values:
– Potentially influential with respect to X-levels if $h_{ii} > 2(3)/37 = 0.162$
DFFITS:
– Highly influential on its OWN fitted value if $|DFFITS_i| > 2\sqrt{3/37} = 0.569$
DFBETAS (one for each regression coefficient):
– Highly influential on a coefficient if $|DFBETAS_{j(i)}| > 2/\sqrt{37} = 0.329$
Cook's D:
– Highly influential on the group of coefficients if $D_i > F_{.50,3,35} = 0.804$
Covariance Ratio:
– Highly influential on the standard errors of the regression coefficients if $COVR_i$ is outside $1 \pm 3(3)/37 = [0.76, 1.24]$
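The cutoffs that depend only on n and p* can be checked with a few lines of arithmetic. The sketch below covers those four rules only; the t and F critical values are left out because they need distribution tables or a statistics library. The dictionary keys are illustrative names, not notation from the slides.

```python
import math

n, p = 37, 3  # sample size and number of regression parameters (p* in the slides)

cutoffs = {
    "hat": 2 * p / n,                # leverage: 2p*/n
    "dffits": 2 * math.sqrt(p / n),  # |DFFITS|: 2*sqrt(p*/n)
    "dfbetas": 2 / math.sqrt(n),     # |DFBETAS|: 2/sqrt(n)
    "covratio_low": 1 - 3 * p / n,   # COVRATIO lower limit: 1 - 3p*/n
    "covratio_high": 1 + 3 * p / n,  # COVRATIO upper limit: 1 + 3p*/n
}

for name, value in cutoffs.items():
    print(f"{name:14s} {value:.3f}")
```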
Standardized / Studentized Residuals

Obs#    e(i)       r(i)       r*(i)
1       123.44     0.5856     0.5799
2       26.01      0.1184     0.1167
3       -72.21     -0.3222    -0.3179
4       -221.58    -1.0582    -1.0601
5       -258.91    -1.1808    -1.1879
6       109.92     0.4880     0.4824
7       145.86     0.6505     0.6448
8       -101.57    -0.4796    -0.4741
9       115.95     0.5146     0.5090
10      -81.62     -0.3659    -0.3612
11      207.71     0.9247     0.9226
12      1.86       0.0083     0.0082
13      -169.38    -0.7654    -0.7607
14      -201.70    -0.9280    -0.9260
15      243.24     1.1010     1.1045
16      44.22      0.2016     0.1987
17      224.12     0.9968     0.9967
18      -103.52    -0.5067    -0.5011
19      981.22     4.4726     6.8679
20      -20.14     -0.0900    -0.0887
21      -133.47    -0.6145    -0.6087
22      93.51      0.4546     0.4492
23      204.15     0.9059     0.9034
24      169.98     0.7557     0.7509
25      139.03     0.6487     0.6431
26      -99.51     -0.4420    -0.4367
27      3.60       0.0160     0.0158
28      -99.17     -0.4424    -0.4371
29      -144.07    -0.6506    -0.6449
30      -34.90     -0.1549    -0.1527
31      -275.37    -1.2335    -1.2433
32      -120.80    -0.5626    -0.5568
33      -221.60    -0.9905    -0.9903
34      -230.77    -1.0581    -1.0600
35      -7.83      -0.0373    -0.0368
36      -102.39    -0.5215    -0.5158
37      -133.33    -0.6062    -0.6005
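The two residual columns are linked by a standard identity: the studentized (deleted) residual can be obtained from the standardized residual as r* = r·sqrt((n − p* − 1)/(n − p* − r²)), so no refitting is needed. The sketch below applies it to three rows of the table; the function name `studentized_from_standardized` is illustrative, and small differences in the last digit come from rounding in the tabled r values.

```python
import math

n, p = 37, 3  # sample size and number of parameters (p* in the slides)


def studentized_from_standardized(r, n=n, p=p):
    """Convert a standardized residual r(i) to a studentized residual r*(i)."""
    return r * math.sqrt((n - p - 1) / (n - p - r ** 2))


# Standardized residuals r(i) copied from the table for three observations.
standardized = {1: 0.5856, 19: 4.4726, 31: -1.2335}
studentized = {obs: studentized_from_standardized(r) for obs, r in standardized.items()}

for obs, value in studentized.items():
    print(f"Obs {obs:2d}: r = {standardized[obs]:7.4f}  ->  r* = {value:7.4f}")
```

Observation #19 is the only one whose studentized residual exceeds the outlier cutoff, and the deleted version (about 6.87) is much larger than the standardized version (4.47) because removing the point shrinks the error variance estimate.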
Influential Measures (II)

Obs#   df.b0     df.b(M)   df.b(W)   df.fit    cov.r    cook.d     hat      inf
1      -0.2080   0.1545    0.1423    0.2440    1.2488   2.02E-02   0.1505
2      0.0008    0.0151    -0.0243   0.0339    1.1846   3.95E-04   0.0779
3      0.0252    -0.0124   -0.0327   -0.0645   1.1283   1.42E-03   0.0395
4      -0.2907   0.4070    -0.1609   -0.4655   1.1799   7.20E-02   0.1616
5      0.2717    -0.2554   -0.1046   -0.3516   1.0491   4.07E-02   0.0806
6      0.0042    0.0143    -0.0219   0.0843    1.1036   2.43E-03   0.0296
7      -0.0418   0.0141    0.0674    0.1291    1.0956   5.65E-03   0.0385
8      -0.0353   0.1168    -0.1394   -0.1931   1.2493   1.27E-02   0.1422
9      0.0073    0.0116    -0.0228   0.0884    1.1006   2.67E-03   0.0293
10     -0.0153   0.0381    -0.0425   -0.0817   1.1361   2.28E-03   0.0487
11     -0.0319   0.0747    -0.0467   0.1760    1.0501   1.04E-02   0.0351
12     -0.0005   0.0002    0.0009    0.0016    1.1375   9.19E-07   0.0385
13     -0.0704   0.1258    -0.0945   -0.1984   1.1087   1.33E-02   0.0637
14     -0.2602   0.1802    0.1650    -0.3030   1.1211   3.07E-02   0.0967
15     -0.2151   0.1934    0.1007    0.2953    1.0509   2.89E-02   0.0667
16     0.0453    -0.0471   -0.0017   0.0585    1.1841   1.17E-03   0.0797
17     -0.0691   0.0609    0.0471    0.1853    1.0352   1.14E-02   0.0334
18     0.1503    -0.2257   0.0858    -0.2521   1.3397   2.17E-02   0.2020   *
19     1.5659    -1.6273   -0.0604   2.0203    0.0829   5.77E-01   0.0797   *
Influential Measures (III)

Obs#   df.b0     df.b(M)   df.b(W)   df.fit    cov.r    cook.d     hat      inf
20     -0.0074   0.0107    -0.0058   -0.0190   1.1429   1.24E-04   0.0437
21     -0.1593   0.1696    0.0011    -0.2004   1.1723   1.36E-02   0.0978
22     0.0270    0.0890    -0.1893   0.2181    1.3271   1.62E-02   0.1908
23     0.0031    0.0257    -0.0306   0.1555    1.0465   8.10E-03   0.0288
24     -0.0234   0.0521    -0.0285   0.1379    1.0746   6.42E-03   0.0326
25     -0.1374   0.0406    0.2021    0.2392    1.1994   1.94E-02   0.1216
26     -0.0317   0.0233    0.0113    -0.0778   1.1090   2.07E-03   0.0307
27     0.0005    -0.0009   0.0009    0.0029    1.1307   2.88E-06   0.0327
28     0.0275    -0.0469   0.0183    -0.0881   1.1186   2.65E-03   0.0391
29     0.1138    -0.0860   -0.0817   -0.1661   1.1232   9.36E-03   0.0622
30     0.0012    -0.0064   0.0054    -0.0267   1.1248   2.45E-04   0.0297
31     0.0372    0.0494    -0.1772   -0.2762   1.0004   2.50E-02   0.0470
32     0.0986    -0.0112   -0.1770   -0.2041   1.2063   1.42E-02   0.1184
33     -0.1189   0.0552    0.1097    -0.2100   1.0468   1.47E-02   0.0430
34     0.1308    -0.2493   0.1507    -0.3341   1.0875   3.71E-02   0.0904
35     -0.0075   -0.0008   0.0146    -0.0160   1.3010   8.82E-05   0.1595   *
36     -0.2862   0.1937    0.1984    -0.3081   1.4485   3.23E-02   0.2629   *
37     -0.0225   -0.0592   0.1286    -0.1711   1.1445   9.94E-03   0.0751   *
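The df.fit and cook.d columns can be recovered from the residual table and the hat values with two standard identities: DFFITS_i = r*_i·sqrt(h_ii/(1 − h_ii)) and D_i = (r_i²/p*)·h_ii/(1 − h_ii). The sketch below checks them for observations #19 and #36; the function names are illustrative, and agreement is only to about three decimals because the tabled inputs are rounded.

```python
import math

p = 3  # number of regression parameters (p* in the slides)


def dffits(r_star, hat):
    """DFFITS from the studentized residual r*(i) and the hat value h(ii)."""
    return r_star * math.sqrt(hat / (1 - hat))


def cooks_d(r, hat, p=p):
    """Cook's D from the standardized residual r(i) and the hat value h(ii)."""
    return (r ** 2 / p) * hat / (1 - hat)


# (r, r*, hat) copied from the tables for two observations.
rows = {19: (4.4726, 6.8679, 0.0797), 36: (-0.5215, -0.5158, 0.2629)}
measures = {obs: (dffits(rs, h), cooks_d(r, h)) for obs, (r, rs, h) in rows.items()}

for obs, (fit, cook) in measures.items():
    print(f"Obs {obs}: DFFITS = {fit:7.4f}   Cook's D = {cook:.4f}")
```

The comparison shows why the two points behave differently: #36 has the largest leverage in the data but a small residual, so its DFFITS stays below 0.569, while #19 has modest leverage and an enormous residual, which pushes its DFFITS to about 2.02.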
Diagnosing Influential Observations
• Clearly, Observation #19 exerts a huge influence, although it has a small hat (leverage) value, so it must be near the center of the Mass/Work observations.
• On review against the author’s original calculations provided in the paper, the mean and S.D. computed from these data are much too high for H (but exactly the same for M and W).
• Could the observation have been a “typo”?
• Try replacing H19 = 3936 with H19 = 2936.
• Note: Do not do this arbitrarily; check your data sources in practice.
Analysis with Corrected Data Point

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   977.4254       376.0531         2.5992   0.0137    213.1935    1741.6572
W           6.2436         1.5221           4.1019   0.0002    3.1503      9.3370
M           17.7777        4.9428           3.5967   0.0010    7.7327      27.8227
$\hat{H} = 977.43 + 6.24W + 17.78M$
Note that both factors are now significant, and that the intercept and body mass coefficients have changed drastically.
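The effect of the single changed value can be reproduced by fitting the model twice, once with H19 = 3936 and once with H19 = 2936. The sketch below uses only the standard library and solves the normal equations directly; this is an illustrative choice for a three-parameter model, not necessarily the software behind the slides, and the helper names `solve` and `ols` are hypothetical.

```python
# Raw data from Table III, observations 1 to 37 in order.
M = [76.2, 71.3, 69.6, 58.0, 74.6, 68.9, 69.1, 62.1, 68.7, 65.4,
     70.4, 69.1, 63.7, 62.1, 73.5, 61.3, 70.1, 79.8, 61.3, 64.8,
     60.2, 72.4, 68.9, 70.1, 70.8, 66.5, 66.7, 71.2, 72.4, 69.3,
     67.4, 69.6, 66.2, 74.5, 67.7, 57.5, 70.4]
W = [156.8, 114.1, 142.6, 142.6, 142.6, 128.3, 142.6, 156.8, 128.3, 142.6,
     128.3, 142.6, 142.6, 114.1, 142.6, 129.4, 137.5, 121.3, 129.4, 137.5,
     129.7, 97.1, 129.4, 129.4, 161.7, 129.4, 137.5, 129.4, 145.6, 129.4,
     145.6, 161.7, 121.3, 121.3, 97.1, 97.1, 113.2]
H = [3398, 2988, 3048, 2781, 2912, 3135, 3261, 3030, 3139, 2996,
     3248, 3117, 2891, 2667, 3403, 2999, 3318, 2989, 3936, 3020,
     2812, 2962, 3236, 3214, 3389, 2908, 3063, 2956, 3023, 3001,
     2841, 3117, 2733, 2808, 2813, 2615, 2814]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (aug[i][size] - known) / aug[i][i]
    return x


def ols(w, m, h):
    """Least-squares estimates [b0, b_W, b_M] for H = b0 + b_W*W + b_M*M + e."""
    X = [[1.0, wi, mi] for wi, mi in zip(w, m)]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(X, h)) for i in range(3)]
    return solve(xtx, xty)


original = ols(W, M, H)

H_corrected = H[:]
H_corrected[18] = 2936  # observation #19 (index 18): 3936 replaced by 2936
corrected = ols(W, M, H_corrected)

for label, (b0, bw, bm) in (("original", original), ("corrected", corrected)):
    print(f"{label:9s}  H-hat = {b0:.2f} + {bw:.2f} W + {bm:.2f} M")
```

The work coefficient barely moves between the two fits, while the body mass coefficient and the intercept shift substantially, which matches the pattern in the DFBETAS for observation #19.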
Plot of Residuals versus Predicted Values
[Figure: residuals (vertical axis, -300 to 300) plotted against predicted values (horizontal axis, 2500 to 3400).]