Stat479
Assignment #6
Solution Key
Fall 2013
Problem 1
(a)
Source            d.f.           SS           MS       F   p-value
Regression           1   2059.78145   2059.78145   29.59    <.0001
Error               14    974.65605     69.61829
Corrected Total     15   3034.43750
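As an arithmetic check on the table in (a), the mean squares and the F statistic can be recomputed from the sums of squares and degrees of freedom. The following is a minimal Python sketch; the variable names are illustrative and are not part of the SAS output.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table in (a).
ss_regression, df_regression = 2059.78145, 1
ss_error, df_error = 974.65605, 14

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# F statistic = MS(Regression) / MS(Error).
f_value = ms_regression / ms_error

# The corrected total SS is the sum of the regression and error SS.
ss_total = ss_regression + ss_error

print(round(ms_error, 5), round(f_value, 2), round(ss_total, 5))
```

The recomputed values should agree with the MS, F and Corrected Total entries of the table up to rounding.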
(b) β̂0 = 148.05068, s.e.(β̂0) = 11.56292; β̂1 = −1.02359, s.e.(β̂1) = 0.18818
ŷ = 148.05068 − 1.02359x
(c) The expected loss in mean muscle mass, E(y), for a 1-year increase in age is 1.02359. Thus the
expected loss in mean muscle mass for a 5-year increase in age is 5 × 1.02359 = 5.11795.
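The calculation in (c) can be reproduced directly from the fitted line in (b). This is a small sketch only; the function name predicted_mean and the starting age of 50 are illustrative choices, not part of the assignment.

```python
# Coefficient estimates from part (b).
b0, b1 = 148.05068, -1.02359

def predicted_mean(age):
    """Estimated mean muscle mass at a given age from the fitted line."""
    return b0 + b1 * age

# Change in estimated mean muscle mass for a 5-year increase in age.
# The starting age (50 here) is arbitrary because the model is linear in age.
change_5yr = predicted_mean(55) - predicted_mean(50)

print(round(change_5yr, 5))
```

A negative change corresponds to a loss in mean muscle mass, so its magnitude is the expected loss quoted in (c).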
(d) R² = 0.6788 = 67.88%
This means that 67.88% of the variability in muscle mass is explained by the predicted value from a
linear regression model using age as the explanatory variable.
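For reference, R² in (d) is the ratio of the regression sum of squares to the corrected total sum of squares from part (a). The sketch below recomputes it, together with the adjusted R² reported by SAS, using n = 16 observations and p = 2 estimated parameters; the variable names are illustrative.

```python
# Values from the ANOVA table in part (a).
ss_regression = 2059.78145
ss_total = 3034.43750

# n = number of observations used, p = number of regression parameters.
n, p = 16, 2

# R-squared: proportion of total variability explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R-squared penalizes for the number of parameters.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p)

print(round(r_squared, 4), round(adj_r_squared, 4))
```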
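(e) 95% C.I. for β1: (−1.42720, −0.61998)
We are 95% confident that the expected change in mean muscle mass, E(y), for a 1-year increase in
age lies inside the above interval; that is, mean muscle mass is expected to decrease by between
0.62 and 1.43 units for each additional year of age.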
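The interval in (e) has the usual form: estimate ± t critical value × standard error, with 14 error degrees of freedom. The sketch below reproduces it; the critical value 2.144787 is an assumed table value for the upper 2.5% point of the t distribution with 14 d.f. and is not taken from the SAS output.

```python
# Slope estimate and its standard error from part (b).
b1, se_b1 = -1.02359, 0.18818

# Assumed table value: upper 2.5% point of the t distribution with 14 d.f.
t_crit = 2.144787

# 95% confidence limits: estimate -/+ t * s.e.
margin = t_crit * se_b1
ci_lower, ci_upper = b1 - margin, b1 + margin

print(round(ci_lower, 5), round(ci_upper, 5))
```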
(f) A t-test of H0: β1 = 0 against H1: β1 ≠ 0 gives
t = −5.44, for which the p-value is < .0001. Thus we reject the null hypothesis at α = .05.
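The t statistic in (f) is the slope estimate divided by its standard error, and in simple linear regression its square equals the F statistic from part (a). A brief sketch with illustrative variable names:

```python
# Slope estimate and standard error from part (b); F value from part (a).
b1, se_b1 = -1.02359, 0.18818
f_value_reported = 29.59

# t statistic for H0: beta1 = 0.
t_value = b1 / se_b1

# With a single predictor, t squared should match the ANOVA F statistic
# up to rounding of the reported values.
t_squared = t_value ** 2

print(round(t_value, 2), round(t_squared, 2))
```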
(g) From the SAS output, the point estimate of E(y) at x = 60, i.e. of μ(60), is 86.6353.
A 95% confidence interval for the mean muscle mass, E(y), at x = 60 is (82.1579, 91.1127).
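The values in (g) can be checked from the fitted line and the "Std Error Mean Predict" entry (2.0876) that SAS reports for the extra observation with x = 60. This is a sketch under the assumption that 2.144787 is the upper 2.5% point of the t distribution with 14 d.f. (a table value, not part of the output).

```python
# Coefficient estimates from part (b).
b0, b1 = 148.05068, -1.02359

# Standard error of the estimated mean at x = 60 (SAS output, obs 17).
se_mean_60 = 2.0876

# Assumed table value: upper 2.5% point of t with 14 d.f.
t_crit = 2.144787

# Point estimate of the mean response at age 60.
mu_hat_60 = b0 + b1 * 60

# 95% confidence limits for the mean response.
ci_lower = mu_hat_60 - t_crit * se_mean_60
ci_upper = mu_hat_60 + t_crit * se_mean_60

print(round(mu_hat_60, 4), round(ci_lower, 4), round(ci_upper, 4))
```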
(h)
I. See SAS Output attached.
II. See attached plot: The assumption of constant variance appears to be satisfied, as the
residuals are evenly spread around the zero line as x increases.
III. See attached plots: The above is also true of the plot of residuals against the predicted values.
The normal probability plot of the studentized residuals does not show a pattern to indicate that
the distribution of the errors deviates from a normal distribution.
05:47 Monday, December 02, 2013 1
Simple Linear Regression of Horsepower on Speed
The REG Procedure
Model: MODEL1
Dependent Variable: y Muscle Mass
Number of Observations Read                   17
Number of Observations Used                   16
Number of Observations with Missing Values     1
Analysis of Variance
                            Sum of         Mean
Source            DF       Squares       Square   F Value   Pr > F
Model              1    2059.78145   2059.78145     29.59   <.0001
Error             14     974.65605     69.61829
Corrected Total   15    3034.43750

Root MSE           8.34376   R-Square   0.6788
Dependent Mean    86.18750   Adj R-Sq   0.6559
Coeff Var          9.68094
Parameter Estimates
                             Parameter    Standard
Variable    Label       DF    Estimate       Error   t Value   Pr > |t|   95% Confidence Limits
Intercept   Intercept    1   148.05068    11.56292     12.80     <.0001   123.25067   172.85068
x           Age          1    -1.02359     0.18818     -5.44     <.0001    -1.42720    -0.61998
Output Statistics
    Dependent Predicted    Std Error                               Std Error  Student             Cook's
Obs  Variable     Value Mean Predict       95% CL Mean    Residual  Residual Residual  -2-1 0 1 2      D
  1   82.0000   75.3758       2.8813   69.1960   81.5556    6.6242     7.830    0.846       |*    0.048
  2   91.0000   82.5410       2.1910   77.8417   87.2402    8.4590     8.051    1.051       |**   0.041
  3  100.0000  104.0363       3.8883   95.6968  112.3759   -4.0363     7.382   -0.547      *|     0.041
  4   68.0000   79.4702       2.4241   74.2710   84.6694  -11.4702     7.984   -1.437     **|     0.095
  5   87.0000   90.7297       2.2469   85.9106   95.5488   -3.7297     8.036   -0.464       |     0.008
  6   73.0000   73.3287       3.1527   66.5667   80.0906   -0.3287     7.725  -0.0425       |     0.000
  7   78.0000   78.4466       2.5252   73.0307   83.8625   -0.4466     7.952  -0.0562       |     0.000
  8   80.0000   90.7297       2.2469   85.9106   95.5488  -10.7297     8.036   -1.335     **|     0.070
  9   65.0000   70.2579       3.5955   62.5463   77.9695   -5.2579     7.529   -0.698      *|     0.056
 10   84.0000   81.5174       2.2557   76.6793   86.3554    2.4826     8.033    0.309       |     0.004
 11  116.0000  101.9892       3.5764   94.3186  109.6597   14.0108     7.538    1.859       |***  0.389
 12   76.0000   88.6825       2.1358   84.1017   93.2633  -12.6825     8.066   -1.572    ***|     0.087
 13   97.0000  101.9892       3.5764   94.3186  109.6597   -4.9892     7.538   -0.662      *|     0.049
 14  100.0000   93.8004       2.5120   88.4128   99.1881    6.1996     7.957    0.779       |*    0.030
 15  105.0000   97.8948       2.9973   91.4663  104.3233    7.1052     7.787    0.912       |*    0.062
 16   77.0000   68.2107       3.9082   59.8285   76.5929    8.7893     7.372    1.192       |**   0.200
 17         .   86.6353       2.0876   82.1579   91.1127         .         .        .                 .

Sum of Residuals                         0
Sum of Squared Residuals         974.65605
Predicted Residual SS (PRESS)   1277.86656
Problem 2
The case statistics and the plots shown (see attached SAS outputs for this part) show clearly that
(a) Car O is an x-outlier. The Hat Diag for this case is 0.27, which is markedly larger than the other hat
values and is also larger than the cutoff 4/16 = .25. It stands well away from the other cars in the
x-direction in the plot of MPG vs. Weight. Clearly, several plots shown in the diagnostics panel are affected
by this case.
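The leverage screening in (a) can be sketched as follows, using the Hat Diag H column of the attached SAS output and the cutoff 2p/n = 4/16 quoted above (p = 2 parameters, n = 16 cars). The dictionary and variable names are illustrative.

```python
# Hat Diag H values for cars A through P, copied from the SAS output.
hat = {
    "A": 0.0780, "B": 0.0628, "C": 0.1254, "D": 0.0782,
    "E": 0.1109, "F": 0.1247, "G": 0.1379, "H": 0.0653,
    "I": 0.0721, "J": 0.0810, "K": 0.1628, "L": 0.1794,
    "M": 0.1532, "N": 0.0985, "O": 0.2730, "P": 0.1967,
}

n, p = len(hat), 2

# Leverage cutoff used in the solution: 2p/n.
cutoff = 2 * p / n

# Cars whose leverage exceeds the cutoff are flagged as x-outliers.
flagged = [car for car, h in hat.items() if h > cutoff]

# The hat values sum to p (up to rounding), a useful consistency check.
hat_sum = sum(hat.values())

print(cutoff, flagged, round(hat_sum, 3))
```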
(b) Car A is a possible y-outlier. Its observed value is much smaller than the value predicted by the fitted
line. The RStudent for case A is −3.915, whose absolute value is larger than the 5% critical value of 3.62
from Table B.10.
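RStudent (the externally studentized residual) can be obtained from the Student Residual column without refitting, using the standard identity t = r·sqrt((n − p − 1)/(n − p − r²)). The sketch below applies it to Car A; the function name is illustrative, and the small discrepancy from the SAS value comes from using the rounded Student Residual.

```python
import math

def rstudent_from_student(r, n, p):
    """Externally studentized residual from the internally studentized one."""
    return r * math.sqrt((n - p - 1) / (n - p - r ** 2))

# Car A: Student Residual from the SAS output, n = 16 cars, p = 2 parameters.
rstudent_a = rstudent_from_student(-2.752, n=16, p=2)

# Critical value quoted in the solution (Table B.10).
critical_value = 3.62
is_y_outlier = abs(rstudent_a) > critical_value

print(round(rstudent_a, 3), is_y_outlier)
```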
(c) The two largest Cook's D values belong to cases A and O above. For Car O, this statistic is large primarily
because it is a high-leverage case (i.e. the Hat Diag is large) and not because it is a y-outlier. Thus it fits
the model well but has very high influence. For Car A, Cook's D is large clearly because it is a y-outlier,
and therefore it does not fit the model well at all.
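The argument in (c) can be made concrete by splitting Cook's D into a residual factor and a leverage factor, D = (r²/p)·(h/(1 − h)), where r is the Student Residual and h is the Hat Diag. The sketch below uses the rounded values from the SAS output for cars A and O; the function and variable names are illustrative.

```python
def cooks_d_parts(student_resid, hat, p=2):
    """Return (residual factor, leverage factor, Cook's D)."""
    residual_factor = student_resid ** 2 / p
    leverage_factor = hat / (1 - hat)
    return residual_factor, leverage_factor, residual_factor * leverage_factor

# Car A: large residual, modest leverage.
res_a, lev_a, d_a = cooks_d_parts(-2.752, 0.0780)

# Car O: modest residual, large leverage.
res_o, lev_o, d_o = cooks_d_parts(1.102, 0.2730)

print(round(res_a, 3), round(lev_a, 3), round(d_a, 3))
print(round(res_o, 3), round(lev_o, 3), round(d_o, 3))
```

For Car A the residual factor dominates, while for Car O the leverage factor is the larger contributor, which matches the explanation above.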
(d) The following is a summary of statistics resulting from fitting the model to three different data sets:
Model       Estimated β0   Estimated β1    MSE     R²
All data           41.57        -.00681   8.57    .69
A deleted          43.38        -.00725   4.24    .84
O deleted          39.21        -.00608   8.43    .60

• Fitting the model with Car A deleted improves the fit significantly (MSE falls from 8.57 to 4.24
and R² rises from .69 to .84). The case statistics indicate that Car A is still influential but not
a y-outlier, and the plots also support this.
• Fitting the model with Car O deleted does not give a better-fitting model overall (MSE changes
only from 8.57 to 8.43, and R² falls from .69 to .60).
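The MSE and R² columns of the summary in (d) follow from the Error and Corrected Total lines of the three attached SAS fits, via MSE = SSE/df and R² = 1 − SSE/SSTO. A short sketch, with an illustrative dictionary layout:

```python
# (SSE, error d.f., corrected total SS) from the three attached ANOVA tables.
fits = {
    "All data":  (120.01453, 14, 382.32437),
    "A deleted": (55.07785, 13, 347.44000),
    "O deleted": (109.60690, 13, 271.75600),
}

summary = {}
for name, (sse, df_error, ss_total) in fits.items():
    mse = sse / df_error              # mean squared error
    r_squared = 1 - sse / ss_total    # proportion of variability explained
    summary[name] = (mse, r_squared)

for name, (mse, r_squared) in summary.items():
    print(name, round(mse, 2), round(r_squared, 2))
```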
(e) Clearly, the case statistics in parts (a), (b) and (c) for the model fitted to the complete data set
anticipated the outcome of part (d). That is, removing a case that is highly influential affects the fit of the
model. If the influential case is a y-outlier, the model fit is expected to "improve." Thus, instead of
refitting models with cases deleted, the user can use the case statistics from the original fit to reach
similar conclusions. This is the way these statistics are meant to be used. Other statistics such as
DFFITS and DFBETAS can also be used to determine how each of the suspected cases affects the overall model
fit.
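As an illustration of how DFFITS combines the two ingredients discussed above, it can be computed from RStudent and the hat value as DFFITS = RStudent·sqrt(h/(1 − h)). The sketch below does this for cars A and O using the rounded values in the attached SAS output; the function name is illustrative.

```python
import math

def dffits(rstudent, hat):
    """DFFITS from the externally studentized residual and the leverage."""
    return rstudent * math.sqrt(hat / (1 - hat))

# Car A: y-outlier with modest leverage.
dffits_a = dffits(-3.9150, 0.0780)

# Car O: high leverage with a modest residual.
dffits_o = dffits(1.1110, 0.2730)

print(round(dffits_a, 4), round(dffits_o, 4))
```

Both values should agree with the DFFITS column of the SAS output up to rounding.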
05:51 Monday, December 02, 2013 1
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: y MPG
Number of Observations Read   16
Number of Observations Used   16
Analysis of Variance
                            Sum of         Mean
Source            DF       Squares       Square   F Value   Pr > F
Model              1     262.30984    262.30984     30.60   <.0001
Error             14     120.01453      8.57247
Corrected Total   15     382.32437

Root MSE           2.92788   R-Square   0.6861
Dependent Mean    21.71875   Adj R-Sq   0.6637
Coeff Var         13.48087
Parameter Estimates
                                Parameter    Standard
Variable    Label          DF    Estimate       Error   t Value   Pr > |t|
Intercept   Intercept       1    41.57248     3.66300     11.35     <.0001
x           Weight(lbs.)    1    -0.00681     0.00123     -5.53     <.0001
Output Statistics
        Dependent Predicted    Std Error           Std Error  Student              Cook's
Obs Car  Variable     Value Mean Predict  Residual  Residual Residual  -2-1 0 1 2       D RStudent
  1   A   16.0000   23.7375       0.8179   -7.7375     2.811   -2.752 *****|        0.321  -3.9150
  2   B   21.0000   22.0017       0.7338   -1.0017     2.834   -0.353      |        0.004  -0.3421
  3   C   22.8000   25.7797       1.0367   -2.9797     2.738   -1.088    **|        0.085  -1.0960
  4   D   21.4000   19.6872       0.8189    1.7128     2.811    0.609      |*       0.016   0.5951
  5   E   18.7000   18.1556       0.9750    0.5444     2.761    0.197      |        0.002   0.1903
  6   F   19.1000   17.6791       1.0340    1.4209     2.739    0.519      |*       0.019   0.5047
  7   G   14.3000   17.2706       1.0874   -2.9706     2.718   -1.093    **|        0.096  -1.1010
  8   H   24.4000   22.5803       0.7484    1.8197     2.831    0.643      |*       0.014   0.6288
  9   I   22.8000   20.1297       0.7863    2.6703     2.820    0.947      |*       0.035   0.9431
 10   J   19.2000   19.5170       0.8332   -0.3170     2.807   -0.113      |        0.001  -0.1089
 11   K   16.4000   16.5899       1.1813   -0.1899     2.679  -0.0709      |        0.000  -0.0683
 12   L   17.3000   16.1815       1.2401    1.1185     2.652    0.422      |        0.019   0.4090
 13   M   30.4000   26.5966       1.1460    3.8034     2.694    1.412      |**      0.180   1.4689
 14   N   25.5000   24.7926       0.9190    0.7074     2.780    0.254      |        0.004   0.2458
 15   O   31.9000   29.1493       1.5298    2.7507     2.496    1.102      |**      0.228   1.1110
 16   P   26.3000   27.6517       1.2985   -1.3517     2.624   -0.515     *|        0.032  -0.5011
Output Statistics
        Hat Diag      Cov                 DFBETAS
Obs Car        H    Ratio   DFFITS Intercept        x
  1   A   0.0780   0.2649  -1.1390   -0.7017   0.5082
  2   B   0.0628   1.2155  -0.0886   -0.0237   0.0062
  3   C   0.1254   1.1112  -0.4149   -0.3465   0.2938
  4   D   0.0782   1.1924   0.1734   -0.0452   0.0777
  5   E   0.1109   1.2972   0.0672   -0.0334   0.0444
  6   F   0.1247   1.2746   0.1905   -0.1049   0.1346
  7   G   0.1379   1.1256  -0.4404    0.2599  -0.3257
  8   H   0.0653   1.1686   0.1662    0.0664  -0.0346
  9   I   0.0721   1.0950   0.2629   -0.0452   0.0961
 10   J   0.0810   1.2597  -0.0323    0.0095  -0.0154
 11   K   0.1628   1.3843  -0.0301    0.0194  -0.0236
 12   L   0.1794   1.3776   0.1912   -0.1287   0.1544
 13   M   0.1532   1.0074   0.6248    0.5508  -0.4807
 14   N   0.0985   1.2746   0.0812    0.0611  -0.0491
 15   O   0.2730   1.3306   0.6808    0.6509  -0.5978
 16   P   0.1967   1.3895  -0.2480   -0.2286   0.2048
02:44 Wednesday, November 06, 2013 1
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: y MPG
Number of Observations Read   15
Number of Observations Used   15
Analysis of Variance
                            Sum of         Mean
Source            DF       Squares       Square   F Value   Pr > F
Model              1     292.36215    292.36215     69.01   <.0001
Error             13      55.07785      4.23676
Corrected Total   14     347.44000

Root MSE           2.05834   R-Square   0.8415
Dependent Mean    22.10000   Adj R-Sq   0.8293
Coeff Var          9.31375
Parameter Estimates
                                Parameter      Standard
Variable    Label          DF    Estimate         Error   t Value   Pr > |t|
Intercept   Intercept       1    43.37935       2.61617     16.58     <.0001
x           Weight(lbs.)    1    -0.00725    0.00087239     -8.31     <.0001
02:46 Wednesday, November 06, 2013 1
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: y MPG
Number of Observations Read   15
Number of Observations Used   15
Analysis of Variance
                            Sum of         Mean
Source            DF       Squares       Square   F Value   Pr > F
Model              1     162.14910    162.14910     19.23   0.0007
Error             13     109.60690      8.43130
Corrected Total   14     271.75600

Root MSE           2.90367   R-Square   0.5967
Dependent Mean    21.04000   Adj R-Sq   0.5656
Coeff Var         13.80071
Parameter Estimates
                                Parameter    Standard
Variable    Label          DF    Estimate       Error   t Value   Pr > |t|
Intercept   Intercept       1    39.20810     4.21015      9.31     <.0001
x           Weight(lbs.)    1    -0.00608     0.00139     -4.39     0.0007