STAT 401D
LAB #8
ANSWER KEY
Spring 2016
Question 1
1a)
It appears that a straight line model would be a good fit although there is some curvature shown
in the relationship.
1b)
\[
\hat{\beta}_1 = \frac{\sum xy - (\sum x \sum y)/n}{\sum x^2 - (\sum x)^2/n}
= \frac{1098.628 - (11.35)(1415.2)/13}{15.6183 - (11.35)^2/13} = -23.9890373
\]
\[
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 1415.2/13 - (-23.9890373)(11.35/13) = 129.805813
\]
The prediction equation is ŷ = 129.80581 - 23.98904x
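The sums used in 1b can be checked directly from the raw data. The sketch below assumes the 13 (x, y) pairs are the ones listed in the replicate table of part 1d; the variable names are illustrative only.

```python
# Data transcribed from the replicate table in part 1d (x = Fe %, y = weight loss).
x = [0.01, 0.01, 0.01, 0.48, 0.48, 0.71, 0.71, 0.95, 1.19, 1.44, 1.44, 1.96, 1.96]
y = [127.6, 130.2, 128.0, 124.0, 122.0, 110.8, 113.2, 103.9, 101.5, 92.3, 91.7, 83.7, 86.3]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_xx = sum(a * a for a in x)

# Corrected sums of cross-products and squares (computational formulas from 1b)
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_xx - sum_x ** 2 / n

b1 = s_xy / s_xx                   # slope estimate
b0 = sum_y / n - b1 * sum_x / n    # intercept estimate

print(f"b1 = {b1:.7f}, b0 = {b0:.6f}")
```

The printed estimates should agree with the hand calculation above to the digits shown there.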
1c) Analysis of Variance
You can use Excel or the formulas
\[
SSTot = S_{yy}, \qquad SSReg = \frac{S_{xy}^2}{S_{xx}}, \qquad SSE = S_{yy} - \frac{S_{xy}^2}{S_{xx}}
\]
Using Excel to calculate the sums of squares we have:
Analysis of Variance
Source            df   SS           MS           F-ratio   p-value
Regression         1   3285.30973   3285.30973   355.9     <.0001
Error             11    101.54104      9.23100
Corrected Total   12   3386.85077
Reject H0: β1 = 0 at α = .05 since the p-value < .05. Note also the very high R-squared of 97%.
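As an alternative to Excel, the ANOVA entries can be reproduced with a few lines of Python. This is a sketch assuming the same 13 data pairs as in part 1d; it uses SSReg = S_xy^2 / S_xx, where S_xx is the corrected sum of squares of x.

```python
# Data transcribed from the replicate table in part 1d.
x = [0.01, 0.01, 0.01, 0.48, 0.48, 0.71, 0.71, 0.95, 1.19, 1.44, 1.44, 1.96, 1.96]
y = [127.6, 130.2, 128.0, 124.0, 122.0, 110.8, 113.2, 103.9, 101.5, 92.3, 91.7, 83.7, 86.3]
n = len(x)

s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

ss_tot = s_yy                  # corrected total sum of squares, df = n - 1
ss_reg = s_xy ** 2 / s_xx      # regression sum of squares, df = 1
sse = ss_tot - ss_reg          # error sum of squares, df = n - 2

ms_reg = ss_reg / 1
mse = sse / (n - 2)
f_ratio = ms_reg / mse
r_squared = ss_reg / ss_tot

print(f"SSReg = {ss_reg:.5f}, SSE = {sse:.5f}, SSTot = {ss_tot:.5f}")
print(f"F = {f_ratio:.4f}, R-squared = {r_squared:.6f}")
```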
1d) Lack Of Fit Test
(ȳ is the mean of the replicate y values at each x.)

x      y                     ȳ       y - ȳ             Σ(y - ȳ)^2   df
.01    127.6, 130.2, 128.0   128.6   -1.0, 1.6, -.6    3.92         2
.48    124.0, 122.0          123.0   1.0, -1.0         2.0          1
.71    110.8, 113.2          112.0   -1.2, 1.2         2.88         1
.95    103.9                 103.9   0.0               0.0          0
1.19   101.5                 101.5   0.0               0.0          0
1.44   92.3, 91.7            92.0    0.3, -0.3         0.18         1
1.96   83.7, 86.3            85.0    -1.3, 1.3         3.38         1
Totals                                                 12.36        6
Analysis of Variance for Lack-of-fit Test
Source        df   SS       MS       F-ratio
Lack of fit    5    89.18   17.836   8.66
Pure Error     6    12.36    2.06
Total Error   11   101.54
F(.05,5,6) = 4.39. Reject H0. Conclude that there is significant lack of fit for the linear regression model.
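The lack-of-fit decomposition can be sketched in code as follows. The sketch groups the replicates by x to get the pure-error sum of squares and obtains the lack-of-fit sum of squares by subtraction from SSE. The data are those of the table above; the critical value 4.39 is taken from the answer key rather than computed.

```python
from collections import defaultdict

x = [0.01, 0.01, 0.01, 0.48, 0.48, 0.71, 0.71, 0.95, 1.19, 1.44, 1.44, 1.96, 1.96]
y = [127.6, 130.2, 128.0, 124.0, 122.0, 110.8, 113.2, 103.9, 101.5, 92.3, 91.7, 83.7, 86.3]
n = len(x)

# Straight-line fit, needed for the total error sum of squares (SSE)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

# Pure error: variation of replicates around their own group mean
groups = defaultdict(list)
for a, b in zip(x, y):
    groups[a].append(b)

ss_pe = 0.0
for ys in groups.values():
    group_mean = sum(ys) / len(ys)
    ss_pe += sum((v - group_mean) ** 2 for v in ys)

df_pe = n - len(groups)      # observations minus distinct x levels
df_lof = len(groups) - 2     # distinct x levels minus parameters in the line
ss_lof = sse - ss_pe
f_lof = (ss_lof / df_lof) / (ss_pe / df_pe)

f_crit = 4.39                # F(.05, 5, 6) as quoted in the answer key
print(f"SSPE = {ss_pe:.2f}, SSLOF = {ss_lof:.2f}, F = {f_lof:.4f}, reject H0: {f_lof > f_crit}")
```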
1e) JMP output (see attached)
There is an obvious curved pattern in both residual plots, indicating that we need to fit a
higher-order regression (quadratic or cubic) here.
If one ignores the horizontal parts (due to replicated values), the normal probability plot is
OK.
Bivariate Fit of y, Weightloss(mg/dm) By x, Fe %
[Figure: scatterplot of y, Weightloss(mg/dm) against x, Fe %, with the fitted line]
Linear Fit
y, Weightloss(mg/dm) = 129.80581 - 23.989037*x, Fe %
Summary of Fit
RSquare                      0.970019
RSquare Adj                  0.967293
Root Mean Square Error       3.038257
Mean of Response             108.8615
Observations (or Sum Wgts)   13
Lack Of Fit
Source        DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Lack Of Fit    5    89.18104        17.8362       8.6584    0.0103*
Pure Error     6    12.36000         2.0600
Total Error   11   101.54104
Max RSq
Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio    Prob > F
Model       1   3285.3097        3285.31       355.8995   <.0001*
Error      11    101.5410           9.23
C. Total   12   3386.8508
Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|   Lower 95%   Upper 95%
Intercept   129.80581   1.39378      93.13    <.0001*    126.73812   132.8735
x, Fe %     -23.98904   1.271596    -18.87    <.0001*    -26.7878    -21.19027
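The slope's standard error, t ratio and 95% limits in the JMP table can be reproduced from quantities already computed in Question 1. This is a sketch, assuming the usual formula s.e.(b1) = sqrt(MSE / S_xx) and a tabled critical value t(.025, 11) = 2.201, which is not printed in the answer key.

```python
import math

# Summary quantities quoted in parts 1b and 1c
n = 13
s_xx = 15.6183 - 11.35 ** 2 / n    # corrected sum of squares of x
mse = 101.54104 / (n - 2)          # error mean square from the ANOVA table
b1 = -23.9890373                   # slope estimate

# Assumption: tabled two-sided 5% critical value of t with 11 df
t_crit = 2.201

se_b1 = math.sqrt(mse / s_xx)
t_ratio = b1 / se_b1
lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1

print(f"s.e.(b1) = {se_b1:.6f}, t = {t_ratio:.2f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```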
Diagnostics Plots
[Figure: Residual by Predicted Plot; y, Weightloss(mg/dm) Residual vs. y, Weightloss(mg/dm) Predicted]
[Figure: Residual by X Plot; y, Weightloss(mg/dm) Residual vs. x, Fe %]
[Figure: Residual Normal Quantile Plot; y, Weightloss(mg/dm) Residual vs. Normal Quantile]
Lab#8 Problem#2 Part I (extracted from Excel Sheet)

X (20 x 4) and fitted values yhat:

Obs   const   x1      x2      x3     yhat
 1    1       10.2    89       9.3   24.01271538
 2    1       13.72   90.07   12.1   45.77283166
 3    1       15.43   95.08   13.3   58.41253987
 4    1       14.37   98.03   13.4   59.38660176
 5    1       15      99      13.5   61.5196175
 6    1       15.02   91.05   12.8   52.29995553
 7    1       15.12   105.6   14     68.91279002
 8    1       15.24   100.8   13.5   63.0647878
 9    1       15.24   94      14     61.39330733
10    1       15.28   93.09   13.8   59.71240177
11    1       13.78   89      12.6   47.99021322
12    1       15.67   102     14     67.32930736
13    1       15.67   99      13.7   63.63556842
14    1       15.98   89.02   13.9   58.60446359
15    1       16.5    95.09   14.9   69.07064441
16    1       16.87   95.02   14.9   69.56550732
17    1       17.26   91.02   14.3   64.07531815
18    1       17.28   98.06   14.3   68.77467014
19    1       17.87   96.01   16.9   83.04282569
20    1       19.13   101     17.3   90.46788351

(X'X)^-1 =
  21.78275595   -0.439044462   -0.228898883    0.503209019
  -0.439044462   0.162046029    0.00404678    -0.178259035
  -0.228898883   0.00404678     0.002927964   -0.008225088
   0.503209019  -0.178259035   -0.008225088    0.220709129

X'y = (1237.03, 19659.1047, 118970.1884, 17516.935)'
(X'X)^-1 X'y = (-102.7620126, 1.462968881, 0.663365427, 5.678808862)'

y'y = 80256.5195
beta_hat' X'y = 80037.00147
SSE = 219.5180305
MSE = 13.71987691

s.e.(beta1_hat) = 1.4910572
s.e.(beta2_hat) = 0.2004278
s.e.(beta3_hat) = 1.740144269
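The matrix steps in the sheet (X'X, its inverse, beta_hat = (X'X)^-1 X'y, SSE, MSE and the standard errors) can be sketched in plain Python. The y values are assumed to be those listed in the JMP residual table for this problem, and the small `invert` helper is a hypothetical stand-in for Excel's MINVERSE. SSE is computed here from the residuals rather than as y'y - beta_hat' X'y, because that shortcut is sensitive to rounding in beta_hat; the result may therefore differ from the sheet in the last digits and should be close to the JMP value of 220.3995.

```python
import math

x1 = [10.2, 13.72, 15.43, 14.37, 15, 15.02, 15.12, 15.24, 15.24, 15.28,
      13.78, 15.67, 15.67, 15.98, 16.5, 16.87, 17.26, 17.28, 17.87, 19.13]
x2 = [89, 90.07, 95.08, 98.03, 99, 91.05, 105.6, 100.8, 94, 93.09,
      89, 102, 99, 89.02, 95.09, 95.02, 91.02, 98.06, 96.01, 101]
x3 = [9.3, 12.1, 13.3, 13.4, 13.5, 12.8, 14, 13.5, 14, 13.8,
      12.6, 14, 13.7, 13.9, 14.9, 14.9, 14.3, 14.3, 16.9, 17.3]
y = [25.93, 45.87, 56.2, 58.6, 63.36, 46.35, 68.99, 62.91, 58.13, 59.79,
     56.2, 66.16, 62.18, 57.01, 65.62, 65.03, 66.74, 73.38, 82.87, 95.71]

# Design matrix with a leading column of ones for the intercept
X = [[1.0, a, b, c] for a, b, c in zip(x1, x2, x3)]
n, p = len(X), len(X[0])

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
xtx_inv = invert(xtx)

# Least squares estimates: beta_hat = (X'X)^-1 X'y
beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]

fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
mse = sse / (n - p)

# Standard errors: sqrt(MSE * j-th diagonal element of (X'X)^-1)
se = [math.sqrt(mse * xtx_inv[j][j]) for j in range(p)]

print("beta_hat =", [round(b, 5) for b in beta])
print("SSE =", round(sse, 4), " MSE =", round(mse, 4))
print("s.e. =", [round(s, 5) for s in se])
```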
Lab#8 Problem#2 Part II

[Figure: scatterplot matrix of x1, x2, x3 and y, with the pairwise correlation shown in each panel]

Pairwise correlations:
r(x1, x2) = 0.3663
r(x1, x3) = 0.9484
r(x1, y)  = 0.9060
r(x2, x3) = 0.4285
r(x2, y)  = 0.5895
r(x3, y)  = 0.9450
Summary of Fit for Full Model (y,x1,x2,x3)
RSquare                      0.941138
RSquare Adj                  0.930102
Root Mean Square Error       3.711464
Mean of Response             61.8515
Observations (or Sum Wgts)   20
Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       3   3523.9590        1174.65       85.2745   <.0001*
Error      16    220.3995          13.77
C. Total   19   3744.3585
Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|   Lower 95%   Upper 95%   VIF
Intercept   -102.762    17.32215    -5.93     <.0001*    -139.4833   -66.0407    .
x1          1.4629701   1.494048     0.98     0.3421     -1.70427    4.6302099   10.453922
x2          0.6633643   0.20083      3.30     0.0045*    0.2376242   1.0891045   1.3022167
x3          5.6787649   1.743634     3.26     0.0049*    1.982425    9.3751047   11.273271

[Figure: Residual by Predicted Plot; y Residual vs. y Predicted]
[Figure: Normal Probability Plot of Studentized Residuals (Normal Quantile Plot)]
Full Model Residual and Diagnostic Statistics
(Observation 21 has no y value; it is included only to obtain a prediction.)

Obs   x1      x2      x3     y       Predicted   Residual   Studentized Resid   hats    Cook's D Influence
 1    10.2    89       9.3   25.93   24.0122      1.91778    0.72709            0.495   0.130
 2    13.72   90.07   12.1   45.87   45.7722      0.09779    0.02836            0.137   0.000
 3    15.43   95.08   13.3   56.2    58.4119     -2.21187   -0.62490            0.090   0.010
 4    14.37   98.03   13.4   58.6    59.3859     -0.78592   -0.22895            0.145   0.002
 5    15      99      13.5   63.36   61.5189      1.84107    0.52173            0.096   0.007
 6    15.02   91.05   12.8   46.35   52.2993     -5.94931   -1.72787            0.139   0.121
 7    15.12   105.6   14     68.99   68.9121      0.07793    0.02591            0.343   0.000
 8    15.24   100.8   13.5   62.91   63.0641     -0.15410   -0.04503            0.150   0.000
 9    15.24   94      14     58.13   61.3926     -3.26260   -0.92859            0.104   0.025
10    15.28   93.09   13.8   59.79   59.7117      0.07829    0.02199            0.080   0.000
11    13.78   89      12.6   56.2    47.9896      8.21043    2.47292            0.200   0.382
12    15.67   102     14     66.16   67.3286     -1.16860   -0.34391            0.162   0.006
13    15.67   99      13.7   62.18   63.6349     -1.45487   -0.41519            0.109   0.005
14    15.98   89.02   13.9   57.01   58.6038     -1.59377   -0.47449            0.181   0.012
15    16.5    95.09   14.9   65.62   69.0699     -3.44990   -0.97493            0.091   0.024
16    16.87   95.02   14.9   65.03   69.5648     -4.53476   -1.27858            0.087   0.039
17    17.26   91.02   14.3   66.74   64.0746      2.66539    0.87299            0.323   0.091
18    17.28   98.06   14.3   73.38   68.7740      4.60605    1.52017            0.334   0.289
19    17.87   96.01   16.9   82.87   83.0420     -0.17200   -0.06223            0.445   0.001
20    19.13   101     17.3   95.71   90.4670      5.24297    1.67596            0.290   0.286
21    15.5    90      14.1   .       59.6874      .          .                  0.187   .

Obs   Predicted   Lower 95% Mean   Upper 95% Mean   Lower 95% Indiv   Upper 95% Indiv
 1    24.0122     18.4769          29.5476          14.3922           33.6322
 2    45.7722     42.8583          48.6861          37.3820           54.1624
 3    58.4119     56.0450          60.7787          50.1956           66.6281
 4    59.3859     56.3945          62.3774          50.9685           67.8034
 5    61.5189     59.0808          63.9570          53.2819           69.7560
 6    52.2993     49.3621          55.2365          43.9010           60.6976
 7    68.9121     64.3017          73.5224          59.7929           78.0313
 8    63.0641     60.0190          66.1092          54.6274           71.5008
 9    61.3926     58.8574          63.9278          53.1263           69.6589
10    59.7117     57.4889          61.9345          51.5358           67.8876
11    47.9896     44.4731          51.5061          39.3715           56.6076
12    67.3286     64.1637          70.4935          58.8480           75.8092
13    63.6349     61.0419          66.2278          55.3507           71.9191
14    58.6038     55.2570          61.9506          50.0536           67.1540
15    69.0699     66.6967          71.4431          60.8518           77.2880
16    69.5648     67.2466          71.8829          61.3624           77.7671
17    64.0746     59.6011          68.5481          55.0238           73.1254
18    68.7740     64.2301          73.3178          59.6882           77.8597
19    83.0420     77.7911          88.2929          73.5828           92.5012
20    90.4670     86.2333          94.7008          81.5323           99.4017
21    59.6874     56.2892          63.0856          51.1170           68.2578
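The hats, studentized residuals and Cook's D columns can be reproduced from the design matrix. The sketch below assumes the standard definitions: h_i = x_i'(X'X)^-1 x_i, internally studentized residual r_i = e_i / sqrt(MSE (1 - h_i)), and D_i = r_i^2 h_i / (p (1 - h_i)) with p = 4 parameters. These formulas are not stated in the JMP output itself, and the `invert` helper is hypothetical.

```python
import math

x1 = [10.2, 13.72, 15.43, 14.37, 15, 15.02, 15.12, 15.24, 15.24, 15.28,
      13.78, 15.67, 15.67, 15.98, 16.5, 16.87, 17.26, 17.28, 17.87, 19.13]
x2 = [89, 90.07, 95.08, 98.03, 99, 91.05, 105.6, 100.8, 94, 93.09,
      89, 102, 99, 89.02, 95.09, 95.02, 91.02, 98.06, 96.01, 101]
x3 = [9.3, 12.1, 13.3, 13.4, 13.5, 12.8, 14, 13.5, 14, 13.8,
      12.6, 14, 13.7, 13.9, 14.9, 14.9, 14.3, 14.3, 16.9, 17.3]
y = [25.93, 45.87, 56.2, 58.6, 63.36, 46.35, 68.99, 62.91, 58.13, 59.79,
     56.2, 66.16, 62.18, 57.01, 65.62, 65.03, 66.74, 73.38, 82.87, 95.71]

X = [[1.0, a, b, c] for a, b, c in zip(x1, x2, x3)]
n, p = len(X), len(X[0])

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
xtx_inv = invert(xtx)
beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]

resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
mse = sum(e * e for e in resid) / (n - p)

# Leverage (hat diagonal) of each observation: h_i = x_i' (X'X)^-1 x_i
hats = [sum(row[i] * xtx_inv[i][j] * row[j] for i in range(p) for j in range(p))
        for row in X]

# Internally studentized residuals and Cook's distance
student = [e / math.sqrt(mse * (1 - h)) for e, h in zip(resid, hats)]
cooks = [r * r * h / (p * (1 - h)) for r, h in zip(student, hats)]

for i in range(n):
    print(f"{i + 1:2d}  h = {hats[i]:.3f}  r = {student[i]: .5f}  D = {cooks[i]:.3f}")
```

Observation 11 stands out in this listing, as it does in the JMP table: it has the largest studentized residual and the largest Cook's D.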
Summary of Fit for the Reduced Model (y,x2,x3)
RSquare                      0.937611
RSquare Adj                  0.930271
Root Mean Square Error       3.706967
Mean of Response             61.8515
Observations (or Sum Wgts)   20
Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio    Prob > F
Model       2   3510.7511        1755.38       127.7416   <.0001*
Error      17    233.6073          13.74
C. Total   19   3744.3585
Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|   Lower 95%   Upper 95%   VIF
Intercept   -98.79827   16.82213    -5.87     <.0001*    -134.2899   -63.30669   .
x2          0.6268295   0.197094     3.18     0.0055*    0.2109967   1.0426623   1.2572699
x3          7.2881078   0.581592    12.53     <.0001*    6.0610565   8.5151591   1.2572699

[Figure: Residual by Predicted Plot; y Residual vs. y Predicted]
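With only two predictors, both VIFs equal 1 / (1 - r^2), where r is the sample correlation between x2 and x3 over the 20 observations used in the fit. A minimal sketch of that check, assuming the x2 and x3 columns as listed in the diagnostic table (the helper name `corr` is illustrative):

```python
import math

x2 = [89, 90.07, 95.08, 98.03, 99, 91.05, 105.6, 100.8, 94, 93.09,
      89, 102, 99, 89.02, 95.09, 95.02, 91.02, 98.06, 96.01, 101]
x3 = [9.3, 12.1, 13.3, 13.4, 13.5, 12.8, 14, 13.5, 14, 13.8,
      12.6, 14, 13.7, 13.9, 14.9, 14.9, 14.3, 14.3, 16.9, 17.3]

def corr(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    m_u, m_v = sum(u) / len(u), sum(v) / len(v)
    s_uv = sum((a - m_u) * (b - m_v) for a, b in zip(u, v))
    s_uu = sum((a - m_u) ** 2 for a in u)
    s_vv = sum((b - m_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)

r = corr(x2, x3)
vif = 1 / (1 - r * r)   # identical for x2 and x3 in a two-predictor model
print(f"r(x2, x3) = {r:.4f}, VIF = {vif:.5f}")
```

Dropping x1 brings the VIF for x3 down from about 11.3 in the full model to about 1.26 here, which is why the standard error of the x3 coefficient shrinks so much in the reduced model.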
Test of H0: β1 = 0 vs. Ha: β1 ≠ 0
F = {[SSReg(Full) - SSReg(Reduced)] / (k - g)} / MSE(Full)
\[
F = \frac{(3523.959 - 3510.7511)/(3 - 2)}{13.77} = \frac{13.208}{13.77} = 0.96
\]
The denominator degrees of freedom are those of MSE(Full), that is 16, so the critical value is
F(.05,1,16) = 4.49. Thus the F-statistic is not in the rejection region.
We fail to reject H0: β1 = 0 in the full model.
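The full-versus-reduced F statistic is simple arithmetic on the two ANOVA tables, and for a single dropped term it should equal the square of that term's t ratio in the full model. A sketch using only numbers printed in the JMP output; the critical value F(.05, 1, 16) = 4.49 is a tabled value assumed here, with 16 being the error degrees of freedom of the full model.

```python
# Sums of squares from the two JMP ANOVA tables
ss_reg_full = 3523.9590      # model SS, full model (x1, x2, x3)
ss_reg_reduced = 3510.7511   # model SS, reduced model (x2, x3)
sse_full = 220.3995          # error SS, full model
df_error_full = 16           # error df, full model

k, g = 3, 2                  # number of predictors in full and reduced models

mse_full = sse_full / df_error_full
f_stat = ((ss_reg_full - ss_reg_reduced) / (k - g)) / mse_full

# Assumption: tabled upper 5% point of F with (1, 16) degrees of freedom
f_crit = 4.49
reject = f_stat > f_crit

# With one term dropped, F should match the squared t ratio for x1 (0.98)
t_ratio_x1 = 0.98

print(f"F = {f_stat:.4f}, t^2 = {t_ratio_x1 ** 2:.4f}, reject H0: {reject}")
```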