STATISTICS 401D Spring 2016 Laboratory Assignment 8

1. Thirteen specimens of Cu-Ni alloys with varying iron content (in percent) were submerged
in sea water for 60 days, and the weight loss due to corrosion was recorded in units of milligrams per square
decimeter per day. In a study to examine the dependency of corrosion (y) on iron content (x), a
simple linear regression model was fitted to the data.
Specimen    x, Fe %    y, Weight Loss (mg/dm²/day)
    1         0.01        127.6
    2         0.48        124.0
    3         0.71        110.8
    4         0.95        103.9
    5         1.19        101.5
    6         0.01        130.2
    7         0.48        122.0
    8         1.44         92.3
    9         0.71        113.2
   10         1.96         83.7
   11         0.01        128.0
   12         1.44         91.7
   13         1.96         86.3
Answer the following questions. For parts a) to d), do all computations by hand using
these quantities computed from the data:
    Σy = 1,415.2,  Σy² = 157,447.7,  Σx = 11.35,  Σx² = 15.6183,  Σxy = 1,098.628
a) Plot the data in a y vs. x scatter plot. Does it appear that a simple linear regression model would
be a good fit?
b) Use the LS method to fit a simple linear regression model. What is your prediction equation?
c) Construct an analysis of variance table for the regression. Use the F-ratio to perform a test of
H0: β1 = 0 vs. Ha: β1 ≠ 0 using α = .05.
d) Compute a lack of fit test for this model and report the results in an ANOVA table where SSLack
and SSPexp are shown as a partition of SSE, as demonstrated in the class example. What is
your conclusion from this test? Use α = .05.
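As an optional check on the hand computations in parts b) to d), the steps can be sketched in Python. This is a minimal sketch, not the class's worked example: the function names fit_from_sums and pure_error_ss are illustrative, and the lack-of-fit split assumes that pure error is measured by the spread of y among specimens sharing the same Fe % value.

```python
# Sketch: LS fit, regression ANOVA, and lack-of-fit partition.
# All names here are illustrative and not part of the assignment.

def fit_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """LS estimates and ANOVA pieces for y = b0 + b1*x from summary sums."""
    sxx = sum_x2 - sum_x ** 2 / n        # corrected sum of squares of x
    syy = sum_y2 - sum_y ** 2 / n        # total (corrected) sum of squares
    sxy = sum_xy - sum_x * sum_y / n     # corrected sum of cross products
    b1 = sxy / sxx
    b0 = sum_y / n - b1 * sum_x / n
    ss_reg = b1 * sxy                    # regression SS, 1 df
    ss_err = syy - ss_reg                # error SS, n - 2 df
    ms_err = ss_err / (n - 2)
    return {"b0": b0, "b1": b1, "SSReg": ss_reg, "SSE": ss_err,
            "SSTot": syy, "MSE": ms_err, "F": ss_reg / ms_err}


def pure_error_ss(x, y):
    """SS of y about its group mean within each distinct x, and its df."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    ss = sum(sum((v - sum(g) / len(g)) ** 2 for v in g)
             for g in groups.values())
    return ss, len(x) - len(groups)


x = [0.01, 0.48, 0.71, 0.95, 1.19, 0.01, 0.48,
     1.44, 0.71, 1.96, 0.01, 1.44, 1.96]
y = [127.6, 124.0, 110.8, 103.9, 101.5, 130.2, 122.0,
     92.3, 113.2, 83.7, 128.0, 91.7, 86.3]

fit = fit_from_sums(13, 11.35, 1415.2, 15.6183, 157447.7, 1098.628)
ss_pe, df_pe = pure_error_ss(x, y)       # pure experimental error
ss_lof = fit["SSE"] - ss_pe              # lack of fit = SSE - pure error
df_lof = (len(x) - 2) - df_pe
f_lof = (ss_lof / df_lof) / (ss_pe / df_pe)

print(fit)
print("lack of fit:", ss_lof, df_lof, "pure error:", ss_pe, df_pe, "F:", f_lof)
```

Each F value still has to be compared with the tabled F percentile for its degrees of freedom at α = .05; the sketch does not look those up.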
For the rest of the problem you may use a JMP analysis using the data table corrosion.jmp. Attach
the output to your solution.
e) Use JMP to obtain the answers to parts a) to d).
f) Compute the predicted values and residuals. Obtain plots of the residuals against Fe % and the
predicted values (ŷ), respectively. Do these two plots suggest any inadequacies of this model?
Explain why you reached your conclusion.
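The quantities plotted in part f) can also be produced outside JMP. The following Python sketch computes the fitted values and residuals directly from the corrosion data; the variable names are illustrative, and the plotting itself is left to whatever tool is at hand.

```python
# Sketch: fitted values and residuals for the corrosion data.
# Variable names are illustrative, not taken from the assignment or JMP.
x = [0.01, 0.48, 0.71, 0.95, 1.19, 0.01, 0.48,
     1.44, 0.71, 1.96, 0.01, 1.44, 1.96]
y = [127.6, 124.0, 110.8, 103.9, 101.5, 130.2, 122.0,
     92.3, 113.2, 83.7, 128.0, 91.7, 86.3]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx                 # LS slope
b0 = y_bar - b1 * x_bar        # LS intercept

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# One row per specimen: Fe %, predicted value, residual.
for xi, fi, ei in zip(x, fitted, residuals):
    print(f"{xi:5.2f}  {fi:8.3f}  {ei:7.3f}")
```

Plotting the residual column against the first and second columns gives the two plots requested. With an intercept in the model, the residuals sum to zero, which is a quick check on the arithmetic.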
2. One task assigned to foresters is to estimate the potential lumber harvest of a forest. This is typically
done by selecting a sample of trees, making some nondestructive measures of these trees, then using
a prediction formula to estimate lumber yield. The prediction formula is obtained from a previous
study involving a sample of trees for which actual lumber yields were available from harvesting. The
data, listed at the end of this assignment, are from such a study and include the volume of lumber in cubic feet (y) and
several predictor variables: the diameter of the trunk at breast height (about 4 feet), in inches (x1),
the height, in feet (x2), and the diameter of the trunk at 16 feet of height, in inches (x3), measured
on a random sample of 20 trees. Use JMP to perform a multiple regression analysis to fit the full
model
    y = β0 + β1 x1 + β2 x2 + β3 x3 + ε
to this data.
Part I
The following are the X′X matrix, X′y vector, y′y, and the inverse of the X′X matrix, respectively,
computed for this data (results rounded to a manageable number of digits):

X′X =
    [   20         310.63        1910.94        276.5    ]
    [  310.63     4889.0619     29743.6588     4348.945  ]
    [ 1910.94    29743.6588    183029.3358    26486.925  ]
    [  276.5      4348.945      26486.925      3873.69   ]

X′y = [ 1237.03   19659.1047   118970.1884   17516.935 ]′

y′y = 80256.5195

(X′X)⁻¹ =
    [ 21.782755949   −0.439044462    −0.228898883     0.503209019  ]
    [ −0.439044462    0.1620460292    0.0040467803   −0.178259035  ]
    [ −0.228898883    0.0040467803    0.0029279638   −0.008225088  ]
    [  0.503209019   −0.178259035    −0.008225088     0.2207091285 ]

Use these to perform the following calculations using matrix algebra. You may recalculate (X′X)⁻¹
if you want more accuracy in your answers. The files X.txt, XPX.txt, XPy.txt, and XPXI.txt are
available in the Downloads folder at the course website for use with other software. Show work.
(a) Construct the normal equations.
(b) Calculate the estimate β̂ by forming the product of (X′X)⁻¹ and X′y.
(c) Calculate s² using SSE = y′y − β̂′X′y. Use
(d) Calculate the standard errors s_β̂1, s_β̂2, and s_β̂3 of β̂1, β̂2, and β̂3, respectively.
(e) Calculate the predicted values ŷ using the fact that ŷ = Xβ̂.
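The matrix steps in parts (b) to (d) can be checked with a few lines of Python. This is a minimal sketch with two assumptions: s² is taken as SSE/(n − p) with n = 20 trees and p = 4 coefficients, and the inverse is used exactly as printed. Because the printed inverse is rounded, SSE and the standard errors from this sketch carry some rounding error. The helper mat_vec and the constant names are illustrative.

```python
import math

# Values copied from the assignment, rounded as printed there.
XTX_INV = [
    [21.782755949, -0.439044462, -0.228898883, 0.503209019],
    [-0.439044462, 0.1620460292, 0.0040467803, -0.178259035],
    [-0.228898883, 0.0040467803, 0.0029279638, -0.008225088],
    [0.503209019, -0.178259035, -0.008225088, 0.2207091285],
]
XTY = [1237.03, 19659.1047, 118970.1884, 17516.935]
YTY = 80256.5195
N, P = 20, 4   # assumed: 20 trees, intercept plus 3 predictors


def mat_vec(a, v):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


# (b) beta-hat = (X'X)^-1 X'y
beta_hat = mat_vec(XTX_INV, XTY)

# (c) SSE = y'y - beta-hat' X'y, then s^2 = SSE / (n - p)
sse = YTY - sum(b * c for b, c in zip(beta_hat, XTY))
s2 = sse / (N - P)

# (d) standard errors: sqrt(s^2 * j-th diagonal element of (X'X)^-1)
std_errors = [math.sqrt(s2 * XTX_INV[j][j]) for j in range(P)]

print("beta_hat:", beta_hat)
print("SSE:", sse, "s2:", s2)
print("standard errors:", std_errors)
```

For part (e), the same mat_vec helper applied to the 20 × 4 design matrix X (a column of ones followed by x1, x2, x3) and beta_hat gives ŷ.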
Part II
Write your answers to the following questions on separate pages using numbers extracted from a JMP
analysis of the data performed on the data table lumber.JMP. No hand calculations needed for this
part. Attach the JMP outputs to your answer.
(a) Obtain the correlations and the scatter plot matrix for y, x1, x2, and x3.
(b) Report β̂0, β̂1, β̂2, and β̂3.
(c) Report s², s_β̂0, s_β̂1, s_β̂2, and s_β̂3.
(d) Report 95% confidence intervals for β1, β2, and β3, respectively.
(e) Construct an analysis of variance for the above regression. Report the coefficient of determination.
(f) Use the F-test statistic to test H0: β1 = β2 = β3 = 0 vs. Ha: at least one β is not zero, and
report the p-value for the test. State your decision.
(g) Use the t-test statistic to test H0: β2 = 0 vs. Ha: β2 ≠ 0, and report the p-value for the test.
State your decision.
(h) Obtain a 95% confidence interval for β2. Use this interval to test H0: β2 = 0 vs. Ha: β2 ≠ 0.
What is the α level of this test?
(i) Obtain a 95% confidence interval for the mean lumber volume for a population of trees (of the
same variety) with x1 = 15.5, x2 = 90, x3 = 14.1. Describe in words what this interval tells
you.
(j) Obtain a 95% prediction interval for the lumber volume y21 of a tree (of the same variety) with
x1 = 15.5, x2 = 90, x3 = 14.1. Describe in words what this interval tells you.
(k) Is there evidence in your analysis to indicate any multicollinearity problems in the estimation
of the coefficients? Discuss how you determined your answer.
(l) Obtain plots of the residuals vs. predicted values, x1, x2, and x3, respectively. Is any pattern
of the types discussed in class observed in these plots? Give your interpretation.
(m) Obtain a normal probability plot of the studentized residuals. State the model assumption that
can be examined using this plot. Does this assumption appear to be plausible here?
(n) Fit the regression model
    y = β0 + β2 x2 + β3 x3 + ε
to the above data. Use values from the resulting output and the previous output to construct
an F-statistic to test H0: β1 = 0 vs. Ha: β1 ≠ 0 in the full model. Perform the test at α = .05.
(o) Examine to what extent multicollinearity problems affect the fit of the model in part (n) compared to the full model. If the model in part (n) is better, discuss reasons for the improvement.
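The F-statistic in part (n) compares the error sums of squares of the reduced and full models. A minimal Python sketch of that comparison follows. The function name partial_f is illustrative, and the SSE values passed to it are hypothetical placeholders, not results for the lumber data; the real values come from the two JMP outputs.

```python
# Sketch: extra-sum-of-squares (partial) F statistic for nested models.
# partial_f is an illustrative name, not a JMP or course-provided function.

def partial_f(sse_reduced, sse_full, df_dropped, df_error_full):
    """F for H0: the coefficients dropped from the full model are all zero."""
    numerator = (sse_reduced - sse_full) / df_dropped
    denominator = sse_full / df_error_full   # MSE of the full model
    return numerator / denominator


# Hypothetical SSE values for illustration only.
# Dropping x1 removes 1 coefficient; the full model has 20 - 4 = 16 error df.
f_stat = partial_f(sse_reduced=120.0, sse_full=100.0,
                   df_dropped=1, df_error_full=16)
print(f_stat)
```

The resulting value is compared with the tabled F percentile on 1 and 16 degrees of freedom at α = .05. With a single dropped coefficient, this F equals the square of the t-statistic for β1 in the full model.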
Data:

  Diameter at                   Diameter
 Breast Height     Height      at 16 feet     Volume
      x1             x2            x3            y
    10.20           89.00          9.3         25.93
    13.72           90.07         12.1         45.87
    15.43           95.08         13.3         56.20
    14.37           98.03         13.4         58.60
    15.00           99.00         13.5         63.36
    15.02           91.05         12.8         46.35
    15.12          105.60         14.0         68.99
    15.24          100.80         13.5         62.91
    15.24           94.00         14.0         58.13
    15.28           93.09         13.8         59.79
    13.78           89.00         12.6         56.20
    15.67          102.00         14.0         66.16
    15.67           99.00         13.7         62.18
    15.98           89.02         13.9         57.01
    16.50           95.09         14.9         65.62
    16.87           95.02         14.9         65.03
    17.26           91.02         14.3         66.74
    17.28           98.06         14.3         73.38
    17.87           96.01         16.9         82.87
    19.13          101.00         17.3         95.71
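For readers working outside JMP, the summary matrices used in Part I can be rebuilt from the data above. The Python sketch below forms X′X, X′y, and y′y with plain lists; it assumes the design matrix has a leading column of ones for the intercept, and the variable names are illustrative.

```python
# Sketch: rebuild X'X, X'y, and y'y from the lumber data.
# Assumes X = [1, x1, x2, x3]; names are illustrative only.
x1 = [10.20, 13.72, 15.43, 14.37, 15.00, 15.02, 15.12, 15.24, 15.24, 15.28,
      13.78, 15.67, 15.67, 15.98, 16.50, 16.87, 17.26, 17.28, 17.87, 19.13]
x2 = [89.00, 90.07, 95.08, 98.03, 99.00, 91.05, 105.60, 100.80, 94.00, 93.09,
      89.00, 102.00, 99.00, 89.02, 95.09, 95.02, 91.02, 98.06, 96.01, 101.00]
x3 = [9.3, 12.1, 13.3, 13.4, 13.5, 12.8, 14.0, 13.5, 14.0, 13.8,
      12.6, 14.0, 13.7, 13.9, 14.9, 14.9, 14.3, 14.3, 16.9, 17.3]
y = [25.93, 45.87, 56.20, 58.60, 63.36, 46.35, 68.99, 62.91, 58.13, 59.79,
     56.20, 66.16, 62.18, 57.01, 65.62, 65.03, 66.74, 73.38, 82.87, 95.71]

# Columns of the design matrix X.
columns = [[1.0] * len(y), x1, x2, x3]

# Entry (i, j) of X'X is the inner product of columns i and j.
xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
       for ci in columns]
xty = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
yty = sum(v * v for v in y)

for row in xtx:
    print([round(v, 4) for v in row])
print([round(v, 4) for v in xty])
print(round(yty, 4))
```

The printed rows can be compared entry by entry with the X′X matrix and X′y vector given in Part I.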
Due Tuesday, April 19, 2016 (turn in during the first 15 min. of the lab)