Stat 301 HW 7 Due: 30 Oct. / 2 Nov. 2015

Midterm II will cover course material from correlations through extrapolation (the lecture on
23 Oct., with a little on 26 Oct.). This is the content of Homeworks 5, 6, and 7. The text material
to be covered is Section 3.7, Chapter 8 (but not 8.6), Chapter 4 (but not 4.12), and Chapter 7.
1. Text problem 7.10 (pp. 378-379), with new questions. The data are in steer.txt. Again, there
is a typo in one value that I have corrected in the data set on the class web page. Remember,
the context for this analysis is to determine whether the consumer bringing in a 300 lb steer
(live weight) was shorted, i.e. received less meat (lower dressed weight) than they should have.
(a) Fit the linear model E Y = β0 + β1 X, where Y is the dressed weight and X is the live
weight. Report the estimated regression coefficients.
(b) Use the linear model to predict the dressed weight for two steers, one with 300 lb live
weight and one with 400 lb live weight.
(c) What is the 95% prediction interval (i.e., for one observation) for a steer with 300 lb live
weight?
(d) Based on the results of the last few questions, does it seem that the customer receiving
only 150 lbs was shorted? Briefly explain why or why not.
(e) Fit a quadratic model E Y = β0 + β1 X + β2 X^2. Report the estimated regression coefficients.
(f) Based on results from fitting the quadratic model (not just the estimated coefficients),
what can you say about lack of fit of the linear model in question 1a?
(g) Use the quadratic model to predict the dressed weight for two steers, one with 300 lb live
weight and one with 400 lb live weight.
(h) Does the choice of model (linear or quadratic) make a substantial difference to the
prediction for the 400 lb live weight steer? For the 300 lb live weight steer?
(i) Briefly explain why what you found in question 1h should be expected.
(j) What is the 95% prediction interval for a steer with 300 lb live weight when you use the
quadratic model?
(k) Based on everything you have done, does it seem that the customer receiving only 150
lbs was shorted? Briefly explain why or why not.
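The fitting and prediction steps in problem 1 can be sketched as follows. This is only a minimal illustration in plain Python, not the software used in the course. The data values are invented for the sketch (they are not the contents of steer.txt), the weights are rescaled to hundreds of lb to keep the arithmetic well conditioned, and the function names (solve, fit_poly, prediction_interval) are hypothetical helpers. The t critical values are typed in from a t table rather than computed.

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def fit_poly(x, y, degree):
    """Least-squares polynomial fit. Returns (coefficients, X'X, residual SD)."""
    X = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    s = math.sqrt(sum(e * e for e in resid) / (len(x) - k))
    return beta, XtX, s

def prediction_interval(x0, beta, XtX, s, t_crit):
    """Prediction and interval for ONE new observation at x0."""
    row = [x0 ** p for p in range(len(beta))]
    yhat = sum(b * v for b, v in zip(beta, row))
    h = sum(v * zi for v, zi in zip(row, solve(XtX, row)))  # x0' (X'X)^-1 x0
    half = t_crit * s * math.sqrt(1.0 + h)
    return yhat, yhat - half, yhat + half

# Invented data: live and dressed weight in hundreds of lb (NOT steer.txt).
live = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 5.0]
dressed = [2.8, 2.5, 3.1, 2.1, 2.9, 3.0, 2.8, 2.4, 3.3]

# 0.975 quantiles of t with 9 - 2 = 7 and 9 - 3 = 6 df, from a t table.
T_CRIT = {1: 2.365, 2: 2.447}

for degree in (1, 2):
    beta, XtX, s = fit_poly(live, dressed, degree)
    print("degree", degree, "coefficients", beta)
    for x0 in (3.0, 4.0):
        yhat, lo, hi = prediction_interval(x0, beta, XtX, s, T_CRIT[degree])
        print("  x0 =", x0, "prediction", yhat, "95% PI", (lo, hi))
```

With the real data, the residual degrees of freedom (and therefore the t critical value) depend on the number of steers in steer.txt, so the table lookup has to be redone.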
2. The data in store.txt were collected as part of a management review of a large metropolitan
department store. The variable of interest, hours worked, is the total number of hours worked
by the clerical staff. So, if 3 people worked 8 hours and one person worked 6 hours, the total
number of hours worked for that day is 3*8 + 6 = 30. The other variables are counts of
the numbers of various types of documents processed by the clerical department that day:
number of pieces of mail, gift certificates, charge account inquiries, change orders, checks
cashed, miscellaneous mail items, and bus tickets.
Fit a multiple regression model that predicts the total number of hours worked from numbers
of each type of document (i.e., the other seven variables). Use results and other information
from that fit to answer the following questions:
(a) Even though you fit a regression model with seven X variables, you are most interested in
two of them: the number of checks and the number of miscellaneous items. Which of those
two variables is more important for predicting the number of hours worked? Explain your
choice. Some additional information that may be useful:
Variable       minimum   maximum   std. dev.
Checks             334      1081       184
Misc. items         30        86        13.8
Note: We talked about this a while ago.
(b) Which of the seven X variables has the highest standardized beta? Which of the seven
X variables has the lowest standardized beta?
Notes: Lowest means “closest to zero”.
Again, we talked about this a while ago. The book discusses standardized betas (also called
standardized regression coefficients) on p. 362.
(c) You are surprised by the negative coefficient for change orders and wonder whether there
is substantial multicollinearity among these X variables. Is there an issue with multicollinearity
for any of the seven variables? If so, which variables are you concerned about? Briefly
explain your answer.
(d) The department store plans to use this model to predict workload (i.e., hours that will need to
be worked) when the clerical department has to process various combinations of types
of documents. Do you have any concerns about extrapolating outside the range of the
data if you have to use this model to predict at three new observations, labeled X1,
X2, and X3 in the store.txt data set? For your convenience, I have also included an
observation labeled mean that has the mean value of each explanatory variable. State
which observations (X1, X2, or X3) you have concerns about and briefly explain why
you have a concern.
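The quantities behind questions 2b through 2d can be sketched in a few lines. This is a hedged illustration, not the course's prescribed method: it assumes standardized betas are computed as b_j * sd(X_j) / sd(Y), uses variance inflation factors as one common multicollinearity diagnostic, and compares the leverage of a new point with the largest leverage among the observed points as one common check for extrapolation. The data are invented and use only two X variables (they are not the contents of store.txt), and all function names are hypothetical helpers.

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def cross_products(X):
    """Add an intercept column and return (Z, Z'Z)."""
    Z = [[1.0] + list(r) for r in X]
    k = len(Z[0])
    ZtZ = [[sum(r[i] * r[j] for r in Z) for j in range(k)] for i in range(k)]
    return Z, ZtZ

def ols(X, y):
    """Least-squares coefficients [b0, b1, ...] for rows of X (no intercept column)."""
    Z, ZtZ = cross_products(X)
    Zty = [sum(r[i] * yi for r, yi in zip(Z, y)) for i in range(len(ZtZ))]
    return solve(ZtZ, Zty)

def sd(v):
    m = sum(v) / len(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def r_squared(X, y):
    beta = ols(X, y)
    fitted = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    m = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - sse / sum((yi - m) ** 2 for yi in y)

def standardized_betas(X, y):
    """b_j * sd(X_j) / sd(Y) for each slope."""
    beta = ols(X, y)
    return [b * sd(col) / sd(y) for b, col in zip(beta[1:], zip(*X))]

def vifs(X):
    """VIF_j = 1 / (1 - R^2 from regressing X_j on the other X variables)."""
    out = []
    for j in range(len(X[0])):
        others = [[v for i, v in enumerate(r) if i != j] for r in X]
        out.append(1.0 / (1.0 - r_squared(others, [r[j] for r in X])))
    return out

def leverage(X, x0):
    """z0' (Z'Z)^-1 z0 for a point x0, where z0 = (1, x0)."""
    _, ZtZ = cross_products(X)
    z0 = [1.0] + list(x0)
    return sum(a * b for a, b in zip(z0, solve(ZtZ, z0)))

# Invented data (NOT store.txt): columns are checks and misc. items.
X = [[334, 30], [520, 45], [610, 86], [700, 52],
     [815, 60], [905, 41], [1000, 75], [1081, 66]]
hours = [92.0, 101.5, 118.0, 110.2, 119.7, 114.3, 131.0, 130.4]
x_new = [1200, 25]  # hypothetical new observation

print("standardized betas:", standardized_betas(X, hours))
print("VIFs:", vifs(X))
print("max observed leverage:", max(leverage(X, r) for r in X))
print("leverage of new point:", leverage(X, x_new))
```

A new point can lie inside the minimum and maximum of every individual X variable and still have a leverage larger than any observed point, which is why the sketch looks at leverage rather than only at the ranges of each variable.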