Homework 6

Stat 462 Homework due Wed. Apr. 7
1. This problem reviews indicator variables. Use the dataset marketshare.txt, which is linked
at www.stat.psu.edu/~rho/462data/. The data are from Appendix dataset C3 on page 679 of the
book, and the variables are described there. I changed the dataset structure in two ways: I
eliminated the ID column and converted the two time columns to a single column, x5_time, that
goes from 1 to 36.
A. Use Minitab’s Best Subsets Regression to identify the best subset of the five x-variables.
What model is “best?” Why?
B. Fit the regression model identified in part A. What is the estimated equation? What is the value
of R²?
C. For the model in part B, plot residuals versus fits. Write a brief interpretation.
D. Based on the regression results, which is more effective for increasing market share – discount
pricing or a package promotion? Explain.
E. In general, what is the most effective strategy for getting a high market share? Explain based
on the regression results.
F. There are four possible combinations of values for discount pricing and package promotion.
For each combination, use the regression results to determine the regression equation relating
y = market share to the other x-variable(s) present in the model identified in part A.
2. This problem is mainly concerned with Chapter 10. Use the ch09pr13.txt dataset that can be
linked from the data set web site. The data are described in exercise 9.13 on pages 377-378 of the
text.
A. Do a regression in which y is predicted from the two variables x2 and x3. Store the residuals.
Then, do a regression in which the predictor x1 is the response and is predicted from x2 and x3.
Store the residuals. Plot the first set of residuals versus the second set of residuals. Briefly
interpret the plot. Note: This is an added-variable plot (as described in Section 10.1). It shows
the relationship between y and x1 after “controlling for” x2 and x3.
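For readers working outside Minitab, the two-regression construction in part A can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the data below are synthetic placeholders, not the ch09pr13.txt values, and the helper name `residuals` is hypothetical. The sketch also computes the slope of the added-variable plot, which should match the coefficient of x1 in the full regression of y on x1, x2, and x3.

```python
import numpy as np

def residuals(response, predictors):
    """Residuals from a least-squares fit of response on predictors plus an intercept."""
    X = np.column_stack([np.ones(len(response)), predictors])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return response - X @ coef

# Synthetic placeholder data (assumption: NOT the ch09pr13.txt values).
rng = np.random.default_rng(0)
n = 40
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + rng.normal(scale=0.3, size=n)

others = np.column_stack([x2, x3])
e_y = residuals(y, others)     # y after removing the linear effect of x2 and x3
e_x1 = residuals(x1, others)   # x1 after removing the linear effect of x2 and x3

# Slope of the added-variable plot. A fit through the origin is enough
# because both residual vectors have mean zero.
av_slope = (e_x1 @ e_y) / (e_x1 @ e_x1)

# Coefficient of x1 in the full regression of y on x1, x2, x3, for comparison.
X_full = np.column_stack([np.ones(n), x1, x2, x3])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

print(av_slope, b_full[1])
```

Plotting `e_y` against `e_x1` with any plotting tool gives the added-variable plot itself.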
B. Calculate columns that contain the three possible interactions between pairs of x-variables.
Then use the Best Subsets procedure to identify the best subset of the six predictors x1, x2, x3,
x1*x2, x1*x3, x2*x3. Using the Cp criterion, what is the best model? What is the Cp value for
that model?
C. Fit the regression model identified in part B. Using the Storage button, store DFITS and
Cook’s D. Also, using the Options button, ask Minitab for the Variance Inflation Factors. What
are the VIF values? What do these values indicate about this situation?
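As a rough guide to what the VIF values in part C measure: the VIF for predictor k is 1 / (1 - R²_k), where R²_k comes from regressing predictor k on the other predictors in the model. The following is a minimal NumPy sketch of that definition, not Minitab's implementation. The function name `vif` and the synthetic columns are assumptions made for illustration only.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (n x p array of predictors, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for k in range(p):
        target = X[:, k]
        # Regress predictor k on an intercept plus all the other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[k] = 1.0 / (1.0 - r2)
    return out

# Synthetic placeholder predictors (assumption: not the homework data).
rng = np.random.default_rng(1)
x1 = rng.normal(loc=5.0, size=50)
x2 = rng.normal(loc=5.0, size=50)
X_demo = np.column_stack([x1, x2, x1 * x2])  # two predictors and their interaction

vifs = vif(X_demo)
print(vifs)
```

Because the synthetic predictors are not centered, the interaction column is strongly related to its parent columns, which is the kind of situation the VIF is designed to flag.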
D. In the Minitab output for part C, what observations are listed as unusual? In each case, what is
the reason?
E. Inspect the column of the worksheet that contains the DFITS. What’s the largest value? Which
observation has that value? What does an excessively large DFITS value indicate? (See page 401
of the text. In the text it’s DFFITS rather than DFITS as in Minitab.)
F. Inspect the column of the worksheet that contains the Cook’s D values. What’s the largest
value? Which observation has that value? What does an excessively large Cook’s distance value
indicate? (See page 402 of the text.)
G. Turn the y-value for the most influential case into a missing value by replacing the y-value
with an asterisk in the worksheet. Fit the model identified in part B. Write the estimated equation.
Then, write the estimated equation found in part C, in which all data were used. Use each
equation (point deleted and point included) to predict y for the case that you deleted.
Note: The difference between the two predicted values is an unstandardized version of DFITS.
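The quantities in parts E through G can all be obtained from a single fit using the leverages (diagonal elements of the hat matrix). The sketch below uses the standard textbook formulas for DFFITS, Cook's distance, and the unstandardized difference between the fitted value with and without case i. It is an illustration, not Minitab's code: the function name `influence` and the synthetic data are assumptions.

```python
import numpy as np

def influence(X, y):
    """Influence measures for a least-squares fit.

    X is the n x p design matrix INCLUDING the intercept column.
    Returns (dffits, cooks_d, dfit_raw), each of length n.
    """
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix
    h = np.diag(H)                            # leverages
    e = y - H @ y                             # ordinary residuals
    sse = e @ e
    mse = sse / (n - p)
    # Studentized deleted residuals.
    t = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e**2))
    dffits = t * np.sqrt(h / (1 - h))
    cooks_d = e**2 / (p * mse) * h / (1 - h) ** 2
    # Fitted value with all cases minus fitted value with case i deleted.
    dfit_raw = h * e / (1 - h)
    return dffits, cooks_d, dfit_raw

# Synthetic placeholder data (assumption: not the ch09pr13.txt values).
rng = np.random.default_rng(2)
n = 25
x_a = rng.normal(size=n)
x_b = rng.normal(size=n)
X_demo = np.column_stack([np.ones(n), x_a, x_b])
y_demo = 3.0 + 1.5 * x_a - 2.0 * x_b + rng.normal(size=n)

dffits, cooks_d, dfit_raw = influence(X_demo, y_demo)
print(np.argmax(np.abs(dffits)), np.argmax(cooks_d))
```

Refitting with one case removed, as part G asks, and differencing the two predictions for that case should reproduce the corresponding entry of `dfit_raw`.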
3. Use this data set:
X 1 2 3 4 5 6
Y 2 4 5 7 10 20
a. Find an unstandardized deleted residual for the 6th observation. Show how you found this.
b. Find an unstandardized deleted residual for the 3rd observation. Show how you found this.
Note: You can use Minitab for assistance.
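For anyone checking their work without Minitab, here is a small NumPy sketch of two equivalent routes to an unstandardized deleted residual for the data above: refitting the line without observation i and predicting that observation, or using the shortcut e_i / (1 - h_ii) from the full fit. The function names are hypothetical, and observations are indexed from 0 in the code, so the 6th observation is index 5.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 4, 5, 7, 10, 20], dtype=float)

def deleted_residual_refit(x, y, i):
    """Fit the line without case i, then return y[i] minus its prediction."""
    keep = np.arange(len(x)) != i
    slope, intercept = np.polyfit(x[keep], y[keep], 1)
    return y[i] - (intercept + slope * x[i])

def deleted_residual_shortcut(x, y, i):
    """Use the full-data fit: ordinary residual divided by (1 - leverage)."""
    X = np.column_stack([np.ones_like(x), x])
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    e = y - H @ y
    return e[i] / (1 - H[i, i])

for i in (5, 2):  # 6th and 3rd observations (0-based indices)
    print(i + 1, deleted_residual_refit(x, y, i), deleted_residual_shortcut(x, y, i))
```

The two functions should agree up to rounding, which is a useful check on a hand calculation.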