Stat 462 Nov. 5
Ideally, answers are due at the end of the period. If you can’t finish, I’ll accept them at the Monday,
Nov. 10 class.
Activity 1: Use the dataset peru.txt. It can be accessed at www.stat.psu.edu/~rho/462data/. Copy and
paste the dataset into Minitab. The dataset gives physical measurements for 39 Peruvian Indians who have
migrated from a rural village to an urban area.
The y-variable (response variable) is systolic blood pressure in C9.
Predictor variables in the dataset are age, years since migrating, weight (kg), height (mm), skinfold
measurements of the chin, forearm, and calf, and pulse rate.
First, create a new predictor variable using Calc>Calculator in Minitab. Call the new variable fraclife,
and set it equal to Years/Age. This is the fraction of a person’s life spent living in the urban area.
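The Calc>Calculator step is simply a row-by-row division. As a minimal sketch outside Minitab, assuming the Years and Age columns have already been read into two Python lists (the function name fraclife is hypothetical and just mirrors the new column name), the same computation is:

```python
def fraclife(years, ages):
    """Return Years/Age for each person: the fraction of life lived in the urban area."""
    if len(years) != len(ages):
        raise ValueError("years and ages must have the same length")
    result = []
    for y, a in zip(years, ages):
        if a <= 0:
            raise ValueError("age must be positive")
        result.append(y / a)
    return result


# Hypothetical values for illustration only (not rows from peru.txt).
print(fraclife([10, 5], [40, 20]))  # [0.25, 0.25]
```

Each value should fall between 0 and 1, since years since migrating cannot exceed age.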
A. Run a best subsets regression in which y = systolic blood pressure and all 9 predictor variables
(including fraclife) are potential predictors of y.
What are the best two models for predicting y? Specify each model by listing the variables it contains.
Explain why you think these are the best two models.
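Best subsets regression fits every combination of the candidate predictors and summarizes each fit. The sketch below is a hedged NumPy reimplementation for intuition, not Minitab's own code. It assumes the 9 predictor columns are in an n × 9 array `X` and systolic blood pressure is in `y`; the helper names and the choice of keeping two models per size (`per_size=2`) are assumptions. It reports R², adjusted R², and Mallows' Cp, computed as Cp = SSE_p / MSE_full − (n − 2p), where p counts the intercept.

```python
from itertools import combinations

import numpy as np


def sse(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def best_subsets(X, y, names, per_size=2):
    """For each model size k, keep the `per_size` subsets with the highest R^2.

    Returns tuples (R^2, adjusted R^2, Mallows' Cp, variable names).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = X.shape
    sst = float(((y - y.mean()) ** 2).sum())
    mse_full = sse(X, y) / (n - m - 1)  # MSE of the model containing all m predictors
    results = []
    for k in range(1, m + 1):
        rows = []
        for cols in combinations(range(m), k):
            s = sse(X[:, list(cols)], y)
            p = k + 1  # number of parameters, including the intercept
            r2 = 1 - s / sst
            r2_adj = 1 - (s / (n - p)) / (sst / (n - 1))
            cp = s / mse_full - (n - 2 * p)
            rows.append((r2, r2_adj, cp, [names[c] for c in cols]))
        rows.sort(key=lambda row: row[0], reverse=True)
        results.extend(rows[:per_size])
    return results
```

One common way to read the output is to favor models with high adjusted R² and with Cp small and close to p. By construction the full model always has Cp = p exactly, so that row says nothing about fit.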
B. Run a stepwise regression (with forward and backward steps allowed) in which y = systolic blood
pressure and all 9 predictor variables (including fraclife) are potential predictors of y. What model was
selected?
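Stepwise regression alternates forward steps (add the most useful candidate) and backward steps (drop a term that no longer earns its place). Minitab states its criteria as alpha-to-enter and alpha-to-remove p-values; to avoid needing a t-distribution routine, this hedged sketch uses partial F thresholds instead (`f_enter = f_remove = 4.0`, an assumption that roughly matches a 5% level for moderate n), so borderline decisions may differ from Minitab's. `X` is assumed to be an n × 9 NumPy array of predictors and `y` the response; the function names are hypothetical.

```python
import numpy as np


def _sse(X, y, cols):
    """Residual sum of squares for an OLS fit of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def partial_f(X, y, small, big):
    """Partial F statistic comparing nested models that differ by one term."""
    s_small, s_big = _sse(X, y, small), _sse(X, y, big)
    df = len(y) - len(big) - 1
    return (s_small - s_big) / (s_big / df)


def stepwise(X, y, f_enter=4.0, f_remove=4.0, max_steps=100):
    """Forward/backward stepwise selection; thresholds are assumptions, not Minitab's defaults."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    model = []
    for _ in range(max_steps):
        changed = False
        # Forward step: add the candidate with the largest partial F, if it exceeds f_enter.
        candidates = [c for c in range(X.shape[1]) if c not in model]
        if candidates:
            fs = {c: partial_f(X, y, model, model + [c]) for c in candidates}
            best = max(fs, key=fs.get)
            if fs[best] > f_enter:
                model.append(best)
                changed = True
        # Backward step: drop the term with the smallest partial F, if it falls below f_remove.
        if model:
            fs = {c: partial_f(X, y, [d for d in model if d != c], model) for c in model}
            worst = min(fs, key=fs.get)
            if fs[worst] < f_remove:
                model.remove(worst)
                changed = True
        if not changed:
            break
    return sorted(model)
```

Keeping `f_enter >= f_remove` prevents a term from being removed immediately after it enters.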
C. Run a backward elimination procedure in which y = systolic blood pressure and all 9 predictor
variables (including fraclife) are potential predictors of y. What model was selected?
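Backward elimination starts from the full model and repeatedly drops the weakest term. For a single term, the partial F statistic equals the square of its t statistic, which this hedged sketch uses. The cutoff `f_remove=4.0` is an assumption (Minitab phrases its criterion as an alpha-to-remove p-value), `X` is assumed to be an n × 9 NumPy array, and the function name is hypothetical.

```python
import numpy as np


def backward_elimination(X, y, names, f_remove=4.0):
    """Drop the term with the smallest t^2 (= partial F) until every t^2 >= f_remove."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    model = list(range(X.shape[1]))  # start with all predictors
    while model:
        A = np.column_stack([np.ones(n)] + [X[:, c] for c in model])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        mse = float(resid @ resid) / (n - A.shape[1])
        cov = mse * np.linalg.inv(A.T @ A)
        t = beta[1:] / np.sqrt(np.diag(cov)[1:])  # skip the intercept
        i = int(np.argmin(t ** 2))
        if t[i] ** 2 >= f_remove:
            break  # every remaining term passes the cutoff
        model.pop(i)
    return [names[c] for c in model]
```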
D. Run a multiple regression (to predict systolic blood pressure) using the predictor variables in the
model that you think is the best for predicting y.
Write the equation (with estimated coefficients). Also, comment on the statistical significance of each
variable in the model.
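Minitab's regression output lists each coefficient with its standard error, t statistic, and p-value. This hedged NumPy sketch reproduces the first three for a chosen set of predictors (assumed to be in an n × k array `X`) and formats the fitted equation; the function names are hypothetical. It does not compute p-values, so as a rough guide only, |t| above about 2 corresponds to p < 0.05 when the error degrees of freedom are moderate.

```python
import numpy as np


def ols_table(X, y, names):
    """Return rows (name, coefficient, standard error, t) and the error degrees of freedom."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    df = n - A.shape[1]
    mse = float(resid @ resid) / df
    se = np.sqrt(np.diag(mse * np.linalg.inv(A.T @ A)))
    rows = [(name, float(b), float(s), float(b / s))
            for name, b, s in zip(["Constant"] + list(names), beta, se)]
    return rows, df


def equation(rows, response="y"):
    """Format the fitted equation with three decimals, e.g. 'y = 1.000 + 2.000 x0'."""
    terms = " ".join(f"{'+' if b >= 0 else '-'} {abs(b):.3f} {name}"
                     for name, b, _, _ in rows[1:])
    return f"{response} = {rows[0][1]:.3f} {terms}"
```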
E. For the model in part D, graph residuals versus fits and interpret the plot. As your answer, you may
submit a “by hand” sketch of the plot along with your interpretation.
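A residuals-versus-fits plot should look like a horizontal band of points scattered randomly around zero; curvature suggests a missing term, and a fan shape suggests non-constant variance. This hedged sketch computes the two plotted quantities and adds a crude numeric check for fanning (comparing residual spread in the lower and upper halves of the fits); both function names are hypothetical, and `X` is assumed to hold the part D predictors.

```python
import numpy as np


def fits_and_residuals(X, y):
    """OLS fitted values and residuals (intercept included), the two axes of the plot."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fits = A @ beta
    return fits, y - fits


def spread_by_fit_half(fits, resid):
    """Residual SD for the lower and upper halves of the fits; a large ratio hints at fanning."""
    order = np.argsort(fits)
    half = len(order) // 2
    low, high = resid[order[:half]], resid[order[half:]]
    return float(low.std(ddof=1)), float(high.std(ddof=1))
```

If matplotlib is available, `plt.scatter(fits, resid)` followed by `plt.axhline(0)` draws the plot itself. With an intercept in the model, the residuals always sum to zero and are uncorrelated with the fits, so any visible pattern reflects the shape of the relationship rather than the overall level.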
F. For the model in part D, perform a test of normality on the residuals. Interpret the result.
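Minitab's Stat > Basic Statistics > Normality Test defaults to the Anderson-Darling test, where a small p-value is evidence that the residuals are not normal. The stdlib-only sketch below computes the Anderson-Darling statistic with the mean and SD estimated from the data, plus the small-sample adjusted statistic. Its p-value uses the D'Agostino and Stephens piecewise approximation (the one R's nortest::ad.test uses), which is an assumption here and may differ slightly from Minitab's reported value. The function name is hypothetical.

```python
import math


def anderson_darling_normal(data):
    """Return (A^2, adjusted A^2, approximate p-value) for a test of normality."""
    x = sorted(data)
    n = len(x)
    if n < 8:
        raise ValueError("need at least 8 observations")
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    z = [(v - mean) / sd for v in x]
    # Phi(z) and 1 - Phi(z) via erfc, which keeps the far tails accurate.
    lower = [0.5 * math.erfc(-zi / math.sqrt(2)) for zi in z]
    upper = [0.5 * math.erfc(zi / math.sqrt(2)) for zi in z]
    s = sum((2 * i + 1) * (math.log(lower[i]) + math.log(upper[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    a2_star = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    # Assumed p-value approximation (D'Agostino and Stephens).
    if a2_star >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_star + 0.0186 * a2_star ** 2)
    elif a2_star >= 0.34:
        p = math.exp(0.9177 - 4.279 * a2_star - 1.38 * a2_star ** 2)
    elif a2_star >= 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_star - 59.938 * a2_star ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_star - 223.73 * a2_star ** 2)
    return a2, a2_star, min(max(p, 0.0), 1.0)
```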
Activity 2: Use the dataset plywood.txt at the same website used for Activity 1. The data are from
an experiment done to explore factors that affect the sawing of wood for making plywood. Y =
maximum circular force (torque) that can be applied to spin a log during cutting before failure,
X1 = diameter of the log being cut, X2 = depth of the inserting mechanism that holds the log, and
X3 = operating temperature of the saw. In addition, use Minitab to calculate the interactions X1X2,
X1X3, and X2X3. Use this set of six variables as potential predictors of Y, and identify the best
model for predicting Y. In your answer,
A. Give the estimated “final model” (i.e., write the estimated equation).
B. Briefly indicate how you identified this model.
C. Briefly discuss a plot of residuals versus fits and a normality test of residuals for your final
model.
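The interaction columns X1X2, X1X3, and X2X3 for plywood.txt are row-by-row products of the original predictors. A minimal sketch, assuming X1, X2, and X3 have been read into equal-length Python lists (the function name is hypothetical), builds the six candidate predictors:

```python
def add_interactions(x1, x2, x3):
    """Return X1, X2, X3 and the pairwise products X1X2, X1X3, X2X3."""
    if not (len(x1) == len(x2) == len(x3)):
        raise ValueError("columns must have equal length")
    return {
        "X1": list(x1),
        "X2": list(x2),
        "X3": list(x3),
        "X1X2": [a * b for a, b in zip(x1, x2)],
        "X1X3": [a * c for a, c in zip(x1, x3)],
        "X2X3": [b * c for b, c in zip(x2, x3)],
    }
```

These six columns then serve as the candidate pool for the same best subsets, stepwise, and backward elimination procedures used in Activity 1.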