Statistics 401B Exam 2 Name: November 9, 2004

INSTRUCTIONS: Read the questions carefully and completely. Answer each question and show
work in the space provided. Partial credit will not be given if work is not shown. When asked
to explain, describe, interpret, or comment, do so within the context of the problem. Be sure to
include units where appropriate.
1. [40 pts] In a paper by Atkinson (1986) in Statistical Science, data are given on 35 hill races in
Scotland. The record-winning time (minutes) is the response variable. Distance run (miles)
and the height climbed (feet) are explanatory variables. Refer to the JMP output “Scottish
Hill Races.”
(a) [3] Give the simple linear regression equation that relates time to distance.
(b) [5] Give a 95% confidence interval for the mean change in time associated with each additional mile of distance.
(c) [2] Adding the variable for the height climbed (Climb) to the model that contains Distance explains how much additional variation in the Time?
(d) [5] Is the additional explained variation in (c) statistically significant? Support your
answer with an appropriate test of hypothesis.
(e) [5] In the model with both Distance and Climb, give an interpretation of the estimated
slope coefficient for Climb.
(f) [3] What would be the value of R² for the simple linear regression of Time on Climb?
(g) [5] Is there a significant interaction between Distance and Climb? Support your answer
with the appropriate test of hypothesis.
(h) [3] For the model that includes Distance, Climb and Distance*Climb, give the prediction
equation for a race with a 1000 foot Climb.
(i) [3] Describe the plot of residuals versus distance.
(j) [6] Describe the distribution of residuals and indicate what this tells you about the conditions necessary for multiple regression analysis.
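For parts (d) and (g), the test of an added term compares the error sum of squares of the model without the term (reduced) to that of the model with it (full). The following is a minimal sketch of that partial F statistic. The function name `partial_f` and the sums of squares in the usage lines are hypothetical placeholders; they are not taken from the JMP output.

```python
def partial_f(sse_reduced, sse_full, n, p_full, q=1):
    """Partial F statistic for adding q terms to a regression model.

    sse_reduced: error sum of squares of the model without the added terms
    sse_full:    error sum of squares of the model with the added terms
    n:           number of observations
    p_full:      number of explanatory terms in the full model (intercept excluded)
    q:           number of terms added
    """
    df_error = n - p_full - 1
    if df_error <= 0:
        raise ValueError("not enough observations for the full model")
    mse_full = sse_full / df_error
    return ((sse_reduced - sse_full) / q) / mse_full


# Made-up sums of squares, for illustration only (not the hill race data).
f_stat = partial_f(sse_reduced=500.0, sse_full=300.0, n=35, p_full=2)
print(round(f_stat, 3))
```

When a single term is added (q = 1), this F statistic equals the square of the t ratio reported for that term in the full model, with 1 and n - p - 1 degrees of freedom.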
2. [40 pts] Data are collected on the Age (years) and Height (inches) for 279 females between
the ages of 3 and 20. Refer to the JMP output entitled “Age and Height of Females.”
(a) [3] For the simple linear regression model of Height on Age, why is the estimated intercept not interpretable within the context of the problem?
(b) [5] Is the simple linear regression model of Height on Age useful? Support your answer
with the appropriate test of hypothesis.
(c) [5] For the simple linear model of Height on Age, describe the plot of residuals versus
Age. What does this tell you about the adequacy of the simple linear regression model
of Height on Age?
(d) [3] Give the equation that predicts Height from Age and Age².
(e) [5] Does Age² add significantly to the quadratic (Degree=2) model? Support your answer with the appropriate test of hypothesis.
(f) [5] A more complicated quartic (Degree=4) model is fit. Is this model useful? Support
your answer with the appropriate test of hypothesis.
(g) [5] Which variables add significantly to the quartic (Degree=4) model? Explain briefly.
(h) [4] Compare the quadratic (Degree=2) model to the quartic (Degree=4) model by filling
in the appropriate values in the table below. Round numerical values to 3 decimal places.
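                                 Quadratic (Degree=2)    Quartic (Degree=4)
R²                               ____________________    __________________
Adjusted R²                      ____________________    __________________
RMSE                             ____________________    __________________
Predicted Height for Age = 15    ____________________    __________________
Predicted Height for Age = 20    ____________________    __________________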
(i) [5] Which model, quadratic or quartic, does a better job predicting the Height of females
with Ages from 3 to 20? Explain briefly.
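The summary measures requested in part (h) are all functions of the total and error sums of squares, the sample size n, and the number of explanatory terms p. A minimal sketch of those formulas follows. The function name `fit_summary` and the sums of squares are hypothetical and are not values from the JMP output; only n = 279 comes from the problem statement.

```python
import math


def fit_summary(ss_total, ss_error, n, p):
    """Return (R^2, adjusted R^2, RMSE) for a model with p explanatory terms."""
    df_error = n - p - 1
    r2 = 1.0 - ss_error / ss_total
    # Adjusted R^2 compares mean squares, so it penalizes extra terms.
    adj_r2 = 1.0 - (ss_error / df_error) / (ss_total / (n - 1))
    rmse = math.sqrt(ss_error / df_error)
    return r2, adj_r2, rmse


# Hypothetical sums of squares; n = 279 matches the sample size in the problem.
for label, p, sse in [("Quadratic", 2, 1200.0), ("Quartic", 4, 1150.0)]:
    r2, adj_r2, rmse = fit_summary(ss_total=10000.0, ss_error=sse, n=279, p=p)
    print(f"{label}: R2={r2:.3f} adjR2={adj_r2:.3f} RMSE={rmse:.3f}")
```

Because R² never decreases when terms are added, adjusted R² and RMSE are usually the more informative measures when comparing models of different degree.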
3. [20 pts] The taste of cheese is affected by chemical processes that occur as the cheese ages.
Thirty samples of mature (aged) cheese are submitted to chemical analysis. A panel of tasters
evaluates the samples and a taste score is obtained. The variables are Taste, the panel’s
taste score (higher is better); Acetic, an acetic acid concentration index; H2S, a hydrogen sulfide
concentration index; and Lactic, a lactic acid concentration index. Refer to the JMP output entitled
“The Taste of Cheese.”
(a) [2] Of the three variables, Acetic, H2S and Lactic, which has the highest correlation
with Taste? What is that correlation?
(b) [3] If a simple linear regression of Taste on Acetic is fit, how much of the variation in
Taste would be explained by the explanatory variable, Acetic?
(c) [5] Is the model that contains all three variables, Acetic, H2S and Lactic, the best model?
Explain your answer briefly.
(d) [5] There are three possible 1-variable models, three possible 2-variable models and one
3-variable model. Of these seven models, how can you be sure that the model with H2S
and Lactic is the best?
(e) [2] Use the best model to predict the Taste of cheese that has Acetic=5.0, H2S=3 and
Lactic=1.5. Round your final answer to 2 decimal places.
(f) [3] Is the condition of normally distributed errors satisfied? Explain briefly.
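The count of seven models in part (d) follows from listing every non-empty subset of the three explanatory variables (2^3 - 1 = 7). A short sketch of that enumeration, using only the variable names given in the problem:

```python
from itertools import combinations

variables = ["Acetic", "H2S", "Lactic"]

# Every non-empty subset of the explanatory variables is a candidate model.
models = [
    subset
    for size in range(1, len(variables) + 1)
    for subset in combinations(variables, size)
]

for subset in models:
    print(len(subset), "variable(s):", " + ".join(subset))
```

An all-subsets comparison fits each of these candidates and compares them on a common criterion, which is only practical because the number of explanatory variables here is small.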