STATISTICS 479 Assignment #6 (40 points)

Fall 2013
Instructions: Turn in the programs, the output, the plots and the written answers (when required to
do so) for each part of both questions.
1. A person’s muscle mass is expected to decrease with age. To explore this relationship in women, a
nutritionist selected a random sample of women between the ages of 40 and 80. In the data table
below, x is the age in years and y is a measure of muscle mass. Use a SAS program to fit a simple
linear regression model to these data. Use SAS statements to produce the output needed to
answer each part. You must attach your SAS output, but write the answer to each question in the
form discussed in the text, using information from the output.
x   71  64  43  67  56  73  68  56  76  65  45  58  45  53  49  78
y   82  91 100  68  87  73  78  80  65  84 116  76  97 100 105  77
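As an optional cross-check on the SAS estimates (not a substitute for the required SAS program), the least squares line can be computed directly from the table above. This is a minimal plain-Python sketch that assumes the two rows were transcribed correctly; the names AGE, MASS and least_squares are illustrative only. It uses b1 = Sxy/Sxx and b0 = ȳ − b1·x̄; because the fitted mean changes by b1 per year of age, the fitted change over a 5-year increase is 5·b1.

```python
# Muscle-mass data transcribed from the table above (x = age, y = muscle mass).
AGE = [71, 64, 43, 67, 56, 73, 68, 56, 76, 65, 45, 58, 45, 53, 49, 78]
MASS = [82, 91, 100, 68, 87, 73, 78, 80, 65, 84, 116, 76, 97, 100, 105, 77]


def least_squares(x, y):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sum of squares of x and corrected cross-product sum.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


if __name__ == "__main__":
    b0, b1 = least_squares(AGE, MASS)
    print(f"prediction equation: y_hat = {b0:.4f} + ({b1:.4f}) * x")
    # Part (c): the fitted mean changes by b1 per year, so by 5 * b1 over 5 years.
    print(f"fitted change for a 5-year increase: {5 * b1:.4f}")
```

Compare the printed values with the Parameter Estimates table from proc reg; any disagreement most likely points to a data-entry error.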
Use these data and the regression model y = β0 + β1x + ε in proc reg step(s) to produce a normal
quantile plot of the residuals, a plot of the residuals versus the predicted values, a plot of the
regression line with confidence limits and prediction limits overlaid on a scatter plot of the data,
and a plot of the residuals versus the explanatory variable. Use the plots= option to select the plots
to be output.
Also use the anova table, the estimates table, and the output statistics table as necessary to answer
all questions given below.
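The anova table that SAS prints follows from the decomposition SSTO = SSR + SSE. The sketch below is an illustrative check under that assumption, not SAS output, and `anova_table` is a hypothetical helper. It returns the sums of squares, degrees of freedom, mean squares, F* = MSR/MSE and R² = SSR/SSTO. It does not compute the p-value, because the Python standard library has no F distribution; read the p-value from the SAS output.

```python
def anova_table(x, y):
    """ANOVA quantities for the simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    # Sums of squares; SSTO = SSR + SSE up to rounding.
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    df_model, df_error = 1, n - 2
    msr, mse = ssr / df_model, sse / df_error
    return {
        "SSR": ssr, "SSE": sse, "SSTO": ssto,
        "df_model": df_model, "df_error": df_error, "df_total": n - 1,
        "MSR": msr, "MSE": mse,
        "F": msr / mse,    # F* for H0: beta1 = 0 vs Ha: beta1 != 0
        "R2": ssr / ssto,  # coefficient of determination
    }


if __name__ == "__main__":
    # Muscle-mass data from Problem 1 (x = age, y = muscle mass).
    age = [71, 64, 43, 67, 56, 73, 68, 56, 76, 65, 45, 58, 45, 53, 49, 78]
    mass = [82, 91, 100, 68, 87, 73, 78, 80, 65, 84, 116, 76, 97, 100, 105, 77]
    t = anova_table(age, mass)
    print(f"{'Source':<8}{'DF':>4}{'SS':>12}{'MS':>12}{'F':>10}")
    print(f"{'Model':<8}{t['df_model']:>4}{t['SSR']:>12.3f}{t['MSR']:>12.3f}{t['F']:>10.2f}")
    print(f"{'Error':<8}{t['df_error']:>4}{t['SSE']:>12.3f}{t['MSE']:>12.3f}")
    print(f"{'Total':<8}{t['df_total']:>4}{t['SSTO']:>12.3f}")
    print(f"R-square = {t['R2']:.4f}")
```

The layout mirrors the Source / DF / Sum of Squares / Mean Square / F Value columns of the SAS anova table, which is the form part (a) asks for.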
(a) Using numbers from the SAS output, construct an analysis of variance table that includes
a column for the F statistic to test H0: β1 = 0 against Ha: β1 ≠ 0 at α = .05. State your
decision using the p-value.
(b) Give the least squares estimates of β0 and β1 and their standard errors, respectively. What is
your prediction equation?
(c) Use your prediction equation to estimate the expected loss in mean muscle mass associated
with a 5-year increase in age. Show your work clearly.
(d) What is the coefficient of determination for your regression equation? In your own words,
explain what this number means in terms of the variability in muscle mass.
(e) Construct a 95% confidence interval for β1. State in words what this interval says about the
expected decrease in muscle mass.
(f) Test the hypothesis H0: β1 = 0 against Ha: β1 ≠ 0 at α = .05 using the p-value from the
estimates table. State your decision.
(g) Find the point estimate of the mean muscle mass for all women aged 60 years. Obtain a
95% confidence interval for the mean muscle mass for all women aged 60 years. Remember to
add a case with x = 60 and a missing y value to the data so that SAS computes these.
(h) The following plots must be produced as part of the graphical output from your SAS program
as described at the beginning of this question. Attach these plots to your solution and answer
the questions relating to them, if any.
i. Obtain a graph with plots of the 95% confidence interval and 95% prediction interval
curves for the fitted regression line overlaid on a scatter plot of the original data.
ii. Obtain a scatter plot of the residuals against the x variable. Does this plot indicate whether
the straight-line model is adequate? Explain.
iii. Obtain a normal probability plot of the residuals and a plot of the residuals against the
predicted values. Do these plots indicate that any of the model assumptions are not plausible?
In particular, is the assumption of normal errors reasonable? Explain.
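Parts (e) through (g) and (h) iii rest on a few standard formulas: s(b1) = √(MSE/Sxx), t* = b1/s(b1), and the confidence interval for the mean response at x_h uses s²(Ŷh) = MSE[1/n + (x_h − x̄)²/Sxx], with one extra MSE term for a prediction interval. The plain-Python sketch below is hypothetical (all function names are illustrative) and takes the t critical value as an input, because the standard library has no t quantile function; for n = 16 the table value is t(.975; 14) ≈ 2.145. The normal-plot check correlates the ordered residuals with normal scores at Blom plotting positions (k − .375)/(n + .25); that choice of plotting position is an assumption and may differ slightly from the one SAS uses.

```python
import math
from statistics import NormalDist


def fit_summary(x, y):
    """Least squares fit plus the quantities the interval formulas need."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - 2)
    return {"n": n, "x_bar": x_bar, "sxx": sxx, "b0": b0, "b1": b1,
            "mse": mse, "resid": resid}


def slope_inference(fit, t_crit):
    """Return s(b1), t* for H0: beta1 = 0, and the CI b1 +/- t_crit * s(b1)."""
    se_b1 = math.sqrt(fit["mse"] / fit["sxx"])
    t_star = fit["b1"] / se_b1
    ci = (fit["b1"] - t_crit * se_b1, fit["b1"] + t_crit * se_b1)
    return se_b1, t_star, ci


def mean_response_ci(fit, x_h, t_crit):
    """Point estimate of E{Y_h} at x_h and its confidence interval."""
    y_hat = fit["b0"] + fit["b1"] * x_h
    lev = 1 / fit["n"] + (x_h - fit["x_bar"]) ** 2 / fit["sxx"]
    se = math.sqrt(fit["mse"] * lev)
    return y_hat, (y_hat - t_crit * se, y_hat + t_crit * se)


def prediction_interval(fit, x_h, t_crit):
    """Prediction interval for one new observation at x_h (adds 1 * MSE)."""
    y_hat = fit["b0"] + fit["b1"] * x_h
    lev = 1 / fit["n"] + (x_h - fit["x_bar"]) ** 2 / fit["sxx"]
    se = math.sqrt(fit["mse"] * (1 + lev))
    return y_hat - t_crit * se, y_hat + t_crit * se


def normal_plot_correlation(resid):
    """Correlation of ordered residuals with normal scores (Blom positions)."""
    n = len(resid)
    ordered = sorted(resid)
    # Assumed plotting positions (k - 0.375) / (n + 0.25), k = 1..n.
    scores = [NormalDist().inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]
    m_r, m_s = sum(ordered) / n, sum(scores) / n
    cov = sum((r - m_r) * (s - m_s) for r, s in zip(ordered, scores))
    ss_r = sum((r - m_r) ** 2 for r in ordered)
    ss_s = sum((s - m_s) ** 2 for s in scores)
    return cov / math.sqrt(ss_r * ss_s)


if __name__ == "__main__":
    age = [71, 64, 43, 67, 56, 73, 68, 56, 76, 65, 45, 58, 45, 53, 49, 78]
    mass = [82, 91, 100, 68, 87, 73, 78, 80, 65, 84, 116, 76, 97, 100, 105, 77]
    t_crit = 2.145  # t(.975; 14) from a t table, since n - 2 = 14
    f = fit_summary(age, mass)
    se_b1, t_star, ci_b1 = slope_inference(f, t_crit)
    print(f"s(b1) = {se_b1:.4f}, t* = {t_star:.3f}, 95% CI for beta1: {ci_b1}")
    y60, ci60 = mean_response_ci(f, 60, t_crit)
    print(f"x_h = 60: y_hat = {y60:.3f}, 95% CI for the mean: {ci60}")
    print(f"95% prediction interval at x_h = 60: {prediction_interval(f, 60, t_crit)}")
    print(f"normal-plot correlation: {normal_plot_correlation(f['resid']):.4f}")
```

In simple linear regression t*² equals the F* from the anova table, so parts (a) and (f) should reach the same decision.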
2. It is reasonable to expect that the heavier an automobile is, the less efficient it will be, as
reflected by the miles per gallon (MPG) rating of the vehicle. The following data give the MPG
ratings (y) under city driving conditions and the weights (x) of a random sample of 16 new vehicles.
Automobile ID   Weight, x (lbs.)   MPG, y
A               2620               16.0
B               2875               21.0
C               2320               22.8
D               3215               21.4
E               3440               18.7
F               3510               19.1
G               3570               14.3
H               2790               24.4
I               3150               22.8
J               3240               19.2
K               3670               16.4
L               3730               17.3
M               2200               30.4
N               2465               25.5
O               1835               31.9
P               2045               26.3
Use a SAS program to fit a single-variable regression model and obtain all the residual case statistics
and diagnostic plots discussed in class. Also use proc sgplot to obtain a scatter plot of the data with
each point labeled by its vehicle ID.
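The case statistics that proc reg reports for parts (a) and (b) can be sketched directly for a one-predictor model. This plain-Python version is illustrative only and assumes the table above was transcribed correctly; `case_statistics` is a hypothetical name. Leverages are h_ii = 1/n + (x_i − x̄)²/Sxx; the studentized residual divides e_i by √(MSE(1 − h_ii)); and the studentized deleted residual (what SAS labels RStudent) is t_i = e_i √((n − p − 1)/(SSE(1 − h_ii) − e_i²)). The flag h_ii > 2p/n is a common rule of thumb for x-outliers, not the only possible criterion.

```python
import math

# Vehicle data transcribed from the table above.
IDS = list("ABCDEFGHIJKLMNOP")
WEIGHT = [2620, 2875, 2320, 3215, 3440, 3510, 3570, 2790,
          3150, 3240, 3670, 3730, 2200, 2465, 1835, 2045]
MPG = [16.0, 21.0, 22.8, 21.4, 18.7, 19.1, 14.3, 24.4,
       22.8, 19.2, 16.4, 17.3, 30.4, 25.5, 31.9, 26.3]


def case_statistics(x, y):
    """Residuals, hat diagonals, studentized and studentized deleted residuals."""
    n, p = len(x), 2
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    mse = sse / (n - p)
    # Leverage for simple linear regression: h_ii = 1/n + (x_i - x_bar)^2 / Sxx.
    hat = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Studentized residual r_i = e_i / sqrt(MSE * (1 - h_ii)).
    student = [e / math.sqrt(mse * (1 - h)) for e, h in zip(resid, hat)]
    # Studentized deleted residual t_i = e_i * sqrt((n-p-1) / (SSE(1-h_ii) - e_i^2)).
    rstudent = [e * math.sqrt((n - p - 1) / (sse * (1 - h) - e * e))
                for e, h in zip(resid, hat)]
    return resid, hat, student, rstudent


if __name__ == "__main__":
    resid, hat, student, rstudent = case_statistics(WEIGHT, MPG)
    cutoff = 2 * 2 / len(WEIGHT)  # rule of thumb: flag h_ii > 2p/n
    for vid, h, t in zip(IDS, hat, rstudent):
        flag = "  <- high leverage" if h > cutoff else ""
        print(f"{vid}: h = {h:.3f}, RStudent = {t:.3f}{flag}")
```

For y-outliers, a large |RStudent| relative to a t(n − p − 1) critical value (possibly with a Bonferroni adjustment, if that was the approach discussed in class) is the usual comparison.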
(a) Are there any cases that are x-outliers? Explain.
(b) Are there any cases that are y-outliers? Explain.
(c) The Cook’s D statistics for some of these cases are ‘large’. Explain the reasons for this, using
the fact that Cook’s D is a product of functions of the studentized residual and the hat diagonal.
(d) Suppose that the model is refitted after the vehicles labeled A and O are deleted from the data
set one at a time. Discuss the model fit for each of these compared with the fit of the original
model.
(e) Explain the effect of these cases on the model fit by using the appropriate case statistics output
from the model fit to the original data.
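For parts (c) through (e), Cook’s D can be written as D_i = (r_i²/p)·h_ii/(1 − h_ii), where r_i is the studentized residual, so a case gets a large D when it has a large residual, high leverage, or both. The plain-Python sketch below uses hypothetical helper names (`fit`, `cooks_d_formula`, `cooks_d_by_deletion`). It computes D_i from that product form and, as a check, from its definition by deleting each case, refitting, and summing the squared changes in all fitted values. It also refits without A and without O so that b0, b1, MSE and R² can be compared with the full fit.

```python
import math

IDS = list("ABCDEFGHIJKLMNOP")
WEIGHT = [2620, 2875, 2320, 3215, 3440, 3510, 3570, 2790,
          3150, 3240, 3670, 3730, 2200, 2465, 1835, 2045]
MPG = [16.0, 21.0, 22.8, 21.4, 18.7, 19.1, 14.3, 24.4,
       22.8, 19.2, 16.4, 17.3, 30.4, 25.5, 31.9, 26.3]


def fit(x, y):
    """Least squares fit; returns (b0, b1, MSE, R^2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, sse / (n - 2), 1 - sse / ssto


def cooks_d_formula(x, y, p=2):
    """Cook's D as (r_i^2 / p) * h_ii / (1 - h_ii), r_i = studentized residual."""
    n = len(x)
    b0, b1, mse, _ = fit(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    out = []
    for xi, yi in zip(x, y):
        h = 1 / n + (xi - x_bar) ** 2 / sxx
        r = (yi - b0 - b1 * xi) / math.sqrt(mse * (1 - h))
        out.append((r * r / p) * (h / (1 - h)))
    return out


def cooks_d_by_deletion(x, y, p=2):
    """Cook's D from its definition: sum_j (y_hat_j - y_hat_j(i))^2 / (p * MSE)."""
    b0, b1, mse, _ = fit(x, y)
    out = []
    for i in range(len(x)):
        c0, c1, _, _ = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        shift = sum(((b0 + b1 * xj) - (c0 + c1 * xj)) ** 2 for xj in x)
        out.append(shift / (p * mse))
    return out


if __name__ == "__main__":
    print("full data:  b0=%.3f b1=%.5f MSE=%.3f R2=%.3f" % fit(WEIGHT, MPG))
    for vid in ("A", "O"):
        i = IDS.index(vid)
        reduced = fit(WEIGHT[:i] + WEIGHT[i + 1:], MPG[:i] + MPG[i + 1:])
        print("without %s: b0=%.3f b1=%.5f MSE=%.3f R2=%.3f" % ((vid,) + reduced))
    for vid, d in zip(IDS, cooks_d_formula(WEIGHT, MPG)):
        print(f"{vid}: Cook's D = {d:.3f}")
```

The two Cook’s D functions should agree to rounding error, which is a quick way to see why a case with a moderate residual but a large h_ii can still be influential.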
Due Thursday, November 7, 2013