Stat 401G Lab 6: Solution Fall 2012

1. What is the relationship between the distance walked and the time it takes to deliver soft drinks
to vending machines? An industrial engineer collected 20 observations on the delivery time (in
minutes), number of cases delivered and distance walked (in feet). Delivery time includes time
to unload the cases from the delivery truck, walk to the machine and load the machine with the
soft drinks. The data are given below.
Cases   Distance (ft)   Time (min)      Cases   Distance (ft)   Time (min)
  7         560           16.68           4         255           13.50
  3         220           11.50           6         462           19.75
  3         340           12.03           9         448           24.00
  4          80           14.88           6         200           15.35
  6         150           13.75           7         132           19.00
  7         330           18.11           3          36            9.50
  2         110            8.00          10         140           17.90
  7         210           17.83           9         450           18.75
  5         605           21.50           8         635           19.83
 10         215           21.00           4         150           10.75
a) Fit a simple linear model with Time as your response variable and Distance as your
explanatory variable. Use the JMP output to answer the following questions. Be sure to turn
in JMP output with your answers.
 What is the least squares regression equation for this model?
Predicted Time = 12.06 + 0.0144*Distance

Predict the delivery time for 6 cases and 200 feet.
Predicted Time = 12.06 + 0.0144(200) = 12.06 + 2.88 = 14.94 minutes
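
The JMP estimates above can be cross-checked by hand. The following Python sketch is not part of the lab; it assumes the 20 observations from the data table and uses illustrative names (distance, time_min, predict_time) to compute the least squares slope and intercept and the prediction at 200 feet.

```python
# Delivery data from the problem statement (20 observations).
distance = [560, 220, 340, 80, 150, 330, 110, 210, 605, 215,
            255, 462, 448, 200, 132, 36, 140, 450, 635, 150]
time_min = [16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, 21.50, 21.00,
            13.50, 19.75, 24.00, 15.35, 19.00, 9.50, 17.90, 18.75, 19.83, 10.75]

n = len(distance)
mean_x = sum(distance) / n
mean_y = sum(time_min) / n

# Corrected sum of squares for Distance and cross-products with Time.
sxx = sum((x - mean_x) ** 2 for x in distance)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distance, time_min))

# Least squares estimates for the simple linear model.
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict_time(feet):
    """Predicted delivery time in minutes for a given walking distance."""
    return intercept + slope * feet


print(f"Predicted Time = {intercept:.2f} + {slope:.4f}*Distance")
print(f"Prediction at 200 feet: {predict_time(200):.2f} minutes")
```

The number of cases does not enter this prediction because the model in a) uses Distance only.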

Give an interpretation of the estimated slope within the context of the problem.
For each additional foot the delivery person walks, the time increases by 0.0144
minutes, on average.

Give an interpretation of the estimated intercept within the context of the problem.
If the delivery person does not have to walk any distance (the machine is right next
to the truck) the predicted average time is 12.06 minutes.

How much of the variation in Time is explained by the linear relationship with Distance?
R² = 0.362. Only 36.2% of the variation in Time is explained by the linear
relationship with Distance.

Is Distance a statistically significant variable for predicting Time? Support your answer
statistically.
Yes. F = 10.2335 or t = 3.20, with P-value = 0.0050. The small P-value indicates that
the Distance is a statistically significant variable for predicting Time.
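
As a rough check on the R², F and t values quoted in part a), the sketch below rebuilds them from sums of squares. It is an illustration under the assumption that the data table is entered correctly; the variable names are hypothetical, and the P-value still has to come from the JMP output because the sketch does not evaluate the F or t distribution.

```python
import math

distance = [560, 220, 340, 80, 150, 330, 110, 210, 605, 215,
            255, 462, 448, 200, 132, 36, 140, 450, 635, 150]
time_min = [16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, 21.50, 21.00,
            13.50, 19.75, 24.00, 15.35, 19.00, 9.50, 17.90, 18.75, 19.83, 10.75]

n = len(time_min)
mean_x = sum(distance) / n
mean_y = sum(time_min) / n

sxx = sum((x - mean_x) ** 2 for x in distance)
syy = sum((y - mean_y) ** 2 for y in time_min)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distance, time_min))

ss_model = sxy ** 2 / sxx      # variation in Time explained by Distance
ss_error = syy - ss_model      # residual variation
mse = ss_error / (n - 2)       # error mean square on n - 2 = 18 df

r_squared = ss_model / syy
f_ratio = ss_model / mse
t_ratio = (sxy / sxx) / math.sqrt(mse / sxx)   # slope divided by its standard error

print(f"R-squared = {r_squared:.3f}, F = {f_ratio:.4f}, t = {t_ratio:.2f}")
```

With a single explanatory variable the F ratio is the square of the t ratio, which is why the two tests give the same P-value.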
b) Fit a multiple regression model with Time as your response variable and Distance and Cases
as explanatory variables. Use the JMP output to answer the following questions. Be sure to
turn in JMP output with your answers.
 What is the least squares regression equation for this model?
Predicted Time = 6.30 + 1.221*Cases + 0.0089*Distance

Predict the delivery time for 6 cases and 200 feet. How does this compare to the
prediction in a)?
Predicted Time = 6.30 + 1.221(6) + 0.0089(200) = 6.30 + 7.326 + 1.78 = 15.41
minutes. This predicted time is about a half a minute longer than the prediction in
a).
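
The two-variable fit can also be reproduced outside JMP. The sketch below is one possible hand calculation, not the method JMP uses internally: it solves the two centered normal equations by Cramer's rule. The helper names (mean, cross, predict_time) are illustrative assumptions.

```python
cases = [7, 3, 3, 4, 6, 7, 2, 7, 5, 10, 4, 6, 9, 6, 7, 3, 10, 9, 8, 4]
distance = [560, 220, 340, 80, 150, 330, 110, 210, 605, 215,
            255, 462, 448, 200, 132, 36, 140, 450, 635, 150]
time_min = [16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, 21.50, 21.00,
            13.50, 19.75, 24.00, 15.35, 19.00, 9.50, 17.90, 18.75, 19.83, 10.75]


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Corrected sum of cross-products: sum of (u - mean u)*(v - mean v)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Centered normal equations for the two slopes:
#   s_cc*b_cases + s_cd*b_dist = s_cy
#   s_cd*b_cases + s_dd*b_dist = s_dy
s_cc, s_cd, s_dd = cross(cases, cases), cross(cases, distance), cross(distance, distance)
s_cy, s_dy = cross(cases, time_min), cross(distance, time_min)

det = s_cc * s_dd - s_cd ** 2
b_cases = (s_cy * s_dd - s_cd * s_dy) / det
b_dist = (s_cc * s_dy - s_cd * s_cy) / det
b0 = mean(time_min) - b_cases * mean(cases) - b_dist * mean(distance)


def predict_time(n_cases, feet):
    """Predicted delivery time in minutes from the Cases + Distance model."""
    return b0 + b_cases * n_cases + b_dist * feet


print(f"Predicted Time = {b0:.2f} + {b_cases:.3f}*Cases + {b_dist:.4f}*Distance")
print(f"Prediction for 6 cases, 200 feet: {predict_time(6, 200):.2f} minutes")
```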

Give an interpretation of the estimated slope for Distance within the context of the
problem.
Holding the number of cases constant, the average delivery time increases by 0.0089
minutes for each additional foot of distance.

Give an interpretation of the estimated slope for Cases within the context of the problem.
Holding the distance constant, the average delivery time increases by 1.221 minutes
for each additional case.

Why is there not an interpretation of the estimated intercept within the context of the
problem for this model?
The estimated intercept gives the predicted value of time when the distance is zero
(the truck is next to the machine) and there are zero cases delivered. Distance
being zero makes sense but cases being zero does not. If there are no cases to
deliver there is no delivery.

Does Cases add significantly to the model with Distance? Support your answer
statistically.
Yes. F = 33.2253 or t = 5.76 with a P-value < 0.0001. The small P-value indicates
that Cases adds significantly to the model with Distance.
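
The F statistic for Cases compares the error sum of squares of the Distance-only model with that of the Distance and Cases model. The sketch below illustrates that comparison under the assumption that the data table is entered correctly; the names are hypothetical and the P-value is still read from JMP.

```python
import math

cases = [7, 3, 3, 4, 6, 7, 2, 7, 5, 10, 4, 6, 9, 6, 7, 3, 10, 9, 8, 4]
distance = [560, 220, 340, 80, 150, 330, 110, 210, 605, 215,
            255, 462, 448, 200, 132, 36, 140, 450, 635, 150]
time_min = [16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, 21.50, 21.00,
            13.50, 19.75, 24.00, 15.35, 19.00, 9.50, 17.90, 18.75, 19.83, 10.75]


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Corrected sum of cross-products: sum of (u - mean u)*(v - mean v)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


n = len(time_min)
s_cc, s_cd, s_dd = cross(cases, cases), cross(cases, distance), cross(distance, distance)
s_cy, s_dy, s_yy = cross(cases, time_min), cross(distance, time_min), cross(time_min, time_min)

# Reduced model: Time on Distance only.
sse_reduced = s_yy - s_dy ** 2 / s_dd

# Full model: Time on Cases and Distance (slopes by Cramer's rule).
det = s_cc * s_dd - s_cd ** 2
b_cases = (s_cy * s_dd - s_cd * s_dy) / det
b_dist = (s_cc * s_dy - s_cd * s_cy) / det
sse_full = s_yy - (b_cases * s_cy + b_dist * s_dy)

# One added term, so the numerator has 1 df; the full model leaves n - 3 error df.
mse_full = sse_full / (n - 3)
f_ratio = (sse_reduced - sse_full) / mse_full
t_ratio = math.sqrt(f_ratio)   # magnitude of the t ratio for Cases

print(f"F = {f_ratio:.4f}, |t| = {t_ratio:.2f}")
```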
c) Fit a multiple regression model with Time as your response variable and Distance, Cases,
and Distance*Cases as explanatory variables. Be careful! Remember to turn off the Center
Polynomial option before you fit the model. Use the JMP output to answer the following
questions. Be sure to turn in JMP output with your answers.
 What is the least squares regression equation for this model?
Predicted Time = 3.99 + 1.619*Cases + 0.0199*Distance – 0.00173*Cases*Distance

Predict the delivery time for 6 cases and 200 feet. How does this compare to the
prediction in b)?
Predicted time = 3.99 + 1.619(6) + 0.0199(200) – 0.00173(6)(200) = 3.99 + 9.714 +
3.98 – 2.076 = 15.61 minutes. This predicted time is 0.20 minutes longer than in b).

Does Distance*Cases add significantly to the model with Distance and Cases? Support
your answer statistically.
No. F = 1.5255 or t = –1.24 with a P-value = 0.2346. The P-value is not small,
therefore the variable Distance*Cases does not add significantly to the model with
Distance and Cases.
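
The uncentered interaction fit in part c) and its F test can be sketched as follows. This is an illustrative calculation, not JMP's algorithm: it solves the normal equations by Gaussian elimination, and the helper names (solve, fit, sse) are assumptions introduced here.

```python
cases = [7, 3, 3, 4, 6, 7, 2, 7, 5, 10, 4, 6, 9, 6, 7, 3, 10, 9, 8, 4]
distance = [560, 220, 340, 80, 150, 330, 110, 210, 605, 215,
            255, 462, 448, 200, 132, 36, 140, 450, 635, 150]
time_min = [16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, 21.50, 21.00,
            13.50, 19.75, 24.00, 15.35, 19.00, 9.50, 17.90, 18.75, 19.83, 10.75]


def solve(a, b):
    """Solve the square system a*x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented copy
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * size
    for row in reversed(range(size)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, size))
        x[row] = (aug[row][size] - tail) / aug[row][row]
    return x


def fit(rows, y):
    """Least squares coefficients from the normal equations (X'X)b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def sse(rows, y, coef):
    """Error sum of squares for the fitted coefficients."""
    return sum((yi - sum(b * x for b, x in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))


# Model b): intercept, Cases, Distance.  Model c): adds the raw product Cases*Distance.
rows_main = [[1.0, c, d] for c, d in zip(cases, distance)]
rows_inter = [[1.0, c, d, c * d] for c, d in zip(cases, distance)]

coef_main = fit(rows_main, time_min)
b0, b_cases, b_dist, b_inter = fit(rows_inter, time_min)

prediction = b0 + b_cases * 6 + b_dist * 200 + b_inter * 6 * 200

sse_main = sse(rows_main, time_min, coef_main)
sse_inter = sse(rows_inter, time_min, [b0, b_cases, b_dist, b_inter])
f_ratio = (sse_main - sse_inter) / (sse_inter / (len(time_min) - 4))

print(f"Coefficients: {b0:.2f}, {b_cases:.3f}, {b_dist:.4f}, {b_inter:.5f}")
print(f"Prediction for 6 cases, 200 feet: {prediction:.2f}; F for interaction = {f_ratio:.4f}")
```

Solving the normal equations directly is adequate for a data set this small; a library routine based on a QR decomposition would be the more numerically careful choice.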

What does the result above tell you about interaction between Distance and Cases?
Because the interaction term is not statistically significant there is no statistically
significant interaction between Distance and Cases.

What does the result above tell you about the relationship between Time and Distance as
you change the number of cases delivered?
Because there is no statistically significant interaction between Distance and Cases,
there is no evidence that the linear relationship between Time and Distance changes
as we change the number of cases.
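
To see what the interaction term would mean if it were taken at face value, the sketch below evaluates the estimated slope for Distance at a few values of Cases, using the rounded coefficients reported in part c). The function name distance_slope and the chosen case counts are illustrative assumptions.

```python
# Rounded estimates from the part c) equation.
b_distance = 0.0199       # coefficient of Distance
b_interaction = -0.00173  # coefficient of Cases*Distance


def distance_slope(n_cases):
    """Estimated change in Time per extra foot when n_cases are delivered."""
    return b_distance + b_interaction * n_cases


for n_cases in (2, 6, 10):
    print(f"Cases = {n_cases:2d}: slope for Distance = {distance_slope(n_cases):.5f} min/ft")
```

The estimated slopes differ across case counts, but the test in part c) says this variation cannot be distinguished from zero. At the mean of 6 cases the slope is about 0.0095, which is the Distance coefficient that appears in the centered model of part d).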
d) Fit a multiple regression model with Time as your response variable and Distance, Cases,
and (Distance – Mean Distance)*(Cases – Mean Cases) i.e. leave the Center Polynomial
option turned on. Use the JMP output to answer the following questions. Be sure to turn in
JMP output with your answers.
 What is the least squares regression equation for this model?
Predicted Time = 6.96 + 1.124*Cases + 0.0095*Distance – 0.00173*(Cases –
6)*(Distance – 286.4)

Predict the delivery time for 6 cases and 200 feet. How does this compare to the
prediction in c)?
Predicted Time = 6.96 + 1.124(6) + 0.0095(200) – 0.00173(6 – 6)(200 – 286.4) = 6.96
+ 6.744 + 1.9 – 0 = 15.60 minutes. This is virtually the same prediction as in c); the
0.01 difference comes from rounding the coefficients.

Does (Distance – Mean Distance)*(Cases – Mean Cases) add significantly to the model
with Distance and Cases? Support your answer statistically.
No. F = 1.5255 or t = –1.24 with a P-value = 0.2346. The P-value is not small,
therefore the variable (Distance – 286.4)*(Cases – 6) does not add significantly to
the model with Distance and Cases.

How does this compare to your result in c) about the interaction between Distance and
Cases?
This is the same result as in part c), there is no statistically significant interaction
between Cases and Distance.
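
The agreement between parts c) and d) is algebraic: expanding the centered product turns the part d) equation back into the part c) equation. The sketch below checks this with the rounded coefficients reported above; the function names are illustrative, and small discrepancies are due to rounding.

```python
# Rounded estimates from the centered model in part d).
c0, c_cases, c_dist, c_inter = 6.96, 1.124, 0.0095, -0.00173
mean_cases, mean_dist = 6, 286.4

# Expand c_inter*(Cases - mean_cases)*(Distance - mean_dist) and collect terms.
u0 = c0 + c_inter * mean_cases * mean_dist   # intercept of the uncentered form
u_cases = c_cases - c_inter * mean_dist      # coefficient of Cases
u_dist = c_dist - c_inter * mean_cases       # coefficient of Distance
u_inter = c_inter                            # product coefficient is unchanged


def centered(n_cases, feet):
    return (c0 + c_cases * n_cases + c_dist * feet
            + c_inter * (n_cases - mean_cases) * (feet - mean_dist))


def uncentered(n_cases, feet):
    return u0 + u_cases * n_cases + u_dist * feet + u_inter * n_cases * feet


print(f"Expanded form: {u0:.2f} + {u_cases:.3f}*Cases + {u_dist:.4f}*Distance "
      f"{u_inter:+.5f}*Cases*Distance")
print(f"Both forms at 9 cases, 450 feet: {centered(9, 450):.3f}, {uncentered(9, 450):.3f}")
```

Centering changes the intercept and the two main-effect coefficients but not the interaction coefficient, so the F and t tests for the interaction are identical in c) and d).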
e) For the “best” model for these data, compute the residuals. Plot the residuals versus
Distance. Plot the residuals versus Cases. Analyze the distribution of the residuals. Use the
JMP output to answer the following questions. Be sure to turn in JMP output with your
answers.
Note: The “best” model would be the model that contains only Distance and Cases.

What do the plots of residuals versus the explanatory variables tell you about the
adequacy of the model you have chosen as “best”? What do they tell you about the
equal standard deviation condition? Be sure to support your answers by referring to the
plots.
The plot of residuals versus Distance is a bit strange. There appears to be a
butterfly-type pattern, i.e. a wide, narrow, wide spread. The linear relationship is
probably ok but the equal standard deviation condition may be somewhat in doubt.
The plot of residuals versus cases is a random scatter and so the linear relationship
with cases is the best we can do. Also the spread around the zero line is constant
for different values of cases indicating that the equal standard deviation condition
is met.
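
The wide, narrow, wide impression in the plot against Distance can be given a rough numerical form. The sketch below refits the Distance and Cases model, computes the residuals, and compares their root mean square within three distance bands. The cut points of 175 and 400 feet are arbitrary choices made for this illustration, not part of the lab, and with only six or seven points per band the comparison is informal.

```python
import math

cases = [7, 3, 3, 4, 6, 7, 2, 7, 5, 10, 4, 6, 9, 6, 7, 3, 10, 9, 8, 4]
distance = [560, 220, 340, 80, 150, 330, 110, 210, 605, 215,
            255, 462, 448, 200, 132, 36, 140, 450, 635, 150]
time_min = [16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, 21.50, 21.00,
            13.50, 19.75, 24.00, 15.35, 19.00, 9.50, 17.90, 18.75, 19.83, 10.75]


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Corrected sum of cross-products: sum of (u - mean u)*(v - mean v)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Fit Time on Cases and Distance (centered normal equations, Cramer's rule).
s_cc, s_cd, s_dd = cross(cases, cases), cross(cases, distance), cross(distance, distance)
s_cy, s_dy = cross(cases, time_min), cross(distance, time_min)
det = s_cc * s_dd - s_cd ** 2
b_cases = (s_cy * s_dd - s_cd * s_dy) / det
b_dist = (s_cc * s_dy - s_cd * s_cy) / det
b0 = mean(time_min) - b_cases * mean(cases) - b_dist * mean(distance)

residuals = [t - (b0 + b_cases * c + b_dist * d)
             for c, d, t in zip(cases, distance, time_min)]
sse = sum(e * e for e in residuals)


def band(feet):
    """Assign a distance to an illustrative band (cut points are assumptions)."""
    if feet < 175:
        return "short"
    if feet <= 400:
        return "middle"
    return "long"


groups = {"short": [], "middle": [], "long": []}
for d, e in zip(distance, residuals):
    groups[band(d)].append(e)

# Root mean square residual within each band.
rms = {name: math.sqrt(sum(e * e for e in es) / len(es)) for name, es in groups.items()}

for name in ("short", "middle", "long"):
    print(f"{name:>6}: n = {len(groups[name])}, RMS residual = {rms[name]:.2f}")
```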

What does the analysis of the distribution of residuals tell you about the conditions of
identically and normally distributed residuals? Be sure to support your answers by
referring to the analysis.
There are not many residuals to analyze and so the histogram and box plot may not
be the most informative regarding a normal distribution. The histogram does show
a possibly bi-modal distribution which would cast doubt on the identically
distributed condition. The box plot is fairly symmetric although the “mound” is to
the left of zero and the mean is pulled slightly right, indicating a slight skew to the
right. The normal quantile plot shows points on the diagonal “normal model” line,
then above it, then below it. The normal distribution condition may not be met exactly.
In general, there are problems with the identically distributed and normally
distributed error conditions. However, all of the P-values are quite extreme so the
no interaction model would still be the “best.”