Stat 401G Lab 6: Solution
Fall 2012

1. What is the relationship between the distance walked and the time it takes to deliver soft drinks to vending machines? An industrial engineer collected 20 observations on the delivery time (in minutes), the number of cases delivered, and the distance walked (in feet). Delivery time includes the time to unload the cases from the delivery truck, walk to the machine, and load the machine with the soft drinks. The data are given below.

Cases         7      3      3      4      6      7      2      7      5     10
Distance    560    220    340     80    150    330    110    210    605    215
Time      16.68  11.50  12.03  14.88  13.75  18.11   8.00  17.83  21.50  21.00

Cases         4      6      9      6      7      3     10      9      8      4
Distance    255    462    448    200    132     36    140    450    635    150
Time      13.50  19.75  24.00  15.35  19.00   9.50  17.90  18.75  19.83  10.75

a) Fit a simple linear model with Time as your response variable and Distance as your explanatory variable. Use the JMP output to answer the following questions. Be sure to turn in JMP output with your answers.

What is the least squares regression equation for this model?
Predicted Time = 12.06 + 0.0144*Distance

Predict the delivery time for 6 cases and 200 feet.
Predicted Time = 12.06 + 0.0144(200) = 12.06 + 2.88 = 14.94 minutes. Cases is not in this model, so only the distance is used.

Give an interpretation of the estimated slope within the context of the problem.
For each additional foot the delivery person walks, the delivery time increases by 0.0144 minutes, on average.

Give an interpretation of the estimated intercept within the context of the problem.
If the delivery person does not have to walk any distance (the machine is right next to the truck), the predicted average time is 12.06 minutes.

How much of the variation in Time is explained by the linear relationship with Distance?
R² = 0.362. Only 36.2% of the variation in Time is explained by the linear relationship with Distance.

Is Distance a statistically significant variable for predicting Time? Support your answer statistically.
Yes. F = 10.2335 or t = 3.20, with P-value = 0.0050.
The small P-value indicates that Distance is a statistically significant variable for predicting Time.

b) Fit a multiple regression model with Time as your response variable and Distance and Cases as explanatory variables. Use the JMP output to answer the following questions. Be sure to turn in JMP output with your answers.

What is the least squares regression equation for this model?
Predicted Time = 6.30 + 1.221*Cases + 0.0089*Distance

Predict the delivery time for 6 cases and 200 feet. How does this compare to the prediction in a)?
Predicted Time = 6.30 + 1.221(6) + 0.0089(200) = 6.30 + 7.326 + 1.78 = 15.41 minutes. This predicted time is about half a minute longer than the prediction in a).

Give an interpretation of the estimated slope for Distance within the context of the problem.
Holding the number of cases constant, the average delivery time increases by 0.0089 minutes for each additional foot of distance.

Give an interpretation of the estimated slope for Cases within the context of the problem.
Holding the distance constant, the average delivery time increases by 1.221 minutes for each additional case.

Why is there not an interpretation of the estimated intercept within the context of the problem for this model?
The estimated intercept gives the predicted value of Time when the distance is zero (the truck is next to the machine) and zero cases are delivered. A distance of zero makes sense, but zero cases does not: if there are no cases to deliver, there is no delivery.

Does Cases add significantly to the model with Distance? Support your answer statistically.
Yes. F = 33.2253 or t = 5.76, with a P-value < 0.0001. The small P-value indicates that Cases adds significantly to the model with Distance.

c) Fit a multiple regression model with Time as your response variable and Distance, Cases, and Distance*Cases as explanatory variables. Be careful! Remember to turn off the Center Polynomial option before you fit the model.
Use the JMP output to answer the following questions. Be sure to turn in JMP output with your answers.

What is the least squares regression equation for this model?
Predicted Time = 3.99 + 1.619*Cases + 0.0199*Distance - 0.00173*Cases*Distance

Predict the delivery time for 6 cases and 200 feet. How does this compare to the prediction in b)?
Predicted Time = 3.99 + 1.619(6) + 0.0199(200) - 0.00173(6)(200) = 3.99 + 9.714 + 3.98 - 2.076 = 15.61 minutes. This predicted time is 0.20 minutes longer than in b).

Does Distance*Cases add significantly to the model with Distance and Cases? Support your answer statistically.
No. F = 1.5255 or t = -1.24, with a P-value = 0.2346. The P-value is not small; therefore the variable Distance*Cases does not add significantly to the model with Distance and Cases.

What does the result above tell you about interaction between Distance and Cases?
Because the interaction term is not statistically significant, there is no statistically significant interaction between Distance and Cases.

What does the result above tell you about the relationship between Time and Distance as you change the number of cases delivered?
Because there is no statistically significant interaction between Distance and Cases, there is no evidence that the linear relationship between Time and Distance changes as we change the number of cases.

d) Fit a multiple regression model with Time as your response variable and Distance, Cases, and (Distance - Mean Distance)*(Cases - Mean Cases), i.e., leave the Center Polynomial option turned on. Use the JMP output to answer the following questions. Be sure to turn in JMP output with your answers.

What is the least squares regression equation for this model?
Predicted Time = 6.96 + 1.124*Cases + 0.0095*Distance - 0.00173*(Cases - 6)*(Distance - 286.4)

Predict the delivery time for 6 cases and 200 feet. How does this compare to the prediction in c)?
Predicted Time = 6.96 + 1.124(6) + 0.0095(200) - 0.00173(6 - 6)(200 - 286.4) = 6.96 + 6.744 + 1.9 - 0 = 15.60 minutes. This is virtually the same prediction as in c); the small difference is due to rounding of the coefficients.

Does (Distance - Mean Distance)*(Cases - Mean Cases) add significantly to the model with Distance and Cases? Support your answer statistically.
No. F = 1.5255 or t = -1.24, with a P-value = 0.2346. The P-value is not small; therefore the variable (Distance - 286.4)*(Cases - 6) does not add significantly to the model with Distance and Cases.

How does this compare to your result in c) about the interaction between Distance and Cases?
This is the same result as in part c): there is no statistically significant interaction between Cases and Distance.

e) For the “best” model for these data, compute the residuals. Plot the residuals versus Distance. Plot the residuals versus Cases. Analyze the distribution of the residuals. Use the JMP output to answer the following questions. Be sure to turn in JMP output with your answers.

Note: The “best” model would be the model that contains only Distance and Cases.

What do the plots of residuals versus the explanatory variables tell you about the adequacy of the model you have chosen as “best”? What do they tell you about the equal standard deviation condition? Be sure to support your answers by referring to the plots.
The plot of residuals versus Distance is a bit strange. There appears to be a butterfly-type pattern, i.e., the spread goes wide, narrow, wide. The linear relationship is probably OK, but the equal standard deviation condition may be somewhat in doubt. The plot of residuals versus Cases is a random scatter, so the linear relationship with Cases is the best we can do. Also, the spread around the zero line is constant for different values of Cases, indicating that the equal standard deviation condition is met.

What does the analysis of the distribution of residuals tell you about the conditions of identically and normally distributed residuals?
Be sure to support your answers by referring to the analysis.
There are not many residuals to analyze, so the histogram and box plot may not be the most informative regarding a normal distribution. The histogram does show a possibly bimodal distribution, which would cast doubt on the identically distributed condition. The box plot is fairly symmetric, although the “mound” is to the left of zero and the mean is pulled slightly right, indicating a slight skew to the right. The normal quantile plot shows points on the diagonal “normal model” line, then above, then below. The normal distribution condition may not be met exactly.

In general, there are problems with the identically distributed and normally distributed error conditions. However, all of the P-values are quite extreme, so the no-interaction model would still be the “best.”
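
The fits in parts a) through d) can be cross-checked outside JMP. The following is a minimal sketch in Python using NumPy's least squares solver; it is not part of the original JMP workflow. The helper names `design`, `fit`, and `r_squared` are hypothetical, introduced only for this illustration. The data are transcribed from the table in problem 1, and the prediction point is the 6 cases and 200 feet used throughout.

```python
import numpy as np

# Data transcribed from the table in problem 1 (20 deliveries).
cases = np.array([7, 3, 3, 4, 6, 7, 2, 7, 5, 10,
                  4, 6, 9, 6, 7, 3, 10, 9, 8, 4], dtype=float)
distance = np.array([560, 220, 340, 80, 150, 330, 110, 210, 605, 215,
                     255, 462, 448, 200, 132, 36, 140, 450, 635, 150], dtype=float)
time_min = np.array([16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, 21.50, 21.00,
                     13.50, 19.75, 24.00, 15.35, 19.00, 9.50, 17.90, 18.75, 19.83, 10.75])


def design(*columns):
    """Design matrix with a leading column of ones for the intercept."""
    return np.column_stack([np.ones(len(time_min)), *columns])


def fit(*columns):
    """Ordinary least squares fit of delivery time on the given columns."""
    coef, *_ = np.linalg.lstsq(design(*columns), time_min, rcond=None)
    return coef


def r_squared(coef, *columns):
    """Share of the variation in delivery time explained by the fitted model."""
    resid = time_min - design(*columns) @ coef
    return 1.0 - (resid @ resid) / np.sum((time_min - time_min.mean()) ** 2)


# Interaction column as JMP builds it with Center Polynomial turned on.
centered = (cases - cases.mean()) * (distance - distance.mean())

coef_a = fit(distance)                           # a) Time ~ Distance
coef_b = fit(cases, distance)                    # b) Time ~ Cases + Distance
coef_c = fit(cases, distance, cases * distance)  # c) uncentered interaction
coef_d = fit(cases, distance, centered)          # d) centered interaction

r2_a = r_squared(coef_a, distance)

# Predictions for 6 cases and 200 feet under each model.
c0, d0 = 6.0, 200.0
pred_a = coef_a @ np.array([1.0, d0])
pred_b = coef_b @ np.array([1.0, c0, d0])
pred_c = coef_c @ np.array([1.0, c0, d0, c0 * d0])
pred_d = coef_d @ np.array([1.0, c0, d0,
                            (c0 - cases.mean()) * (d0 - distance.mean())])

for label, coef, pred in [("a", coef_a, pred_a), ("b", coef_b, pred_b),
                          ("c", coef_c, pred_c), ("d", coef_d, pred_d)]:
    print(label, np.round(coef, 5), round(float(pred), 2))
print("R-squared for a):", round(float(r2_a), 3))
```

Models c) and d) span the same set of explanatory columns, so they are the same fitted model written two ways. The interaction coefficient and the predictions agree, and only the intercept and the two main-effect coefficients change with centering. Any gap between hand-computed predictions in c) and d) therefore comes from rounding the coefficients.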