The Toluca Company
www.oswego.edu/~srp/stats/toluca.txt

A scatterplot, regression analysis and residual plots are shown.

[Figure: scatterplot of workhours vs. lotsize]
[Figure: histogram of the residuals (RESI1)]
[Figure: Normal Probability Plot of the Residuals (response is workhour); Normal Score vs. Residual]
[Figure: residual vs. lotsize; vertical axis marked at 0, ±48.82 and ±97.64]

The regression equation is
workhours = 62.4 + 3.57 lotsize

Predictor     Coef   SE Coef      T      P
Constant     62.37     26.18   2.38  0.026
lotsize     3.5702    0.3470  10.29  0.000

S = 48.82   R-Sq = 82.2%   R-Sq(adj) = 81.4%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       1  252378  252378  105.88  0.000
Residual Error  23   54825    2384
Total           24  307203

Predicted Values for New Observations
New Obs     Fit  SE Fit           90.0% CI           90.0% PI
1        294.43    9.92  (277.43, 311.43)  (209.04, 379.81)

Values of Predictors for New Observations
New Obs  lotsize
1           65.0

1. Here are three of the assumptions that need to be assessed before making inferences. In this situation all three are met. For each, describe briefly how the data confirms the assumption.
Assumption 1: The relationship between the mean of the response and the predictor is linear.
Assumption 2: The standard deviation is approximately the same no matter what the predictor value is.
Assumption 3: Deviations from the relationship have approximate normal distribution.

2. Draw the estimated regression line on the scatterplot. (Use x = 20 first. Now, if x = 120 we have a 100 lot increase, so the estimated time increases by 357.02. Using x = 20 and 120 you can plot the line quite accurately.)

3. Consider the observation with 20 lots, taking 113 hours. What are the fit and residual for this case?

4. Take a look at the plot of residuals vs. lotsize. What percent of the data falls within 1 standard deviation of the line? Within two standard deviations of the line?

5.
Obtain a 90% confidence interval for the increase in hours due to an additional lot. Then rewrite the interval to reflect the change in hours due to an additional 10 lots.

6. Consider testing H0: β1 = 0 against HA: β1 ≠ 0. Justify a decision: (All your decisions should be equivalent.)
a) using common sense
b) based on an examination of the scatterplot
c) using your confidence interval
d) by computing a test statistic and P-value

7. Associated with the estimated intercept b0 = 62.37 you see a test statistic of t = 2.38 and a P-value of 0.026. Verify that the test statistic is correct for a null hypothesis of H0: β0 = 0. Use your table to verify that the P-value is correct for the alternative HA: β0 ≠ 0. (Minitab produces the P-value for the two-sided alternative.) What is the P-value for HA: β0 > 0 (which is a better alternative, as no one would ever believe that a negative intercept, that is, a negative start-up time, could occur)?

8. Toluca Company guidelines specify an average start-up time of 50 hours for this type of job. The plant manager is concerned the start-up process is running too slowly (relative to the standard set in the guidelines). State appropriate hypotheses and test them using the data. Obtain a P-value and use it to determine how strong the evidence is that the start-up time of 50 is not met.
Hypotheses: H0: __________ HA: __________
Test statistic t =
P-value =

9. Plug a lot size of x = 65 into the regression equation ŷ = 62.37 + 3.5702x to confirm the “fit” given at the bottom of the output. (You obtain this in Minitab by filling in the “Predictions for new observations” box in the “Options” dialog. There’s also a place to specify the confidence.)

10. Use the fit, its standard error, and the t-table to confirm the given 90% confidence interval. Interpret this interval. Then draw the interval as a vertical line over x = 65 on the scatterplot. Would you say there’s a 90% chance that a job of 65 lots would fall in this range?
(It might be wise also to compare the width of this interval with the residual standard deviation.)

Solutions

1. The scatterplot and the residual vs. lotsize plot make it clear that a linear relationship is appropriate. The normal probability plot suggests that the error distribution is at least approximately normal. The residual vs. lotsize plot confirms that the standard deviation of the times is fairly constant over the different values for lotsize.

2. When x = 20, ŷ = 133.77. When x = 120, ŷ = 133.77 + 357.02 = 490.79.

3. The fit is 133.77. The residual is 113 – 133.77 = -20.77.

4. 17 of 25 = 68%. Then 24 of 25 = 96%.

5. The estimated value is 3.5702 with a standard error of 0.3470. For 90% confidence use t = 1.714. The interval is 3.5702 ± 1.714(0.3470) or 3.5702 ± 0.5948. We’re 90% confident that a job that is 1 lot bigger will take, on average, between 2.98 and 4.16 hours longer. For 10 additional lots the interval is from 29.8 to 41.6 hours.

6.
a) Of course a bigger job takes longer. Not only is the null obviously false, even the alternative is wrong (how could bigger jobs take less time?).
b) The scatterplot makes it clear that there’s a strong relationship.
c) The confidence interval is nowhere near 0; so 0 can be rejected at the 5% level (and much smaller levels).
d) The test statistic is (3.5702 – 0)/0.3470 = 10.29, which gives a P-value of 0.000+. Reject H0.

7. (62.37 – 0)/26.18 = 2.38. From the table the tail area is about 0.015, for a P-value of about 0.03. (Minitab gives the more exact value, 0.026.) For the right-sided alternative the P-value would be 0.013.

8. H0: β0 = 50 HA: β0 > 50. The test statistic is t = (62.37 – 50)/26.18 = 0.47. The P-value for such a test is then about 0.32.

9. The fit, as shown in Minitab, is 62.37 + 3.5702(65) = 294.43.

10. 294.43 ± 1.714(9.92) or 294.43 ± 17.00, which is 277.43 to 311.43. This is a confidence interval. Confidence intervals do not predict individual values; they estimate means.
We’re 90% confident that the average job of 65 lots takes between 277 and 311 hours. Individual jobs vary considerably more (the standard deviation is 49, which is far larger than the error margin of 17 for this CI).
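As an optional check on Solutions 5 through 10, the arithmetic can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: the coefficients, standard errors, and the table value t = 1.714 are copied from the output and solutions above, while the variable names and the helpers `t_upper_tail` and `interval` are made up for illustration. Tail areas are found by numerical integration of the t density rather than from a table, so the P-values agree with the table-based answers only approximately. The prediction interval uses the usual standard error sqrt(s² + (SE Fit)²), a formula the handout does not state but which appears to reproduce the 90% PI in the output up to rounding.

```python
import math

# Values copied from the Minitab output in this handout.
B0, SE_B0 = 62.37, 26.18      # intercept and its standard error
B1, SE_B1 = 3.5702, 0.3470    # slope and its standard error
S = 48.82                     # residual standard deviation
SE_FIT_65 = 9.92              # standard error of the fit at lotsize = 65
DF = 23                       # error degrees of freedom
T_90 = 1.714                  # table value used in the solutions (90%, 23 df)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on the t density.

    Hypothetical helper for illustration; `steps` must be even.
    """
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # Area from 0 to t is total * h / 3; the upper tail is what is left of 0.5.
    return 0.5 - total * h / 3


def interval(center, se, t_star=T_90):
    """Return (low, high) for center +/- t_star * se."""
    margin = t_star * se
    return center - margin, center + margin


# Solution 5: 90% CI for the slope.
slope_ci = interval(B1, SE_B1)

# Solution 6d: test of H0: beta1 = 0.
t_slope = (B1 - 0) / SE_B1

# Solution 7: test of H0: beta0 = 0, two-sided and right-sided P-values.
t_int0 = (B0 - 0) / SE_B0
p_int0_right = t_upper_tail(t_int0, DF)
p_int0_two_sided = 2 * p_int0_right

# Solution 8: test of H0: beta0 = 50 against HA: beta0 > 50.
t_int50 = (B0 - 50) / SE_B0
p_int50 = t_upper_tail(t_int50, DF)

# Solutions 9 and 10: fit at 65 lots and the 90% CI for the mean.
fit_65 = B0 + B1 * 65
mean_ci = interval(fit_65, SE_FIT_65)

# Assumed formula for the prediction interval (not stated in the handout).
se_pred = math.sqrt(S ** 2 + SE_FIT_65 ** 2)
pred_int = interval(fit_65, se_pred)

print("slope CI:", slope_ci)
print("t (slope):", t_slope)
print("t, P (intercept = 0):", t_int0, p_int0_two_sided, p_int0_right)
print("t, P (intercept = 50):", t_int50, p_int50)
print("fit at 65:", fit_65)
print("90% CI for mean:", mean_ci)
print("90% PI:", pred_int)
```

Comparing the last two printed intervals makes the point of Solution 10 concrete: the interval for an individual job is several times wider than the interval for the mean.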