4/19/02 252x0232 c (Page layout view!)
ECO252 QBA2
THIRD HOUR EXAM
April 18, 2002
Name
Correct Hour of Class Registered (Circle) MWF TR 10 12 12:30 2:00

I. (10+ points) Do all the following:

1. Hand in your computer printouts for problems 2 and 3. (5 points; 3 point penalty for not handing in.) Remember that the ANOVA printout must be completed, using a 5% significance level, for full credit. I should be able to tell what is tested and what the conclusions are.

2. a. In particular, is the interaction between car and driver significant? Which numbers made you think that? (2)
b. Create two confidence intervals for the difference between the means for drivers 2 and 3, one that is valid alone, and one that is valid simultaneously with other similar intervals. Do these intervals show a significant difference between these two means? Why? (4)
c. In your income and education regression,
(i) Explain which coefficients are significant and why. (2)
(ii) What income would you predict for someone with 3 years of education? (1)
(iii) Make a confidence interval for the income of someone with 3 years of education using some of the information generated by Minitab below. (2)

Descriptive Statistics

Variable    N     Mean   Median   TrMean   StDev   SEMean
Educ       32   12.000   12.000   12.071   4.363    0.771

Variable     Min      Max      Q1       Q3
Educ       4.000   20.000   8.000   16.000

Column Sum of Squares

Sum of squares (uncorrected) of Educ = 5198.0

4/12/02 252x0232

II. Do at least 4 of the following 5 problems (at least 10 each) (or do sections adding to at least 40 points. Anything extra you do helps, and grades wrap around). Show your work! State $H_0$ and $H_1$ where applicable. Never say 'yes' or 'no' without a statistical test.

1. On the following pages there are printouts from two computer problems.

a. The One-way ANOVA Problem (Albright, Winston, Zappe - abbreviated): An automobile parts producer has instituted an employee empowerment program in five plants.
Random samples of employees in each plant are asked to rate the success of the program on a 1 to 10 scale, 10 being the highest rating. They want to know if the program is being implemented with equal success at each plant and are thus looking to see if there is a significant difference between mean ratings at each plant. They are assuming that the results are distributed according to Normal distributions with similar variances.
(i) Indicate what hypothesis was tested, what the p-value was and whether, using the p-value, you would reject the null if the significance level was 5% and if the significance level was 1%. Explain why. Does this mean that the success was equal in all plants? (3)
(ii) Do a 'normal' and a Scheffe confidence interval ($\alpha = .05$) for the difference between the means in the two plants that were least successful. Do these intervals indicate a difference in the success of the program between these two plants? Why? (4.5)
(iii) The printout gives 95% confidence intervals for the means for each plant. Find the numbers for the confidence interval for 'Midwest.' Why is this interval smaller than the others? (2.5)
(iv) I would question whether ANOVA was appropriate for this problem because there is no evidence that the underlying populations are Normally distributed. What method would I prefer for this problem? (1)

b. The Regression Problem: This relates the number of shares in thousands to the age of board members of a corporation.
(i) Looking at significance tests and the value of R-squared, how successful is this regression? Why? Why shouldn't this surprise you? (3)
(ii) Note that c1 contains 'shares' and that c4 contains predicted values of 'shares.' Add a regression line to the graph. (1)
(iii) What equation relates the number of shares owned to the age of the board member? How many shares does it say that we should expect an 83-year-old board member to own? Would you take this seriously? Why?
(2)

One-way ANOVA problem

MTB > RETR 'C:\MINITAB\2X0232-1.MTW'.
Retrieving worksheet from file: C:\MINITAB\2X0232-1.MTW
Worksheet was saved on 4/ 9/2002
MTB > print c1-c5

Data Display

 Row  south  midwest  n-east  s-west  west
   1      7        7       7       6     6
   2      1        6       5       4     6
   3      8       10       5       7     6
   4      7        3       5      10     6
   5      2        9       4       7     3
   6      9       10       3       6     4
   7      3        8       4       6     8
   8      8        4       5       7     6
   9      5        3       5       4     2
  10      7        2       3       3     4
  11      4        7       3       7     5
  12               7       3       8     6
  13               5       5       9     4
  14              10       5      10     7
  15              10               4     4
  16               6              10     3
  17               3               4     5
  18               5               6     4
  19               2                     7
  20               6                     6
  21               4                     4
  22               5
  23               2
  24               7
  25               8
  26               7

MTB > AOVOneway c1 c2 c3 c4 c5.

One-Way Analysis of Variance

Analysis of Variance
Source   DF       SS      MS      F       p
Factor    4    46.24   11.56   2.50   0.049
Error    85   393.55    4.63
Total    89   439.79

Level       N    Mean   StDev
south      11   5.545   2.697
midwest    26   6.000   2.623
n-east     14   4.429   1.158
s-west     18   6.556   2.229
west       21   5.048   1.532

Pooled StDev = 2.152

Individual 95% CIs For Mean Based on Pooled StDev
(interval plot omitted)

Regression Problem

Worksheet size: 100000 cells
MTB > RETR 'C:\MINITAB\2X0232-5.MTW'.
Retrieving worksheet from file: C:\MINITAB\2X0232-5.MTW
Worksheet was saved on 4/12/2002
MTB > echo
MTB > Execute 'C:\MINITAB\252SOLS3.MTB' 1.
Executing from file: C:\MINITAB\252SOLS3.MTB
MTB > #252sols3
MTB > print c1 c2

Data Display

 Row  shares  age
   1     7.9   53
   2    66.4   60
   3    29.7   69
   4    60.5   49
   5    10.4   67
   6    28.7   68
   7    86.9   46
   8   121.1   62
   9    35.3   63
  10     2.8   55
  11    74.4   57
  12    13.1   71
  13     9.1   66
  14    19.1   70
  15    18.8   66
  16     3.1   57
  17    96.5   54
  18    47.0   64
  19    31.1   56

MTB > plot c1*c2
(plot omitted)
MTB > regress c1 on 1 c2 c3 c4

Regression Analysis

The regression equation is
shares = 153 - 1.86 age

Predictor     Coef   Stdev   t-ratio       p
Constant    152.95   64.82      2.36   0.031
age         -1.860   1.061     -1.75   0.098

s = 33.01    R-sq = 15.3%    R-sq(adj) = 10.3%

Analysis of Variance
SOURCE       DF      SS     MS      F       p
Regression    1    3348   3348   3.07   0.098
Error        17   18522   1090
Total        18   21870

Unusual Observations
Obs.    age   shares     Fit   Stdev.Fit   Residual   St.Resid
  8    62.0   121.10   37.65        7.70      83.45      2.60R

R denotes an obs. with a large st. resid.

MTB > plot c4*c2
(plot omitted)
MTB > plot c4*c2 c1*c2;
SUBC> symbol;
SUBC> type 3 1;
SUBC> color 8 9;
SUBC> overlay.
MTB > end
(Graph: C4 on the vertical axis, scaled 0 to 100; age on the horizontal axis, scaled 50 to 70)

2. A researcher believes that the data below has a Normal distribution with a mean of 80 and a standard deviation of 5. For your convenience the values of $z = \frac{x - \mu}{\sigma} = \frac{x - 80}{5}$ are computed for you.
a. Use a chi-squared test to find out if the distribution is correct. (9)
b. Is there a better way to do this problem than chi-squared? Why? Do it. (5)
c. Assume that, instead of using population means given above, we actually checked the data and found that $\bar{x} = 80$ and $s = 5$. How would this change what we did in a)? (1)
d. Assume that, instead of using population means given above, we actually checked the data and found that $\bar{x} = 80$ and $s = 5$. How would this change what we did in b)? (1)

x interval   z interval     Observed Frequency
below 74     below -1.2             23
74-78        -1.2 to -0.4           53
78-82        -0.4 to 0.4            52
82-86        0.4 to 1.2             46
86-90        1.2 to 2.0             24
above 90     above 2.0               2
Total                              200

3. (Weirs) A maker of stain removers is testing the effectiveness of four different formulations of a new product. Columns represent formulations 1-4 of the product and the 6 rows represent different stains (creosote, crayon, motor oil, grape juice, ink, coffee). Each formulation is rated on a 1-10 scale for its effectiveness.

Stain            Form 1  Form 2  Form 3  Form 4   sum  count  Sum of squares
1                   1       7       2       5      15     4         79
2                   9      10       7       5      31     4        255
3                   4       6       1       4      15     4         69
4                   9       7       4       5      25     4        171
5                   6       8       4       4      22     4        132
6                   9       4       2       6      21     4        137
Sum                38      42      20      29     129    24        843
Count               6       6       6       6      24
Sum of Squares    296     314      90

a. Assume that the parent distribution is Normal and compare the mean ratings for the four formulations, noting the fact that it is cross-classified. Use $\alpha = .10$.
(14) Note: If you wish to ignore the fact that the data is classified by stain type, indicate this now and compare the column means assuming that the data is four independent random samples from a Normal distribution. (10) ($\alpha = .10$)
b. Using the same significance level, assume that Formulation 1 is the current formula and use Scheffe intervals to see which formulations have mean ratings that differ significantly from the current formulation. (4)
c. Using a significance level of 15%, repeat the analysis in b) using Bonferroni intervals. (4)

3 (ctd.). d. Actually, when Weirs presented the data in the previous problem, repeated below, he assumed that the underlying distribution was not Normal. So compare the median ratings using a 10% significance level. (6)

Stain            Form 1  Form 2  Form 3  Form 4   sum  count  Sum of squares
1                   1       7       2       5      15     4         79
2                   9      10       7       5      31     4        255
3                   4       6       1       4      15     4         69
4                   9       7       4       5      25     4        171
5                   6       8       4       4      22     4        132
6                   9       4       2       6      21     4        137
Sum                38      42      20      29     129    24        843
Count               6       6       6       6      24
Sum of Squares    296     314      90

4. Use methods appropriate to testing goodness of fit.
a. Test the hypothesis that the numbers below came from a Normal distribution. Use a 10% significance level. (6) Note that Minitab says the following: mean 303.000, stdev 64.0878, n 9.00000.
b. Test the hypothesis that the numbers below came from a Normal distribution with a mean of 240 and a standard deviation of 50. (6)

238 222 272 280 292 301 333 357 432

5. (Weirs) The following data gives years of membership and numbers of shares (in thousands) owned for 8 board members of our corporation. Number of shares is the dependent variable and years is the independent variable.

Data Display

Row     share   years   years squared   shares squared
1         300       6              36            90000
2         408      12             144           166464
3         560      14             196           313600
4         252       6              36            63504
5         288       9              81            82944
6         650      13             169           422500
7         600      15             225           360000
8         522       9              81           272484
Total    3580      84             968          1771496

Note that $n = 8$ and that you will have to compute $\sum xy$.
a.
Compute the regression equation $Y = b_0 + b_1 x$ to predict thousands of shares owned on the basis of years of membership. (6)
b. On the basis of your regression, how many thousands of shares do you expect to be owned by someone who has been on the board for 3 years? (1)
c. Compute $R^2$. (4)
d. Compute $s_e$. (3)
e. Compute $s_{b_0}$ and do a significance test on $b_0$. (4)
f. Do an interval that shows the average number of shares that would be owned by someone who has been on the board for 3 years. (3)
g. Using your SST etc., put together the ANOVA table. (6)

(Intentionally left blank for calculations)
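For checking hand calculations in Problem 5, the following is a minimal Python sketch, not part of the exam. It computes $\sum xy$ and then the slope and intercept from the column sums, assuming the usual computational formulas $b_1 = (\sum xy - n\bar{x}\bar{y}) / (\sum x^2 - n\bar{x}^2)$ and $b_0 = \bar{y} - b_1\bar{x}$. The function name `least_squares_from_sums` and the variable names are hypothetical, chosen only for illustration.

```python
# Data copied from the Problem 5 table:
# years on the board (x) and shares owned in thousands (y).
years = [6, 12, 14, 6, 9, 13, 15, 9]
shares = [300, 408, 560, 252, 288, 650, 600, 522]


def least_squares_from_sums(x, y):
    """Return (sum_xy, b0, b1) for a simple regression of y on x.

    Uses the computational (sums-based) formulas, which is an
    assumption about the method intended for hand calculation.
    """
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum_xy - n * x_bar * y_bar   # corrected sum of cross products
    s_xx = sum_x2 - n * x_bar ** 2      # corrected sum of squares of x
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return sum_xy, b0, b1


sum_xy, b0, b1 = least_squares_from_sums(years, shares)
print(f"sum of xy = {sum_xy}")
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

The same function can be reused on the shares and age columns of the Minitab regression problem to compare against the printed coefficients.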