Applied Business Statistics
Exam III
December 8, 2006

Answer all of the following (25) questions.

Use the following to answer questions 1-2:

Let $p_1$ represent the population proportion of U.S. Senate and Congress (House of Representatives) Democrats who are in favor of a new modest tax on "junk food". Let $p_2$ represent the population proportion of U.S. Senate and Congress (House of Representatives) Republicans who are in favor of a new modest tax on "junk food". Out of the 265 Democratic senators and congressmen, 106 are in favor of a "junk food" tax. Out of the 285 Republican senators and congressmen, only 57 are in favor of a "junk food" tax.

1. Find a 95 percent confidence interval for the difference between proportions 1 and 2.

$\hat{p}_1 = \frac{106}{265} = 0.4$, $\hat{p}_2 = \frac{57}{285} = 0.2$

$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{(0.4)(0.6)}{265} + \frac{(0.2)(0.8)}{285}} \approx 0.0383$

Confidence interval: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\, s_{\hat{p}_1 - \hat{p}_2} = 0.2 \pm (1.96)(0.0383)$

Answer: (0.125, 0.275)

2. At $\alpha = .01$, can we conclude that the proportion of Democrats who favor a "junk food" tax is more than 5 percentage points higher than the proportion of Republicans who favor the new tax?

$H_0: p_1 - p_2 \le 0.05$, $H_A: p_1 - p_2 > 0.05$

$z = \frac{(0.4 - 0.2) - 0.05}{0.0383} \approx 3.92$

At $\alpha = .01$, $3.92 > z_{.01} = 2.33$.

Answer: Reject $H_0$

3. Test $H_0: \mu_1 \le \mu_2$, $H_A: \mu_1 > \mu_2$ at $\alpha = .10$, where $\bar{X}_1 = 77.4$, $\bar{X}_2 = 72.2$, $s_1 = 3.3$, $s_2 = 2.1$, $n_1 = 6$, $n_2 = 6$. Assume the population standard deviations are equal.

Answer: Reject $H_0$

$t_{10,\,.10} = 1.372$

$s_p^2 = \frac{(6-1)(3.3)^2 + (6-1)(2.1)^2}{6 + 6 - 2} = 7.65$

$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{7.65\left(\frac{1}{6} + \frac{1}{6}\right)} = 1.597$

$t_{calc} = \frac{77.4 - 72.2}{1.597} = 3.256$

Since $3.256 > 1.372$, reject $H_0$.

Use the following to answer questions 4-5:

A fast food company uses two management-training methods. Method 1 is a traditional method of training and Method 2 is a new and innovative method. The company has just hired 31 new management trainees. 15 of the trainees are randomly selected and assigned to the first method, and the remaining 16 trainees are assigned to the second training method. After three months of training, the management trainees took a standardized test.
The test was designed to evaluate their performance and learning from training. The sample mean score and sample standard deviation of the two methods are given below. The management wants to determine if the company should implement the new training method.

                     Method 1   Method 2
Mean                 69         72
Standard deviation   3.4        3.8

4. Write the null hypothesis and the alternative hypothesis.

Answer: $H_0: \mu_1 - \mu_2 \ge 0$, $H_A: \mu_1 - \mu_2 < 0$ (that is, $\mu_1 < \mu_2$)

5.

$s_p^2 = \frac{(15-1)(3.4)^2 + (16-1)(3.8)^2}{16 + 15 - 2} = 13.05$

$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{13.05\left(\frac{1}{15} + \frac{1}{16}\right)} = 1.298$

$t_{calc} = \frac{69 - 72}{1.298} = -2.311$

$-2.311 < -t_{29,\,.05} = -1.699$, reject $H_0$.

Use the following to answer questions 6-7:

The mid-distance running coach, Zdravko Popovich, for the Olympic team of an eastern European country claims that his six-month training program significantly reduces the average time to complete a 1500-meter run. Five mid-distance runners were randomly selected before they were trained with coach Popovich's six-month training program, and their completion time for the 1500-meter run was recorded (in minutes). After six months of training under coach Popovich, the same five runners' 1500-meter run time was recorded again. The results are given below.

Runner   Completion time before training   Completion time after training
1        5.9                               5.4
2        7.5                               7.1
3        6.1                               6.2
4        6.8                               6.5
5        8.1                               7.8

6. At an alpha level of .05, can we conclude that there has been a significant decrease in the mean completion time?

Answer: Reject $H_0$; there is a significant decrease in completion time after training.

Let $D_i = (\text{Time before}) - (\text{Time after})$.

$H_0: \mu_d \le 0$, $H_A: \mu_d > 0$

$t_{4,\,.05} = 2.132$

$\bar{d} = \frac{.5 + .4 + (-.1) + .3 + .3}{5} = .28$

$s_d = .228$, $\frac{s_d}{\sqrt{n}} = \frac{.228}{\sqrt{5}} = .102$

$t = \frac{.28 - 0}{.102} = 2.746$

$2.746 > 2.132$, reject $H_0$.

7. Construct the appropriate 95% confidence interval.

Answer: .063 minutes to .497 minutes

Let $D_i = (\text{Time before}) - (\text{Time after})$.

$t_{4,\,.05} = 2.132$, $\bar{d} = .28$, $\frac{s_d}{\sqrt{n}} = .102$

$.28 \pm (2.132)(.102)$: .063 min. to .497 min.

Use the following to answer question 8:

An experiment was performed on a certain metal to determine if the strength is a function of heating time.
Results based on 10 metal sheets are given below. Use the simple linear regression model.

$\sum X = 30$, $\sum X^2 = 104$, $\sum Y = 40$, $\sum Y^2 = 178$, $\sum XY = 134$

8. Find the estimated y-intercept.

Answer: $b_0 = 1$

$SS_{XY} = 134 - \frac{(30)(40)}{10} = 14$

$SS_{XX} = 104 - \frac{(30)^2}{10} = 14$

$b_1 = \frac{14}{14} = 1$

$b_0 = \frac{40}{10} - 1\left(\frac{30}{10}\right) = 1$

9-11. Complete the following partial ANOVA table from a simple linear regression analysis with a sample size of 15 observations. Use the F test to test the significance of the model at $\alpha = .05$.

Source       SS       DF   MS      F
Regression   309.9    1    309.9   5.87
Error        685.5    13   52.77
Total        995.95   14   71.14

Consider the following partial computer output for a multiple regression model.

Predictor   Coefficient (b_i)   Standard Dev (s_b)
Constant    99.3883
X1          -0.007207           0.0031
X2          0.0011336           0.00122
X3          0.9324              0.373

12. The calculated value of the t statistic for X1 is ________.

Answer: $t = \frac{-0.007207}{0.0031} = -2.325$

Use the following to answer questions 13-18:

Below is a partial multiple regression ANOVA table.

Source   SS           df
X1       535.9569     1
X2       1,167.5634   1
X3       18.9886      1
Error    3,459.6803   8

13. How many observations were in the sample?

Answer: $n - (3 + 1) = 8$, so $n = 12$

14. What is the total sum of squares and the degrees of freedom for total sum of squares?

SS Total = 535.9569 + 1167.5634 + 18.9886 + 3459.6803 = 5182.19, with $n - 1 = 11$ degrees of freedom

15. What is the mean square error?

MSE = 3459.6803/8 = 432.46

16. Calculate the explained variation.

Explained variation = SSR = 535.9569 + 1167.5634 + 18.9886 = 1722.51

17. Calculate the proportion of the variation explained by the multiple regression model.

Explained variation = SSR = 535.9569 + 1167.5634 + 18.9886 = 1722.51

$R^2 = \frac{SSR}{SST} = \frac{1722.51}{5182.19} = .3324$

18. Test the overall usefulness of the model at $\alpha = .01$. Calculate F and make your decision about whether the model is useful for prediction purposes.
Answer:

$F_{.01,\,3,\,8} = 7.59$

$MS_{Regression} = \frac{535.9569 + 1167.5634 + 18.9886}{3} = 574.17$

$MSE = \frac{3459.6803}{8} = 432.46$

$F = \frac{574.17}{432.46} = 1.33$

$1.33 < 7.59$, fail to reject $H_0$.

Use the following to answer questions 19-21:

The management of a professional baseball team is in the process of determining the budget for next year. A major component of future revenue is attendance at the home games. In order to predict attendance at home games, the team statistician has used a multiple regression model with dummy variables. The model is of the form:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$

where:
y = attendance at a home game
x1 = current power rating of the team on a scale from 0 to 100 before the game
x2 and x3 are dummy variables, defined below
x2 = 1 if weekend, x2 = 0 otherwise
x3 = 1 if weather is favorable, x3 = 0 otherwise

After collecting the data based on 30 games from last year and implementing the above multiple regression model, the team statistician obtained the following least squares multiple regression equation:

$\hat{y} = -1050 + 250x_1 + 2200x_2 + 5400x_3$

The multiple regression computer output also indicated the following:

$s_{b_1} = 800$, $s_{b_2} = 1000$, $s_{b_3} = 1850$

19. Interpret the estimated model coefficient $b_1$.

Answer: For each additional rating point the baseball team receives, the average attendance is expected to increase by 250 people when the independent variable (x1) is within the experimental region and the other two independent variables are held constant.

Interpret the estimated model coefficient $b_2$.

Answer: The estimated average attendance for weekend home games is 2200 people more than the estimated average attendance for weekday home games when the independent variable (x2) is within the experimental region and the other two independent variables are held constant.

20. Assume that the overall model is useful in predicting the game attendance and the team statistician wants to know if the mean attendance is higher on the weekends as compared to the weekdays. State the appropriate null and alternative hypotheses.

Answer: $H_0: \beta_2 \le 0$, $H_A: \beta_2 > 0$

21. Assume that the overall model is useful in predicting the game attendance. Assume today is Wednesday morning and the weather forecast indicates sunny, excellent weather conditions for the rest of the day. Later today, there is a home baseball game for this team. Assume that the current power rating of the team is 85 and predict the attendance for today's game.

$\hat{y} = -1050 + 250(85) + 2200(0) + 5400(1) = 25{,}600$

22. SSE = 1.10, n = 5

$s = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{1.10}{3}} = 0.6055$

23. $\hat{y} = 6.72 + 1.39x$, $s_b = 0.06$

$1.39 \pm 1.67(0.06)$: (1.29, 1.49)

24. $t = \frac{b_1}{s_{b_1}} = \frac{-1.2024}{0.2934} = -4.098$

There is a significant negative relationship.

25. $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(13 - 1)(5)^2 + (10 - 1)(3)^2}{23 - 2}$

$s_p = 4.259$

85. Calculate the coefficient of determination.

Answer: .7777
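
The hand calculations for questions 1-3, 5, 6, and 15-18 can be re-checked numerically. The following is a minimal sketch using only the Python standard library. The function names (two_proportion_se, pooled_t, paired_t, regression_f) are hypothetical helpers introduced here for illustration and are not part of the exam. The value 1.96 is the standard normal value for a 95% two-sided interval, which the exam's answer to question 1 implies but does not print.

```python
import math


def two_proportion_se(x1, n1, x2, n2):
    """Sample proportions and the unpooled standard error of p1_hat - p2_hat."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1, p2, se


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Pooled variance, standard error, and t statistic (equal variances assumed)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, se, (mean1 - mean2) / se


def paired_t(before, after):
    """Mean difference, std. dev. of differences, and t statistic for H0: mu_d <= 0."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    d_bar = sum(d) / n
    s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
    return d_bar, s_d, d_bar / (s_d / math.sqrt(n))


def regression_f(ss_terms, sse, df_error):
    """SSR, MSE, overall F, and R^2 from per-predictor sums of squares (1 df each)."""
    ssr = sum(ss_terms)
    mse = sse / df_error
    f_stat = (ssr / len(ss_terms)) / mse
    return ssr, mse, f_stat, ssr / (ssr + sse)


if __name__ == "__main__":
    # Questions 1-2: junk food tax proportions
    p1, p2, se = two_proportion_se(106, 265, 57, 285)
    diff = p1 - p2
    print("Q1 CI:", round(diff - 1.96 * se, 3), round(diff + 1.96 * se, 3))
    print("Q2 z:", round((diff - 0.05) / se, 2))

    # Question 3 and question 5: pooled-variance t tests
    print("Q3 t:", round(pooled_t(77.4, 3.3, 6, 72.2, 2.1, 6)[2], 3))
    print("Q5 t:", round(pooled_t(69, 3.4, 15, 72, 3.8, 16)[2], 3))

    # Question 6: paired differences (before - after)
    before = [5.9, 7.5, 6.1, 6.8, 8.1]
    after = [5.4, 7.1, 6.2, 6.5, 7.8]
    print("Q6 t:", round(paired_t(before, after)[2], 3))

    # Questions 15-18: multiple regression ANOVA table
    ssr, mse, f_stat, r2 = regression_f(
        [535.9569, 1167.5634, 18.9886], 3459.6803, 8
    )
    print("Q15-18:", round(mse, 2), round(ssr, 2), round(r2, 4), round(f_stat, 2))
```

The critical values (2.33, 1.372, 1.699, 2.132, 7.59) are still read from tables, as in the answers above; the sketch only recomputes the test statistics, so small rounding differences from the hand-rounded intermediate values are expected.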