STAT 211 – 200 EXAM 5 – FORM A SUMMER03

Although tea is the world’s most widely consumed beverage after water, little is known about its nutritional value. Folacin is the only B vitamin present in any significant amount in tea, and recent advances in assay methods have made accurate determination of folacin content feasible. Consider the data on folacin content for randomly selected specimens of the four leading brands of green tea.

  Brand   Observations
  1       7.9   6.2   6.6   8.6   8.9   10.1   9.6
  2       5.7   7.5   9.8   6.1   8.4
  3       6.8   7.5   5.0   7.4   5.3   6.1
  4       6.4   7.1   7.9   4.5   5.0   4.0

Let $\mu_1$, $\mu_2$, $\mu_3$, $\mu_4$ be the true average folacin content for Brand 1, Brand 2, Brand 3, and Brand 4, respectively. The following are the MINITAB results for different analyses.

Analysis of Variance

  Source   DF      SS     MS      F       P
  Factor    3   23.50   7.83   3.75   0.028
  Error    20   41.78
  Total         65.27

  Level      n    Mean    StDev
  Brand 1    7   8.271    1.463
  Brand 2    5   7.500    1.681
  Brand 3    6   6.350    1.060
  Brand 4    6   5.817    1.551

  Pooled StDev = 1.445

Tukey's pairwise comparisons

  Family error rate = 0.0500
  Individual error rate = 0.0111
  Critical value = 3.96

  Intervals for (column level mean) - (row level mean)

            Brand1    Brand2    Brand3
  Brand2    -1.598
             3.141
  Brand3    -0.330    -1.301
             4.173     3.601
  Brand4     0.203    -0.767    -1.803
             4.706     4.134     2.870

Test for Equal Variances

  Bartlett's Test (normal distribution)
  Test Statistic: 0.965   P-Value: 0.810

  Levene's Test (any continuous distribution)
  Test Statistic: 0.433   P-Value: 0.732

Comparison of two means:

  Difference = mu Brand 1 - mu Brand 4
  Estimate for difference: 2.455
  95% CI for difference: (0.582, 4.328)
  T-Test of difference = 0 (vs not =): T-Value = 2.92   P-Value = 0.015   DF = 10

  Difference = mu Brand 1 - mu Brand 4
  Estimate for difference: 2.455
  95% CI for difference: (0.614, 4.296)
  T-Test of difference = 0 (vs not =): T-Value = 2.93   P-Value = 0.014   DF = 11
  Both use Pooled StDev = 1.50

  Difference = mu Brand 2 - mu Brand 4
  Estimate for difference: 1.683
  95% CI for difference: (-0.583, 3.950)
  T-Test of difference = 0 (vs not =): T-Value = 1.71   P-Value = 0.125   DF = 8

  Difference = mu Brand 2 - mu Brand 4
  Estimate for difference: 1.683
  95% CI for difference: (-0.522, 3.889)
  T-Test of difference = 0 (vs not =): T-Value = 1.73   P-Value = 0.118   DF = 9
  Both use Pooled StDev = 1.61

  Difference = mu Brand 3 - mu Brand 4
  Estimate for difference: 0.533
  95% CI for difference: (-1.235, 2.302)
  T-Test of difference = 0 (vs not =): T-Value = 0.70   P-Value = 0.506   DF = 8

  Difference = mu Brand 3 - mu Brand 4
  Estimate for difference: 0.533
  95% CI for difference: (-1.175, 2.242)
  T-Test of difference = 0 (vs not =): T-Value = 0.70   P-Value = 0.503   DF = 10
  Both use Pooled StDev = 1.33

The folacin data were found to be normally distributed. Answer the following 10 questions using all this information.

1. Which of the following is correct, based on the information and analysis, regarding the assumptions required for the analysis?
(a) Only the normality assumption is satisfied for these data
(b) Only the constant variance assumption is satisfied for these data
(c) Both the normality assumption and the constant variance assumption are satisfied for these data
(d) Neither the normality assumption nor the constant variance assumption is satisfied for these data

2. Which of the following is the point estimate for $\mu_4 - \mu_2$?
(a) -5.817
(b) -1.683
(c) 1.683
(d) 5.817
(e) I need to know $\mu_2$ and $\mu_4$ to answer the question

3. I would like to test $H_0: \mu_2 - \mu_4 = 0$ versus $H_a: \mu_2 - \mu_4$ [symbol missing] $0$. Which of the following would be the corresponding P-value?
(a) 0.0590
(b) 0.0625
(c) 0.1180
(d) 0.1250
(e) I do not have enough information to compute it

4. Which of the following is the MSE (Mean Squared Error) for the Analysis of Variance?
(a) 2.089
(b) 7.830
(c) 23.50
(d) 41.780

5. Are there significant differences among the true mean folacin contents of the four brands using $\alpha = 0.05$?
(a) Since the P-value on the corresponding test is less than 0.05, there are significant differences among four brands.
(b) Since the P-value on the corresponding test is more than 0.05, there are significant differences among four brands.
(c) Since the P-value on the corresponding test is less than 0.05, there are no significant differences among four brands.
(d) Since the P-value on the corresponding test is more than 0.05, there are no significant differences among four brands.

6. Which of the following is the total degrees of freedom for the Analysis of Variance?
(a) 20
(b) 21
(c) 23
(d) 24
(e) 25

7. Which of the following is the point estimate for the constant standard deviation in the analysis of variance?
(a) 1.330
(b) 1.445
(c) 1.500
(d) 1.610

8. Based on Tukey’s pairwise comparisons, which of the following is the right conclusion using $\alpha = 0.05$?
(a) There are no significant differences between those four brands.
(b) Only Brands 1 and 4 are significantly different
(c) Only Brands 1, 2 and 4 are significantly different
(d) Only Brands 1, 3 and 4 are significantly different
(e) Those four brands are significantly different from one another

9. I believe that the true averages of Brand 1 and Brand 2 are significantly different from those of Brand 3 and Brand 4. To test this belief, which of the following null hypotheses should you use?
(a) $\mu_1 = \mu_3$, $\mu_1 = \mu_4$, $\mu_2 = \mu_3$, $\mu_2 = \mu_4$
(b) $\mu_1 = 0.5(\mu_3 + \mu_4)$, $\mu_2 = 0.5(\mu_3 + \mu_4)$
(c) $\mu_1$ $\mu_2$ $\mu_3$ $\mu_4$ [operators missing]

10. If you were comparing the true variances of Brand 1 and Brand 2, which of the following would be the corresponding test statistic?
(a) I do not have enough information to answer this question
(b) 0.7575
(c) 0.8703

11. Which of the following cannot be dependent samples collected for the analysis?
(a) The new treatment will be compared to a current treatment by recording the change in cholesterol readings over a 10-week period. The study will involve at most 30 participants.
(b) A study was designed to measure the effect of home environment on the academic achievement of 12-year-old students.
Since the researchers wanted to control for genetic differences between individuals, thirty sets of identical twins were identified. One twin in each set was assigned to the academic group and the other to the nonacademic group.
(c) Two random samples of 6 rats each were exposed to different environments. One sample was held in a normal environment at 26°C; the other was held in a cold environment at 5°C. Blood pressures of the rats were recorded for comparison purposes.

Infestation of crops by insects has long been of great concern to farmers and agricultural scientists. An article reports normally distributed data on x = age of cotton plant (days) and y = % damaged squares. The following are the data and the corresponding analysis done in MINITAB.

  X:    9   12   12   15   18   18   21   21   27   30   30   33
  Y:   11   12   23   30   29   52   41   65   60   72   84   93

Regressing Y on X

  Predictor       Coef   SE Coef       T       P
  Constant     -19.670     7.524   -2.61   0.026
  X             3.2847    0.3440    9.55   0.000

  S = 9.094   R-Sq = 90.1%   R-Sq(adj) = 89.1%

Analysis of Variance

  Source            DF       SS       MS       F       P
  Regression         1   7541.7   7541.7   91.19   0.000
  Residual Error    10    827.0     82.7
  Total             11   8368.7

[Scatter plot of x versus y: Y axis from 0 to 90, X axis from 10.0 to 35.0, twelve plotted points]

Answer the following 9 questions using this information.

12. Which of the following is a possible relationship between x and y?
(a) They are positively related
(b) They are negatively related
(c) They are unrelated

13. Which of the following is the regression equation for regressing y on x?
(a) X = -19.670 + 3.2847Y
(b) X = 3.2847 - 19.670Y
(c) Y = -19.670 + 3.2847X
(d) Y = 3.2847 - 19.670X

14. What is the predicted % damaged squares when the age of a cotton plant is 20 days?
(a) There is no way to predict this with the given information
(b) -390.1153
(c) 42.7393
(d) 46.024
(e) 49.3087

15. What is the corresponding residual in the regression analysis when the age of a cotton plant is 15 days?
(a) There is no way to compute this with the given information
(b) -29.6005
(c) -0.3995
(d) 0.3995
(e) 29.6005

16. Suppose it was previously believed that the expected change in % damaged squares is 3.5 for a 1-day increase in the age of the cotton plant. Which of the following is the test statistic to see if the data support this belief?
(a) -0.6259
(b) -0.3225
(c) 0.3225
(d) 0.6259
(e) There is no way to compute this with the given information

17. If the P-value computed for the belief in the previous question is larger than 0.20, which of the following conclusions can be reached using $\alpha = 0.05$?
(a) The expected change in % damaged squares is 3.5 for a 1-day increase in the age of the cotton plant.
(b) The expected change in % damaged squares is not 3.5 for a 1-day increase in the age of the cotton plant.

18. Would you feel comfortable using the results on the output to predict % damaged squares when the age of a cotton plant is 5 days?
(a) Yes
(b) No

19. What proportion of the observed variation in % damaged squares can be attributed to the simple linear regression relationship between the age of a cotton plant and the % damaged squares?
(a) 0.6590
(b) 0.8118
(c) 0.9010
(d) 0.9492
(e) 0.9743

20. Which of the following is the correlation between x and y?
(a) -0.9010
(b) -0.8118
(c) 0.8118
(d) 0.9010
(e) 0.9492

21. Which of the following cannot be right?
(a) If you would like to explore the linear relationship between two alphanumeric variables, you may use simple linear regression
(b) If you would like to compare more than two population means, you may use Analysis of Variance
(c) If you would like to compare two variances, you may use the F-test
(d) If you would like to compare more than two population variances, you may use Bartlett’s test if the data are normally distributed

A random sample of 5726 telephone numbers from a certain region taken in March 1992 yielded 1105 that were unlisted, and 1 year later a sample of 5384 yielded 980 unlisted numbers.
Let $p_1$ be the true proportion of unlisted numbers in March 1992, $p_2$ be the true proportion of unlisted numbers in March 1993, X be the number of unlisted numbers in the corresponding sample, and n be the sample size. The following are the results from MINITAB.

  Sample      X      n   Sample p
  1        1105   5726   0.192979
  2         980   5384   0.182021

  Estimate for p(1) - p(2): 0.0109586
  95% CI for p(1) - p(2): (-0.00355738, 0.0254746)
  Test for p(1) - p(2) = 0 (vs not = 0): Z = 1.48   P-Value = 0.139

Answer the following 4 questions using this information.

22. What is the point estimate for $p_1 - p_2$?
(a) 0.005
(b) 0.011
(c) 0.139
(d) 0.182
(e) 0.193

23. Is there a difference in the true proportions of unlisted numbers between the two years using $\alpha = 0.05$?
(a) Since the P-value ≤ 0.05 on the corresponding test, there is a difference in true proportions of unlisted numbers between the two years
(b) Since the P-value ≤ 0.05 on the corresponding test, there is no difference in true proportions of unlisted numbers between the two years
(c) Since the P-value > 0.05 on the corresponding test, there is no difference in true proportions of unlisted numbers between the two years
(d) Since 0 does not fall in the corresponding interval, there is no difference in true proportions of unlisted numbers between the two years
(e) Since 0 falls in the corresponding interval, there is a difference in true proportions of unlisted numbers between the two years

24. Which of the following would be the P-value if you were testing whether there were more unlisted numbers in March 1992 compared to March 1993?
(a) 0.0348
(b) 0.0695
(c) 0.139
(d) 0.278

25. If I claim that the true proportion of unlisted numbers in March 1992 is higher than the true proportion of unlisted numbers in March 1993, which of the following is the valid alternative hypothesis?
(a) $\mu_1$ [symbol missing] $\mu_2$
(b) $p_1$ [symbol missing] $p_2$
(c) $p_1$ [symbol missing] $p_2$
(d) $\mu_1$ [symbol missing] $\mu_2$
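
The MINITAB two-proportion output above can be checked by hand. The following is a minimal Python sketch, not part of the original exam, of the large-sample z test and confidence interval for p(1) - p(2). It assumes the unpooled standard error is used for both the test and the interval, which is an assumption about the MINITAB settings; the function name `two_proportion_z` is hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, conf=0.95):
    """Large-sample z test and CI for p1 - p2 (hypothetical helper).

    Assumption: the unpooled standard error is used for both the
    test statistic and the confidence interval.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    diff = p1_hat - p2_hat
    # Unpooled standard error of the difference in sample proportions
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = diff / se
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # Two-sided P-value for H0: p1 - p2 = 0 versus "not equal"
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))
    # Critical value for the requested confidence level
    z_crit = std_normal.inv_cdf(1 - (1 - conf) / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, z, p_two_sided, ci


# Counts taken from the exam: 1105 of 5726 (March 1992), 980 of 5384 (March 1993)
diff, z, p_value, ci = two_proportion_z(1105, 5726, 980, 5384)
print(f"Estimate for p(1) - p(2): {diff:.7f}")
print(f"Z = {z:.2f}, two-sided P-value = {p_value:.3f}")
print(f"95% CI: ({ci[0]:.8f}, {ci[1]:.8f})")
```

With the pooled standard error the test statistic also rounds to the same two-decimal value for these counts, so the choice does not affect the comparison with the printed output here.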