Spring 2016
STATISTICS 402B
Sample Exam I Questions

1. Trace metals in drinking water affect the flavor, and an unusually high concentration can pose a health hazard. Ten pairs of data were taken measuring zinc concentration in bottom water and surface water from various locations of a stream. Do the data suggest that the true average concentration of zinc in the bottom water exceeds that of surface water? The data, with summary sums, are given below:

Location        1     2     3     4     5     6     7     8     9     10
Bottom        .430  .266  .567  .531  .707  .716  .651  .589  .469  .723
Surface       .415  .238  .390  .610  .605  .663  .632  .523  .411  .612
Difference d  .015  .028  .177 -.079  .102  .053  .019  .066  .058  .111

Σd = .55, Σd² = .072194

Let µ_d = µ_bottom − µ_surface be the mean difference in zinc concentration of the two populations. Assume that the differences d_i have an approximately Normal distribution with mean µ_d and variance σ_d².

(a) Calculate the estimate s_d² of σ_d².
(b) State the null and alternative hypotheses in terms of µ_d needed to test the research hypothesis that the mean zinc concentration in the bottom water exceeds that of surface water.
(c) Compute a t-statistic to test the above hypotheses.
(d) Find bounds for the p-value (using the t-table) and use it to make a decision using α = 0.05.
(e) Construct a 90% confidence interval for µ_d.
(f) Based on this interval, would you reject the null hypothesis in part (b)? Explain how you arrived at your conclusion. What is the α-level associated with this test?

2. An investigation was conducted into the dust content in the flue gases of two types of solid-fuel boilers. Thirteen boilers of type A and nine boilers of type B were used under different fuels (coal, wood, straw, etc.) at various temperature settings. Over a period of time, the following quantities of dust, in grams, were deposited in similar dust traps inserted in each of the 22 flues.
A partial JMP output of an analysis of these data is given below:

Boiler: type A, type B
Dust Content (in grams):
72.1 51.0 58.4 39.3 82.1 52.8 67.2 58.8
76.7 75.1 48.0 53.3 55.5 41.2 66.6 46.0 56.4 58.9
61.5 60.6 55.2 63.1

Let µ_A and µ_B represent the mean dust content of gases from boiler types A and B, respectively.

(a) Examining the output above, find evidence to support the assumption of equal population variances. Explain.
(b) Under the assumption of equal variances, calculate an estimate of the common error variance σ².
(c) State the appropriate null and alternative hypotheses to determine if the mean dust content of gases from boiler type B was lower.
(d) Calculate a t-statistic to test the hypotheses in part (c) (you must use the pooled-variance t-test).
(e) Approximate a p-value associated with the above t-statistic, and make your decision using α = .05.
(f) Construct a 95% confidence interval for µ_A − µ_B. Show work.
(g) It is observed that there is a large variation in the dust collected from the boilers of each type, due to the differences in fuelling and the conditions under which the boilers were used, leading to an interval estimate of the difference that is too wide to be useful. Explain briefly how the study may be performed as a paired experiment to improve the accuracy of this estimate.

3. An experiment is performed on individuals with elevated serum cholesterol. The individuals are treated with Lipitor. The dosage levels are None, Low and High. A dosage of "None" is a placebo pill that looks like the drug but contains no Lipitor. At the beginning of the study the total cholesterol for each individual is determined. At the end of three months the total cholesterol for each individual is determined again. The change in cholesterol is calculated (Three Month Value - Beginning Value) so that a negative change means the total cholesterol has decreased.
(a) Identify the experimental units, treatments, and the response in this experiment.
(b) If the experimenters plan to use α = 0.05 to detect a difference in dosage level means of 1.4 standard deviations (i.e., 1.4σ) with a power of 0.9, what is the total number of individuals needed for the experiment?

The experiment is performed with a total of 54 individuals randomly assigned to the three treatment groups so that the same number of individuals is in each group. Refer to the JMP output for the Lipitor Experiment.

(c) Using the plot of cholesterol change by dosage level, can you verify any of the assumptions you made about the distributions of the data? What are these assumptions, and do they appear feasible?
(d) Report the ANOVA table to test the null hypothesis that the dosage level means are all equal against the alternative that at least one differs from the others. Be sure to report the appropriate test statistic and the p-value. State your decision, the reason for the decision, and a conclusion within the context of the problem. Use α = 0.05.
(e) The t value for Fisher's Least Significant Difference (t_0.05,51) is 2.008. Compute the LSD for comparing dosage level means at α = .05.
(f) Use the connecting letters report constructed using the LSD to indicate which dosage levels have statistically significantly different means and which don't.
(g) Calculate the value of Tukey's HSD for comparing dosage level means at α = .05. Use the Q value from the output.
(h) Use the connecting letters report constructed using Tukey's HSD to indicate which dosage levels have statistically significantly different means and which don't.
(i) What can you conclude from the plots in the Distribution of Residuals output?

4.
A recent study examined chocolate's effect on blood vessel function in healthy people. Eleven people received 46 grams of naturally flavonoid-rich dark chocolate every day for two weeks, while a group of 10 people received a placebo consisting of dark chocolate with low flavonoid content. Participants had their vascular health measured (by means of flow-mediated dilation) before and after the two-week period. An increase over the two-week period indicated greater vascular health.

(a) What makes this an experiment instead of an observational study? Explain.
(b) Identify the experimental unit in this experiment.
(c) What is the factor under study? What are the treatments (levels of the factor)?
(d) What is the response being measured?
(e) Randomization is not mentioned in the description above. Describe how randomization should have been used in this experiment.
(f) Are replications being used in this experiment? Explain how.
(g) What are the sources of random error variation in this experiment? Explain briefly.
(h) How would you estimate the error variance from the data? (Be specific. You may use symbols to denote sample data.)

Lipitor Experiment
Cholesterol Change by Dosage Level

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       2         5803.259       2901.63    3.8197    0.0285*
Error      51        38742.222        759.65
C. Total   53        44545.481

Effect Tests
Source   Nparm   DF   Sum of Squares   F Ratio   Prob > F
Level        2    2        5803.2593    3.8197    0.0285*

Effect Details
Level
Least Squares Means Table
Level   Least Sq Mean   Std Error     Mean
H           -32.00000   6.4963726   -32.000
L           -29.55556   6.4963726   -29.556
N            -8.88889   6.4963726    -8.889

LSMeans Differences Student's t
α = 0.050   t = 2.00758
Level            Least Sq Mean
N       A             -8.88889
L         B          -29.55556
H         B          -32.00000
Levels not connected by same letter are significantly different.

LSMeans Differences Tukey HSD
α = 0.050   Q = 2.41398
Level            Least Sq Mean
N       A             -8.88889
L       A B          -29.55556
H         B          -32.00000
Levels not connected by same letter are significantly different.
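As a rough check on parts (e) and (g) of Question 3, the two comparison thresholds can be sketched in Python from the values in the output above. This is only a sketch under stated assumptions: each of the three groups is taken to have 54/3 = 18 individuals, and the Q reported by JMP is assumed to be already scaled so that it multiplies the same standard error of a difference as the t value does. The variable and function names are illustrative and not part of the exam.

```python
import math
from itertools import combinations

# Values read from the JMP output above
mse = 759.65            # Mean Square for Error
n_per_group = 18        # assumption: 54 individuals split equally over 3 groups
t_crit = 2.00758        # t reported under "LSMeans Differences Student's t"
q_jmp = 2.41398         # Q reported under "LSMeans Differences Tukey HSD"
means = {"N": -8.88889, "L": -29.55556, "H": -32.00000}

# Standard error of a difference between two group means (equal group sizes)
se_diff = math.sqrt(2 * mse / n_per_group)

# Assumption: both thresholds take the form (critical value) * se_diff
lsd = t_crit * se_diff
hsd = q_jmp * se_diff

def significant_pairs(threshold):
    """Return the pairs of levels whose mean difference exceeds the threshold."""
    return [(a, b) for a, b in combinations(means, 2)
            if abs(means[a] - means[b]) > threshold]

print(f"LSD = {lsd:.3f}, pairs: {significant_pairs(lsd)}")
print(f"HSD = {hsd:.3f}, pairs: {significant_pairs(hsd)}")
```

Pairs whose absolute mean difference exceeds a threshold are flagged as significantly different at that threshold; the two lists can then be compared against the corresponding connecting letters reports.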
[Figure: Distribution of Residuals]
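Returning to Question 1, the arithmetic behind parts (a), (c), and (e) can be checked with a short Python sketch that uses only the zinc data in the table. The critical value 1.833 (t with 9 degrees of freedom, upper-tail area 0.05) is taken from a standard t-table and is an assumption here, as are all variable names.

```python
import math

# Zinc concentrations from the Question 1 table
bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716, 0.651, 0.589, 0.469, 0.723]
surface = [0.415, 0.238, 0.390, 0.610, 0.605, 0.663, 0.632, 0.523, 0.411, 0.612]

# Paired differences d_i = bottom_i - surface_i
d = [b - s for b, s in zip(bottom, surface)]
n = len(d)

sum_d = sum(d)                      # should reproduce the stated sum of d
sum_d2 = sum(x * x for x in d)      # should reproduce the stated sum of d squared

d_bar = sum_d / n
# Sample variance of the differences via the computational formula
s2_d = (sum_d2 - sum_d ** 2 / n) / (n - 1)
s_d = math.sqrt(s2_d)

# Standard error of the mean difference and the paired t-statistic for H0: mu_d = 0
se = s_d / math.sqrt(n)
t_stat = d_bar / se

# Assumption: table value t_{0.05, 9} = 1.833 for a 90% two-sided interval
T_05_9 = 1.833
ci90 = (d_bar - T_05_9 * se, d_bar + T_05_9 * se)

print(f"sum d = {sum_d:.3f}, sum d^2 = {sum_d2:.6f}")
print(f"s_d^2 = {s2_d:.6f}, s_d = {s_d:.5f}")
print(f"t = {t_stat:.3f} on {n - 1} df")
print(f"90% CI for mu_d: ({ci90[0]:.4f}, {ci90[1]:.4f})")
```

The t-statistic would then be located between adjacent t-table entries for 9 degrees of freedom to bound the p-value in part (d).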