Spring 2016 STATISTICS 402B Sample Exam I Questions

1. Trace metals in drinking water affect the flavor, and an unusually high concentration can pose a health hazard. Ten pairs of data were taken measuring zinc concentration in bottom water and surface water at various locations of a stream. Do the data suggest that the true average concentration of zinc in the bottom water exceeds that of surface water? The data are given below:

Location        1      2      3      4      5      6      7      8      9      10
Bottom        .430   .266   .567   .531   .707   .716   .651   .589   .469   .723
Surface       .415   .238   .390   .610   .605   .663   .632   .523   .411   .612
Difference d  .015   .028   .177  -.079   .102   .053   .019   .066   .058   .111

$\sum d = .55$, $\sum d^2 = .072194$

Let $\mu_d = \mu_{bottom} - \mu_{surface}$ be the mean difference in zinc concentration of the two populations. Assume that the differences $d_i$ have an approximately Normal distribution with mean $\mu_d$ and variance $\sigma_d^2$.

(a) Calculate the estimate $s_d^2$ of $\sigma_d^2$.

(b) State the null and alternative hypotheses in terms of $\mu_d$ needed to test the research hypothesis that the mean zinc concentration in the bottom water exceeds that of surface water.

(c) Compute a t-statistic to test the above hypotheses.

(d) Find bounds for the p-value (using the t-table) and use them to make a decision using $\alpha = 0.05$.

(e) Construct a 90% confidence interval for $\mu_d$.

(f) Based on this interval, would you reject the null hypothesis in part (b)? Explain how you arrived at your conclusion. What is the $\alpha$-level associated with this test?

2. An investigation was conducted into the dust content in the flue gases of two types of solid-fuel boilers. Thirteen boilers of type A and nine boilers of type B were used under different fuels (coal, wood, straw, etc.) at various temperature settings. Over a period of time, the following quantities of dust, in grams, were deposited in similar dust traps inserted in each of the 22 flues.
Dust Content (in grams), boiler type A and type B:
72.1 51.0 58.4 39.3 82.1 52.8 67.2 58.8
76.7 75.1 48.0 53.3 55.5 41.2 66.6 46.0 56.4 58.9 61.5 60.6 55.2 63.1

A partial JMP output of an analysis of these data is given below:

Let $\mu_A$ and $\mu_B$ represent the mean dust content of gases from boiler types A and B, respectively.

(a) Examining the output above, find evidence to support the assumption of equal population variances. Explain.

(b) Under the assumption of equal variances, calculate an estimate of the common error variance $\sigma^2$.

(c) State the appropriate null and alternative hypotheses to determine if the mean dust content of gases from boiler type B is lower than that from boiler type A.

(d) Calculate a t-statistic to test the hypotheses in part (c) (you must use the pooled-variance t-test).

(e) Approximate a p-value associated with the above t-statistic, and make your decision using $\alpha = .05$.

(f) Construct a 95% confidence interval for $\mu_A - \mu_B$. Show work.

(g) It is observed that there is a large variation in the dust collected from the boilers of each type, due to the differences in fuelling and the conditions under which the boilers were used, leading to an interval estimate of the difference that is too wide to be useful. Explain briefly how the study may be performed as a paired experiment to improve the precision of this estimate.

3. An experiment is performed on individuals with elevated serum cholesterol. The individuals are treated with Lipitor. The dosage levels are None, Low, and High. A dosage of “None” is a placebo pill that looks like the drug but contains no Lipitor. At the beginning of the study the total cholesterol for each individual is determined. At the end of three months the total cholesterol for each individual is determined again. The change in cholesterol is calculated (Three Month Value - Beginning Value) so that a negative change means the total cholesterol has decreased.
(a) Identify the experimental units, treatments, and the response in this experiment.
Experimental units: the individuals selected for the experiment. Treatments: the 3 levels of Lipitor (None, Low, and High). Response: the change in cholesterol after 3 months of treatment.

(b) If the experimenters plan to use $\alpha = 0.05$ to detect a difference in dosage level means of 1.4 standard deviations (i.e., $1.4\sigma$) with a power of 0.9, what is the total number of individuals needed for the experiment?
Using the sample size tables supplied with $\alpha = 0.05$, $\delta/\sigma = 1.4$, $\beta = .1$, and number of groups = 3, we get a sample size per group of 14. That is a total of 3 × 14 = 42 individuals needed.

The experiment is performed with a total of 54 individuals randomly assigned to the three treatment groups so that the same number of individuals is in each group. Refer to the JMP output for the Lipitor Experiment.

(c) Using the plot of cholesterol change by dosage level, can you verify any of the assumptions you made about the distributions of the data? What are these assumptions, and do they appear feasible?
Yes, each of the samples appears to be a random sample from a normal distribution. Also, the spreads (variances) of the samples appear to support the model assumption that the variance is the same for the three groups (i.e., the homogeneity of variance assumption).

(d) Report the ANOVA table to test the null hypothesis that the dosage level means are all equal against the alternative that at least one mean differs from the others. Be sure to report the appropriate test statistic and the p-value. State your decision, the reason for the decision, and a conclusion within the context of the problem. Use $\alpha = 0.05$.

Source of Variation   d.f.   SS          MS        F        p-value
Dosage Level          2      5803.259    2901.63   3.8197   0.0285
Error                 51     38742.222   759.65
Total                 53     44545.481

Since the p-value is < .05, we reject $H_0: \mu_N = \mu_L = \mu_H$ and conclude that the mean changes in cholesterol are not all equal at the different dosage levels.
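As a quick arithmetic check on the ANOVA table in part (d), the mean squares and F ratio can be recomputed from the reported sums of squares and degrees of freedom. The sketch below is only an illustration, not part of the JMP output. The variable names are made up for this example. The p-value line uses the closed-form upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2, as they do here. The results should agree with the table up to rounding.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table in part (d).
ss_dosage, df_dosage = 5803.259, 2
ss_error, df_error = 38742.222, 51

# Mean square = sum of squares / degrees of freedom.
ms_dosage = ss_dosage / df_dosage
ms_error = ss_error / df_error

# F statistic = treatment mean square / error mean square.
f_stat = ms_dosage / ms_error

# Upper-tail probability P(F > f_stat). This closed form is valid ONLY for
# numerator df = 2: P(F > f) = (1 + 2 f / df_error) ** (-df_error / 2).
p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)

print(f"MS(Dosage) = {ms_dosage:.2f}")
print(f"MS(Error)  = {ms_error:.2f}")
print(f"F          = {f_stat:.4f}")
print(f"p-value    = {p_value:.4f}")
```

For other numerator degrees of freedom, a statistical library or an F table would be needed for the p-value.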
(e) The t value for Fisher's Least Significant Difference ($t_{0.025,51}$) is 2.008. Compute the LSD for comparing dosage level means at $\alpha = .05$.
$LSD = t_{0.025,51} \cdot \sqrt{\frac{2 \times 759.65}{18}} = 2.008 \times 9.187 = 18.45$

(f) Use the connecting letters report constructed using the LSD to indicate which dosage levels have statistically significantly different means and which don't.
The “None” dosage level (the placebo drug level) has a significantly smaller mean change in cholesterol than those for the two Lipitor levels (“Low” and “High”). There is no significant difference between the mean changes in cholesterol for the two Lipitor levels (“Low” and “High”).

(g) Calculate the value of Tukey's HSD for comparing dosage level means at $\alpha = .05$. Use the Q value from the output.
$HSD = 2.41398 \times \sqrt{\frac{2 \times 759.65}{18}} = 2.41398 \times 9.187 = 22.18$

(h) Use the connecting letters report constructed using Tukey's HSD to indicate which dosage levels have statistically significantly different means and which don't.
In this case, we find that there is no significant difference between the mean changes in cholesterol for the dosage levels “None” and “Low”, and no significant difference between the mean changes for the dosage levels “Low” and “High”. There is a significant difference between the mean changes in cholesterol for the dosage levels “None” and “High”.

(i) What can you conclude from the plots of the distribution of residuals?
Looking at all three plots, there is no evidence to suggest that the residuals do not follow a normal distribution.

4. In a recent study that examined chocolate's effect on blood vessel function in healthy people, 11 people received 46 grams of naturally flavonoid-rich dark chocolate every day for two weeks, while a group of 10 people received a placebo consisting of dark chocolate with low flavonoid content.
Participants had their vascular health measured (by means of flow-mediated dilation) before and after the two-week period. An increase over the two-week period indicated greater vascular health.

(a) What makes this an experiment instead of an observational study? Explain.
The levels of the factor under study were selected by the experimenter for the comparative study.

(b) Identify the experimental unit in this experiment.
A healthy individual.

(c) What is the factor under study? What are the treatments (levels of the factor)?
The factor is chocolate, and the two levels are low flavonoid (control) and high flavonoid.

(d) What is the response being measured?
Vascular health, measured by means of flow-mediated dilation.

(e) Randomization is not mentioned in the description above. Describe how randomization should have been used in this experiment.
The randomization is done in two steps: first, randomly allocate 11 people to receive the high-flavonoid chocolate and the rest to receive the low-flavonoid chocolate; second, randomize the order in which vascular health is measured (i.e., who is measured first, who is next, etc.) for all 21 people.

(f) Are replications being used in this experiment? Explain how.
Yes, because each treatment is given to more than one person. The high-flavonoid treatment is replicated 11 times, and the low-flavonoid treatment 10 times.

(g) What are the sources of random error variation in this experiment? Explain briefly.
How each person metabolizes the chocolate they eat might be the chief source of variation here, although differences in the participants' diets and lifestyles (say, whether or not they exercise) may also be major sources of variation in how vascular health changes over the two-week period.

(h) How would you estimate the error variance from the data? (Be specific. You may use symbols to denote sample data.)
Calculate the sample variance for the data from each treatment and then find the pooled variance estimate $s_p^2$, as done in class for two-sample t-tests.
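The pooled estimate in part (h) can be sketched in a few lines of code, assuming the usual two-sample formula $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$. The function name `pooled_variance` and the two small data lists are hypothetical, chosen only to illustrate the calculation. They are not data from the chocolate study.

```python
from statistics import variance


def pooled_variance(sample1, sample2):
    """Pooled variance estimate for two independent samples.

    Weights each sample variance (n - 1 divisor) by its degrees of freedom.
    """
    n1, n2 = len(sample1), len(sample2)
    s1_sq = variance(sample1)  # sample variance of group 1
    s2_sq = variance(sample2)  # sample variance of group 2
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


# Hypothetical changes in flow-mediated dilation (illustration only).
high_flavonoid = [1.0, 2.0, 3.0]
low_flavonoid = [2.0, 4.0, 6.0, 8.0]

print(pooled_variance(high_flavonoid, low_flavonoid))
```

In the actual experiment the two samples would have sizes 11 and 10, so the pooled estimate would carry 11 + 10 - 2 = 19 degrees of freedom.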