STATISTICS 402B

Spring 2016
Sample Exam I Questions
1. Trace metals in drinking water affect the flavor and an unusually high concentration can pose a health
hazard. Ten pairs of data were taken measuring zinc concentration in bottom water and surface water
from various locations of a stream. Do the data suggest that the true average concentration of zinc in the
bottom water exceeds that of surface water? The data are given below:
Location         1      2      3      4      5      6      7      8      9     10
Bottom        .430   .266   .567   .531   .707   .716   .651   .589   .469   .723
Surface       .415   .238   .390   .610   .605   .663   .632   .523   .411   .612
Difference d  .015   .028   .177  -.079   .102   .053   .019   .066   .058   .111
$\sum d_i = .55$, $\sum d_i^2 = .072194$
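As an arithmetic check on the table, the short Python sketch below (standard library only; variable names are illustrative) does three things. It recomputes the differences, then the two column sums, and then $\bar d$ and $s_d$ using the computational formula $s_d^2 = \left(\sum d_i^2 - (\sum d_i)^2/n\right)/(n-1)$.

```python
import math

# Zinc concentrations at the 10 locations, copied from the table above.
bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716, 0.651, 0.589, 0.469, 0.723]
surface = [0.415, 0.238, 0.390, 0.610, 0.605, 0.663, 0.632, 0.523, 0.411, 0.612]

# Paired differences d_i = bottom_i - surface_i, rounded to 3 decimals as in the table.
d = [round(b - s, 3) for b, s in zip(bottom, surface)]

n = len(d)
sum_d = sum(d)                   # sum of d_i
sum_d2 = sum(x * x for x in d)   # sum of d_i squared
d_bar = sum_d / n

# Computational (shortcut) formula for the sample variance of the differences.
s_d = math.sqrt((sum_d2 - sum_d ** 2 / n) / (n - 1))

print(f"differences: {d}")
print(f"sum d = {sum_d:.3f}, sum d^2 = {sum_d2:.6f}")
print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}")
```

The printed sums match the values $\sum d_i = .55$ and $\sum d_i^2 = .072194$ given with the table.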
Let $\mu_d = \mu_{bottom} - \mu_{surface}$ be the mean difference in zinc concentration of the two populations. Assume
that the differences $d_i$ have an approximately Normal distribution with mean $\mu_d$ and variance $\sigma_d^2$.
(a) Calculate the estimate $s_d$ of $\sigma_d$.
(b) State the null and alternative hypotheses, in terms of $\mu_d$, needed to test the research hypothesis that the mean zinc concentration in the bottom water exceeds that of surface
water.
(c) Compute a t-statistic to test the above hypotheses.
(d) Find bounds for the p-value (using the t-table) and use them to make a decision at α = 0.05.
(e) Construct a 90% confidence interval for $\mu_d$.
(f) Based on this interval, would you reject the null hypothesis in part (b)? Explain how you
arrived at your conclusion. What is the α-level associated with this test?
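Parts (c) through (f) can be checked with a minimal sketch like the following. It assumes the one-sided alternative $H_a: \mu_d > 0$. It brackets the p-value with critical values typed in by hand from a standard t-table for 9 degrees of freedom (1.833, 2.262, 2.821, 3.250). This mirrors the table lookup the question asks for rather than computing an exact p-value.

```python
import math

# Paired differences d = bottom - surface from the Problem 1 table.
d = [0.015, 0.028, 0.177, -0.079, 0.102, 0.053, 0.019, 0.066, 0.058, 0.111]
n = len(d)
d_bar = sum(d) / n
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
se = s_d / math.sqrt(n)  # estimated standard error of d_bar

# (c) t-statistic for H0: mu_d = 0 versus Ha: mu_d > 0, with n - 1 = 9 df.
t_stat = d_bar / se

# (d) Upper-tail critical values t_{alpha, 9} typed in from a t-table.
t_table_df9 = {0.05: 1.833, 0.025: 2.262, 0.01: 2.821, 0.005: 3.250}

# The one-sided p-value lies between the tail areas whose critical values bracket t_stat.
p_lower = max((a for a, c in t_table_df9.items() if t_stat < c), default=0.0)
p_upper = min((a for a, c in t_table_df9.items() if t_stat > c), default=1.0)
reject_h0 = p_upper <= 0.05  # p < p_upper, so p_upper <= alpha guarantees p < alpha

# (e) 90% confidence interval: d_bar +/- t_{0.05, 9} * se.
margin = t_table_df9[0.05] * se
ci_90 = (d_bar - margin, d_bar + margin)

print(f"t = {t_stat:.3f}, {p_lower} < p-value < {p_upper}, reject H0: {reject_h0}")
print(f"90% CI for mu_d: ({ci_90[0]:.4f}, {ci_90[1]:.4f})")
```

A two-sided 90% interval leaves 5% in each tail. An interval whose lower limit lies above 0 therefore corresponds to rejecting $H_0$ in the one-sided test at α = 0.05.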
2. An investigation was conducted into the dust content in the flue gases of two types of solid-fuel boilers.
Thirteen boilers of type A and nine boilers of type B were used under different fuels (coal, wood, straw,
etc.) at various temperature settings. Over a period of time, the following quantities of dust (in grams) were
deposited in similar dust traps inserted in each of the 22 flues. A partial JMP output of an analysis of these
data is given below:
Boiler    Dust Content (in grams)
type A    72.1  58.4  82.1  67.2  76.7  75.1  48.0  53.3  55.5  61.5  60.6  55.2  63.1
type B    51.0  39.3  52.8  58.8  41.2  66.6  46.0  56.4  58.9
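The boiler data can be summarized with Python's standard `statistics` module. This sketch uses the values from the table above (13 type A and 9 type B flues). It also prints the ratio of the larger to the smaller sample variance. Treating a ratio well below about 4 as consistent with equal variances is a common rule of thumb. It is not necessarily the criterion intended for part (a), which refers to the JMP output.

```python
import statistics

# Dust content (grams) from the table above: 13 type A flues and 9 type B flues.
type_a = [72.1, 58.4, 82.1, 67.2, 76.7, 75.1, 48.0, 53.3, 55.5, 61.5, 60.6, 55.2, 63.1]
type_b = [51.0, 39.3, 52.8, 58.8, 41.2, 66.6, 46.0, 56.4, 58.9]

for name, x in (("A", type_a), ("B", type_b)):
    print(f"type {name}: n = {len(x)}, mean = {statistics.mean(x):.3f}, "
          f"variance = {statistics.variance(x):.3f}, sd = {statistics.stdev(x):.3f}")

# Ratio of the larger sample variance to the smaller one (sample variances use n - 1).
var_a = statistics.variance(type_a)
var_b = statistics.variance(type_b)
variance_ratio = max(var_a, var_b) / min(var_a, var_b)
print(f"larger/smaller variance ratio = {variance_ratio:.2f}")
```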
Let µA and µB represent mean dust content of gases from the boiler types A and B, respectively.
(a) Examining the output above, find evidence to support the assumption of equal population
variances. Explain.
(b) Under the assumption of equal variances, calculate an estimate of the common error variance
$\sigma^2$.
(c) State the appropriate null and alternative hypotheses to determine whether the mean dust content
of gases from type B boilers is lower than that from type A boilers.
(d) Calculate a t-statistic to test the hypotheses in part (c) (you must use the pooled-variance t-test).
(e) Approximate a p-value associated with the above t-statistic, and make your decision using
α = .05.
(f) Construct a 95% confidence interval for µA − µB . Show work.
(g) It is observed that there is a large variation in the dust collected from the boilers of each
type, due to the differences in fuelling and conditions under which the boilers were used,
leading to an interval estimate of the difference that is too wide to be useful. Explain briefly how
the study could be performed as a paired experiment to improve the precision of this estimate.
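Parts (b) through (f) follow the usual pooled-variance formulas:

$s_p^2 = \frac{(n_A-1)s_A^2 + (n_B-1)s_B^2}{n_A+n_B-2}$ and $t = \frac{\bar y_A - \bar y_B}{s_p\sqrt{1/n_A + 1/n_B}}$.

The sketch below assumes the one-sided alternative $H_a: \mu_A - \mu_B > 0$ (type B lower). It uses t-table critical values for 20 degrees of freedom (1.725, 2.086, 2.528, 2.845), typed in by hand.

```python
import math
import statistics

# Dust content (grams) for the two boiler types.
type_a = [72.1, 58.4, 82.1, 67.2, 76.7, 75.1, 48.0, 53.3, 55.5, 61.5, 60.6, 55.2, 63.1]
type_b = [51.0, 39.3, 52.8, 58.8, 41.2, 66.6, 46.0, 56.4, 58.9]

n_a, n_b = len(type_a), len(type_b)
mean_a, mean_b = statistics.mean(type_a), statistics.mean(type_b)
var_a, var_b = statistics.variance(type_a), statistics.variance(type_b)

# (b) Pooled estimate of the common variance sigma^2 with n_a + n_b - 2 = 20 df.
df = n_a + n_b - 2
s2_pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df

# (d) Pooled-variance t-statistic for H0: mu_A - mu_B = 0 vs Ha: mu_A - mu_B > 0.
diff = mean_a - mean_b
se = math.sqrt(s2_pooled * (1 / n_a + 1 / n_b))
t_stat = diff / se

# (e) Upper-tail critical values t_{alpha, 20} typed in from a t-table.
t_table_df20 = {0.05: 1.725, 0.025: 2.086, 0.01: 2.528, 0.005: 2.845}
p_lower = max((a for a, c in t_table_df20.items() if t_stat < c), default=0.0)
p_upper = min((a for a, c in t_table_df20.items() if t_stat > c), default=1.0)

# (f) 95% confidence interval for mu_A - mu_B uses t_{0.025, 20}.
margin = t_table_df20[0.025] * se
ci_95 = (diff - margin, diff + margin)

print(f"s_p^2 = {s2_pooled:.2f}, t = {t_stat:.3f}, {p_lower} < p-value < {p_upper}")
print(f"95% CI for mu_A - mu_B: ({ci_95[0]:.2f}, {ci_95[1]:.2f})")
```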
3. An experiment is performed on individuals with elevated serum cholesterol. The individuals are treated
with Lipitor. The dosage levels are None, Low, and High. A dosage of “None” is a placebo pill that
looks like the drug but contains no Lipitor. At the beginning of the study the total cholesterol for each
individual is determined. At the end of three months the total cholesterol for each individual is determined
again. The change in cholesterol is calculated (Three Month Value - Beginning Value) so that a negative
change means the total cholesterol has decreased.
(a) Identify the experimental units, treatments, and the response in this experiment.
Experimental units: the individuals selected for the experiment. Treatments: the 3 levels of Lipitor (None, Low,
and High). Response: the change in cholesterol after 3 months of treatment.
(b) If the experimenters plan to use α = 0.05, to detect a difference in dosage level means of 1.4
standard deviations (i.e., 1.4σ) with a power of 0.9, what is the total number of individuals
needed for the experiment?
Using the sample size tables supplied, with α = 0.05, δ/σ = 1.4, β = .1 (power 0.9), and number of groups = 3,
we get a sample size of 14 per group. That is a total of 3 × 14 = 42 individuals needed.
The experiment is performed with a total of 54 individuals randomly assigned to the three
treatment groups so that the same number of individuals is in each group. Refer to the
JMP output for the Lipitor Experiment.
(c) Using the plot of cholesterol change by dosage level, can you check any of the assumptions you made about
the distributions of the data? What are these assumptions, and do they appear reasonable?
Yes. Each of the three samples appears consistent with a sample from a normal distribution.
Also, the spreads (variances) of the three samples appear similar, supporting the model assumption that the variances
are the same for the three groups (i.e., the homogeneity of variance assumption).
(d) Report the ANOVA table to test the hypothesis that the dosage level means are all equal
against the alternative that at least one differs from the others. Be sure to report the
appropriate test statistic and the p-value. State your decision, the reason for the decision, and
a conclusion within the context of the problem. Use α = 0.05.
Source of Variation   d.f.          SS         MS        F   p-value
Dosage Level             2   5,803.259   2,901.63   3.8197    0.0285
Error                   51  38,742.222     759.65
Total                   53  44,545.481
Since the p-value (0.0285) is less than α = .05, we reject $H_0: \mu_N = \mu_L = \mu_H$ and conclude that the mean changes in
cholesterol are not all equal at the different dosage levels.
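The entries of the ANOVA table can be checked directly. Each MS is SS/d.f., and F = MS(Dosage)/MS(Error). Because the numerator has 2 degrees of freedom, the upper tail of the F distribution has the closed form $P(F_{2,\nu} > f) = (1 + 2f/\nu)^{-\nu/2}$, so no statistical library is needed. A minimal Python sketch:

```python
# Sums of squares and degrees of freedom from the ANOVA table above.
ss_dose, df_dose = 5803.259, 2
ss_error, df_error = 38742.222, 51

ss_total = ss_dose + ss_error    # should equal the Total SS, 44,545.481
ms_dose = ss_dose / df_dose      # MS = SS / d.f.
ms_error = ss_error / df_error
f_stat = ms_dose / ms_error

# Closed-form upper tail, valid only because the numerator df is 2:
# P(F > f) = (1 + 2 f / df_error) ** (-df_error / 2).
p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)

print(f"SS total = {ss_total:.3f}, MS dose = {ms_dose:.2f}, MS error = {ms_error:.2f}")
print(f"F = {f_stat:.4f}, p-value = {p_value:.4f}")
```

This reproduces F ≈ 3.82 and p ≈ 0.0285 as reported in the table. For more than three groups, the numerator df exceeds 2 and a general F distribution routine would be needed instead.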
(e) The t value for Fisher’s Least Significant Difference ($t_{0.025,51}$) is 2.008. Compute the LSD
for comparing dosage level means at α = .05.
$LSD = t_{0.025,51}\sqrt{\frac{2 \times 759.65}{18}} = 2.008 \times 9.187 = 18.45$
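A small sketch of the LSD arithmetic uses three inputs:
- MSE = 759.65 from the ANOVA table.
- n = 18 per group (54 individuals in 3 equal groups).
- The critical value 2.008 given in part (e).

Under Fisher's LSD, two dosage-level sample means whose absolute difference exceeds this amount are declared significantly different at α = .05.

```python
import math

ms_error = 759.65   # MSE from the ANOVA table
n_per_group = 18    # 54 individuals split equally over 3 dosage levels
t_crit = 2.008      # t_{0.025, 51}, as given in part (e)

# Estimated standard error of a difference between two group means.
se_diff = math.sqrt(2 * ms_error / n_per_group)
lsd = t_crit * se_diff

print(f"SE of difference = {se_diff:.3f}, LSD = {lsd:.2f}")
```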
(f) Use the connecting letters report constructed using the LSD to indicate which dosage levels
have statistically significantly different means and which do not.
The “None” dosage level (the placebo drug level) has significantly smaller mean change in cholesterol
than those for the two Lipitor levels (“Low” and “High”).
There is no significant difference between the mean changes in cholesterol for the two Lipitor levels
(“Low” and “High”).
(g) Calculate the value for Tukey HSD for comparing dosage level means at α = .05. Use the
Q value from the output.
$HSD = 2.41398 \times \sqrt{\frac{2 \times 759.65}{18}} = 2.41398 \times 9.187 = 22.18$
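Tukey's HSD uses the same standard error of a difference as the LSD but a larger multiplier. The value 2.41398 appears to be JMP's q*, which is the studentized range quantile $q_{0.05;3,51}$ divided by $\sqrt{2}$. That is why it multiplies $\sqrt{2\,MSE/n}$ here rather than $\sqrt{MSE/n}$. The sketch below checks that the two forms agree, using the inputs quoted in part (g).

```python
import math

ms_error = 759.65   # MSE from the ANOVA table
n_per_group = 18    # equal group sizes
q_star = 2.41398    # multiplier from the output, as used in part (g)

# HSD written as q* times the standard error of a difference of two means.
hsd = q_star * math.sqrt(2 * ms_error / n_per_group)

# Same value written with the studentized range quantile q = q* * sqrt(2).
q = q_star * math.sqrt(2)
hsd_from_q = q * math.sqrt(ms_error / n_per_group)

print(f"q = {q:.3f}, HSD = {hsd:.2f}")
```

Because the HSD (about 22.18) is larger than the LSD, Tukey's procedure needs a bigger gap between sample means before declaring two dosage levels different. It is the more conservative of the two procedures.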
(h) Use the connecting letters report constructed using Tukey's HSD to
indicate which dosage levels have statistically significantly different means and which do not.
In this case, we find no significant difference between the mean changes in cholesterol for the “None” and “Low”
dosage levels, and no significant difference between those for the “Low” and “High” dosage levels.
There is a significant difference between the mean changes in cholesterol for the “None” and “High” dosage levels.
(i) What can you conclude from the plots of the distribution of the residuals?
Looking at all three plots, there is no evidence to suggest that the residuals do not follow a normal
distribution.
4. In a recent study that examined chocolate’s effect on blood vessel function in healthy people, 11 people received
46 grams of naturally flavonoid-rich dark chocolate every day for two weeks, while a group of 10 people
received a placebo consisting of dark chocolate with low flavonoid content. Participants had their vascular
health measured (by means of flow-mediated dilation) before and after the two-week period. An increase
over the two-week period indicated greater vascular health.
(a) What makes this an experiment instead of an observational study? Explain.
The levels of the factor under study were selected by the experimenter and imposed on the participants for the comparative study, rather than simply observed.
(b) Identify the experimental unit in this experiment.
a healthy individual
(c) What is the factor under study? What are the treatments (levels of the factor)?
The factor is the type of dark chocolate (its flavonoid content), and the two levels are low flavonoid (control) and high flavonoid.
(d) What is the response being measured?
The change in vascular health over the two-week period, measured by flow-mediated dilation.
(e) Randomization is not mentioned in the description above. Describe how randomization should have
been used in this experiment.
The randomization is done in two steps. First, randomly allocate 11 of the 21 people to receive the high-flavonoid
chocolate and the remaining 10 to receive the low-flavonoid chocolate. Second, randomize the order in which
vascular health is measured (i.e., who is measured first, who is next, etc.) for all 21 people.
(f) Are replications being used in this experiment? Explain how.
Yes, because each treatment is given to more than one person. The high flavonoid treatment is replicated 11
times, and the low flavonoid treatment 10 times.
(g) What are the sources of random error variation in this experiment? Explain briefly.
Differences in how each person metabolizes the chocolate may be the chief source of variation here, although
differences in the participants’ diets and lifestyles (say, whether or not they exercise) may also be major sources of
variation in how vascular health changes over the two-week period.
(h) How would you estimate the error variance from the data? (Be specific. You may use symbols to
denote sample data).
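Calculate the sample variance of the responses in each treatment group, and then find the pooled variance estimate
$s_p^2 = \frac{(n_H - 1)s_H^2 + (n_L - 1)s_L^2}{n_H + n_L - 2} = \frac{10 s_H^2 + 9 s_L^2}{19}$, as done in class for two-sample t-tests. Here $s_H^2$ and $s_L^2$ are the sample variances of the high- and low-flavonoid groups.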
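The two-step randomization described in part (e) can be sketched with Python's `random` module. The participant labels 1 to 21, the helper name `randomize`, and the seed are hypothetical. The sketch assigns 11 people to the high-flavonoid chocolate and 10 to the low-flavonoid placebo, matching the group sizes in the study description. It then draws a random measurement order for all 21 people.

```python
import random


def randomize(participant_ids, n_high, seed=None):
    """Hypothetical helper: two-step randomization for the chocolate study.

    Step 1 randomly allocates n_high participants to the high-flavonoid chocolate
    and the rest to the low-flavonoid placebo. Step 2 draws a random order in which
    all participants have their vascular health measured.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)

    # Step 1: random allocation of treatments.
    high = set(rng.sample(ids, n_high))
    allocation = {
        pid: "high flavonoid" if pid in high else "low flavonoid (placebo)"
        for pid in ids
    }

    # Step 2: random measurement order for everyone.
    order = ids[:]
    rng.shuffle(order)
    return allocation, order


# Hypothetical participant labels 1..21; 11 receive high-flavonoid chocolate.
allocation, order = randomize(range(1, 22), n_high=11, seed=402)
for pid in order:
    print(pid, allocation[pid])
```

Fixing the seed makes the allocation reproducible for record-keeping. Omitting it gives a fresh random allocation on each run.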