STATISTICS 402B

Spring 2016
Sample Exam I Questions
1. Trace metals in drinking water affect the flavor, and an unusually high concentration can pose a health
hazard. Ten pairs of data were taken measuring zinc concentration in bottom water and surface water
at various locations along a stream. Do the data suggest that the true average concentration of zinc in the
bottom water exceeds that of surface water? The data are given below:
Location          1      2      3      4      5      6      7      8      9     10
Bottom         .430   .266   .567   .531   .707   .716   .651   .589   .469   .723
Surface        .415   .238   .390   .610   .605   .663   .632   .523   .411   .612
Difference d   .015   .028   .177  -.079   .102   .053   .019   .066   .058   .111
$\sum d_i = .55$
$\sum d_i^2 = .072194$
Let $\mu_d = \mu_{\text{bottom}} - \mu_{\text{surface}}$ be the mean difference in zinc concentration of the two populations. Assume
that the differences $d_i$ have an approximately Normal distribution with mean $\mu_d$ and variance $\sigma_d^2$.
(a) Calculate the estimate $s_d^2$ of $\sigma_d^2$.
(b) State the null and alternative hypotheses in terms of $\mu_d$ needed to test the research hypothesis
that the mean zinc concentration in the bottom water exceeds that of the surface water.
(c) Compute a t-statistic to test the above hypotheses.
(d) Find bounds for the p-value (using the t-table) and use them to make a decision using α = 0.05.
(e) Construct a 90% confidence interval for $\mu_d$.
(f) Based on this interval, would you reject the null hypothesis in part (b)? Explain how you
arrived at your conclusion. What is the α-level associated with this test?
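The calculations in parts (a) through (e) can be checked with a short Python sketch. It uses only the standard library; the upper-tail t critical values for 9 degrees of freedom are typed in from a standard t-table (standing in for the table lookup the question asks for), and all variable names are illustrative.

```python
import math
import statistics

# Zinc concentrations for the 10 locations (bottom water, surface water)
bottom = [.430, .266, .567, .531, .707, .716, .651, .589, .469, .723]
surface = [.415, .238, .390, .610, .605, .663, .632, .523, .411, .612]

d = [b - s for b, s in zip(bottom, surface)]  # paired differences d_i
n = len(d)

d_bar = statistics.mean(d)      # point estimate of mu_d
s2_d = statistics.variance(d)   # s_d^2, divisor n - 1
se = math.sqrt(s2_d / n)        # standard error of d_bar

# One-sided paired t-test: H0: mu_d = 0 vs Ha: mu_d > 0
t_stat = d_bar / se

# Upper-tail critical values for df = 9, copied from a standard t-table
T_DF9 = {0.10: 1.383, 0.05: 1.833, 0.025: 2.262, 0.01: 2.821, 0.005: 3.250}

# Bracket the p-value between adjacent tail areas of the table
p_low, p_high = 0.0, 1.0
for tail, t_crit in sorted(T_DF9.items(), reverse=True):
    if t_stat > t_crit:
        p_high = tail   # the p-value is below this tail area
    else:
        p_low = tail    # the p-value is above this tail area
        break

# 90% confidence interval for mu_d uses t_{0.05, 9}
margin = T_DF9[0.05] * se
ci = (d_bar - margin, d_bar + margin)

print(f"d_bar = {d_bar:.4f}, s_d^2 = {s2_d:.6f}, t = {t_stat:.3f}")
print(f"{p_low} < p-value < {p_high}")
print(f"90% CI for mu_d: ({ci[0]:.4f}, {ci[1]:.4f})")
```

Running it gives t ≈ 2.55 with 0.01 < p < 0.025, and a 90% interval that lies entirely above 0.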
2. An investigation was conducted into the dust content in the flue gases of two types of solid-fuel boilers.
Thirteen boilers of type A and nine boilers of type B were operated with different fuels (coal, wood, straw,
etc.) at various temperature settings. Over a period of time, the following quantities of dust, in grams, were
deposited in similar dust traps inserted in each of the 22 flues. A partial JMP output of an analysis of these
data is given below:
Boiler     Dust Content (in grams)
type A     72.1  58.4  82.1  67.2  76.7  75.1  48.0  53.3  55.5  61.5  60.6  55.2  63.1
type B     51.0  39.3  52.8  58.8  41.2  66.6  46.0  56.4  58.9
Let µA and µB represent mean dust content of gases from the boiler types A and B, respectively.
(a) Examining the output above, find evidence to support the assumption of equal population
variances. Explain.
(b) Under the assumption of equal variances, calculate an estimate of the common error variance
$\sigma^2$.
(c) State the appropriate null and alternative hypotheses to determine whether the mean dust content
of gases from boiler type B is lower than that from boiler type A.
(d) Calculate a t-statistic to test the hypotheses in part (c) (you must use the pooled-variance t-test).
(e) Approximate a p-value associated with the above t-statistic, and make your decision using
α = .05.
(f) Construct a 95% confidence interval for µA − µB . Show work.
(g) There is large variation in the dust collected from the boilers of each type, due to differences in
fuel and in the conditions under which the boilers were used, which makes the confidence interval for
the difference too wide to be useful. Explain briefly how the study could be performed as a paired
experiment to improve the accuracy of this estimate.
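A minimal Python sketch of the pooled-variance calculations in parts (b), (d), and (f), assuming the 13 type A and 9 type B values are assigned to boiler types as in the dust table. The critical value $t_{0.025,20} = 2.086$ is typed in from a t-table, and the variable names are illustrative.

```python
import math
import statistics

# Dust content (grams): 13 type A boilers and 9 type B boilers
type_a = [72.1, 58.4, 82.1, 67.2, 76.7, 75.1, 48.0, 53.3, 55.5, 61.5, 60.6, 55.2, 63.1]
type_b = [51.0, 39.3, 52.8, 58.8, 41.2, 66.6, 46.0, 56.4, 58.9]

n_a, n_b = len(type_a), len(type_b)
mean_a, mean_b = statistics.mean(type_a), statistics.mean(type_b)
var_a, var_b = statistics.variance(type_a), statistics.variance(type_b)

# Pooled estimate of the common variance sigma^2 on n_a + n_b - 2 df
df = n_a + n_b - 2
sp2 = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
se_diff = math.sqrt(sp2 * (1 / n_a + 1 / n_b))

# H0: mu_A - mu_B = 0 vs Ha: mu_A - mu_B > 0 (type B lower)
diff = mean_a - mean_b
t_stat = diff / se_diff

# 95% CI for mu_A - mu_B with t_{0.025, 20} = 2.086 from a t-table
T_025_DF20 = 2.086
ci = (diff - T_025_DF20 * se_diff, diff + T_025_DF20 * se_diff)

print(f"s_p^2 = {sp2:.3f}, t = {t_stat:.3f} on {df} df")
print(f"95% CI for mu_A - mu_B: ({ci[0]:.2f}, {ci[1]:.2f})")
```

With these values the sketch gives $s_p^2 \approx 94.91$ and t ≈ 2.70 on 20 df.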
3. An experiment is performed on individuals with elevated serum cholesterol. The individuals are treated
with Lipitor® at one of three dosage levels: None, Low, and High. A dosage of "None" is a placebo pill that
looks like the drug but contains no Lipitor. At the beginning of the study the total cholesterol for each
individual is determined. At the end of three months the total cholesterol for each individual is determined
again. The change in cholesterol is calculated (Three Month Value − Beginning Value), so that a negative
change means the total cholesterol has decreased.
(a) Identify the experimental units, the treatments, and the response in this experiment.
(b) If the experimenters plan to use α = 0.05 and want to detect a difference in dosage level means of 1.4
standard deviations (i.e., 1.4σ) with a power of 0.9, what is the total number of individuals
needed for the experiment?
The experiment is performed with a total of 54 individuals randomly assigned to the three treatment
groups so that the same number of individuals is in each group. Refer to the JMP output for the
Lipitor Experiment.
(c) Using the plot of cholesterol change by dosage level, can you check any of the assumptions you
made about the distributions of the data? What are these assumptions, and do they appear
reasonable?
(d) Report the ANOVA table to test the null hypothesis that the dosage level means are all equal
against the alternative that at least one mean differs from the others. Be sure to report the
appropriate test statistic and the p-value. State your decision, the reason for the decision, and
a conclusion in the context of the problem. Use α = 0.05.
(e) The t value for Fisher's Least Significant Difference, $t_{0.025,51}$, is 2.008. Compute the LSD
for comparing dosage level means at α = .05.
(f) Use the connecting letters report constructed using the LSD to indicate which dosage levels
have significantly different means and which do not.
(g) Calculate the value of Tukey's HSD for comparing dosage level means at α = .05. Use the
Q value from the output.
(h) Use the connecting letters report constructed using Tukey's HSD to
indicate which dosage levels have significantly different means and which do not.
(i) What can you conclude from the plots in the Distribution of Residuals?
4. In a recent study that examined chocolate's effect on blood vessel function in healthy people, 11 people received
46 grams of naturally flavonoid-rich dark chocolate every day for two weeks, while a group of 10 people
received a placebo consisting of dark chocolate with low flavonoid content. Participants had their vascular
health measured (by means of flow-mediated dilation) before and after the two-week period. An increase
over the two-week period indicated greater vascular health.
(a) What makes this an experiment instead of an observational study? Explain.
(b) Identify the experimental unit in this experiment.
(c) What is the factor under study? What are the treatments (levels of the factor)?
(d) What is the response being measured?
(e) Randomization is not mentioned in the description above. Describe how randomization should have
been used in this experiment.
(f) Is replication used in this experiment? Explain how.
(g) What are the sources of random error variation in this experiment? Explain briefly.
(h) How would you estimate the error variance from the data? (Be specific; you may use symbols to
denote the sample data.)
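For part (h), one natural estimate, assuming the response for each participant is the change in flow-mediated dilation (after minus before), is the pooled sample variance of those changes across the two groups. Writing $y_{ij}$ for the change of participant $j$ in group $i$ (notation introduced here for illustration), with group 1 the flavonoid-rich chocolate group ($n_1 = 11$) and group 2 the placebo group ($n_2 = 10$):

```latex
s_p^2 = \frac{\sum_{j=1}^{11} (y_{1j} - \bar{y}_{1})^2 + \sum_{j=1}^{10} (y_{2j} - \bar{y}_{2})^2}{11 + 10 - 2}
      = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\qquad \text{on } 19 \text{ degrees of freedom.}
```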
Lipitor® Experiment
Cholesterol Change by Dosage Level
Analysis of Variance
Source      DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model        2         5803.259       2901.63    3.8197    0.0285*
Error       51        38742.222        759.65
C. Total    53        44545.481
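The Mean Square, F Ratio, and Prob > F entries follow from the Sum of Squares and DF columns. The Python sketch below recomputes them. Because the model has 2 numerator degrees of freedom, the upper-tail probability of $F \sim F(2, \nu)$ has the closed form $P(F > f) = (1 + 2f/\nu)^{-\nu/2}$, so no statistics library is needed; this shortcut holds only when the numerator DF is 2.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_model, df_model = 5803.259, 2
ss_error, df_error = 38742.222, 51

ms_model = ss_model / df_model   # Mean Square for Model
ms_error = ss_error / df_error   # Mean Square for Error (MSE)
f_ratio = ms_model / ms_error

# Closed-form upper tail of F(2, v): P(F > f) = (1 + 2f/v) ** (-v/2)
p_value = (1 + 2 * f_ratio / df_error) ** (-df_error / 2)

print(f"MS model = {ms_model:.2f}, MSE = {ms_error:.2f}")
print(f"F = {f_ratio:.4f}, p = {p_value:.4f}")
```

It reproduces F ≈ 3.8197 and p ≈ 0.0285 from the table.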
Effect Tests
Source   Nparm   DF   Sum of Squares   F Ratio   Prob > F
Level        2    2        5803.2593    3.8197    0.0285*
Effect Details
Level
Least Squares Means Table
Level   Least Sq Mean   Std Error      Mean
H           -32.00000   6.4963726   -32.000
L           -29.55556   6.4963726   -29.556
N            -8.88889   6.4963726    -8.889
LSMeans Differences Student's t
α = 0.050   t = 2.00758
Level         Least Sq Mean
N    A            -8.88889
L      B         -29.55556
H      B         -32.00000
Levels not connected by same letter are significantly different.
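Fisher's LSD for equal group sizes is $t\sqrt{MSE\,(1/n + 1/n)}$. The Python sketch below recomputes it from the output (MSE = 759.65 and n = 18 per group, since 54 individuals were split equally over three levels) and compares each pair of least squares means against it; the dictionary and variable names are illustrative.

```python
import math
from itertools import combinations

MSE = 759.65        # Mean Square Error from the ANOVA table (51 df)
N_PER_GROUP = 18    # 54 individuals split equally over 3 dosage levels
T_CRIT = 2.00758    # t quantile reported by JMP for alpha = 0.05, df = 51

# Least squares means from the JMP output
means = {"N": -8.88889, "L": -29.55556, "H": -32.00000}

# Fisher's LSD with equal group sizes: t * sqrt(MSE * (1/n + 1/n))
lsd = T_CRIT * math.sqrt(MSE * (2 / N_PER_GROUP))

significant = {}
for a, b in combinations(means, 2):
    gap = abs(means[a] - means[b])
    significant[(a, b)] = gap > lsd
    print(f"{a} vs {b}: |difference| = {gap:.3f}, significant: {gap > lsd}")

print(f"LSD = {lsd:.3f}")
```

The LSD comes out near 18.44: N differs from both L and H, while L and H do not differ, which agrees with the A / B / B letters in the Student's t report.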
LSMeans Differences Tukey HSD
α = 0.050   Q = 2.41398
Level         Least Sq Mean
N    A            -8.88889
L    A B         -29.55556
H      B         -32.00000
Levels not connected by same letter are significantly different.
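For Tukey's HSD, JMP's Q is, to our understanding, the studentized range quantile q divided by $\sqrt{2}$, so the HSD is Q times the standard error of a difference, $Q\sqrt{MSE\,(2/n)}$, or equivalently $q\sqrt{MSE/n}$. A Python sketch under that assumption, with illustrative names:

```python
import math
from itertools import combinations

MSE = 759.65        # Mean Square Error from the ANOVA table (51 df)
N_PER_GROUP = 18    # equal group sizes
Q_JMP = 2.41398     # JMP's Q (assumed: studentized range quantile / sqrt(2))

means = {"N": -8.88889, "L": -29.55556, "H": -32.00000}

# HSD = Q * standard error of a difference of two means
hsd = Q_JMP * math.sqrt(MSE * (2 / N_PER_GROUP))

# Same value written with the studentized range quantile q = Q * sqrt(2)
q = Q_JMP * math.sqrt(2)
assert math.isclose(hsd, q * math.sqrt(MSE / N_PER_GROUP))

significant = {}
for a, b in combinations(means, 2):
    gap = abs(means[a] - means[b])
    significant[(a, b)] = gap > hsd
    print(f"{a} vs {b}: |difference| = {gap:.3f}, significant: {gap > hsd}")

print(f"HSD = {hsd:.3f}")
```

The HSD comes out near 22.18, and only the N vs H gap exceeds it, which agrees with the A / A B / B letters in the Tukey report.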
[Figure: Distribution of Residuals]