Stat 401 A: HW 6 answers

1) Bacterial load in hamburger patties
a) 1 pt. p = 0.0092
b) 2 pts. estimated multiplicative effect: 5.8, 95% ci is (1.7, 19.8).
Very strong evidence that the treatment has an effect.
The median CFU/gm in control patties is 5.8 times (95% ci: 1.7, 19.8) the median in treated patties.
c) 1 pt. The assumption of equal variances seems reasonable on a log scale. The boxplots show
approximately the same spread and the sd’s are almost exactly the same in the two groups.
[Figure: boxplots of Residual by Treatment (active, control)]
Notes: The p-value comes from a t-test using the pooled sd with log(cfu) as the response variable.
The multiplicative effect is estimated by backtransforming the log-scale estimate of the difference and
its confidence interval. The log-scale estimate is 1.76 (as control – active), so the multiplicative effect is
exp(1.76) = 5.81. The 95% ci on the log scale is (0.543, 2.988), so the ci for the multiplicative effect is
(exp(0.543), exp(2.988)) = (1.72, 19.85), which I rounded to reflect the uncertainty.
SAS and R will give you the negative of all these values because they compute active – control. exp(-1.76) is 0.172, so you could report that the median CFU/gm in treated patties is 0.17 times the median in
control patties.
*Need to be consistent in back-scaling; if you exponentiate the log-scale estimate, you should do the
same to the CI.
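The back-transformation described in these notes can be checked with a few lines of Python. This is only a sketch: the function name back_transform is made up for illustration, and the inputs are the rounded log-scale values quoted above, not the raw data.

```python
import math

def back_transform(log_estimate, log_ci):
    """Exponentiate a log-scale difference and its CI to get a ratio of medians.

    Hypothetical helper for illustration; the estimate and both CI endpoints
    get the same treatment, as the note above requires.
    """
    lower, upper = log_ci
    return math.exp(log_estimate), (math.exp(lower), math.exp(upper))

# Log-scale values quoted in the notes, computed as control - active.
ratio, (lo, hi) = back_transform(1.76, (0.543, 2.988))
print(f"control / active: {ratio:.2f} ({lo:.2f}, {hi:.2f})")

# Reversing the direction (active - control) negates the estimate and
# negates and swaps the CI endpoints.
inv_ratio, (inv_lo, inv_hi) = back_transform(-1.76, (-2.988, -0.543))
print(f"active / control: {inv_ratio:.3f} ({inv_lo:.3f}, {inv_hi:.3f})")
```

The two directions are reciprocals of each other, so either one is an acceptable way to report the result as long as the estimate and the ci use the same direction.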
2) problem 4:30 sunlight protection factor (spf).
Note: I set this question because I wanted another example of interpreting data following a log
transformation. As it turns out, the distribution of the ratios is almost as close to normal as the
distribution of the log ratios. That means a confidence interval on the ratio is just as
appropriate as a ci on the log ratio. Both answers are accepted. If the distribution of the ratio were very
skewed while the log ratio was more symmetric, you should only do inference on the log ratio.
Estimate and ci for the spf, expressed as time with sunscreen / pretreatment time (2 pts):
If you work with the ratio: estimated spf is 9.2 with a 95% ci of (5.6, 12.8)
If you work with the log ratio: estimated spf is 7.4 with a 95% ci of (4.76, 11.47)
Note: log scale difference and 95% ci are 1.998 and (1.56, 2.44)
Any obvious potential confounding variables? (not graded)
The obvious one comes from the description of the study: “Pre treatment” was done first (“(a) before
receiving treatment”) and “sunscreen” was done second (“(b) after receiving a particular sunscreen
treatment”). Any sensitization (or tanning, or ???) from the first exposure could affect the results of the
second.
You might also think of lots of reasons for variability among individuals: different skin color, different
degree of tan that might affect the magnitude of the ratio.
Note: The box plots for the ratio and the log ratio are:
[Figure: box plots of Ratio and Log Ratio]
The ratio looks a bit skewed, but it is not strongly so and the sample size is small.
3) problem 5:17 (incomplete ANOVA) p-value column and question about evidence of difference
removed from assignment.
Completed table: (3 pts)
Source     df      SS      MS     F
between     7  35,819   5,117  3.50
within     24  35,088   1,462
total      31  70,907
How many groups (1pt): 8
Notes: between df and SS obtained by difference. MS obtained as SS/df for each row. F obtained as MS
between / MS within. between df is k-1, where k is the number of groups, so # groups = 7 + 1.
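The arithmetic in these notes can be sketched in Python. The function name complete_anova is hypothetical, and the sketch assumes the total and within rows are the ones given in the problem, since the notes say the between row was obtained by difference.

```python
def complete_anova(total_df, total_ss, within_df, within_ss):
    """Fill in a one-way ANOVA table from the total and within rows.

    Hypothetical helper; assumes the total and within df and SS are known.
    """
    # Between row obtained by difference.
    between_df = total_df - within_df
    between_ss = total_ss - within_ss
    # Mean squares are SS / df for each row.
    ms_between = between_ss / between_df
    ms_within = within_ss / within_df
    return {
        "between_df": between_df,
        "between_ss": between_ss,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,   # F = MS between / MS within
        "groups": between_df + 1,      # between df is k - 1
    }

table = complete_anova(total_df=31, total_ss=70907, within_df=24, within_ss=35088)
for name, value in table.items():
    print(f"{name}: {value}")
```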
4) problem 5:18 (fatty acid)
a) evaluation of the six treatment group model (4 pts):
The estimated means are:
Control CPFA150 CPFA300 CPFA450 CPFA50 CPFA600
185.60 171.67 146.67 151.00 168.33 152.33
plot of residuals and means (obtained as predicted values from the fitted model):
[Figure: Residual vs. Treatment mean]
boxplot of residuals and days (SAS will also easily produce a scatter plot):
[Figure: boxplots of Residual for Day1, Day2, Day3, Day4, Day5]
Any concerns? No single correct answer here. I would be concerned about:
a) Variability in the control group (with 15 obs and mean > 180) larger than in the other groups.
b) The “methods of this chapter” ignore the day. The box plot suggests the days are not the same.
Either is acceptable for full credit.
Notes: Looking at variation between days in just the control group is very striking. In fact, the larger
variation in the control group residuals is because of the day-to-day variability.
*Note: Most people lost points on the plots for not plotting the correct number of days or for plotting
against the y values instead of the residuals.
b) 2 pts.
estimated means are:
Group1  Group2  Group3  Group4  Group5  Group6  Group7  Group8  Group9  Group10
168.3   171.7   146.7   151.0   152.3   157.3   195.7   203.3   179.0   192.7
F test: F = 7.80, p < 0.0001. No conclusion needed.
5) problem 5:23 (T rex bones), 4 pts total
What to analyze: a plot of residuals vs predicted value for a preliminary analysis of untransformed
values is:
[Figure: Residual vs. Predicted]
I don’t see any issues with unequal variance or outliers, so I would analyze untransformed responses.
Test of equal means: p < 0.0001.
Conclusion: Very strong evidence that at least one bone has a different mean isotopic composition.
Note: Notice how the conclusion was worded. This is important. You cannot claim that all bones have
different means. All you know is that the 12 bones do not all have the same mean (reject the null hypothesis of
equal means). That’s why the conclusion says ‘at least one bone’.
*Note: Most lost points for not providing a reason for using raw or logged variables, or because the reason was
not adequate.