Stat 401 A: HW 6 answers

1) Bacterial load in hamburger patties

a) 1 pt. p = 0.0092

b) 2 pts. Estimated multiplicative effect: 5.8, 95% ci is (1.7, 19.8). Very strong evidence that the treatment has an effect. The median CFU/gm in control patties is 5.8 times (95% ci: 1.7, 19.8) the median in treated patties.

c) 1 pt. The assumption of equal variances seems reasonable on a log scale. The boxplots show approximately the same spread and the sd’s are almost exactly the same in the two groups.

[Figure: boxplots of residuals by treatment (active, control)]

Notes: The p-value comes from a t-test using the pooled sd with log(cfu) as the response variable. The multiplicative effect is estimated by backtransforming the log-scale estimate of the difference and its confidence interval. The log-scale estimate is 1.76 (as control - active), so the multiplicative effect is exp(1.76) = 5.81. The 95% ci on the log scale is (0.543, 2.988), so the ci for the multiplicative effect is (exp(0.543), exp(2.988)) = (1.72, 19.85), which I rounded to reflect the uncertainty. SAS and R will give you the negative of all these values because they compute active - control. exp(-1.76) is 0.172, so you could report that the median CFU/gm in treated patties is 0.17 times the median in control patties.

*Be consistent when back-transforming: if you exponentiate the log-scale estimate to get the multiplicative factor, you should also exponentiate both ends of its CI.

2) problem 4:30 sunlight protection factor (spf).

Note: I set this question because I wanted another example of interpreting data following a log transformation. As it turns out, the distribution of the ratios is almost as close to normal as the distribution of the log ratio. That means a confidence interval on the ratio is just as appropriate as a ci on the log ratio. Both answers are accepted. If the distribution of the ratio were very skewed while that of the log ratio was more symmetric, you should only do inference on the log ratio.
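The back-transformation described in the notes to problem 1 can be checked with a few lines of Python. This is only a sketch of the arithmetic, not the SAS or R analysis itself: the function name `backtransform` is hypothetical, and the inputs are the log-scale estimate and ci quoted in those notes.

```python
import math

def backtransform(log_est, log_lo, log_hi):
    """Exponentiate a log-scale estimate and both ends of its ci.

    Hypothetical helper for illustration; the same transformation is
    applied to the estimate and to each end of the interval.
    """
    return math.exp(log_est), math.exp(log_lo), math.exp(log_hi)

# control - active on the log scale (values from the problem 1 notes)
est, lo, hi = backtransform(1.76, 0.543, 2.988)
print(f"control / active: {est:.2f} ({lo:.2f}, {hi:.2f})")

# active - control: negate the estimate, negate and swap the ci ends
r_est, r_lo, r_hi = backtransform(-1.76, -2.988, -0.543)
print(f"active / control: {r_est:.2f} ({r_lo:.2f}, {r_hi:.2f})")
```

Note that reversing the direction of the difference means negating the ci ends and swapping them, so the lower limit stays below the upper limit after exponentiating.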
Estimate and ci for the spf, expressed as time with sunscreen / pretreatment time (2 pts):
If you work with the ratio: estimated spf is 9.2 with a 95% ci of (5.6, 12.8)
If you work with the log ratio: estimated spf is 7.4 with a 95% ci of (4.76, 11.47)
Note: the log-scale mean difference and its 95% ci are 1.998 and (1.56, 2.44); exponentiating these gives the log-ratio answer above.

Any obvious potential confounding variables? (not graded)
The obvious one comes from the description of the study: “Pre treatment” was done first (“(a) before receiving treatment”) and “sunscreen” was done second (“(b) after receiving a particular sunscreen treatment”). Any sensitization (or tanning, or ???) from the first exposure could affect the results of the second. You might also think of lots of reasons for variability among individuals, such as different skin color or different degree of tan, that might affect the magnitude of the ratio.

Note: The box plots for the ratio and the log ratio are:
[Figure: box plots of the ratio and the log ratio]
The ratio looks a bit skewed, but it is not strongly so and the sample size is small.

3) problem 5:17 (incomplete ANOVA)

The p-value column and the question about evidence of a difference were removed from the assignment.

Completed table: (3 pts)

Source     df    SS       MS      F
between     7    35,819   5,117   3.50
within     24    35,088   1,462
total      31    70,907

How many groups (1 pt): 8

Notes: between df and SS obtained by difference (total minus within). MS obtained as SS/df for each row. F obtained as MS between / MS within. between df is k-1, where k is the number of groups, so # groups = 7 + 1.

4) problem 5:18 (fatty acid)

a) evaluation of the six treatment group model (4 pts):

The estimated means are:

Control   CPFA150   CPFA300   CPFA450   CPFA50    CPFA600
185.60    171.67    146.67    151.00    168.33    152.33

Plot of residuals against treatment means (obtained as predicted values from the fitted model):
[Figure: residuals vs. treatment mean]

Boxplot of residuals by day (SAS will also easily produce a scatter plot):
[Figure: boxplots of residuals for Day1 through Day5]

Any concerns?
No single correct answer here. I would be concerned about:
i) Variability in the control group (with 15 obs and mean > 180) is larger than in the other groups.
ii) The “methods of this chapter” ignore the day. The box plot suggests the days are not the same.
Either is acceptable for full credit.

Notes: The variation between days within just the control group is very striking. In fact, the larger variation in the control group residuals is due to the day-to-day variability.

*Note: Most people lost points on the plots for not plotting the correct number of days or for plotting the y values instead of the residuals.

b) 2 pts. Estimated means are:

Group     Mean
Group1    168.3
Group2    171.7
Group3    146.7
Group4    151.0
Group5    152.3
Group6    157.3
Group7    195.7
Group8    203.3
Group9    179.0
Group10   192.7

F test: F = 7.80, p < 0.0001
No conclusion needed.

5) problem 5:23 (T rex bones), 4 pts total

What to analyze: a plot of residuals vs predicted value for a preliminary analysis of untransformed values is:
[Figure: residuals vs. predicted values, untransformed response]
I don’t see any issues with unequal variance or outliers, so I would analyze untransformed responses.

Test of equal means: p < 0.0001.

Conclusion: Very strong evidence that at least one bone has a different mean isotopic composition.

Note: Notice how the conclusion was worded. This is important. You cannot claim that all bones have different means. All you know is that the 12 bones do not all have the same mean (reject the null hypothesis of equal means). That’s why the conclusion says ‘at least one bone’.

*Note: Most people lost points for not providing a reason for using raw or logged variables, or for giving a reason that was not adequate.
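As a check on the arithmetic in problem 3, the steps in its notes (between row by difference, MS = SS/df, F = MS between / MS within, groups = between df + 1) can be sketched in Python. The function name `complete_anova` and the dictionary keys are hypothetical, and the sketch assumes the within and total rows are the ones supplied, as the notes imply.

```python
def complete_anova(df_within, ss_within, df_total, ss_total):
    """Fill in a one-way ANOVA table from its within and total rows.

    Hypothetical helper for illustration only.
    """
    # Between row is obtained by difference: total minus within
    df_between = df_total - df_within
    ss_between = ss_total - ss_within
    # Mean square is SS / df for each row
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df_between": df_between,
        "ss_between": ss_between,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
        # between df is k - 1, so k = between df + 1
        "groups": df_between + 1,
    }

# Within and total rows from the completed table in problem 3
table = complete_anova(df_within=24, ss_within=35088, df_total=31, ss_total=70907)
for name, value in table.items():
    print(f"{name}: {value}")
```

The p-value is left out on purpose, since that column was removed from the assignment.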