STAT 410/511 Exam 1 October 19, 2011 Solutions 100 points Name:

1. Researchers in Finland selected 1409 people at random from the survivors of a previous “FINMONICA” study. They interviewed them about diet and coffee drinking, and followed them over an average of 21 years to see if they developed symptoms of dementia and Alzheimer’s disease. The scientists report a 65% decrease in the risk of dementia for those who drank 3 to 5 cups of coffee per day (relative to those who drank 0 to 2 cups per day). We’ll assume that they are reporting this decrease based on “convincing evidence” with a small p-value. What is the scope of their inference? (12 pts)

The subjects were randomly selected from people who are still alive and were part of the FINMONICA study, so we can extend inference to all surviving participants in that study. If the FINMONICA study was itself a random sample of all Finns (or some other large group), then inference could extend back to that group as well. The researchers did not assign levels of coffee drinking; they only observed how much people drank. Because this is an observational study, any association between coffee drinking and dementia symptoms is just an association and not a causal connection.

2. Weights (in grams) of rainbow trout captured by electrofishing on the Ruby River were analyzed based on length classes (length cut into 25 mm intervals), and the residual diagnostic plots are shown.
[Figure: three residual diagnostic plots. Residuals vs Fitted (residuals against fitted values), Normal Q-Q (standardized residuals against theoretical quantiles), and Scale-Location (against fitted values). Observations 6853, 7214, and 11145 are labeled.]

Discuss any violations of the assumptions of ANOVA visible in the plots. (12 pts)

The first plot shows a definite fan shape in the residuals: the spread of the “weight” response increases with length class. This violates the “equal variance” assumption needed to run ANOVA. The same problem shows up in the third plot, of $\sqrt{|e_i|}$ versus fitted values, as a “half fan” with an increasing trend.
The second plot shows another problem: the residuals have a long-tailed distribution compared to a normal distribution, because we have very small values on the left and very large values on the right relative to the line. If we want to continue with an ANOVA, we’ll need to have lots of data points and hope that the central limit theorem will take care of the problem. There is not enough information in these plots to evaluate independence. (Independence could be an issue if different electrofishing runs faced different weather and water conditions.)

3. Barrick and Showers collected data on oxygen isotopic composition for 12 bones (each measured three or more times) from a single Tyrannosaurus rex specimen. They wanted to see if the means are equal for the 12 bones because that helps answer the question of whether dinosaurs were warm or cold blooded.

(a) Using the model $y_{ij} = \mu_i + \epsilon_{ij}$ for $i = 1, \ldots, 12$ and $j = 1, \ldots, n_i$, express the null and alternative hypotheses in terms of model parameters. (The usual hypotheses for an ANOVA setting.) (8 pts)

$H_0: \mu_1 = \mu_2 = \cdots = \mu_{12}$ versus $H_a$: not all means are equal.

(b) The data are plotted below and we have a partial anova table.

[Figure: plot of the data by bone, numbered 1 through 12; vertical axis ticks from 11.0 to 12.0.]

            Df   Sum Sq   Mean Sq   F value   Pr(>F)
bone        11     6.07     0.552     7.432   0.0001
Residuals   40     2.97   0.07425
Total

Fill in one blank at a time below:
i. Df for bone group in line 1: 11 (2 pts)
ii. Total Df: 51 (2 pts)
iii. Total Sum Sq: 9.04 (2 pts)
iv. Mean Sq for line 1: 0.552 (2 pts)
v. Mean Sq for line 2: 0.07425 (2 pts)
vi. F value for line 1: 7.432 (2 pts)
vii. Under one hypothesis, we know the distribution of F. Which hypothesis, and what is that distribution? (6 pts)
Under $H_0$, $F \sim F_{11,40}$.

(c) State your conclusions based on the above F test. (8 pts)

We have very strong (convincing) evidence that mean oxygen isotopic compositions differ from bone to bone, that is, that the means are not all the same.
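The entries filled in for 3(b) all follow from the two sums of squares and their degrees of freedom. As a rough check, here is a minimal Python sketch; the function name `anova_table` and its dictionary keys are made up for illustration and are not part of the exam.

```python
def anova_table(ss_between, df_between, ss_within, df_within):
    """Fill in a one-way ANOVA table from sums of squares and df.

    Hypothetical helper for checking the arithmetic in 3(b).
    """
    ms_between = ss_between / df_between   # Mean Sq, line 1
    ms_within = ss_within / df_within      # Mean Sq, line 2 (the MSE)
    return {
        "total_df": df_between + df_within,
        "total_ss": ss_between + ss_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_stat": ms_between / ms_within,  # compared to F on (df_between, df_within)
    }


# Values taken from the partial table in 3(b).
table = anova_table(ss_between=6.07, df_between=11, ss_within=2.97, df_within=40)
for name, value in table.items():
    print(f"{name}: {value:.4g}")
```

The printed values should agree with the table entries up to rounding: total Df 51, total Sum Sq 9.04, and an F value close to 7.43.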
(d) The bones can be subdivided into four groups according to proximity to the body core. The warm/cold blooded question involves differences between these four groups.

i. We want to use an extra sum of squares F test to compare the four groups model to a model with one mean. The SSE for a four means model is 7.16. Find the Extra Sum of Squares and the top of the fraction. (8 pts)

Correction, Oct 20, 2 pm: What I had printed was a solution to a different question. I compared the 4 means (reduced) model to the 12 means (full) model and concluded that 12 means was quite a bit better than 4. That’s not what the exam asks. I was supposed to compare 4 means (as the full model) to 1 mean (the reduced model). The SSE for the one mean model is the Total Sum Sq, 9.04.
ESS = 9.04 − 7.16 = 1.88
Numerator = 1.88/(4 − 1) = 0.627

ii. Compute the bottom of the fraction (show work). (5 pts)
Bottom of the fraction: MSE = 7.16/(n − 4) = 7.16/(52 − 4) = 7.16/48 = 0.1492

iii. Compute the F statistic and give its degrees of freedom (show work): (5 pts)
F = 0.627/0.1492 = 4.2 on 3 and 48 df

iv. The p-value is 0.010. State your conclusion. (5 pts)
We have strong evidence that the 1 mean model is not adequate compared to the 4 means model.

(e) Is there a problem with measuring each bone multiple times? Discuss in terms of the assumptions for ANOVA. (5 pts)

Yes. The repeated measurements within a bone are not necessarily independent. I would expect samples of bone taken closer together to be more similar than those taken further apart (spatial correlation).

4. Consider two-sample t-procedures applied to log transformed data.

(a) Draw a side-by-side boxplot of data which need log transformation. Describe two characteristics we observe in such a plot which tell us logs are needed. (6 pts)

The plot needs to show that the group with the larger median also has more spread, and that all values are positive.
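Returning to the extra sum of squares test in 3(d): the three steps (numerator, denominator, ratio) can be checked with a short Python sketch. The helper name `extra_ss_f` is hypothetical, and n = 52 is inferred from the total Df of 51 in the anova table.

```python
def extra_ss_f(sse_reduced, df_reduced, sse_full, df_full):
    """Extra-sum-of-squares F statistic for nested models.

    Hypothetical helper; df_* are the residual degrees of freedom
    of the reduced and full models.
    """
    extra_ss = sse_reduced - sse_full      # ESS
    extra_df = df_reduced - df_full        # number of extra parameters
    numerator = extra_ss / extra_df        # top of the fraction
    denominator = sse_full / df_full       # bottom: MSE of the full model
    f_stat = numerator / denominator
    return extra_ss, numerator, denominator, f_stat, (extra_df, df_full)


n = 52  # assumption: total Df (51) plus 1
ess, top, bottom, f_stat, dfs = extra_ss_f(
    sse_reduced=9.04, df_reduced=n - 1,   # one mean model
    sse_full=7.16, df_full=n - 4,         # four means model
)
print(f"ESS = {ess:.2f}, top = {top:.3f}, bottom = {bottom:.4f}")
print(f"F = {f_stat:.1f} on {dfs[0]} and {dfs[1]} df")
```

Working from unrounded intermediate values gives an F statistic that still rounds to 4.2, so the rounding in the hand calculation does not change the answer.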
(b) In the cloud seeding example of Sleuth §3.5, we estimated that seeding was associated with an increase of 1.14 on the log scale (SE = 0.45), with a 95% confidence interval for the difference in log means of (0.24, 2.05). Interpret this interval on the original scale (in acre-feet). (4 pts)

We are 95% confident that the true median precipitation on seeded days is 1.27 to 7.77 times the median on unseeded days (since $e^{0.24} \approx 1.27$ and $e^{2.05} \approx 7.77$).

(c) What do we mean when we say we have 95% confidence in an interval? (4 pts)

Our confidence is in the process. When we repeat the procedure of gathering data (at random) and building the interval many, many times, about 95% of such intervals will contain the true parameter.
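Both parts of Problem 4 can be illustrated numerically. The Python sketch below back-transforms the log-scale interval from 4(b) and then simulates the repeated-sampling idea from 4(c). The function names, the simulation settings (normal data, known sigma, n = 25, 2000 repetitions), and the use of a z-interval are illustrative assumptions, not part of the exam.

```python
import math
import random
import statistics


def back_transform(log_estimate, log_lower, log_upper):
    """Exponentiate a log-scale estimate and interval to get ratios of medians."""
    return math.exp(log_estimate), math.exp(log_lower), math.exp(log_upper)


def coverage(true_mean=0.0, sigma=1.0, n=25, reps=2000, z=1.96, seed=410):
    """Fraction of simulated 95% z-intervals that contain the true mean.

    Assumes normal data with known sigma, purely for illustration.
    """
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if abs(statistics.fmean(sample) - true_mean) <= half_width:
            hits += 1
    return hits / reps


ratio, lower, upper = back_transform(1.14, 0.24, 2.05)
print(f"multiplicative effect: {ratio:.2f} ({lower:.2f}, {upper:.2f})")
print(f"simulated coverage: {coverage():.3f}")
```

The back-transformed endpoints reproduce the 1.27 and 7.77 quoted in 4(b). The simulated coverage should land near 0.95, and it describes the long-run behavior of the procedure, not any single interval.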