STAT 410/511 Exam 1

October 19, 2011 Solutions
100 points
Name:
1. Researchers in Finland selected 1409 people at random from the survivors of a previous
“FINMONICA” study. They interviewed them about diet and coffee drinking, and followed them over an average of 21 years to see if they developed symptoms of dementia
and Alzheimer’s disease. The scientists report a 65% decrease in the risk of dementia for
those who drank 3 to 5 cups of coffee per day (relative to those who drank 0 to 2 cups per
day). We’ll assume that they are reporting this decrease based on “convincing evidence”
with a small p-value. What is the scope of their inference?
(12 pts)
The subjects were randomly selected from people who are still alive and were part of the
FINMONICA study, so we can make inference back to all those who participated in that
study. If the FINMONICA study was a random sample of all Finns (or some other large
group) then inference could extend back to that group as well. Researchers did not assign
levels of coffee drinking, but just observed how much people drank. That means that any
association between coffee drinking and dementia symptoms is only an association and
not evidence of a causal connection.
2. Weights (in grams) of rainbow trout captured by electrofishing on the Ruby River were analyzed based on length classes (length cut into 25 mm intervals) and the residual diagnostic
plots are shown.
[Figure: three residual diagnostic plots for the trout weight model. Panels: "Residuals vs Fitted" (Residuals against Fitted values), "Normal Q-Q" (Standardized residuals against Theoretical Quantiles), and "Scale-Location" (Standardized residuals against Fitted values). Observations 6853, 7214, and 11145 are labeled in the panels.]
Discuss any violations of the assumptions of ANOVA visible in the plots.
(12 pts)
The first plot shows a definite fan shape in the residuals: the spread of the
“weight” response increases with length class. This violates the “equal variance” assumption needed to run ANOVA. The same problem shows up in the third plot, of $|e_i|$
versus fits, as a “half fan” with an increasing trend. The second plot shows another problem:
the residuals have a long-tailed distribution compared to a normal distribution, because we have
very small values on the left and very large values on the right relative to the line. If we want to
continue with an ANOVA we’ll need to have lots of data points and hope that the central
limit theorem will take care of the non-normality. These plots give no information for evaluating independence. (Independence could be an issue if different electrofishing runs faced
different weather and water conditions.)
3. Barrick and Showers collected data on oxygen isotopic composition for 12 bones (each
measured three or more times) from a single Tyrannosaurus rex specimen. They wanted
to see if the means are equal for the 12 bones because that helps answer the question of
dinosaurs being warm or cold blooded.
(a) Using the model $y_{ij} = \mu_i + \epsilon_{ij}$ for $i = 1, \ldots, 12$ and $j = 1, \ldots, n_i$, express the null
and alternative hypotheses in terms of model parameters. (The usual hypotheses for
an ANOVA setting).
(8 pts)
$H_0: \mu_1 = \mu_2 = \cdots = \mu_{12}$
versus $H_a$: not all means are equal.
(b) The data are plotted below and we have a partial ANOVA table.
[Figure: the measurements plotted by bone, with bones 1 through 12 on the horizontal axis and vertical axis ticks at 11.0, 11.5, and 12.0.]
           Df  Sum Sq  Mean Sq  F value  Pr(>F)
bone       11    6.07    0.552    7.432  0.0001
Residuals  40    2.97  0.07425
Total      51    9.04
Fill in one blank at a time below:
i. Df for bone group in line 1: 11 (2 pts)
ii. Total Df: 11 + 40 = 51 (2 pts)
iii. Total Sum Sq: 6.07 + 2.97 = 9.04 (2 pts)
iv. Mean Sq for line 1: 6.07/11 = 0.552 (2 pts)
v. Mean Sq for line 2: 2.97/40 = 0.07425 (2 pts)
vi. F value for line 1: 0.552/0.07425 = 7.432 (2 pts)
vii. Under one hypothesis, we know the distribution of F. Which hypothesis, and
what is that distribution?
(6 pts)
Under $H_0$, F has an $F_{11,40}$ distribution (11 and 40 degrees of freedom).
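The arithmetic behind blanks i through vi can be checked with a few lines of Python. This is only a sketch that starts from the two sums of squares and degrees of freedom printed in the table; the variable names are made up for illustration.

```python
# Rebuild the one-way ANOVA table from the values printed in the exam.
ss_bone, df_bone = 6.07, 11      # "bone" line: between-bone variation
ss_resid, df_resid = 2.97, 40    # "Residuals" line: within-bone variation

# Sums of squares and degrees of freedom both add up to the Total line.
ss_total = ss_bone + ss_resid
df_total = df_bone + df_resid

# Each mean square is a sum of squares divided by its degrees of freedom.
ms_bone = ss_bone / df_bone
ms_resid = ss_resid / df_resid

# The F statistic is the ratio of the two mean squares.
f_value = ms_bone / ms_resid

print(f"Total: Df = {df_total}, Sum Sq = {ss_total:.2f}")
print(f"Mean Sq: bone = {ms_bone:.3f}, residuals = {ms_resid:.5f}")
print(f"F = {f_value:.3f} on {df_bone}, {df_resid} df")
```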
(c) State your conclusions based on the above F test.
(8 pts)
We have very strong (convincing) evidence (F = 7.43 on 11 and 40 df, p-value = 0.0001) that mean
oxygen isotopic compositions differ from bone to bone, that is, the means are not all the same.
(d) The bones can be subdivided into four groups according to proximity to the body
core. The warm/cold blooded question involves differences between these four groups.
i. We want to use an extra sum of squares F test to compare the four groups model
to a model with one mean. The SSE for a four means model is 7.16. Find the
Extra Sum of Squares and the top of the fraction.
(8 pts)
Correction, Oct 20, 2 pm
What I had printed is a solution to a different question. I compared the 4 means
(reduced) model to the 12 means (full) model and concluded that 12 means was
quite a bit better than 4. That’s not what the exam asks. I was supposed to
compare 4 means (as full model) to 1 mean (reduced model).
ESS = 9.04 − 7.16 = 1.88
Numerator = 1.88/(4 − 1) = 0.627
ii. Compute the bottom of the fraction (show work).
(5 pts)
Bottom of the fraction: MSE = 7.16/(n − 4) = 7.16/(52 − 4) = 7.16/48 = 0.1492
iii. Compute the F statistic and give its degrees of freedom (show work):
(5 pts)
F = 0.627/0.1492 = 4.2 on 3 and 48 df
iv. The p-value is 0.010. State your conclusion.
(5 pts)
We have strong evidence that the 1 mean model is not adequate compared to the
4 means model, so the mean oxygen isotopic composition is not the same in all four groups of bones.
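The extra sum of squares computation in parts i through iii can be sketched the same way. The sample size n = 52 is not stated directly in the exam; it is assumed here from the total Df of 51 in part (b), and the variable names are hypothetical.

```python
# Extra-sum-of-squares F test: 4 means (full) versus 1 mean (reduced).
n = 52                 # assumed: total Df of 51, plus 1
sse_reduced = 9.04     # 1-mean model: its SSE is the Total Sum Sq
sse_full = 7.16        # 4-means model, given in the question
p_reduced, p_full = 1, 4   # number of mean parameters in each model

# Top of the fraction: drop in SSE per extra parameter.
extra_ss = sse_reduced - sse_full
df_num = p_full - p_reduced
numerator = extra_ss / df_num

# Bottom of the fraction: MSE of the full model.
df_den = n - p_full
denominator = sse_full / df_den

f_stat = numerator / denominator
print(f"Extra SS = {extra_ss:.2f}, numerator = {numerator:.3f}")
print(f"MSE (full) = {denominator:.4f}")
print(f"F = {f_stat:.1f} on {df_num}, {df_den} df")
```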
(e) Is there a problem with measuring each bone multiple times? Discuss in terms of the
assumptions for ANOVA.
(5 pts)
Yes. ANOVA assumes independent errors, and the repeated measurements within a bone are not
necessarily independent. I would expect samples of bone taken closer together to be more similar
than those taken further apart (spatial correlation).
4. Consider two-sample t-procedures applied to log transformed data.
(a) Draw a side-by-side boxplot of data which need log transformation. Describe two
characteristics we observe in such a plot which tell us logs are needed.
(6 pts)
Plot needs to show that the group with larger median also has more spread, and values
are positive.
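A small numeric illustration of why logs help in that situation. The two groups below are invented for illustration (they are not from any data set in the exam): the second group is ten times the first, so its median and spread are both ten times larger on the raw scale.

```python
import math
import statistics

# Made-up positive data: "high" is 10 times "low", value by value.
low = [1.0, 2.0, 4.0, 8.0]
high = [10.0, 20.0, 40.0, 80.0]

# Raw scale: the group with the larger median also has the larger spread.
sd_ratio_raw = statistics.stdev(high) / statistics.stdev(low)

# Log scale: multiplying by 10 only shifts the logs by log(10),
# so the two groups end up with the same spread.
log_low = [math.log(y) for y in low]
log_high = [math.log(y) for y in high]
sd_ratio_log = statistics.stdev(log_high) / statistics.stdev(log_low)

print(f"Ratio of SDs on the raw scale: {sd_ratio_raw:.2f}")
print(f"Ratio of SDs on the log scale: {sd_ratio_log:.2f}")
```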
(b) In the cloud seeding example of Sleuth §3.5, we estimated that seeding was associated
with an increase of 1.14 in the log scale (SE = 0.45) with a 95% confidence interval
for the difference in log means of (0.24, 2.05). Interpret this interval on the original
scale (in acre-feet).
(4 pts)
Exponentiating the endpoints gives exp(0.24) = 1.27 and exp(2.05) = 7.77. We are 95% confident
that the true median precipitation is 1.27 to 7.77 times larger for seeded compared to unseeded days.
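The back-transformation is just exponentiation of the log-scale numbers. A minimal sketch, using only the estimate and interval quoted in the question (the variable names are illustrative):

```python
import math

# Log-scale results quoted in the question.
estimate_log = 1.14
lower_log, upper_log = 0.24, 2.05

# Exponentiating a difference in log means gives a ratio of medians.
ratio = math.exp(estimate_log)
lower_ratio = math.exp(lower_log)
upper_ratio = math.exp(upper_log)

print(f"Estimated ratio of medians: {ratio:.2f}")
print(f"95% CI for the ratio: ({lower_ratio:.2f}, {upper_ratio:.2f})")
```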
(c) What do we mean when we say we have 95% confidence in an interval?
(4 pts)
Our confidence is in the process. When we repeat the procedure of gathering data
(at random) and building the interval many, many times, 95% of such intervals will
contain their true parameter.
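The "repeat the procedure many times" idea can be tried out by simulation. This is a sketch under assumed settings that are not from the exam: normal data with mean 10 and SD 2, samples of size 30, and the t multiplier 2.045 for 29 degrees of freedom.

```python
import random
import statistics

def coverage(reps=2000, n=30, mu=10.0, sigma=2.0, t_crit=2.045, seed=1):
    """Proportion of 95% t-intervals for the mean that contain mu."""
    rng = random.Random(seed)   # fixed seed so the run is reproducible
    hits = 0
    for _ in range(reps):
        # Gather a fresh random sample and build its interval.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        # Count the interval if it captures the true mean.
        if mean - t_crit * se <= mu <= mean + t_crit * se:
            hits += 1
    return hits / reps

rate = coverage()
print(f"Proportion of intervals containing the true mean: {rate:.3f}")
```

Any single interval either contains the true mean or it does not; the proportion printed describes the long-run behavior of the procedure and should land close to 0.95.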