Exam 1 answers

Stat 301 -- Fall 2015 -- Midterm exam 1
Answers
Notes:
1) Many other wordings or versions of explanations were accepted for full or partial credit.
2) Notes on some answers explain some of the comments I wrote on your exams and explain common
mistakes.
Problem 1. Pace of life in US cities.
1. 3 pts. Are the 36 observations a sample or a population (in the statistical meaning of a
population)? Briefly explain your answer.
sample. The observations are a subset of the population.
2. 3 pts. We have talked about two measures of location, the mean and the median. Do you
expect the values of the mean and median for the walk variable to be about the same value, or
not? Briefly explain your answer.
similar values. The distribution of walk is symmetrical, so the mean and median should be about the same.
3. 3 pts. The average of the 36 walk values is 21.4. Is this number a statistic or a parameter?
Briefly explain your choice.
statistic. It is computed from the data.
The last set of questions concern the talk variable.
4. 4 pts. Is the distribution of values symmetrical or not? Briefly explain your answer.
skewed. Various possible reasons: more observations further below the middle than above.
5. 3 pts. What is the median of these values?
22
6. 5 pts. JMP reports the mean as 20.5833333. Is this an appropriate way to report this number?
If not, what is an appropriate value to report? Briefly explain your answer(s).
No. Report 20.6. Kelley’s rule: se/3 is 0.21, so report the mean to the nearest 0.1.
Note: Using sd is wrong because you are concerned about the precision of the mean.
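The rounding step can be sketched in Python. This is only one reading of Kelley's rule as it is applied in the answer above (round to the decimal place of the leading digit of se/3). The value se = 0.643 is taken from the T statistic in question 7, and the variable names are illustrative.

```python
import math

se = 0.643           # standard error of the mean (denominator of T in question 7)
mean = 20.5833333    # mean as printed by JMP

third = se / 3                               # about 0.21
# Assumed reading of the rule: keep digits down to the leading digit of se/3.
decimals = -math.floor(math.log10(third))    # leading digit is in the tenths place
reported = round(mean, decimals)

print(round(third, 2), decimals, reported)
```

Replacing se with the sd would change `third` and therefore the number of digits kept, which is why the note above insists on the se.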
7. 4 pts. Calculate the T statistic to test H0: population mean equals 22.0. Show your work.
$T = \dfrac{20.583 - 22}{0.643} = -2.20$
Note: I deducted 1 point if you reversed estimate and parameter, which gave +2.20
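As a quick arithmetic check, here is a minimal Python sketch of the same calculation, using only the numbers shown in the answer. The variable names are illustrative.

```python
estimate = 20.583     # sample mean
null_value = 22.0     # hypothesized population mean under H0
se = 0.643            # standard error of the mean

t = (estimate - null_value) / se            # estimate minus parameter
t_reversed = (null_value - estimate) / se   # the reversed order that lost a point

print(round(t, 2), round(t_reversed, 2))
```

The two versions differ only in sign, so the two-tailed p-value is unaffected, but the sign tells you whether the sample mean sits below or above the hypothesized value.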
8. 3 pts. If possible from the provided JMP output, report the two-tailed p-value for the test of the
null hypothesis that the mean equals 22.0. If not possible, say “not possible”.
0.0342, on the JMP output.
Problem 2. Music and attention span.
9. 4 pts. Calculate the pooled standard deviation. Show your work.
$s_p = \sqrt{\dfrac{(28 - 1)\,2.600^2 + (22 - 1)\,2.253^2}{28 + 22 - 2}} = 2.45$
Note: The most common problem was not squaring the sd’s.
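The calculation, and the effect of forgetting to square the sd's, can be checked with a short Python sketch. The group sizes and sd's are the ones in the formula above; the names are illustrative.

```python
import math

n1, s1 = 28, 2.600    # group with 28 observations
n2, s2 = 22, 2.253    # group with 22 observations

# Pooled variance: df-weighted average of the two sample variances.
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
sp = math.sqrt(pooled_var)

# The common mistake: weighting the sd's themselves instead of the variances.
sp_unsquared = math.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))

print(round(sp, 2), round(sp_unsquared, 2))
```

The unsquared version falls below both group sd's, which is a useful warning sign, since an average of the two should land between them.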
10. 5 pts. (Put your answer on the top of the next page) Compare the number you calculated in
part 9 to the sd for the Music group and the sd for the Control group. Briefly explain your
choice. For example, if you answered a) larger, then you should explain why it is reasonable
that the pooled sd is larger than both the Music sd and the Control sd.
b. in between. The pooled sd is an average of the two sd’s, so you expect it to be in between.
(Actually, it is a weighted average of the variances, but the key point is average).
Note: If you did Q 9 incorrectly, you got credit for the larger / in between / smaller (2 points) if it
matched your result from Q 9. However, to get the 3 points for the explanation, you had to give
a correct explanation for the wrong result.
11. 4 pts. Calculate the degrees of freedom (df) for the pooled standard deviation. Show your work.
48 = 28 + 22 – 2
12. 5 pts. Compare the number you calculated in part 11 to the df for the Music group and the df
for the Control group. Briefly explain your choice (i.e., why this should be expected, as in
question 10)
a. larger. The pooled sd combines the two group-specific sd’s, so there is more information =
more df in the pooled sd.
13. 4 pts. Is it appropriate to assume equal variances for these data? Briefly explain why or why
not.
Yes. The ratio of sd’s is less than 2.
Notes:
1) Statements that observations are not normally distributed, or that treatments were randomly
assigned, or that the observational unit (ou) = the experimental unit (eu) are all correct but irrelevant. Those statements address other
assumptions, not equal variances.
2) Calculating the difference of the two sd’s is wrong because the difference depends on the
units. Two sd’s (e.g. 0.1 and 0.3) that are numerically close when Y is measured in kg are not
close (100 and 300) when Y is measured in g.
3) You got most, but not full, credit if you just said the ratio was close to 1.
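Note 2 can be illustrated with a small Python sketch. It uses the kg/g example from the note plus the two sd's from question 9; the names are illustrative.

```python
import math

sd_kg = (0.1, 0.3)                         # the example sd's, in kg
sd_g = tuple(s * 1000 for s in sd_kg)      # the same sd's, in g

diff_kg = sd_kg[1] - sd_kg[0]              # about 0.2: looks "close"
diff_g = sd_g[1] - sd_g[0]                 # about 200: looks "far apart"

ratio_kg = sd_kg[1] / sd_kg[0]             # about 3 in either unit
ratio_g = sd_g[1] / sd_g[0]

# The exam data: ratio of the larger sd to the smaller sd.
ratio_exam = 2.600 / 2.253                 # about 1.15, well under 2

print(diff_kg, diff_g, ratio_kg, ratio_g, round(ratio_exam, 2))
```

The difference changes by a factor of 1000 with the units while the ratio does not, which is why the rule of thumb is stated in terms of the ratio.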
14. 4 pts. Calculate the standard error of the difference between the Music and Control group
means, assuming equal variances. Show your work.
0.70 (or 0.6994): $s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}} = 2.45\sqrt{\dfrac{1}{22} + \dfrac{1}{28}}$
0.6994 could also be found on the JMP output (I intended to erase it but didn’t).
Note: You got partial credit if you used the unequal variance formula.
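A minimal Python check of the equal-variance formula, using the rounded pooled sd of 2.45 from question 9. The names are illustrative.

```python
import math

sp = 2.45          # pooled sd, rounded, from question 9
n1, n2 = 22, 28    # the two group sizes

se_diff = sp * math.sqrt(1 / n1 + 1 / n2)

print(round(se_diff, 2))
```

Starting from the rounded 2.45 gives a value that agrees with JMP's 0.6994 to two decimal places; the remaining difference is rounding in the inputs.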
15. 5 pts. Compare the number you calculated in part 14 to the se for the mean of the Music group
and the se for the mean of the Control group. Briefly explain your choice (as in Q 10 and 12).
a. larger. The variability of the difference includes the variability of the Music mean and the
variability of the Control mean.
Note: Many folks talked about pooling. That’s irrelevant. If you didn’t pool and used the
unequal variance formula, you would get the same answer (larger) and have the same
explanation.
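The point of the note can be sketched in Python without any pooling. The sketch assumes the pairing of sample sizes and sd's used in the pooled sd formula in question 9 (n = 28 with sd 2.600, n = 22 with sd 2.253); the names are illustrative.

```python
import math

n_a, s_a = 28, 2.600
n_b, s_b = 22, 2.253

se_a = s_a / math.sqrt(n_a)     # se of the first group mean
se_b = s_b / math.sqrt(n_b)     # se of the second group mean

# Unequal-variance se of the difference: the two variances of the means add.
se_diff_unequal = math.sqrt(se_a**2 + se_b**2)

print(round(se_a, 2), round(se_b, 2), round(se_diff_unequal, 2))
```

Because the squared se's add, the se of the difference must exceed each individual se whether or not the variances are pooled.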
16. 3 pts. What is the two-sided p-value for the test of the null hypothesis that the mean attention
span in the Music group is the same as that for the Control group?
0.0021 or 0.0018. I didn’t specify equal variances (p=0.0021), so either answer was accepted.
17. 3 pts. Write a one-sentence conclusion about the results of the hypothesis test in question 16.
Many ways to word this. Basic idea is strong evidence that listening to rap music decreases
attention span.
Note: Saying ‘reject H0’ got full credit only if you described H0.
My comments like “what does this say about the issue under study” indicate that your
conclusion lacked enough detail (e.g. only said reject H0).
18. 4 pts. Will 0 be inside or outside the 99% confidence interval for the difference between the
two means? Briefly explain your answer.
outside. The p-value is less than 0.01.
Note: Many folks said “probably outside” because the 95% ci did not include 0, or because the
p-value was less than 0.05. My comment “can be more definite” indicates that you can say
something more definite than “probably”.
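The reasoning behind "outside" is the usual correspondence between a two-sided test and a confidence interval built from the same statistic: 0 is outside the (1 − α) interval exactly when the two-sided p-value is below α. A minimal Python sketch, where the function name `zero_outside_ci` is hypothetical:

```python
def zero_outside_ci(p_value, confidence):
    """True if 0 lies outside the two-sided CI at the given confidence level.

    Assumes the p-value and the interval come from the same t statistic.
    """
    alpha = 1 - confidence
    return p_value < alpha

# Either p-value accepted in question 16 gives the same, definite answer.
print(zero_outside_ci(0.0021, 0.99), zero_outside_ci(0.0018, 0.99))
```

No interval has to be computed: comparing the p-value with 0.01 settles the question.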
Problem 3. Patch area and butterfly species diversity.
19. 4 pts. What variable is the independent variable in this regression? log area
What variable is the dependent variable in this regression? species
20. 3 pts. What is the estimated slope of the regression line? 28.5 (on the JMP output)
21. 3 pts. What are the units for the regression slope?
Note: The units for species are “number of species”. The units for logArea are “logArea”
number of species / log area or number of species per log area
22. 5 pts. Is it appropriate to conclude from this analysis that “Increasing the area of a patch
increases the average number of butterfly species”? Briefly explain why or why not.
Yes, because the study is an experiment (patch size was randomly assigned to sites).
Notes:
1) If your answer didn’t discuss the causal claim or observational / experimental study, I wrote “why
causal” and you got partial credit.
2) If you claimed the study was observational, so a causal claim is incorrect, you got no credit.
23. 5 pts. Briefly explain what the estimated intercept, 36.25, “means” in the context of this study.
average number of species on a patch of log Area = 0 (or area = 1 ha).
24. 3 pts. Predict the average number of butterfly species that would be found if a patch had a log
Area of 1.5. Show your work.
79 species. = 36.25 + 1.5 * 28.5
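The fitted line can be written as a small Python function to check this prediction, the intercept interpretation in question 23, and the 250 ha value quoted in question 26. The sketch assumes log Area is a base-10 log, which is consistent with question 25 (1 log Area = 10 ha); the function name `predict_species` is illustrative.

```python
import math

intercept = 36.25    # estimated intercept (question 23)
slope = 28.5         # estimated slope (question 20)

def predict_species(log_area):
    """Predicted average number of species at a given log10(area in ha)."""
    return intercept + slope * log_area

at_log_1_5 = predict_species(1.5)
at_1_ha = predict_species(math.log10(1))       # log Area = 0
at_250_ha = predict_species(math.log10(250))

print(at_log_1_5, at_1_ha, round(at_250_ha, 2))
```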
25. 4 pts. The standard error for the predicted number of species (in the JMP output) is smaller for
patches of 1 log Area (Area of 10 ha) than it is for patches of 3 log Area (Area of 1000 ha).
Briefly explain why this is to be expected.
The mean X (log patch size) is closer to 1 than to 3. The se for the predicted Y is smallest when
X is at its mean value and increases when X moves away from the mean in either direction.
Notes:
1) This was the hardest problem on the exam.
2) The most common wrong answer was that there are more 10ha patches (log Area = 1) than
there are 1000ha patches (log Area = 3). True, but irrelevant. Would be relevant only if you
were computing the mean for each patch size from just the observations at that size. That’s not
a regression.
3) The next most common wrong answer was something subject-matter related, e.g. harder to
count species (so more variable) on larger patches or variability increases with the mean
number. These may be issues, but they raise concerns about unequal variance in the raw data,
not the precision of a regression prediction.
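The standard formula for the se of the estimated mean at X = x0 in simple linear regression is s·sqrt(1/n + (x0 − x̄)² / Sxx), where s is the root mean squared error and Sxx is the sum of squared deviations of X from its mean. The Python sketch below shows the behaviour described in the answer. The log areas and the value of s are hypothetical, made up for illustration; they are NOT the exam data.

```python
import math

def se_predicted_mean(x0, xs, rmse):
    """se of the estimated mean response at x0 in simple linear regression."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return rmse * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)

# Hypothetical log areas (not the exam data); their mean is about 1.2.
log_areas = [0, 0, 1, 1, 1, 1, 2, 2, 3]
rmse = 1.0    # hypothetical root mean squared error

xbar = sum(log_areas) / len(log_areas)
se_at_mean = se_predicted_mean(xbar, log_areas, rmse)
se_at_1 = se_predicted_mean(1, log_areas, rmse)
se_at_3 = se_predicted_mean(3, log_areas, rmse)

print(round(se_at_mean, 3), round(se_at_1, 3), round(se_at_3, 3))
```

The number of observations at a particular X never enters the formula separately; only n, the mean of X, and the distance of x0 from that mean matter.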
26. 3 pts. The mean predicted number of species for patches of 250 ha is 104.59 species (see JMP
packet). Report the 95% interval that appropriately describes the uncertainty in this estimate.
(81.7, 127.5)
Notes:
1) This was the second hardest problem on the exam.
2) The question asks for the precision of the estimated mean (underlined above). That is
described by the confidence interval for the mean (or for the line).
3) If you answered (48.7, 160.5) you gave me the prediction interval for a single new patch.
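The two intervals can be compared with a short Python sketch that uses only the endpoints given above. The names are illustrative.

```python
ci_mean = (81.7, 127.5)       # 95% confidence interval for the mean
pi_single = (48.7, 160.5)     # 95% prediction interval for one new patch

def center_and_half_width(interval):
    low, high = interval
    return (low + high) / 2, (high - low) / 2

ci_center, ci_half = center_and_half_width(ci_mean)
pi_center, pi_half = center_and_half_width(pi_single)

print(round(ci_center, 1), round(ci_half, 1))
print(round(pi_center, 1), round(pi_half, 1))
```

Both intervals are centered on the predicted value of about 104.6, but the prediction interval is much wider because it also includes the variability of a single patch around the mean.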