Stat 401 F/XW: HW 2 answers.
1) Radon. 7 pts.
a) 2 pt.
[Plot: "Radon concentrations". Vertical axis: radon (0 to 40); horizontal axis: group (a single group, labeled 1).]
The distribution is skewed with some extremely large values.
Note: if you use different software that identifies individual extreme values,
you see two unusually large points. I would call these extreme values,
not outliers, because there is nothing erroneous about these points. They
are just houses with high radon concentrations. To a public health official,
these would be the two most important points in the data set.
b) 2 pt. mean = 4.54, median = 3.5
I would say these are not similar. I would expect this because the data are skewed.
You could also argue these are similar, because the influence of one or two large
values is diluted when you have a total of 42 observations.
Note: Although you could argue either way, I strongly favor 'not similar' because
the difference between the two estimates is large relative to the variability
in the bulk of the data. The 25th percentile is 2.2; the 75th percentile is 5.5.
The difference (IQR) is 3.3. The difference between mean and median (1.04) is about 1/3 of that.
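The comparison in this note is simple arithmetic on the summary values quoted in part b). A minimal sketch, using only those values (the variable names are mine):

```python
# Summary values quoted in the answer to part b).
mean, median = 4.54, 3.5
q1, q3 = 2.2, 5.5          # 25th and 75th percentiles

iqr = q3 - q1              # spread of the middle half of the data
gap = mean - median        # how far the mean sits above the median
ratio = gap / iqr          # gap relative to the spread of the bulk

print(f"IQR = {iqr:.1f}, mean - median = {gap:.2f}, ratio = {ratio:.2f}")
# IQR = 3.3, mean - median = 1.04, ratio = 0.32
```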
c) 1 pt. The se of the mean is 0.76 (directly from SAS / JMP output) or calculated as:
4.896/sqrt(42) = 0.76.
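The hand calculation is the sample standard deviation divided by the square root of the sample size. A short sketch of that calculation (the function name standard_error is just for illustration):

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: sample sd divided by sqrt(n)."""
    return sd / math.sqrt(n)

se = standard_error(4.896, 42)   # sd and n from the radon data
print(round(se, 2))              # 0.76
```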
d) 2 pt. Yes, the se is appropriate because the data are a simple random sample. The
skewness is irrelevant. The se does not assume any particular distribution.
Note: A lot of folks misunderstood this point. I talked about this briefly in the lecture, but it is
not emphasized in the book. In many ways this is analogous to the mean of a skewed
distribution. The sample average from a simple random sample is a valid and appropriate
estimator of the mean of a population, no matter what shape it has. Our discussion about
mean or median was about which is more appropriate for a particular goal. If the mean is the
right quantity, the sample average (and sample se) are appropriate because the sample is a
simple random sample. Skewness doesn't matter for either the average or the se of the
average. The distribution will matter when you use t-statistics to compute a confidence
interval.
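One way to convince yourself that skewness does not invalidate the se is a small simulation. The sketch below is only an illustration: it assumes an exponential population (mean 1, sd 1) as a stand-in for a right-skewed distribution, not the radon data, and compares the sd of many sample averages to sigma/sqrt(n).

```python
import math
import random
import statistics

def sd_of_sample_means(n: int, reps: int, seed: int = 401) -> float:
    """Empirical sd of the sample average over many simple random samples
    from a right-skewed population (exponential with mean 1 and sd 1)."""
    rng = random.Random(seed)   # fixed seed so the run is reproducible
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.stdev(means)

n = 42
empirical = sd_of_sample_means(n, reps=5000)
theoretical = 1.0 / math.sqrt(n)   # sigma / sqrt(n), with sigma = 1
print(f"empirical {empirical:.3f} vs sigma/sqrt(n) {theoretical:.3f}")
```

The two numbers should agree closely even though the population is far from normal. What the skewness affects is how well a t-based confidence interval works, not whether the se itself is meaningful.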
Note: when would the se not be an appropriate measure? When the sampling is not a simple
random sample. If you had a way to identify houses likely to have high radon and houses
likely to have low radon (e.g. based on the underlying soil/rock), you could sample 21 houses
from the 'likely high' group and 21 houses from the 'likely low' group. The sample se we have
talked about would then not be appropriate. The survey would be a stratified random sample, for which different
methods are needed to estimate the population mean and the se of that estimate.
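For the curious, here is a sketch of the usual textbook estimator for a stratified random sample: weight each stratum mean by that stratum's share of the population, and combine the within-stratum variances the same way. This goes beyond what we covered. The stratum weights and summary numbers below are hypothetical, not from the radon data, and the finite-population correction is ignored.

```python
import math

def stratified_mean_and_se(strata):
    """strata: list of (weight, sample_mean, sample_sd, n) tuples, where
    weight is the stratum's share of the population (weights sum to 1).
    Ignores the finite-population correction."""
    total_w = sum(w for w, _, _, _ in strata)
    if not math.isclose(total_w, 1.0):
        raise ValueError("stratum weights must sum to 1")
    mean = sum(w * m for w, m, _, _ in strata)
    var = sum(w**2 * sd**2 / n for w, _, sd, n in strata)
    return mean, math.sqrt(var)

# Hypothetical numbers, NOT from the homework data:
# 20% of houses are 'likely high', 80% are 'likely low', 21 sampled from each.
strata = [(0.2, 9.0, 6.0, 21), (0.8, 3.0, 2.0, 21)]
est, se = stratified_mean_and_se(strata)
print(f"estimate = {est:.2f}, se = {se:.2f}")
```

Note that the unweighted average of all 42 houses would overrepresent the 'likely high' houses in this made-up example, which is why the simple average and its se are not the right tools here.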
2) mutagen and microsatellite nuclei. 7 pt.
a) 1 pt for treatments, 1 pt for randomly assigned and can make causal conclusions
Treatments are control or 80 mg/ml of mutagen.
They are randomly assigned to a vial of cells.
Yes, causal conclusions can be made, because treatments are randomly assigned.
b) 1 pt
1/126 = 0.0079 = 0.79%.
Explanation: In the permutation distribution, exactly one permutation has a difference
of 8.2 or larger. There are 126 possible permutations, so P[diff >= 8.2] = 1/126.
c) 2 pt
3/126 = 0.024 = 2.4% (or 0.0238 = 2.38%)
The two-sided p-value is the probability of >= 8.2 or <= -8.2. There is 1 value >= 8.2 and two
<= -8.2, so three events in total.
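The counting in parts b) and c) can be done by brute force: list every way to split the pooled observations into two groups, compute the difference in means for each split, and count how many are at least as extreme as the observed one. The sketch below is an illustration only. The data values are made up, not the homework data, and the 4-and-5 split is my assumption, chosen because it gives C(9,4) = 126 permutations.

```python
from itertools import combinations
from statistics import fmean

def permutation_p_values(treated, control):
    """Exact permutation test for a difference in means (treated - control).
    Returns (one_sided_p, two_sided_p, number_of_permutations)."""
    pooled = list(treated) + list(control)
    k = len(treated)
    observed = fmean(treated) - fmean(control)
    tol = 1e-9  # guard against floating-point ties
    diffs = []
    for idx in combinations(range(len(pooled)), k):
        in_treated = set(idx)
        grp_t = [pooled[i] for i in idx]
        grp_c = [pooled[i] for i in range(len(pooled)) if i not in in_treated]
        diffs.append(fmean(grp_t) - fmean(grp_c))
    one_sided = sum(d >= observed - tol for d in diffs) / len(diffs)
    two_sided = sum(abs(d) >= abs(observed) - tol for d in diffs) / len(diffs)
    return one_sided, two_sided, len(diffs)

# Made-up values, NOT the homework data: 4 treated vials, 5 control vials.
treated = [20, 22, 25, 27]
control = [10, 12, 13, 15, 16]
one_sided, two_sided, n_perm = permutation_p_values(treated, control)
print(n_perm, one_sided, two_sided)
```

With unequal group sizes the permutation distribution need not be symmetric, so the two-sided p-value is found by counting both tails rather than by doubling the one-sided value.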
d) 1 pt for p-value and correct interpretation of result, 1 pt for estimate of effect size
There are many possible (and reasonable) ways to word this. Here's mine.
There is evidence (p=0.024) that the mutagen increases the number of microsatellite nuclei.
The estimated increase with an 80 mg/ml dose of mutagen is 8.2 nuclei.
Note: When grading we were looking for appropriate interpretation of the p-value (evidence
of) and an estimate of the size of the effect. You could also include the means or medians of
the two groups (full credit) or the difference in the medians (full credit).
3) Problem ./26: 6 points
We were looking for:
Statement of methods: (1 pt)
An appropriate graphic (2 pts)
Appropriate numerical results (1 pt) that are consistent with your choice of method (1 pt)
A statement reporting your conclusion (1 pt).
One possible answer is:
The data were analyzed by plotting side-by-side box plots. Because these data are very
skewed (Figure 1), we report the median percent pro-environment votes for each party. The
Democratic Party (median = 92%) had a much higher pro-environment vote percentage than the
Republican Party (median = 7.1%). The one independent member of the House of
Representatives voted 100% pro-environment.
[Figure 1: side-by-side box plots. Vertical axis: PctPro (0 to 100); horizontal axis: Party (D, I, R).]
Figure 1. Percent pro-environment votes for members of the Democratic (D) and Republican
(R) parties and the one Independent.
Note: The statement about the Independent member of Congress is not required.
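If you are not using SAS or JMP, the per-party medians in an answer like the one above can be computed by grouping the records by party. A minimal sketch with hypothetical records (these are not the actual voting data, and the function name is mine):

```python
from collections import defaultdict
from statistics import median

def median_by_group(records):
    """records: iterable of (group_label, value) pairs.
    Returns {group_label: median of that group's values}."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    return {label: median(values) for label, values in groups.items()}

# Hypothetical records, NOT the actual voting data.
votes = [("D", 80), ("D", 90), ("D", 100),
         ("R", 0), ("R", 10), ("R", 20),
         ("I", 100)]
print(median_by_group(votes))
```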