Assignment #3

STAT 992
Spring 2015
Complete the problems below. Within each part, include your R output together with the code that
produced it and any additional information needed to explain your answer. Your R code and output
should be formatted in exactly the same manner as in the lecture notes.
1) (44 total points) Does drinking beer make you more or less attractive to mosquitos? The purpose
of this problem is to investigate this question in a manner similar to the one John Rauser uses in a
YouTube video at https://www.youtube.com/watch?v=5Dnw46eC-0o. His presentation cites
Lefevre et al. (PLoS ONE, 2010) as the motivation and the source of the data. The data used in the
video are in the mosquitos.csv file on the graded materials web page of my course website. Below
is part of the data:
> set1 <- read.csv(file = "mosquitos.csv")
> head(set1)
  Response Treatment
1       27      Beer
2       20      Beer
3       21      Beer
4       26      Beer
5       27      Beer
6       31      Beer
> tail(set1)
   Response Treatment
38       13     Water
39       22     Water
40       20     Water
41       24     Water
42       18     Water
43       20     Water
The response Yi is the number of mosquitos that were attracted to a subject in the beer (i = 1) and
water (i = 2) groups. The treatment corresponds to whether a subject drank beer or water prior to
the mosquitos being released.
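As a quick orientation to this data layout, the sketch below rebuilds only the twelve rows displayed above and summarizes each treatment group. The object names (set1.shown, n.shown, ybar.shown, var.shown) are placeholders chosen here, and the summaries describe these twelve rows only, not the full 43-row file.

```r
# Only the 12 rows printed by head() and tail() above; the full
# mosquitos.csv file has 43 rows and will give different summaries.
set1.shown <- data.frame(
  Response = c(27, 20, 21, 26, 27, 31, 13, 22, 20, 24, 18, 20),
  Treatment = factor(rep(c("Beer", "Water"), each = 6))
)

# Group sizes, sample means, and sample variances by treatment
n.shown <- tapply(set1.shown$Response, set1.shown$Treatment, length)
ybar.shown <- tapply(set1.shown$Response, set1.shown$Treatment, mean)
var.shown <- tapply(set1.shown$Response, set1.shown$Treatment, var)
data.frame(n = n.shown, mean = ybar.shown, var = var.shown)
```

With the complete file, the same tapply() calls can be applied to set1 directly.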
a) (4 points) Perform the standard t-test of $H_0: \mu_1 - \mu_2 = 0$ vs. $H_a: \mu_1 - \mu_2 \neq 0$, where $\mu_i$ is the
population mean for the beer and water groups, using both the “equal variances” and “unequal
variances” assumption in two separate tests.
b) (3 points) At approximately 6:50 in the video, Rauser says that “only a few people in this
audience have any idea how that argument really works” in relation to using the standard t-test
for the difference of means. Show me that you know how the argument works!
c) The purpose of this part is to use the bootstrap to perform the hypothesis test in part a). Set a
seed number of 8199 before each implementation of the bootstrap. Use R = 4999.
i) (6 points) Implement approach #1 on page 72 of the notes with $t = \bar{y}_{\text{Beer}} - \bar{y}_{\text{Water}}$. In your
work, show that the null hypothesis is satisfied with your adjusted data. State the final p-value
and include a histogram of the t values with the observed value of t plotted upon it.
ii) (3 points) Implement approach #1 on page 72 of the notes with
$z = \dfrac{\bar{y}_{\text{Beer}} - \bar{y}_{\text{Water}}}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$.
State the final p-value.
iii) (3 points) Through using the resampled values of z in the previous part, determine if there
is enough justification to use the t-distribution in part a). Explain your answer.
iv) (5 points) Implement approach #2 on page 72 of the notes. Use $t = \bar{y}_{\text{Beer}} - \bar{y}_{\text{Water}}$. State the
p-value and include a histogram of the t values with the observed value of t plotted upon it.
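One common way to make resampled data satisfy the null hypothesis is to shift each group so that both adjusted samples share the pooled mean, and then resample with replacement within each group. The sketch below does this for the difference in sample means, using only the twelve rows displayed earlier. Whether this adjustment corresponds to approach #1 or approach #2 on page 72 of the notes is not confirmed by the assignment text, so treat it as an illustration of the general idea. The two-sided p-value formula and all object names are assumptions made here.

```r
# Only the 12 displayed rows, not the full 43-row data set
beer <- c(27, 20, 21, 26, 27, 31)
water <- c(13, 22, 20, 24, 18, 20)
t.obs <- mean(beer) - mean(water)

# Assumed adjustment: recenter each group at the pooled mean so that
# the adjusted groups have equal means (H0 holds in the adjusted data)
pooled.mean <- mean(c(beer, water))
beer.adj <- beer - mean(beer) + pooled.mean
water.adj <- water - mean(water) + pooled.mean
h0.check <- mean(beer.adj) - mean(water.adj)  # zero up to rounding error

set.seed(8199)
R <- 4999
# Resample with replacement within each adjusted group
t.star <- replicate(R, {
  mean(sample(beer.adj, replace = TRUE)) -
    mean(sample(water.adj, replace = TRUE))
})

# Two-sided bootstrap p-value, counting the observed value as one resample
p.value <- (1 + sum(abs(t.star) >= abs(t.obs))) / (R + 1)
p.value
```

Calling hist(t.star) followed by abline(v = t.obs) would then draw the resampled values with the observed statistic marked on the plot.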
d) The purpose of this part is to use a permutation test to perform the hypothesis test in part a).
Set a seed number of 8199 before taking the resamples. Use R = 4999.
i) (3 points) Perform a permutation test using $t = \bar{y}_{\text{Beer}} - \bar{y}_{\text{Water}}$ for the hypothesis test in part a).
ii) (3 points) For the first resample, verify that the order statistics remain the same as those
observed under the null hypothesis.
iii) (2 points) Construct a histogram of the t values with the observed value of t plotted upon it.
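The permutation idea in this part can be sketched as follows, again using only the twelve rows displayed earlier. Group labels are shuffled by permuting the pooled responses without replacement, and the first six values are treated as the beer group. The function name perm.stat, the two-sided p-value formula, and the other object names are assumptions made for illustration.

```r
# Only the 12 displayed rows, not the full 43-row data set
beer <- c(27, 20, 21, 26, 27, 31)
water <- c(13, 22, 20, 24, 18, 20)
y <- c(beer, water)
n1 <- length(beer)
t.obs <- mean(beer) - mean(water)

# One permutation: reorder the pooled responses (no replacement) and
# treat the first n1 values as the beer group
perm.stat <- function(y, n1) {
  y.perm <- sample(y)
  mean(y.perm[1:n1]) - mean(y.perm[-(1:n1)])
}

set.seed(8199)
R <- 4999
t.perm <- replicate(R, perm.stat(y, n1))

# Two-sided permutation p-value, counting the observed value as one resample
p.value <- (1 + sum(abs(t.perm) >= abs(t.obs))) / (R + 1)
p.value

# A permutation only reorders the pooled values, so its sorted values
# (the order statistics) match those of the observed data
set.seed(8199)
first.perm <- sample(y)  # same draw as the first resample above
same.order.stats <- identical(sort(first.perm), sort(y))
same.order.stats
```

Resetting the seed before drawing first.perm reproduces the first resample because that draw is the first use of the random number generator after set.seed(8199).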
e) While not mentioned in the video, the paper suggests that the counts here are actually
binomial with 100 trials per individual. Complete the following.
i) (4 points) Estimate and state a logistic regression model using the glm() function with
Response as the response variable and Treatment as the explanatory variable. To
determine if the proportion of mosquitos that are attracted to a subject is dependent on
whether beer or water is consumed, perform a likelihood ratio test using the estimated
model, the standard “$-2\log(\Lambda)$” statistic, and the standard $\chi^2$ distribution approximation.
Include a statement of the hypotheses.
ii) (5 points) Perform a bootstrap test using case-based resampling, $-2\log(\Lambda)$ as the test
statistic, R = 4999, and a seed of 8199. Make sure to take resamples under the null
hypothesis and thoroughly describe in words how this was done!
iii) (3 points) Construct a histogram of the $-2\log(\Lambda)$ values from the previous part. Include the
observed value of $-2\log(\Lambda)$ on the plot and a $\chi^2_1$ distribution. Is there enough justification to
use the $\chi^2_1$ distribution in part 1)e)i)? Explain.
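For the likelihood ratio test in part e), a minimal sketch is given below, using only the twelve rows displayed earlier and taking 100 trials per subject as stated above. Writing the model as logit(pi) = beta0 + beta1 * x, with x indicating the water group, the hypotheses are H0: beta1 = 0 vs. Ha: beta1 not equal to 0. This notation and the object names are assumptions made here. The statistic is computed as the difference in residual deviances between the intercept-only model and the model with Treatment.

```r
# Only the 12 displayed rows, not the full 43-row data set
set1.shown <- data.frame(
  Response = c(27, 20, 21, 26, 27, 31, 13, 22, 20, 24, 18, 20),
  Treatment = factor(rep(c("Beer", "Water"), each = 6))
)
set1.shown$trials <- 100  # binomial trials per subject, as stated in part e)

# Model under Ha: success probability depends on Treatment
mod.fit <- glm(cbind(Response, trials - Response) ~ Treatment,
               family = binomial(link = "logit"), data = set1.shown)

# Model under H0: a single success probability for both groups
mod.null <- glm(cbind(Response, trials - Response) ~ 1,
                family = binomial(link = "logit"), data = set1.shown)

# -2log(Lambda) is the drop in residual deviance from the H0 model to the Ha model
lrt.stat <- deviance(mod.null) - deviance(mod.fit)

# Chi-square approximation with 1 degree of freedom (one parameter tested)
p.value <- pchisq(lrt.stat, df = 1, lower.tail = FALSE)
c(statistic = lrt.stat, p.value = p.value)
```

The call anova(mod.null, mod.fit, test = "Chisq") reports the same deviance difference and p-value in table form.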