Assignment #3 STAT 992 Spring 2015

Complete the problems below. Within each part, include your R program output with code inside of it and any additional information needed to explain your answer. Your R code and output should be formatted in exactly the same manner as in the lecture notes.

1) (44 total points) Does drinking beer make you more or less attractive to mosquitos? The purpose of this problem is to investigate this question in a manner similar to how John Rauser does in a YouTube video at https://www.youtube.com/watch?v=5Dnw46eC-0o. This presentation cites Lefevre et al. (PLoS ONE, 2010) as motivation and as the source of the data. The data used in the video are available in the mosquitos.csv file on the graded materials web page of my course website. Below is part of the data:

> set1 <- read.csv(file = "mosquitos.csv")
> head(set1)
  Response Treatment
1       27      Beer
2       20      Beer
3       21      Beer
4       26      Beer
5       27      Beer
6       31      Beer
> tail(set1)
   Response Treatment
38       13     Water
39       22     Water
40       20     Water
41       24     Water
42       18     Water
43       20     Water

The response $Y_i$ is the number of mosquitos that were attracted to a subject in the beer (i = 1) and water (i = 2) groups. The treatment corresponds to whether a subject drank beer or water before the mosquitos were released.

a) (4 points) Perform the standard t-test of $H_0: \mu_1 - \mu_2 = 0$ vs. $H_a: \mu_1 - \mu_2 \neq 0$, where $\mu_i$ is the population mean for group i (beer or water). Do this under both the "equal variances" and the "unequal variances" assumptions, in two separate tests.

b) (3 points) At approximately 6:50 in the video, Rauser says that "only a few people in this audience have any idea how that argument really works" in relation to using the standard t-test for the difference of means. Show me that you know how the argument works!

c) The purpose of this part is to use the bootstrap to perform the hypothesis test in part a). Set a seed number of 8199 before each implementation of the bootstrap. Use R = 4999.
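A minimal R sketch for parts a) and b). It assumes only the 12 rows printed by head() and tail() above are available, so the numbers will not match answers based on the full 43-row mosquitos.csv file. Object names such as beer, water, and df.w are illustrative choices. t.test() is base R's stats function; the by-hand lines reproduce its unequal-variances (Welch) computation to show where the statistic, the approximate degrees of freedom, and the p-value come from.

```r
# Toy data: ONLY the 12 rows printed by head() and tail() above, not the
# full mosquitos.csv data, so results will differ from the assignment's.
beer  <- c(27, 20, 21, 26, 27, 31)
water <- c(13, 22, 20, 24, 18, 20)

# Part a): two-sided tests of H0: mu1 - mu2 = 0
pooled <- t.test(x = beer, y = water, var.equal = TRUE)   # equal variances
welch  <- t.test(x = beer, y = water, var.equal = FALSE)  # unequal variances

# Part b): the unequal-variances test by hand
n1 <- length(beer); n2 <- length(water)
v1 <- var(beer) / n1; v2 <- var(water) / n2
# Difference of sample means divided by its estimated standard error
t.stat <- (mean(beer) - mean(water)) / sqrt(v1 + v2)
# Welch-Satterthwaite approximate degrees of freedom
df.w <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
# If H0 is true (and the data are roughly normal), t.stat approximately
# follows a t distribution with df.w degrees of freedom; the p-value is the
# probability of a value at least this far from 0 in either direction.
p.w <- 2 * pt(q = -abs(t.stat), df = df.w)
c(t = t.stat, df = df.w, p.value = p.w)
```

Comparing pooled$p.value with welch$p.value shows how much the equal-variances assumption matters for a given data set.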
i) (6 points) Implement approach #1 on page 72 of the notes with $t = \bar{y}_{\text{Beer}} - \bar{y}_{\text{Water}}$. In your work, show that the null hypothesis is satisfied with your adjusted data. State the final p-value and include a histogram of the resampled t values with the observed value of t plotted upon it.

ii) (3 points) Implement approach #1 on page 72 of the notes with $z = \dfrac{\bar{y}_{\text{Beer}} - \bar{y}_{\text{Water}}}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$. State the final p-value.

iii) (3 points) Using the resampled values of z from the previous part, determine whether there is enough justification to use the t-distribution in part a). Explain your answer.

iv) (5 points) Implement approach #2 on page 72 of the notes. Use $t = \bar{y}_{\text{Beer}} - \bar{y}_{\text{Water}}$. State the p-value and include a histogram of the resampled t values with the observed value of t plotted upon it.

d) The purpose of this part is to use a permutation test to perform the hypothesis test in part a). Set a seed number of 8199 before taking the resamples. Use R = 4999.

i) (3 points) Perform a permutation test using $t = \bar{y}_{\text{Beer}} - \bar{y}_{\text{Water}}$ for the hypothesis test in part a).

ii) (3 points) Verify, for the first resample, that the order statistics remain the same as what was observed, as they should under the null hypothesis.

iii) (2 points) Construct a histogram of the resampled t values with the observed value of t plotted upon it.

e) While not mentioned in the video, the paper suggests that the counts here are actually binomial with 100 trials per individual. Complete the following.

i) (4 points) Estimate and state a logistic regression model using the glm() function with Response as the response variable and Treatment as the explanatory variable. To determine whether the proportion of mosquitos attracted to a subject depends on whether beer or water is consumed, perform a likelihood ratio test using the estimated model, the standard $-2\log(\Lambda)$ statistic, and the standard $\chi^2$ distribution approximation. Include a statement of the hypotheses.
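For parts c)i) and d)i), a minimal R sketch of two ways to generate data consistent with $H_0$, again using only the 12 displayed rows as hypothetical stand-in data. The "approach #1" adjustment is defined in the lecture notes, which are not reproduced here; the sketch assumes the common adjustment of shifting both groups to the pooled mean, which may differ from the notes. The p-value convention (count of resampled values at least as extreme, plus 1, divided by R + 1) and names such as beer.0 and perm.idx are likewise assumptions for illustration.

```r
# Toy data: ONLY the 12 rows printed by head() and tail() above,
# not the full 43-row mosquitos.csv data.
beer  <- c(27, 20, 21, 26, 27, 31)
water <- c(13, 22, 20, 24, 18, 20)
t.obs <- mean(beer) - mean(water)   # observed t = ybar_Beer - ybar_Water
R <- 4999

## c)i)-style bootstrap (ASSUMED adjustment; may differ from the notes):
## shift both groups to the pooled mean so the adjusted data satisfy H0.
pooled.mean <- mean(c(beer, water))
beer.0  <- beer  - mean(beer)  + pooled.mean
water.0 <- water - mean(water) + pooled.mean
stopifnot(isTRUE(all.equal(mean(beer.0), mean(water.0))))  # H0 holds

set.seed(8199)
t.boot <- replicate(R, {
  # resample with replacement within each adjusted group separately
  mean(sample(beer.0, replace = TRUE)) - mean(sample(water.0, replace = TRUE))
})
# Two-sided p-value, counting the observed sample as one of R + 1 values
p.boot <- (sum(abs(t.boot) >= abs(t.obs)) + 1) / (R + 1)

## d)i)-style permutation test: under H0 the group labels are exchangeable,
## so randomly reassign 6 of the 12 responses to "Beer" each time.
y  <- c(beer, water)
n1 <- length(beer)
set.seed(8199)
perm.idx <- replicate(R, sample(length(y)))   # 12 x R matrix of permutations
t.perm <- apply(perm.idx, 2, function(idx) {
  mean(y[idx[1:n1]]) - mean(y[idx[-(1:n1)]])
})
p.perm <- (sum(abs(t.perm) >= abs(t.obs)) + 1) / (R + 1)

# d)ii)-style check: a permutation only reorders y, so the first
# resample has exactly the observed order statistics
stopifnot(identical(sort(y[perm.idx[, 1]]), sort(y)))

# d)iii)-style plot with the observed t marked in red
hist(t.perm, main = "Permutation distribution of t", xlab = "t")
abline(v = t.obs, col = "red", lwd = 2)
```

The shifted bootstrap resamples within each group, while the permutation test only reshuffles labels, which is why the permutation resamples always keep the observed order statistics.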
ii) (5 points) Perform a bootstrap test using case-based resampling, $-2\log(\Lambda)$ as the test statistic, R = 4999, and a seed of 8199. Make sure to take resamples under the null hypothesis and thoroughly describe in words how this was done!

iii) (3 points) Construct a histogram of the $-2\log(\Lambda)$ values from the previous part. Include the observed value of $-2\log(\Lambda)$ on the plot and a $\chi^2_1$ distribution. Is there enough justification to use the $\chi^2_1$ distribution in part 1)e)i)? Explain.
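For part e), a minimal R sketch on the same 12 displayed rows (a hypothetical stand-in for the full data). It assumes 100 trials per subject, as the paper suggests, and computes $-2\log(\Lambda)$ as the drop in residual deviance from the intercept-only model to the model with Treatment. The null resampling scheme shown (draw Response values, i.e. cases, with replacement from the pooled data and reattach the original Treatment labels) is one possible way to impose $H_0$, not necessarily the scheme intended in the notes; lrt() and n.trials are illustrative names.

```r
# Toy data: ONLY the 12 rows printed by head() and tail() above
set1 <- data.frame(
  Response  = c(27, 20, 21, 26, 27, 31, 13, 22, 20, 24, 18, 20),
  Treatment = factor(rep(c("Beer", "Water"), each = 6))
)
n.trials <- 100  # assumption from the paper: 100 trials per individual

# -2log(Lambda) for H0: beta1 = 0 is the drop in residual deviance
lrt <- function(data) {
  mod <- glm(cbind(Response, n.trials - Response) ~ Treatment,
             family = binomial(link = "logit"), data = data)
  mod$null.deviance - mod$deviance
}
lrt.obs <- lrt(set1)
p.chisq <- pchisq(q = lrt.obs, df = 1, lower.tail = FALSE)  # e)i)-style test

# e)ii)-style null bootstrap (ASSUMED scheme): under H0 the Treatment label
# carries no information, so draw Response values from all 12 cases with
# replacement and attach them to the original Treatment labels.
R <- 4999
set.seed(8199)
lrt.boot <- replicate(R, {
  boot.data <- set1
  boot.data$Response <- set1$Response[sample(nrow(set1), replace = TRUE)]
  lrt(boot.data)
})
p.boot <- (sum(lrt.boot >= lrt.obs) + 1) / (R + 1)

# e)iii)-style plot: bootstrap values, chi-square(1) density, observed value
hist(lrt.boot, freq = FALSE, breaks = 30,
     main = "Null bootstrap of -2log(Lambda)", xlab = "-2log(Lambda)")
curve(dchisq(x, df = 1), add = TRUE, col = "blue", lwd = 2)
abline(v = lrt.obs, col = "red", lwd = 2)
```

How closely the histogram follows the blue $\chi^2_1$ curve is the visual evidence part e)iii) asks you to assess with the full data.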