Math 58B - Introduction to Biostatistics
Spring 2015
Jo Hardin

Lab Assignment 10

Lab Goals:
1. To understand why the technical conditions are important for a chi-square test.
2. To understand the pieces of the chisq test.

Note 1: if you can't remember what an ISCAM function does, pass it the argument "?".

    load(url("http://www.rossmanchance.com/iscam2/ISCAM.RData"))
    # iscamsummary("?")
    # page 187

Note 2: Though, for this lab, you'll only use the applet (no R). For doing Chi-square tests in R, use the function chisq.test.

    help(chisq.test)
    # page 287
    ## starting httpd help server ... done

In class

In this lab, we will investigate a new setting: chi-square test of 𝑟 × 𝑐 tables. Recall that a chi-square test is able to test equality of proportions over two or more groups. Note that this test is a generalization of the two sample comparison from chapter 2.

As an initial step, complete Investigation 4.2 using the data at the beginning of the investigation. Note that Investigation 4.2 runs until part (l), but it is interrupted in the middle by Practice Problem 4.2. When using the data in the applet, type it into the sample data box exactly like below:

    dark night room
    far 40 39 12
    normal 114 115 22
    near 18 78 41

Then click on "use table."

To turn in

First consider the test applied by the randomization applet (questions 1-4). Use the following data:

    dark night room
    far 38 29 12
    normal 94 115 22
    near 58 78 41

Then click on "use table."

1. Consider the row and column totals of the shuffled tables (you should put a check next to "show table" so that you can see the table). Explain what the applet is doing. How does it impose the null hypothesis (that is, describe how the applet simulation creates a scenario where the null hypothesis is true)?

2. Explain how the expected values are calculated under the null hypothesis (include a statement of the null hypothesis in this setting).
Give one example calculation of an expected count under the null hypothesis (remember, expected counts are almost never integers).

3. Which of the 9 cells in the table contributes most to the calculation of the 𝑋² statistic? What does that say about how the null hypothesis is being violated (or not being violated)? Explain.

4. Given the data of interest, provide a complete hypothesis test (hypotheses, p-value, conclusion). Are your empirical (randomization) and theoretical (chi-sq) p-values approximately equal? Report both p-values.

Now consider the technical conditions needed to do a Chi-square test. At the end of Investigation 4.2, we learned that the technical conditions for applying the Chi-square test of independence are that all expected counts must be at least 1 and the average expected cell count must be at least 5 (questions (h) - (l)).

5. Re-do the randomization test using data that is similar to #1-4 in terms of proportions in each group, but is much different in sample size:

    dark night room
    far 2 2 0
    normal 3 3 0
    near 1 2 1

Repeat the complete hypothesis test (hypotheses, p-value, conclusion) for the new data (don't forget to change the count samples box to a new number!). Are your empirical (randomization) and theoretical (chi-sq) p-values approximately equal? Report both p-values.

6. Why are the p-values (empirical vs. theoretical) closer for the larger sample sizes? [Your answer needs to be more than "because the technical conditions aren't met for the smaller sample sizes." Why do you think the technical conditions being met leads to the p-values being closer?]
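
Optional R sketch (not required for this lab, which uses only the applet): Note 2 points to chisq.test, and the code below applies it to the in-class table from Investigation 4.2 as a way to check applet output afterwards. The object names (counts, theory, sim), the dimension labels eyesight and lighting, the seed, and the choice B = 10000 are assumptions made for illustration and are not part of the lab. With simulate.p.value = TRUE, chisq.test draws random tables that keep the observed row and column totals fixed, which plays a role similar to the applet's shuffling; it is not claimed to be the applet's exact algorithm.

```r
# In-class table from Investigation 4.2 (rows and columns as typed into the applet).
# The dimension labels "eyesight" and "lighting" are assumed names for illustration.
counts <- matrix(c( 40,  39, 12,
                   114, 115, 22,
                    18,  78, 41),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(eyesight = c("far", "normal", "near"),
                                 lighting = c("dark", "night", "room")))

# Theoretical (chi-square distribution) test.
theory <- chisq.test(counts)
theory$expected      # expected counts: row total * column total / grand total
theory$residuals^2   # each cell's contribution to the chi-square statistic
theory$statistic     # the chi-square statistic (sum of the contributions)
theory$p.value       # theoretical p-value

# Technical conditions from the end of Investigation 4.2.
min(theory$expected) >= 1    # every expected count at least 1
mean(theory$expected) >= 5   # average expected count at least 5

# Empirical p-value from simulated tables with the same row and column totals.
set.seed(58)                 # arbitrary seed so the run is repeatable
sim <- chisq.test(counts, simulate.p.value = TRUE, B = 10000)
sim$p.value
```

Replacing the nine counts with a different table and rerunning gives both p-values side by side, which makes it easier to see how the two agree or disagree as the sample size changes.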