Lab Assignment 10

Math 58B - Introduction to Biostatistics
Spring 2015
Jo Hardin
Lab Goals:
1. To understand why the technical conditions are important for a chi-square test.
2. To understand the pieces of the chi-square test.
Note 1: if you can't remember what an ISCAM function does, pass it the argument "?".
load(url("http://www.rossmanchance.com/iscam2/ISCAM.RData"))
# iscamsummary("?") # page 187
Note 2: For this lab, you'll only use the applet (no R). To run chi-square tests in
R, use the function chisq.test.
help(chisq.test)
# page 287
## starting httpd help server ... done
In class
In this lab, we will investigate a new setting: the chi-square test for 𝑟 × 𝑐 tables. Recall that a
chi-square test can test equality of proportions across two or more groups. Note that
this test is a generalization of the two-sample comparison from Chapter 2. As an initial
step, complete Investigation 4.2 using the data at the beginning of the investigation.
Note that Investigation 4.2 runs until part (l), but it is interrupted in the middle by Practice
Problem 4.2. When using the data in the applet, type it into the sample data box exactly as
shown below:
dark night room
far 40 39 12
normal 114 115 22
near 18 78 41
Then click on "use table."
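Although this lab uses only the applet, the same in-class table can be passed to chisq.test for comparison. The following is a minimal sketch: the object name `lighting` and the dimension labels `eyesight` and `light` are assumptions made for illustration and do not come from the applet.

```r
# In-class table from the handout; object and dimension names are assumptions
lighting <- matrix(c( 40,  39, 12,
                     114, 115, 22,
                      18,  78, 41),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(eyesight = c("far", "normal", "near"),
                                   light = c("dark", "night", "room")))

test <- chisq.test(lighting)
test$statistic   # the chi-square statistic
test$parameter   # degrees of freedom, (r - 1) * (c - 1)
test$p.value     # theoretical (chi-square) p-value
test$expected    # expected counts under the null hypothesis
```

The pieces stored in `test` correspond to the quantities the applet reports, so they can be used to check applet results.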
To turn in
First consider the test applied by the randomization applet (questions 1-4). Use the
following data:
dark night room
far 38 29 12
normal 94 115 22
near 58 78 41
Then click on "use table."
1. Consider the row and column totals of the shuffled tables (you should put a check next
to "show table" so that you can see the table). Explain what the applet is doing. How does it
impose the null hypothesis (that is, describe how the applet simulation creates a scenario
where the null hypothesis is true)?
2. Explain how the expected values are calculated under the null hypothesis (include a
statement of the null hypothesis in this setting). Give one example calculation of an
expected count under the null hypothesis (remember, expected counts are almost never
integers).
3. Which of the 9 cells in the table contributes most to the calculation of the 𝑋² statistic?
What does that say about how the null hypothesis is being violated (or not being violated)?
Explain.
4. Given the data of interest, provide a complete hypothesis test (hypotheses, p-value,
conclusion). Are your empirical (randomization) and theoretical (chi-sq) p-values
approximately equal? Report both p-values.
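The pieces asked about in questions 1-4 (expected counts, per-cell contributions, and the shuffling simulation) can be sketched in R. This is only an illustration under stated assumptions: it uses the in-class table rather than the data to turn in, the helper `x2_stat` and all object names are hypothetical, and the shuffle is one plausible reading of what the applet does, not its actual code.

```r
# In-class table (not the "to turn in" data); names are assumptions
tab <- matrix(c(40, 39, 12, 114, 115, 22, 18, 78, 41), nrow = 3, byrow = TRUE,
              dimnames = list(c("far", "normal", "near"),
                              c("dark", "night", "room")))

# Expected count = (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Each cell's contribution to the statistic: (observed - expected)^2 / expected
contrib <- (tab - expected)^2 / expected
x2_obs <- sum(contrib)

# Expand the table to one pair of labels per person
rows <- rep(rownames(tab)[as.vector(row(tab))], times = as.vector(tab))
cols <- rep(colnames(tab)[as.vector(col(tab))], times = as.vector(tab))

# Hypothetical helper: chi-square statistic from two label vectors
x2_stat <- function(row_lab, col_lab) {
  obs <- table(row_lab, col_lab)
  e <- outer(rowSums(obs), colSums(obs)) / sum(obs)
  sum((obs - e)^2 / e)
}

# Shuffle the column labels: row and column totals stay fixed,
# and any association between the two variables is broken
set.seed(58)
x2_sim <- replicate(1000, x2_stat(rows, sample(cols)))

p_empirical <- mean(x2_sim >= x2_obs)
p_theory <- pchisq(x2_obs, df = (nrow(tab) - 1) * (ncol(tab) - 1),
                   lower.tail = FALSE)
```

Inspecting `contrib` shows which cells drive the statistic, and comparing `p_empirical` with `p_theory` mirrors the comparison requested in question 4.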
Now consider the technical conditions needed for a chi-square test. At the end of
Investigation 4.2 (questions (h) - (l)), we learned that the technical conditions for applying
the chi-square test of independence are that all expected counts are at least 1 and that the
average expected cell count is at least 5.
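A quick way to check both conditions for a given table is sketched below. The function `check_conditions` is a hypothetical helper written for this illustration (it is not an ISCAM or base R function), and it is applied here to the in-class table.

```r
# Hypothetical helper: checks the two technical conditions stated above
check_conditions <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  c(all_at_least_1  = all(expected >= 1),     # every expected count >= 1
    mean_at_least_5 = mean(expected) >= 5)    # average expected count >= 5
}

# In-class table from the handout
tab <- matrix(c(40, 39, 12, 114, 115, 22, 18, 78, 41), nrow = 3, byrow = TRUE)
check_conditions(tab)
```

Because the expected counts always sum to the sample size, the average expected count is simply the sample size divided by the number of cells.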
5. Re-do the randomization test using data that is similar to #1-4 in terms of proportions in
each group, but is much different in sample size:
dark night room
far 2 2 0
normal 3 3 0
near 1 2 1
Repeat the complete hypothesis test (hypotheses, p-value, conclusion) for the new data
(don't forget to change the count samples box to a new number!). Are your empirical
(randomization) and theoretical (chi-sq) p-values approximately equal? Report both p-values.
6. Why are the p-values (empirical vs. theoretical) closer for the larger sample sizes? [Your
answer needs to be more than "because the technical conditions aren't met for the smaller
sample sizes." Why do you think the technical conditions being met leads to the p-values
being closer?]
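For reference, chisq.test can also produce a simulation-based p-value itself through its `simulate.p.value` and `B` arguments; it draws random tables with the same row and column totals as the observed table. The sketch below uses the in-class table, and the seed and the choice of `B = 2000` are arbitrary assumptions.

```r
# In-class table from the handout
tab <- matrix(c(40, 39, 12, 114, 115, 22, 18, 78, 41), nrow = 3, byrow = TRUE)

set.seed(58)  # arbitrary seed so the simulation is reproducible
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
theory <- chisq.test(tab)

sim$p.value      # Monte Carlo p-value from B simulated tables
theory$p.value   # p-value from the chi-square distribution
```

When expected counts are small, chisq.test without simulation also issues a warning that the chi-squared approximation may be incorrect.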