ANALYZING MORE GENERAL SITUATIONS UNIT 3 Unit Overview In the first unit we explored tests of significance, confidence intervals, generalization, and causation mostly in terms of a single proportion. In the second unit we compared two proportions, two averages (independent), and paired data. All of these had a binary categorical explanatory variable. In the third unit we will expand on this to compare multiple proportions, multiple averages, and two quantitative variables. The explanatory variable will be categorical (not necessarily binary) or quantitative. Unit Overview Throughout this unit we can still express the null hypothesis in terms of no association between the response and the explanatory variable and the alternative hypothesis in terms of an association between the response and the explanatory variable. We can also be more specific in our hypotheses to reflect the type of data (and hence parameters) we are working with. Comparing More Than Two Groups Using Proportions Chapter 8 Chapter Overview In chapter 5, our explanatory variable had two outcomes (like parents smoke or not) and the response variable also had two outcomes (like baby is boy or girl). In this chapter we will allow the explanatory variable to have more than two outcomes (like both parents smoke, only mother smokes, only father smokes, or neither parent smokes). We will still focus on the response variable having two outcomes, but it doesn’t have to. It can have many as well. Section 8.1: Simulation-Based Approach to Compare Multiple Proportions Example 8.1 Coming to a Stop Stopping Virginia Tech students investigated which vehicles came to a stop at a intersection where there was a four-way stop. While the students examined many factors for an association with coming to a complete stop, we will investigate whether intersection arrival patterns are associated with coming to a complete stop: Vehicle arrives alone Vehicle is the lead in a group of vehicles Vehicle is a follower in a group of vehicles Stopping Null hypothesis: There is no association between the arrival pattern of the vehicle and if it comes to a complete stop. Alternative hypothesis: There is an association between the arrival pattern of the vehicle and if it comes to a complete stop. Stopping Another way to write the null uses the probability that a single vehicle will stop is the same as the probability a lead vehicle will stop, which is the same as the longterm probability that a following vehicle will stop Or 𝜋Single = 𝜋Lead = 𝜋Follow where 𝜋 is the probability a vehicle will stop. The alternative hypothesis is that not all these probabilities are the same (at least one is different). Stopping Percentage of vehicles stopping: 85.8% of single vehicles 90.5% of lead vehicles 77.6% of vehicles following in a group Complete Stop Not Complete Stop Total Single Vehicle 151 (85.8%) 25 (14.2%) 176 Lead Vehicle 38 (90.5%) 4 (9.5%) 42 Following Vehicle 76 (77.6%) 22 (22.4%) 98 Total 265 51 316 Stopping Remember that no association implies the proportion of vehicles that stop in each category should be the same. Our question is, if the same proportion of vehicles come to a complete stop in all the three categories, how unlikely would we get proportions at least as far apart as we did? Stopping Applying the 3S Strategy We need to find a statistic that will describe how far apart our proportions are from each other. This is more complicated than when we just had two groups. We also need to decide what types of values for that statistic (e.g., large or small, positive or negative) we would consider evidence against the null hypothesis. Stopping To find a statistic we start by finding the three differences in proportions. Stopping How can we combine 3 differences (-0.047, -0.082 and 0.129) into a single statistic? Add them up? Average them? -0.047 + (-0.082) + 0.129 = 0. [-0.047 + (-0.082) + 0.129]/3 = 0. What could we do so we don’t always get a sum of zero? Stopping 1. Statistic We are going to use the mean of the absolute value of the differences (MAD) (0.047 + 0.082 + 0.129)/3 = 0.086. What would have to be true for the average of absolute differences to equal 0? What types of values of this statistic (e.g., large or small) would provide evidence in favor of the alternative hypothesis? Stopping 2. Simulate If there is no association between arrival pattern and whether or not a vehicle stops it basically means it doesn’t matter what the arrival pattern is. Some vehicles will stop no matter what the arrival pattern and some vehicles won’t. We can model this by shuffling either the explanatory or response variables. (The applet will shuffle the response.) Stopping Use the Multiple Proportions Applet Stopping The results of one shuffle of the response Stopping Simulated values of the statistic for 1000 shuffles Bell-shaped? Centered at 0? Stopping 3. Strength of evidence Do we use a 1-sided or 2-sided alternative hypothesis to compute a p-value? By finding the absolute values we have lost direction in terms of which proportion is smaller than another. When we are looking for more of a difference, the MAD statistic will be larger. Hence to calculate our p-value we will always count the simulations that are as large or larger than the MAD statistic. Stopping We had a p-value of 0.0854 so there is moderate evidence against the null hypothesis Do the results generalize to intersections beyond the one used? Can we draw any cause-and-effect conclusions? Probably not since intersections have different factors that influence stopping No, an observational study If we had stronger evidence of a difference in groups, we could follow with pairwise tests to see which proportions are significantly different from each other. (We will save this until the next section.) Recruiting Organ Donors Exploration 8.1 Theory-Based Approach to Compare Multi-Category Categorical Variables Section 8.2 Theory based vs. Simulation based Just as always: Simulation based methods Always work Require the ability to simulate (computer). Theory-based methods Avoid the need for simulation Additional ‘validity conditions’ must be met Other Distributions The null distributions in the past few chapters were bell-shaped and centered at 0. For these, we used normal and t-distributions to predict what the null distribution would look like. In this chapter our null distribution was neither bell-shaped nor centered at 0. We can, however, use a chi-squared distribution to predict the shape of the null distribution. When doing this, we will not use the MAD statistic, but a chi-square statistic. Sham Acupuncture A randomized experiment was conducted exploring the effectiveness of acupuncture in treating chronic lower back pain (Haake et al. 2007). Acupuncture inserts needles into the skin of the patient at acupuncture points to treat a variety of ailments. Sham Acupuncture 3 treatment groups 1. Verum acupuncture: traditional Chinese 2. Sham acupuncture: needles inserted into the skin, but not deeply and not at acupuncture points 3. Traditional, non-acupuncture, therapy of drugs, physical therapy and exercise. 1162 patients were randomly assigned to each treatment group 387 patients in groups 1 and 2, and 388 in group 3. Sham Acupuncture Null hypothesis - no association between type of treatment received and reduction in back pain. Alternative hypothesis - there is an association between type of treatment and reduction in back pain. Sham Acupuncture Here are the results. Pain reduced Pain not reduced Total Real Acup. Sham Acup. NonAcup. Total 184 (0.475) 203 (0.524) 387 171 (0.442) 216 (0.558) 387 106 (0.273) 282 (0.776) 388 461 701 1162 Sham Acupuncture Remember that the MAD statistic is the mean of the absolute value of differences in proportions for all 3 groups: Real vs. Sham: 0.476-0.442=0.034 Real vs. None:0.476-0.274= 0.202 Sham vs. None: 0.442-0.274= 0.168 The statistic (MAD) is (0.034+0.202+0.168)/3=0.135 Larger MAD statistics give more evidence against the null Sham Acupuncture The other statistic we saw, chi-square is calculated using the formula. 1 2 n i ( pˆ i pˆ ) pˆ (1 pˆ ) 𝑝𝑖 is the proportion of successes in category i 𝑝 is the overall proportion of successes in the dataset. 𝑛𝑖 is the sample size of category i Sham Acupuncture Don’t worry about the formula. Just realize it is another way to measure how far apart the proportions are from each other (or the overall proportion of successes). Just like the MAD statistic, the larger the chisquare statistic the more evidence there is against the null. Let’s test this in the Multiple Proportions Applet using both MAD and chi-square simulation. Sham Acupuncture Strength of evidence: In both cases, we got p-values of 0. Nothing as large as a MAD statistic of 0.135 or larger ever occurred in this simulation. Likewise nothing as large as a chi-square statistic of 38.05 or larger ever occurred in this simulation. Hence we have very strong evidence against the null and in support of the type of acupuncture used is associated with pain reduction. Sham Acupuncture The theory-based method will predict the chi-square null distribution. We can use the applet to overlay a theoretical chi-square distribution and find the theory-based p-value. The theory based method only works well when each cell in the 2-way table has at least 10 observations. This is easily met here. The smallest count was 106. Results Sham Acupuncture What if you find evidence of an association? What happens after a chi-squared test? No standard approach to follow Could describe the association with confidence intervals on each of the differences in proportions. Sham Acupuncture 95% confidence intervals (found in applet) on the difference improvement percentages comparing: Real to sham (-0.0366, 0.1038) Real to none (0.1356,0.2689)* Sham to none (0.1022, 0.2351)* There is evidence that the probability of pain reduction is different (lower) for no acupuncture treatment than the other two. There is no significant difference between real and sham acupuncture however. Let’s see all this in the applet. Exploration 8.2: Conserving Hotel Towels Hotels are encouraging guests to practice conservation by not having their towels washed. (Goldstein et al., 2008) conducted a randomized experiment to investigate how different phrasings on signs placed on bathroom towel racks impacted towel reuse behavior. Researchers were interested how messages which communicated different types of “social norms” impacted towel reuse. Exploration 8.2: Conserving Hotel Towels Rooms at a single hotel were randomly assigned to receive 1 of 5 messages on the sign on the towel bar Is there an association between the message left and whether or not towels get reused?