Unit 2: Comparing Two Groups In Unit 1, we learned the basic process of statistical inference using tests and confidence intervals. We did all this by focusing on a single proportion. In Unit 2, we will take these ideas and extend them to comparing two groups. We will compare two proportions, two independent means, and paired data. Chapter 5: Comparing Two Proportions 5.1: Descriptive (Two-Way Tables) 5.2: Inference with Simulation-Based Methods 5.3: Inference with Theory-Based Methods Positive and Negative Perceptions Consider these two questions: ◦ Are you having a good year? ◦ Are you having a bad year? Do people answer each question in such a way that would indicated the same answer? (e.g. Yes for the first one and No for the second.) Positive and Negative Perceptions Researchers questioned 30 students (randomly giving them one of the two questions). They then recorded if a positive or negative response was given. Is this an observational study or randomized experiment? Positive and Negative Perceptions Observational units ◦ The 30 students Variables ◦ Question wording (good year or bad year) ◦ Perception of their year (positive or negative) Which is the explanatory and which is the response? Raw Data in a Spreadsheet Individual 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Type of Question Good Year Good Year Bad Year Good Year Good Year Bad Year Good Year Good Year Good Year Bad Year Good Year Bad Year Good Year Bad Year Good Year Response Individual Positive Negative Positive Positive Negative Positive Positive Positive Positive Negative Negative Negative Positive Negative Positive 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Type of Question Good Year Bad Year Good Year Good Year Good Year Bad Year Good Year Bad Year Good Year Bad Year Good Year Bad Year Good Year Bad Year Bad Year Response Positive Positive Positive Positive Positive Negative Positive Negative Positive Negative Positive Negative Positive Positive Negative Two-Way Tables A two-way table organizes data ◦ Summarizes two categorical variables ◦ Also called contingency table Are students more likely to give a positive response if they were given the good year question? Positive response Negative response Total Good Year Bad Year Total 15 3 18 4 8 12 19 11 30 Two-Way Tables Conditional proportions will help us better determine if there is an association between the question asked and the type of response. We can see that those given the positive question were more likely to respond positively. Good Year Positive response Negative response Total Bad Year 15/18 ≈ 0.83 4/12 ≈ 0.33 3 8 18 12 Total 19 11 30 Segmented Bar Graphs We can use segmented bar graphs to see this association. Remember that variables are associated if the conditional proportion of the outcomes for one group differ from the conditional proportion of outcomes in other groups. Perceptions Those responding to the good year question were more likely to answer positively (83% to 33%) than those responding positively to the bad year question. The statistic we will be using to measure this is the difference in proportions. ◦ 0.83 - 0.33 = 0.50 higher for the good year question than the bad year question. A Sneak Peak at Comparing Two Proportions: Simulation-Based Approach In the next section we will conduct tests of significance to compare two proportions and I want to give you a preview of that here. We will assume there is no association between the variables (i.e. the two population proportions are the same) and decide if two sample proportions differ enough to conclude this would be very unlikely just by random chance. Hypotheses Null Hypothesis: There is no association between which question is asked and the type of response. (The proportion of positive responses will be the same in each group. ) Alternative Hypothesis: There is an association between which question is asked and the type of response. (The proportion of positive responses will be different in each group. ) Results The difference in proportions of positive responses is 0.83 − 0.33 = 0.50. How likely is a difference this great or greater if the type of question asked made no difference in how the student would respond? Positive response Negative response Total Good Year Bad Year Total 15 (83%) 3 18 4 (33%) 8 12 19 11 30 Random Reassignment Notice that 19 students gave a positive response. If the null hypothesis is true, these 19 would have given a positive response no matter which question was asked. Therefore, under a true null hypothesis, we can randomly place these 19 people into either group and they will still give a positive response. This replicates the random assignment that was done in the experiment. We will also keep constant the 18 that receive the positive question and 12 that receive the negative question. Positive response Negative response Total Good Year Bad Year Total random random 18 random random 12 19 11 30 You can think about this random reassignment with the raw data as well. It doesn’t matter which question was asked, the responses will be the same. Therefore, we can shuffle the type of question and leave the responses fixed. This is equivalent to keeping the same column and row totals and just shuffling the inside of the two-way table as described earlier. Individual 1 2 3 4 5 6 7 Type of Question Good Year Good Year Bad Year Good Year Good Year Bad Year Good Year Response Positive Negative Positive Positive Negative Positive Positive Individual 16 17 18 19 20 21 22 Type of Question Good Year Bad Year Good Year Good Year Good Year Bad Year Good Year Response Positive Positive Positive Positive Positive Negative Positive 8 9 10 11 12 13 14 15 Good Year Good Year Bad Year Good Year Bad Year Good Year Bad Year Good Year Positive Positive Negative Negative Negative Positive Negative Positive 23 24 25 26 27 28 29 30 Bad Year Good Year Bad Year Good Year Bad Year Good Year Bad Year Bad Year Negative Positive Negative Positive Negative Positive Positive Negative Random Reassignment I did this once and found a difference in the proportions of positive responses for the two questions of 0.50 − 0.83 = −0.33 Positive response Negative response Total Good Year Bad Year Total 9 (50%) 9 18 10 (83%) 2 12 19 11 30 Random Reassignment I did this again and found a difference in the proportions of positive responses for the two questions of 0.61 − 0.67 = −0.06 Positive response Negative response Total GoodYear BadYear Total 11 (61%) 7 18 8 (67%) 4 12 19 11 30 Random Reassignment I did this again and found a difference in the proportions of positive responses for the two questions of 0.67 − 0.58 = 0.09 Positive response Negative response Total GoodYear Bad Year Total 12 (67%) 6 18 7 (58%) 5 12 19 11 30 Random Reassignment In my three randomizations, I have yet to see a difference in proportions that is as far away from zero as the observed difference of 0.5. Let’s do some more randomizations to develop a null distribution. Collection 1 -0.6 Dot Plot -0.4 -0.2 0.0 0.2 0.4 difference_in_proportions 0.6 Random Reassignment After 1000 randomizations, only 7 were as far away from zero as our observed proportion. Conclusion Since we have a p-value of 7/1000 or 0.007, we can conclude the alternative hypothesis and say we have strong evidence that how the question is phrased affects the response. Applets Let’s look at how this is done in two applets ◦ Simulation for Two Proportions ◦ Simulation for Multiple Proportions Exploration 5.1: Murderous Nurse? Example 5.2: Swimming With Dolphins Swimming with Dolphins Is swimming with dolphins therapeutic for patients suffering from clinical depression? Researchers recruited 30 subjects aged 1865 with a clinical diagnosis of mild to moderate depression. Discontinued antidepressants and psychotherapy 4 weeks prior to and throughout the experiment 30 subjects went to an island near Honduras Randomly assigned to two treatment groups Swimming with Dolphins Both groups engaged in one hour of swimming and snorkeling each day. One group swam in the presence of dolphins and the other group did not. Participants in both groups had identical conditions except for the dolphins After 2 weeks, each subjects’ level of depression was evaluated, as it had been at the beginning of the study The response variable is if the subject achieved substantial reduction in depression. Swimming with Dolphins Observational units ◦ The 30 subjects with mild to moderate depression. Explanatory variable ◦ Swimming with dolphins or not Response variable ◦ Reduction in depression or not Are the variables quantitative or categorical? Swimming with Dolphins Is this study an observational study or an experiment? Are the subjects in this study a random sample from a larger population? Swimming with Dolphins Results Dolphin group Control group Total Improved 10 (67%) 3 (20%) 13 Did Not Improve 5 12 17 Total 15 15 30 Swimming with Dolphins The difference in proportions of improvers is 0.67 – 0.20 = 0.47. There are two possible explanations for an observed difference of 0.47. ◦ A tendency to be more likely to improve with dolphins ◦ The 13 subjects were going to show improvement with or without dolphins and random chance assigned more improvers to the dolphins Swimming with Dolphins Null hypothesis: Dolphins don’t help ◦ Swimming with dolphins is not associated with substantial improvement in depression Alternative hypothesis: Dolphins help ◦ Swimming with dolphins increases the probability of substantial improvement in depression symptoms Swimming with Dolphins The parameter is the (long-run) difference in the probability of improving between receiving dolphin therapy and the control. Our statistic is the observed difference in sample proportions (𝑝𝑑𝑜𝑙𝑝ℎ𝑖𝑛𝑠 − 𝑝𝑐𝑜𝑛𝑡𝑟𝑜𝑙 ) or 0.67 – 0.20 = 0.47. Swimming with Dolphins Null Hypothesis: The probability someone exhibits substantial improvement after swimming with dolphins is the same as the probability someone exhibits substantial improvement after swimming without dolphins. Alternative Hypothesis: The probability someone exhibits substantial improvement after swimming with dolphins is higher than the probability someone exhibits substantial improvement after swimming without dolphins. Swimming with Dolphins Since we defined our parameter as the difference in probability of improving between the 2 groups, we can write our hypotheses this way as well: H0: 𝜋dolphins = 𝜋control 𝜋dolphins 𝜋control = 0 H a: 𝜋dolphins > 𝜋control 𝜋dolphins 𝜋control > 0 Swimming with Dolphins If the null hypothesis is true (dolphin therapy is not better) we would have 13 improvers and 17 non-improvers regardless of the group they were in. Any differences we see between groups arise solely from the randomness in the assignment to the groups. Swimming with Dolphins We can perform this simulation with cards. ◦ 13 black cards represent the improvers ◦ 17 red cards represent the non-improvers We assume these outcomes would happen no matter which treatment group subjects were in. Shuffle the cards and put 15 in one pile (dolphin therapy) and 15 in another (control group) An improver is equally likely to be assigned to each group Swimming with Dolphins In the actual study, there were 10 improvers (diff of 0.47) in the dolphin group. We conducted 3 simulations and got 8, 5, and 6 improvers in the dolphin therapy group. (notice the diff in proportions) Swimming with Dolphins We did 1000 repetitions to develop a null distribution. ◦ Why is it centered at about 0? ◦ What does each dot represent? Swimming with Dolphins 13 out of 1000 results had a difference of 0.47 or higher (p-value = 0.013). 0.47−0 0.178 ≈ 2.65 SD above zero. 0.47 is Using either the p-value or standardized statistic, we have strong evidence against the null and can conclude that swimming with dolphins increases the probability of substantial improvement in depression symptoms. Swimming with Dolphins A 95% confidence interval for the difference in the probability using the standard deviation from the null distribution is 0.467 + 2(0.178) = 0.467 + 0.356 or (0.111to 0.823) We are 95% confident that when allowed to swim with dolphins, the probability of improving is between 0.111 and 0.823 higher than when no dolphins are present. How does this interval back up our conclusion from the test of significance? Swimming with Dolphins Can we say that the presence of dolphins caused this improvement? ◦ Since this was a randomized experiment, and assuming everything was identical between the groups, we have strong evidence that dolphins were the cause Can we generalize to a larger population? ◦ Maybe mild to moderately depressed 18-65 year old patients willing to volunteer for this study ◦ We have no evidence that random selection was used to find the 30 subjects. Exploration 5.2: Contagious Yawns? MythBusters investigated this. 50 subjects were ushered into a small room by co-host Kari. She yawned as she ushered 34 in the room and for 16 she didn’t yawn. We will assume she randomly decided who would received the yawns. Comparing Two Proportions: Theory-Based Approach Section 5.3 Introduction Just as with a single proportion, we can often predict results of a simulation using a theory-based approach. The theory-based approach also gives a simpler way to generate a confidence intervals. Smoking and Birth Gender Smoking and Gender How does parents’ behavior affect the sex of their children? Fukuda et al., 2002 (Japan) found the following: ◦ 255 of 565 births (45.1%) where both parents smoked more than a pack a day were boys. ◦ 1975 of 3602 births (54.8%) where both parents did not smoke were boys. Other studies have shown a reduced male to female birth ratio where high concentrations of other environmental chemicals are present (e.g. industrial pollution, pesticides) Smoking and Gender A segmented bar graph and 2-way table Let’s compare the proportions to see if the difference is statistically significantly. Smoking and Gender Null Hypothesis: There is no association between smoking status of parents and sex of child. The probability of having a boy is the same for parents who smoke and don’t smoke. 𝜋smoking - 𝜋nonsmoking = 0 Alternative Hypothesis: There is an association between smoking status of parents and sex of child. The probability of having a boy is not the same for parents who smoke and don’t smoke 𝜋smoking - 𝜋nonsmoking ≠ 0 Smoking and Gender What are the observational units in the study? What are the variables in this study? Which variable should be considered the explanatory variable and which the response variable? Can you draw cause-and-effect conclusions for this study? Smoking and Gender OK to shuffle? In the last section we “re-randomized” subjects to treatment groups to simulate the null distribution. In this study the parents weren’t randomized to the treatment, since it’s observational, but we can still represent the null hypothesis of no association through randomization. Smoking and Gender Use the 3S Strategy to asses the strength 1. Statistic: The proportion of boys born to nonsmokers minus boys born to smokers is 0.548 – 0.451 = 0.097. Smoking and Gender 2. Simulate: Use the Multiple Proportions applet to simulate Many repetitions of shuffling the 2230 boys and 1937 girls to the 565 smoking and 3602 nonsmoking parents Calculate the difference in proportions of boys between the groups for each repetition. Shuffling simulates the null hypothesis of no association Smoking and Gender 3. Strength of evidence: Nothing as extreme as our observed statistic (≥ 0.097 or ≤ −0.097) occurred in 5000 repetitions, How many SDs is 0.097 above the mean? Smoking and Gender Notice the null distribution is centered at zero and is bell-shaped. This, along with its standard deviation can be predicted using normal distributions. Smoking and Gender We can use either the Multiple Proportion applet or the Theory-Based Inference applet to find the p-value Smoking and Gender Estimation (Confidence Intervals) How different are the population proportions of having a boy between non-smoking and smoking parents? How different are the probabilities of having a boy between non-smoking and smoking parents? The parameter we are estimating is the difference in population proportions (or probabilities) (𝜋ns − 𝜋s). It should be centered at 0.097, our sample difference in proportions Smoking and Gender From our test of significance, do we expect 0 to be in the interval of plausible values for the difference in the population proportions? Smoking and Gender Again, either applet can be used to determine a confidence interval. We are 95% confident that the probability of a boy baby is 0.053 to 0.141 higher for families where neither parent smokes compared to families with two smoking parents Smoking and Gender We can also write the confidence interval in the form: ◦ statistic ± margin of error. Our statistic is the observed sample difference in proportions, 0.097. We can find the margin of error by subtracting the statistic (center) from the upper endpoint or 0.141 – 0.097 = 0.044. 0.097 ± 0.044 ◦ Is the margin of error about the standard deviation? Smoking and Gender How would the interval change if the confidence level was 99%? Smoking and Gender Written as the statistic ± margin of error 0.097 ± 0.058. Margin of error ◦ 0.058 for the 99% confidence interval ◦ 0.044 for the 95% confidence interval Smoking and Gender How would the 95% confidence interval change if we were estimating 𝜋smoker – 𝜋nonsmoker instead of 𝜋nonsmoker – 𝜋smoker ? Smoking and Gender (−0.141, −0.053) or −0.097 ± 0.044 instead of (0.053, 0.141) or 0.097 ± 0.044 The negative signs indicate the probability of a boy born to smoking parents is lower than that for nonsmoking parents. Smoking and Gender Validity Conditions of Theory-Based Same as with a single proportion. Should have at least 10 observations in each of the cells of the 2 x 2 table. Male Female Total Smoking Parents Nonsmoking Parents Total 255 310 565 1975 1627 3602 2230 1937 4167 Smoking and Gender The strong significant result in this study yielded quite a bit of press when it came out Soon other studies came out which found no relationship between smoking and gender One article argued that confounding variables like social factors, diet, environmental exposure or stress were the reason for different study’s results. (These are all possible since it was an observational study.) Formulas How was z = 4.30 found? 𝑧= 𝑝1 − 𝑝2 1 1 𝑝(1 − 𝑝) + 𝑛1 𝑛2 Formulas How was the margin of error for the difference in proportions 0.044 found? 𝑀𝑢𝑙𝑡𝑖𝑝𝑙𝑖𝑒𝑟 ⨯ 𝑝1 (1 − 𝑝1 ) 𝑝2 (1 − 𝑝2 ) + 𝑛1 𝑛2 The multiplier is dependent upon the confidence level. ◦ 1.645 for 90% confidence ◦ 1.96 for 95% confidence ◦ 2.576 for 99% confidence Strength of Evidence As the proportions move farther away from each other, the strength of evidence increases. As sample size increases, the strength of evidence increases. • Let’s run this previous test using both the Simulation-Based and the TheoryBased Applets. • Donating Blood Exploration 5.3 • Questions 1-14 (skip 2 and 5)