SOCY2200 Statistics
Instructor: Natasha Sarkisian

Worksheet 6: Hypothesis Testing for Comparing Two Means

There are two basic types of problems that involve comparing two means: comparing means of independent samples and comparing means of non-independent (paired) samples. The key question is whether the two groups are related in such a way that we can form pairs of observations. (Before and after measures are paired; measures from spouses or siblings, and measures of two things from the same person, are all paired.)

Independent Samples

1. State your null and research hypotheses:
   H0: μ1 = μ2
   The research hypothesis can be directional (use a one-tailed test):
   H1: μ1 < μ2 or H1: μ1 > μ2
   Or it can be non-directional (use a two-tailed test):
   H1: μ1 ≠ μ2
2. Select the alpha level (the level of risk of Type I error that you are willing to take): the most common choice is 0.05; the choices for more stringent tests are 0.01 and 0.001; for smaller samples, we sometimes use 0.10.
3. Identify the test statistic that you need to use: here, it is once again Student's t.
4. Compute it using the formula:
   $$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\dfrac{n_1 + n_2}{n_1 n_2}\right]}}$$
   where $\bar{X}_1$ and $\bar{X}_2$ are the two group means, $s_1$ and $s_2$ are the group standard deviations, and $n_1$ and $n_2$ are the group sizes.
5. Use the table to find the critical value: we need to use Table B2 and find the value of t based on three pieces of information: df = n1 + n2 - 2, our selected alpha level, and whether our test is one-tailed or two-tailed.
6. Compare the computed value with the critical value. We compare absolute values, so if the computed value is negative, change it to a positive one, then compare.
7. State your decision about H0: if your computed value is larger than the critical value, reject H0 in favor of H1. If your computed value is smaller than the critical value, fail to reject H0.

Example of Independent Samples Group Comparison

We want to study the effectiveness of a children's TV program that is designed to teach reading skills.
50 children will participate in the study: half of them are randomly assigned to the experimental group and the other half to the control group. Both groups spend time in the lab over a few months; the experimental group watches the "reading" show, and the control group watches another show. After that, we find that the mean reading score is 12.3 (SD = 2.1) in the experimental group and 11.2 (SD = 1.8) in the control group. Can we conclude that our program actually helps children learn how to read? We want to have 95% certainty about our conclusion.

Null hypothesis: the experimental group is not different from the control group in reading scores.
Research hypothesis: the experimental group has higher reading scores than the control group.

1. H0: μ1 = μ2
   H1: μ1 > μ2
   Since we expect higher scores for the experimental group, we use a directional research hypothesis and a one-tailed test.
2. We want to be 95% confident if we conclude that this program works, so we use significance level alpha = 0.05 (1 - 0.95 = 0.05).
3. Test statistic: Student's t
4. Compute:
   Numerator: 12.3 - 11.2 = 1.1
   Denominator: sqrt[ ((25-1)*2.1*2.1 + (25-1)*1.8*1.8) / (25+25-2) * (25+25) / (25*25) ] = 0.55
   t = 1.1 / 0.55 = 2
5. Use Table B2 to find the critical value: df = n1 + n2 - 2 = 50 - 2 = 48, alpha = 0.05, one-tailed test, so tcrit = 1.676
6. Does the computed statistic exceed the critical value? 2 is larger than 1.676, so it does exceed the critical value.
7. Conclusion: We can reject the null hypothesis H0: μ1 = μ2. That means the experimental group is doing significantly better than the control group, so our program likely works.

Paired Samples

1. State your null and research hypotheses:
   H0: μ1 = μ2
   The research hypothesis can be directional (use a one-tailed test):
   H1: μ1 < μ2 or H1: μ1 > μ2
   Or it can be non-directional (use a two-tailed test):
   H1: μ1 ≠ μ2
2. Select the alpha level (the level of risk of Type I error that you are willing to take): the most common choice is 0.05; the choices for more stringent tests are 0.01 and 0.001; for smaller samples, we sometimes use 0.10.
3. Identify the test statistic that you need to use: here, it is once again Student's t.
4. If we have raw data (actual values of the two scores for each pair), we compute it using the formula:
   $$t = \frac{\sum D}{\sqrt{\dfrac{n \sum D^2 - \left(\sum D\right)^2}{n - 1}}}$$
   where D = second value in the pair minus the first value in the pair, and n is the number of pairs.
   If we have aggregated data (mean and standard deviation of the difference between the two scores in each pair), we use the formula:
   $$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$
   where d = variable containing the difference scores for each pair, $\bar{d}$ is its mean, and $s_d$ is its standard deviation.
   We cannot solve such a problem if we are given only the mean and SD for each of the two scores separately, because we need to work with pairs.
5. Use the table to find the critical value: we need to use Table B2 and find the value of t based on three pieces of information: df = n - 1, our selected alpha level, and whether our test is one-tailed or two-tailed.
6. Compare the computed value with the critical value. We compare absolute values, so if the computed value is negative, change it to a positive one, then compare.
7. State your decision about H0: if your computed value is larger than the critical value, reject H0 in favor of H1. If your computed value is smaller than the critical value, fail to reject H0.

Example of Paired Samples Comparison

We want to study the effectiveness of a financial literacy video that we developed. For a group of 15 volunteers, we measure their financial literacy before they watch the video and then administer a similar financial literacy measure one week after they watch the video. Can we conclude that this video increases financial literacy? We want to have 95% certainty about our conclusion.
Here are the scores:

Person #   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Before    12  13  11  10  13  11  10   9   9  10  12  11  11  14  11
After     12  16  12   9  16  13  13  11  12  10  15   9  14  16  16

Null hypothesis: the financial literacy scores are the same before and after the video.
Research hypothesis: the financial literacy scores after watching the video are higher than the ones before.

1. State hypotheses:
   H0: μ1 = μ2
   H1: μ1 < μ2 (one-tailed test)
2. Select alpha: 0.05 (1 - 0.95 = 0.05)
3. Test statistic: Student's t
4. Formula:
   $$t = \frac{\sum D}{\sqrt{\dfrac{n \sum D^2 - \left(\sum D\right)^2}{n - 1}}}$$
   So we need to calculate difference scores to apply the formula: D = second value - first value

Person #   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15    ∑
Before    12  13  11  10  13  11  10   9   9  10  12  11  11  14  11
After     12  16  12   9  16  13  13  11  12  10  15   9  14  16  16
D          0   3   1  -1   3   2   3   2   3   0   3  -2   3   2   5   27
D²         0   9   1   1   9   4   9   4   9   0   9   4   9   4  25   97

   Plug these numbers into the formula:
   t = 27 / sqrt[ (15*97 - 27*27) / (15-1) ] = 3.75
5. Use the table to find the critical value: Table B2 (df = n - 1 = 15 - 1 = 14, alpha = 0.05, one-tailed), so tcrit = 1.762
6. Compare: the computed value 3.75 is larger than the critical value of 1.762.
7. Decision: We reject H0.
   Conclusion: We are 95% confident that our video helps increase levels of financial literacy.

If we had aggregated data, the problem would sound like this:

We want to study the effectiveness of a financial literacy video that we developed. For a group of 15 volunteers, we measure their financial literacy before they watch the video and then administer a similar financial literacy measure one week after they watch the video. The average difference score that we obtained is 1.8, and the standard deviation for the difference is 1.86. Can we conclude that this video increases financial literacy? We want to have 95% certainty about our conclusion.

All the steps would be the same, but we would calculate t using the other formula (where d is the variable containing difference scores):

t = 1.8 / (1.86 / sqrt(15)) = 3.75 (note that this t value is the same as above)
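
The three t computations in this worksheet can be checked with a short Python sketch. The function names are illustrative and not part of the course materials; the inputs are the summary statistics and the D row from the examples, and the critical values are simply copied from the worksheet's Table B2 lookups rather than computed.

```python
import math


def independent_t(mean1, sd1, n1, mean2, sd2, n2):
    """t for two independent samples, from group means, SDs, and sizes."""
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    # Standard error of the difference between the two means.
    std_error = math.sqrt(pooled_var * (n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / std_error


def paired_t_raw(diffs):
    """t for paired samples, from the raw difference scores D."""
    n = len(diffs)
    sum_d = sum(diffs)
    sum_d2 = sum(d * d for d in diffs)
    return sum_d / math.sqrt((n * sum_d2 - sum_d**2) / (n - 1))


def paired_t_summary(mean_d, sd_d, n):
    """t for paired samples, from the mean and SD of the difference scores."""
    return mean_d / (sd_d / math.sqrt(n))


# Reading program example: experimental group vs. control group.
t_reading = independent_t(12.3, 2.1, 25, 11.2, 1.8, 25)

# Financial literacy example: the D row as listed in the worksheet's table.
diffs = [0, 3, 1, -1, 3, 2, 3, 2, 3, 0, 3, -2, 3, 2, 5]
t_video_raw = paired_t_raw(diffs)
t_video_summary = paired_t_summary(1.8, 1.86, 15)

# Critical values copied from the worksheet (alpha = 0.05, one-tailed).
T_CRIT_DF48 = 1.676
T_CRIT_DF14 = 1.762

for label, t, t_crit in [
    ("independent samples", t_reading, T_CRIT_DF48),
    ("paired, raw data", t_video_raw, T_CRIT_DF14),
    ("paired, aggregated data", t_video_summary, T_CRIT_DF14),
]:
    decision = "reject H0" if abs(t) > t_crit else "fail to reject H0"
    print(f"{label}: t = {t:.2f}, critical value = {t_crit}, {decision}")
```

Without intermediate rounding, the independent-samples example gives t of about 1.99 rather than exactly 2, because the worksheet rounds the denominator to 0.55 before dividing; the decision about H0 is the same either way.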