Chapter 19: Two-Sample Problems

In Chapter 9, you studied randomized comparative experiments and the principles of sound experimental design (randomization, comparison, and replication). Comparing random samples drawn separately from two populations is a two-sample problem; it is the focus of Chapter 19 and part of Chapter 20.

One Versus Two Samples

When testing a hypothesis with one sample, we ask whether the sample mean is significantly different from some particular value. For example, what if tutored students had significantly higher SAT scores than the average? Even if this were the case, you should ask yourself some questions. Are there other factors that make tutored students different from the rest of the population? Remember lurking variables. Just the fact that they signed up for tutoring suggests they are different from other students. Can you think of reasons why this group would be different? In practice, you also need a control group, and you compare the two samples.

Comparing Two Samples (Chapter 19)

Suppose you want to compare the means of two groups that are not matched pairs (no pairing of individuals). In this case, the samples can be different sizes. We are comparing the means of the two groups, and we assume that:

- Each group is an SRS from its own population, and the two populations are distinct.
- Responses in each group are independent of those in the other group.
- Both populations are Normally distributed. The means and standard deviations of the populations are unknown. In practice, it is enough that the distributions have similar shapes and that the data have no strong outliers.

We will call the variable $x_1$ in the first population and $x_2$ in the second population, because the variable might have different distributions in the two populations.

              Variable   Mean          Standard deviation   Size
Population 1  $x_1$      $\mu_1$       $\sigma_1$
Population 2  $x_2$      $\mu_2$       $\sigma_2$
Sample 1                 $\bar{x}_1$   $s_1$                $n_1$
Sample 2                 $\bar{x}_2$   $s_2$                $n_2$

We will use the sample means and standard deviations to estimate the unknown parameters.
We want to compare the two population means, either by giving a confidence interval for their difference $\mu_1 - \mu_2$ or by testing the hypothesis of no difference, $H_0: \mu_1 = \mu_2$ (equivalently, $\mu_1 - \mu_2 = 0$).

Goal: To estimate $\mu_1 - \mu_2$. To do this, we use the difference between the means of the two samples, $\bar{x}_1 - \bar{x}_2$.

Example (p. 468): People gain weight when they take in more energy from food than they expend. To investigate the link between obesity and energy spent, researchers chose twenty healthy volunteers who do not exercise. Ten are lean and ten are mildly obese but still healthy. The following table gives data on the amount of time (in minutes per day) that the subjects spend standing or walking, sitting, or lying down:

[Table: daily activity times, in minutes per day, for the 20 subjects]

What are the null and alternative hypotheses?

$H_0: \mu_1 = \mu_2$ (both groups have the same mean standing and walking time)
$H_a: \mu_1 > \mu_2$ (the lean group is more active than the obese group)

Note: Have the conditions for inference been met? Since the subjects are volunteers, this is not an SRS. However (read the text), the study took precautions so that we can treat the two groups as independent SRSs.

Calculating the group means gives an observed difference $\bar{x}_1 - \bar{x}_2$ of about 152.5 minutes per day. Now we need to learn the details of inference for comparing two means.

Two-Sample t Procedures

Is this observed difference surprising? That depends on the spread of the observations as well as on the two means. Widely different means can occur by chance, so we need to take variation into account. We therefore standardize the observed difference $\bar{x}_1 - \bar{x}_2$ by dividing by its standard deviation, which is

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

But we don't know the population standard deviations, so we use the sample standard deviations instead. The result is called the standard error, or estimated standard deviation, of the difference in the sample means:

$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

We then standardize the estimate by dividing by the standard error. This is the two-sample t statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

The interpretation of this is the same as any z or t statistic.
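The standardization just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `two_sample_t` is hypothetical, and the summary statistics for the lean and obese groups are assumed values that do not appear in these notes (they are consistent with the t = 3.808 reported for this example).

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (difference, standard error, t) for two independent samples."""
    # Standard error: the sample standard deviations stand in for the unknown sigmas.
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    diff = mean1 - mean2
    # Standardize the observed difference by its standard error.
    return diff, se, diff / se

# ASSUMED summary statistics (minutes per day standing or walking);
# these numbers are not given in the notes.
diff, se, t = two_sample_t(525.751, 107.121, 10, 373.269, 67.498, 10)
print(f"difference = {diff:.3f}, SE = {se:.3f}, t = {t:.3f}")
```

With these assumed inputs the statistic comes out close to 3.808. The same function works for samples of different sizes, since each group's variance is divided by its own sample size.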
The t statistic tells us how far $\bar{x}_1 - \bar{x}_2$ is from 0 in standard deviation units.

The Two-Sample t Procedures

To test the hypothesis $H_0: \mu_1 = \mu_2$, calculate the two-sample t statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Find P-values from the t distribution with degrees of freedom from either Option 1 (software) or Option 2 (the smaller of $n_1 - 1$ and $n_2 - 1$).

Example: Daily Activity and Obesity, continued

The two-sample t statistic comparing the average minutes spent walking and standing in the two groups (lean vs. obese) is $t = 3.808$.

Next we need the degrees of freedom. Because $n_1 - 1 = 9$ and $n_2 - 1 = 9$, Option 2 gives 9 degrees of freedom. Because $H_a$ is one-sided, the P-value is the area to the right of $t = 3.808$ under the t curve with df = 9. You can use either Table C or the calculator to compute this.

Using the table: For df = 9, $t = 3.808$ falls between the critical values 3.690 (upper-tail probability 0.0025) and 4.297 (upper-tail probability 0.001), so $0.001 < P < 0.0025$.

Using the calculator (preferred): You can either enter the data in List 1 and List 2 on your calculator or use the means and standard deviations given below the table (which were computed using the calculator). Go to the 2-sample t test (under Stats/Tests) and enter the means, the standard deviations, and the number in each sample. You will be asked whether the samples are "pooled". Choosing "pooled" assumes that the two populations have the same variance. Since we have no way of knowing the population variances, we cannot accurately answer this question. "Unpooled" works whether or not the variances are the same, so always use "unpooled". This is discussed on page 487 in the text.

So you get $t = 3.81$, $P = 8.414 \times 10^{-4} \approx 0.000841$. The calculator uses the software degrees of freedom (Option 1), which is why its P-value is smaller than the conservative range found from Table C with df = 9.

So what does this tell us? There is very strong evidence ($P = 0.0008$) that lean people spend more time walking and standing than moderately obese people.

Problem: Are the following two schools comparable in SAT scores (or are the scores different)?

School 1: A random sample of 43 students has a mean SAT of 502 and a standard deviation of 60.
School 2: A random sample of 35 students has a mean SAT of 480 and a standard deviation of 50.
Step 1: What are the null and alternative hypotheses?
Step 2: Calculate the test statistic.
Step 3: Compute the P-value.
Step 4: Conclude.
Step 5: Interpret the P-value.

Confidence Intervals for Two Sample Means

Draw an SRS of size $n_1$ from a large Normal population with unknown mean $\mu_1$, and draw an independent SRS of size $n_2$ from another large Normal population with unknown mean $\mu_2$. A level C confidence interval for $\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Here $t^*$ is the critical value for confidence level C for the t distribution with degrees of freedom from either Option 1 (software) or Option 2 (the smaller of $n_1 - 1$ and $n_2 - 1$).

Example 18.4: How much more active are lean people?

Give a 90% confidence interval for $\mu_1 - \mu_2$, the difference in average daily minutes spent standing and walking between lean and moderately obese adults.

Using 9 degrees of freedom, the critical value is $t^* = 1.833$, from Table C or from invT under Distribution on the calculator: invT(0.95, 9) = 1.833. Why did I use 0.95 instead of 0.90? A 90% interval leaves 5% in each tail, and invT takes the area to the left of the critical value, which is 0.90 + 0.05 = 0.95.

The resulting interval is 152.48 ± 73.39 minutes.

Conclusion: We are 90% confident that the actual difference in average daily minutes spent standing and walking between lean and mildly obese individuals is between 79.09 and 225.87 minutes.

Note: This interval is quite wide because the samples are small and the variation among individuals is large.

Problem: Find a 95% confidence interval for the difference in population means between the two schools. What does this confidence interval tell us?
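The confidence interval recipe above can be checked with a short Python sketch. The helper names `two_sample_ci` and `conservative_df` are hypothetical, the critical value is supplied by the caller (from Table C or invT) and is not computed here, and the lean and obese summary statistics are assumed values that do not appear in these notes.

```python
import math

def conservative_df(n1, n2):
    """Option 2 degrees of freedom: the smaller of n1 - 1 and n2 - 1."""
    return min(n1, n2) - 1

def two_sample_ci(mean1, sd1, n1, mean2, sd2, n2, t_star):
    """Return (low, high) for (mean1 - mean2) +/- t_star * SE."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    diff = mean1 - mean2
    margin = t_star * se  # margin of error
    return diff - margin, diff + margin

# ASSUMED summary statistics for the lean and obese groups (not given in the notes);
# t_star = 1.833 is the 90% critical value for df = 9 quoted in the example.
low, high = two_sample_ci(525.751, 107.121, 10, 373.269, 67.498, 10, t_star=1.833)
print(f"df = {conservative_df(10, 10)}")
print(f"90% CI: ({low:.2f}, {high:.2f})")

# For the two-school problem, Option 2 gives the smaller of 43 - 1 and 35 - 1.
print(f"schools df = {conservative_df(43, 35)}")
```

Passing the critical value in as an argument keeps the sketch free of any statistics library; for the two-school problem you would look up the 95% value of t* for the chosen degrees of freedom and call `two_sample_ci` with the school summary statistics.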