CHAPTER 24 INFERENCE: Comparing Means

Comparing Two Means

Two-Sample Problems
♦ The goal of this inference is to compare the means of two different groups. We may wish to compare the responses to two treatments or to compare the characteristics of two populations. For these problems, we use a two-sample t-test or a two-sample t-interval. It is important to note that there must be a separate sample from each treatment or each population.

Assumptions and Conditions for Comparing Two Means
♦ Independence
• Randomization Condition: Two random samples from two distinct populations.
• 10% Condition: Each sample is less than 10% of its population.
♦ Normality
• Nearly Normal Condition: We assume both populations are nearly normal. We check this by confirming that each sample's data are unimodal and roughly symmetric with no outliers.
♦ Independent Groups
• Distinct Groups: The two samples are independent of one another; that is, nothing (and no one) appears in both groups, and one sample has no influence on the other.

Two-Sample t Procedures
In order to calculate the confidence interval or the test statistic, we need the standard error of the difference in the means. Don't forget: VARIANCES ADD!
$$SD(\bar{y}_1 - \bar{y}_2) = \sqrt{Var(\bar{y}_1) + Var(\bar{y}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Because the population standard deviations are unknown, we estimate them with the sample standard deviations:
$$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Draw an SRS of size n1 from a normal population with unknown mean µ1, and draw an independent SRS of size n2 from another normal population with unknown mean µ2. The confidence interval (CI) for µ1 − µ2 given by
$$(\bar{y}_1 - \bar{y}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
has confidence level at least C no matter what the population standard deviations are. Here t* is the upper (1 − C)/2 critical value for the t(k) distribution with df = k.
For a significance test of H0: µ1 − µ2 = 0, compute the two-sample t statistic
$$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
with µ1 − µ2 = 0 under H0, and use P-values or critical values for the t(k) distribution.
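The standard error, test statistic, and interval above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names (se_diff, two_sample_t, two_sample_ci) are hypothetical, the numbers in the usage lines are made up for easy arithmetic, and t_star is assumed to be looked up separately (t table or calculator), since the standard library has no t critical-value function.

```python
import math

def se_diff(s1, n1, s2, n2):
    """SE of ybar1 - ybar2: the estimated variances s^2/n of the two means add."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def two_sample_t(ybar1, s1, n1, ybar2, s2, n2, null_diff=0.0):
    """Two-sample t statistic for H0: mu1 - mu2 = null_diff."""
    return ((ybar1 - ybar2) - null_diff) / se_diff(s1, n1, s2, n2)

def two_sample_ci(ybar1, s1, n1, ybar2, s2, n2, t_star):
    """Interval (ybar1 - ybar2) +/- t* x SE; t_star must be supplied by the caller."""
    diff = ybar1 - ybar2
    margin = t_star * se_diff(s1, n1, s2, n2)
    return diff - margin, diff + margin

# Made-up numbers (not from the chapter): SE = sqrt(3^2/9 + 4^2/16) = sqrt(2)
print(se_diff(3, 9, 4, 16))              # 1.4142135623730951
print(two_sample_t(10, 3, 9, 8, 4, 16))  # about 1.414
```

Keeping se_diff separate mirrors the slide's point that both the interval and the test statistic are built from the same standard error.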
The degrees of freedom k for the two-sample t procedures come from the following formula:
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
But most people either use k = the smaller of (n1 − 1) and (n2 − 1), which is the conservative choice, or, for the most part, let the calculator deal with this formula. The formula usually gives a non-integer df that is never smaller than the conservative k, so the calculator's intervals are slightly narrower.

Harder Working Hearts
Resting pulse rates for a random sample of 26 smokers had a mean of 80 beats per minute (bpm) and a standard deviation of 5 bpm. Among 32 randomly selected nonsmokers, the mean was 74 bpm and the standard deviation was 6 bpm. Both sets of data were roughly symmetric and had no outliers. Is there evidence of a difference in the mean pulse rate between smokers and nonsmokers? If so, how big?

Step 1: Identify the population Parameter, state the null and alternative Hypotheses, and determine what you are trying to do (what the question is asking).
♦ We wish to determine whether there is evidence of a difference in mean pulse rate between smokers and nonsmokers. Let s represent smokers and n represent nonsmokers.
• Null Hypothesis: H0: μs − μn = 0
» There is no difference in mean pulse rates.
• Alternative Hypothesis: HA: μs − μn ≠ 0
» There is a difference in mean pulse rates.

Step 2: Verify the Assumptions by checking the conditions
♦ Independence:
• Randomization Condition: We are told that both samples are random samples.
• 10% Condition: We have less than 10% of all smokers and less than 10% of all nonsmokers.
• There is no reason to doubt independence.
♦ Normality:
• We are told that both sets of data are roughly symmetric with no outliers, so it is reasonable to assume that both populations are nearly normal (the Nearly Normal Condition is satisfied).
♦ Independent Groups:
• The data come from two distinct populations, smokers and nonsmokers.
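A minimal sketch of the two ways of choosing df described above, applied to the Harder Working Hearts summary statistics. The function names are hypothetical; welch_df implements the formula above and conservative_df implements the "smaller of n1 − 1 and n2 − 1" rule.

```python
def welch_df(s1, n1, s2, n2):
    """Approximate df from the two-sample formula (non-integer in general)."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def conservative_df(n1, n2):
    """Conservative df: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

# Harder Working Hearts: smokers (s = 5, n = 26), nonsmokers (s = 6, n = 32)
print(round(welch_df(5, 26, 6, 32), 2))  # 55.95
print(conservative_df(26, 32))           # 25
```

Here the formula gives df ≈ 56 versus 25 for the conservative rule, which is why a calculator's critical value t* is a bit smaller than a table value read at df = 25.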
Step 3: If the conditions are met, Name the inference procedure, find the Test statistic, and Obtain the p-value:
Name the test: We will use a Two-Sample t-test.
$$n_s = 26,\quad n_n = 32,\quad \bar{y}_s = 80,\quad \bar{y}_n = 74,\quad s_s = 5,\quad s_n = 6$$
Test Statistic:
$$t = \frac{(80 - 74) - 0}{\sqrt{\frac{5^2}{26} + \frac{6^2}{32}}} \approx 4.15, \qquad df \approx 56$$
Obtain the p-value: p-value ≈ 0.0001

Step 4: Make a decision (reject or fail to reject H0). State your conclusion in context using the p-value, and make sure you relate your conclusion to the population means!
♦ The p-value is very small (about 0.0001): if there were really no difference in mean pulse rates, a difference in sample means this large would arise from sampling error alone only about 1 time in 10,000. So we reject the null hypothesis. There is strong evidence of a difference in mean pulse rates between smokers and nonsmokers.

How Much More?
Determine the true difference in mean pulse rate between smokers and nonsmokers with 99% confidence.
♦ Step 1: State what you want to know in terms of the Parameter and determine what the question is asking.
• We want to find an interval that is likely, with 99% confidence, to contain the true difference in mean pulse rates, μs − μn, of smokers and nonsmokers. Let s represent smokers and n represent nonsmokers.
♦ Step 2: Verify the Assumptions by checking the conditions.
• All assumptions and conditions were satisfied in the previous problem.
♦ Step 3: Name the inference, do the work, and state the Interval.
• Name the procedure: This is a Two-Sample t-Interval. With df ≈ 56, the 99% critical value is t* ≈ 2.667:
$$(80 - 74) \pm 2.667\sqrt{\frac{5^2}{26} + \frac{6^2}{32}} \approx 6 \pm 3.852$$
• Interval: (2.148, 9.852)
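The sketch below reproduces the Step 3 numbers for both the test and the 99% interval using only Python's standard library. Because the standard library has no t-distribution functions, the two-sided P-value is obtained here by integrating the t density numerically with Simpson's rule. This is a swapped-in technique, not necessarily what a calculator does. The critical value T_STAR_99 = 2.667 is an assumed t-table value for 99% confidence with df ≈ 56. If SciPy is available, scipy.stats.ttest_ind_from_stats(..., equal_var=False) runs the same unpooled test.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution; df need not be an integer."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=10_000):
    """P(|T| >= |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    central = 2 * (h / 3) * total  # symmetric density: area on [-a, a] = 2 x area on [0, a]
    return 1 - central

# Harder Working Hearts summary statistics (s = smokers, n = nonsmokers)
ybar_s, s_s, n_s = 80, 5, 26
ybar_n, s_n, n_n = 74, 6, 32

v_s = s_s**2 / n_s  # variances of the two sample means add
v_n = s_n**2 / n_n
se = math.sqrt(v_s + v_n)
t = ((ybar_s - ybar_n) - 0) / se  # H0: mu_s - mu_n = 0
df = (v_s + v_n) ** 2 / (v_s**2 / (n_s - 1) + v_n**2 / (n_n - 1))
p = two_sided_p(t, df)

T_STAR_99 = 2.667  # assumed t-table value: 99% confidence, df about 56
margin = T_STAR_99 * se
lo, hi = (ybar_s - ybar_n) - margin, (ybar_s - ybar_n) + margin

print(f"t = {t:.2f}, df = {df:.1f}")     # t = 4.15, df = 56.0
print(f"two-sided p = {p:.6f}")          # about 0.0001
print(f"99% CI: ({lo:.3f}, {hi:.3f})")   # 99% CI: (2.148, 9.852)
```

The same standard error appears in both the test statistic and the interval, which is why the test and the interval agree: a 99% interval that excludes 0 matches a two-sided P-value below 0.01.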
♦ Step 4: State your Conclusion in context of the problem.
• We are 99% confident that the true difference in mean resting pulse rates between smokers and nonsmokers, μs − μn, is between 2.148 and 9.852 bpm. In other words, we are 99% confident that the mean resting pulse rate of smokers is between 2.148 and 9.852 bpm higher than that of nonsmokers.

Pizza, Pizza!!!
Nutritional information from two national chains, Papa John's and Domino's, was examined to determine the amount of saturated fat (in grams) in one slice of various pizzas. Use the data below to determine whether the two chains differ in the mean amount of saturated fat per slice.
Saturated fat (grams) per slice of pizza:
Papa John's (P): 6, 6, 8, 6, 8, 7, 4, 7, 6, 9, 6, 5, 7, 4.5
Domino's (D): 17, 8, 12, 12, 10, 15, 8, 7, 8, 11, 10, 11, 10, 13, 5, 13, 16, 11, 16, 12

Step 1: Identify the population Parameter, state the null and alternative Hypotheses, and determine what you are trying to do (what the question is asking).
♦ We want to know whether the two pizza chains have different mean saturated fat contents. Let P represent Papa John's and D represent Domino's.
• H0: μP − μD = 0
» There is no difference in mean saturated fat content.
• HA: μP − μD ≠ 0
» There is a difference in mean saturated fat content.

Step 2: Verify the Assumptions by checking the conditions
♦ Independence:
• Randomization Condition: We are not told whether the samples were randomly selected. We will assume that the pizzas are representative of each chain's pizzas; if they are not, our results may not be valid.
• 10% Condition: It is safe to assume that we have less than 10% of all pizza slices.
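Step 3 of the pizza example starts from summary statistics. This sketch computes them from the raw values with Python's statistics module; statistics.stdev divides by n − 1, matching the sample standard deviation s. The Papa John's list uses the 14 slices summarized in Step 3 (nP = 14).

```python
import statistics

# Saturated fat (grams) per slice
papa_johns = [6, 6, 8, 6, 8, 7, 4, 7, 6, 9, 6, 5, 7, 4.5]
dominos = [17, 8, 12, 12, 10, 15, 8, 7, 8, 11, 10, 11, 10, 13, 5, 13, 16, 11, 16, 12]

for name, data in [("Papa John's", papa_johns), ("Domino's", dominos)]:
    n = len(data)
    ybar = statistics.mean(data)
    s = statistics.stdev(data)  # sample SD, n - 1 in the denominator
    print(f"{name}: n = {n}, mean = {ybar:.3f}, s = {s:.3f}")
# Papa John's: n = 14, mean = 6.393, s = 1.389
# Domino's: n = 20, mean = 11.250, s = 3.193
```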
♦ Normality:
• Both samples are relatively small, so we look at the sample distributions:
[Histograms of saturated fat per slice: Papa John's, Domino's]
It is reasonable to assume the Nearly Normal Condition holds, since both samples are unimodal and roughly symmetric.
♦ Independent Groups:
• The data come from two distinct populations, Papa John's and Domino's.

Step 3: If the conditions are met, Name the inference procedure, find the Test statistic, and Obtain the p-value:
Name the test: We will use a Two-Sample t-test.
$$n_P = 14,\quad n_D = 20,\quad \bar{y}_P = 6.393,\quad \bar{y}_D = 11.250,\quad s_P = 1.389,\quad s_D = 3.193$$
Test Statistic:
$$t = \frac{(6.393 - 11.250) - 0}{\sqrt{\frac{1.389^2}{14} + \frac{3.193^2}{20}}} \approx -6.035, \qquad df \approx 27.7$$
Obtain the p-value: p-value < 0.00001

Step 4: Make a decision (reject or fail to reject H0). State your conclusion in context using the p-value, and make sure you relate your conclusion to the population means!
♦ The p-value is extremely small (less than 0.00001), so we reject the null hypothesis. There is very strong evidence that the mean saturated fat content per slice differs between Papa John's and Domino's; in these samples, Domino's slices averaged about 4.9 g more saturated fat.

To T or Not to T, That is the Question
Sometimes you may wonder whether to use t or z. If you know σ, use z (this is very rare and almost never happens in the real world). Whenever you use s to estimate σ, use t.
What about pooling?
♦ If we know (or are willing to assume) that the two population variances are equal, we can pool the two samples to estimate a common variance; otherwise, don't pool when comparing means.

Assignment
Chapter 24 Lesson: Comparing Means
Read: Chapter 24
Problems: 1–33 (odd)
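To see why the pooling decision matters, this sketch computes both the unpooled two-sample t statistic and the pooled t statistic from the pizza summary statistics in Step 3. The function names are hypothetical. The pooled version uses the standard pooled variance s_p² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2) with df = n1 + n2 − 2.

```python
import math

def unpooled_t(y1, s1, n1, y2, s2, n2):
    """Two-sample (unpooled) t statistic and its approximate df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (y1 - y2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

def pooled_t(y1, s1, n1, y2, s2, n2):
    """Pooled t: assumes equal population variances; df = n1 + n2 - 2."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    t = (y1 - y2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Pizza summary statistics from Step 3 (P = Papa John's, D = Domino's)
pizza = (6.393, 1.389, 14, 11.250, 3.193, 20)
t_u, df_u = unpooled_t(*pizza)
t_p, df_p = pooled_t(*pizza)
print(f"unpooled: t = {t_u:.2f}, df = {df_u:.1f}")  # unpooled: t = -6.04, df = 27.7
print(f"pooled:   t = {t_p:.2f}, df = {df_p}")      # pooled:   t = -5.33, df = 32
```

The two sample standard deviations differ by a factor of more than 2, and the pooled t statistic (about −5.33) differs noticeably from the unpooled one (about −6.04). This suggests the equal-variance assumption is doubtful for these data, which is the situation where the slide's advice not to pool applies.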