Chapter 13
Comparing Two Population Parameters
AP Statistics
Hamilton and Mann

Lipitor or Pravachol
• Which drug is more effective at lowering “bad cholesterol”?
• To figure this out, researchers designed a study they called PROVE-IT.
• They used 4000 people with heart disease as subjects. These people were randomly assigned to one of two treatment groups: Lipitor or Pravachol.
• At the end of the study, researchers compared the mean “bad cholesterol” levels for the two groups. For Pravachol it was 95 mg/dl versus 62 mg/dl for Lipitor. Is this difference statistically significant?
• This is a question about comparing two means.
• The researchers also compared the proportion of subjects in each group who died, had a heart attack, or suffered other serious consequences within two years.
• For Pravachol, the proportion was 0.263 and for Lipitor it was 0.224. Is this a statistically significant difference?
• This is a question about comparing two proportions.

Success vs. Failure in Business
• How do small businesses that fail differ from small businesses that succeed?
• Business school researchers compared the asset-liability ratios of two samples of firms started in 2000, one sample of failed businesses and one of firms that are still going after two years.
• This observational study compares two random samples, one from each of two different populations.

Two-Sample Problems
• Comparing two populations or two treatments is one of the most common situations encountered in statistical practice. We call such situations two-sample problems.
• A two-sample problem can arise from a randomized comparative experiment that randomly divides subjects into two groups and exposes each group to a different treatment, like the PROVE-IT Study.
• Comparing random samples separately selected from two populations, like the successful and failed small businesses, is also a two-sample problem.
• Unlike the matched pairs designs studied earlier, there is no matching of units in the two samples, and the two samples can be of different sizes.
• Inference procedures for two-sample data differ from those for matched pairs.

Comparing Means and Proportions
• Who is more likely to binge drink: male or female college students?
• This is a two-sample problem because we are comparing the population of male college students to the population of female college students.
• To conduct this study, the Harvard School of Public Health surveyed random samples of male and female undergraduates at four-year colleges and universities about their drinking behaviors.
• This observational study was designed to compare the proportion of undergraduate males who binge drink with the proportion of undergraduate females who binge drink.
• A bank wants to know which of two incentive plans will most increase the use of its credit cards.
• We are comparing the effect of two different treatments here, so it is a two-sample problem.
• It offers each incentive to a random sample of credit card customers and compares the amount charged during the following six months.
• This is a randomized experiment designed to compare the mean amount spent under each of the two incentive “treatments.”

CHAPTER 13 SECTION 1
Comparing Two Means
HW: 13.1, 13.2, 13.4, 13.6, 13.8, 13.10, 13.11, 13.14, 13.16

Comparing Two Means
• We can examine two-sample data graphically by comparing dotplots or stemplots (for small samples) and boxplots or histograms (for large samples).
• Now we will apply the ideas of formal inference in this setting.
• When both population distributions are symmetric, and especially when they are approximately Normal, a comparison of the mean responses in the two populations is the most common goal of inference.

Notation
            Parameters                         Statistics
Population  Variable  Mean  Standard Deviation  Sample Size  Mean  Standard Deviation
1           x1        μ1    σ1                  n1           x̄1    s1
2           x2        μ2    σ2                  n2           x̄2    s2
• There are four unknown parameters: the two means and the two standard deviations.
• We want to compare the two population means, either by giving a confidence interval for their difference µ1 - µ2 or by testing the hypothesis of no difference, H0: µ1 = µ2.
• We use the sample means and standard deviations to estimate the unknown parameters.

Calcium and Blood Pressure
• Does increasing the amount of calcium in our diet reduce blood pressure?
• An examination of a large number of people revealed a relationship between calcium intake and blood pressure. The relationship was strongest for black men. As a result, researchers designed a randomized comparative experiment.
• The subjects were 21 healthy black men. A randomly chosen group of 10 of the men received calcium supplements for 12 weeks. The other 11 men received a placebo pill that looked similar for the 12 weeks.
• The response variable is the decrease in systolic blood pressure for a subject after 12 weeks. An increase appears as a negative response.
• Group 1 will be the calcium group and Group 2 will be the placebo group. Here are the data.
  Group 1 – Calcium Group:  7  -4  18  17  -3  -5  1  10  11  -2
  Group 2 – Placebo Group: -1  12  -1  -3   3  -5  5   2  -11  -1  -3
• Here are the summary statistics.
  Group  Treatment  n   x̄       s
  1      Calcium    10  5.000   8.743
  2      Placebo    11  -0.273  5.901
• Notice that the calcium group experienced a drop in blood pressure, while the placebo group shows a small increase. Is this good evidence that calcium decreases blood pressure in the entire population of healthy black men more than a placebo does?
• This example fits the two-sample setting because we have a separate sample from each treatment and we have not attempted to match them.
• Since we are testing a claim, we will conduct a significance test and follow the Inference Toolbox.
• Step 1 – Hypotheses
  – We write the hypotheses in terms of the mean decreases we would see in the entire population: μ1 for black men taking calcium for 12 weeks and μ2 for black men taking the placebo for 12 weeks. There are two equivalent ways to write the hypotheses:
    H0: μ1 = μ2 and Ha: μ1 > μ2
    or
    H0: μ1 – μ2 = 0 and Ha: μ1 – μ2 > 0
• Step 2 – Conditions
  – We do not know the name of the test yet, but we know the conditions we must check to compare two means.
  – SRS – The 21 subjects are not an SRS. Therefore, we may not be able to generalize our findings to all healthy black men. Since we randomly assigned treatments, however, any differences can be attributed to the treatments themselves.
  – Normality – Since we have small samples, we must look at a boxplot and histogram for both samples. There are no serious problems (no outliers or serious departures from Normality).
  – Independence – Since we randomized the treatments, we can safely treat the calcium and placebo groups as two independent samples.
• The natural estimator of the difference µ1 - µ2 is the difference between the sample means:
  x̄1 – x̄2 = 5.000 – (–0.273) = 5.273
• This statistic measures the average advantage of calcium over the placebo. In order to use it, however, we need to know about its sampling distribution. In other words, we need to know what the mean and standard deviation of the differences would be if we took repeated samples many times.

The Two-Sample z Statistic
• Here are the facts about the sampling distribution of the difference x̄1 – x̄2 between the two sample means of independent SRSs.
  – The mean of x̄1 – x̄2 is $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$.
  – The variance of x̄1 – x̄2 is the sum of the variances of the two sample means: $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.
• Therefore, the standard deviation is
  $$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
• If both populations are Normal, then the distribution of x̄1 – x̄2 is also Normal with this mean and standard deviation.
• When the statistic x̄1 – x̄2 has a Normal distribution, we can standardize it to obtain a standard Normal z statistic:
  $$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
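
The facts about the sampling distribution of x̄1 – x̄2 can be checked by simulation. The sketch below is an illustration only and is not part of the original slides: the population means, the population standard deviations, the seed, and the number of repetitions are hypothetical values chosen for the demonstration (only the sample sizes 10 and 11 mirror the calcium experiment). It draws many pairs of independent samples from two Normal populations and compares the simulated mean and standard deviation of the differences with the formulas.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(13)  # hypothetical seed, fixed only so the run is repeatable

# Hypothetical population parameters (assumptions, not values from the slides).
MU1, SIGMA1, N1 = 5.0, 9.0, 10
MU2, SIGMA2, N2 = 0.0, 6.0, 11
REPS = 20_000  # number of repeated pairs of samples


def sample_mean(mu, sigma, n):
    """Mean of one SRS of size n from a Normal(mu, sigma) population."""
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n


# One difference of sample means per repetition.
diffs = [
    sample_mean(MU1, SIGMA1, N1) - sample_mean(MU2, SIGMA2, N2)
    for _ in range(REPS)
]

simulated_mean = mean(diffs)
simulated_sd = stdev(diffs)

# Values predicted by the sampling-distribution facts.
theoretical_mean = MU1 - MU2
theoretical_sd = sqrt(SIGMA1**2 / N1 + SIGMA2**2 / N2)

print(f"mean of differences: simulated {simulated_mean:.3f}, theory {theoretical_mean:.3f}")
print(f"sd of differences:   simulated {simulated_sd:.3f}, theory {theoretical_sd:.3f}")
```

With this many repetitions the simulated values should land close to the theoretical ones, though not exactly on them, because the simulation itself is subject to chance variation.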

Two-Sample z Statistic
• In the very unlikely case that we know both population standard deviations, the two-sample z statistic is what we would use to conduct inference about μ1 – μ2.
• Since we rarely know one, much less two, population standard deviations, we are going to move immediately to the more useful t procedures.

Two-Sample t Procedures
• Because we don’t know the population standard deviations, we estimate them with the standard deviations from our two samples.
• The result is the standard error, or estimated standard deviation, of the difference in sample means:
  $$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
• We then standardize our estimate. The result is the two-sample t statistic:
  $$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
• The statistic t has the same interpretation as any z or t statistic: it says how far x̄1 – x̄2 is from its mean in standard deviation units.
• The two-sample t statistic has approximately a t distribution. It does not have exactly a t distribution even if the populations are both exactly Normal. The approximation is very close, though.
• There is a catch: we must use a messy formula to calculate the degrees of freedom. Often, the degrees of freedom are not whole numbers.
• There are two practical options for using the two-sample t procedures:
  1. With technology, use the statistic t with accurate critical values from the approximating t distribution.
  2. Without technology, use the statistic t with critical values from the t distribution with degrees of freedom equal to the smaller of n1 – 1 and n2 – 1. These procedures are always conservative for any two Normal populations.
• Technology will use method 1.
• We are going to start by looking at how to do method 2.
• The two-sample t procedures in method 2 always err on the safe side, reporting higher P-values and lower confidence than may actually be true. The gap between what is reported and the truth is quite small unless the sample sizes are both small and unequal.
• As the sample sizes increase, probability values based on t with degrees of freedom equal to the smaller of n1 – 1 and n2 – 1 become more accurate.
• Let’s complete our calcium and blood pressure problem from earlier.

Calcium and Blood Pressure
• Here are the summary statistics again.
  Group  Treatment  n   x̄       s
  1      Calcium    10  5.000   8.743
  2      Placebo    11  -0.273  5.901
• Step 3 – Calculations
  $$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{5.000 - (-0.273)}{\sqrt{\frac{8.743^2}{10} + \frac{5.901^2}{11}}} = \frac{5.273}{3.2878} = 1.604$$
• Since it is a one-sided test, we are looking for the probability of t being 1.604 or greater when we have 9 degrees of freedom (the smaller of n1 – 1 = 9 and n2 – 1 = 10). From the table, it is between 0.05 and 0.10.
• Step 4 – Interpretation
  – The experiment provides some evidence that calcium reduces blood pressure, but the evidence falls short of the traditional 5% and 1% levels of significance. We would fail to reject H0 at both significance levels.

Creating a Confidence Interval
• We can estimate the difference in mean decreases in blood pressure for the hypothetical calcium and placebo populations using a two-sample t interval.
• We have already checked all of the conditions.
• Recall the summary statistics:
  Group  Treatment  n   x̄       s
  1      Calcium    10  5.000   8.743
  2      Placebo    11  -0.273  5.901
• With 9 degrees of freedom, the critical value for 90% confidence is t* = 1.833, so the interval is
  $$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 5.273 \pm 1.833(3.2878) = 5.273 \pm 6.027 = (-0.75,\ 11.30)$$
• Since the 90% confidence interval includes 0, we cannot reject H0: μ1 – μ2 = 0 against the two-sided alternative at the α = 0.10 level of significance.

Sample Size Matters
• Sample sizes strongly influence the P-value of a test.
• A result that fails to be significant at a specified level α in a small sample may be significant in a larger sample.
• For instance, the difference of 5.273 in the mean decreases in systolic blood pressure between our two groups was not significant. In a larger study with more subjects, researchers obtained a P-value of 0.008.

Robustness Again
• The two-sample t procedures are more robust than the one-sample t procedures, particularly when the distributions are not symmetric.
• When the sizes of the two samples are equal and the two populations being compared have distributions with similar shapes, probability values from the t table are quite accurate for a broad range of distributions for samples as small as 5. When the populations have different shapes, larger samples are needed.
• As a guide to practice, adapt the guidelines on p. 655 for the use of one-sample t procedures to two-sample t procedures by replacing “sample size” with the “sum of the sample sizes,” as long as both samples are at least 5.
• These guidelines err on the side of safety, especially when the two samples are of equal size.
• Whenever possible, try to make both samples the same size. When the sample sizes are equal, the two-sample procedures are most robust against non-Normality and the conservative P-values are most accurate.

Software Approximations for the DF
  $$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
• The t procedures remain exactly as before except that we use the t distribution with df given by the formula in the box above to give critical values and find P-values.

Calcium and Blood Pressure
• Here are the summary statistics again.
  Group  Treatment  n   x̄       s
  1      Calcium    10  5.000   8.743
  2      Placebo    11  -0.273  5.901
• For improved accuracy, let’s calculate the df given by the formula on the prior slide.
  $$df = \frac{(7.644 + 3.166)^2}{\frac{7.644^2}{9} + \frac{3.166^2}{10}} = \frac{116.85}{7.494} \approx 15.59$$
• Notice that the P-value here is 0.064 compared to the 0.0716 we got from the conservative approach.

Degrees of Freedom
• The formula from the box will always give us df at least as large as the smaller of n1 – 1 and n2 – 1 and never bigger than n1 + n2 – 2.
• The number of degrees of freedom is generally not a whole number. Since the table only has whole numbers, we will need to use technology to do these calculations easily.
• Let’s do the Calcium and Blood Pressure problem on the calculator!
• We should use the calculator to do these calculations from now on!

DDT Poisoning
• Poisoning by the pesticide DDT causes convulsions in humans and other mammals.
Researchers seek to understand how the convulsions are caused. In a randomized comparative experiment, they compared 6 white rats poisoned with DDT with a control group of 6 unpoisoned rats. Electrical measurements of nerve activity are the main clue to the nature of DDT poisoning. When a nerve is stimulated, its electrical response shows a sharp spike followed by a much smaller second spike. The experiment found that the second spike is larger in rats fed DDT than in normal rats.
• The researchers measured the height (or amplitude) of the second spike as a percent of the first spike when a nerve in the rat’s leg was stimulated.
• For the poisoned rats the results were:
  12.207  16.869  25.050  22.429  8.456  20.589
• For the control group the results were:
  11.074  9.686  12.064  9.351  8.182  6.642
• Let’s use the calculator to conduct a significance test at the 0.05 significance level to determine whether there is a difference.
• Step 1 – Hypotheses
  – We want to compare the mean height μ1 of the second-spike electrical response in the population of rats fed DDT with the mean height μ2 of the second-spike electrical response in the population of normal rats.
    H0: μ1 = μ2 and Ha: μ1 ≠ μ2
    or
    H0: μ1 – μ2 = 0 and Ha: μ1 – μ2 ≠ 0
• Step 2 – Conditions
  – Since both population standard deviations are unknown, we need to conduct a two-sample t test.
  – SRS – By randomly assigning the rats to the treatments, we can conclude that differences are a result of the treatment. The researchers are willing to assume that the two samples of rats represent an SRS.
  – Normality – We don’t know if the populations are Normal and do not have large enough samples. We must look at a boxplot and histogram. There are no outliers and no heavy skewness.
  – Independence – Due to the random assignment, the researchers can treat the two groups as independent.
• Step 3 – Calculations
  – From the data, x̄1 = 17.600 and s1 = 6.340 for the poisoned rats, and x̄2 = 9.500 and s2 = 1.950 for the control group, so
    $$t = \frac{17.600 - 9.500}{\sqrt{\frac{6.340^2}{6} + \frac{1.950^2}{6}}} = 2.99$$
  – Since it is a two-sided hypothesis, we must find the probability that t is less than -2.99 or greater than 2.99.
  – The degrees of freedom are df = 5.9, and the P-value from the t(5.9) distribution is 0.0246.
• Step 4 – Conclusion
  – Since 0.0246 is less than the significance level of 0.05, we reject the null hypothesis. There is sufficient evidence to conclude that the mean height of the second-spike electrical response in rats fed DDT differs from that of normal rats.

Pooled Two-Sample t Procedures
• Do not use them.
• If a printout says pooled, do not use that. Instead use the one that says unpooled.
• On the calculator, always choose No for pooled.
• If you want more information, you can read about it on p. 800.

CHAPTER 13 SECTION 2
Comparing Two Proportions
HW: 13.26, 13.27, 13.28, 13.29, 13.30, 13.32, 13.33, 13.38

Prayer and In Vitro Pregnancy
• Some women want to have children but cannot for medical reasons. One option for these women is in vitro fertilization. About 28% of women who undergo in vitro fertilization get pregnant. Can praying for these women help increase the pregnancy rate?
• Researchers developed an experiment to help answer this question. (Why not just survey women who have already gone through in vitro to find out if a higher percentage of women who were prayed for got pregnant?)
• A large group of women who were about to undergo in vitro fertilization served as the subjects. Each subject was randomly assigned to the treatment group (prayed for by people who did not know them) or the control group (no prayer).
• The results: 44 of the 88 women (50%) got pregnant in the treatment (prayer) group, while only 21 of the 81 women (about 26%) got pregnant in the control group.
• This seems like a large difference, but is it statistically significant?

Two-Sample Proportions
• We will use notation that is similar to what we used for two-sample means. We still want to compare two groups, Population 1 and Population 2.
• Here is the notation:
  Population  Population Proportion  Sample Size  Sample Proportion
  1           p1                     n1           p̂1
  2           p2                     n2           p̂2
• We compare the populations by doing inference about the difference p1 - p2 between the population proportions.
• The statistic that estimates this difference is p̂1 – p̂2.

Does Preschool Help?
• To study the long-term effects of preschool programs for poor children, the High/Scope Educational Research Foundation has followed two groups of Michigan children since early childhood.
  – Group 1: Control Group – 61 children from population 1, poor children with no preschool.
  – Group 2: Treatment Group – 62 children from population 2, poor children with preschool as 3- and 4-year-olds.
• Both groups were from the same area and had similar backgrounds.
• So our sample sizes are n1 = 61 and n2 = 62.
• One response variable of interest is the need for social services as adults. In the past ten years, 49 of the control sample and 38 of the preschool sample had needed social services. So the sample proportions are:
  p̂1 = 49/61 = 0.803 and p̂2 = 38/62 = 0.613
• To see if the study provides significant evidence that preschool reduces the later need for social services, we are going to create a 95% confidence interval.
• To estimate how large the reduction is, we give a confidence interval for the difference.
• Both the test and the confidence interval start with the difference in the sample proportions:
  p̂1 – p̂2 = 0.803 – 0.613 = 0.190
• This means we need to know the sampling distribution of p̂1 – p̂2.
• So let’s look at that now!

Sampling Distribution of p̂1 – p̂2
• Both p̂1 and p̂2 are random variables because their values would vary if we took repeated samples of the same size.
• In Chapter 7, we learned that if X and Y are any two random variables, then
  $$\mu_{X - Y} = \mu_X - \mu_Y$$
  and, if X and Y are independent,
  $$\sigma^2_{X - Y} = \sigma^2_X + \sigma^2_Y$$
• In Chapter 9, we learned that
  $$\mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$
• Using all of this information, we can find the mean and standard deviation of p̂1 – p̂2:
  $$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$$
• If the two sample proportions are independent,
  $$\sigma^2_{\hat{p}_1 - \hat{p}_2} = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$$
• Thus
  $$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
• As far as the shape, the distribution of p̂1 – p̂2 will be approximately Normal when the distributions of both p̂1 and p̂2 are approximately Normal.
• In other words, the counts n1p1, n1(1 – p1), n2p2, and n2(1 – p2) must all be large enough.
• Actually, we are safe performing significance tests about p1 – p2 as long as all of these values are greater than 5.
• The distribution of p̂1 – p̂2 is on the next graph.

[Graph: approximately Normal sampling distribution of p̂1 – p̂2, centered at p1 – p2]

• The standard deviation of p̂1 – p̂2 involves the unknown parameters p1 and p2.
• Just like in Chapter 12, we must replace these by estimates in order to do inference.
• Just like in Chapter 12, we do this a bit differently for confidence intervals and significance tests.

Confidence Intervals for p1 – p2
• To obtain a confidence interval, replace p1 and p2 in the expression for the standard deviation of p̂1 – p̂2 with the sample proportions.
• The result is the standard error of the statistic p̂1 – p̂2:
  $$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
• The confidence interval again has the form estimate ± z* · SE:
  $$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

Does Preschool Help?
• Here is a summary of the information from the preschool problem we discussed earlier.
  Population  Population Description  Sample Size  Sample Proportion
  1           Control                 n1 = 61      p̂1 = 49/61 = 0.803
  2           Preschool               n2 = 62      p̂2 = 38/62 = 0.613
• We set up our hypotheses earlier, so we have already done Step 1. Here are the hypotheses as a reminder.
    H0: p1 = p2 and Ha: p1 > p2
    or
    H0: p1 – p2 = 0 and Ha: p1 – p2 > 0
• Step 2 – Conditions
  – We are going to construct a two-proportion z interval.
  – SRS – We were not told how the children were selected, so we must be cautious when drawing conclusions.
  – Normality – Since n1p̂1 = 49, n1(1 – p̂1) = 12, n2p̂2 = 38, and n2(1 – p̂2) = 24 are all at least 5, we can assume Normality.
  – Independence – We are fairly certain that there are at least 610 poor children who did not attend preschool and 620 poor children who did attend preschool in our populations of interest.
• Step 3 – Calculations
  $$(0.803 - 0.613) \pm 1.96 \sqrt{\frac{0.803(0.197)}{61} + \frac{0.613(0.387)}{62}} = 0.190 \pm 0.157 = (0.033,\ 0.347)$$
• Step 4 – Interpretation
  – We are 95% confident that the percent needing social services is between 3.3 and 34.7 percentage points lower among those who attended preschool. The interval is wide because of the small sample sizes.
Also, our results may be questionable because the samples may not have been SRSs.

Significance Tests for p1 – p2
• An observed difference in sample proportions may reflect a real difference in the populations, or it may just be due to the variation of random sampling.
• Significance tests help us determine whether the difference we see is really there or is just chance variation.
• The null hypothesis will always say that there is no difference between the two populations. Hence
  H0: p1 = p2
• The alternative hypothesis will always say what kind of difference we expect.
• To conduct a significance test, we must standardize p̂1 – p̂2 to get a z statistic.
• If H0 is true, all the observations in both samples come from a single population.
• So, instead of estimating p1 and p2 separately, we combine the two samples and use the overall sample proportion to estimate the single population parameter p.
• We call this single proportion the combined sample proportion. It is
  $$\hat{p}_c = \frac{\text{count of successes in both samples combined}}{\text{count of individuals in both samples combined}} = \frac{X_1 + X_2}{n_1 + n_2}$$
• Now, we use p̂c in place of both p̂1 and p̂2 in the expression for the standard error of p̂1 – p̂2:
  $$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1 - \hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
• This yields a z statistic that has approximately the standard Normal distribution when H0 is true.

Cholesterol and Heart Attacks
• High levels of cholesterol in the blood are associated with higher risk of heart attacks. Does using a drug to lower blood cholesterol reduce heart attacks?
• The Helsinki Heart Study looked at this question by randomly assigning middle-aged men to one of two treatments: 2051 men took the drug gemfibrozil to reduce their cholesterol levels, and a control group of 2030 men took a placebo.
• During the next 5 years, 56 men in the gemfibrozil group and 84 men in the control group had heart attacks.
  Population  Population Description  Sample Size  Sample Proportion
  1           Gemfibrozil             n1 = 2051    p̂1 = 56/2051 = 0.0273
  2           Control                 n2 = 2030    p̂2 = 84/2030 = 0.0414
• Is the apparent benefit of gemfibrozil statistically significant?
• To answer this question, we need to conduct a significance test.
• To conduct a significance test we need p̂c. So let’s find it:
  $$\hat{p}_c = \frac{56 + 84}{2051 + 2030} = \frac{140}{4081} = 0.0343$$
• Step 1 – Hypotheses
  – We want to use this comparative randomized experiment to draw conclusions about p1, the proportion of middle-aged men who would suffer heart attacks after taking gemfibrozil, and p2, the proportion of middle-aged men who would suffer heart attacks if they only took a placebo. We hope to show that gemfibrozil reduces heart attacks, so we have a one-sided alternative.
    H0: p1 = p2 and Ha: p1 < p2
• Step 2 – Conditions
  – We are going to conduct a two-proportion z test.
  – SRS – Since the data come from a comparative randomized experiment, we meet this condition. This will allow us to conclude that the treatment caused the differences we observe. Since the men in the experiment were not randomly selected, we may not be able to generalize our results to the population of all middle-aged men.
  – Normality – We must use p̂c to check for Normality since we are assuming that both proportions are the same. So n1p̂c = 70.4, n1(1 – p̂c) = 1980.6, n2p̂c = 69.6, and n2(1 – p̂c) = 1960.4, which are all greater than 5.
  – Independence – Due to the random assignment of men, the two groups of men can be viewed as independent samples.
• Step 3 – Calculations
  $$z = \frac{0.0273 - 0.0414}{\sqrt{0.0343(0.9657)\left(\frac{1}{2051} + \frac{1}{2030}\right)}} = -2.47$$
• We believed gemfibrozil would decrease heart attacks, so we need the probability that z is less than or equal to -2.47. This P-value is 0.0068.
• Step 4 – Interpretation
  – Since our P-value (0.0068) is less than 0.01, our results are significant at the α = 0.01 significance level. So there is strong evidence that gemfibrozil reduced the rate of heart attacks.

Don’t Drink the Water
• The movie A Civil Action tells the story of a legal battle that took place in the small town of Woburn, Massachusetts. A town well that supplied water to East Woburn residents was contaminated by industrial chemicals. During the period that residents drank the water from this well, a sample of 414 births showed 16 birth defects.
On the west side of Woburn, a sample of 228 babies born during the same time period revealed 3 with birth defects. The plaintiffs suing the companies responsible for the contamination claimed that these data show that the rate of birth defects was significantly higher in East Woburn, where the contaminated well water was in use. How strong is the evidence supporting the claim? What decision should the judge make?
  Population  Population Description  Sample Size  Sample Proportion
  1           East Woburn             n1 = 414     p̂1 = 16/414 = 0.0386
  2           West Woburn             n2 = 228     p̂2 = 3/228 = 0.0132
• To conduct a significance test we need p̂c. So let’s find it:
  $$\hat{p}_c = \frac{16 + 3}{414 + 228} = \frac{19}{642} = 0.0296$$
• Step 1 – Hypotheses
  – We are interested in seeing whether the proportion of birth defects is higher in East Woburn (p1) than in West Woburn (p2), as the plaintiffs claim.
    H0: p1 = p2 and Ha: p1 > p2
• Step 2 – Conditions
  – We are going to conduct a two-proportion z test.
  – SRS – We don’t know that they are SRSs, but we will treat them as SRSs.
  – Normality – We must check our rules. Since n1p̂c = 12.25, n1(1 – p̂c) = 401.75, n2p̂c = 6.75, and n2(1 – p̂c) = 221.25 are each larger than 5, the sampling distribution is approximately Normal.
  – Independence – We must assume that both populations are at least 10 times as large as the samples of babies.
• Step 3 – Calculations
  $$z = \frac{0.0386 - 0.0132}{\sqrt{0.0296(0.9704)\left(\frac{1}{414} + \frac{1}{228}\right)}} = 1.82$$
  – The P-value is the probability that z would be 1.82 or greater, which is 0.0344.
• Step 4 – Interpretation
  – Since the P-value (0.0344) is smaller than the usual significance level of 0.05, we reject the null hypothesis and conclude that there is reason to believe that the proportion of birth defects was higher in East Woburn.
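
The two-proportion z tests in this section can also be checked away from the calculator. The following is a minimal Python sketch, not part of the original slides: the function name `two_proportion_z_test` and its `alternative` argument are hypothetical names chosen for illustration. It uses the combined sample proportion in the standard error and applies the same calculation to the Woburn and Helsinki Heart Study counts.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="greater"):
    """Two-proportion z test of H0: p1 = p2 (hypothetical helper).

    x1, x2 are success counts; n1, n2 are sample sizes.
    alternative is "greater" (p1 > p2), "less" (p1 < p2), or "two-sided".
    Returns the z statistic and the P-value.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Combined sample proportion, used because H0 says p1 = p2.
    p_c = (x1 + x2) / (n1 + n2)
    se = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se

    std_normal = NormalDist()  # standard Normal: mean 0, sd 1
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Woburn: 16 defects in 414 births (East) vs. 3 in 228 births (West).
z_woburn, p_woburn = two_proportion_z_test(16, 414, 3, 228, "greater")

# Helsinki Heart Study: 56 of 2051 (gemfibrozil) vs. 84 of 2030 (placebo).
z_helsinki, p_helsinki = two_proportion_z_test(56, 2051, 84, 2030, "less")

print(f"Woburn:   z = {z_woburn:.2f}, P-value = {p_woburn:.4f}")
print(f"Helsinki: z = {z_helsinki:.2f}, P-value = {p_helsinki:.4f}")
```

Because the sketch keeps full precision instead of rounding z to two decimal places before looking it up in a table, its P-values can differ from the slide values in the last digit or two.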