Chapter 11
Comparing Two Populations or Treatments

Suppose we have a population of adult men with a mean height of 71 inches and a standard deviation of 2.5 inches. We also have a population of adult women with a mean height of 65 inches and a standard deviation of 2.3 inches. Assume heights are normally distributed.

Suppose we take a random sample of 30 men and a random sample of 25 women from their respective populations and calculate the difference in their sample mean heights (men's mean height – women's mean height). If we did this many times, what would the distribution of differences be like?

Male heights: $\mu_M = 71$, $\sigma_M = 2.5$
Female heights: $\mu_F = 65$, $\sigma_F = 2.3$

• Suppose we took repeated samples of size n = 30 from the population of male heights and calculated the sample means. We would have the sampling distribution of $\bar{x}_M$, with $\mu_{\bar{x}_M} = 71$ and $\sigma_{\bar{x}_M} = \frac{2.5}{\sqrt{30}}$.
• Suppose we took repeated samples of size n = 25 from the population of female heights and calculated the sample means. We would have the sampling distribution of $\bar{x}_F$, with $\mu_{\bar{x}_F} = 65$ and $\sigma_{\bar{x}_F} = \frac{2.3}{\sqrt{25}}$.
• Randomly take one of the sample means for the males and one of the sample means for the females and find the difference in mean heights. Doing this repeatedly, we will create the sampling distribution of $\bar{x}_M - \bar{x}_F$, with

$$\mu_{\bar{x}_M - \bar{x}_F} = 71 - 65 = 6 \qquad \sigma_{\bar{x}_M - \bar{x}_F} = \sqrt{\frac{2.5^2}{30} + \frac{2.3^2}{25}}$$

Heights Continued . . .

• Describe the sampling distribution of the difference in mean heights between men and women.

The sampling distribution is normally distributed with

$$\mu_{\bar{x}_M - \bar{x}_F} = 71 - 65 = 6 \qquad \sigma_{\bar{x}_M - \bar{x}_F} = \sqrt{\frac{2.5^2}{30} + \frac{2.3^2}{25}} \approx 0.648$$

• What is the probability that the difference in mean heights of a random sample of 30 men and a random sample of 25 women is less than 5 inches?

$$P(\bar{x}_M - \bar{x}_F < 5) = P\left(z < \frac{5 - 6}{0.648}\right) \approx .0614$$

Properties of the Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

If the random samples on which $\bar{x}_1$ and $\bar{x}_2$ are based are selected independently of one another, then

1. $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2$
   The sampling distribution of $\bar{x}_1 - \bar{x}_2$ is always centered at the value of $\mu_1 - \mu_2$, so $\bar{x}_1 - \bar{x}_2$ is an unbiased statistic for estimating $\mu_1 - \mu_2$.
2. $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$ and $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
   The variance of the difference is the sum of the variances.
3. If $n_1$ and $n_2$ are both large or the population distributions are (at least approximately) normal, $\bar{x}_1$ and $\bar{x}_2$ each have (at least approximately) normal distributions. This implies that the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is also (approximately) normal.

The properties of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ imply that $\bar{x}_1 - \bar{x}_2$ can be standardized to obtain a variable with a sampling distribution that is approximately the standard normal (z) distribution.

When two random samples are independently selected and $n_1$ and $n_2$ are both large or the population distributions are (at least approximately) normal, the distribution of

$$z = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

is described (at least approximately) by the standard normal (z) distribution.

We must know $\sigma_1$ and $\sigma_2$ in order to use this procedure. If $\sigma_1$ and $\sigma_2$ are unknown, we must use t distributions.

Two-Sample t Test for Comparing Two Populations

Null Hypothesis: $H_0$: $\mu_1 - \mu_2$ = hypothesized value

The hypothesized value is often 0, but there are times when we are interested in testing for a difference that is not 0.

Test Statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

The appropriate df for the two-sample t test is

$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} \quad \text{where } V_1 = \frac{s_1^2}{n_1} \text{ and } V_2 = \frac{s_2^2}{n_2}$$

The computed number of df should be truncated to an integer.

A conservative estimate of the P-value can be found by using the t curve with the number of degrees of freedom equal to the smaller of $(n_1 - 1)$ or $(n_2 - 1)$.
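As a numerical check on the heights example earlier in this section, the short sketch below recomputes the mean and standard deviation of the sampling distribution of the difference in sample means and the probability that the difference is less than 5 inches. It uses only Python's standard library; the variable names are illustrative and not part of the original slides.

```python
from math import sqrt
from statistics import NormalDist

# Population parameters and sample sizes from the heights example.
mu_m, sigma_m, n_m = 71.0, 2.5, 30   # men
mu_f, sigma_f, n_f = 65.0, 2.3, 25   # women

# Properties 1 and 2: the mean is the difference of the means,
# and the variance is the sum of the two variances.
mean_diff = mu_m - mu_f
sd_diff = sqrt(sigma_m**2 / n_m + sigma_f**2 / n_f)

# Property 3: the difference is normally distributed, so P(diff < 5)
# is the area to the left of 5 under that normal curve.
prob_less_than_5 = NormalDist(mean_diff, sd_diff).cdf(5.0)

print(f"mean = {mean_diff:.1f}, sd = {sd_diff:.3f}, P(diff < 5) = {prob_less_than_5:.4f}")
```

The printed probability should agree with the .0614 reported above up to rounding.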
Null Hypothesis: $H_0$: $\mu_1 - \mu_2$ = hypothesized value

Alternative Hypothesis and P-value:
$H_a$: $\mu_1 - \mu_2 >$ hypothesized value. P-value: area under the appropriate t curve to the right of the computed t.
$H_a$: $\mu_1 - \mu_2 <$ hypothesized value. P-value: area under the appropriate t curve to the left of the computed t.
$H_a$: $\mu_1 - \mu_2 \neq$ hypothesized value. P-value: 2(area to the right of the computed t) if t is positive, or 2(area to the left of the computed t) if t is negative.

Another Way to Write Hypothesis Statements

When the hypothesized value is 0, we can rewrite these hypothesis statements:
$H_0$: $\mu_1 - \mu_2 = 0$ becomes $H_0$: $\mu_1 = \mu_2$
$H_a$: $\mu_1 - \mu_2 < 0$ becomes $H_a$: $\mu_1 < \mu_2$
$H_a$: $\mu_1 - \mu_2 > 0$ becomes $H_a$: $\mu_1 > \mu_2$
$H_a$: $\mu_1 - \mu_2 \neq 0$ becomes $H_a$: $\mu_1 \neq \mu_2$

Be sure to define BOTH $\mu_1$ and $\mu_2$!

Assumptions:
1) The two samples are independently selected random samples from the populations of interest.
2) The sample sizes are large (generally 30 or larger) or the population distributions are (at least approximately) normal.

When comparing two treatment groups, use the following assumptions:
1) Individuals or objects are randomly assigned to treatments (or vice versa).
2) The sample sizes are large (generally 30 or larger) or the treatment response distributions are approximately normal.

Are women still paid less than men for comparable work?

A study was carried out in which salary data was collected from a random sample of men and from a random sample of women who worked as purchasing managers and who were subscribers to Purchasing magazine. Annual salaries (in thousands of dollars) appear below (the actual sample sizes were much larger). Use $\alpha = .05$ to determine if there is convincing evidence that the mean annual salary for male purchasing managers is greater than the mean annual salary for female purchasing managers.

Men:    81  69  81  76  76  74  69  76  79  65
Women:  78  60  67  61  62  73  71  58  68  48

Salary War Continued . . .

State the hypotheses:
$H_0$: $\mu_1 - \mu_2 = 0$
$H_a$: $\mu_1 - \mu_2 > 0$
where $\mu_1$ = mean annual salary for male purchasing managers and $\mu_2$ = mean annual salary for female purchasing managers.

If we had defined $\mu_1$ as the mean salary for female purchasing managers and $\mu_2$ as the mean salary for male purchasing managers, then the correct alternative hypothesis would be that the difference in the means is less than 0.

Verify the assumptions:
1) We are given two independently selected random samples of male and female purchasing managers. Even though these are samples from subscribers of Purchasing magazine, the authors of the study believed it was reasonable to view the samples as representative of the populations of interest.
2) Since the sample sizes are small, we must determine whether it is plausible that the salary distributions for each of the two populations are approximately normal. Since the boxplots are reasonably symmetrical with no outliers, it is plausible that the population distributions are approximately normal.

[Figure: boxplots of the men's and women's salaries]

Compute the test statistic and P-value:

$$t = \frac{74.6 - 64.6 - 0}{\sqrt{\frac{5.4^2}{10} + \frac{8.6^2}{10}}} = 3.11$$

To find the P-value, first find the appropriate df:

$$df = \frac{(2.916 + 7.396)^2}{\frac{2.916^2}{9} + \frac{7.396^2}{9}} = 15.14$$

Truncate (round down) this value, so df = 15. Now find the area to the right of t = 3.11 in the t curve with df = 15.

P-value = .004, $\alpha = .05$

Since the P-value < $\alpha$, we reject $H_0$. There is convincing evidence that the mean salary for male purchasing managers is higher than the mean salary for female purchasing managers.

What potential type of error could we have made with this conclusion? Type I.
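The salary test above can be reproduced from the raw data. The following is a minimal sketch, not the authors' own computation: `welch_t` and `t_upper_tail` are hypothetical helper names, and the P-value is approximated by numerically integrating the t density with Simpson's rule so that no third-party library is required.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

def welch_t(sample1, sample2, hypothesized=0.0):
    """Return (t, df) for the two-sample t test; df is truncated to an integer."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1  # V1 = s1^2 / n1
    v2 = variance(sample2) / n2  # V2 = s2^2 / n2
    t = (mean(sample1) - mean(sample2) - hypothesized) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, int(df)

def t_upper_tail(t, df, steps=2000):
    """Area to the right of t under the t curve (Simpson's rule on [0, |t|])."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t if t >= 0 else 0.5 + area_0_to_t

# Annual salaries (thousands of dollars) from the example.
men = [81, 69, 81, 76, 76, 74, 69, 76, 79, 65]
women = [78, 60, 67, 61, 62, 73, 71, 58, 68, 48]

t_stat, df = welch_t(men, women)
p_value = t_upper_tail(t_stat, df)  # Ha: mu1 - mu2 > 0, so use the right tail
print(f"t = {t_stat:.2f}, df = {df}, P-value = {p_value:.4f}")
```

Because the sketch works from unrounded standard deviations, intermediate values can differ slightly from the hand calculation, but t, the truncated df, and the decision at the .05 level should match.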
The Two-Sample t Confidence Interval for the Difference Between Two Population or Treatment Means

The general formula for a confidence interval for $\mu_1 - \mu_2$ when
1) The two samples are independently selected random samples from the populations of interest
2) The sample sizes are large (generally 30 or larger) or the population distributions are (at least approximately) normal

is

$$(\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value})\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

The t critical value is based on

$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} \quad \text{where } V_1 = \frac{s_1^2}{n_1} \text{ and } V_2 = \frac{s_2^2}{n_2}$$

df should be truncated to an integer.

For a comparison of two treatments, use the following assumptions:
1) Individuals or objects are randomly assigned to treatments (or vice versa).
2) The sample sizes are large (generally 30 or larger) or the treatment response distributions are approximately normal.

In a study on food intake after sleep deprivation, men were randomly assigned to one of two treatment groups. The experimental group was required to sleep only 4 hours on each of two nights, while the control group was required to sleep 8 hours on each of two nights. The amount of food intake (Kcal) on the day following the two nights of sleep was measured. Compute a 95% confidence interval for the true difference in the mean food intake for the two sleeping conditions.

4-hour sleep:  3585  4470  3068  5338  2221  4791  4435  3187  3901  3868  3869  4878  3632  4518  3099
8-hour sleep:  4965  3918  1987  4993  5220  3653  3510  4100  5792  4547  3319  3336  4304  4057  3338

Find the mean and standard deviation for each treatment.

$\bar{x}_4 = 3924$, $s_4 = 829.67$
$\bar{x}_8 = 4069.27$, $s_8 = 952.90$

Food Intake Study Continued . . .

Verify the assumptions.
1) Men were randomly assigned to two treatment groups.
2) The assumption of normal response distributions is plausible because both boxplots are approximately symmetrical with no outliers.

[Figure: boxplots of food intake for the 4-hour and 8-hour sleep groups]

Calculate the interval.

$$(3924 - 4069.27) \pm 2.052\sqrt{\frac{829.67^2}{15} + \frac{952.90^2}{15}}$$

Here 2.052 is the t critical value for df = 27, which gives endpoints of about -814.7 and 524.2. The reported interval, (-814.1, 523.6), is consistent with a critical value based on the untruncated df of about 27.48, as statistical software would use.

Interpret the interval in context.

We are 95% confident that the true difference in the mean food intake for the two sleeping conditions is between -814.1 Kcal and 523.6 Kcal.

Based upon this interval, is there a significant difference in the mean food intake for the two sleeping conditions?

No. Since 0 is in the confidence interval, there is not convincing evidence that the mean food intake for the two sleep conditions is different.

Pooled t Test

• Used when the variances of the two populations are equal ($\sigma_1 = \sigma_2$)
• Combines information from both samples to create a "pooled" estimate of the common variance, which is used in place of the two sample standard deviations
• Is not widely used due to its sensitivity to any departure from the equal variance assumption

P-values computed using the pooled t procedure can be far from the actual P-value if the population variances are not equal.

When the population variances are equal, the pooled t procedure is better at detecting deviations from $H_0$ than the two-sample t test.

Suppose that an investigator wants to determine if regular aerobic exercise improves blood pressure. A random sample of people who jog regularly and a second random sample of people who do not exercise regularly are selected independently of one another. Can we conclude that the difference in mean blood pressure is attributed to jogging? What about other factors like weight?
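Before moving on to paired samples, the food intake interval from earlier in this section can be checked from the raw data. This is a sketch under two stated assumptions: the two trailing values in the data listing (3099 and 3338) belong to the 4-hour and 8-hour groups respectively, which reproduces the reported means, and 2.052 is taken as the tabled t critical value for 95% confidence with df = 27. The function name `welch_interval` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Food intake (Kcal), 15 men per treatment (see the assumption in the lead-in).
four_hour = [3585, 4470, 3068, 5338, 2221, 4791, 4435, 3187,
             3901, 3868, 3869, 4878, 3632, 4518, 3099]
eight_hour = [4965, 3918, 1987, 4993, 5220, 3653, 3510, 4100,
              5792, 4547, 3319, 3336, 4304, 4057, 3338]

def welch_interval(sample1, sample2, t_critical):
    """Return (lower, upper, df) for the two-sample t interval; df is not truncated."""
    n1, n2 = len(sample1), len(sample2)
    v1 = stdev(sample1) ** 2 / n1  # V1 = s1^2 / n1
    v2 = stdev(sample2) ** 2 / n2  # V2 = s2^2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    center = mean(sample1) - mean(sample2)
    margin = t_critical * sqrt(v1 + v2)
    return center - margin, center + margin, df

# Assumption: tabled critical value for 95% confidence, df = 27.
T_CRITICAL_95_DF27 = 2.052

lower, upper, df = welch_interval(four_hour, eight_hour, T_CRITICAL_95_DF27)
print(f"df = {df:.2f} (truncate to {int(df)}), interval = ({lower:.1f}, {upper:.1f})")
```

Either way the interval contains 0, so the conclusion about the two sleep conditions does not depend on which critical value is used.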
One way to avoid these difficulties (other factors, such as weight, that also affect blood pressure) would be to pair subjects by weight, then assign one member of each pair to jogging and the other to no exercise.

Summary of the Paired t test for Comparing Two Population or Treatment Means

Null Hypothesis: $H_0$: $\mu_d$ = hypothesized value

where $\mu_d$ is the mean of the differences in the paired observations. The hypothesized value is usually 0, meaning that there is no difference.

Test Statistic:

$$t = \frac{\bar{x}_d - \text{hypothesized value}}{\frac{s_d}{\sqrt{n}}}$$

where n is the number of sample differences and $\bar{x}_d$ and $s_d$ are the mean and standard deviation of the sample differences. This test is based on df = n – 1.

Alternative Hypothesis and P-value:
$H_a$: $\mu_d >$ hypothesized value. P-value: area to the right of the calculated t.
$H_a$: $\mu_d <$ hypothesized value. P-value: area to the left of the calculated t.
$H_a$: $\mu_d \neq$ hypothesized value. P-value: 2(area to the right of t) if t is positive, or 2(area to the left of t) if t is negative.

Assumptions:
1. The samples are paired.
2. The n sample differences can be viewed as a random sample from a population of differences.
3. The number of sample differences is large (generally at least 30) or the population distribution of differences is (at least approximately) normal.

Is this an example of paired samples?

An engineering association wants to see if there is a difference in the mean annual salary for electrical engineers and chemical engineers. A random sample of electrical engineers is surveyed about their annual income. Another random sample of chemical engineers is surveyed about their annual income.

No, there is no pairing of individuals. You have two independent samples.

Is this an example of paired samples?

A pharmaceutical company wants to test its new weight-loss drug. Before giving the drug to volunteers, company researchers weigh each person. After a month of using the drug, each person's weight is measured again.

Yes, you have two observations on each individual, resulting in paired data.
Can playing chess improve your memory?

In a study, students who had not previously played chess participated in a program in which they took chess lessons and played chess daily for 9 months. Each student took a memory test before starting the chess program and again at the end of the 9-month period.

Student       1     2     3     4     5     6     7     8     9    10    11    12
Pre-test    510   610   640   675   600   550   610   625   450   720   575   675
Post-test   850   790   850   775   700   775   700   850   690   775   540   680
Difference -340  -180  -210  -100  -100  -225   -90  -225  -240   -55    35    -5

First, find the differences: pre-test minus post-test.

State the hypotheses.
$H_0$: $\mu_d = 0$
$H_a$: $\mu_d < 0$
where $\mu_d$ is the mean memory score difference between students with no chess training and students who have completed chess training.

If we had subtracted post-test minus pre-test, then the alternative hypothesis would be that the mean difference is greater than 0.

Playing Chess Continued . . .

Verify the assumptions.
1) Although the sample of students is not a random sample, the investigator believed that it was reasonable to view the 12 sample differences as representative of all such differences.
2) A boxplot of the differences is approximately symmetrical with no outliers, so the assumption of normality is plausible.
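The differences and the paired t statistic for the chess data can be computed directly from the two rows of scores. The sketch below uses only Python's standard library and follows the pre-test minus post-test convention used above; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Memory test scores for the 12 students in the chess study.
pre_test = [510, 610, 640, 675, 600, 550, 610, 625, 450, 720, 575, 675]
post_test = [850, 790, 850, 775, 700, 775, 700, 850, 690, 775, 540, 680]

# Paired differences: pre-test minus post-test, one per student.
differences = [pre - post for pre, post in zip(pre_test, post_test)]

n = len(differences)
mean_d = mean(differences)   # x̄_d
sd_d = stdev(differences)    # s_d (sample standard deviation)

# Paired t statistic for H0: mu_d = 0, based on df = n - 1.
hypothesized = 0.0
t_stat = (mean_d - hypothesized) / (sd_d / sqrt(n))
df = n - 1

print(f"mean = {mean_d:.1f}, sd = {sd_d:.2f}, t = {t_stat:.2f}, df = {df}")
```

Note that the test is carried out on the single list of differences, not on the two original samples, which is what distinguishes it from the two-sample t test.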
Compute the test statistic and P-value.

Test Statistic:

$$t = \frac{-144.6 - 0}{\frac{109.74}{\sqrt{12}}} = -4.56$$

P-value ≈ 0, df = 11, $\alpha = .05$

State the conclusion in context.

Since the P-value < $\alpha$, we reject $H_0$. There is convincing evidence to suggest that the mean memory score after chess training is higher than the mean memory score before training.

Paired t Confidence Interval for $\mu_d$

When
1. The samples are paired.
2. The n sample differences can be viewed as a random sample from a population of differences.
3. The number of sample differences is large (generally at least 30) or the population distribution of differences is (at least approximately) normal.

the paired t interval for $\mu_d$ is

$$\bar{x}_d \pm (t \text{ critical value})\frac{s_d}{\sqrt{n}}$$

where df = n – 1.

Playing Chess Revisited . . .

Compute a 90% confidence interval for the true mean difference in memory scores before chess training and the memory scores after chess training.

$$-144.6 \pm 1.796\left(\frac{109.74}{\sqrt{12}}\right) = (-201.5, -87.69)$$

We are 90% confident that the true mean difference in memory scores before chess training and the memory scores after chess training is between -201.5 and -87.69.

Large-Sample Inferences Concerning the Difference Between Two Population or Treatment Proportions

Some people seem to think that duct tape can fix anything . . . even remove warts! Investigators at Madigan Army Medical Center tested using duct tape to remove warts versus the more traditional freezing treatment.
Suppose that the duct tape treatment will successfully remove 50% of warts and that the traditional freezing treatment will successfully remove 60% of warts. Let's investigate the sampling distribution of $\hat{p}_{freeze} - \hat{p}_{tape}$.

$p_{freeze}$ = the true proportion of warts that are successfully removed by freezing; $p_{freeze} = .6$
$p_{tape}$ = the true proportion of warts that are successfully removed by using duct tape; $p_{tape} = .5$

• Suppose we repeatedly treated 100 warts using the duct tape method and calculated the proportion of warts that are successfully removed. We would have the sampling distribution of $\hat{p}_{tape}$, with $\mu_{\hat{p}_{tape}} = .5$ and $\sigma_{\hat{p}_{tape}} = \sqrt{\frac{.5(.5)}{100}}$.
• Suppose we repeatedly treated 100 warts using the traditional freezing treatment and calculated the proportion of warts that are successfully removed. We would have the sampling distribution of $\hat{p}_{freeze}$, with $\mu_{\hat{p}_{freeze}} = .6$ and $\sigma_{\hat{p}_{freeze}} = \sqrt{\frac{.6(.4)}{100}}$.
• Randomly take one of the sample proportions for the freezing treatment and one of the sample proportions for the duct tape treatment and find the difference. Doing this repeatedly, we will create the sampling distribution of $\hat{p}_{freeze} - \hat{p}_{tape}$, with

$$\mu_{\hat{p}_{freeze} - \hat{p}_{tape}} = .6 - .5 = .1 \qquad \sigma_{\hat{p}_{freeze} - \hat{p}_{tape}} = \sqrt{\frac{.6(.4)}{100} + \frac{.5(.5)}{100}}$$

Properties of the Sampling Distribution of $\hat{p}_1 - \hat{p}_2$

If two random samples are selected independently of one another, the following properties hold:

1. $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$
   This says that the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is centered at $p_1 - p_2$, so $\hat{p}_1 - \hat{p}_2$ is an unbiased statistic for estimating $p_1 - p_2$.
2. $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
3. If both $n_1$ and $n_2$ are large (that is, if $n_1p_1 > 10$, $n_1(1 - p_1) > 10$, $n_2p_2 > 10$, and $n_2(1 - p_2) > 10$), then $\hat{p}_1$ and $\hat{p}_2$ each have a sampling distribution that is approximately normal, and their difference $\hat{p}_1 - \hat{p}_2$ also has a sampling distribution that is approximately normal.

When performing a hypothesis test, we will use the null hypothesis that $p_1$ and $p_2$ are equal. We will not know the common value for $p_1$ and $p_2$. Since $p_1$ and $p_2$ are unknown, we will combine $\hat{p}_1$ and $\hat{p}_2$ to estimate the common value of $p_1$ and $p_2$. Use:

$$\hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$

Summary of Large-Sample z Test for $p_1 - p_2 = 0$

Null Hypothesis: $H_0$: $p_1 - p_2 = 0$

Test Statistic:

$$z = \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \frac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}} \quad \text{where } p_1 - p_2 = 0 \text{ under } H_0 \text{ and } \hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$

Alternative Hypothesis and P-value:
$H_a$: $p_1 - p_2 > 0$. P-value: area to the right of the calculated z.
$H_a$: $p_1 - p_2 < 0$. P-value: area to the left of the calculated z.
$H_a$: $p_1 - p_2 \neq 0$. P-value: 2(area to the right of z) if z is positive, or 2(area to the left of z) if z is negative.

Another Way to Write Hypothesis Statements:
$H_0$: $p_1 - p_2 = 0$ becomes $H_0$: $p_1 = p_2$
$H_a$: $p_1 - p_2 > 0$ becomes $H_a$: $p_1 > p_2$
$H_a$: $p_1 - p_2 < 0$ becomes $H_a$: $p_1 < p_2$
$H_a$: $p_1 - p_2 \neq 0$ becomes $H_a$: $p_1 \neq p_2$

Be sure to define both $p_1$ and $p_2$!

Assumptions:
1) The samples are independently chosen random samples or treatments were assigned at random to individuals or objects.
2) Both sample sizes are large: $n_1\hat{p}_1 > 10$, $n_1(1 - \hat{p}_1) > 10$, $n_2\hat{p}_2 > 10$, $n_2(1 - \hat{p}_2) > 10$

Since $p_1$ and $p_2$ are unknown, we must use $\hat{p}_1$ and $\hat{p}_2$ to verify that the samples are large enough.

Investigators at Madigan Army Medical Center tested using duct tape to remove warts. Patients with warts were randomly assigned to either the duct tape treatment or to the more traditional freezing treatment. Those in the duct tape group wore duct tape over the wart for 6 days, then removed the tape, soaked the area in water, and used an emery board to scrape the area. This process was repeated for a maximum of 2 months or until the wart was gone. The data follow:

Treatment                   n     Number with wart successfully removed
Liquid nitrogen freezing    100   60
Duct tape                   104   88

Do these data suggest that freezing is less successful than duct tape in removing warts?
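The large-sample z test summarized above can be applied to the wart data with a few lines of code. This is a minimal sketch rather than a definitive implementation: `pooled_z_test` and `large_enough` are hypothetical helper names, and the calculation uses unrounded proportions, so z can differ slightly from a hand calculation based on rounded values.

```python
from math import sqrt
from statistics import NormalDist

def pooled_z_test(successes1, n1, successes2, n2):
    """Return (z, p_hat_c) for H0: p1 - p2 = 0 using the combined estimate."""
    p_hat1 = successes1 / n1
    p_hat2 = successes2 / n2
    # Combined estimate: (n1*p_hat1 + n2*p_hat2) / (n1 + n2).
    p_hat_c = (successes1 + successes2) / (n1 + n2)
    se = sqrt(p_hat_c * (1 - p_hat_c) / n1 + p_hat_c * (1 - p_hat_c) / n2)
    return (p_hat1 - p_hat2) / se, p_hat_c

def large_enough(successes, n, threshold=10):
    """Check n*p_hat > 10 and n*(1 - p_hat) > 10 using observed counts."""
    return successes > threshold and (n - successes) > threshold

# Sample 1: liquid nitrogen freezing; sample 2: duct tape.
z, p_hat_c = pooled_z_test(60, 100, 88, 104)

# Ha: p1 - p2 < 0, so the P-value is the area to the left of z.
p_value = NormalDist().cdf(z)

print(f"p_hat_c = {p_hat_c:.3f}, z = {z:.2f}, P-value = {p_value:.5f}")
```

Counting successes and failures directly, as `large_enough` does, is equivalent to checking the four sample-size conditions with the sample proportions.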
$H_0$: $p_1 - p_2 = 0$
$H_a$: $p_1 - p_2 < 0$
where $p_1$ is the true proportion of warts that would be successfully removed by freezing and $p_2$ is the true proportion of warts that would be successfully removed by duct tape.

Assumptions:
1) Subjects were randomly assigned to the two treatments.
2) The sample sizes are large enough because:
$n_1\hat{p}_1 = 100(.6) = 60 > 10$
$n_1(1 - \hat{p}_1) = 100(.4) = 40 > 10$
$n_2\hat{p}_2 = 104(.846) = 88 > 10$
$n_2(1 - \hat{p}_2) = 104(.154) = 16 > 10$

$$\hat{p}_c = \frac{60 + 88}{100 + 104} = .73$$

$$z = \frac{.6 - .85 - 0}{\sqrt{\frac{.73(.27)}{100} + \frac{.73(.27)}{104}}} \approx -4.02$$

(With unrounded proportions, z ≈ -3.94; the conclusion is the same.)

P-value ≈ 0, $\alpha = .01$

Since the P-value < $\alpha$, we reject $H_0$. There is convincing evidence to suggest the proportion of warts successfully removed is lower for freezing than for the duct tape treatment.

A Large-Sample Confidence Interval for $p_1 - p_2$

When
1) The samples are independently chosen random samples or treatments were assigned at random to individuals or objects
2) Both sample sizes are large: $n_1\hat{p}_1 > 10$, $n_1(1 - \hat{p}_1) > 10$, $n_2\hat{p}_2 > 10$, $n_2(1 - \hat{p}_2) > 10$

a large-sample confidence interval for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm (z \text{ critical value})\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

The article "Freedom of What?" (Associated Press, February 1, 2005) described a study in which high school students and high school teachers were asked whether they agreed with the following statement: "Students should be allowed to report controversial issues in their student newspapers without the approval of school authorities." It was reported that 58% of students surveyed and 39% of teachers surveyed agreed with the statement. The two samples – 10,000 high school students and 8000 high school teachers – were selected from schools across the country.
Compute a 90% confidence interval for the difference in the proportion of students who agreed with the statement and the proportion of teachers who agreed with the statement.

Newspaper Problem Continued . . .

$\hat{p}_1 = .58$, $\hat{p}_2 = .39$

1) Assume that it is reasonable to regard these two samples as being independently selected and representative of the populations of interest.
2) Both sample sizes are large enough:
$n_1\hat{p}_1 = 10000(.58) > 10$, $n_1(1 - \hat{p}_1) = 10000(.42) > 10$,
$n_2\hat{p}_2 = 8000(.39) > 10$, $n_2(1 - \hat{p}_2) = 8000(.61) > 10$

$$(.58 - .39) \pm 1.645\sqrt{\frac{.58(.42)}{10000} + \frac{.39(.61)}{8000}} = (.178, .202)$$

We are 90% confident that the difference in the proportion of students who agreed with the statement and the proportion of teachers who agreed with the statement is between .178 and .202.

Based on this confidence interval, does there appear to be a significant difference in the proportion of students who agreed with the statement and the proportion of teachers who agreed with the statement? Explain.
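The newspaper interval can be reproduced with a short sketch of the large-sample interval for a difference in proportions. The function name `two_proportion_interval` is illustrative, and the z critical value is obtained from the standard normal distribution in Python's standard library rather than from a table.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(p_hat1, n1, p_hat2, n2, confidence):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    # Two-sided critical value, e.g. about 1.645 for 90% confidence.
    z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    diff = p_hat1 - p_hat2
    return diff - z_critical * se, diff + z_critical * se

# Students: 58% of 10,000 agreed; teachers: 39% of 8000 agreed.
lower, upper = two_proportion_interval(0.58, 10_000, 0.39, 8_000, 0.90)
print(f"90% interval for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

Unlike the z test, the interval uses the two sample proportions separately in the standard error, because no assumption that $p_1 = p_2$ is being made.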