Stats: Modeling the World – Chapter 22

Chapter 22: Comparing Two Proportions

For independent random variables, the standard deviation of the difference of two proportions is _________________________.

Example: In a study, researchers wanted to know how much a male driver's seatbelt use depends on whether the passenger is male or female. In the study, there were 4208 male drivers with a female passenger, of whom 2777 wore their seatbelt, and 2763 male drivers with a male passenger, of whom 1363 wore their seatbelt. Create a 95% confidence interval.

In hypothesis testing, we would set our null hypothesis to be ________ or _________. Our hypothesis is about a new parameter, the _______________ in proportions. We will need a standard error for that – do we already know this? ______ and ______. We know the standard error of the difference in proportions is

$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}.$$

The null hypothesis states these proportions are _________. To do a hypothesis test, we ____________ the null hypothesis is true, so there should be just a ______________ value of ____ in the SE formula. Combining the counts like this to get an overall proportion is called _____________. We find the combined proportion by computing

$$\hat{p}_{pooled} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}.$$

The standard error for the pooled proportion is

$$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_{pooled}\,\hat{q}_{pooled}}{n_1} + \frac{\hat{p}_{pooled}\,\hat{q}_{pooled}}{n_2}}.$$

Is there really a difference in a male driver's seatbelt use based on the gender of the passenger?

Chapter 22: Comparing Two Proportions

For independent random variables, the standard deviation of the difference of two proportions is

$$SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}},$$

where $q_i = 1 - p_i$.

Example: In a study, researchers wanted to know how much a male driver's seatbelt use depends on whether the passenger is male or female.
In the study, there were 4208 male drivers with a female passenger, of whom 2777 wore their seatbelt, and 2763 male drivers with a male passenger, of whom 1363 wore their seatbelt. Create a 95% confidence interval.

$p_1$: The proportion of male drivers who wear a seatbelt when next to a female passenger.
$p_2$: The proportion of male drivers who wear a seatbelt when next to a male passenger.
$p_1 - p_2$: The difference between the proportion of male drivers who wear a seatbelt with a female passenger and the proportion who wear one with a male passenger.

Independence assumption: driver behavior is independent from car to car.
Randomization condition: the sample is random.
10% condition: the samples include far fewer than 10% of all male drivers accompanied by male or female passengers.
Independent groups assumption: there is no reason to believe that seatbelt use among drivers with male passengers and seatbelt use among drivers with female passengers are not independent.
Success/Failure condition: Among the male drivers with female passengers, 2777 wore seatbelts and 1431 did not; of those driving with male passengers, 1363 wore seatbelts and 1400 did not. Each group contained far more than 10 successes and 10 failures.

Under these conditions, the sampling distribution of the difference between the sample proportions is approximately Normal. We can construct a two-proportion z-interval with 95% confidence.

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \times \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} = (0.6599 - 0.4933) \pm (1.96)\sqrt{\frac{(0.6599)(0.3401)}{4208} + \frac{(0.4933)(0.5067)}{2763}} \approx (0.143,\ 0.190)$$

I am 95% confident that the proportion of male drivers who wear seatbelts when driving next to a female passenger is between 14.3 and 19.0 percentage points higher than the proportion who wear seatbelts when driving next to a male passenger.

In hypothesis testing, we would set our null hypothesis to be $p_1 = p_2$ or $p_1 - p_2 = 0$. Our hypothesis is about a new parameter, the difference in proportions.
We will need a standard error for that – do we already know this? Yes and no. We know the standard error of the difference in proportions is

$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}.$$

The null hypothesis states these proportions are equal. To do a hypothesis test, we assume the null hypothesis is true, so there should be just a single value of $\hat{p}$ in the SE formula. Combining the counts like this to get an overall proportion is called pooling. We find the combined proportion by computing

$$\hat{p}_{pooled} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}.$$

The standard error for the pooled proportion is

$$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_{pooled}\,\hat{q}_{pooled}}{n_1} + \frac{\hat{p}_{pooled}\,\hat{q}_{pooled}}{n_2}}.$$

Is there really a difference in a male driver's seatbelt use based on the gender of the passenger?

$H_0: p_1 = p_2$
$H_A: p_1 \neq p_2$

All assumptions and conditions were met previously with the confidence interval, so a two-proportion z-test can be used.

$$\hat{p}_{pooled} = \frac{Success_1 + Success_2}{n_1 + n_2} = \frac{2777 + 1363}{4208 + 2763} = \frac{4140}{6971} \approx 0.594$$

$$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{(0.594)(0.406)}{4208} + \frac{(0.594)(0.406)}{2763}} \approx 0.0120$$

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE_{pooled}(\hat{p}_1 - \hat{p}_2)} = \frac{(0.6599 - 0.4933) - 0}{0.01203} \approx 13.86 \quad \text{(computed from unrounded values)}$$

With a z-value this large, the corresponding P-value is about $1.2 \times 10^{-43}$, which is essentially 0. If the null hypothesis were true, the probability of getting a difference in the proportions as extreme as what we observed would be essentially 0. There is enough evidence to suggest that the proportion of male drivers who wear a seatbelt while next to a female passenger is significantly higher than the proportion who do so while next to a male passenger.
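
The two calculations in these notes (the two-proportion z-interval and the pooled two-proportion z-test) can be checked with a short script. This is only a sketch using Python's standard library; the function names `two_prop_z_interval` and `two_prop_z_test` are made up for illustration and are not part of the notes or of any statistics package. It works from the raw counts, so its results may differ in the last decimal place from hand calculations that round the sample proportions first.

```python
import math
from statistics import NormalDist


def two_prop_z_interval(x1, n1, x2, n2, conf=0.95):
    """Confidence interval for p1 - p2 (hypothetical helper, unpooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    # Standard error of the difference, using each group's own proportion.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Critical value z* for the requested confidence level (about 1.96 for 95%).
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided test of H0: p1 = p2 (hypothetical helper, pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 the groups share one proportion, so pool the counts.
    p_pool = (x1 + x2) / (n1 + n2)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pool
    # Two-sided P-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Seatbelt data: group 1 = female passenger, group 2 = male passenger.
low, high = two_prop_z_interval(2777, 4208, 1363, 2763)
z, p_value = two_prop_z_test(2777, 4208, 1363, 2763)

print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
print(f"z = {z:.2f}, two-sided P-value = {p_value:.3g}")
```

The interval uses the unpooled standard error because no assumption is made that the proportions are equal, while the test uses the pooled standard error because the null hypothesis assumes they are.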