Stats: Modeling the World – Chapter 22
Chapter 22: Comparing Two Proportions
For independent random variables, the standard deviation of the difference of two proportions is
_________________________.
Example:
In a study, researchers wanted to know how much a male driver's seatbelt use differs depending on
whether the passenger is male or female. The study observed 4208 male drivers with a female
passenger, of whom 2777 wore their seatbelts, and 2763 male drivers with a male passenger, of whom
1363 wore their seatbelts.
Create a 95% confidence interval for the difference in proportions.
In hypothesis testing, we would set our null hypothesis to be ________ or _________.
Our hypothesis is about a new parameter, the _______________ in proportions.
We will need a standard error for that – do we already know this? ______ and ______.
We know the standard error of the difference in proportions is
$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$.
The null hypothesis states these proportions are _________. To do a hypothesis test, we ____________
the null hypothesis is true, so there should be just a ______________ value of ____ in the SE formula.
Combining the counts like this to get an overall proportion is called _____________.
We find the combined proportion by doing
$\hat{p}_{pooled} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$.
The standard error for the pooled proportion is
$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_{pooled} \hat{q}_{pooled}}{n_1} + \frac{\hat{p}_{pooled} \hat{q}_{pooled}}{n_2}}$.
Is there really a difference in seatbelt use among male drivers based on the gender of the
passenger?
Chapter 22: Comparing Two Proportions
For independent random variables, the standard deviation of the difference of two proportions is
$SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$, where $q = 1 - p$ for each group.
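The formula follows from the rule that variances of independent random variables add, even when the variables themselves are subtracted. A short sketch of that reasoning, using the same symbols as above (the $\mathrm{Var}$ notation is introduced here for illustration):

```latex
\begin{aligned}
\mathrm{Var}(\hat{p}_1 - \hat{p}_2)
  &= \mathrm{Var}(\hat{p}_1) + \mathrm{Var}(\hat{p}_2)
   = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}, \\
SD(\hat{p}_1 - \hat{p}_2)
  &= \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}.
\end{aligned}
```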
Example:
In a study, researchers wanted to know how much a male driver's seatbelt use differs depending on
whether the passenger is male or female. The study observed 4208 male drivers with a female
passenger, of whom 2777 wore their seatbelts, and 2763 male drivers with a male passenger, of whom
1363 wore their seatbelts.
Create a 95% confidence interval for the difference in proportions.
$p_1$: The proportion of male drivers who wear a seatbelt when next to a female passenger.
$p_2$: The proportion of male drivers who wear a seatbelt when next to a male passenger.
$p_1 - p_2$: The difference between the proportion of male drivers who wear a seatbelt with a female
passenger and the proportion who wear one with a male passenger.
Independence assumption: driver behavior is independent from car to car.
Randomization condition: the sample is random.
10% condition: the samples include far fewer than 10% of all male drivers accompanied by male or
female passengers.
Independent groups assumption: there is no reason to believe that seatbelt use among drivers with
male passengers is related to seatbelt use among drivers with female passengers.
Success/Failure condition: Among the male drivers with female passengers, 2777 wore seatbelts and
1431 did not; of those driving with male passengers, 1363 wore seatbelts and 1400 did not. Each group
contained far more than 10 successes and 10 failures.
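The Success/Failure check above is simple arithmetic on the counts. A minimal sketch of it in Python, assuming the threshold of 10 stated above; the helper name `success_failure_ok` is made up for this illustration:

```python
def success_failure_ok(successes, n, minimum=10):
    """Return True when both successes and failures reach the minimum count."""
    failures = n - successes
    return successes >= minimum and failures >= minimum

# (seatbelt wearers, group size) for each passenger group in the example
groups = {
    "female passenger": (2777, 4208),
    "male passenger": (1363, 2763),
}

results = {}
for name, (wore, total) in groups.items():
    results[name] = success_failure_ok(wore, total)
    print(f"{name}: {wore} successes, {total - wore} failures, ok={results[name]}")
```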
Under these conditions, the sampling distribution of the difference between the sample proportions is
approximately Normal. We can construct a two-proportion z-interval with 95% confidence.
$(\hat{p}_1 - \hat{p}_2) \pm z^* \times \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$
$= (0.660 - 0.493) \pm 1.96 \sqrt{\frac{(0.660)(0.340)}{4208} + \frac{(0.493)(0.507)}{2763}}$
$\approx (0.143, 0.191)$
I am 95% confident that the proportion of male drivers who wear seatbelts when driving next to a
female passenger is between 14.3 and 19.1 percentage points higher than the proportion who wear
seatbelts when driving next to a male passenger.
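The interval above can be reproduced with a few lines of Python. This is a sketch under the same conditions, not a definitive implementation; the function name `two_proportion_z_interval` and its parameters are chosen for this illustration, and `z_star=1.96` is the 95% critical value used in the worked example.

```python
import math

def two_proportion_z_interval(x1, n1, x2, n2, z_star=1.96):
    """Two-proportion z-interval for p1 - p2 using the unpooled standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # SE(p1_hat - p2_hat) = sqrt(p1*q1/n1 + p2*q2/n2) with sample proportions
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    margin = z_star * se
    return diff - margin, diff + margin

# Seatbelt data: female-passenger group first, male-passenger group second
low, high = two_proportion_z_interval(2777, 4208, 1363, 2763)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```

Because the code carries the unrounded proportions through the calculation, its endpoints can differ in the third decimal place from the hand-worked figures, which round each proportion to three decimals first.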
In hypothesis testing, we would set our null hypothesis to be $p_1 = p_2$ or $p_1 - p_2 = 0$.
Our hypothesis is about a new parameter, the difference in proportions.
We will need a standard error for that – do we already know this? Yes and no.
We know the standard error of the difference in proportions is
$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$.
The null hypothesis states these proportions are equal. To do a hypothesis test, we assume the null
hypothesis is true, so there should be just a single value of $\hat{p}$ in the SE formula.
Combining the counts like this to get an overall proportion is called pooling.
We find the combined proportion by doing
$\hat{p}_{pooled} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$.
The standard error for the pooled proportion is
$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_{pooled} \hat{q}_{pooled}}{n_1} + \frac{\hat{p}_{pooled} \hat{q}_{pooled}}{n_2}}$.
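Since $n_1 \hat{p}_1$ is just the number of successes in group 1, the pooled proportion is the total number of successes divided by the total sample size. The sketch below checks that equivalence on the seatbelt counts and compares the pooled and unpooled standard errors; all variable names are chosen for this illustration.

```python
import math

# Counts from the seatbelt example (female-passenger group first)
x1, n1 = 2777, 4208
x2, n2 = 1363, 2763

p1_hat, p2_hat = x1 / n1, x2 / n2

# Weighted-average form of the pooled proportion
p_pooled_weighted = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
# Equivalent count form: total successes over total sample size
p_pooled_counts = (x1 + x2) / (n1 + n2)

q_pooled = 1 - p_pooled_counts
# Pooled SE: one shared proportion in both terms (used under the null hypothesis)
se_pooled = math.sqrt(p_pooled_counts * q_pooled / n1 + p_pooled_counts * q_pooled / n2)
# Unpooled SE: each group keeps its own sample proportion (used for the interval)
se_unpooled = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

print(f"pooled proportion: {p_pooled_counts:.4f}")
print(f"pooled SE: {se_pooled:.5f}, unpooled SE: {se_unpooled:.5f}")
```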
Is there really a difference in seatbelt use among male drivers based on the gender of the
passenger?
$H_0: p_1 = p_2$
$H_A: p_1 \neq p_2$
All assumptions and conditions were met previously with the confidence interval, so a two-proportion
z-test can be used.
$\hat{p}_{pooled} = \frac{Success_1 + Success_2}{n_1 + n_2} = \frac{2777 + 1363}{4208 + 2763} = \frac{4140}{6971} \approx 0.594$
$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{(0.594)(0.406)}{4208} + \frac{(0.594)(0.406)}{2763}} \approx 0.0120$
$z = \frac{(0.660 - 0.493) - 0}{0.0120} \approx 13.8874$ (computed with the unrounded standard error, about 0.012025)
With a z-value this large, the corresponding P-value is $1.2096 \times 10^{-43}$, or essentially 0. If the
null hypothesis were true, the probability of getting a difference in the proportions as extreme as
the one we observed would be essentially 0. There is enough evidence to suggest that the proportion of
male drivers who wear a seatbelt while next to a female passenger is significantly higher than while
next to a male passenger.
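The test can be sketched in Python using only the standard library. The function name `two_proportion_z_test` is chosen for this illustration, and the two-sided P-value is computed from the Normal model as `erfc(|z| / sqrt(2))`, which equals twice the upper-tail area.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion z-test of H0: p1 = p2 with a pooled SE."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 both groups share one proportion, so pool the counts
    p_pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat - 0) / se_pooled
    # Two-sided P-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p_value = two_proportion_z_test(2777, 4208, 1363, 2763)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.3e}")
```

The code uses unrounded proportions throughout, so its z-value comes out slightly below the hand-worked 13.8874, which rounds the difference to 0.167 first. Either way the P-value is vanishingly small and the conclusion is unchanged.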