Confidence Intervals for the Difference between Two Population

Confidence Intervals for
p1 - p2 and µ1 - µ2
Selected Sections of Chapters 22 and 24
Inference about Two Populations
• We are interested in:
– Confidence intervals for the difference between two
proportions.
– Confidence intervals for the difference between two
means.
Confidence Intervals for the difference
p1 – p2 between two population
proportions
• In this section we deal with two populations whose data
are qualitative.
• For qualitative data we compare the population
proportions of the occurrence of a certain event.
• Examples
– Comparing the effectiveness of a new drug versus an older one
– Comparing market share before and after an advertising
campaign
– Comparing defective rates between two machines
Parameter and Statistic
• Parameter
– When the data are qualitative, we can only count the
occurrences of a certain event in the two
populations, and calculate proportions.
– The parameter we want to estimate is p1 – p2.
• Statistic
– An estimator of p1 – p2 is p̂1 – p̂2 (the difference
between the sample proportions).
Point Estimator: p̂1 – p̂2
• Two random samples are drawn from two populations.
• The number of successes in each sample is recorded.
• The sample proportions are computed.
Sample 1
– Sample size: n1
– Number of successes: x1
– Sample proportion: p̂1 = x1 / n1
Sample 2
– Sample size: n2
– Number of successes: x2
– Sample proportion: p̂2 = x2 / n2
Confidence Interval for p1 – p2
\[
(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}
\]
where z* is the appropriate value from the z-table that depends on the
confidence level.
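The interval formula for p1 – p2 can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `two_proportion_ci` and the counts in the usage lines are hypothetical, and z* is taken from the standard library's normal distribution instead of a printed z-table.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample z interval for p1 - p2 (illustrative helper)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # z* leaves (1 - confidence) / 2 probability in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Hypothetical counts, chosen only to exercise the function
low, high = two_proportion_ci(x1=60, n1=200, x2=45, n2=200)
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

For a 95% confidence level the computed z* is approximately 1.96, the value used in the worked examples.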
Example: confidence interval for p1 – p2
• Estimating the cost of life saved
– Two drugs are used to treat heart attack victims:
• Streptokinase (available since 1959, costs $460)
• t-PA (genetically engineered, costs $2900).
– The maker of t-PA claims that its drug outperforms
Streptokinase.
– An experiment was conducted in 15 countries.
• 20,500 patients were given t-PA
• 20,500 patients were given Streptokinase
• The number of deaths by heart attacks was recorded.
Example: confidence interval for p1 – p2
(cont.)
• Experiment results
– A total of 1497 patients treated with Streptokinase
died.
– A total of 1292 patients treated with t-PA died.
• Estimate the difference in the death rates when
using Streptokinase and when using t-PA.
Example: confidence interval for p1 – p2
(cont.)
• Solution
– The problem objective: Compare the outcomes of
two treatments.
– The data are qualitative (a patient lived or died)
– The parameter to be estimated is p1 – p2.
• p1 = death rate with Streptokinase
• p2 = death rate with t-PA
Example: confidence interval for p1 – p2
(cont.)
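• Compute: Manually
– Sample proportions:
\[
\hat{p}_1 = \frac{1497}{20500} = .0730, \qquad \hat{p}_2 = \frac{1292}{20500} = .0630
\]
– The 95% confidence interval estimate is
\[
\begin{aligned}
(\hat{p}_1 - \hat{p}_2) &\pm 1.96 \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \\
= .0730 - .0630 &\pm 1.96 \sqrt{\frac{.0730(1-.0730)}{20500} + \frac{.0630(1-.0630)}{20500}} \\
= .0100 &\pm .0049
\end{aligned}
\]
that is, (.0051, .0149).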
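The manual computation for the Streptokinase and t-PA data can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and 1.96 is the z* value the slides use for 95% confidence.

```python
import math

# Data from the experiment: deaths and sample sizes
n1 = n2 = 20500
deaths_streptokinase = 1497
deaths_tpa = 1292

p1_hat = deaths_streptokinase / n1   # about .0730
p2_hat = deaths_tpa / n2             # about .0630

# Standard error of p1_hat - p2_hat, then the 95% margin of error
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
margin = 1.96 * se
diff = p1_hat - p2_hat

print(f"difference = {diff:.4f}, margin = {margin:.4f}")
print(f"95% CI: ({diff - margin:.4f}, {diff + margin:.4f})")
```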
Example: confidence interval for p1 – p2
(cont.)
• Interpretation
– The interval (.0051, .0149) for p1 – p2 does not
contain 0; it is entirely positive, which indicates that
p1, the death rate for streptokinase, is greater than
p2, the death rate for t-PA.
– We estimate that the death rate for streptokinase is
between .51 and 1.49 percentage points higher than the death rate
for t-PA.
Example: 95% confidence interval for p1 – p2
The age at which a woman gives birth to her first child may be an
important factor in the risk of later developing breast cancer. An
international study conducted by WHO selected women with at least one
birth and recorded if they had breast cancer or not and whether they had
their first child before their 30th birthday or after.
                            Cancer   Sample size   Sample proportion
Age at first birth > 30     683      3220          p̂1 = 21.2%
Age at first birth <= 30    1498     10,245        p̂2 = 14.6%
The parameter to be estimated is p1 – p2.
p1 = cancer rate when age at 1st birth > 30
p2 = cancer rate when age at 1st birth <= 30
\[
\begin{aligned}
(\hat{p}_1 - \hat{p}_2) &\pm 1.96 \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \\
= (.212 - .146) &\pm 1.96 \sqrt{\frac{.212(.788)}{3220} + \frac{.146(.854)}{10{,}245}} \\
= .066 &\pm 1.96(.008) \quad \text{or} \quad .066 \pm .016
\end{aligned}
\]
that is, (.05, .082).
We estimate that the cancer rate when
age at first birth > 30 is between .05
and .082 higher than when age <= 30.
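The same interval can be recomputed directly from the counts in the table. The sketch below uses unrounded sample proportions, so intermediate values differ slightly from the hand calculation; the dictionary layout and labels are illustrative choices, not part of the study.

```python
import math

# (cancer cases, sample size) for each group, as given in the table
groups = {
    "first birth > 30": (683, 3220),
    "first birth <= 30": (1498, 10245),
}

p_hat = {}
variance_terms = []
for label, (cases, n) in groups.items():
    p = cases / n
    p_hat[label] = p
    variance_terms.append(p * (1 - p) / n)  # p-hat(1 - p-hat) / n
    print(f"{label}: p-hat = {p:.3f}")

diff = p_hat["first birth > 30"] - p_hat["first birth <= 30"]
margin = 1.96 * math.sqrt(sum(variance_terms))
low, high = diff - margin, diff + margin
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```

Rounded to three decimals, the endpoints agree with the interval (.05, .082) obtained by hand.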
Confidence Intervals for the
Difference between Two Population
Means µ1 - µ2: Independent Samples
• Two random samples are drawn from the two
populations of interest.
• Because we compare two population means, we
use the statistic x̄1 – x̄2.
Population 1
– Parameters: µ1 and σ1² (values are unknown)
– Sample size: n1
– Statistics: x̄1 and s1²
Population 2
– Parameters: µ2 and σ2² (values are unknown)
– Sample size: n2
– Statistics: x̄2 and s2²
Estimate µ1 – µ2 with x̄1 – x̄2
Confidence Interval for µ1 – µ2
Confidence interval
\[
(\bar{x}_1 - \bar{x}_2) \pm z^* \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
\]
where z* is the value from the z-table
that corresponds to the confidence level.
Note: when the values of σ1² and σ2² are unknown, the
sample variances s1² and s2² computed from the data
can be used.
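A minimal Python sketch of the interval for µ1 – µ2 is given here. The function name `two_mean_ci` and the summary statistics in the usage lines are hypothetical, and the sample variances are passed in place of σ1² and σ2², as the note allows.

```python
import math
from statistics import NormalDist


def two_mean_ci(mean1, var1, n1, mean2, var2, n2, confidence=0.95):
    """z interval for mu1 - mu2 from summary statistics (illustrative helper)."""
    # z* leaves (1 - confidence) / 2 probability in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(var1 / n1 + var2 / n2)
    diff = mean1 - mean2
    return diff - z_star * se, diff + z_star * se


# Hypothetical summary statistics, chosen only to exercise the function
low, high = two_mean_ci(mean1=50.0, var1=16.0, n1=64,
                        mean2=47.0, var2=25.0, n2=100)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```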
Example: confidence interval for µ1 – µ2
– Do people who eat high-fiber cereal for
breakfast consume, on average, fewer
calories for lunch than people who do not eat
high-fiber cereal for breakfast?
– A sample of 150 people was randomly drawn.
Each person was identified as a consumer or
a non-consumer of high-fiber cereal.
– For each person the number of calories
consumed at lunch was recorded.
Example: confidence interval for µ1 – µ2
Calories consumed at lunch (partial listing)
Consumers: 568 498 589 681 540 646 636 739 539 596 607 529 637 617 633 555 ...
Non-consumers: 705 819 706 509 613 582 601 608 787 573 428 754 741 628 537 748 ...
Solution:
• The parameter of interest is
the difference between two means.
• The claim to be tested is:
The mean caloric intake of consumers (µ1)
is less than that of non-consumers (µ2).
• n1 = 43, x̄1 = 604.02; n2 = 107, x̄2 = 633.239
• Use s1² = 4,103 for σ1² and s2² = 10,670
for σ2².
Example: confidence interval for µ1 – µ2
• The confidence interval estimator for the
difference between two means is
\[
\begin{aligned}
(\bar{x}_1 - \bar{x}_2) &\pm z^* \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \\
= (604.02 - 633.239) &\pm 1.96 \sqrt{\frac{4103}{43} + \frac{10670}{107}} \\
= -29.21 &\pm 27.38 = (-56.59, -1.83)
\end{aligned}
\]
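The arithmetic for the high-fiber cereal example can be reproduced as a quick check. This sketch uses the summary statistics quoted in the solution; the variable names are illustrative. Because it carries all decimals of x̄1 – x̄2, its endpoints can differ from the hand calculation by about 0.01.

```python
import math

# Summary statistics from the solution slide
n1, mean1, var1 = 43, 604.02, 4103      # consumers
n2, mean2, var2 = 107, 633.239, 10670   # non-consumers

diff = mean1 - mean2
se = math.sqrt(var1 / n1 + var2 / n2)
margin = 1.96 * se   # z* = 1.96 for 95% confidence

low, high = diff - margin, diff + margin
print(f"difference = {diff:.2f}, margin = {margin:.2f}")
print(f"95% CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
```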
Interpretation
• The 95% CI is (-56.59, -1.83).
• We are 95% confident that the interval
(-56.59, -1.83) contains the true but unknown
difference µ1 – µ2.
• Since the interval is entirely negative (that is,
does not contain 0), there is evidence from the
data that µ1 is less than µ2. We estimate that
non-consumers of high-fiber breakfast consume
on average between 1.83 and 56.59 more
calories for lunch.
Does smoking damage the lungs of children exposed
to parental smoking?
Forced vital capacity (FVC) is the volume (in milliliters) of
air that an individual can exhale in 6 seconds.
FVC was obtained for a sample of children not exposed to
parental smoking and a group of children exposed to
parental smoking.
Parental smoking   FVC x̄   s      n
Yes                75.5    9.3    30
No                 88.2    15.1   30
We want to know whether parental smoking decreases
children’s lung capacity as measured by the FVC test.
Is the mean FVC lower in the population of children
exposed to parental smoking?
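95% confidence interval for (µ1 − µ2), where
µ1 = mean FVC of children with a smoking parent;
µ2 = mean FVC of children without a smoking parent:
\[
\begin{aligned}
(\bar{x}_1 - \bar{x}_2) &\pm z^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \\
= (75.5 - 88.2) &\pm 1.96 \sqrt{\frac{9.3^2}{30} + \frac{15.1^2}{30}} \\
= -12.7 &\pm 1.96 \times 3.24 \\
= -12.7 &\pm 6.35 = (-19.05, -6.35)
\end{aligned}
\]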
We are 95% confident that the mean FVC of children of smoking parents
is between 6.35 and 19.05 milliliters LESS than that of children without a
smoking parent.
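The FVC interval starts from sample standard deviations, not variances, so each s must be squared before dividing by n. A short sketch of that step, with illustrative variable names:

```python
import math

# Summary statistics from the FVC table
mean_yes, sd_yes, n_yes = 75.5, 9.3, 30    # parental smoking: yes
mean_no, sd_no, n_no = 88.2, 15.1, 30      # parental smoking: no

# Square the standard deviations to get the sample variances
se = math.sqrt(sd_yes**2 / n_yes + sd_no**2 / n_no)
margin = 1.96 * se
diff = mean_yes - mean_no

low, high = diff - margin, diff + margin
print(f"standard error = {se:.2f}")
print(f"95% CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
```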
Bunny Rabbits and Pirates on the Box
• The data below show the sugar content (as a
percentage of weight) of 10 brands of cereal
randomly selected from a supermarket shelf that is
at a child’s eye level and 8 brands selected from the
top shelf.
Eye level: 40.3  55  45.7  43.3  50.3  45.9  53.5  43  44.2  44
Top: 20  2.2  7.5  4.4  22.2  16.6  14.5  10
Create and interpret a 95% confidence interval for the difference
µ1 – µ2 in mean sugar content, where µ1 is the mean sugar content
of cereal at a child’s eye level and µ2 is the mean sugar content of
cereal on the top shelf.
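Eye level: x̄1 = 46.52, s1² = 23.24, n1 = 10
Top: x̄2 = 12.18, s2² = 53.32, n2 = 8
95% confidence interval:
\[
\begin{aligned}
(\bar{x}_1 - \bar{x}_2) &\pm 1.96 \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \\
= (46.52 - 12.18) &\pm 1.96 \sqrt{\frac{23.24}{10} + \frac{53.32}{8}} \\
= 34.34 &\pm 5.88 = (28.46, 40.22)
\end{aligned}
\]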
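Because the raw sugar-content data are listed, the summary statistics and the interval can be computed from scratch. The sketch below follows the z-based formula used on the slide and relies on the standard library's `statistics.mean` and `statistics.variance` (sample variance with divisor n − 1); the variable names are illustrative. Carrying unrounded means shifts the endpoints by about 0.01 relative to the hand calculation.

```python
import math
from statistics import mean, variance

# Sugar content (% of weight) by shelf position
eye_level = [40.3, 55, 45.7, 43.3, 50.3, 45.9, 53.5, 43, 44.2, 44]
top_shelf = [20, 2.2, 7.5, 4.4, 22.2, 16.6, 14.5, 10]

x1_bar, s1_sq, n1 = mean(eye_level), variance(eye_level), len(eye_level)
x2_bar, s2_sq, n2 = mean(top_shelf), variance(top_shelf), len(top_shelf)
print(f"eye level: mean = {x1_bar:.2f}, variance = {s1_sq:.2f}, n = {n1}")
print(f"top shelf: mean = {x2_bar:.2f}, variance = {s2_sq:.2f}, n = {n2}")

margin = 1.96 * math.sqrt(s1_sq / n1 + s2_sq / n2)
diff = x1_bar - x2_bar
low, high = diff - margin, diff + margin
print(f"95% CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
```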
Interpretation
• We are 95% confident that the interval
(28.46, 40.22) contains the true but
unknown value of µ1 – µ2.
• Note that the interval is entirely positive
(does not contain 0); therefore, it appears
that the mean amount of sugar µ1 in cereal
on the shelf at a child’s eye level is larger
than the mean amount µ2 on the top shelf.
Do left-handed people have a shorter life-expectancy than
right-handed people?
• Some psychologists believe that the stress of being left-handed in a right-handed world leads to earlier deaths among
left-handers.
• Several studies have compared the life expectancies of left-handers and right-handers.
• One such study resulted in the data shown in the table.
Mean age at death
Handedness   x̄      s      n
Left         66.8   25.3   99
Right        75.2   15.1   888
[Photos: star left-handed quarterback Steve Young; left-handed presidents]
We will use the data to construct a confidence interval
for the difference in mean life expectancies for left-
handers and right-handers.
Is the mean life expectancy of left-handers less
than the mean life expectancy of right-handers?
95% confidence interval for (µ1 − µ2), where
µ1 = mean life expectancy of left-handers;
µ2 = mean life expectancy of right-handers:
\[
\begin{aligned}
(\bar{x}_1 - \bar{x}_2) &\pm z^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \\
= (66.8 - 75.2) &\pm 1.96 \sqrt{\frac{(25.3)^2}{99} + \frac{(15.1)^2}{888}} \\
= -8.4 &\pm 1.96 \times 2.59 \\
= -8.4 &\pm 5.08 = (-13.48, -3.32)
\end{aligned}
\]
[Photo: the “Bambino”, left-handed hitter Babe Ruth, baseball’s all-time best hitter]
We are 95% confident that the mean life expectancy for left-handers is between 3.32 and 13.48 years LESS than the mean
life expectancy for right-handers.
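The handedness interval is built in three steps: the standard error, then the margin of error, then the two endpoints. A brief sketch with the table's values (variable names are illustrative) shows how unequal sample sizes enter the standard error:

```python
import math

# Mean age at death, standard deviation, and sample size by handedness
mean_left, sd_left, n_left = 66.8, 25.3, 99
mean_right, sd_right, n_right = 75.2, 15.1, 888

# Step 1: standard error of the difference in sample means
se = math.sqrt(sd_left**2 / n_left + sd_right**2 / n_right)
# Step 2: margin of error with z* = 1.96 (95% confidence)
margin = 1.96 * se
# Step 3: endpoints around the observed difference
diff = mean_left - mean_right
low, high = diff - margin, diff + margin

print(f"se = {se:.2f}, margin = {margin:.2f}")
print(f"95% CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
```

The smaller left-handed sample (n = 99) contributes most of the standard error, since its variance term is divided by a much smaller n.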
Example: confidence interval for µ1 – µ2
• Example
– An ergonomic chair can be assembled using two
different sets of operations (Method A and Method B).
– The operations manager would like to know whether
the assembly times under the two methods differ.
Example: confidence interval for µ1 – µ2
• Example
– Two samples are randomly and independently selected.
• A sample of 25 workers assembled the chair using method A.
• A sample of 25 workers assembled the chair using method B.
• The assembly times were recorded.
– Do the assembly times of the two methods differ?
Example: confidence interval for µ1 – µ2
Assembly times in minutes (partial listing)
Method A   Method B
6.8        5.2
5.0        6.7
7.9        5.7
5.2        6.6
7.6        8.5
5.0        6.5
5.9        5.9
5.2        6.7
6.5        6.6
...        ...
Solution
• The parameter of interest is the difference
between two population means.
• The claim to be tested is whether a difference
between the two methods exists.
• Use s1² = .848 for σ1² and s2² = 1.303
for σ2².
Example: confidence interval for µ1 – µ2
A 95% confidence interval for µ1 – µ2 is calculated as follows:
\[
\begin{aligned}
(\bar{x}_1 - \bar{x}_2) &\pm z^* \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \\
= 6.288 - 6.016 &\pm 1.96 \sqrt{\frac{.848}{25} + \frac{1.303}{25}} \\
= 0.272 &\pm 0.5749 = [-0.3029, 0.8469]
\end{aligned}
\]
We are 95% confident that the interval (-0.3029, 0.8469)
contains the true but unknown µ1 – µ2.
Notice: zero is included in the confidence interval, so the data do not show a difference between the mean assembly times of the two methods.
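The check for whether zero lies inside the interval can be made explicit. The sketch below recomputes the assembly-time interval from the summary statistics on the slides and then tests for zero; the variable name `contains_zero` is illustrative.

```python
import math

# Summary statistics for the two assembly methods
mean_a, var_a, n_a = 6.288, 0.848, 25   # Method A
mean_b, var_b, n_b = 6.016, 1.303, 25   # Method B

diff = mean_a - mean_b
margin = 1.96 * math.sqrt(var_a / n_a + var_b / n_b)
low, high = diff - margin, diff + margin

# If zero is inside the interval, "no difference" is a plausible value
contains_zero = low <= 0 <= high
print(f"95% CI for mu1 - mu2: ({low:.4f}, {high:.4f})")
print("interval contains zero:", contains_zero)
```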