ST 361 CI for mean difference

ST 361 Estimation: Interval Estimation for $\mu_1 - \mu_2$ (§7.5)
Topics:
I. Interval estimation: confidence interval
II. (Two-sided) confidence interval for estimating a population mean $\mu$ (§7.2, 7.4)
(a) When the population SD $\sigma$ is known: use the Z distribution
(b) When the population SD $\sigma$ is NOT known: use the t distribution
III. (Two-sided) confidence interval for estimating a population proportion (§7.3)
IV. Two-sided confidence interval for estimating a population mean difference $\mu_1 - \mu_2$ (§7.5)
(a) When the population SDs $\sigma_1, \sigma_2$ are known
(b) When the population SDs $\sigma_1, \sigma_2$ are NOT known
-----------------------------------------------------------------------------------------------------------------------
IV. Inference on the difference of two population means:
Motivating example: A public health researcher wants to learn whether the average blood pressure of blue-collar workers is different from that of white-collar workers.
Scenario I: A random sample of 35 blue-collar workers was collected, and the sample mean systolic blood pressure and sample SD were 138 mmHg and 17 mmHg, respectively.
Suppose that for the population of white-collar workers, the mean is 145 mmHg.
To answer the question of interest, we can calculate a 95% CI (other confidence levels may be used too) for $\mu_1$, the mean systolic blood pressure of the population of blue-collar workers, and see whether 145 is in that interval. Since the population SD is not known, a 95% CI for $\mu_1$ is
$$\bar{x} \pm (t\text{ critical value}) \times \frac{s}{\sqrt{n}} = 138 \pm (t\text{ critical value}) \times \frac{17}{\sqrt{35}}, \qquad \text{df} = 35 - 1 = 34.$$
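Here is a small Python sketch of the Scenario I calculation. It assumes that the 95% t critical value for df = 34, 2.032, has been looked up in a t table. The function name `one_sample_t_ci` is ours, not from the notes.

```python
import math


def one_sample_t_ci(xbar, s, n, t_crit):
    """Two-sided CI for one population mean: xbar +/- t_crit * s / sqrt(n)."""
    se = s / math.sqrt(n)        # estimated standard error of the sample mean
    margin = t_crit * se         # margin of error
    return xbar - margin, xbar + margin


# Scenario I data; t_crit = 2.032 is the 95% table value for df = 35 - 1 = 34 (assumed lookup)
lower, upper = one_sample_t_ci(xbar=138, s=17, n=35, t_crit=2.032)
contains_145 = lower <= 145 <= upper
print(f"95% CI for mu1: [{lower:.2f}, {upper:.2f}]; contains 145: {contains_145}")
```

With these inputs the interval is roughly [132.16, 143.84], which does not contain 145.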
Scenario II: A random sample of 35 blue-collar workers was collected, and the sample mean systolic blood pressure and sample SD were 138 mmHg and 17 mmHg, respectively.
Because the population mean systolic blood pressure of white-collar workers is not known, another sample of 40 white-collar workers was collected, and the sample mean and sample SD were 143 mmHg and 20 mmHg, respectively.
To answer the question of interest (i.e., whether $\mu_1 = \mu_2$), we can calculate a 95% CI for $\mu_1 - \mu_2$ and see whether 0 is in the interval.
Problem: How do we calculate a CI for $\mu_1 - \mu_2$ at a given confidence level?
Assume two independent samples are obtained from two populations:
Population 1 with mean $\mu_1$ and SD $\sigma_1$. A sample of size $n_1$ obtained from Population 1 has sample mean $\bar{x}_1$ and sample SD $s_1$.
Population 2 with mean $\mu_2$ and SD $\sigma_2$. A sample of size $n_2$ obtained from Population 2 has sample mean $\bar{x}_2$ and sample SD $s_2$.
Question of interest: Do the two populations have the same mean, i.e., is $\mu_1 = \mu_2$?
(1) A good point estimate for $\mu_1 - \mu_2$ is $\bar{x}_1 - \bar{x}_2$.
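(2) Sampling distribution of $\bar{x}_1 - \bar{x}_2$:
$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$$
(regardless of the distributions of $x_1$ and $x_2$)
So $\bar{x}_1 - \bar{x}_2$ is an unbiased estimator of $\mu_1 - \mu_2$.
$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
(regardless of the distributions of $x_1$ and $x_2$; the variances add because the two samples are independent)
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2}} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$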
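The claims in (2) can be checked by simulation. The sketch below uses hypothetical populations that are not from the notes:

- Population 1 is exponential with mean 2 and SD 2.
- Population 2 is uniform on [0, 6], with mean 3 and variance 3.
- The sample sizes are $n_1 = 10$ and $n_2 = 15$.

Neither population is normal. Even so, the simulated mean and variance of $\bar{x}_1 - \bar{x}_2$ should come out close to $\mu_1 - \mu_2 = -1$ and $\sigma_1^2/n_1 + \sigma_2^2/n_2 = 0.4 + 0.2 = 0.6$.

```python
import random
import statistics


def simulate_mean_differences(reps=20_000, n1=10, n2=15, seed=361):
    """Return `reps` simulated values of xbar1 - xbar2 from two independent samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        # Hypothetical Population 1: exponential, mean 2, SD 2 (skewed, not normal)
        sample1 = [rng.expovariate(0.5) for _ in range(n1)]
        # Hypothetical Population 2: uniform on [0, 6], mean 3, variance 3 (not normal)
        sample2 = [rng.uniform(0, 6) for _ in range(n2)]
        diffs.append(statistics.fmean(sample1) - statistics.fmean(sample2))
    return diffs


diffs = simulate_mean_differences()
sim_mean = statistics.fmean(diffs)
sim_var = statistics.variance(diffs)
theory_mean = 2 - 3               # mu1 - mu2
theory_var = 2**2 / 10 + 3 / 15   # sigma1^2/n1 + sigma2^2/n2
print(f"mean: simulated {sim_mean:.3f} vs theory {theory_mean}")
print(f"variance: simulated {sim_var:.3f} vs theory {theory_var:.3f}")
```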

(3) If $\bar{x}_1 \sim$ Normal and $\bar{x}_2 \sim$ Normal, then
$$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$
Interval estimation: assume $\bar{x}_1 - \bar{x}_2 \sim$ Normal.
In general, the confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{x}_1 - \bar{x}_2) \pm (\text{critical value}) \times \sigma_{\bar{x}_1 - \bar{x}_2}$$
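When $\sigma_1$ and $\sigma_2$ are known (case IV(a)), the critical value comes from the standard normal (Z) distribution. Below is a minimal sketch using Python's `statistics.NormalDist`. Purely as a hypothetical illustration, it treats 17 and 20 from the blood-pressure example as if they were known population SDs; in the notes they are sample SDs.

```python
import math
from statistics import NormalDist


def two_sample_z_ci(x1bar, sigma1, n1, x2bar, sigma2, n2, conf=0.95):
    """CI for mu1 - mu2 when sigma1 and sigma2 are known (Z critical value)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)      # about 1.96 for 95%
    sd_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # SD of xbar1 - xbar2
    estimate = x1bar - x2bar
    return estimate - z_crit * sd_diff, estimate + z_crit * sd_diff


# Hypothetical: pretend sigma1 = 17 and sigma2 = 20 are known population SDs
lo, hi = two_sample_z_ci(138, 17, 35, 143, 20, 40)
print(f"[{lo:.2f}, {hi:.2f}]")
```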
However, since $\sigma_1, \sigma_2$ are (usually) unknown, we replace them with the sample standard deviations $s_1$ and $s_2$, respectively.
Focus on the case of $\sigma_1$ and $\sigma_2$ unknown.
The confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{x}_1 - \bar{x}_2) \pm (t\text{ critical value}) \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
with degrees of freedom (df)
$$\text{df} = \frac{\left(SE_1^2 + SE_2^2\right)^2}{\dfrac{SE_1^4}{n_1 - 1} + \dfrac{SE_2^4}{n_2 - 1}}, \qquad \text{where } SE_1 = \frac{s_1}{\sqrt{n_1}} \text{ and } SE_2 = \frac{s_2}{\sqrt{n_2}}.$$
Round df down to the nearest integer.
Ex. (Back to the motivating example.) What is the 95% confidence interval for the mean difference in blood pressure between blue-collar workers and white-collar workers?
Note that
$$\text{df} = \frac{\left(SE_1^2 + SE_2^2\right)^2}{\dfrac{SE_1^4}{n_1 - 1} + \dfrac{SE_2^4}{n_2 - 1}} = 72.94,$$
which rounds down to df = 72.
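The df value can be reproduced with a short function. This is a sketch: the name `welch_df` is our own choice, and the formula is the one given in the notes. The same function also reproduces the 35.97 quoted in the gas-price example.

```python
import math


def welch_df(s1, n1, s2, n2):
    """Degrees of freedom from the notes' formula, before rounding down."""
    se1_sq = s1**2 / n1                       # SE1^2 = s1^2 / n1
    se2_sq = s2**2 / n2                       # SE2^2 = s2^2 / n2
    numerator = (se1_sq + se2_sq) ** 2
    denominator = se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1)  # SE^4 / (n - 1) terms
    return numerator / denominator


bp_df = welch_df(s1=17, n1=35, s2=20, n2=40)       # blood-pressure example
gas_df = welch_df(s1=0.12, n1=25, s2=0.15, n2=20)  # gas-price example
print(f"{bp_df:.2f} -> {math.floor(bp_df)}")       # round down, as the notes instruct
print(f"{gas_df:.2f} -> {math.floor(gas_df)}")
```

For the blood-pressure data this prints 72.95, which the notes truncate to 72.94. Either way, df rounds down to 72.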
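We have: $n_1 = 35$, $\bar{x}_1 = 138$, $s_1 = 17$, $n_2 = 40$, $\bar{x}_2 = 143$, $s_2 = 20$.
Point estimate of $\mu_1 - \mu_2$: $\bar{x}_1 - \bar{x}_2 = 138 - 143 = -5$.
Estimated standard error of $\bar{x}_1 - \bar{x}_2$:
$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{17^2}{35} + \frac{20^2}{40}} = 4.27$$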
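t critical value ≈ 2 (the table value for df = 72 is about 1.99; it is rounded to 2 here).
A 95% CI for $\mu_1 - \mu_2$: [-5 - 2(4.27), -5 + 2(4.27)] = [-13.54, 3.54], which contains 0. So it is plausible that blue-collar and white-collar workers have the same mean systolic BP; at the 95% confidence level, the data do not show a difference.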
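The whole calculation for this example can be sketched as follows; the function name is hypothetical. It uses the t critical value 2, as in the notes. Swapping in the table value for df = 72 (about 1.99) would change the limits only slightly.

```python
import math


def two_sample_t_ci(x1bar, s1, n1, x2bar, s2, n2, t_crit):
    """CI for mu1 - mu2 with sigma1, sigma2 unknown: (x1bar - x2bar) +/- t_crit * SE."""
    estimate = x1bar - x2bar
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # estimated SE of xbar1 - xbar2
    margin = t_crit * se
    return estimate, se, (estimate - margin, estimate + margin)


est, se, (lo, hi) = two_sample_t_ci(138, 17, 35, 143, 20, 40, t_crit=2)
print(f"estimate = {est}, SE = {se:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
print("contains 0:", lo <= 0 <= hi)
```

The code does not round the SE to 4.27 before multiplying, so the limits print as [-13.55, 3.55] rather than [-13.54, 3.54]. The conclusion that 0 lies inside the interval is unchanged.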
Ex. Gas prices tend to be higher on the West Coast. Let $\mu_1$ be the mean gas price on the East Coast and $\mu_2$ be that on the West Coast. The data are shown in the table below.
                          East    West
n (weeks)                 25      20
Sample mean $\bar{x}$     1.95    2.10
Sample SD $s$             0.12    0.15
(Note that $\text{df} = \dfrac{\left(SE_1^2 + SE_2^2\right)^2}{SE_1^4/(n_1 - 1) + SE_2^4/(n_2 - 1)} = 35.97$.)
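(a) What assumptions do we need for the sample mean difference $\bar{x}_1 - \bar{x}_2$ to follow a normal distribution? Answer: either the gas prices on both the East and West Coasts are normally distributed, or $n_1$ and $n_2$ are large (greater than 30). Since $n_1 = 25$ and $n_2 = 20$ here, we need the gas prices to be normally distributed.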
(b) Calculate the 95% confidence interval of the mean difference.
Point estimate of $\mu_1 - \mu_2$: $\bar{x}_1 - \bar{x}_2 = 1.95 - 2.10 = -0.15$
Estimated standard error of $\bar{x}_1 - \bar{x}_2$:
$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{0.12^2}{25} + \frac{0.15^2}{20}} = 0.0412$$
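t critical value = 2.042, using df = 30. (By the rule above, df = 35.97 rounds down to 35. The critical value for df = 30 is slightly larger than that for df = 35, which is about 2.030, so the resulting interval is slightly conservative.)
A 95% CI for $\mu_1 - \mu_2$: [-0.15 - 2.042(0.0412), -0.15 + 2.042(0.0412)] = [-0.23, -0.07]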
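The same arithmetic for the gas-price data is sketched below. The function name is hypothetical, and the t critical value 2.042 is taken from the notes.

```python
import math


def two_sample_t_ci(x1bar, s1, n1, x2bar, s2, n2, t_crit):
    """CI for mu1 - mu2 with unknown SDs: (x1bar - x2bar) +/- t_crit * SE."""
    estimate = x1bar - x2bar
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # estimated SE of xbar1 - xbar2
    return estimate - t_crit * se, estimate + t_crit * se


lo, hi = two_sample_t_ci(1.95, 0.12, 25, 2.10, 0.15, 20, t_crit=2.042)
print(f"95% CI for mu1 - mu2: [{lo:.2f}, {hi:.2f}]")
print("entirely below 0:", hi < 0)
```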
(c) How would you interpret the results?
The results suggest that the average gas price on the East Coast is lower than that on the West Coast, since the entire 95% CI for $\mu_1 - \mu_2$ lies to the left of zero.