ST 361 Estimation --- Interval Estimation for $\mu_1 - \mu_2$ (§7.5)

Topics:
I. Interval estimation: confidence interval
II. (Two-sided) confidence interval for estimating population mean (§7.2, 7.4)
    (a) When the population SD is known: use Z distribution
    (b) When the population SD is NOT known: use t distribution
III. (Two-sided) confidence interval for estimating population proportion (§7.3)
IV. Two-sided confidence interval for estimating population mean difference $\mu_1 - \mu_2$ (§7.5)
    (a) When the population SDs $\sigma_1$, $\sigma_2$ are known
    (b) When the population SDs $\sigma_1$, $\sigma_2$ are NOT known

-----------------------------------------------------------------------

IV. Inference on the difference of two population means

Motivating example: A public health researcher wants to learn whether the average blood pressure of blue-collar workers differs from that of white-collar workers.

Scenario I: A random sample of 35 blue-collar workers was collected; the sample mean systolic blood pressure was 138 mmHg and the sample SD was 17 mmHg. Suppose that for the population of white-collar workers, the mean is known to be 145 mmHg. To answer the question of interest, we can calculate a 95% CI (other confidence levels may be used too) for $\mu_1$, the mean systolic blood pressure of the population of blue-collar workers, and see if 145 is in that interval.

A 95% CI of $\mu_1$:

Scenario II: A random sample of 35 blue-collar workers was collected; the sample mean systolic blood pressure was 138 mmHg and the sample SD was 17 mmHg. Because the population mean systolic blood pressure of white-collar workers is not known, another sample of 40 white-collar workers was collected; the sample mean was 143 mmHg and the sample SD was 20 mmHg. To answer the question of interest (i.e., is $\mu_1 = \mu_2$?), we can calculate a 95% CI for $\mu_1 - \mu_2$ and see if 0 is in the interval.

Problem: How do we calculate a CI for $\mu_1 - \mu_2$ for a given confidence level?
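The one-sample interval described in Scenario I can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function name `one_sample_t_ci` is hypothetical, and the t critical value 2.032 (df = 34, two-sided 95%) is an assumed table value that the caller supplies, since the population SD is not known.

```python
import math

def one_sample_t_ci(xbar, s, n, t_crit):
    """Two-sided CI for a population mean when the population SD is unknown.

    t_crit is the t critical value for df = n - 1, looked up by the caller.
    """
    se = s / math.sqrt(n)      # estimated standard error of the sample mean
    margin = t_crit * se       # half-width of the interval
    return xbar - margin, xbar + margin

# Scenario I: n = 35, sample mean 138 mmHg, sample SD 17 mmHg.
# ASSUMPTION: 2.032 is the 95% two-sided t critical value for df = 34.
T_CRIT_DF34 = 2.032
lower, upper = one_sample_t_ci(138, 17, 35, T_CRIT_DF34)

print(f"95% CI for mu_1: [{lower:.2f}, {upper:.2f}]")
print("145 inside interval:", lower <= 145 <= upper)
```

Under these assumptions the interval is roughly [132.2, 143.8], which does not contain 145.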
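Assume 2 independent samples are obtained from 2 populations:

    Population 1 with mean $\mu_1$ and SD $\sigma_1$. A sample of size $n_1$ obtained from Population 1 has sample mean $\bar{x}_1$ and sample SD $s_1$.
    Population 2 with mean $\mu_2$ and SD $\sigma_2$. A sample of size $n_2$ obtained from Population 2 has sample mean $\bar{x}_2$ and sample SD $s_2$.

Question of interest: Do the two populations have the same mean, i.e., is $\mu_1 = \mu_2$?

(1) A good point estimate for $\mu_1 - \mu_2$ is $\bar{x}_1 - \bar{x}_2$.

(2) Sampling distribution of $\bar{x}_1 - \bar{x}_2$:

    $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$ (regardless of the distributions of $\bar{x}_1$ and $\bar{x}_2$)

    So $\bar{x}_1 - \bar{x}_2$ is an unbiased estimator of $\mu_1 - \mu_2$.

    $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$ (regardless of the distributions of $\bar{x}_1$ and $\bar{x}_2$; the variances add because the samples are independent)

    $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

(3) $\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$ (mean, variance) if $\bar{x}_1 \sim$ Normal and $\bar{x}_2 \sim$ Normal.

Interval Estimation --- assume $\bar{x}_1 - \bar{x}_2 \sim$ Normal

In general, the confidence interval for $\mu_1 - \mu_2$ is

    $(\bar{x}_1 - \bar{x}_2) \pm (\text{critical value}) \times \sigma_{\bar{x}_1 - \bar{x}_2}$

However, since $\sigma_1$, $\sigma_2$ are (usually) unknown, we replace them by the sample standard deviations $s_1$ and $s_2$, respectively.

Focus on the case of $\sigma_1$ and $\sigma_2$ unknown. The confidence interval for $\mu_1 - \mu_2$ is

    $(\bar{x}_1 - \bar{x}_2) \pm (t\ \text{critical value}) \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

with degrees of freedom

    $df = \frac{\left(SE_1^2 + SE_2^2\right)^2}{\frac{SE_1^4}{n_1 - 1} + \frac{SE_2^4}{n_2 - 1}}$, where $SE_1 = \frac{s_1}{\sqrt{n_1}}$ and $SE_2 = \frac{s_2}{\sqrt{n_2}}$.

Then round df down to the nearest integer.

Ex. (Back to the motivating example.) What is the 95% confidence interval for the difference in mean blood pressure between blue-collar workers and white-collar workers?

We have: $n_1 = 35$, $\bar{x}_1 = 138$, $s_1 = 17$, $n_2 = 40$, $\bar{x}_2 = 143$, $s_2 = 20$.

Note that $df = \frac{\left(SE_1^2 + SE_2^2\right)^2}{\frac{SE_1^4}{n_1 - 1} + \frac{SE_2^4}{n_2 - 1}} = 72.94$, which rounds down to 72.

Point estimate of $\mu_1 - \mu_2$: $\bar{x}_1 - \bar{x}_2 = 138 - 143 = -5$.

Estimated standard error of $\bar{x}_1 - \bar{x}_2$: $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{17^2}{35} + \frac{20^2}{40}} = 4.27$

t critical value $\approx 2$.

A 95% CI for $\mu_1 - \mu_2$: $[-5 - 2 \times 4.27,\ -5 + 2 \times 4.27] = [-13.54,\ 3.54]$, which contains 0. So it is reasonable to think that the mean systolic blood pressures of blue-collar and white-collar workers are the same.

Ex. Gas prices tend to be higher on the West Coast. Let $\mu_1$ be the mean gas price on the East Coast, and $\mu_2$ be that on the West Coast. The data are shown in the table below.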
                          East    West
    n (weeks)             25      20
    Sample mean $\bar{x}$  1.95    2.10
    Sample SD $s$          0.12    0.15

(Note that $df = \frac{\left(SE_1^2 + SE_2^2\right)^2}{\frac{SE_1^4}{n_1 - 1} + \frac{SE_2^4}{n_2 - 1}} = 35.97$.)

(a) What assumptions do we need in order for the mean difference $\bar{x}_1 - \bar{x}_2$ to follow a normal distribution?

    Answer: Either the gas prices on both the East and West Coasts are normally distributed, or $n_1$ and $n_2$ are both large (greater than 30).

(b) Calculate the 95% confidence interval of the mean difference.

    Point estimate of $\mu_1 - \mu_2$: $\bar{x}_1 - \bar{x}_2 = 1.95 - 2.10 = -0.15$

    Estimated standard error of $\bar{x}_1 - \bar{x}_2$: $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{0.12^2}{25} + \frac{0.15^2}{20}} = 0.0412$

    t critical value = 2.042 (use df = 30)

    A 95% CI for $\mu_1 - \mu_2$: $[-0.15 - 2.042 \times 0.0412,\ -0.15 + 2.042 \times 0.0412] = [-0.23,\ -0.07]$

(c) How would you explain your results?

    The average gas price on the East Coast appears to be lower than that on the West Coast, since the 95% CI for $\mu_1 - \mu_2$ lies entirely to the left of zero.
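The two worked examples can be checked with a short Python sketch of the unknown-SD interval for the difference of two means. It is illustrative only: the function name `two_sample_t_ci` is hypothetical, and the t critical value is passed in by the caller (2 and 2.042, as used in the examples) rather than computed.

```python
import math

def two_sample_t_ci(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for mu1 - mu2 from two independent samples, population SDs unknown.

    Returns (lower, upper, df), where df is the unrounded degrees of freedom
    from the formula in the notes; round it down before using a t table.
    """
    se1_sq = s1 ** 2 / n1                 # SE_1^2 = s1^2 / n1
    se2_sq = s2 ** 2 / n2                 # SE_2^2 = s2^2 / n2
    se_diff = math.sqrt(se1_sq + se2_sq)  # estimated SE of xbar1 - xbar2
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    diff = x1 - x2                        # point estimate of mu1 - mu2
    return diff - t_crit * se_diff, diff + t_crit * se_diff, df

# Blood pressure example, with t critical value 2 as used in the notes.
bp_lo, bp_hi, bp_df = two_sample_t_ci(138, 17, 35, 143, 20, 40, t_crit=2.0)
print(f"blood pressure: df = {math.floor(bp_df)}, CI = [{bp_lo:.2f}, {bp_hi:.2f}]")

# Gas price example, with t critical value 2.042 (table value for df = 30).
gas_lo, gas_hi, gas_df = two_sample_t_ci(1.95, 0.12, 25, 2.10, 0.15, 20, t_crit=2.042)
print(f"gas price: df = {math.floor(gas_df)}, CI = [{gas_lo:.2f}, {gas_hi:.2f}]")
```

Because the sketch carries the standard error at full precision instead of rounding it to 4.27 first, the blood pressure endpoints can differ from the hand calculation in the last decimal place.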