EF 507 QUANTITATIVE METHODS FOR ECONOMICS AND FINANCE, FALL 2008
Chapter 9: Estimation: Additional Topics

Estimation: Additional Topics

Chapter topics:
- Population means, dependent samples (example: the same group before vs. after treatment)
- Population means, independent samples (example: Group 1 vs. independent Group 2)
- Population proportions (Proportion 1 vs. Proportion 2)
- Population variance (variance of a normal distribution)

1. Dependent Samples

Tests of the means of two related populations:
- Paired or matched samples
- Repeated measures (before/after)
- Use the difference between paired values: $d_i = x_i - y_i$
- Pairing eliminates variation among subjects
- Assumption: both populations are normally distributed

Mean Difference

The $i$th paired difference is $d_i = x_i - y_i$. The point estimate for the population mean paired difference is $\bar{d}$:

$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$$

The sample standard deviation is:

$$S_d = \sqrt{\frac{\sum_{i=1}^{n}(d_i - \bar{d})^2}{n-1}}$$

where $n$ is the number of matched pairs in the sample.

Confidence Interval for Mean Difference

The confidence interval for the population mean paired difference, $\mu_d$, is

$$\bar{d} - t_{n-1,\alpha/2}\frac{S_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\alpha/2}\frac{S_d}{\sqrt{n}}$$

where $n$ is the sample size (the number of matched pairs in the paired sample).

Confidence Interval for Mean Difference (continued)

The margin of error is

$$ME = t_{n-1,\alpha/2}\frac{S_d}{\sqrt{n}}$$

where $t_{n-1,\alpha/2}$ is the value from the Student's t distribution with $(n-1)$ degrees of freedom for which

$$P(t_{n-1} > t_{n-1,\alpha/2}) = \frac{\alpha}{2}$$

Paired Samples Example

Six people sign up for a weight loss program.
You collect the following data:

Person   Weight before (x)   Weight after (y)   Difference, d_i
1        136                 125                 11
2        205                 195                 10
3        157                 150                  7
4        138                 140                 -2
5        175                 165                 10
6        166                 160                  6
                             Total               42

$$\bar{d} = \frac{\sum d_i}{n} = \frac{42}{6} = 7.0 \qquad S_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n-1}} = 4.82$$

Paired Samples Example (continued)

For a 95% confidence level, the appropriate t value is $t_{n-1,\alpha/2} = t_{5,0.025} = 2.571$.

The 95% confidence interval for the mean difference, $\mu_d$, is

$$\bar{d} - t_{n-1,\alpha/2}\frac{S_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\alpha/2}\frac{S_d}{\sqrt{n}}$$

$$7 - (2.571)\frac{4.82}{\sqrt{6}} < \mu_d < 7 + (2.571)\frac{4.82}{\sqrt{6}}$$

$$1.94 < \mu_d < 12.06$$

Since this interval lies entirely above zero, we can be 95% confident, even with this limited data, that the mean weight loss (before minus after) is positive, that is, that the program helps people lose weight.

2. Difference Between Two Means

Population means, independent samples

Goal: Form a confidence interval for the difference between two population means, $\mu_x - \mu_y$.

- The samples come from different data sources
- The sources are unrelated and independent: the sample selected from one population has no effect on the sample selected from the other population
- The point estimate is the difference between the two sample means, $\bar{x} - \bar{y}$

Difference Between Two Means (continued)

Population means, independent samples:
- $\sigma_x^2$ and $\sigma_y^2$ known: the confidence interval uses $z_{\alpha/2}$
- $\sigma_x^2$ and $\sigma_y^2$ unknown (assumed equal, or assumed unequal): the confidence interval uses a value from the Student's t distribution

2.
$\sigma_x^2$ and $\sigma_y^2$ Known

Assumptions:
- Samples are randomly and independently drawn
- Both population distributions are normal
- Population variances are known

$\sigma_x^2$ and $\sigma_y^2$ Known (continued)

When $\sigma_x^2$ and $\sigma_y^2$ are known and both populations are normal, the variance of $\bar{X} - \bar{Y}$ is

$$\sigma^2_{\bar{X}-\bar{Y}} = \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}$$

and the random variable

$$Z = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}}$$

has a standard normal distribution.

Confidence Interval, $\sigma_x^2$ and $\sigma_y^2$ Known

The confidence interval for $\mu_x - \mu_y$ is:

$$(\bar{x} - \bar{y}) - z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$$

3.a) $\sigma_x^2$ and $\sigma_y^2$ Unknown, Assumed Equal

Assumptions:
- Samples are randomly and independently drawn
- Populations are normally distributed
- Population variances are unknown but assumed equal

$\sigma_x^2$ and $\sigma_y^2$ Unknown, Assumed Equal (continued)

Forming interval estimates:
- The population variances are assumed equal, so pool the two sample variances to estimate the common variance $\sigma^2$
- Use a t value with $(n_x + n_y - 2)$ degrees of freedom

The pooled variance is

$$s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$$

Confidence Interval, $\sigma_x^2$ and $\sigma_y^2$ Unknown, Equal

The confidence interval for $\mu_x - \mu_y$ is:

$$(\bar{x} - \bar{y}) - t_{n_x+n_y-2,\alpha/2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + t_{n_x+n_y-2,\alpha/2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$

where $s_p^2$ is the pooled variance defined above.
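The pooled-variance interval can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, the critical value $t_{n_x+n_y-2,\alpha/2}$ is assumed to be looked up in a t table and passed in by the caller, and the inputs are the summary statistics of this chapter's CPU speed example (with $t_{29,0.025} = 2.045$).

```python
import math


def pooled_variance(n_x, s_x, n_y, s_y):
    """Pooled sample variance s_p^2 for two independent samples."""
    return ((n_x - 1) * s_x**2 + (n_y - 1) * s_y**2) / (n_x + n_y - 2)


def pooled_interval(mean_x, mean_y, n_x, s_x, n_y, s_y, t_crit):
    """Confidence interval for mu_x - mu_y, assuming equal population variances.

    t_crit is t_{n_x + n_y - 2, alpha/2}; it is an input here (taken from a
    t table), not computed by this sketch.
    """
    sp2 = pooled_variance(n_x, s_x, n_y, s_y)
    margin = t_crit * math.sqrt(sp2 / n_x + sp2 / n_y)
    diff = mean_x - mean_y
    return diff - margin, diff + margin


# Summary statistics from the CPU speed example (speeds in MHz).
sp2 = pooled_variance(17, 74, 14, 56)
low, high = pooled_interval(3004, 2538, 17, 74, 14, 56, t_crit=2.045)
print(f"pooled variance = {sp2:.2f}")
print(f"95% interval for mu_x - mu_y: {low:.2f} to {high:.2f}")
```

Passing the critical value in keeps the sketch free of third-party dependencies; a statistics library could supply it instead.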
Pooled Variance Example

You are testing two computer processors for speed. Form a confidence interval for the difference in CPU speed. You collect the following speed data (in MHz):

                  CPU_x    CPU_y
Number tested     17       14
Sample mean       3,004    2,538
Sample std dev    74       56

Assume both populations are normal with equal variances, and use 95% confidence.

Calculating the Pooled Variance

The pooled variance is:

$$s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{(n_x - 1) + (n_y - 1)} = \frac{(17 - 1)74^2 + (14 - 1)56^2}{(17 - 1) + (14 - 1)} = 4427.03$$

The t value for a 95% confidence interval is:

$$t_{n_x+n_y-2,\alpha/2} = t_{29,0.025} = 2.045$$

Calculating the Confidence Limits

The 95% confidence interval is

$$(\bar{x} - \bar{y}) - t_{n_x+n_y-2,\alpha/2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + t_{n_x+n_y-2,\alpha/2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$

$$(3004 - 2538) - (2.045)\sqrt{\frac{4427.03}{17} + \frac{4427.03}{14}} < \mu_x - \mu_y < (3004 - 2538) + (2.045)\sqrt{\frac{4427.03}{17} + \frac{4427.03}{14}}$$

$$416.89 < \mu_x - \mu_y < 515.11$$

We are 95% confident that the mean difference in CPU speed is between 416.89 and 515.11 MHz.

3.b) $\sigma_x^2$ and $\sigma_y^2$ Unknown, Assumed Unequal

Assumptions:
- Samples are randomly and independently drawn
- Populations are normally distributed
- Population variances are unknown and assumed unequal

$\sigma_x^2$ and $\sigma_y^2$ Unknown, Assumed Unequal (continued)

Forming interval estimates:
- The population variances are assumed unequal, so a pooled variance is not appropriate
- Use a t value with $\nu$ degrees of freedom, where

$$\nu = \frac{\left[\left(\dfrac{s_x^2}{n_x}\right) + \left(\dfrac{s_y^2}{n_y}\right)\right]^2}{\left(\dfrac{s_x^2}{n_x}\right)^2 \Big/ (n_x - 1) + \left(\dfrac{s_y^2}{n_y}\right)^2 \Big/ (n_y - 1)}$$

Confidence Interval, $\sigma_x^2$ and $\sigma_y^2$ Unknown, Unequal

The confidence interval for $\mu_x - \mu_y$ is:

$$(\bar{x} - \bar{y}) - t_{\nu,\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + t_{\nu,\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$$

where $\nu$ is the degrees of freedom given above.

Two Population
Proportions

Goal: Form a confidence interval for the difference between two population proportions, $P_x - P_y$.

Assumption: both sample sizes are large (generally at least 40 observations in each sample).

The point estimate for the difference is $\hat{p}_x - \hat{p}_y$.

Two Population Proportions (continued)

The random variable

$$Z = \frac{(\hat{p}_x - \hat{p}_y) - (P_x - P_y)}{\sqrt{\dfrac{\hat{p}_x(1 - \hat{p}_x)}{n_x} + \dfrac{\hat{p}_y(1 - \hat{p}_y)}{n_y}}}$$

is approximately normally distributed.

Confidence Interval for Two Population Proportions

The confidence limits for $P_x - P_y$ are:

$$(\hat{p}_x - \hat{p}_y) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_x(1 - \hat{p}_x)}{n_x} + \frac{\hat{p}_y(1 - \hat{p}_y)}{n_y}}$$

Example: Two Population Proportions

Form a 90% confidence interval for the difference between the proportion of men and the proportion of women who have college degrees. In a random sample, 26 of 50 men and 28 of 40 women had an earned college degree.

Example: Two Population Proportions (continued)

Men: $\hat{p}_x = \dfrac{26}{50} = 0.52$

Women: $\hat{p}_y = \dfrac{28}{40} = 0.70$

$$\sqrt{\frac{\hat{p}_x(1 - \hat{p}_x)}{n_x} + \frac{\hat{p}_y(1 - \hat{p}_y)}{n_y}} = \sqrt{\frac{0.52(0.48)}{50} + \frac{0.70(0.30)}{40}} = 0.1012$$

For 90% confidence, $z_{\alpha/2} = 1.645$.

Example: Two Population Proportions (continued)

The confidence limits are:

$$(\hat{p}_x - \hat{p}_y) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_x(1 - \hat{p}_x)}{n_x} + \frac{\hat{p}_y(1 - \hat{p}_y)}{n_y}} = (0.52 - 0.70) \pm 1.645(0.1012)$$

so the confidence interval is

$$-0.3465 < P_x - P_y < -0.0135$$

Since this interval does not contain zero, we are 90% confident that the two proportions are not equal.

Confidence Intervals for the Population Variance

Goal: Form a confidence interval for the population variance, $\sigma^2$.
- The confidence interval is based on the sample variance, $s^2$
- Assumed: the population is normally distributed

Confidence Intervals for the Population Variance (continued)

The random variable

$$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$$

follows a chi-square distribution with $(n-1)$ degrees of freedom.

The chi-square value $\chi^2_{n-1,\alpha}$ denotes the number for which

$$P(\chi^2_{n-1} > \chi^2_{n-1,\alpha}) = \alpha$$

Confidence Intervals for the Population Variance (continued)
The $100(1 - \alpha)\%$ confidence interval for the population variance is

$$\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$$

Example

You are testing the speed of a computer processor. You collect the following data (in MHz):

                  CPU_x
Sample size       17
Sample mean       3,004
Sample std dev    74

Assume the population is normal. Determine the 95% confidence interval for $\sigma_x^2$.

Finding the Chi-square Values

- $n = 17$, so the chi-square distribution has $(n - 1) = 16$ degrees of freedom
- $\alpha = 0.05$, so use the chi-square values with area 0.025 in each tail:

$$\chi^2_{n-1,\alpha/2} = \chi^2_{16,0.025} = 28.85 \qquad \chi^2_{n-1,1-\alpha/2} = \chi^2_{16,0.975} = 6.908 \approx 6.91$$

[Figure: chi-square density with 16 degrees of freedom; probability $\alpha/2 = 0.025$ lies below $\chi^2_{16} = 6.91$ and probability $\alpha/2 = 0.025$ lies above $\chi^2_{16} = 28.85$.]

Calculating the Confidence Limits

The 95% confidence interval is

$$\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$$

$$\frac{(17-1)(74)^2}{28.85} < \sigma^2 < \frac{(17-1)(74)^2}{6.908}$$

$$3{,}037 < \sigma^2 < 12{,}683$$

Converting to standard deviation, we are 95% confident that the population standard deviation of CPU speed is between 55.1 and 112.6 MHz.

Sample PHStat Output

[Screenshots of the PHStat input and output sheets are not reproduced here.]

Sample Size Determination

Determining sample size:
- For the mean
- For the proportion

Margin of Error

The required sample size can be found to reach a desired margin of error (ME) with a specified level of confidence $(1 - \alpha)$.

The margin of error is also called the sampling error. It is:
- the amount of imprecision in the estimate of the population parameter
- the amount added to and subtracted from the point estimate to form the confidence interval

Sample Size Determination: For the Mean

The confidence interval for the mean is

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

so the margin of error (sampling error) is

$$ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Sample Size Determination (continued)

Now solve for $n$ to get

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{ME^2}$$

Sample Size Determination (continued)

To determine the required sample size for the mean, you must know: the desired level of confidence $(1 - \alpha)$, which determines the $z_{\alpha/2}$ value; the
acceptable margin of error (sampling error), ME; and the standard deviation, $\sigma$.

Required Sample Size Example

If $\sigma = 45$, what sample size is needed to estimate the mean within ±5 with 90% confidence?

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{ME^2} = \frac{(1.645)^2(45)^2}{5^2} = 219.19$$

So the required sample size is $n = 220$ (always round up).

Sample Size Determination: For the Proportion

The confidence interval for the proportion is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

so the margin of error (sampling error) is

$$ME = z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Sample Size Determination (continued)

$\hat{p}(1 - \hat{p})$ cannot be larger than 0.25, the value it takes when $\hat{p} = 0.5$. Substitute 0.25 for $\hat{p}(1 - \hat{p})$ and solve for $n$ to get

$$n = \frac{0.25\,z_{\alpha/2}^2}{ME^2}$$

Sample Size Determination (continued)

- The sample and population proportions, $\hat{p}$ and $P$, are generally not known (since no sample has been taken yet)
- $P(1 - P) = 0.25$ generates the largest possible margin of error, so the resulting sample size is guaranteed to meet the desired level of confidence

To determine the required sample size for the proportion, you must know:
- The desired level of confidence $(1 - \alpha)$, which determines the critical $z_{\alpha/2}$ value
- The acceptable sampling error (margin of error), ME
- An estimate of $P(1 - P)$; use 0.25

Required Sample Size Example

How large a sample would be necessary to estimate the true proportion defective in a large population within ±3%, with 95% confidence?

Required Sample Size Example (continued)

Solution: For 95% confidence, use $z_{0.025} = 1.96$, with $ME = 0.03$ and the estimate $P(1 - P) = 0.25$:

$$n = \frac{0.25\,z_{\alpha/2}^2}{ME^2} = \frac{(0.25)(1.96)^2}{(0.03)^2} = 1{,}067.11$$

So use $n = 1{,}068$.

PHStat Sample Size Options

[Screenshot of the PHStat sample size menu is not reproduced here.]
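The two sample-size formulas, $n = z_{\alpha/2}^2\sigma^2/ME^2$ for the mean and $n = 0.25\,z_{\alpha/2}^2/ME^2$ for the proportion, can be checked with a short Python sketch. The function names are hypothetical and the z values are assumed to come from a standard normal table; the inputs are those of the two worked examples in this chapter.

```python
import math


def sample_size_for_mean(z, sigma, margin_of_error):
    """Smallest n for which z * sigma / sqrt(n) <= margin_of_error."""
    return math.ceil((z * sigma / margin_of_error) ** 2)


def sample_size_for_proportion(z, margin_of_error, p_times_q=0.25):
    """Smallest n for a proportion; p_times_q defaults to the worst case 0.25."""
    return math.ceil(p_times_q * z**2 / margin_of_error**2)


# Mean example: sigma = 45, ME = 5, 90% confidence (z = 1.645).
n_mean = sample_size_for_mean(z=1.645, sigma=45, margin_of_error=5)

# Proportion example: ME = 0.03, 95% confidence (z = 1.96).
n_prop = sample_size_for_proportion(z=1.96, margin_of_error=0.03)

print(n_mean, n_prop)  # 220 1068
```

`math.ceil` implements the "always round up" rule: rounding down would leave the margin of error slightly larger than requested.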