Chapter 9

EF 507 QUANTITATIVE METHODS FOR ECONOMICS AND FINANCE
FALL 2008

Chapter 9
Estimation: Additional Topics

Chapter Topics

- Population means, dependent samples (example: same group before vs. after treatment)
- Population means, independent samples (example: Group 1 vs. independent Group 2)
- Population proportions (example: Proportion 1 vs. Proportion 2)
- Population variance (example: variance of a normal distribution)

1. Dependent Samples

Tests Means of 2 Related Populations

- Paired or matched samples
- Repeated measures (before/after)
- Use the difference between paired values: $d_i = x_i - y_i$
- Eliminates variation among subjects
- Assumption: both populations are normally distributed

Mean Difference

The $i$th paired difference is $d_i$, where $d_i = x_i - y_i$.

The point estimate for the population mean paired difference is $\bar{d}$:

\[ \bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \]

The sample standard deviation is:

\[ S_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}} \]

where $n$ is the number of matched pairs in the sample.

Confidence Interval for Mean Difference

The confidence interval for the population mean difference, $\mu_d$, is

\[ \bar{d} - t_{n-1,\alpha/2} \frac{S_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\alpha/2} \frac{S_d}{\sqrt{n}} \]

where $n$ is the sample size (number of matched pairs in the paired sample).

Confidence Interval for Mean Difference (continued)

- The margin of error is

\[ ME = t_{n-1,\alpha/2} \frac{S_d}{\sqrt{n}} \]

- $t_{n-1,\alpha/2}$ is the value from the Student's t distribution with $(n - 1)$ degrees of freedom for which

\[ P(t_{n-1} > t_{n-1,\alpha/2}) = \frac{\alpha}{2} \]

Paired Samples Example

Six people sign up for a weight loss program.
You collect the following data:

Person    Weight before (x)    Weight after (y)    Difference d_i
1         136                  125                 11
2         205                  195                 10
3         157                  150                  7
4         138                  140                 -2
5         175                  165                 10
6         166                  160                  6
Sum                                                42

\[ \bar{d} = \frac{\sum d_i}{n} = \frac{42}{6} = 7.0 \]

\[ S_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = 4.82 \]

Paired Samples Example (continued)

- For a 95% confidence level, the appropriate t value is $t_{n-1,\alpha/2} = t_{5,0.025} = 2.571$
- The 95% confidence interval for the difference between means, $\mu_d$, is

\[ \bar{d} - t_{n-1,\alpha/2} \frac{S_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\alpha/2} \frac{S_d}{\sqrt{n}} \]

\[ 7 - (2.571)\frac{4.82}{\sqrt{6}} < \mu_d < 7 + (2.571)\frac{4.82}{\sqrt{6}} \]

\[ 1.94 < \mu_d < 12.06 \]

Since this interval does not contain zero, we can be 95% confident, given this limited data, that the mean weight difference is positive, which is consistent with the weight loss program helping people lose weight.

2. Difference Between Two Means

Population means, independent samples

Goal: Form a confidence interval for the difference between two population means, $\mu_x - \mu_y$

- Different data sources
  - Unrelated
  - Independent: the sample selected from one population has no effect on the sample selected from the other population
- The point estimate is the difference between the two sample means: $\bar{x} - \bar{y}$

Difference Between Two Means (continued)

Population means, independent samples:

- $\sigma_x^2$ and $\sigma_y^2$ known: the confidence interval uses $z_{\alpha/2}$
- $\sigma_x^2$ and $\sigma_y^2$ unknown (assumed equal or assumed unequal): the confidence interval uses a value from the Student's t distribution

2. $\sigma_x^2$ and $\sigma_y^2$ Known

Assumptions:

- Samples are randomly and independently drawn
- Both population distributions are normal
- Population variances are known

$\sigma_x^2$ and $\sigma_y^2$ Known (continued)

When $\sigma_x^2$ and $\sigma_y^2$ are known and both populations are normal, the variance of $\bar{X} - \bar{Y}$ is

\[ \sigma_{\bar{X} - \bar{Y}}^2 = \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y} \]

and the random variable

\[ Z = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}} \]

has a standard normal distribution.

Confidence Interval, $\sigma_x^2$ and $\sigma_y^2$ Known

The confidence interval for $\mu_x - \mu_y$ is:

\[ (\bar{x} - \bar{y}) - z_{\alpha/2} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + z_{\alpha/2} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}} \]

3.a) $\sigma_x^2$ and $\sigma_y^2$ Unknown, Assumed Equal

Assumptions:

- Samples are randomly and independently drawn
- Populations are normally distributed
- Population variances are unknown but assumed equal

$\sigma_x^2$ and $\sigma_y^2$ Unknown, Assumed Equal (continued)

Forming interval estimates:

- The population variances are assumed equal, so use the two sample standard deviations and pool them to estimate $\sigma$
- Use a t value with $(n_x + n_y - 2)$ degrees of freedom

The pooled variance is

\[ s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2} \]

Confidence Interval, $\sigma_x^2$ and $\sigma_y^2$ Unknown, Equal

The confidence interval for $\mu_x - \mu_y$ is:

\[ (\bar{x} - \bar{y}) - t_{n_x + n_y - 2,\alpha/2} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + t_{n_x + n_y - 2,\alpha/2} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}} \]

where

\[ s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2} \]

Pooled Variance Example

You are testing two computer processors for speed. Form a confidence interval for the difference in CPU speed. You collect the following speed data (in MHz):

                  CPUx     CPUy
Number tested     17       14
Sample mean       3,004    2,538
Sample std dev    74       56

Assume both populations are normal with equal variances, and use 95% confidence.

Calculating the Pooled Variance

The pooled variance is:

\[ S_p^2 = \frac{(n_x - 1)S_x^2 + (n_y - 1)S_y^2}{(n_x - 1) + (n_y - 1)} = \frac{(17 - 1)74^2 + (14 - 1)56^2}{(17 - 1) + (14 - 1)} = 4427.03 \]

The t value for a 95% confidence interval is:

\[ t_{n_x + n_y - 2,\alpha/2} = t_{29,0.025} = 2.045 \]

Calculating the Confidence Limits

The 95% confidence interval is

\[ (\bar{x} - \bar{y}) - t_{n_x + n_y - 2,\alpha/2} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + t_{n_x + n_y - 2,\alpha/2} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}} \]

\[ (3004 - 2538) - (2.045)\sqrt{\frac{4427.03}{17} + \frac{4427.03}{14}} < \mu_x - \mu_y < (3004 - 2538) + (2.045)\sqrt{\frac{4427.03}{17} + \frac{4427.03}{14}} \]

\[ 416.89 < \mu_x - \mu_y < 515.11 \]

We are 95% confident that the difference in mean CPU speed is between 416.89 and 515.11 MHz.
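The two worked examples above (the paired weight-loss data and the pooled-variance CPU comparison) can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names `paired_ci` and `pooled_ci` are hypothetical, and the t critical values are taken from the t table quoted above and passed in as arguments rather than computed.

```python
import math
import statistics


def paired_ci(before, after, t_crit):
    """Confidence interval for the mean paired difference, d_i = x_i - y_i.

    t_crit is the table value t_{n-1, alpha/2} (assumed supplied by the caller).
    """
    diffs = [x - y for x, y in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation, divides by n - 1
    me = t_crit * s_d / math.sqrt(n)  # margin of error
    return d_bar - me, d_bar + me


def pooled_ci(n_x, mean_x, s_x, n_y, mean_y, s_y, t_crit):
    """Pooled variance and confidence interval for mu_x - mu_y.

    Assumes independent normal samples with equal (unknown) variances.
    t_crit is the table value t_{n_x + n_y - 2, alpha/2}.
    """
    sp2 = ((n_x - 1) * s_x**2 + (n_y - 1) * s_y**2) / (n_x + n_y - 2)
    me = t_crit * math.sqrt(sp2 / n_x + sp2 / n_y)
    diff = mean_x - mean_y
    return sp2, diff - me, diff + me


# Weight-loss data from the paired samples example
before = [136, 205, 157, 138, 175, 166]
after = [125, 195, 150, 140, 165, 160]
paired_lo, paired_hi = paired_ci(before, after, t_crit=2.571)
print(f"paired: {paired_lo:.2f} < mu_d < {paired_hi:.2f}")

# CPU speed data from the pooled variance example
sp2, pooled_lo, pooled_hi = pooled_ci(17, 3004, 74, 14, 2538, 56, t_crit=2.045)
print(f"pooled variance: {sp2:.2f}")
print(f"pooled: {pooled_lo:.2f} < mu_x - mu_y < {pooled_hi:.2f}")
```

Passing the critical value in keeps the sketch dependent only on the standard library; a statistics package could supply the t quantile instead of a table lookup.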
3.b) $\sigma_x^2$ and $\sigma_y^2$ Unknown, Assumed Unequal

Assumptions:

- Samples are randomly and independently drawn
- Populations are normally distributed
- Population variances are unknown and assumed unequal

$\sigma_x^2$ and $\sigma_y^2$ Unknown, Assumed Unequal (continued)

Forming interval estimates:

- The population variances are assumed unequal, so a pooled variance is not appropriate
- Use a t value with $v$ degrees of freedom, where

\[ v = \frac{\left[ \left( \dfrac{s_x^2}{n_x} \right) + \left( \dfrac{s_y^2}{n_y} \right) \right]^2}{\left( \dfrac{s_x^2}{n_x} \right)^2 / (n_x - 1) + \left( \dfrac{s_y^2}{n_y} \right)^2 / (n_y - 1)} \]

Confidence Interval, $\sigma_x^2$ and $\sigma_y^2$ Unknown, Unequal

The confidence interval for $\mu_x - \mu_y$ is:

\[ (\bar{x} - \bar{y}) - t_{v,\alpha/2} \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + t_{v,\alpha/2} \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}} \]

where

\[ v = \frac{\left[ \left( \dfrac{s_x^2}{n_x} \right) + \left( \dfrac{s_y^2}{n_y} \right) \right]^2}{\left( \dfrac{s_x^2}{n_x} \right)^2 / (n_x - 1) + \left( \dfrac{s_y^2}{n_y} \right)^2 / (n_y - 1)} \]

Two Population Proportions

Goal: Form a confidence interval for the difference between two population proportions, $P_x - P_y$

Assumptions: Both sample sizes are large (generally at least 40 observations in each sample)

The point estimate for the difference is $\hat{p}_x - \hat{p}_y$

Two Population Proportions (continued)

The random variable

\[ Z = \frac{(\hat{p}_x - \hat{p}_y) - (P_x - P_y)}{\sqrt{\dfrac{\hat{p}_x(1 - \hat{p}_x)}{n_x} + \dfrac{\hat{p}_y(1 - \hat{p}_y)}{n_y}}} \]

is approximately normally distributed.

Confidence Interval for Two Population Proportions

The confidence limits for $P_x - P_y$ are:

\[ (\hat{p}_x - \hat{p}_y) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_x(1 - \hat{p}_x)}{n_x} + \frac{\hat{p}_y(1 - \hat{p}_y)}{n_y}} \]

Example: Two Population Proportions

Form a 90% confidence interval for the difference between the proportion of men and the proportion of women who have college degrees.

- In a random sample, 26 of 50 men and 28 of 40 women had an earned college degree

Example: Two Population Proportions (continued)

Men: $\hat{p}_x = \dfrac{26}{50} = 0.52$

Women: $\hat{p}_y = \dfrac{28}{40} = 0.70$

\[ \sqrt{\frac{\hat{p}_x(1 - \hat{p}_x)}{n_x} + \frac{\hat{p}_y(1 - \hat{p}_y)}{n_y}} = \sqrt{\frac{0.52(0.48)}{50} + \frac{0.70(0.30)}{40}} = 0.1012 \]

For 90% confidence, $z_{\alpha/2} = 1.645$

Example: Two Population Proportions (continued)

The confidence limits are:

\[ (\hat{p}_x - \hat{p}_y) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_x(1 - \hat{p}_x)}{n_x} + \frac{\hat{p}_y(1 - \hat{p}_y)}{n_y}} = (0.52 - 0.70) \pm 1.645(0.1012) \]

so the confidence interval is

\[ -0.3465 < P_x - P_y < -0.0135 \]

Since this interval does not contain zero, we are 90% confident that the two proportions are not equal.

Confidence Intervals for the Population Variance

- Goal: Form a confidence interval for the population variance, $\sigma^2$
- The confidence interval is based on the sample variance, $s^2$
- Assumed: the population is normally distributed

Confidence Intervals for the Population Variance (continued)

The random variable

\[ \chi_{n-1}^2 = \frac{(n - 1)s^2}{\sigma^2} \]

follows a chi-square distribution with $(n - 1)$ degrees of freedom.

The chi-square value $\chi_{n-1,\alpha}^2$ denotes the number for which

\[ P(\chi_{n-1}^2 > \chi_{n-1,\alpha}^2) = \alpha \]

Confidence Intervals for the Population Variance (continued)

The $100(1 - \alpha)\%$ confidence interval for the population variance is

\[ \frac{(n - 1)s^2}{\chi_{n-1,\alpha/2}^2} < \sigma^2 < \frac{(n - 1)s^2}{\chi_{n-1,1-\alpha/2}^2} \]

Example

You are testing the speed of a computer processor. You collect the following data (in MHz):

                  CPUx
Sample size       17
Sample mean       3,004
Sample std dev    74

Assume the population is normal.
Determine the 95% confidence interval for $\sigma_x^2$.

Finding the Chi-square Values

- $n = 17$, so the chi-square distribution has $(n - 1) = 16$ degrees of freedom
- $\alpha = 0.05$, so use the chi-square values with area 0.025 in each tail:

\[ \chi_{n-1,\alpha/2}^2 = \chi_{16,0.025}^2 = 28.85 \]

\[ \chi_{n-1,1-\alpha/2}^2 = \chi_{16,0.975}^2 = 6.908 \]

[Figure: chi-square density with 16 degrees of freedom, with probability $\alpha/2 = 0.025$ in each tail, below $\chi_{16}^2 = 6.91$ and above $\chi_{16}^2 = 28.85$]

Calculating the Confidence Limits

The 95% confidence interval is

\[ \frac{(n - 1)s^2}{\chi_{n-1,\alpha/2}^2} < \sigma^2 < \frac{(n - 1)s^2}{\chi_{n-1,1-\alpha/2}^2} \]

\[ \frac{(17 - 1)(74)^2}{28.85} < \sigma^2 < \frac{(17 - 1)(74)^2}{6.908} \]

\[ 3{,}037 < \sigma^2 < 12{,}683 \]

Converting to standard deviation, we are 95% confident that the population standard deviation of CPU speed is between 55.1 and 112.6 MHz.

Sample PHStat Output

[Figure: sample PHStat input and output]

Sample Size Determination

Determining sample size:

- For the mean
- For the proportion

Margin of Error

- The required sample size can be found to reach a desired margin of error (ME) with a specified level of confidence $(1 - \alpha)$
- The margin of error is also called sampling error
  - the amount of imprecision in the estimate of the population parameter
  - the amount added and subtracted to the point estimate to form the confidence interval

Sample Size Determination: For the Mean

The confidence interval is

\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]

and the margin of error (sampling error) is

\[ ME = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]

Now solve for $n$ to get

\[ n = \frac{z_{\alpha/2}^2 \sigma^2}{ME^2} \]

Sample Size Determination (continued)

To determine the required sample size for the mean, you must know:

- The desired level of confidence $(1 - \alpha)$, which determines the $z_{\alpha/2}$ value
- The acceptable margin of error (sampling error), ME
- The standard deviation, $\sigma$

Required Sample Size Example

If $\sigma = 45$, what sample size is needed to estimate the mean within ± 5 with 90% confidence?

\[ n = \frac{z_{\alpha/2}^2 \sigma^2}{ME^2} = \frac{(1.645)^2 (45)^2}{5^2} = 219.19 \]

So the required sample size is $n = 220$ (always round up).

Sample Size Determination: For the Proportion

The confidence interval is

\[ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]

and the margin of error (sampling error) is

\[ ME = z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]

$\hat{p}(1 - \hat{p})$ cannot be larger than 0.25, which it reaches when $\hat{p} = 0.5$. Substitute 0.25 for $\hat{p}(1 - \hat{p})$ and solve for $n$ to get

\[ n = \frac{0.25\, z_{\alpha/2}^2}{ME^2} \]

Sample Size Determination (continued)

- The sample and population proportions, $\hat{p}$ and $P$, are generally not known (since no sample has been taken yet)
- $P(1 - P) = 0.25$ generates the largest possible margin of error (so guarantees that the resulting sample size will meet the desired level of confidence)
- To determine the required sample size for the proportion, you must know:
  - The desired level of confidence $(1 - \alpha)$, which determines the critical $z_{\alpha/2}$ value
  - The acceptable sampling error (margin of error), ME
  - Estimate $P(1 - P) = 0.25$

Required Sample Size Example

How large a sample would be necessary to estimate the true proportion defective in a large population within ±3%, with 95% confidence?

Solution:

- For 95% confidence, use $z_{0.025} = 1.96$
- $ME = 0.03$
- Estimate $P(1 - P) = 0.25$

\[ n = \frac{0.25\, z_{\alpha/2}^2}{ME^2} = \frac{(0.25)(1.96)^2}{(0.03)^2} = 1{,}067.11 \]

So use $n = 1{,}068$.

PHStat Sample Size Options

[Figure: PHStat sample size options]
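The variance interval and the two sample-size formulas above can be reproduced with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the function names are hypothetical, the chi-square and z critical values are table values passed in by the caller (not computed here), and the `p` argument defaulting to 0.5 is an illustrative addition that yields the conservative $P(1 - P) = 0.25$ case from the slides.

```python
import math


def variance_ci(n, s, chi2_upper, chi2_lower):
    """Confidence interval for sigma^2 from a normal sample.

    chi2_upper is chi^2_{n-1, alpha/2}; chi2_lower is chi^2_{n-1, 1 - alpha/2}.
    Both are table values supplied by the caller.
    """
    scaled = (n - 1) * s**2
    return scaled / chi2_upper, scaled / chi2_lower


def sample_size_mean(z, sigma, me):
    """Smallest n with z * sigma / sqrt(n) <= me (always rounds up)."""
    return math.ceil((z * sigma / me) ** 2)


def sample_size_proportion(z, me, p=0.5):
    """Smallest n with z * sqrt(p(1-p)/n) <= me; p = 0.5 is the worst case."""
    return math.ceil(p * (1 - p) * z**2 / me**2)


# CPU variance example: n = 17, s = 74, 95% confidence
var_lo, var_hi = variance_ci(17, 74, chi2_upper=28.85, chi2_lower=6.908)
print(f"{var_lo:.0f} < sigma^2 < {var_hi:.0f}")
print(f"{math.sqrt(var_lo):.1f} < sigma < {math.sqrt(var_hi):.1f}")

# Sample size for the mean: sigma = 45, ME = 5, 90% confidence
n_mean = sample_size_mean(z=1.645, sigma=45, me=5)

# Sample size for the proportion: ME = 0.03, 95% confidence
n_prop = sample_size_proportion(z=1.96, me=0.03)
print(n_mean, n_prop)
```

Using `math.ceil` encodes the "always round up" rule: rounding down would give a margin of error slightly larger than the one requested.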
