Course: A0064 / Economic Statistics
Year: 2005
Version: 1/1

Session 17
Comparison of Two Populations, Part 1

Learning Outcomes
By the end of this session, students are expected to be able to:
• Compare two paired observations and test the difference between two population means

Outline
• Paired-Observation Comparisons
• Testing the Difference between Two Population Means

COMPLETE BUSINESS STATISTICS, 5th edition
McGraw-Hill/Irwin, Aczel/Sounderpandian, © The McGraw-Hill Companies, Inc., 2002

Chapter 8: The Comparison of Two Populations
• Using Statistics
• Paired-Observation Comparisons
• A Test for the Difference between Two Population Means Using Independent Random Samples
• A Large-Sample Test for the Difference between Two Population Proportions
• The F Distribution and a Test for the Equality of Two Population Variances
• Summary and Review of Terms

8-1 Using Statistics
• Inferences about differences between parameters of two populations
• Paired observations: observe the same group of persons or things
  – At two different times: "before" and "after"
  – Under two different sets of circumstances or "treatments"
• Independent samples: observe different groups of persons or things
  – At different times or under different sets of circumstances

8-2 Paired-Observation Comparisons
• Population parameters may differ at two different times or under two different sets of circumstances or treatments because:
  – The circumstances differ between times or treatments
  – The people or things in the different groups are themselves different
• By looking at paired observations, we are able to minimize the extraneous "between group" variation.
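A short worked identity may help show why pairing reduces extraneous variation. It is a standard result rather than something stated on the slides, and the symbols \(\sigma_1\), \(\sigma_2\) (standard deviations of the two measurements) and \(\rho\) (their correlation within a pair) are introduced here only for illustration.

```latex
% Variance of the mean difference from n pairs:
\[
  \operatorname{Var}(\bar{D})
  = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}{n}
\]
% Variance of the difference in means from two independent samples of size n each:
\[
  \operatorname{Var}(\bar{x}_1 - \bar{x}_2)
  = \frac{\sigma_1^2 + \sigma_2^2}{n}
\]
```

When the two measurements on the same person or thing are positively correlated (\(\rho > 0\)), the paired variance is the smaller of the two, so the same true difference is easier to detect.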

Paired-Observation Comparisons of Means
Test statistic for the paired-observations t test:

\[ t = \frac{\bar{D} - \mu_{D_0}}{s_D / \sqrt{n}} \]

where \(\bar{D}\) is the sample average difference between each pair of observations, \(s_D\) is the sample standard deviation of these differences, and the sample size, \(n\), is the number of pairs of observations. The symbol \(\mu_{D_0}\) is the population mean difference under the null hypothesis. When the null hypothesis is true and the population mean difference is \(\mu_{D_0}\), the statistic has a t distribution with \((n - 1)\) degrees of freedom.

Example 8-1
A random sample of 16 viewers of Home Shopping Network was selected for an experiment. All viewers in the sample had recorded the amount of money they spent shopping during the holiday season of the previous year. The next year, these people were given access to the cable network and were asked to keep a record of their total purchases during the holiday season. Home Shopping Network managers want to test the null hypothesis that their service does not increase shopping volume, versus the alternative hypothesis that it does.

Shopper   Previous   Current   Diff
   1         334        405      71
   2         150        125     -25
   3         520        540      20
   4          95        100       5
   5         212        200     -12
   6          30         30       0
   7        1055       1200     145
   8         300        265     -35
   9          85         90       5
  10         129        206      77
  11          40         18     -22
  12         440        489      49
  13         610        590     -20
  14         208        310     102
  15         880        995     115
  16          25         75      50

\(H_0: \mu_D \le 0\)
\(H_1: \mu_D > 0\)
df = (n - 1) = (16 - 1) = 15

Test statistic:
\[ t = \frac{\bar{D} - \mu_{D_0}}{s_D / \sqrt{n}} \]

Critical value: \(t_{0.05} = 1.753\)
Do not reject \(H_0\) if \(t \le 1.753\)
Reject \(H_0\) if \(t > 1.753\)

Example 8-1: Solution
\[ t = \frac{\bar{D} - \mu_{D_0}}{s_D / \sqrt{n}} = \frac{32.81 - 0}{55.75 / \sqrt{16}} = 2.354 \]

t = 2.354 > 1.753, so \(H_0\) is rejected and we conclude that there is evidence that shopping volume by network viewers has increased, with a p-value between 0.01 and 0.025. The Template output gives a more exact p-value of 0.0163. See the next slide for the output.

[Figure: t distribution with df = 15, showing the nonrejection region, the rejection region to the right of \(t_{0.05} = 1.753\), the points \(t_{0.025} = 2.131\) and \(t_{0.01} = 2.602\), and the test statistic 2.354]

Example 8-1: Template for Testing Paired Differences
[Figure: template output]

Example 8-2
It has recently been asserted that returns on stocks may change once a story about a company appears in The Wall Street Journal column "Heard on the Street." An investments analyst collects a random sample of 50 stocks that were recommended as winners by the editor of "Heard on the Street," and proceeds to conduct a two-tailed test of whether or not the annualized return on stocks recommended in the column differs between the month before and the month after the recommendation. For each stock the analyst computes the return before and the return after the event, and computes the difference in the two return figures.
He then computes the average and standard deviation of the differences.

\(H_0: \mu_D = 0\)
\(H_1: \mu_D \ne 0\)
n = 50
\(\bar{D}\) = 0.1%
\(s_D\) = 0.05%

Test statistic:
\[ z = \frac{\bar{D} - \mu_{D_0}}{s_D / \sqrt{n}} = \frac{0.1 - 0}{0.05 / \sqrt{50}} = 14.14 \]

p-value: \(2\,P(Z > 14.14) \approx 0\)

This test result is highly significant, and \(H_0\) may be rejected at any reasonable level of significance.

Confidence Intervals for Paired Observations
A \((1 - \alpha)100\%\) confidence interval for the mean difference \(\mu_D\):

\[ \bar{D} \pm t_{\alpha/2} \frac{s_D}{\sqrt{n}} \]

where \(t_{\alpha/2}\) is the value of the t distribution with \((n - 1)\) degrees of freedom that cuts off an area of \(\alpha/2\) to its right. When the sample size is large, we may use \(z_{\alpha/2}\) instead.

Confidence Intervals for Paired Observations – Example 8-2
95% confidence interval for the data in Example 8-2:

\[ \bar{D} \pm z_{\alpha/2} \frac{s_D}{\sqrt{n}} = 0.1 \pm 1.96 \frac{0.05}{\sqrt{50}} = 0.1 \pm (1.96)(0.0071) = 0.1 \pm 0.014 = [0.086, 0.114] \]

Note that this confidence interval does not include the value 0.

Confidence Intervals for Paired Observations – Example 8-2 Using the Template
[Figure: template output]

8-3 A Test for the Difference between Two Population Means Using Independent Random Samples
• When paired data cannot be obtained, use independent random samples drawn at different times or under different circumstances.
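Looking back at the paired-observation examples (8-1 and 8-2) above, the calculations can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `paired_statistic` and the variable names are made up for illustration, and the critical value 1.753 is simply the table value quoted in Example 8-1 rather than something the code derives.

```python
import math
import statistics
from statistics import NormalDist

# Holiday-season spending from Example 8-1 (same 16 shoppers, two years).
previous = [334, 150, 520, 95, 212, 30, 1055, 300,
            85, 129, 40, 440, 610, 208, 880, 25]
current = [405, 125, 540, 100, 200, 30, 1200, 265,
           90, 206, 18, 489, 590, 310, 995, 75]


def paired_statistic(mean_diff, sd_diff, n, mu_d0=0.0):
    """(D-bar - mu_D0) / (s_D / sqrt(n)) for n paired differences."""
    return (mean_diff - mu_d0) / (sd_diff / math.sqrt(n))


# Example 8-1: work from the raw pairs.
diffs = [c - p for p, c in zip(previous, current)]
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 divisor)
t_stat = paired_statistic(d_bar, s_d, len(diffs))

T_CRIT_05_DF15 = 1.753  # right-tail 5% point, df = 15, as quoted on the slide
reject_8_1 = t_stat > T_CRIT_05_DF15

# Example 8-2: work from the summary figures (n = 50, D-bar = 0.1, s_D = 0.05).
z_stat = paired_statistic(0.1, 0.05, 50)
z_975 = NormalDist().inv_cdf(0.975)  # two-sided 95% normal point
half_width = z_975 * 0.05 / math.sqrt(50)
ci_8_2 = (0.1 - half_width, 0.1 + half_width)

print(f"Example 8-1: D-bar={d_bar:.2f}, s_D={s_d:.2f}, t={t_stat:.3f}, reject H0: {reject_8_1}")
print(f"Example 8-2: z={z_stat:.2f}, 95% CI=({ci_8_2[0]:.3f}, {ci_8_2[1]:.3f})")
```

The standard library has no t distribution, which is why the sketch compares against the tabulated critical value instead of computing a p-value; a statistics package would be needed for the exact p-value reported by the template.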

Large-sample test if:
• Both \(n_1 \ge 30\) and \(n_2 \ge 30\) (Central Limit Theorem), or
• Both populations are normal and \(\sigma_1\) and \(\sigma_2\) are both known

Small-sample test if:
• Both populations are normal and \(\sigma_1\) and \(\sigma_2\) are unknown

Comparisons of Two Population Means: Testing Situations
• I: Difference between two population means is 0 (\(\mu_1 = \mu_2\))
  – \(H_0: \mu_1 - \mu_2 = 0\)
  – \(H_1: \mu_1 - \mu_2 \ne 0\)
• II: Difference between two population means is no greater than 0 (\(\mu_1 \le \mu_2\))
  – \(H_0: \mu_1 - \mu_2 \le 0\)
  – \(H_1: \mu_1 - \mu_2 > 0\)
• III: Difference between two population means is no greater than D (\(\mu_1 \le \mu_2 + D\))
  – \(H_0: \mu_1 - \mu_2 \le D\)
  – \(H_1: \mu_1 - \mu_2 > D\)

Comparisons of Two Population Means: Test Statistic
Large-sample test statistic for the difference between two population means:

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]

The term \((\mu_1 - \mu_2)_0\) is the difference between \(\mu_1\) and \(\mu_2\) under the null hypothesis. It is equal to zero in situations I and II, and it is equal to the prespecified value D in situation III. The term in the denominator is the standard deviation of the difference between the two sample means (it relies on the assumption that the two samples are independent).

Two-Tailed Test for Equality of Two Population Means: Example 8-3
Is there evidence to conclude that the average monthly charge in the entire population of American Express Gold Card members is different from the average monthly charge in the entire population of Preferred Visa cardholders?

Population 1: Preferred Visa
\(n_1 = 1200\), \(\bar{x}_1 = 452\), \(\sigma_1 = 212\)

Population 2: Gold Card
\(n_2 = 800\), \(\bar{x}_2 = 523\), \(\sigma_2 = 185\)

\(H_0: \mu_1 - \mu_2 = 0\)
\(H_1: \mu_1 - \mu_2 \ne 0\)

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(452 - 523) - 0}{\sqrt{\dfrac{212^2}{1200} + \dfrac{185^2}{800}}} = \frac{-71}{\sqrt{80.2346}} = \frac{-71}{8.96} = -7.926 \]

p-value: \(2\,P(Z < -7.926) \approx 0\)
\(H_0\) is rejected at any common level of significance.

Example 8-3: Carrying Out the Test
[Figure: standard normal distribution, showing rejection regions below \(-z_{0.01} = -2.576\) and above \(z_{0.01} = 2.576\), the nonrejection region between them, and the test statistic -7.926]

Since the value of the test statistic is far below the lower critical point, the null hypothesis may be rejected, and we may conclude that there is a statistically significant difference between the average monthly charges of Gold Card and Preferred Visa cardholders.

Example 8-3: Using the Template
[Figure: template output]

One-Tailed Test for Difference Between Two Population Means: Example 8-4
Is there evidence to substantiate Duracell's claim that their batteries last, on average, at least 45 minutes longer than Energizer batteries of the same size?

Population 1: Duracell
\(n_1 = 100\), \(\bar{x}_1 = 308\), \(\sigma_1 = 84\)

Population 2: Energizer
\(n_2 = 100\), \(\bar{x}_2 = 254\), \(\sigma_2 = 67\)

\(H_0: \mu_1 - \mu_2 \le 45\)
\(H_1: \mu_1 - \mu_2 > 45\)

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(308 - 254) - 45}{\sqrt{\dfrac{84^2}{100} + \dfrac{67^2}{100}}} = \frac{9}{\sqrt{115.45}} = \frac{9}{10.75} = 0.838 \]

p-value: \(P(Z > 0.838) = 0.201\)
\(H_0\) may not be rejected at any common level of significance.

One-Tailed Test for Difference Between Two Population Means: Example 8-4 – Using the Template
[Figure: template output]

Confidence Intervals for the Difference between Two Population Means
A large-sample \((1 - \alpha)100\%\) confidence interval for the difference between two population means, \(\mu_1 - \mu_2\), using independent random samples:

\[ (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]

A 95% confidence interval using the data in Example 8-3 (taking the difference as Gold Card minus Preferred Visa, \(\mu_2 - \mu_1\)):

\[ (\bar{x}_2 - \bar{x}_1) \pm z_{\alpha/2} \sqrt{\frac{\sigma_2^2}{n_2} + \frac{\sigma_1^2}{n_1}} = (523 - 452) \pm 1.96 \sqrt{\frac{185^2}{800} + \frac{212^2}{1200}} = [53.44, 88.56] \]

8-4 A Test for the Difference between Two Population Means: Assuming Equal Population Variances
• If we can assume that the population variances \(\sigma_1^2\) and \(\sigma_2^2\) are equal (even though unknown), then the two sample variances, \(s_1^2\) and \(s_2^2\), provide two separate estimators of the common population variance. Combining the two separate estimates into a pooled estimate should give us a better estimate than either sample variance by itself.

[Figure: dot plots of Sample 1 and Sample 2 around their means \(\bar{x}_1\) and \(\bar{x}_2\), with a deviation from the mean marked for each sample data point]

From sample 1 we get the estimate \(s_1^2\) with \((n_1 - 1)\) degrees of freedom.
From sample 2 we get the estimate \(s_2^2\) with \((n_2 - 1)\) degrees of freedom.
From both samples together we get a pooled estimate, \(s_p^2\), with \((n_1 - 1) + (n_2 - 1) = (n_1 + n_2 - 2)\) total degrees of freedom.

Pooled Estimate of the Population Variance
A pooled estimate of the common population variance, based on a sample variance \(s_1^2\) from a sample of size \(n_1\) and a sample variance \(s_2^2\) from a sample of size \(n_2\), is given by:

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]

The degrees of freedom associated with this estimator is: df = \((n_1 + n_2 - 2)\)

The pooled estimate of the variance is a weighted average of the two individual sample variances, with weights proportional to the degrees of freedom of the two samples, \((n_1 - 1)\) and \((n_2 - 1)\). That is, larger weight is given to the variance from the larger sample.

Using the Pooled Estimate of the Population Variance
The estimate of the standard deviation of \((\bar{x}_1 - \bar{x}_2)\) is given by:

\[ \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]

Test statistic for the difference between two population means, assuming equal population variances:

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \]

where \((\mu_1 - \mu_2)_0\) is the difference between the two population means under the null hypothesis (zero or some other number D). The number of degrees of freedom of the test statistic is df = \((n_1 + n_2 - 2)\) (the number of degrees of freedom associated with \(s_p^2\), the pooled estimate of the population variance).

Example 8-5
Do the data provide sufficient evidence to conclude that average percentage increase in the CPI differs when oil sells at these two different prices?

Population 1: Oil price = $27.50
\(n_1 = 14\), \(\bar{x}_1 = 0.317\%\), \(s_1 = 0.12\%\)

Population 2: Oil price = $20.00
\(n_2 = 9\), \(\bar{x}_2 = 0.21\%\), \(s_2 = 0.11\%\)

\(H_0: \mu_1 - \mu_2 = 0\)
\(H_1: \mu_1 - \mu_2 \ne 0\)
df = \((n_1 + n_2 - 2) = (14 + 9 - 2) = 21\)

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\left( \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \right) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = \frac{0.107}{\sqrt{0.00247}} = \frac{0.107}{0.0497} = 2.154 \]

Critical point: \(t_{0.025} = 2.080\)
\(H_0\) may be rejected at the 5% level of significance.

Example 8-5: Using the Template
[Figure: template output]
P-value = 0.0430, so reject \(H_0\) at the 5% significance level.

Example 8-6
The manufacturers of compact disk players want to test whether a small price reduction is enough to increase sales of their product. Is there evidence that the small price reduction is enough to increase sales of compact disk players?

Population 1: Before Reduction
\(n_1 = 15\), \(\bar{x}_1\) = $6598, \(s_1\) = $844

Population 2: After Reduction
\(n_2 = 12\), \(\bar{x}_2\) = $6870, \(s_2\) = $669

\(H_0: \mu_2 - \mu_1 \le 0\)
\(H_1: \mu_2 - \mu_1 > 0\)
df = \((n_1 + n_2 - 2) = (15 + 12 - 2) = 25\)

\[ t = \frac{(\bar{x}_2 - \bar{x}_1) - (\mu_2 - \mu_1)_0}{\sqrt{\left( \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \right) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = \frac{(6870 - 6598) - 0}{\sqrt{\left( \dfrac{(14)844^2 + (11)669^2}{15 + 12 - 2} \right) \left( \dfrac{1}{15} + \dfrac{1}{12} \right)}} = \frac{272}{\sqrt{89375.25}} = \frac{272}{298.96} = 0.91 \]

Critical point: \(t_{0.10} = 1.316\)
\(H_0\) may not be rejected even at the 10% level of significance.

Example 8-6: Using the Template
[Figure: template output]
P-value = 0.1858, so do not reject \(H_0\) at the 5% significance level.
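The independent-sample tests in Examples 8-3 through 8-6 all work from summary statistics, so they are easy to reproduce. The sketch below uses only the Python standard library; the function names (`z_statistic`, `pooled_variance`, `pooled_t_statistic`) are illustrative choices, not part of the slides or of any package, and the t critical values are the table values quoted in the examples rather than computed ones.

```python
import math
from statistics import NormalDist


def z_statistic(x1, x2, sigma1, sigma2, n1, n2, diff0=0.0):
    """Large-sample z for H0: mu1 - mu2 = diff0 (independent samples)."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((x1 - x2) - diff0) / se


def pooled_variance(s1, s2, n1, n2):
    """Pooled estimate s_p^2, weighting each variance by its degrees of freedom."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def pooled_t_statistic(x1, x2, s1, s2, n1, n2, diff0=0.0):
    """t with n1 + n2 - 2 df, assuming equal population variances."""
    se = math.sqrt(pooled_variance(s1, s2, n1, n2) * (1 / n1 + 1 / n2))
    return ((x1 - x2) - diff0) / se


std_normal = NormalDist()

# Example 8-3 (two-tailed): Preferred Visa vs. Gold Card.
z_8_3 = z_statistic(452, 523, 212, 185, 1200, 800)
p_8_3 = 2 * std_normal.cdf(-abs(z_8_3))

# Example 8-4 (right-tailed, D = 45): Duracell vs. Energizer.
z_8_4 = z_statistic(308, 254, 84, 67, 100, 100, diff0=45)
p_8_4 = 1 - std_normal.cdf(z_8_4)

# Example 8-5 (two-tailed, df = 21): CPI increase at two oil prices.
t_8_5 = pooled_t_statistic(0.317, 0.21, 0.12, 0.11, 14, 9)
reject_8_5 = abs(t_8_5) > 2.080  # t_0.025 with 21 df, from the slide

# Example 8-6 (right-tailed, df = 25): sales after minus sales before.
t_8_6 = pooled_t_statistic(6870, 6598, 669, 844, 12, 15)
reject_8_6 = t_8_6 > 1.316  # t_0.10 with 25 df, from the slide

print(f"8-3: z={z_8_3:.3f}, p={p_8_3:.3g}")
print(f"8-4: z={z_8_4:.3f}, p={p_8_4:.3f}")
print(f"8-5: t={t_8_5:.3f}, reject H0 at 5%: {reject_8_5}")
print(f"8-6: t={t_8_6:.2f}, reject H0 at 10%: {reject_8_6}")
```

Note that in Example 8-6 the "after" sample is passed first, so the statistic measures \(\mu_2 - \mu_1\) as in the stated hypotheses; the pooled variance itself does not depend on the order of the samples.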

Example 8-6: Continued
[Figure: t distribution with df = 25, showing the nonrejection region, the rejection region to the right of \(t_{0.10} = 1.316\), and the test statistic 0.91]

Since the test statistic is less than \(t_{0.10}\), the null hypothesis cannot be rejected at any reasonable level of significance. We conclude that there is no statistically significant evidence that the price reduction increases sales.

Confidence Intervals Using the Pooled Variance
A \((1 - \alpha)100\%\) confidence interval for the difference between two population means, \(\mu_1 - \mu_2\), using independent random samples and assuming equal population variances:

\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]

A 95% confidence interval using the data in Example 8-6 (taking the difference as after minus before, \(\mu_2 - \mu_1\)):

\[ (\bar{x}_2 - \bar{x}_1) \pm t_{\alpha/2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = (6870 - 6598) \pm 2.06 \sqrt{(595835)(0.15)} = [-343.85, 887.85] \]

The interval includes 0, which agrees with the failure to reject \(H_0\) in Example 8-6.

Confidence Intervals Using the Pooled Variance and the Template – Example 8-6
[Figure: template output, confidence interval]

Closing
• The material continues with Topic 18 (Comparison of Two Populations, Part 2)