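2023/6/27

Chapter 8. Comparison of Two Populations

Difference Between Two Means

When the sample sizes are large, \( \bar{x}_1 - \bar{x}_2 \) is (approximately) normally distributed, with

\[ E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2, \qquad V(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. \]

There are two cases for the test statistic for \( \mu_1 - \mu_2 \): when \( \sigma_1^2 = \sigma_2^2 \) and when \( \sigma_1^2 \neq \sigma_2^2 \).

When \( \sigma_1^2 = \sigma_2^2 \), we use the pooled-variance t test:

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \nu = n_1 + n_2 - 2, \]

where

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}. \]

– \( s_p^2 \) is called the pooled variance estimator. It is the weighted average of the two sample variances, with the number of degrees of freedom of each sample (\( n_1 - 1 \) and \( n_2 - 1 \)) used as weights.

The confidence interval, with \( \nu = n_1 + n_2 - 2 \) degrees of freedom, is

\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}. \]

When the population variances are different, the sample variances are not pooled:

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \]

where the degrees of freedom come from the Welch–Satterthwaite approximation (the "Welch" test that R's t.test() runs by default):

\[ \nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}. \]

Testing for Equal Population Variances
– H0: \( \sigma_1^2 / \sigma_2^2 = 1 \)
– HA: \( \sigma_1^2 / \sigma_2^2 \neq 1 \)
– Test statistic \( F = s_1^2 / s_2^2 \): an F-test with \( n_1 - 1 \) and \( n_2 - 1 \) degrees of freedom.

Ex 8-1) Comparing salaries for finance and marketing majors. Here are salary records for 50 randomly sampled recent graduates, 25 from each major. Can we infer that finance majors obtain higher salaries than marketing majors do?

    > library(readxl)
    > Ex8_1 <- read_excel("NaverCloud/R/data/Ex8-1.xlsx")
    > View(Ex8_1)
    > attach(Ex8_1)
    > var.test(Finance, Marketing)

            F test to compare two variances

    data:  Finance and Marketing
    F = 1.3745, num df = 24, denom df = 24, p-value = 0.4416
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     0.6056997 3.1191228
    sample estimates:
    ratio of variances
              1.374501

The F statistic (the ratio of the sample variances) is 1.3745, and the p-value is 0.4416 > 0.05, so there is not enough evidence to infer that the population variances differ. We therefore treat the variances as equal and use the pooled-variance t test.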
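To show where the numbers printed by var.test() and t.test(var.equal = TRUE) come from, here is a minimal sketch that computes the F ratio, the pooled variance \( s_p^2 \), the pooled t statistic, and its confidence interval directly from the formulas, then checks them against R's built-in functions. The samples are simulated with an arbitrary seed (the Ex8-1 spreadsheet is not reproduced here), and the helper name pooled_t is hypothetical.

```r
# Sketch: F test and pooled-variance (equal-variances) two-sample t test
# computed from the formulas. The samples are SIMULATED for illustration;
# they are not the Ex8-1 data.
set.seed(1)
x <- rnorm(25, mean = 65000, sd = 15000)  # hypothetical "Finance" salaries
y <- rnorm(25, mean = 60000, sd = 13000)  # hypothetical "Marketing" salaries

# F test for equal variances: F = s1^2 / s2^2 with (n1 - 1, n2 - 1) df
f_stat <- var(x) / var(y)
df1 <- length(x) - 1
df2 <- length(y) - 1
# Two-sided p-value: twice the smaller tail probability
f_p <- 2 * min(pf(f_stat, df1, df2), pf(f_stat, df1, df2, lower.tail = FALSE))

# Hypothetical helper: pooled t statistic, df, p-value and confidence interval
pooled_t <- function(x, y, mu = 0, conf.level = 0.95) {
  n1 <- length(x)
  n2 <- length(y)
  sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)  # pooled variance
  se <- sqrt(sp2 * (1 / n1 + 1 / n2))          # standard error of xbar1 - xbar2
  df <- n1 + n2 - 2
  est <- mean(x) - mean(y)
  t_stat <- (est - mu) / se
  half <- qt(1 - (1 - conf.level) / 2, df) * se  # t_{alpha/2} times the standard error
  list(sp2 = sp2, se = se, t = t_stat, df = df,
       p = 2 * pt(-abs(t_stat), df), ci = c(est - half, est + half))
}

res <- pooled_t(x, y)

# Compare with R's built-in tests
vt <- var.test(x, y)
tt <- t.test(x, y, var.equal = TRUE)
print(c(by_hand = f_stat, var.test = unname(vt$statistic)))
print(c(by_hand = res$t, t.test = unname(tt$statistic)))
```

Computing the statistic by hand also exposes \( s_p^2 \) and the standard error, which t.test() uses internally but the printed summary does not show.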
    > t.test(Finance, Marketing, var.equal = TRUE)

            Two Sample t-test

    data:  Finance and Marketing
    t = 1.0422, df = 48, p-value = 0.3026
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -4833.352 15235.352
    sample estimates:
    mean of x mean of y
      65623.8   60422.8
    > detach(Ex8_1)

The difference between the sample means is $65,623.80 − $60,422.80 = $5,201, which is quite large. But the standard error of this difference, \( \sqrt{s_p^2(1/n_1 + 1/n_2)} \), is also large (about $4,991). The t statistic is therefore small, t = 1.04, below the one-tailed 5% critical value \( t_{0.05,48} \approx 1.68 \) for \( H_A: \mu_1 - \mu_2 > 0 \); equivalently, the one-tailed p-value is 0.3026/2 ≈ 0.15. We cannot infer from these data that finance majors obtain higher salaries.

Ex 8-2) Does a business do better after a change in leadership if the new boss is the offspring of the owner, or does it do better when an outsider is made chief executive officer (CEO)? In pursuit of an answer, researchers randomly selected 140 firms between 1994 and 2002: 30% of them (42 firms) passed ownership to an offspring, and 70% (98 firms) appointed an outsider as CEO. The change in operating income as a proportion of assets, from before to after the change, was recorded. Do these data allow us to infer that the effect of making an offspring CEO is different from the effect of hiring an outsider as CEO?

    > Ex8_2 <- read_excel("NaverCloud/R/data/Ex8-2.xlsx")
    > View(Ex8_2)
    > attach(Ex8_2)
    > var.test(Offspring, Outsider)

            F test to compare two variances

    data:  Offspring and Outsider
    F = 0.47138, num df = 41, denom df = 97, p-value = 0.008095
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     0.2875692 0.8170022
    sample estimates:
    ratio of variances
             0.4713825

The p-value is 0.0081 < 0.05, so there is enough evidence to infer that the population variances differ. We therefore use the unequal-variances (Welch) t test, which is what t.test() runs when var.equal is left at its default, FALSE:

    > t.test(Offspring, Outsider)

            Welch Two Sample t-test

    data:  Offspring and Outsider
    t = -3.2196, df = 110.75, p-value = 0.001685
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -2.1581458 -0.5136909
    sample estimates:
    mean of x mean of y
    -0.100000  1.235918
    > detach(Ex8_2)

The t statistic is -3.22, and the p-value is 0.0017.
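The Welch test's fractional degrees of freedom (df = 110.75 in the output above) come from the Welch–Satterthwaite approximation \( \nu = (s_1^2/n_1 + s_2^2/n_2)^2 / [(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)] \). The minimal sketch below computes the unpooled t statistic, that \( \nu \), and the confidence interval, and checks them against t.test(). It assumes simulated samples of sizes 42 and 98 (matching the group sizes in Ex 8-2) rather than the Ex8-2 data; the helper name welch_t and the seed are hypothetical choices.

```r
# Sketch: Welch (unequal-variances) two-sample t test computed from the formulas.
# The samples are SIMULATED with unequal spreads and sizes (42 and 98, as in
# Ex 8-2); they are not the Ex8-2 data.
set.seed(2)
x <- rnorm(42, mean = -0.1, sd = 1.5)   # hypothetical "Offspring" values
y <- rnorm(98, mean = 1.2, sd = 2.2)    # hypothetical "Outsider" values

# Hypothetical helper: Welch t statistic, Welch-Satterthwaite df and CI
welch_t <- function(x, y, mu = 0, conf.level = 0.95) {
  v1 <- var(x) / length(x)              # s1^2 / n1
  v2 <- var(y) / length(y)              # s2^2 / n2
  se <- sqrt(v1 + v2)                   # unpooled standard error
  df <- (v1 + v2)^2 / (v1^2 / (length(x) - 1) + v2^2 / (length(y) - 1))
  est <- mean(x) - mean(y)
  t_stat <- (est - mu) / se
  half <- qt(1 - (1 - conf.level) / 2, df) * se
  list(t = t_stat, df = df, p = 2 * pt(-abs(t_stat), df),
       ci = c(est - half, est + half))
}

res <- welch_t(x, y)
tt <- t.test(x, y)   # var.equal = FALSE is the default, so this is Welch's test
print(c(by_hand_df = res$df, t.test_df = unname(tt$parameter)))
```

The Welch degrees of freedom are generally not an integer and never exceed \( n_1 + n_2 - 2 \); passing var.equal = TRUE to t.test() switches back to the pooled test.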
Since the p-value (0.0017) is well below 0.05, we conclude that there is sufficient evidence to infer that the mean changes in operating income differ. We estimate that the mean change in operating income for outsiders exceeds the mean change for offspring by between 0.51 and 2.16 percentage points.

Matched Pairs Experiments

Ex 8-3) We redo the experiment, this time matching students by GPA. The students are grouped into 25 pairs, from Group 1 (highest-grade students) down to Group 25 (lowest-grade students), each pairing a finance major with a marketing major of similar grades. Because the students within a pair have the same grades, we can calculate the difference in salaries for each pair.

    > Ex8_3 <- read_excel("NaverCloud/R/data/Ex8-3.xlsx")
    > View(Ex8_3)
    > attach(Ex8_3)
    > var.test(Finance, Marketing)

            F test to compare two variances

    data:  Finance and Marketing
    F = 0.9479, num df = 24, denom df = 24, p-value = 0.8968
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     0.4177082 2.1510380
    sample estimates:
    ratio of variances
             0.9478956

The paired t test works only with the 25 within-pair differences, so the equality of the two variances does not matter here.

    > t.test(Finance, Marketing, paired = TRUE)

            Paired t-test

    data:  Finance and Marketing
    t = 3.8097, df = 24, p-value = 0.0008511
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     2320.816 7808.224
    sample estimates:
    mean of the differences
                    5064.52
    > detach(Ex8_3)

The t statistic is t = 3.81 with a p-value of 0.0009. There is now overwhelming evidence to infer that finance majors obtain higher salaries. The mean difference ($5,065) is close to the difference between the sample means in Ex 8-1 ($5,201), but its standard error is far smaller (5064.52/3.8097 ≈ $1,329, versus roughly $5,000 in Ex 8-1), because matching by GPA removes much of the salary variation due to grades. We estimate that the mean salary offer to finance majors exceeds the mean salary offer to marketing majors by between $2,321 and $7,808.
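A paired t test is a one-sample t test on the within-pair differences \( d_i \): \( t = \bar{d} / (s_d/\sqrt{n}) \) with \( n - 1 \) degrees of freedom. The minimal sketch below checks this against t.test(paired = TRUE) and against a one-sample t.test() on the differences. It assumes 25 simulated pairs that share a hypothetical GPA-related component (gpa_effect) rather than the Ex8-3 data; the salary levels, spreads, and seed are arbitrary illustrative choices.

```r
# Sketch: a paired t test equals a one-sample t test on the within-pair differences.
# The 25 pairs are SIMULATED; they are not the Ex8-3 data.
set.seed(3)
gpa_effect <- rnorm(25, mean = 0, sd = 12000)          # hypothetical, shared within a pair
marketing <- 60000 + gpa_effect + rnorm(25, sd = 4000)  # hypothetical salaries
finance   <- 65000 + gpa_effect + rnorm(25, sd = 4000)

d <- finance - marketing          # one difference per matched pair
n <- length(d)
se_d <- sd(d) / sqrt(n)           # standard error of the mean difference
t_stat <- mean(d) / se_d          # H0: mean difference = 0
df <- n - 1
p_val <- 2 * pt(-abs(t_stat), df)
ci <- mean(d) + c(-1, 1) * qt(0.975, df) * se_d

paired <- t.test(finance, marketing, paired = TRUE)
one_sample <- t.test(d)           # the same test, written as a one-sample test

# For contrast: ignoring the pairing gives a larger standard error here,
# because the shared gpa_effect inflates both samples' variances but
# cancels in the within-pair differences.
se_unpaired <- sqrt(var(finance) / n + var(marketing) / n)
print(c(paired_se = se_d, unpaired_se = se_unpaired))
```

The comparison of paired_se and unpaired_se illustrates, on simulated data only, the mechanism described above: a component shared by both members of a pair drops out of the differences, so the paired test can detect a difference that an independent-samples test misses.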