Chapter 13. Confidence intervals and Test of hypothesis comparing two parameters

Problem PS274
1. Report the mean μ₁ = 13 and the standard deviation σ₁ = 3 of a normal random variable in cells A1 and A2. Do likewise for a second normal random variable in cells B1 and B2: μ₂ = 10 and σ₂ = 2. Generate 25 values of the first normal variable in the range A3:A27 and 16 values of the second variable in the range B3:B18. Compute the mean of the first sample in cell E3 and the mean of the second sample in cell F3;
2. Compute a 95% C.I. for the difference μ₁ − μ₂, assuming the variances σ₁² and σ₂² are known, in cell D7 (lower bound) and cell D8 (upper bound). Example for cell D7: = E3 - F3 - NORM.S.INV(0.975) * SQRT(A2*A2/25 + B2*B2/16). Likewise for cell D8. Compute the width of the interval in cell D9;
3. As in Step 2 for the 99% C.I. in cells D12 to D14;
4. Repeat the simulation a number of times, compare the width of the intervals each time and check whether the value 0 is contained in the interval;
5. Change the mean of the second normal variable to 12 and check the results;
6. Restore the mean of the second variable to 10, change the standard deviation of the first variable to 8 and of the second variable to 4, and check the influence of the increased standard deviations on the width of the intervals.

Assignment PA274
Generate data as in Step 1 above. Apply a two-sided hypothesis test H0: μ₁ − μ₂ = 0 assuming the variances are known. Use the following decision rule (p* = p-value): when p* < 0.01, reject H0 strongly; when 0.01 ≤ p* < 0.05, reject H0; when 0.05 ≤ p* < 0.1, reject H0 weakly; when p* ≥ 0.1, accept H0. Apply a one-sided test with HA: μ₁ − μ₂ > 0 and a significance level of 0.05.

Problem PS275
1. Generate data as in Step 1 of Problem PS274, changing σ₂ to the value 3. Compute the means of both samples in cells E3 and F3, the variances in cells E4 and F4.
Compute the pooled sample variance in cell F5: = (24*E4 + 15*F4)/(25 + 16 - 2);
2. Compute a 95% C.I. for the difference μ₁ − μ₂, assuming the variances σ₁² and σ₂² are unknown but equal, in cell D8 (lower bound) and cell D9 (upper bound). Example for cell D8: = E3 - F3 - T.INV.2T(0.05; 25 + 16 - 2) * SQRT(F5*(1/25 + 1/16)). Likewise for cell D9. Compute the width of the interval in cell D10;
3. Perform a two-sided test for H0: μ₁ − μ₂ = 0 with significance level 0.05. Compute the T-ratio in cell D13: = (E3 - F3)/SQRT(F5*(1/25 + 1/16)) and the p-value in cell D14: = T.DIST.2T(ABS(D13); 25 + 16 - 2). The ABS is needed because T.DIST.2T returns an error for a negative first argument;
4. Use an alternative formula available in Excel to compute the p-value in cell D15: = T.TEST(A3:A27; B3:B18; 2; 2). Report the decision in cell D16: = IF(D15 < 0.01; "reject H0 strongly"; IF(D15 < 0.05; "reject H0"; IF(D15 < 0.1; "reject H0 weakly"; "accept H0")));
5. Change the mean of the second normal variable to 12.5 and repeat the simulation;
6. Keep the second mean at 12.5 and decrease both standard deviations to 0.5. Repeat the simulation a few times (key F9).

Assignment PA275
Generate data as in Step 1 above with the mean of the second normal variable equal to 12. Perform a two-sided test to compare the means in two ways: test 1 assumes the variances are known, as in Assignment PA274, and test 2 assumes the variances are unknown, as above. Repeat the test several times and notice that the outcomes may differ.

Problem PS276
1. Generate data as in Step 1 of Problem PS274, changing σ₁ to the value 5. Compute the means of both samples in cells E3 and F3, the variances in cells E4 and F4. Compute the pooled sample variance in cell F5: = (24*E4 + 15*F4)/(25 + 16 - 2);
2. Compute a 95% C.I. for the difference μ₁ − μ₂ assuming the variances σ₁² and σ₂² are not equal (Behrens-Fisher problem).
To do this, first compute s₁²/n₁ and s₂²/n₂ in cells E8 and F8 and the degrees of freedom in cell E9 using the formula = POWER(E8 + F8; 2)/(POWER(E8; 2)/24 + POWER(F8; 2)/15). Compute the lower bound of the interval in cell E11, the upper bound in cell E12. Example for cell E11: = E3 - F3 - T.INV.2T(0.05; E9) * SQRT(E8 + F8) and likewise for the upper bound. Compute the width of the interval in cell E13;
3. Compute a 95% C.I. for the difference μ₁ − μ₂ assuming the variances σ₁² and σ₂² are equal (wrong assumption) in cells I11 and I12. Compute the width of the interval in cell I13. Compare the results with Step 2;
4. Perform a two-sided test for H0: μ₁ − μ₂ = 0. Compute the T-ratio in cell D16: = (E3 - F3)/SQRT(E8 + F8). Compute the p-value in cell D17: = T.DIST.2T(ABS(D16); E9);
5. Use an alternative formula available in Excel to compute the p-value in cell D19: = T.TEST(A3:A27; B3:B18; 2; 3). The p-values in Steps 4 and 5 may differ slightly, probably due to a different handling of non-integer degrees of freedom. Report the decision in cell D20 as in Step 4 of Problem PS275;
6. Perform a two-sided test assuming equal variances by computing the p-value using the T.TEST function of Excel (cell I17). Compare the result with Step 5.

Assignment PA276
As the T-test above is an approximate procedure, the dof are sometimes computed according to a slightly different formula:

\[ \text{dof} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\left(s_1^2/n_1\right)^2/n_1 + \left(s_2^2/n_2\right)^2/n_2} - 2 \]

Use this formula to compute the dof for the data above and compare with the result in Step 2.

Problem PS277
1. Report the mean μ₁ = 10 and the standard deviation σ₁ = 2.5 of a normal random variable in cells A1 and A2. Generate 20 values of the normal variable in the range A4:A23. Generate dependent values of a second variable in the range B4:B23 as follows (example for cell B4): = NORM.INV(RAND(); A4 - 1.5; 2) and likewise for cells B5:B23.
Compute the pairwise differences in cells C4:C23. Compute the mean of the first sample in cell F3, the mean of the second sample in cell G3 and the mean of the differences in cell H3. Compute the sample variances in cells F4 to H4. Compute the pooled sample variance in cell F5;
2. Perform a two-sided test for the equality of means for dependent samples. To do so, compute the ratio H3/SQRT(H4/20) in cell E7 and the p-value in cell E8: = T.DIST.2T(ABS(E7); 19). An alternative way to compute the p-value in Excel, in cell E9: = T.TEST(A4:A23; B4:B23; 2; 1). Report the conclusion in cell E10: = IF(E9 < 0.01; "reject H0 strongly"; IF(E9 < 0.05; "reject H0"; IF(E9 < 0.1; "reject H0 weakly"; "accept H0")));
3. Perform a two-sided test for the equality of means for independent samples (wrong assumption) assuming equal variances (use the T.TEST function). Compute the p-value in cell E13 and the conclusion in cell E14 (see Problem PS276). Notice the strong effect of the dependence in the samples;
4. Perform a two-sided test for the equality of means for independent samples (wrong assumption) assuming unequal variances (use the T.TEST function). Compute the p-value in cell E17 and the conclusion in cell E18 (see Problem PS276). Notice here, too, the strong effect of the dependence in the samples.

Assignment PA277
Generate data as in Step 1 above. Perform a two-sided test for the equality of the population means assuming dependent data. Compare the result with a two-sided test assuming
1. independent samples with known population variances;
2. independent samples with unknown but equal population variances.

Problem PS278
We test the equality of two variances of normally distributed data.
1. Generate data as in Step 1 in Problem PS274. Compute the variances of both samples in cells E3 and F3;
2.
Compute the ratio of both variances in cell E6: = E3/F3 and the p-value for a two-sided test for the equality of the two variances in cell E7: = IF(E6 > 1; 2 * F.DIST.RT(E6; 24; 15); 2 * (1 - F.DIST.RT(E6; 24; 15))). Draw the conclusion in cell E8: = IF(E7 < 0.01; "reject H0 strongly"; IF(E7 < 0.05; "reject H0"; IF(E7 < 0.1; "reject H0 weakly"; "accept H0")));
3. An alternative and more direct way of computing the p-value for the two-sided test is = F.TEST(A3:A27; B3:B18). Notice that Excel assumes that a two-sided test is required;
4. Perform a one-sided test of the form H0: σ₁² = σ₂² versus HA: σ₁² > σ₂². Compute the p-value in cell E13: = F.DIST.RT(E6; 24; 15) and the conclusion in cell E14 as in cell E8.

Assignment PA278
Repeat Step 1 above. Compute a 95% confidence interval for the ratio σ₁²/σ₂².

Problem PS279
Test of the equality of two proportions or fractions.
1. Report the (unknown) true proportion of successes of a first population in cell A2 (0.4) and of a second population in cell B2 (0.3). Generate 80 successes/failures from the first population in the range A3:A82. Example for cell A3: = IF(RAND() < A$2; "S"; "F"). Do likewise for 75 observations from population 2 in the range B3:B77;
2. Compute the sample proportion of successes in both samples in cells D3 and E3. Example for cell D3: = COUNTIF(A3:A82; "S")/80 and similarly for cell E3. Assuming that both population proportions are equal, compute the estimate of the common proportion in cell E4: = (80*D3 + 75*E3)/155;
3. Compute the test ratio in cell D7: = (D3 - E3)/SQRT(E4*(1 - E4)*(1/80 + 1/75)) and the p-value for a two-sided test of the equality of the two proportions in cell D8: = 2 * (1 - NORM.S.DIST(ABS(D7); 1)). Report the conclusion in cell D9 as in the problems above.
Notice that for these sample sizes the null hypothesis of equal proportions is often accepted;
4. Change the population proportion of the second population to 0.2 and check the results.

Assignment PA279
Repeat Step 1 above. Construct a 95% confidence interval for the difference p₁ − p₂ of the population proportions.

Problem PS280
Consider the data set 'Rice'. Assume the weights for the different fillers are (approximately) normally distributed.
1. Compute the sample variances of the weights for fillers 1 and 2 in cells G2 and G3, the pooled sample variance in cell J2;
2. Test the equality of the variances of the weights of filler 1 and filler 2 in cell G4: = F.TEST(filler1; filler2), where filler1 and filler2 are the names of the data ranges for the two fillers. Notice that this function applies a two-sided test for the equality of the variances. Answer: p-value = 0.3581. The hypothesis that both variances are equal cannot be rejected;
3. Compute a 95% C.I. for the ratio σ₁²/σ₂². The lower bound in cell G6: = F.INV(0.025; 19; 19) * G2/G3 and likewise for the upper bound;
4. Compute the sample means of the weights for fillers 1 and 2 in cells G9 and G10;
5. Apply a two-sided test for the equality of the means of the weights of filler 1 and filler 2 in cell G12 (assume the variances are equal): = T.TEST(filler1; filler2; 2; 2). Answer: p-value = 1.1623E-04. The hypothesis that both means are equal is strongly rejected;
6. Apply a two-sided test for the equality of the means of the weights of fillers 1 and 2, assuming the variances are unequal, in cell G13: = T.TEST(filler1; filler2; 2; 3). Compare the p-value with Step 5. Answer: p-value = 1.2402E-04. The p-values in Steps 5 and 6 are quite close and lead to the same conclusion;
7. Derive a 95% C.I. for the difference of the means for fillers 1 and 2. The lower bound in cell G15: = G9 - G10 - T.INV.2T(0.05; 38) * SQRT(J2*(1/20 + 1/20)) and likewise for the upper bound.
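The four-level decision rule of Assignment PA274, which the nested IF formulas of this chapter encode, can be sketched in Python and applied to the p-values reported as answers in Steps 2, 5 and 6. This is only an illustration: the function name `decide` and the exact wording of the labels are assumptions, while the thresholds 0.01, 0.05 and 0.1 are those of the IF formula.

```python
def decide(p_value: float) -> str:
    """Map a p-value to one of the four decisions used in this chapter."""
    # Same cascade as IF(p < .01; ...; IF(p < .05; ...; IF(p < .1; ...; ...)))
    if p_value < 0.01:
        return "reject H0 strongly"
    if p_value < 0.05:
        return "reject H0"
    if p_value < 0.1:
        return "reject H0 weakly"
    return "accept H0"


# p-values reported as answers in Steps 2, 5 and 6 of Problem PS280
reported = {
    "F.TEST, equality of variances (Step 2)": 0.3581,
    "T.TEST, equal variances (Step 5)": 1.1623e-04,
    "T.TEST, unequal variances (Step 6)": 1.2402e-04,
}

decisions = {label: decide(p) for label, p in reported.items()}
for label, decision in decisions.items():
    print(f"{label}: {decision}")
```

The cascade checks the smallest threshold first, so each p-value falls into exactly one category, just as in the nested IF.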

Assignment PA280
Apply Steps 1 to 7 above for the fillers 1 and 4 of the data set 'Rice'.

Problem PS281
Consider the data set 'Sabena'. We compare the mean delay at arrival between flights originating in Marseille (LINE STATION-DEP=MRS) and Florence (LINE STATION-DEP=FLR).
1. Generate in column R the delay times at arrival for flights originating in Marseille. Example for cell R2: = IF(G2 = "MRS"; [arrival]2; ""), where [arrival] stands for the letter of the column that holds the delay at arrival. Do likewise in column S for flights originating in Florence;
2. Compute the mean and the variance of the delay times of flights originating in Marseille in cells T2 and T3. Do likewise for the flights originating in Florence in cells U2 and U3;
3. Cell T7: apply a two-sided test for the equality of the variances of the delays at arrival between flights originating in Marseille and in Florence (assume normality of delay times, see later): = F.TEST(R2:R3854; S2:S3854). Answer: p-value = 1.2070E-22. Decision: reject the equality of the variances strongly;
4. Cell T10: apply a two-sided test for the equality of the means of the delays at arrival between flights originating in Marseille and in Florence (assume normality of delay times, see later): = T.TEST(R2:R3854; S2:S3854; 2; 3). Answer: p-value = 0.0242. Decision: reject the equality of the means;
5. Cell T13: apply a two-sided test for the equality of the means of the delays at arrival between flights originating in Marseille and in Florence assuming equal variances (which is actually false): = T.TEST(R2:R3854; S2:S3854; 2; 2). Answer: p-value = 0.0236. Notice that the p-values do not differ much between the equal-variance and unequal-variance assumptions because of the fairly large sample sizes of 217 and 215 data points.

Assignment PA281
Apply Steps 1 to 5 for the flights originating in Marseille and Edinburgh.

Problem PS282
Consider the data set 'Sabena'. Assume the first 50 observations to be a random sample of the data.
1.
Use the sample of 50 observations to test the hypothesis that the average delay at departure equals the average delay at arrival. Clearly, delays at departure and at arrival cannot be considered independent. Therefore, to compute the p-value for a two-sided test, use the instruction = T.TEST(J2:J51; [arrival]2:[arrival]51; 2; 1), say in cell S5, where [arrival] stands for the letter of the column that holds the delay at arrival. Answer: p-value = 1.0135E-03 (significantly different means);
2. Assume now that delays at departure and arrival are independent (wrong assumption). Test the equality of average delays again using the sample of 50 observations, assuming first equal and then unequal variances. Use the function T.TEST with last argument 2 and 3 respectively. Answer: p-value = 8.3096E-02 assuming equal variances and p-value = 8.3176E-02 assuming unequal variances. Notice the (much) larger p-values compared to Step 1;
3. Check the dependence of both delays by computing the sample correlation coefficient. Answer: corr = 0.759;
4. Derive the p-value in Step 1 in an alternative way as follows: compute the difference of the delays for the 50 observations in column R, rows 2 to 51. Use the instruction = T.DIST.2T(ABS(AVERAGE(R2:R51) * SQRT(50)/STDEV.S(R2:R51)); 49). Compare with the p-value in Step 1.

Assignment PA282
Apply Steps 1 to 4 above using the next 50 observations as a random sample.

Problem PS283
Consider the data set 'Sabena'. We compare the proportion of flights with a delay of more than 5 minutes at arrival for flights originating in Marseille (LINE STATION-DEP=MRS) and Florence (LINE STATION-DEP=FLR).
1. Generate in column R the value 1 for flights from Marseille with a delay of more than 5 minutes. Example for cell R2: = IF(AND(G2 = "MRS"; [arrival]2 > 5); 1; ""), with [arrival] again the letter of the column that holds the delay at arrival. Do likewise in column S for flights originating in Florence;
2. Count the number of flights originating from Marseille in cell T2: = COUNTIF(G2:G3854; "MRS") and do likewise for the flights from Florence in cell U2.
Compute the proportion of flights with a delay of more than 5 minutes originating from Marseille in cell T3: = SUM(R2:R3854)/T2. Do the same for the flights from Florence in cell U3. Compute the pooled proportion in cell U4: = (T2*T3 + U2*U3)/(T2 + U2). Answer: Marseille: 0.6590, Florence: 0.6093, pooled proportion: 0.6343;
3. Compute a 95% C.I. for the difference of the proportions of late flights from Marseille and Florence. For the lower bound in cell T6: = T3 - U3 - NORM.S.INV(0.975) * SQRT(T3*(1 - T3)/T2 + U3*(1 - U3)/U2). Do likewise for the upper bound in cell T7. Answer: lower bound: -0.0410, upper bound: 0.1404;
4. Apply a two-sided test for the equality of the proportions of late flights from Marseille and Florence. First compute the ratio in cell T10: = (T3 - U3)/SQRT(U4*(1 - U4)*(1/T2 + 1/U2)). Compute the p-value in cell T11: = 2 * (1 - NORM.S.DIST(ABS(T10); TRUE)). Answer: p-value = 0.2837. Equality of the proportions cannot be rejected.

Assignment PA283
Apply Steps 1 to 4 for the flights originating in Marseille and Edinburgh.
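
As a cross-check of the answers reported in Problem PS283, the confidence interval and the two-sided test for two proportions can be sketched in Python with the standard library only. The 'Sabena' data are not reproduced here, so the counts below are assumptions: 143 late flights out of 217 for Marseille and 131 out of 215 for Florence, inferred from the reported proportions 0.6590 and 0.6093 and from the sample sizes 217 and 215 mentioned in Problem PS281. All variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Assumed counts (not given in the text): chosen so that the proportions
# match the reported 0.6590 (Marseille) and 0.6093 (Florence).
n1, late1 = 217, 143   # Marseille: number of flights, flights more than 5 min late
n2, late2 = 215, 131   # Florence

p1 = late1 / n1                       # counterpart of cell T3
p2 = late2 / n2                       # counterpart of cell U3
pooled = (late1 + late2) / (n1 + n2)  # counterpart of cell U4

std_normal = NormalDist()             # standard normal distribution

# 95% confidence interval for p1 - p2 (unpooled standard error)
z_crit = std_normal.inv_cdf(0.975)    # like NORM.S.INV(0.975)
se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lower = p1 - p2 - z_crit * se_unpooled
upper = p1 - p2 + z_crit * se_unpooled

# Two-sided test of H0: p1 = p2 (pooled standard error)
se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pooled
p_value = 2 * (1 - std_normal.cdf(abs(z)))  # like 2*(1 - NORM.S.DIST(ABS(z); TRUE))

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, pooled = {pooled:.4f}")
print(f"95% C.I.: [{lower:.4f}, {upper:.4f}]")
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
```

The interval uses the unpooled standard error because it makes no assumption about equality of the proportions, whereas the test uses the pooled proportion because the standard error is evaluated under H0.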