Chapter 13. Confidence intervals and tests of hypotheses comparing two parameters
Problem PS274
1. Report the mean μ1 = 13 and the standard deviation σ1 = 3 of a normal random variable in cells A1 and A2. Do likewise for a second normal random variable in cells B1 and B2: μ2 = 10 and σ2 = 2. Generate 25 values of the first normal variable in the range A3:A27 and 16 values of the second variable in the range B3:B18. Compute the mean of the first sample in cell E3 and the mean of the second sample in cell F3;
2. Compute a 95% C.I. for the difference μ1 − μ2, assuming the variances σ1² and σ2² are known, in cells D7 (lower bound) and D8 (upper bound). Example for cell D7: =E3-F3-NORM.S.INV(0.975)*SQRT(A2*A2/25+B2*B2/16). Likewise for cell D8. Compute the width of the interval in cell D9;
3. As in Step 2, compute the 99% C.I. in cells D12 to D14;
4. Repeat the simulation a number of times (key F9); each time, compare the widths of the two intervals and check whether the value 0 lies in each interval;
5. Change the mean of the second normal variable to 12 and check the results again;
6. Restore the mean of the second variable to 10, change the standard deviation of the first variable to 8 and that of the second variable to 4, and check how the increased standard deviations affect the width of the intervals.
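As a cross-check outside the spreadsheet, here is a minimal Python sketch of Steps 1 to 3, assuming Python 3.8+ and only the standard library. The helper name `known_variance_ci` and the fixed seed are illustrative choices, not part of the exercise; `random.gauss` plays the role of the spreadsheet's normal random values.

```python
import math
import random
from statistics import NormalDist, fmean


def known_variance_ci(x, y, sigma1, sigma2, level=0.95):
    """z-interval for mu1 - mu2 when sigma1 and sigma2 are known (cells D7:D9)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # NORM.S.INV(0.975) for 95%
    se = math.sqrt(sigma1**2 / len(x) + sigma2**2 / len(y))
    diff = fmean(x) - fmean(y)  # E3 - F3
    return diff - z * se, diff + z * se


# Parameters of Step 1 (cells A1:B2); the seed only makes this sketch reproducible
mu1, sigma1, mu2, sigma2 = 13, 3, 10, 2
rng = random.Random(1)
x = [rng.gauss(mu1, sigma1) for _ in range(25)]  # A3:A27
y = [rng.gauss(mu2, sigma2) for _ in range(16)]  # B3:B18

for level in (0.95, 0.99):
    lo, hi = known_variance_ci(x, y, sigma1, sigma2, level)
    print(f"{level:.0%} C.I.: [{lo:.3f}, {hi:.3f}], width {hi - lo:.3f}, "
          f"contains 0: {lo <= 0 <= hi}")
```

The width 2·z·√(σ1²/n1 + σ2²/n2) depends only on σ1, σ2, the sample sizes and the confidence level, not on the simulated values. This is why it stays the same when the simulation is repeated in Steps 4 and 5 but grows in Step 6.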
Assignment PA274
Generate data as in Step 1 above. Apply a two-sided test of H0: μ1 − μ2 = 0, assuming the variances are known. Use the following decision rule, where p* denotes the p-value: if p* < 0.01, reject H0 strongly; if 0.01 ≤ p* < 0.05, reject H0; if 0.05 ≤ p* < 0.1, reject H0 mildly; if p* ≥ 0.1, accept H0.
Apply a one-sided test with HA: μ1 − μ2 > 0 and a significance level of 0.05.
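A sketch of this assignment under the same assumptions (standard library only; `decision`, `z_test_known_variances` and the sample means 12.4 and 10.9 are hypothetical). The decision function copies the nesting of the IF formulas used in this chapter, so a p-value of exactly 0.01 falls in the "reject H0" class.

```python
import math
from statistics import NormalDist


def decision(p):
    """Decision rule of Assignment PA274 (same nesting as the Excel IF formulas)."""
    if p < 0.01:
        return "reject H0 strongly"
    if p < 0.05:
        return "reject H0"
    if p < 0.1:
        return "reject H0 mildly"
    return "accept H0"


def z_test_known_variances(mean1, mean2, sigma1, sigma2, n1, n2):
    """Return (z, two-sided p-value, one-sided p-value for HA: mu1 - mu2 > 0)."""
    z = (mean1 - mean2) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    std_normal = NormalDist()
    p_two = 2 * (1 - std_normal.cdf(abs(z)))
    p_one = 1 - std_normal.cdf(z)
    return z, p_two, p_one


# Hypothetical sample means; sigma and n as in Step 1 of Problem PS274
z, p_two, p_one = z_test_known_variances(12.4, 10.9, 3, 2, 25, 16)
print(f"z = {z:.3f}, two-sided p = {p_two:.4f} ({decision(p_two)})")
print(f"one-sided p = {p_one:.4f}, reject at 0.05: {p_one < 0.05}")
```

With these hypothetical means the two-sided test only rejects H0 mildly, while the one-sided test rejects at the 0.05 level: for z > 0 the one-sided p-value is exactly half the two-sided one.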
Problem PS275
1. Generate data as in Step 1 of Problem PS274, changing σ2 to 3. Compute the means of both samples in cells E3 and F3 and the variances in cells E4 and F4. Compute the pooled sample variance in cell F5: =(24*E4+15*F4)/(25+16-2);
2. Compute a 95% C.I. for the difference μ1 − μ2, assuming the variances σ1² and σ2² are unknown but equal, in cells D8 (lower bound) and D9 (upper bound). Example for cell D8: =E3-F3-T.INV.2T(0.05;25+16-2)*SQRT(F5*(1/25+1/16)). Likewise for cell D9. Compute the width of the interval in cell D10;
3. Perform a two-sided test of H0: μ1 − μ2 = 0 with significance level 0.05. Compute the T-ratio in cell D13: =(E3-F3)/SQRT(F5*(1/25+1/16)) and the p-value in cell D14: =T.DIST.2T(ABS(D13);25+16-2) (T.DIST.2T requires a non-negative first argument, hence ABS);
4. Use an alternative formula available in Excel to compute the p-value in cell D15: =T.TEST(A3:A27;B3:B18;2;2). Report the decision in cell D16:
=IF(D15<0.01;"reject H0 strongly";IF(D15<0.05;"reject H0";IF(D15<0.1;"reject H0 mildly";"accept H0")));
5. Change the mean of the second normal variable to 12.5 and repeat the simulation;
6. Keep the second mean at 12.5 and decrease the value of both standard deviations to
0.5. Repeat the simulation a few times (key F9).
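Excel's T.INV.2T and T.DIST.2T have no counterpart in the Python standard library, so the sketch below (an illustration, not the book's method) writes the two-sided Student-t tail as the regularized incomplete beta function, evaluated by a Numerical Recipes style continued fraction, and inverts it by bisection. All function names and the seed are hypothetical; in practice a statistics library would normally be used.

```python
import math
import random
from statistics import fmean, variance


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a + m2 - 1.0) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_dist_2t(t, df):
    """P(|T| > |t|) for Student-t with df degrees of freedom, like T.DIST.2T(ABS(t); df)."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def t_inv_2t(alpha, df):
    """Critical value c with P(|T| > c) = alpha, found by bisection (like T.INV.2T)."""
    lo, hi = 0.0, 1.0
    while t_dist_2t(hi, df) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if t_dist_2t(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def pooled_t(x, y, alpha=0.05):
    """Steps 1 to 3 of PS275: C.I. for mu1 - mu2, T-ratio and two-sided p-value."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df  # cell F5
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = fmean(x) - fmean(y)
    half = t_inv_2t(alpha, df) * se  # T.INV.2T(0.05; df) * SQRT(...)
    t_ratio = diff / se  # cell D13
    return (diff - half, diff + half), t_ratio, t_dist_2t(t_ratio, df)


rng = random.Random(7)
x = [rng.gauss(13, 3) for _ in range(25)]  # A3:A27
y = [rng.gauss(10, 3) for _ in range(16)]  # B3:B18 (sigma2 changed to 3)
(lo, hi), t_ratio, p = pooled_t(x, y)
print(f"95% C.I. [{lo:.3f}, {hi:.3f}], T-ratio {t_ratio:.3f}, p-value {p:.4g}")
```

Because the tail is written as I_x(ν/2, 1/2) with x = ν/(ν + t²), the sign of t does not matter, which is the role ABS plays inside T.DIST.2T.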
Assignment PA275
Generate data as in Step 1 above with the mean of the second normal variable equal to 12. Perform a two-sided test to compare the means in two ways: test 1 assumes the variances known, as in Assignment PA274, and test 2 assumes the variances unknown but equal, as above. Repeat the test several times and notice that the outcomes may differ.
Problem PS276
1. Generate data as in Step 1 of Problem PS274, changing σ1 to 5. Compute the means of both samples in cells E3 and F3 and the variances in cells E4 and F4. Compute the pooled sample variance in cell F5: =(24*E4+15*F4)/(25+16-2);
2. Compute a 95% C.I. for the difference μ1 − μ2, assuming the variances σ1² and σ2² are not equal (Behrens-Fisher problem). To do this, first compute s1²/n1 and s2²/n2 in cells E8 and F8, and the degrees of freedom in cell E9 using the formula =POWER(E8+F8;2)/(POWER(E8;2)/24+POWER(F8;2)/15). Compute the lower bound of the interval in cell E11 and the upper bound in cell E12. Example for cell E11: =E3-F3-T.INV.2T(0.05;E9)*SQRT(E8+F8), and likewise for the upper bound. Compute the width of the interval in cell E13;
3. Compute a 95% C.I. for the difference μ1 − μ2 assuming the variances σ1² and σ2² are equal (wrong assumption) in cells I11 and I12. Compute the width of the interval in cell I13. Compare the results with Step 2;
4. Perform a two-sided test of H0: μ1 − μ2 = 0. Compute the T-ratio in cell D16: =(E3-F3)/SQRT(E8+F8). Compute the p-value in cell D17: =T.DIST.2T(ABS(D16);E9);
5. Use an alternative formula available in Excel to compute the p-value in cell D19: =T.TEST(A3:A27;B3:B18;2;3). The p-values in Steps 4 and 5 may differ slightly, probably because non-integer degrees of freedom are handled differently. Report the decision in cell D20 as in Step 4 of Problem PS275;
6. Perform a two-sided test assuming equal variances by computing the p-value using
the T.TEST function of Excel (cell I17). Compare the result with Step 5.
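The quantities of Steps 2 to 4 can be sketched in Python as follows (standard library only; function names, seed and simulated values are hypothetical). The sketch stops at the T-ratio and the non-integer dof; the p-value would then come from a Student-t tail with E9 degrees of freedom, as T.DIST.2T does in the spreadsheet.

```python
import math
import random
from statistics import fmean, variance


def welch_statistics(x, y):
    """Cells E8, F8, E9 and D16 of Problem PS276."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2  # s1^2/n1 (E8) and s2^2/n2 (F8)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))  # E9
    t_ratio = (fmean(x) - fmean(y)) / math.sqrt(v1 + v2)  # D16
    return v1, v2, df, t_ratio


def pooled_se(x, y):
    """Standard error under the (here wrong) equal-variance assumption of Step 3."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)  # F5
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))


rng = random.Random(11)
x = [rng.gauss(13, 5) for _ in range(25)]  # sigma1 changed to 5
y = [rng.gauss(10, 2) for _ in range(16)]
v1, v2, df, t_ratio = welch_statistics(x, y)
print(f"dof {df:.2f} (the pooled test uses 39), T-ratio {t_ratio:.3f}")
print(f"Welch SE {math.sqrt(v1 + v2):.3f} vs pooled SE {pooled_se(x, y):.3f}")
```

The Satterthwaite dof always lies between min(n1, n2) − 1 = 15 and n1 + n2 − 2 = 39, the value used by the pooled test.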
Assignment PA276
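As the T-test above is an approximate procedure, the dof are sometimes computed according to a slightly different formula:
$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1+1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2+1}} - 2$$
In Excel, with the cells of Step 2: =POWER(E8+F8;2)/(POWER(E8;2)/26+POWER(F8;2)/17)-2. Use this formula to compute the dof for the data above and compare with the result in Step 2.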
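Both dof formulas can be compared directly. This sketch assumes the alternative formula is Welch's 1947 version, with n1 + 1 and n2 + 1 in the denominators and 2 subtracted at the end; the function names are hypothetical, and the inputs are the population values σ1 = 5, σ2 = 2, n1 = 25, n2 = 16 rather than simulated sample variances.

```python
def satterthwaite_df(v1, v2, n1, n2):
    """Formula of Step 2 (cell E9), with v1 = s1^2/n1 and v2 = s2^2/n2."""
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def welch_1947_df(v1, v2, n1, n2):
    """Alternative form assumed here: n + 1 in the denominators, minus 2 at the end."""
    return (v1 + v2) ** 2 / (v1**2 / (n1 + 1) + v2**2 / (n2 + 1)) - 2


# Illustration with sigma1 = 5, sigma2 = 2, n1 = 25, n2 = 16
v1, v2 = 5**2 / 25, 2**2 / 16
print(f"Satterthwaite: {satterthwaite_df(v1, v2, 25, 16):.4f}")
print(f"Welch (1947):  {welch_1947_df(v1, v2, 25, 16):.4f}")
```

For these inputs the two formulas give about 34.09 and 35.08 degrees of freedom, a difference of roughly one.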
Problem PS277
1. Report the mean μ1 = 10 and the standard deviation σ1 = 2.5 of a normal random variable in cells A1 and A2. Generate 20 values of the normal variable in the range A4:A23. Generate dependent values of a second variable in the range B4:B23 as follows (example for cell B4): =NORM.INV(RAND();A4-1.5;2), and likewise for cells B5:B23. Compute the pairwise differences in cells C4:C23. Compute the mean of the first sample in cell F3, the mean of the second sample in cell G3 and the mean of the differences in cell H3. Compute the sample variances in cells F4 to H4. Compute the pooled sample variance in cell F5;
2. Perform a two-sided test for the equality of means for dependent samples. To this end, compute the ratio =H3/SQRT(H4/20) in cell E7 and the p-value in cell E8: =T.DIST.2T(ABS(E7);19).
An alternative way to compute the p-value in Excel, in cell E9: =T.TEST(A4:A23;B4:B23;2;1).
Report the conclusion in cell E10: =IF(E9<0.01;"reject H0 strongly";IF(E9<0.05;"reject H0";IF(E9<0.1;"reject H0 mildly";"accept H0")));
3. Perform a two-sided test for the equality of means for independent samples (wrong assumption), assuming equal variances (use the T.TEST function). Compute the p-value in cell E13 and the conclusion in cell E14 (see Problem PS276). Notice the strong effect of the dependence between the samples;
4. Perform a two-sided test for the equality of means for independent samples (wrong assumption), assuming unequal variances (use the T.TEST function). Compute the p-value in cell E17 and the conclusion in cell E18 (see Problem PS276). Here too, notice the strong effect of the dependence between the samples.
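The effect of the dependence can be seen by comparing the two T-ratios on the same simulated pairs. A minimal standard-library sketch (function names and seed are hypothetical):

```python
import math
import random
from statistics import fmean, variance


def paired_t(x, y):
    """T-ratio of Step 2: mean difference over its standard error (cells H3, H4, E7)."""
    d = [a - b for a, b in zip(x, y)]  # C4:C23
    return fmean(d) / math.sqrt(variance(d) / len(d))


def pooled_t_ratio(x, y):
    """T-ratio that wrongly treats x and y as independent samples (Step 3)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)  # F5
    return (fmean(x) - fmean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


rng = random.Random(5)
x = [rng.gauss(10, 2.5) for _ in range(20)]  # A4:A23
y = [rng.gauss(a - 1.5, 2) for a in x]  # B4:B23, i.e. NORM.INV(RAND(); A4-1.5; 2)
print(f"paired T-ratio {paired_t(x, y):.3f}, "
      f"independent T-ratio {pooled_t_ratio(x, y):.3f}")
```

The differences only carry the noise of the second variable (standard deviation 2), while the independent-samples ratio uses the full spread of each sample, 2.5 for the first and √(2.5² + 2²) ≈ 3.2 for the second; the paired ratio is therefore usually much larger.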
Assignment PA277
Generate data as in Step 1 above. Perform a two-sided test for the equality of the population
means assuming dependent data. Compare the result with a two-sided test assuming
1. independent samples with known population variances;
2. independent samples with unknown but equal population variances.
Problem PS278
We test the equality of two variances of normally distributed data.
1. Generate data as in Step 1 in Problem PS274. Compute the variances of both samples
in cells E3 and F3;
2. Compute the ratio of both variances in cell E6: =E3/F3 and the p-value for a two-sided test for the equality of the two variances in cell E7:
=IF(E6>1;2*F.DIST.RT(E6;24;15);2*(1-F.DIST.RT(E6;24;15))). Draw the conclusion in cell E8: =IF(E7<0.01;"reject H0 strongly";IF(E7<0.05;"reject H0";IF(E7<0.1;"reject H0 mildly";"accept H0")));
3. An alternative and more direct way of computing the p-value for the two-sided test is =F.TEST(A3:A27;B3:B18). Notice that Excel assumes that a two-sided test is required;
4. Perform a one-sided test of the form H0: σ1² = σ2² versus HA: σ1² > σ2². Compute the p-value in cell E13: =F.DIST.RT(E6;24;15) and the conclusion in cell E14 as in cell E8.
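The p-values of Steps 2 and 4 can be sketched without Excel by writing the F tail through the regularized incomplete beta function, P(F > f) = I_x(d2/2, d1/2) with x = d2/(d2 + d1·f). The helpers repeat a Numerical Recipes style continued-fraction evaluation; all names and the ratio 1.8 are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a + m2 - 1.0) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_dist_rt(f, d1, d2):
    """Right tail P(F > f) of F(d1, d2), playing the role of F.DIST.RT(f; d1; d2)."""
    if f <= 0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def variance_ratio_p_two_sided(ratio, d1, d2):
    """Cell E7: double the tail on the side where the variance ratio falls."""
    if ratio > 1:
        return 2 * f_dist_rt(ratio, d1, d2)
    return 2 * (1 - f_dist_rt(ratio, d1, d2))


ratio = 1.8  # hypothetical value of cell E6 = E3/F3
print(f"two-sided p (E7): {variance_ratio_p_two_sided(ratio, 24, 15):.4f}")
print(f"one-sided p (E13): {f_dist_rt(ratio, 24, 15):.4f}")
```

Swapping the two samples inverts the ratio and swaps the degrees of freedom but leaves the two-sided p-value unchanged.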
Assignment PA278
Repeat Step 1 above. Compute a 95% confidence interval for the ratio σ1²/σ2².
Problem PS279
Test of the equality of two proportions or fractions.
1. Report the (unknown) true proportion of successes of a first population in cell A2 (0.4) and of a second population in cell B2 (0.3). Generate 80 successes/failures from the first population in the range A3:A82. Example for cell A3: =IF(RAND()<A$2;"S";"F"). Do likewise for 75 observations from population 2 in the range B3:B77;
2. Compute the sample proportion of successes in both samples in cells D3 and E3. Example for cell D3: =COUNTIF(A3:A82;"S")/80, and similarly for cell E3. Assuming that both population proportions are equal, compute the estimate of the common proportion in cell E4: =(80*D3+75*E3)/155;
3. Compute the test ratio in cell D7: =(D3-E3)/SQRT(E4*(1-E4)*(1/80+1/75)) and the p-value for a two-sided test of the equality of the two proportions in cell D8: =2*(1-NORM.S.DIST(ABS(D7);1)). Report the conclusion in cell D9 as in the problems above. Notice that for these sample sizes the null hypothesis of equal proportions is often accepted;
4. Change the population proportion of the second population to 0.2 and check the
results.
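The remark that H0 is often accepted can be checked by repeating the experiment many times. A minimal standard-library sketch (function names, seed and the 1000 repetitions are hypothetical choices) that counts how often the p-value is at least 0.1, for π2 = 0.3 and for π2 = 0.2 as in Step 4:

```python
import math
import random
from statistics import NormalDist


def two_proportion_test(s1, n1, s2, n2):
    """Cells D3, E3, E4, D7, D8 of Problem PS279: pooled z-test of pi1 = pi2."""
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)  # (80*D3 + 75*E3)/155
    if pooled in (0.0, 1.0):  # all failures or all successes, so p1 = p2
        return 0.0, 1.0
    z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


def share_accepted(pi1, pi2, n1=80, n2=75, reps=1000, seed=3):
    """Fraction of simulated experiments in which H0 is accepted (p-value >= 0.1)."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(reps):
        s1 = sum(rng.random() < pi1 for _ in range(n1))  # COUNTIF(...;"S")
        s2 = sum(rng.random() < pi2 for _ in range(n2))
        _, p = two_proportion_test(s1, n1, s2, n2)
        accepted += p >= 0.1
    return accepted / reps


print(f"pi2 = 0.3: H0 accepted in {share_accepted(0.4, 0.3):.0%} of the runs")
print(f"pi2 = 0.2: H0 accepted in {share_accepted(0.4, 0.2):.0%} of the runs")
```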
Assignment PA279
Repeat Step 1 above. Construct a 95% confidence interval for the difference π1 − π2 of the population proportions.
Problem PS280
Consider the data set ‘Rice’. Assume the weights for the different fillers are (approximately)
normally distributed.
1. Compute the sample variances for the weights for fillers 1 and 2 in cells G2 and G3,
the pooled sample variance in cell J2;
2. Test the equality of the variances of the weights of filler1 and filler2 in cell G4: =F.TEST(filler1;filler2), where filler1 and filler2 are the names of the data ranges for the two fillers. Notice that this function applies a two-sided test for the equality of the variances.
Answer: p-value = 0.3581. The hypothesis that both variances are equal cannot be rejected;
3. Compute a 95% C.I. for the ratio σ1²/σ2². The lower bound in cell G6: =F.INV(0.025;19;19)*G2/G3, and likewise for the upper bound;
4. Compute the sample means of the weights for filler 1 and 2 in cells G9 and G10;
5. Apply a two-sided test for the equality of the means of the weights of filler1 and filler2 in cell G12 (assume equal variances): =T.TEST(filler1;filler2;2;2).
Answer: p-value = 1.1623E-04. The hypothesis that both means are equal is strongly rejected;
6. Apply a two-sided test for the equality of the means of the weights of fillers 1 and 2, assuming the variances are unequal, in cell G13: =T.TEST(filler1;filler2;2;3). Compare the p-value with Step 5.
Answer: p-value = 1.2402E-04. The p-values in Steps 5 and 6 are quite close and lead to the same conclusion;
7. Derive a 95% C.I. for the difference of the means for fillers 1 and 2. The lower bound in cell G15: =G9-G10-T.INV.2T(0.05;38)*SQRT(J2*(1/20+1/20)), and likewise for the upper bound.
Assignment PA280
Apply Steps 1 to 7 above for the fillers 1 and 4 of the data set ‘Rice’.
Problem PS281
Consider the data set ‘Sabena’. We compare the mean delay at arrival between flights originating in Marseille (LINE STATION-DEP=MRS) and Florence (LINE STATION-DEP=FLR).
1. Generate in column R the delay times at arrival for flights originating in Marseille. Example for cell R2: =IF(G2="MRS";Q2;""). Do likewise in column S for flights originating in Florence;
2. Compute the mean and the variance of delay times of flights originating in Marseille
in cells T2 and T3. Do likewise for the flights originating in Florence in cells U2 and
U3;
3. Cell T7: apply a two-sided test for the equality of the variances of the delays at arrival between flights originating in Marseille and in Florence (assume normality of delay times, see later): =F.TEST(R2:R3854;S2:S3854).
Answer: p-value = 1.2070E-22. Decision: strongly reject the equality of the variances;
4. Cell T10: apply a two-sided test for the equality of the means of the delays at arrival between flights originating in Marseille and in Florence (assume normality of delay times, see later): =T.TEST(R2:R3854;S2:S3854;2;3).
Answer: p-value = 0.0242. Decision: reject the equality of the means;
5. Cell T13: apply a two-sided test for the equality of the means of the delays at arrival between flights originating in Marseille and in Florence, assuming equal variances (which is actually false): =T.TEST(R2:R3854;S2:S3854;2;2).
Answer: p-value = 0.0236. Notice that the p-values do not differ much whether equal or unequal variances are assumed, because the sample sizes (217 and 215 data points) are fairly large and nearly equal.
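Why the two p-values are so close can be seen from the standard errors: with the pooled variance, SE² = s_p²(1/n1 + 1/n2), which reduces exactly to s1²/n1 + s2²/n2 when n1 = n2. The sketch below uses hypothetical variances (not computed from ‘Sabena’) to show that the two standard errors stay close for 217 and 215 observations but not for very unequal sample sizes; function names are illustrative.

```python
import math


def se_pooled(v1, v2, n1, n2):
    """SE of mean1 - mean2 with the pooled variance (T.TEST type 2)."""
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))


def se_welch(v1, v2, n1, n2):
    """SE of mean1 - mean2 without pooling (T.TEST type 3)."""
    return math.sqrt(v1 / n1 + v2 / n2)


v1, v2 = 900.0, 300.0  # hypothetical sample variances, NOT computed from 'Sabena'
for n1, n2 in [(216, 216), (217, 215), (217, 60)]:
    ratio = se_pooled(v1, v2, n1, n2) / se_welch(v1, v2, n1, n2)
    print(f"n1={n1}, n2={n2}: pooled SE / Welch SE = {ratio:.4f}")
```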
Assignment PA281
Apply Steps 1 to 5 for the flights originating in Marseille and Edinburgh.
Problem PS282
Consider the data set ‘Sabena’. Assume the first 50 observations to be a random sample of
the data.
1. Use the sample of 50 observations to test the hypothesis that the average delay at departure equals the average delay at arrival. Clearly, delays at departure and at arrival cannot be considered independent. Therefore, to compute the p-value for a two-sided test, use the instruction =T.TEST(J2:J51;Q2:Q51;2;1), say in cell S5;
Answer: p-value = 1.0135E-03 (significantly different means);
2. Now assume that delays at departure and arrival are independent (wrong assumption). Test again the equality of average delays using the sample of 50 observations, assuming both equal and unequal variances. Use the function T.TEST with last argument 2 and 3, respectively.
Answer: p-value = 8.3096E-02 assuming equal variances and p-value = 8.3176E-02 assuming unequal variances. Notice the (much) larger p-values compared to Step 1;
3. Check the dependence of both delays by computing the sample correlation coefficient.
Answer: corr = 0.759;
4. Derive the p-value in Step 1 in an alternative way as follows: compute the differences of the delays for the 50 observations in column R, rows 2 to 51. Use the instruction =T.DIST.2T(ABS(AVERAGE(R2:R51))*SQRT(50)/STDEV.S(R2:R51);49). Compare with the p-value in Step 1.
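The link between Steps 1, 3 and 4 is the identity var(D) = s1² + s2² − 2·cov(X, Y) for the differences D = X − Y: a strong positive correlation makes the differences much less variable than the independent-samples standard error assumes. A sketch with hypothetical delays (not the ‘Sabena’ data; the helper `sample_cov` is illustrative):

```python
import math
from statistics import fmean, stdev, variance


def sample_cov(x, y):
    """Sample covariance with divisor n - 1."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


# Hypothetical departure and arrival delays in minutes (NOT the Sabena data)
dep = [5, 0, 12, 30, 8, 3]
arr = [2, -4, 10, 25, 9, -1]
n = len(dep)

r = sample_cov(dep, arr) / (stdev(dep) * stdev(arr))  # Step 3 (CORREL in Excel)
d = [a - b for a, b in zip(dep, arr)]  # column R of Step 4
t_paired = fmean(d) * math.sqrt(n) / stdev(d)  # argument of T.DIST.2T in Step 4
se_indep = math.sqrt(variance(dep) / n + variance(arr) / n)  # ignores the covariance
t_indep = (fmean(dep) - fmean(arr)) / se_indep
print(f"r = {r:.3f}, paired T-ratio {t_paired:.3f}, 'independent' T-ratio {t_indep:.3f}")
```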
Assignment PA282
Apply Steps 1 to 4 above using the next 50 observations as a random sample.
Problem PS283
Consider the data set ‘Sabena’. We compare the proportion of flights with a delay of more
than 5 minutes at arrival for flights originating in Marseille (LINE STATION-DEP=MRS) and
Florence (LINE STATION-DEP=FLR).
1. Generate in column R the value 1 for flights from Marseille with a delay of more than 5 minutes. Example for cell R2: =IF(AND(G2="MRS";Q2>5);1;""). Do likewise in column S for flights originating in Florence;
2. Count the number of flights originating from Marseille in cell T2: =COUNTIF(G2:G3854;"MRS") and do likewise for the flights from Florence in cell U2. Compute the proportion of flights with a delay of more than 5 minutes originating from Marseille in cell T3: =SUM(R2:R3854)/T2. Do the same for the flights from Florence in cell U3. Compute the pooled proportion in cell U4: =(T2*T3+U2*U3)/(T2+U2);
Answer: Marseille: 0.6590, Florence: 0.6093, pooled proportion: 0.6343;
3. Compute a 95% C.I. for the difference of the proportions of late flights from Marseille and Florence. For the lower bound in cell T6: =T3-U3-NORM.S.INV(0.975)*SQRT(T3*(1-T3)/T2+U3*(1-U3)/U2). Do likewise for the upper bound in cell T7.
Answer: lower bound: -0.0410, upper bound: 0.1404;
4. Apply a two-sided test for the equality of proportions of late flights from Marseille and Florence. First compute the ratio in cell T10: =(T3-U3)/SQRT(U4*(1-U4)*(1/T2+1/U2)). Compute the p-value in cell T11: =2*(1-NORM.S.DIST(ABS(T10);TRUE)).
Answer: p-value = 0.2837. Equality of the proportions cannot be rejected;
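The computations of Steps 2 to 4 can be reproduced from four counts. This sketch makes loud assumptions: 217 Marseille and 215 Florence flights (the sample sizes quoted in Problem PS281) and 143 and 131 late flights, back-computed from the reported proportions 0.6590 and 0.6093 rather than read from the data set. Note that the C.I. uses the unpooled standard error, while the test uses the pooled proportion.

```python
import math
from statistics import NormalDist

# ASSUMED inputs: sample sizes from PS281, late counts back-computed from the answers
late1, n1 = 143, 217  # Marseille
late2, n2 = 131, 215  # Florence

p1, p2 = late1 / n1, late2 / n2  # cells T3, U3
pooled = (late1 + late2) / (n1 + n2)  # cell U4 = (T2*T3 + U2*U3)/(T2 + U2)
z975 = NormalDist().inv_cdf(0.975)  # NORM.S.INV(0.975)

se_ci = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled SE for the C.I.
lower = p1 - p2 - z975 * se_ci  # cell T6
upper = p1 - p2 + z975 * se_ci  # cell T7

se_test = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # pooled SE under H0
z = (p1 - p2) / se_test  # cell T10
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # cell T11
print(f"p1 {p1:.4f}, p2 {p2:.4f}, pooled {pooled:.4f}")
print(f"95% C.I. [{lower:.4f}, {upper:.4f}], z {z:.3f}, p-value {p_value:.4f}")
```

With these assumed counts the sketch reproduces the reported proportions, bounds and p-value to four decimals, and the pooled proportion uses the two sample sizes T2 + U2 in its denominator.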
Assignment PA283
Apply Steps 1 to 4 for the flights originating in Marseille and Edinburgh.