Chapter 14. Multinomial populations
Problem PS284
A population P consists of three types of elements: 50% of type 1 (T1), 20% of type 2 (T2)
and 30% of type 3 (T3).
1. Generate a random sample of 100 elements of P: uniform random numbers between 0 and 1 in cells A3:A102,
the proportions 0.5, 0.2 and 0.3 in cells B2, C2 and D2,
T1-elements in column B. Example for cell B3: = IF(A3 < $B$2; "T1"; "").
T2-elements in column C. Example for cell C3:
= IF(AND(B3 = ""; A3 < $B$2 + $C$2); "T2"; "").
T3-elements in column D. Example for cell D3:
= IF(AND(B3 = ""; C3 = ""); "T3"; "");
2. Compute the sample proportions of T1, T2 and T3 in cells H2, I2 and J2. Example for
cell H2: = COUNTIF(B3:B102; "T1")/100 and likewise for T2 and T3 elements.
Compare the sample proportions with the exact proportions;
3. Compute the estimated variances of the sample proportions in cells H4:J4. Example
for cell H4: = H2 * (1 - H2)/100 and likewise for cells I4 and J4;
4. Compute an estimate of the covariance between sample proportions of T1 and T2 in
cell H6: = -H2 * I2/100. Do likewise for the estimated covariance between the
sample proportions of T1 and T3 in cell H7 and for the sample proportions of T2 and
T3 in cell H8;
5. Compute an estimate of the correlation between sample proportions of T1 and T2 in
cell H11: = H6/SQRT(H4 * I4). The covariance in cell H6 is already negative, so the
correlation is negative as well. Do likewise for the estimated correlation between
the sample proportions of T1 and T3 in cell H12 and for the sample proportions of
T2 and T3 in cell H13;
6. Compute an estimate of the variance of the difference of the sample proportions T1
and T2 in cell H16: = H4 + I4 - 2 * H6. Do likewise for T1 and T3 proportions in cell
H17 and for T2 and T3 proportions in cell H18;
7. Compute a 95% C.I. for the difference in proportions of T1 and T2 elements in the
population using the sample information. The lower bound of the interval in cell
H21: = H2 - I2 - NORM.S.INV(0.975) * SQRT(H16) and similarly for the upper
bound in cell I21. Do likewise for the difference in proportions of T1 and T3 in cells
H22 and I22 and for T2 and T3 in cells H23 and I23. Check whether the true
differences in proportions are within the 95% confidence intervals;
8. Test the hypothesis that the proportions in the population of the three types are
equal. Work as follows: report the sample frequencies of the three types in cells
H26:J26 and the expected frequencies assuming equal proportions in cells H27:J27
(values of 100/3). Compute the contributions to the Chi-square statistic: in cell H29 the
instruction = POWER(H26 - H27; 2)/H27 and likewise in cells I29 and J29.
Sum them to obtain the Chi-square sample statistic in cell K29: = SUM(H29:J29). The p-value
of the test can now be computed in cell H30: = CHISQ.DIST.RT(K29; 2). Draw a
conclusion for the test using the p-value;
9. The p-value can also be computed more directly using the function CHITEST and the
observed and expected frequencies. Compute the p-value again in cell H31: =
CHITEST(H26:J26; H27:J27) and compare with the p-value in Step 8.
Assignment PA284
1. Change the probabilities in the analysis above to 0.4, 0.3, 0.3 and check and comment
on the results;
2. Change the probabilities in the analysis above to 0.6, 0.4, 0 and check and comment
on the results.
Problem PS285
Consider the data set ‘Sabena’. We investigate whether the country of departure of the flight
and the delay at arrival in Brussels can be considered to be independent.
1. Determine in column R the country of departure of the flight: GB (Great Britain) with
airports NCL, BHX, GLA, EDI, LCY, LBA and BRS, GERM(any) with airports HAJ, THF,
HAM, DUS, FRA(nce) with airports MRS, SXB, BOD and TLS, ITA(ly) with airports
NAP, FLR, TRN, DEN(mark) with airport CPH and HUN(gary) with airport BUD.
Example for cell R2:
= IF(OR(G2 = {"NCL"; "BHX"; "GLA"; "EDI"; "LCY"; "LBA"; "BRS"}) = TRUE; "GB";
IF(OR(G2 = {"HAJ"; "THF"; "HAM"; "DUS"}) = TRUE; "GERM";
IF(OR(G2 = {"MRS"; "SXB"; "BOD"; "TLS"}) = TRUE; "FRA";
IF(OR(G2 = {"NAP"; "FLR"; "TRN"}) = TRUE; "ITA";
IF(OR(G2 = "CPH") = TRUE; "DEN"; "HUN")))))
and likewise for R3:R3854;
2. Take a random sample of 200 flights from column B in the range T2:T201. Example
for cell T2: = INDEX($B$2:$B$3854; RANDBETWEEN(1; 3853)) and likewise for
cells T3:T201. In column U, determine the country of departure of the flights
selected in column T. Example for cell U2: =
INDEX($R$2:$R$3854; MATCH(T2; $B$2:$B$3854; 0)) and likewise for cells
U3:U201. Give the name ‘country’ to the range U2:U201.
In column V, determine the delay at arrival of the flights selected in column T.
Example for cell V2: = INDEX($Q$2:$Q$3854; MATCH(T2; $B$2:$B$3854; 0)) and
likewise for cells V3:V201. Give the name ‘delay’ to the range V2:V201;
3. Set up a table of observed frequencies of countries and delays. Report the names of
the countries in cells X3:X8. In cells Y2 to AC2 report the delay at arrival. In cell Y2: in
time (delay smaller than or equal to 0), cell Z2: 0-5 (at most 5 minutes late), cell AA2:
5-10 (more than 5 minutes late but at most 10), cell AB2: 10-20 (between 10 and 20
minutes late), cell AC2: >20 (more than 20 minutes late). In the range Y3:AC8,
compute the number of flights in the sample originating from the country in cells X3:X8
with a delay in the class in cells Y2:AC2. Example for cell Z3: =
COUNTIFS(country; "GB"; delay; ">0"; delay; "<=5") and likewise for the
remaining cells in the table. Compute the row sums of the table in cells AD3:AD8 and the
column sums in cells Y9:AC9;
4. Set up the table of expected frequencies in cells Y13:AC18. As in Step 3, report the
names of the countries in X13:X18, the delay in cells Y12:AC12. Example for the
expected frequency in cell Y13: = $AD3 * Y$9/200 and likewise for the remaining
cells of the table;
5. Compute the p-value for the test of independence of country of departure and delay at
arrival in cell X21: = CHISQ.TEST(Y3:AC8; Y13:AC18). Interpret the result;
6. A more elaborate way to compute the p-value computes the Chi-square test statistic.
Set up a new table to compute the squared differences between observed and
expected frequencies divided by the expected frequencies, cells Y24:AC29 (six countries
by five delay classes). Example for cell Y24: = POWER(Y3 - Y13; 2)/Y13 and likewise
for the remaining cells in the table. Compute the Chi-square statistic in cell X31:
= SUM(Y24:AC29) and the p-value in cell X32: = CHISQ.DIST.RT(X31; 20). Compare the
p-values in Step 5 and Step 6.
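For reference, the quantities computed in Steps 4 to 6 can be written compactly. The notation is an assumption introduced here and does not come from the data set: O_ij is the observed count for country i and delay class j, R_i and C_j are the row and column sums of the observed table, and n = 200 is the sample size.

```latex
E_{ij} = \frac{R_i \, C_j}{n}, \qquad
\chi^2 = \sum_{i=1}^{6} \sum_{j=1}^{5} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (6 - 1)(5 - 1) = 20 .
```

The 20 degrees of freedom are the second argument of CHISQ.DIST.RT in Step 6.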
Assignment PA285
Repeat the steps above to investigate whether the country of departure of the flight and the
delay at departure can be considered to be independent.
Problem PS286
1. We investigate the preferences about the sweetness of chocolates for men, women
and children. A sample of 250 women, 200 men and 100 children gave the following
results for the preferences of the three groups:

Taste       Very sweet   Sweet   Bitter   Very bitter   Total
Women              102      65       51            32     250
Men                 57      43       72            28     200
Children            62      21       10             7     100

2. Copy this table in Excel in the range B2:E7, arranged with the tastes in cells B3:B6,
the groups Women, Men and Children in cells C2:E2 and the group totals in row 7.
Estimate the probabilities for the various tastes assuming that the preferences of
women, men and children are the same. Example for cell G3 (very sweet):
= SUM(C3:E3)/SUM(C$7:E$7) and likewise for cells G4:G6;
3. Construct a table in the range B10:E15 containing the expected frequencies
assuming the preferences are the same in each group by using the estimated
probabilities computed in Step 2. Example for cell C11 (very sweet, women):
= C$7 * $G3 and similarly for the remaining cells;
4. Compute the p-value for the test of homogeneity of women, men and children with
respect to sweetness of chocolate in cell B18: = CHISQ.TEST(C3:E6; C11:E14).
The result is 7.03277E-08, which means that the assumption of equal distributions of
preferences for the three groups can be strongly rejected;
5. A more elaborate way to compute the p-value computes the Chi-square test statistic.
Set up a new table to compute the squared differences between observed and
expected frequencies divided by the expected frequencies, range G10:J14. Example
for cell H11: = POWER(C3 - C11; 2)/C11 and likewise for the remaining cells.
Compute the Chi-square statistic in cell G18: = SUM(H11:J14) and the p-value in cell
G19: = CHISQ.DIST.RT(G18; 6). Compare the result with the result in Step 4.
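As a cross-check on the worksheet, the homogeneity test can be sketched in Python with the chocolate data given above. This is a minimal illustration under stated assumptions: the function names are introduced here, and the right-tail probability uses a closed form that is valid only for an even number of degrees of freedom (here 6), standing in for CHISQ.DIST.RT.

```python
import math


def chi2_sf_even_df(x, df):
    """Right-tail Chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def homogeneity_statistic(observed):
    """Return (statistic, df, expected) for a table with categories as rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count = row total * column total / grand total (Steps 2 and 3).
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Rows: very sweet, sweet, bitter, very bitter; columns: women, men, children.
observed = [
    [102, 57, 62],
    [65, 43, 21],
    [51, 72, 10],
    [32, 28, 7],
]

statistic, df, expected = homogeneity_statistic(observed)
p_value = chi2_sf_even_df(statistic, df)
print(f"chi-square = {statistic:.3f}, df = {df}, p-value = {p_value:.5e}")
```

The printed p-value should agree with the value of about 7.03E-08 reported in Step 4, up to rounding.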
Assignment PA286
An investigation into the preferences of 80 men and 120 women with respect to the color of
packaging of a specific product resulted in the following:

          Woman    Man   Total
Blue         56     40      96
Green        36     20      56
Red          20     10      30
Orange        8     10      18
Total       120     80     200

Use a test of homogeneity to investigate whether the preferences of men and women are
the same with respect to the color of the packaging.