Chapter 14. Multinomial populations

Problem PS284

A population P consists of three types of elements: 50% of type 1 (T1), 20% of type 2 (T2) and 30% of type 3 (T3).

1. Generate a random sample of 100 elements of P: random numbers in cells A3:A102, the proportions 0.5, 0.2 and 0.3 in cells B2, C2 and D2. T1-elements in column B. Example for cell B3: =IF(A3<$B$2;"T1";""). T2-elements in column C. Example for cell C3: =IF(AND(B3="";A3<$B$2+$C$2);"T2";""). T3-elements in column D. Example for cell D3: =IF(AND(B3="";C3="");"T3";"");

2. Compute the sample proportions of T1, T2 and T3 in cells H2, I2 and J2. Example for cell H2: =COUNTIF(B3:B102;"T1")/100 and likewise for T2 and T3 elements. Compare the sample proportions with the exact proportions;

3. Compute the estimated variances of the sample proportions in cells H4:J4. Example for cell H4: =H2*(1-H2)/100 and likewise for cells I4 and J4;

4. Compute an estimate of the covariance between the sample proportions of T1 and T2 in cell H6: =-H2*I2/100. Do likewise for the estimated covariance between the sample proportions of T1 and T3 in cell H7 and for the sample proportions of T2 and T3 in cell H8;

5. Compute an estimate of the correlation between the sample proportions of T1 and T2 in cell H11: =H6/SQRT(H4*I4). Because the covariance in H6 is negative, the estimated correlation is negative as well. Do likewise for the estimated correlation between the sample proportions of T1 and T3 in cell H12 and for the sample proportions of T2 and T3 in cell H13;

6. Compute an estimate of the variance of the difference of the sample proportions of T1 and T2 in cell H16: =H4+I4-2*H6. Do likewise for the T1 and T3 proportions in cell H17 and for the T2 and T3 proportions in cell H18;

7. Compute a 95% C.I. for the difference in proportions of T1 and T2 elements in the population using the sample information. The lower bound of the interval in cell H21: =H2-I2-NORM.S.INV(0.975)*SQRT(H16) and similarly, with a plus sign before NORM.S.INV, for the upper bound in cell I21.
Do likewise for the difference in proportions of T1 and T3 in cells H22 and I22 and for T2 and T3 in cells H23 and I23. Check whether the true differences in proportions lie within the 95% confidence intervals;

8. Test the hypothesis that the proportions in the population of the three types are equal. Work as follows: report the sample frequencies of the three types in cells H26:J26 and the expected frequencies assuming equal proportions in cells H27:J27 (values of 100/3). Compute the contributions to the Chi-square statistic as follows: in cell H29 the instruction =POWER(H26-H27;2)/H27 and likewise in cells I29 and J29. Compute the Chi-square sample statistic in cell K29: =SUM(H29:J29). The p-value of the test, based on 3 - 1 = 2 degrees of freedom, can now be computed in cell H30: =CHISQ.DIST.RT(K29;2). Draw a conclusion for the test using the p-value;

9. The p-value can also be computed more directly using the function CHITEST and the observed and expected frequencies. Compute the p-value again in cell H31: =CHITEST(H26:J26;H27:J27) and compare with the p-value in Step 8.

Assignment PA284

1. Change the probabilities in the analysis above to 0.4, 0.3, 0.3 and check and comment on the results;

2. Change the probabilities in the analysis above to 0.6, 0.4, 0 and check and comment on the results.

Problem PS285

Consider the data set 'Sabena'. We investigate whether the country of departure of the flight and the delay at arrival in Brussels can be considered to be independent.

1. Determine in column R the country of departure of the flight: GB (Great Britain) with airports NCL, BHX, GLA, EDI, LCY, LBA and BRS, GERM(any) with airports HAJ, THF, HAM and DUS, FRA(nce) with airports MRS, SXB, BOD and TLS, ITA(ly) with airports NAP, FLR and TRN, DEN(mark) with airport CPH and HUN(gary) with airport BUD.
Example for cell R2: =IF(OR(G2={"NCL";"BHX";"GLA";"EDI";"LCY";"LBA";"BRS"})=TRUE;"GB";IF(OR(G2={"HAJ";"THF";"HAM";"DUS"})=TRUE;"GERM";IF(OR(G2={"MRS";"SXB";"BOD";"TLS"})=TRUE;"FRA";IF(OR(G2={"NAP";"FLR";"TRN"})=TRUE;"ITA";IF(OR(G2="CPH")=TRUE;"DEN";"HUN"))))) and likewise for R3:R3854;

2. Take a random sample of 200 flights from column B and store it in the range T2:T201. Example for cell T2: =INDEX($B$2:$B$3854;RANDBETWEEN(1;3853)) and likewise for cells T3:T201. In column U, determine the country of departure of the flights selected in column T. Example for cell U2: =INDEX($R$2:$R$3854;MATCH(T2;$B$2:$B$3854;0)) and likewise for cells U3:U201. Give the name 'country' to the range U2:U201. In column V, determine the delay at arrival of the flights selected in column T. Example for cell V2: =INDEX($…$2:$…$3854;MATCH(T2;$B$2:$B$3854;0)), where … is the column of the data set that holds the delay at arrival, and likewise for cells V3:V201. Give the name 'delay' to the range V2:V201;

3. Set up a table of observed frequencies of countries and delays. Report the names of the countries in cells X3:X8. In cells Y2 to AC2 report the delay at arrival. In cell Y2: on time (delay smaller than or equal to 0), cell Z2: 0-5 (at most 5 minutes late), cell AA2: 5-10 (more than 5 minutes late but at most 10), cell AB2: 10-20 (more than 10 minutes late but at most 20), cell AC2: >20 (more than 20 minutes late). In the range Y3:AC8, compute the number of flights in the sample that originate in the country in cells X3:X8 and have a delay in the class in cells Y2:AC2. Example for cell Z3: =COUNTIFS(country;"GB";delay;">0";delay;"<=5") and likewise for the remaining cells in the table. Make the row sums of the table in cells AD3:AD8 and the column sums in cells Y9:AC9;

4. Set up the table of expected frequencies in cells Y13:AC18.
As in Step 3, report the names of the countries in X13:X18 and the delay classes in cells Y12:AC12. Example for the expected frequency in cell Y13: =$AD3*Y$9/200 and likewise for the remaining cells of the table;

5. Compute the p-value for the test of independence of country of departure and delay at arrival in cell X21: =CHISQ.TEST(Y3:AC8;Y13:AC18). Interpret the result;

6. A more elaborate way to compute the p-value computes the Chi-square test statistic. Set up a new table that contains, for each cell, the squared difference between the observed and expected frequency divided by the expected frequency, in cells Y24:AC29 (six countries by five delay classes). Example for cell Y24: =POWER(Y3-Y13;2)/Y13 and likewise for the remaining cells in the table. Compute the Chi-square statistic in cell X31: =SUM(Y24:AC29) and the p-value, based on (6 - 1)(5 - 1) = 20 degrees of freedom, in cell X32: =CHISQ.DIST.RT(X31;20). Compare the p-values in Step 5 and Step 6.

Assignment PA285

Repeat the steps above to investigate whether the country of departure of the flight and the delay at departure can be considered to be independent.

Problem PS286

1. We investigate the preferences about the sweetness of chocolates for men, women and children. A sample of 250 women, 200 men and 100 children gave the following results for the preferences of the three groups:

   Taste          Women    Men    Children
   Very sweet       102     57          62
   Sweet             65     43          21
   Bitter            51     72          10
   Very bitter       32     28           7
   Total            250    200         100

2. Copy this table in Excel in the range B2:E7 (tastes in rows 3 to 6, totals in row 7, the three groups in columns C to E) and estimate the probabilities for the various tastes assuming that the preferences of women, men and children are the same. Example for cell G3 (very sweet): =SUM(C3:E3)/SUM(C$7:E$7) and likewise for cells G4:G6;

3. Construct a table in the range B10:E15 containing the expected frequencies, assuming the preferences are the same in all three groups, by using the estimated probabilities computed in Step 2. Example for cell C11 (very sweet, women): =C$7*$G3 and similarly for the remaining cells;

4. Compute the p-value for the test of homogeneity of women, men and children with respect to sweetness of chocolate in cell B18: =CHISQ.TEST(C3:E6;C11:E14). The result is 7.03277E-08, which means that the assumption of equal distributions of preferences for the three groups can be strongly rejected;

5. A more elaborate way to compute the p-value computes the Chi-square test statistic. Set up a new table that contains, for each cell, the squared difference between the observed and expected frequency divided by the expected frequency, in the range G10:J14. Example for cell H11: =POWER(C3-C11;2)/C11 and likewise for the remaining cells. Compute the Chi-square statistic in cell G18: =SUM(H11:J14) and the p-value, based on (4 - 1)(3 - 1) = 6 degrees of freedom, in cell G19: =CHISQ.DIST.RT(G18;6). Compare the result with the result in Step 4.

Assignment PA286

An investigation into the preferences of 80 men and 120 women with respect to the color of packaging of a specific product resulted in the following:

   Color     Women    Men    Total
   Blue         56     40       96
   Green        36     20       56
   Red          20     10       30
   Orange        8     10       18
   Total       120     80      200

Use a test of homogeneity to investigate whether the preferences of men and women are the same with respect to the color of the packaging.
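
As a cross-check outside Excel, the homogeneity test of Problem PS286 can be sketched in Python. This is a minimal sketch and not part of the original exercise: the function name chi_square_homogeneity is hypothetical, and the p-value uses the closed-form Chi-square tail probability that holds only for an even number of degrees of freedom (6 in Problem PS286). The printed p-value should be close to the 7.03277E-08 reported in Step 4 of that problem.

```python
import math

# Observed counts from Problem PS286: rows are tastes, columns are groups.
observed = [
    [102, 57, 62],  # very sweet: women, men, children
    [65, 43, 21],   # sweet
    [51, 72, 10],   # bitter
    [32, 28, 7],    # very bitter
]


def chi_square_homogeneity(table):
    """Return (statistic, degrees of freedom, p-value) for a two-way table.

    Assumption: the degrees of freedom are even, so the Chi-square tail
    probability has the closed form exp(-x/2) * sum_{k<df/2} (x/2)^k / k!.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Sum of (observed - expected)^2 / expected over all cells.
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    if df % 2 != 0:
        raise ValueError("this sketch only handles an even number of degrees of freedom")

    half = statistic / 2
    p_value = math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )
    return statistic, df, p_value


statistic, df, p_value = chi_square_homogeneity(observed)
print(f"chi-square = {statistic:.4f}, df = {df}, p-value = {p_value:.5e}")
```

The expected frequency row_total * col_total / n is the same quantity as the Excel formula =C$7*$G3, so the statistic corresponds to cell G18 and the p-value to cells B18 and G19.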