Stat 557 Fall 2000 Assignment 5 Solutions

Problem 1

(a) Given the level of factor B, factors A and C are conditionally independent, so the AC and ABC interactions should not be included. At each level of factor C, the odds ratio for the other two factors is 2/3, so factors A and B are not conditionally independent. At each level of factor A, the odds ratio for the other two factors is 0.25, so factors B and C are not conditionally independent. Hence the model is

$$\log(m_{ijk}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^{AB}_{ij} + \lambda^{BC}_{jk}$$

(b) The relative distribution across the joint categories of factors A and B is the same for each level of factor C. Hence, factor C is independent of the joint distribution of factors A and B, and the model should not have AC, BC, or ABC interactions. Notice also that A and B are conditionally independent at each level of C, so the model should not have the AB interaction. We end up with the complete independence model

$$\log(m_{ijk}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^C_k$$

(c) At each level of factor C, the odds ratio is 6. This indicates that the three-factor interaction should not be in the model. No conditional independence can be found, as the odds ratios in all conditional 2x2 tables differ from 1. Consequently, the model is

$$\log(m_{ijk}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^{AB}_{ij} + \lambda^{BC}_{jk} + \lambda^{AC}_{ik}$$

(d) Following the reasoning used in part (c), the model is

$$\log(m_{ijk}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^{AB}_{ij} + \lambda^{BC}_{jk} + \lambda^{AC}_{ik}$$

(e) The means in the subtable for k = 2 are each 3 times as large as the means in the corresponding cells of the subtable for k = 1. This implies that the joint distribution of factors A and B does not depend on the level of factor C. Consequently, the loglinear model should not have AC, BC, or ABC interactions. Notice that factors A and B are not conditionally independent given the level of factor C, so the model must include the AB interaction.
We have

$$\log(m_{ijk}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^{AB}_{ij}$$

(f) Checking the subtables, we find the following:
(1) Given the level of C and the level of D, A and B are conditionally independent.
(2) Given the level of A and the level of B, C and D are conditionally independent.
(3) Given the level of A and the level of D, B and C are conditionally independent.
(4) Given the level of B and the level of D, A and C are conditionally independent.

From (1), the AB, ABC, ABD, and ABCD interactions should be excluded. From (2), the CD, ACD, and BCD interactions should be excluded. From (3) and (4), the BC and AC interactions should be excluded. Note that A and D are not conditionally independent given the levels of B and C. Also, B and D are not conditionally independent given the levels of A and C. Consequently, the model must include the AD and BD interactions. We have

$$\log(m_{ijkl}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^D_l + \lambda^{AD}_{il} + \lambda^{BD}_{jl}$$

Problem 2

There are many correct answers to (a), (b), and (c). Here is one set.

(a)
            k = 1            k = 2
         j = 1  j = 2     j = 1  j = 2
i = 1      10     20        20     40
i = 2      20     40        40     80

(b)
            i = 1            i = 2
         k = 1  k = 2     k = 1  k = 2
j = 1      10     20        20     40
j = 2      30     40        60     80

(c)
            k = 1            k = 2
         j = 1  j = 2     j = 1  j = 2
i = 1      10     20        10     30
i = 2      20     40        40    120

Problem 3

Model (a)
(i) $$\log(m_{ijkl}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^D_l + \lambda^{AB}_{ij} + \lambda^{AD}_{il} + \lambda^{BC}_{jk} + \lambda^{BD}_{jl} + \lambda^{CD}_{kl} + \lambda^{ABD}_{ijl} + \lambda^{BCD}_{jkl}$$
(ii) df = 6
(iii) $\{Y_{ij+l}\}$, $\{Y_{+jkl}\}$
(iv) Given the time of day (D) and the number of vehicles involved in the accident (B), the involvement of alcohol (C) has no association with the general direction of the road (A).

Model (b)
(i) $$\log(m_{ijkl}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^D_l + \lambda^{AB}_{ij} + \lambda^{AD}_{il} + \lambda^{BC}_{jk} + \lambda^{CD}_{kl}$$
(ii) df = 12
(iii) $\{Y_{ij++}\}$, $\{Y_{i++l}\}$, $\{Y_{+jk+}\}$, and $\{Y_{++kl}\}$
(iv) Given the time of day (D) and the number of vehicles involved in the accident (B), involvement of alcohol (C) has no association with the general direction of the road (A).
Given the general direction of the road (A) and the level of alcohol involvement (C), the number of vehicles involved in the accident (B) has no association with the time of day (D).
The association between the general direction of the road (A) and the number of vehicles involved in the accident (B) is the same for any time of day (D) and each level of alcohol involvement (C).
The association between the accident time (D) and the general direction of the road (A) is consistent across the number of vehicles involved in the accident (B) and the status of alcohol involvement (C).
The association between the number of vehicles involved (B) and the status of alcohol involvement (C) is consistent across the direction of the road (A) and the time of day (D).
The association between the status of alcohol involvement (C) and the time of day (D) is consistent across the direction of the road (A) and the number of vehicles involved (B).

Model (c)
(i) $$\log(m_{ijkl}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^D_l + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{AD}_{il} + \lambda^{BC}_{jk} + \lambda^{BD}_{jl} + \lambda^{CD}_{kl} + \lambda^{ABC}_{ijk} + \lambda^{BCD}_{jkl}$$
(ii) df = 6
(iii) $\{Y_{ijk+}\}$, $\{Y_{+jkl}\}$, and $\{Y_{i++l}\}$
(iv) All of the two-factor interactions are in the model, so no two factors are conditionally independent given the levels of the other two factors. Since there is no three-factor interaction involving both A and D, however, the association between the time of day when the accidents occurred (D) and the general direction of the road (A) is not affected by either the number of vehicles involved in the accident (B) or the status of alcohol involvement (C). All other two-factor associations change across the levels of at least one other factor.

Model (d)
(i) $$\log(m_{ijkl}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^D_l + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{AD}_{il} + \lambda^{BC}_{jk} + \lambda^{BD}_{jl} + \lambda^{CD}_{kl} + \lambda^{BCD}_{jkl}$$
(ii) df = 7
(iii) $\{Y_{+jkl}\}$, $\{Y_{ij++}\}$, $\{Y_{i+k+}\}$, and $\{Y_{i++l}\}$
(iv) Since all of the possible two-factor interactions are in this model, there is no conditional independence between any pair of factors given the levels of the other two factors. Three two-factor interactions (AB, AC, and AD) are not involved in any higher-order interaction, however, which implies the following:
Associations between the general direction of the road (A) and the number of vehicles involved in the accident (B) are homogeneous across time of day (D) and the status of alcohol involvement (C).
Associations between the status of alcohol involvement (C) and the general direction of the road (A) are homogeneous with respect to the number of vehicles involved in the accident (B) and the time of day (D).
Associations between the accident time (D) and the general direction of the road (A) are consistent across the number of vehicles involved in the accident (B) and the status of alcohol involvement (C).

Model (e)
(i) $$\log(m_{ijkl}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^D_l + \lambda^{BC}_{jk} + \lambda^{BD}_{jl} + \lambda^{CD}_{kl} + \lambda^{BCD}_{jkl}$$
(ii) df = 11
(iii) $\{Y_{+jkl}\}$, $\{Y_{i+++}\}$
(iv) Since the general direction of the road (A) is not involved in any two-factor interaction, this model implies that the general direction of the road has no association with any of the other factors. Consequently, the general direction of the road (A) is independent of the joint distribution of the number of vehicles involved (B), the involvement of alcohol (C), and the time of day when an accident occurs (D). This also implies the weaker condition that the direction of the road (A) is conditionally independent of each of the other factors given any set of levels of the remaining two factors.

Problem 4

The best model we found is

$$\log(m_{ijkl}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^D_l + \lambda^{AB}_{ij} + \lambda^{BC}_{jk} + \lambda^{BD}_{jl} + \lambda^{CD}_{kl}$$

(i) This model was obtained by starting with the complete independence model as the initial model. Then, the S-PLUS function "step" was used to perform a stepwise search, which identified the final model shown above.
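The residual degrees of freedom quoted for the Problem 3 models, and the df = 12 reported for this model, can be checked by counting free parameters: under corner-point constraints, every term in the hierarchical closure of the generating class contributes the product of (number of levels - 1) over its factors, plus 1 for the intercept. The sketch below assumes the factor sizes implied by the S-PLUS output in this problem (A, B, and C with 2 levels; D with 3 levels: morning, afternoon, evening); the function names are illustrative only.

```python
from itertools import combinations
from math import prod


def hierarchical_terms(generators):
    """Return every non-empty sub-term implied by the generating class.

    Each generator is a string of factor letters such as "ABD"; by the
    hierarchy principle it brings in all of its lower-order terms.
    """
    terms = set()
    for gen in generators:
        letters = tuple(sorted(gen))
        for r in range(1, len(letters) + 1):
            terms.update(combinations(letters, r))
    return terms


def loglinear_df(levels, generators):
    """Residual df = (number of cells) - (number of free parameters).

    With corner-point constraints a term such as BD contributes
    (levels["B"] - 1) * (levels["D"] - 1) parameters; the intercept adds 1.
    Main effects for every factor are always included.
    """
    cells = prod(levels.values())
    all_generators = list(generators) + list(levels)  # add main effects
    n_params = 1 + sum(
        prod(levels[f] - 1 for f in term)
        for term in hierarchical_terms(all_generators)
    )
    return cells - n_params


# Assumed factor sizes for the accident data (read off the Problem 4 output).
accident_levels = {"A": 2, "B": 2, "C": 2, "D": 3}

# Generating classes of the Problem 3 models.
problem3_models = {
    "(a)": ["ABD", "BCD"],
    "(b)": ["AB", "AD", "BC", "CD"],
    "(c)": ["ABC", "BCD", "AD"],
    "(d)": ["AB", "AC", "AD", "BCD"],
    "(e)": ["A", "BCD"],
}

for name, gens in problem3_models.items():
    print(name, loglinear_df(accident_levels, gens))

# Problem 4 model: AB, BC, BD, CD
print("Problem 4", loglinear_df(accident_levels, ["AB", "BC", "BD", "CD"]))
```

The same function applies to any hierarchical loglinear model once the factor sizes are known, which makes it a quick sanity check on df values printed by a model-fitting routine.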
The final model had the best AIC value of the models this search examined, and it was also obtained from a backward elimination procedure. The goodness-of-fit tests give X² = 12.49, df = 12, p-value = 0.41, and G² = 12.02, df = 12, p-value = 0.44. This model appears to provide an adequate summary of the accident data.

(ii) Maximum likelihood estimates of the terms in the model are

Coefficients:
                 Value       Std. Error   t value
(Intercept)    3.4119085    0.1448269    23.5585341
A              0.2613648    0.1516147     1.7238752
B              2.2184195    0.1537919    14.4248116
C             -2.2568801    0.2691621    -8.3848361
DAfternoon    -0.9464736    0.2096743    -4.5140183
DEvening      -0.7403832    0.1937870    -3.8206028
CDAfternoon    1.2564227    0.2890848     4.3462084
CDEvening      2.1460464    0.2741173     7.8289334
B:C           -1.2642096    0.2292428    -5.5147190
BDAfternoon    0.1397284    0.2202773     0.6343295
BDEvening     -0.7889321    0.2113118    -3.7334982
A:B           -0.5047110    0.1659459    -3.0414199

These estimates were obtained from S-PLUS under the constraint that any main-effect or interaction term is zero when any factor involved is at its lowest level. There was an error in typing the data table in the assignment; this answer was obtained from the data stored in the file accidents.dat. Slightly different answers would be obtained from the table of counts printed on the assignment.

Since there are no significant two-factor interactions between the general direction of the road (A) and either involvement with alcohol (C) or time of day (D), the model suggests that the general direction of the road (A) is essentially conditionally independent of involvement with alcohol (C) given the levels of the other two factors, and that the general direction of the road (A) is conditionally independent of time of day (D) given the levels of the other two factors. The model also suggests that the odds that alcohol is involved in an accident are about exp(1.2564) = 3.5 times greater in the morning than in the afternoon and about exp(2.1460) = 8.6 times greater in the morning than in the evening.
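The goodness-of-fit p-values, the odds ratios just quoted, and the fitted means discussed next can all be reproduced from the reported statistics and estimates. A minimal sketch using only the Python standard library: because the reference chi-square distribution has an even number of degrees of freedom (12), its upper tail has the closed form $e^{-x/2}\sum_{i=0}^{df/2-1}(x/2)^i/i!$, so no statistics package is needed. The dictionary simply copies the printed S-PLUS estimates; the helper name and dictionary keys are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Goodness-of-fit statistics reported for the Problem 4 model (df = 12).
p_pearson = chi2_sf_even_df(12.49, 12)
p_deviance = chi2_sf_even_df(12.02, 12)

# Estimates copied from the S-PLUS output (corner-point constraints).
est = {
    "Intercept": 3.4119085, "A": 0.2613648, "B": 2.2184195, "C": -2.2568801,
    "DAfternoon": -0.9464736, "DEvening": -0.7403832,
    "CDAfternoon": 1.2564227, "CDEvening": 2.1460464,
}

# Odds ratios for alcohol involvement, morning versus afternoon / evening.
or_morning_vs_afternoon = math.exp(est["CDAfternoon"])
or_morning_vs_evening = math.exp(est["CDEvening"])

# Fitted means: every interaction term involving a baseline level is zero,
# so moving one factor off its lowest level adds only its main effect.
fitted = {
    "baseline": math.exp(est["Intercept"]),
    "A=east-west": math.exp(est["Intercept"] + est["A"]),
    "B=multi-vehicle": math.exp(est["Intercept"] + est["B"]),
    "C=no alcohol": math.exp(est["Intercept"] + est["C"]),
    "D=afternoon": math.exp(est["Intercept"] + est["DAfternoon"]),
    "D=evening": math.exp(est["Intercept"] + est["DEvening"]),
}

print(round(p_pearson, 2), round(p_deviance, 2))
print(round(or_morning_vs_afternoon, 2), round(or_morning_vs_evening, 2))
for cell, mean in fitted.items():
    print(cell, round(mean, 1))
```

Each fitted mean changes exactly one factor away from the baseline cell (north-south road, single vehicle, morning, alcohol involved), which is why only one main-effect estimate is added to the intercept.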
The high estimated odds of alcohol involvement in the morning would make sense if the morning period includes the hours just after midnight, when bars close and patrons who have consumed the most alcohol drive home. For this model, these estimated odds ratios are consistent across the levels of the other two factors. The model also suggests that the number of vehicles involved in an accident has some association with each of the other factors. The odds of a multiple vehicle accident are about 65…

The estimates of the intercept and main-effect terms provide information on the relative frequencies of the levels of individual factors relative to a baseline. This information is often not of great interest and I did not expect you to discuss it, but we will consider it here anyway. From the estimate of the intercept we have exp(3.4119) = 30.3 as the estimated mean count when each factor is at its lowest level (north-south roads, single vehicle accident, in the morning with alcohol involved). Then $\exp(\hat\lambda + \hat\lambda^A) = \exp(3.41191 + 0.26136) = 39.4$ is the estimated mean number of single vehicle accidents on east-west roads in the morning with alcohol involved. Similarly, $\exp(\hat\lambda + \hat\lambda^B) = \exp(3.41191 + 2.21842) = 278.8$ is the estimated mean number of multi-vehicle accidents on north-south roads in the morning with alcohol involved, and $\exp(\hat\lambda + \hat\lambda^C) = \exp(3.41191 - 2.25688) = 3.2$ is the estimated mean number of single vehicle accidents on north-south roads in the morning where alcohol is not involved. Finally, $\exp(\hat\lambda + \hat\lambda^D_{\text{Afternoon}}) = \exp(3.41191 - 0.94647) = 11.8$ is the estimated mean number of single vehicle accidents on north-south roads in the afternoon with alcohol involved, and $\exp(\hat\lambda + \hat\lambda^D_{\text{Evening}}) = \exp(3.41191 - 0.74038) = 14.5$ is the estimated mean number of single vehicle accidents on north-south roads in the evening with alcohol involved.

Problem 5

Use the complete independence model as the initial model. Then use the S-PLUS function "step" or "stepAIC" to do a stepwise search.
Starting with the complete independence model, this search yields the model

$$\log(m_{ijkl}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^C_k + \lambda^D_l + \lambda^{AC}_{ik} + \lambda^{BC}_{jk} + \lambda^{BD}_{jl} + \lambda^{CD}_{kl} + \lambda^{AB}_{ij} + \lambda^{AD}_{il} + \lambda^{ABC}_{ijk}$$

The estimated λ-terms are

                 Value       Std. Error   t value
(Intercept)    1.23856236   0.2738499     4.52277797
A2            -0.53135358   0.3967743    -1.33918337
A3            -0.79826947   0.5090301    -1.56821649
C2             0.55744228   0.3170151     1.75840950
C3            -0.35571045   0.3906370    -0.91059072
C4             2.46629784   0.2732102     9.02710623
B2             1.75449872   0.2564484     6.84152730
B3            -0.25997611   0.3267763    -0.79557831
D2            -1.17962762   0.2901775    -4.06519269
D3            -0.43383283   0.2541672    -1.70687959
D4             0.72702975   0.2068656     3.51450266
D5             0.57192314   0.2145256     2.66599054
C2B2           0.13288424   0.2882659     0.46097803
C3B2           0.23444276   0.3678694     0.63729895
C4B2          -1.93985744   0.2451301    -7.91358353
C2B3           0.52887171   0.3669414     1.44129733
C3B3           0.49383403   0.4636653     1.06506581
C4B3          -0.19847702   0.3185657    -0.62303314
A2C2          -0.09550353   0.4866494    -0.19624709
A3C2          -0.06550752   0.6244122    -0.10491069
A2C3           1.18190582   0.5204080     2.27111372
A3C3           0.27328994   0.7268818     0.37597578
A2C4           0.18737090   0.4018535     0.46626668
A3C4           0.05051041   0.5199221     0.09714996
A2D2           0.28111119   0.1377156     2.04124439
A3D2           0.13012669   0.1628727     0.79894735
A2D3           0.17595853   0.1346599     1.30668833
A3D3          -0.54236657   0.1774174    -3.05700935
A2D4          -0.19090204   0.1124046    -1.69834741
A3D4          -0.78503175   0.1391720    -5.64073030
A2D5          -0.53179558   0.1146129    -4.63992667
A3D5          -1.17205666   0.1465214    -7.99921668
B2D2           0.65167996   0.1686581     3.86391125
B3D2          -0.25580425   0.1949826    -1.31193394
B2D3           1.07186414   0.1891964     5.66535139
B3D3           0.55791147   0.2072840     2.69153119
B2D4           0.84456071   0.1407807     5.99912456
B3D4           0.19642261   0.1554210     1.26381018
B2D5           1.58544418   0.1584673    10.00486364
B3D5           0.84686839   0.1733061     4.88654803
A2B2          -0.33361991   0.4031492    -0.82753453
A3B2          -1.28461938   0.5540023    -2.31879799
A2B3           0.98358699   0.4747678     2.07172221
A3B3           0.65035587   0.6266494     1.03783053
C2D2           0.66128121   0.2774680     2.38326984
C3D2           1.03126012   0.2845323     3.62440383
C4D2           0.84301367   0.2661616     3.16730050
C2D3          -0.15440082   0.2218129    -0.69608592
C3D3          -0.51080721   0.2437382    -2.09572068
C4D3          -0.31153251   0.2122377    -1.46784711
C2D4           0.16633346   0.1948750     0.85353922
C3D4           0.05404871   0.2082427     0.25954670
C4D4           0.07198334   0.1852515     0.38857090
C2D5          -0.24952236   0.1908319    -1.30755037
C3D5          -0.63160746   0.2079394    -3.03745928
C4D5          -0.37131448   0.1815955    -2.04473383
A2C2B2         0.88002265   0.5023210     1.75191277
A3C2B2         1.19904452   0.6741340     1.77864411
A2C3B2         0.25138975   0.5381025     0.46717818
A3C3B2         1.72247616   0.7717152     2.23201007
A2C4B2         0.91234459   0.4209736     2.16722523
A3C4B2         1.68109250   0.5767804     2.91461446
A2C2B3        -0.14988308   0.5838218    -0.25672745
A3C2B3         0.05845829   0.7607387     0.07684412
A2C3B3        -0.52928578   0.6391600    -0.82809585
A3C3B3         0.40421695   0.8729021     0.46307248
A2C4B3        -0.21715966   0.4950065    -0.43870065
A3C4B3         0.32171412   0.6506797     0.49442777

Check of the fit of the model: X² = 155.3, df = 112, p-value = 0.0043; G² = 160.8, df = 112, p-value = 0.0017. These tests appear to reject the fit of the model, but the p-values may not be reliable. We found that 32% of the expected counts are smaller than 5 and 5% of the expected counts are smaller than 1. Consequently, the large sample chi-square approximation to the null distribution of these test statistics may not be accurate. We examined the Pearson residuals and the deviance residuals to determine whether there are combinations of factor levels where the model fits poorly. None of these residuals was larger than 3 or smaller than -3. The data exhibit no gross departures from this model.

The AIC value for this model is 296.8. This was the model that most students selected by mindlessly adhering to the strategy of minimizing the AIC value. A few students continued the search by adding the most highly significant term they could find, in this case the ABD term $\lambda^{ABD}_{ijl}$. This interaction is significant at the .015 level, but none of the standardized values of these interaction parameters exceeded 1.91 in absolute value. This interaction seems to be of limited importance.
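The fit summaries above, and the conditional income-by-alcohol tables that are formed later in this problem by adding λ-terms, can be reproduced from the printed output. A minimal sketch, assuming the factor sizes implied by the term labels (A: 3 alcohol levels, B: 3 marital-status levels, C: 4 income levels, D: 5 regions) and assuming the reported AIC equals G² + 2 × (number of free parameters); that assumption matches the 296.8 quoted for this model and the 298.26 quoted for the model with the ABD term. Function names are illustrative.

```python
import math
from itertools import combinations


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square with even df via the closed-form Poisson sum."""
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def n_free_params(levels, generators):
    """Intercept plus (levels - 1) products over every hierarchical sub-term."""
    terms = set()
    for gen in list(generators) + list(levels):  # main effects always included
        letters = tuple(sorted(gen))
        for r in range(1, len(letters) + 1):
            terms.update(combinations(letters, r))
    return 1 + sum(math.prod(levels[f] - 1 for f in t) for t in terms)


levels = {"A": 3, "B": 3, "C": 4, "D": 5}  # assumed from the term labels
cells = math.prod(levels.values())          # 180 cells

base = ["ABC", "AD", "BD", "CD"]            # generating class of the model above
with_abd = base + ["ABD"]

p_base = n_free_params(levels, base)
df_base = cells - p_base
df_abd = cells - n_free_params(levels, with_abd)
aic_base = 160.8 + 2 * p_base               # assumed AIC = G^2 + 2p

print(df_base, df_abd, round(aic_base, 1))
print(round(chi2_sf_even_df(155.3, df_base), 4),
      round(chi2_sf_even_df(160.8, df_base), 4))

# Conditional A x C log odds ratios at each marital status level j:
# lambda^AC_{ik} + lambda^ABC_{ijk}; ABC terms are zero at the baseline j = 1.
ac = {(2, 2): -0.09550353, (3, 2): -0.06550752, (2, 3): 1.18190582,
      (3, 3): 0.27328994, (2, 4): 0.18737090, (3, 4): 0.05051041}
abc = {  # keys (i, j, k): A level i, B level j, C level k
    (2, 2, 2): 0.88002265, (3, 2, 2): 1.19904452, (2, 2, 3): 0.25138975,
    (3, 2, 3): 1.72247616, (2, 2, 4): 0.91234459, (3, 2, 4): 1.68109250,
    (2, 3, 2): -0.14988308, (3, 3, 2): 0.05845829, (2, 3, 3): -0.52928578,
    (3, 3, 3): 0.40421695, (2, 3, 4): -0.21715966, (3, 3, 4): 0.32171412,
}


def conditional_ac(j):
    """Table of lambda^AC_{ik} + lambda^ABC_{ijk} for marital status level j."""
    return {(i, k): ac[(i, k)] + abc.get((i, j, k), 0.0) for (i, k) in ac}


for j, label in [(1, "widowed"), (2, "married"), (3, "unmarried")]:
    table = conditional_ac(j)
    rows = [[round(table[(i, k)], 2) for k in (2, 3, 4)] for i in (2, 3)]
    print(label, rows)
```

Computing the sums from the full-precision estimates, rather than from already-rounded table entries, avoids small discrepancies in the second decimal place.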
For the model with the ABD interaction, X² = 122.22 and G² = 130.26, both on 96 degrees of freedom, with p-values .04 and .01, respectively. The AIC value is 298.26. A few more students continued to search and added the A×C×D interaction. This interaction was significant at the 0.13 level, and none of the standardized values of these interaction parameters exceeded 1.96 in absolute value. This interaction also seems to be of limited importance. For this model, X² = 56.33 and G² = 63.66, both on 48 degrees of freedom, with p-values .22 and .08, respectively. The AIC value is 326.66. A case could be made for any of these models; they all provide about the same description of the data. Although we might have selected one of the larger models, we will only interpret the estimates shown above for the smallest of the three models.

The estimates of the C×D interaction parameters, arranged in a 4×5 table (rows k = income level, columns ℓ = region), are

         ℓ=1    ℓ=2     ℓ=3     ℓ=4     ℓ=5
k = 1     0      0       0       0       0
k = 2     0      0.66   -0.15    0.17   -0.25
k = 3     0      1.03   -0.51    0.05   -0.63
k = 4     0      0.84   -0.31    0.07   -0.37

This table indicates that incomes tend to be highest in the suburbs of Copenhagen and lowest in the countryside and the other three largest cities. This is consistent across levels of marital status and alcohol consumption.

The estimates of the B×D interaction parameters shown in the following table indicate that Copenhagen has the lowest proportion of married people, while the rural areas have the highest proportions of married people. Copenhagen and its suburbs also have relatively low levels of unmarried people. Consequently, Copenhagen has a higher level of widowed people. This is consistent across levels of income and alcohol consumption.

         ℓ=1    ℓ=2     ℓ=3     ℓ=4     ℓ=5
j = 1     0      0       0       0       0
j = 2     0      0.65    1.07    0.84    1.59
j = 3     0     -0.26    0.56    0.20    0.85

The estimates of the A×D interaction parameters shown in the following table indicate that Copenhagen and its suburbs have relatively high levels of alcohol consumption, and alcohol consumption is lowest in small cities and rural areas.
This is consistent across levels of income and marital status.

         ℓ=1    ℓ=2     ℓ=3     ℓ=4     ℓ=5
i = 1     0      0       0       0       0
i = 2     0      0.28    0.18   -0.19   -0.53
i = 3     0      0.13   -0.54   -0.79   -1.17

The other two-factor interactions are involved in a three-factor interaction. To examine patterns, add the estimated λ-terms for a two-factor interaction to the corresponding estimates of the terms for the three-factor interaction at each level of the third factor. For example, the estimates of the $\lambda^{AC}_{ik}$ terms shown in the following table provide information on the associations between income and alcohol consumption for widows and widowers (the baseline marital status, where the ABC terms are zero). There is some tendency for alcohol consumption among widows and widowers to be higher for larger incomes, with the highest level of moderate alcohol consumption among widows and widowers in the 100-150 income category.

         k=1    k=2     k=3     k=4
i = 1     0      0       0       0
i = 2     0     -0.10    1.18    0.19
i = 3     0     -0.07    0.27    0.05

To examine the associations between alcohol consumption and income for married people, add the estimates of the $\lambda^{ABC}_{i2k}$ terms to the previous table to obtain the following table.

         k=1    k=2     k=3     k=4
i = 1     0      0       0       0
i = 2     0      0.78    1.43    1.10
i = 3     0      1.13    2.00    1.73

This table indicates that the lowest income group has the lowest alcohol consumption among married adults.

To examine the associations between alcohol consumption and income for unmarried people, add the estimates of the $\lambda^{ABC}_{i3k}$ terms to the corresponding estimates of $\lambda^{AC}_{ik}$. The pattern in the resulting table, shown below, is similar to the pattern for the widows and widowers.

         k=1    k=2     k=3     k=4
i = 1     0      0       0       0
i = 2     0     -0.25    0.65   -0.03
i = 3     0     -0.01    0.68    0.37

You can perform a corresponding analysis to examine how associations between alcohol consumption and marital status change across income categories.

Problem 6

As in most sample size determinations, these questions were somewhat ill posed. You needed to supply some additional information by making some reasonable assumptions.
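(a) Let $X_1$ and $X_2$ be the numbers of voters who support a certain candidate in the September and October samples, respectively. Let $N_1$ and $N_2$ be the survey sample sizes in September and October, respectively. Let $\pi_1$ and $\pi_2$ be the true proportions of people supporting that candidate in September and October, respectively, and let $p_i = X_i/N_i$ ($i = 1, 2$). When $N_1$ and $N_2$ are large, we have approximately

$$p_1 \,\dot\sim\, N\big(\pi_1,\ \pi_1(1-\pi_1)/N_1\big), \qquad p_2 \,\dot\sim\, N\big(\pi_2,\ \pi_2(1-\pi_2)/N_2\big).$$

Then

$$p_2 - p_1 \,\dot\sim\, N\left(\pi_2 - \pi_1,\ \frac{\pi_1(1-\pi_1)}{N_1} + \frac{\pi_2(1-\pi_2)}{N_2}\right),$$

so under $H_a$

$$T = \frac{p_2 - p_1 - (\pi_2 - \pi_1)}{\sqrt{\pi_1(1-\pi_1)/N_1 + \pi_2(1-\pi_2)/N_2}} \,\dot\sim\, N(0, 1).$$

The testing problem is $H_0: \pi_1 = \pi_2$ versus $H_a: \pi_2 > \pi_1$. Under $H_0$, with common value $\pi$,

$$p_2 - p_1 \,\dot\sim\, N\left(0,\ \left(\frac{1}{N_1} + \frac{1}{N_2}\right)\pi(1-\pi)\right).$$

The mle of $\pi$ under $H_0$ is

$$\bar p = \frac{X_1 + X_2}{N_1 + N_2} = \frac{N_1 p_1 + N_2 p_2}{N_1 + N_2}.$$

So the test of $H_0$ at significance level $\alpha$ is: reject $H_0$ if

$$\frac{p_2 - p_1}{\sqrt{(1/N_1 + 1/N_2)\,\bar p(1-\bar p)}} > Z_\alpha.$$

To ensure that this test has power at least $1 - \beta$, the following must hold:

$$\Pr\left(\frac{p_2 - p_1}{\sqrt{(1/N_1 + 1/N_2)\,\bar p(1-\bar p)}} > Z_\alpha \,\middle|\, H_a\right) \ge 1 - \beta.$$

This is equivalent to

$$\Pr\left(T > \frac{Z_\alpha\sqrt{(1/N_1 + 1/N_2)\,\bar p(1-\bar p)} - (\pi_2 - \pi_1)}{\sqrt{\pi_1(1-\pi_1)/N_1 + \pi_2(1-\pi_2)/N_2}} \,\middle|\, H_a\right) \ge 1 - \beta,$$

i.e.

$$\frac{Z_\alpha\sqrt{(1/N_1 + 1/N_2)\,\bar p(1-\bar p)} - (\pi_2 - \pi_1)}{\sqrt{\pi_1(1-\pi_1)/N_1 + \pi_2(1-\pi_2)/N_2}} \le Z_{1-\beta} = -Z_\beta.$$

That is,

$$\sqrt{\left(\frac{1}{N_1} + \frac{1}{N_2}\right)\bar p(1-\bar p)}\,Z_\alpha + \sqrt{\frac{\pi_1(1-\pi_1)}{N_1} + \frac{\pi_2(1-\pi_2)}{N_2}}\,Z_\beta - (\pi_2 - \pi_1) \le 0,$$

i.e.

$$\sqrt{\left(\frac{1}{N_1} + \frac{1}{N_2}\right)\frac{N_1 p_1 + N_2 p_2}{N_1 + N_2}\left(1 - \frac{N_1 p_1 + N_2 p_2}{N_1 + N_2}\right)}\,Z_\alpha + \sqrt{\frac{\pi_1(1-\pi_1)}{N_1} + \frac{\pi_2(1-\pi_2)}{N_2}}\,Z_\beta - (\pi_2 - \pi_1) \le 0.$$

Notice that $\pi_i$ can be estimated by $p_i$ ($i = 1, 2$). So the above inequality is approximately equivalent to

$$\sqrt{\left(\frac{1}{N_1} + \frac{1}{N_2}\right)\frac{N_1 p_1 + N_2 p_2}{N_1 + N_2}\left(1 - \frac{N_1 p_1 + N_2 p_2}{N_1 + N_2}\right)}\,Z_\alpha + \sqrt{\frac{p_1(1-p_1)}{N_1} + \frac{p_2(1-p_2)}{N_2}}\,Z_\beta - (p_2 - p_1) \le 0.$$

For this problem, $N_1 = 1600$, $p_1 = 0.48$, $p_2 = 0.51$, $\alpha = 0.05$, and $\beta = 0.10$. You may try to solve the inequality directly, but it is easier to show that the left-hand side of the inequality is monotonically decreasing with respect to $N_2$.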
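The left-hand side of the inequality can be evaluated numerically to see how it behaves as $N_2$ grows. A minimal sketch using the standard library's `statistics.NormalDist` for the normal quantiles, where $Z_a$ denotes the upper-$a$ point, so $Z_a$ = `inv_cdf(1 - a)`. It also evaluates the confidence-interval sample size $N_2 = Z_{0.025}^2\,p_2(1-p_2)/0.01^2$ used in part (b). The function names `upper_z` and `power_lhs` are illustrative, and $p_2 = 0.51$ is the planning value stated in the problem.

```python
import math
from statistics import NormalDist


def upper_z(tail_prob):
    """Upper-tail standard normal quantile Z_a, i.e. P(Z > Z_a) = a."""
    return NormalDist().inv_cdf(1.0 - tail_prob)


def power_lhs(n2, n1=1600, p1=0.48, p2=0.51, alpha=0.05, beta=0.10):
    """Left-hand side of the approximate power inequality from part (a).

    The desired power is attainable at N2 = n2 only if this value is <= 0.
    """
    pbar = (n1 * p1 + n2 * p2) / (n1 + n2)          # pooled proportion
    null_sd = math.sqrt((1 / n1 + 1 / n2) * pbar * (1 - pbar))
    alt_sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return null_sd * upper_z(alpha) + alt_sd * upper_z(beta) - (p2 - p1)


for n2 in (1_600, 10_000, 1_000_000, 300_000_000):
    print(n2, round(power_lhs(n2), 5))

# Part (b): N2 giving a 95% interval half-width of 0.01 when p2 = 0.51.
z = upper_z(0.025)
n2_ci = z ** 2 * 0.51 * (1 - 0.51) / 0.01 ** 2
print(round(n2_ci, 1), math.ceil(n2_ci))
```

As $N_2 \to \infty$ the left-hand side approaches $Z_\alpha\sqrt{p_2(1-p_2)/N_1} + Z_\beta\sqrt{p_1(1-p_1)/N_1} - (p_2 - p_1)$, which is positive here, so no finite $N_2$ satisfies the inequality.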
Unfortunately, even if we pick $N_2 = 300{,}000{,}000$ (greater than the adult population of the USA), the left-hand side of the above inequality is 0.00656, still bigger than 0. Thus, the sample size needed to achieve the desired power is not achievable if the survey is conducted in the USA.

Many students considered the result from the first survey as a fixed, non-random result and computed the sample size needed for a one-sample test. A few other students computed the sample size needed for a two-sample test, but in this process they computed a new sample size for the first sample. This makes no sense because the first sample has already been completed.

(b) The half-length of a 95% confidence interval for $\pi_2$ is $Z_{0.025}\sqrt{p_2(1-p_2)/N_2}$. From

$$Z_{0.025}\sqrt{p_2(1-p_2)/N_2} = 0.01$$

we have

$$N_2 = Z_{0.025}^2\,p_2(1-p_2)/0.01^2 = 9599.8.$$

Thus, the sample size for the October survey must be about 9600.

(c) Let $Y_{11}$ be the number of people intending to vote for this candidate both before and after the debate; let $Y_{22}$ be the number of people intending to vote against this candidate both before and after the debate; let $Y_{12}$ be the number of people intending to vote for this candidate before the debate but changing their mind after the debate; and let $Y_{21}$ be the number of people intending to vote against this candidate before the debate but changing their mind after the debate. Let the sample proportions be $p_{ij} = Y_{ij}/n$, and let $\pi_{ij}$ denote the corresponding true population proportions. The testing problem "$H_0: \pi_{+1} = \pi_{1+}$ versus $H_a: \pi_{+1} > \pi_{1+}$" is equivalent to "$H_0: \pi_{21} = \pi_{12}$ versus $H_a: \pi_{21} > \pi_{12}$". Under $H_a$,

$$\mathrm{var}(p_{+1} - p_{1+}) = \mathrm{var}(p_{21} - p_{12}) = \mathrm{var}(p_{21}) + \mathrm{var}(p_{12}) - 2\,\mathrm{cov}(p_{21}, p_{12}) = \frac{\pi_{21}(1-\pi_{21}) + \pi_{12}(1-\pi_{12}) + 2\pi_{21}\pi_{12}}{n}.$$

Hence asymptotically

$$p_{21} - p_{12} \,\dot\sim\, N\left(\pi_{21} - \pi_{12},\ \frac{\pi_{21}(1-\pi_{21}) + \pi_{12}(1-\pi_{12}) + 2\pi_{21}\pi_{12}}{n}\right)$$

and

$$T = \frac{p_{21} - p_{12} - (\pi_{21} - \pi_{12})}{\sqrt{\big(\pi_{21}(1-\pi_{21}) + \pi_{12}(1-\pi_{12}) + 2\pi_{21}\pi_{12}\big)/n}} \,\dot\sim\, N(0, 1).$$

Under $H_0$, $\pi_{21} = \pi_{12}$, so

$$p_{21} - p_{12} \,\dot\sim\, N(0,\ 2\pi_{21}/n),$$

and the mle of $\pi_{21}$ is $\hat\pi_{21} = (p_{21} + p_{12})/2$. Thus, under $H_0$, $(p_{21} - p_{12})/\sqrt{(p_{21} + p_{12})/n}$ has a large sample standard normal distribution. The test of $H_0$ at significance level $\alpha$ is

$$\text{reject } H_0 \text{ if } \frac{p_{21} - p_{12}}{\sqrt{(p_{21} + p_{12})/n}} > Z_\alpha.$$

To enable the test to have power $1 - \beta$, we must have

$$\Pr\left(\frac{p_{21} - p_{12}}{\sqrt{(p_{21} + p_{12})/n}} > Z_\alpha \,\middle|\, H_a\right) = 1 - \beta,$$

that is,

$$\Pr\left(T > \frac{Z_\alpha\sqrt{(p_{21} + p_{12})/n} - (\pi_{21} - \pi_{12})}{\sqrt{\big(\pi_{21}(1-\pi_{21}) + \pi_{12}(1-\pi_{12}) + 2\pi_{21}\pi_{12}\big)/n}} \,\middle|\, H_a\right) = 1 - \beta,$$

and we must have

$$\frac{Z_\alpha\sqrt{(p_{21} + p_{12})/n} - (\pi_{21} - \pi_{12})}{\sqrt{\big(\pi_{21}(1-\pi_{21}) + \pi_{12}(1-\pi_{12}) + 2\pi_{21}\pi_{12}\big)/n}} = Z_{1-\beta} = -Z_\beta.$$

Consequently,

$$n = \frac{\left(Z_\alpha\sqrt{p_{21} + p_{12}} + Z_\beta\sqrt{\pi_{21}(1-\pi_{21}) + \pi_{12}(1-\pi_{12}) + 2\pi_{21}\pi_{12}}\right)^2}{(\pi_{21} - \pi_{12})^2}.$$

Now substitute $p_{21}$ and $p_{12}$ for $\pi_{21}$ and $\pi_{12}$, respectively, to obtain

$$n = \frac{\left(Z_\alpha\sqrt{p_{21} + p_{12}} + Z_\beta\sqrt{p_{21}(1-p_{21}) + p_{12}(1-p_{12}) + 2p_{21}p_{12}}\right)^2}{(p_{21} - p_{12})^2}.$$

Now we know that

$$p_{11} + p_{12} = 0.48, \quad p_{11} + p_{21} = 0.51, \quad \alpha = 0.05, \quad \beta = 0.10.$$

Then $p_{12} = 0.48 - p_{11}$ and $p_{21} = 0.51 - p_{11}$, where $0 \le p_{11} \le 0.48$, and

$$n = \frac{\left(\sqrt{0.99 - 2p_{11}}\,Z_{0.05} + \sqrt{0.9891 - 2p_{11}}\,Z_{0.10}\right)^2}{0.03^2}.$$

Hence n depends on $p_{11}$. Clearly n is a decreasing function of $p_{11}$ for $0 \le p_{11} \le 0.48$. So the maximum value, n = 9417, occurs in the unlikely situation with $p_{11} = 0$, where almost everyone changes their mind. At a more likely value of $p_{11} = 0.45$, for example, n = 853. At $p_{11} = 0.48$ we would need a sample size of n = 282. Examine the required sample sizes at several likely values of $\pi_{11}$ and make a decision based on what the experts think is likely.

Problem 7

The testing problem is "$H_0$: the two treatments provide the same results" versus "$H_A$: the two treatments provide different results". The Pearson test of the null hypothesis has df = 2. We use $\alpha = 0.05$ and power = 0.9, with $P_0$ = (0.25, 0.25, 0.5, 0.25, 0.25, 0.5) and $P_A$ = (0.2, 0.2, 0.6, 0.3, 0.3, 0.4). Using the program "chopow.ssc", we get n = 159.
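Two last numerical checks, written as a sketch rather than as the method used to produce the answers. First, the Problem 6(c) formula for n as a function of $p_{11}$ (the test statistic there is McNemar's statistic for paired proportions), rounded up to a whole person. Second, a standard noncentral chi-square power calculation for Problem 7; this is not the chopow.ssc program, whose code is not reproduced here. It assumes n counts subjects per treatment, takes the equal-row vector (0.25, 0.25, 0.5, 0.25, 0.25, 0.5) as the null because $H_0$ says the treatments give the same results, and uses noncentrality $\lambda = n\sum (P_A - P_0)^2/P_0$ over the six cells. Since df = 2 is even, the noncentral tail is evaluated as a Poisson mixture of central chi-square tails with even df, each of which has a closed form. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def upper_z(tail_prob):
    """Upper-tail standard normal quantile Z_a, with P(Z > Z_a) = a."""
    return NormalDist().inv_cdf(1.0 - tail_prob)


def paired_sample_size(p11, alpha=0.05, beta=0.10):
    """Problem 6(c): required n for a given p11, rounded up to a whole person."""
    p12, p21 = 0.48 - p11, 0.51 - p11
    null_part = upper_z(alpha) * math.sqrt(p21 + p12)
    alt_var = p21 * (1 - p21) + p12 * (1 - p12) + 2 * p21 * p12
    alt_part = upper_z(beta) * math.sqrt(alt_var)
    return math.ceil((null_part + alt_part) ** 2 / (p21 - p12) ** 2)


def chi2_sf_even_df(x, df):
    """Central chi-square upper tail for even df: exp(-x/2) * sum (x/2)^i / i!."""
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def ncx2_sf_even_df(x, df, nc, tol=1e-14):
    """Noncentral chi-square upper tail written as a Poisson(nc/2) mixture."""
    lam = nc / 2.0
    weight, j, total = math.exp(-lam), 0, 0.0
    while True:
        total += weight * chi2_sf_even_df(x, df + 2 * j)
        j += 1
        weight *= lam / j
        if j > lam and weight < tol:
            return total


def problem7_n(p0, pa, alpha=0.05, power=0.90):
    """Smallest per-treatment n whose df = 2 Pearson test reaches the target power."""
    per_subject = sum((a - b) ** 2 / b for a, b in zip(pa, p0))
    crit = -2.0 * math.log(alpha)  # exact chi-square critical value for df = 2
    n = 1
    while ncx2_sf_even_df(crit, 2, n * per_subject) < power:
        n += 1
    return n


print([paired_sample_size(p) for p in (0.0, 0.45, 0.48)])

P0 = (0.25, 0.25, 0.5, 0.25, 0.25, 0.5)  # equal rows: treatments alike
PA = (0.2, 0.2, 0.6, 0.3, 0.3, 0.4)      # alternative row probabilities
print(problem7_n(P0, PA))
```

Under these assumptions the per-subject noncentrality is 0.08, and the search stops at n = 159, the first per-treatment sample size whose power reaches 0.9. Swapping the roles of the two vectors would change the noncentrality and hence the answer.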