Stat 557 Fall 2000

Assignment 2 Solutions

Problem 1

(a) The table of counts would have a multinomial distribution if simple random sampling with replacement was used. Since the population is large, a multinomial distribution would also provide a good approximation to the distribution of the counts if simple random sampling without replacement was used. The log-likelihood function is

  ℓ(π; Y) = log(n!) − Σ_{i=1}^{2} Σ_{j=1}^{2} log(Y_ij!) + Σ_{i=1}^{2} Σ_{j=1}^{2} Y_ij log(π_ij).

Some students suggested that the respondents would have to be "identical". This is nonsense. The population may contain individuals that are quite different. By selecting respondents from the population using simple random sampling with replacement, the probability distribution of the possible responses is the same for each selection, and the outcome for any single selection is stochastically independent of the outcome for any other selection. This is what is needed to obtain a multinomial distribution for the counts.

(b) Under the independence model, π_ij = π_i+ π_+j. Substituting this into the log-likelihood function shown above, we have

  ℓ(π; Y) = log(n!) − Σ_{i=1}^{2} Σ_{j=1}^{2} log(Y_ij!) + Σ_{i=1}^{2} Y_i+ log(π_i+) + Σ_{j=1}^{2} Y_+j log(π_+j).

Maximize this log-likelihood subject to the constraints π_1+ + π_2+ = 1 and π_+1 + π_+2 = 1. The formula for the maximum likelihood estimates of the expected counts is

  m̂_ij = Y_i+ Y_+j / Y_++,  i = 1, 2 and j = 1, 2,

and (m̂_11, m̂_12, m̂_21, m̂_22) = (799.5, 220.5, 295.2, 81.5).

(c) G² = 2 Σ_i Σ_j Y_ij log(Y_ij / m̂_ij) = 5.321 on 1 d.f. with p-value = .021.

X² = Σ_i Σ_j (Y_ij − m̂_ij)² / m̂_ij = 5.15 on 1 d.f. with p-value = .023.

The data suggest that opinions on gun registration are not held independently of opinions on the death penalty. In particular, people who oppose the death penalty are more likely to favor gun registration than people who favor the death penalty.
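The independence fit and test statistics in parts (b) and (c) can be reproduced with a short sketch. The 2×2 counts below are hypothetical (the survey counts are not reproduced in these solutions); the functions implement m̂_ij = Y_i+ Y_+j / Y_++ and the G² and X² formulas above.

```python
import math

def independence_fit(table):
    """MLEs of expected counts under independence: m_ij = Y_i+ * Y_+j / Y_++."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[ri * cj / n for cj in cols] for ri in rows]

def g2_x2(table, fitted):
    """Deviance G^2 and Pearson X^2 comparing observed to fitted counts."""
    G2 = X2 = 0.0
    for obs_row, fit_row in zip(table, fitted):
        for y, m in zip(obs_row, fit_row):
            X2 += (y - m) ** 2 / m
            if y > 0:                    # 0 * log(0) is taken to be 0
                G2 += 2 * y * math.log(y / m)
    return G2, X2

# Hypothetical 2x2 table (NOT the gun-registration data)
Y = [[30, 10], [10, 30]]
m_hat = independence_fit(Y)              # every fitted cell is 20.0 here
G2, X2 = g2_x2(Y, m_hat)
```

Both statistics are referred to a chi-square distribution with 1 d.f. for a 2×2 table.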
(d) This is reflected in the estimated odds ratio

  θ̂ = (Y_11 Y_22) / (Y_12 Y_21) = 0.705,

which indicates that the odds of favoring gun registration among those who favor the death penalty are only about 70% of the odds of favoring gun registration among those who oppose the death penalty. To obtain an approximate 95% confidence interval for θ, first compute an approximate 95% confidence interval for log(θ):

  log(θ̂) ± (Z_.025) √(1/Y_11 + 1/Y_12 + 1/Y_21 + 1/Y_22)
  ⇒ (−0.34956) ± (1.96)(0.1545)
  ⇒ (−0.65244, −0.04668).

Then convert to a 95% confidence interval for θ:

  (exp(−0.65244), exp(−0.04668)) ⇒ (0.521, 0.954).

Problem 2

(a) The log-likelihood function is

  ℓ(π; Y) = log(n!) − Σ_{i=1}^{2} Σ_{j=1}^{2} log(Y_ij!) + Σ_{i=1}^{2} Σ_{j=1}^{2} Y_ij log(π_ij)
          = log(n!) − Σ_{i=1}^{2} Σ_{j=1}^{2} log(Y_ij!) + Y_11 log(π²) + (Y_12 + Y_21) log(π(1 − π)) + Y_22 log((1 − π)²)
          = log(n!) − Σ_{i=1}^{2} Σ_{j=1}^{2} log(Y_ij!) + (2Y_11 + Y_12 + Y_21) log(π) + (Y_12 + Y_21 + 2Y_22) log(1 − π).

(b) Solve the likelihood equation

  0 = ∂ℓ(π; Y)/∂π = (2Y_11 + Y_12 + Y_21)/π − (Y_12 + Y_21 + 2Y_22)/(1 − π).

The maximum likelihood estimate is

  π̂ = (2Y_11 + Y_12 + Y_21) / (2Y_++) = (Y_1+ + Y_+1) / (2Y_++) = 0.757.

(c) Compute m.l.e.'s for the expected counts:

  m̂_11 = Y_++ π̂² = 800.5055
  m̂_12 = m̂_21 = Y_++ π̂(1 − π̂) = 256.9944
  m̂_22 = Y_++ (1 − π̂)² = 82.5055

The deviance statistic is

  G² = 2 Σ_{i=1}^{2} Σ_{j=1}^{2} Y_ij log(Y_ij / m̂_ij) = 16.2823 with 2 d.f. and p-value = .0003.

It is not surprising that the data do not support this model, because the independence model was rejected in Problem 1.

(d) An analysis of deviance table:

  Comparison             d.f.   Deviance   p-value
  Model A vs. Model B     1      10.962     .0009
  Model B vs. Model C     1       5.321     .021
  Model A vs. Model C     2      16.283     .0003

Although Model B is a significant improvement over Model A, neither Model A nor Model B is appropriate in this case.

Problem 3

(a) The m.l.e. for the mean number of corn borers per location in the Poisson model is m̂ = 3.16667. This estimate was computed without combining any categories.
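Returning to Problem 2, the constrained MLE and fitted counts from parts (b) and (c) are easy to check numerically. The counts below are hypothetical, chosen so the symmetry model fits exactly; they are not the survey data.

```python
import math

def fit_symmetric_model(Y):
    """Fit the Problem 2 model: pi_11 = pi^2, pi_12 = pi_21 = pi(1 - pi),
    pi_22 = (1 - pi)^2.  Returns (pi_hat, fitted counts, deviance G^2)."""
    n = sum(sum(r) for r in Y)
    pi = (2 * Y[0][0] + Y[0][1] + Y[1][0]) / (2 * n)   # MLE from part (b)
    m = [[n * pi ** 2, n * pi * (1 - pi)],
         [n * pi * (1 - pi), n * (1 - pi) ** 2]]       # part (c) formulas
    G2 = 2 * sum(y * math.log(y / mij)
                 for yr, mr in zip(Y, m)
                 for y, mij in zip(yr, mr) if y > 0)
    return pi, m, G2

# Hypothetical counts with a perfect fit (NOT the observed table)
pi_hat, m_hat, G2 = fit_symmetric_model([[81, 9], [9, 1]])
```

With real data, G² would be compared to a chi-square distribution with 2 d.f., as in part (c).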
The Pearson statistic for testing the fit of the i.i.d. Poisson model is X² = 103.90 with 6 d.f. and p-value < .0001. In computing this test, categories with at least 7 corn borers are combined into a single category. The Poisson model does not appear to be appropriate.

(b) In this case, Ȳ = 3.16667 and S² = 7.70555. The deviance statistic is nS²/Ȳ = 292 with 119 d.f. and p-value < .0001. This test shows that the variance of the distribution of numbers of corn borers across locations is larger than the mean. Hence, the i.i.d. Poisson model is not appropriate.

(c) Maximum likelihood estimates of the expected counts for the Poisson and negative binomial models are shown in the following table. Blank entries indicate that categories were combined to keep estimates of expected counts larger than 5, as requested in the statement of the problem. Maximum likelihood estimates of the parameters in the negative binomial probability function are π̂ = 0.3573 and K̂ = 1.7605.

  Number of     Number of   Poisson   Neg. Binomial
  Corn Borers   Locations   Model     Model
       0           24        5.057       19.602
       1           16       16.015       22.179
       2           16       25.357       19.675
       3           18       26.765       15.850
       4           15       21.189       12.124
       5            9       13.420        8.977
       6            6        7.083        6.501
       7            5        5.111        7.892
       8            3
       9            4                     7.200
      10            3
      11            0
  12 or more        1

The value of the Pearson goodness-of-fit statistic is X² = 4.497 with 6 d.f. and p-value = 0.610. The negative binomial model is consistent with the observed data.

(d) Do not use the i.i.d. Poisson model to construct the confidence interval, because it was shown that the Poisson model is inappropriate. There are several ways, however, to obtain an approximate 95% confidence interval for the mean number of corn borers at a location.

(i) You could use the central limit theorem to show that Ȳ has a limiting normal distribution. You must also show that S² is a consistent estimator for σ² = Var(Y_i).
Then an approximate 95% confidence interval is

  Ȳ ± (1.96) √(S²/n) ⇒ (2.67, 3.67).

This method does not assume any particular distribution for the counts, but it does assume that the counts at the n locations are independent and identically distributed random variables.

(ii) Assuming the negative binomial model is appropriate, Var(Y_i) = K(1 − π)/π² and Var(Ȳ) = K(1 − π)/(nπ²). Then an approximate 95% confidence interval is

  Ȳ ± (1.96) √(K̂(1 − π̂)/(n π̂²)) ⇒ (2.63, 3.70).

(iii) You could use the delta method to find the limiting normal distribution for the m.l.e. of the mean,

  m̂_nb = g(π̂, K̂) = K̂(1 − π̂)/π̂ = (1.7605)(1 − .3573)/.3573 = 3.1667 = Ȳ.

The computer output inverts the estimated Fisher information matrix to estimate the covariance matrix of (π̂, K̂)′ as

  V̂ = [ .0031364  .0210552 ]
      [ .0210552  .1613241 ]

Compute the first partial derivatives of g(π, K),

  G = ( ∂g/∂π   ∂g/∂K ) = ( −K/π²   (1 − π)/π ).

Then

  Ĝ = ( −K̂/π̂²   (1 − π̂)/π̂ ) = (−13.790, 1.79875),

and Var(m̂_nb) is estimated as Ĝ V̂ Ĝ′ = .073855. Using the large sample normal approximation to the distribution of the m.l.e. m̂_nb, an approximate 95% confidence interval for the mean number of corn borers at a location is

  3.16667 ± (1.96) √.073855 ⇒ (2.63, 3.70).

This method is also based on the belief that the negative binomial model is appropriate. Note that the confidence interval based on the incorrect Poisson model,

  m̂ ± (1.96) √(m̂/n) ⇒ (2.85, 3.49),

is too short because the Poisson model does not allow for enough dispersion in the counts.

(iv) You could use the delta method and large sample normality of m.l.e.'s to first construct a confidence interval for

  log(m_nb) = log(K) + log(1 − π) − log(π).

Then apply the exponential function to the endpooints of that interval to obtain an approximate 95% confidence interval for m_nb. When would this be better than approach (iii)?

Problem 4

(a) θ̂ = 2.933 with approximate 95% confidence interval (1.67, 5.16).
This suggests that the odds of contracting Hodgkin's disease among people who have had their tonsils removed are between 1.67 and 5.16 times the odds among people who have not. This confidence interval was constructed by first constructing an approximate 95% confidence interval for log(θ) and then applying the exponential function to the endpoints of the interval to obtain an approximate 95% confidence interval for the odds ratio. Directly using the large sample normal distribution for the odds ratio, we have

  θ̂ ± (1.96) √(θ̂² Σ_{i=1}^{2} Σ_{j=1}^{2} Y_ij⁻¹) ⇒ (1.27, 4.59).

This does not adequately account for the right skewness in the distribution of the odds ratio in this case. Notice how much this interval is shifted to the left of the previous interval.

(b) Some people questioned the use of the odds ratio as a good approximation to relative risk in the Vianna study. Since this is a retrospective study, you cannot directly estimate relative risk, but the odds ratio does provide a good approximation because Hodgkin's disease is a rare disease. Some people worried about the restrictions put on the control group in the Vianna study, and others noted that the sibling controls used by Johnson and Johnson may differ substantially from the controls used in the Vianna study.

(c) While there are advantages in using siblings as controls, Johnson and Johnson did not do an appropriate analysis, because they did not allow for the effects of matched sibling pairs. The "controls" are not an independent sample of persons without Hodgkin's disease. Each sibling pair, not each individual, should be thought of as an independent experimental unit. Siblings are likely to provide correlated responses. Johnson and Johnson should have reported the data in the following table, with one count for each sibling pair.

                                   Control Sibling
                                 Had T.      No T.
  Sibling with       Had T.      Y_11        Y_12
  Hodgkin's disease  No T.       Y_21        Y_22

There are n = Y_11 + Y_12 + Y_21 + Y_22 = 85 sibling pairs.
Unfortunately, this table cannot be obtained from the table reported by Johnson and Johnson (1972). The Pearson chi-square test for independence performed by Johnson and Johnson is incorrect. They should have performed a test of marginal homogeneity using the counts in the table shown above. This is McNemar's test (or the sign test). Reject the null hypothesis of marginal homogeneity if

  X² = (Y_12 − Y_21)² / (Y_12 + Y_21) > χ²_{(1), .05}

(this is a 2-sided test). Using the above table, Johnson and Johnson could have estimated an odds ratio as

  θ̂ = (Y_11 + Y_12)(Y_12 + Y_22) / [(Y_11 + Y_21)(Y_21 + Y_22)].

To construct an approximate confidence interval you would have to derive the formula for the variance of this statistic, or of the natural logarithm of this statistic. We might attempt this on the next assignment. Finally, note that Vianna, et al. sampled from a different population than Johnson and Johnson. What are the consequences of this?

Problem 5

There are only six tables with the same row and column totals as the observed table. These tables can be distinguished by the value of Y_11. The exact probabilities are presented in the following table.

  Table            Exact
  Number   Y_11    Probability
    1       21      0.2755     (observed table)
    2       22      0.0939
    3       23      0.0114
    4       19      0.2127
    5       18      0.0449
    6       20      0.3616

Looking at the proportions of cases where the cancer is controlled for the surgery and radiation treatments, only table 6 is more consistent with the null hypothesis of equal proportions than the observed table. Consequently, the p-value, 0.6384, is the sum of the probabilities for the other five tables. These data are consistent with the null hypothesis that the success rates are the same for the surgery and radiation treatments.

Problem 6

Since the objective is to show that the IFN-B treatment is better, we should test the null hypothesis that the IFN-B treatment has the same effect as the control treatment against a directional alternative where the IFN-B treatment is "better".
Different definitions of what it means for the IFN-B treatment to be "better" result in different answers. The row totals in this table are fixed by the randomization procedure that places 10 subjects in each treatment group. Under the null hypothesis that the IFN-B and control treatments are equally effective, the column totals are also fixed. There are 43 possible tables with these row and column totals. Each table is distinguished by the values of Y_11 and Y_12, the first two counts in the first row of the table. The values of these two counts and the corresponding probability that each table occurs by random assignment of subjects to treatment groups are shown below.

  Table                 Pearson    Exact
  Number   Y_11  Y_12     X²       Probability
    1       6     0      14.67        15/184756
    2       6     4      12.00        70/184756
    3       6     1      10.50       160/184756
    4       6     3       9.17       336/184756
    5       6     2       8.67       420/184756
    6       5     5       9.17       336/184756
    7       5     0      13.33        36/184756
    8       5     1       7.83       720/184756
    9       5     4       5.33      2520/184756   (observed table)
   10       5     2       4.67      3360/184756
   11       5     3       3.83      5040/184756
   12       4     6       8.67       420/184756
   13       4     0      14.67        15/184756
   14       4     1       7.83       720/184756
   15       4     2       3.33      6300/184756
   16       4     3       1.17     16800/184756
   17       4     4       1.33     15750/184756
   18       4     5       3.83      5040/184756
   19       3     7      10.50       160/184756
   20       3     6       4.67      3360/184756
   21       3     5       1.17     16800/184756
   22       3     4       0.00     28000/184756
   23       3     3       1.17     16800/184756
   24       3     2       4.67      3360/184756
   25       3     1      10.50       160/184756
   26       2     3       3.83      5040/184756
   27       2     4       1.33     15750/184756
   28       2     5       1.17     16800/184756
   29       2     6       3.33      6300/184756
   30       2     7       7.83       720/184756
   31       2     8      14.67        15/184756
   32       2     2       8.67       420/184756
   33       1     5       3.83      5040/184756
   34       1     4       5.33      2520/184756
   35       1     8      13.33        36/184756
   36       1     7       7.83       720/184756
   37       1     6       4.67      3360/184756
   38       1     3       9.17       336/184756
   39       0     6       8.67       420/184756
   40       0     5       9.17       336/184756
   41       0     7      10.50       160/184756
   42       0     4      12.00        70/184756
   43       0     8      14.67        15/184756

Note that in this case the exact probabilities provide the same ordering of the tables as the Pearson X² values.
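This enumeration is easy to reproduce. The sketch below assumes row totals (10, 10) and column totals (6, 8, 6), which is consistent with the tabled probabilities; for example, the observed table (5, 4, 1) has probability C(6,5)C(8,4)C(6,1)/C(20,10) = 2520/184756.

```python
from math import comb

ROWS = (10, 10)     # 10 subjects per treatment group (fixed by randomization)
COLS = (6, 8, 6)    # column totals inferred from the tabled probabilities

def enumerate_tables():
    """All first rows (y11, y12, y13) compatible with the fixed margins,
    paired with their multivariate hypergeometric probabilities."""
    denom = comb(sum(COLS), ROWS[0])          # C(20, 10) = 184756
    tables = {}
    for y11 in range(COLS[0] + 1):
        for y12 in range(COLS[1] + 1):
            y13 = ROWS[0] - y11 - y12
            if 0 <= y13 <= COLS[2]:
                num = (comb(COLS[0], y11) * comb(COLS[1], y12)
                       * comb(COLS[2], y13))
                tables[(y11, y12, y13)] = num / denom
    return tables

tables = enumerate_tables()                   # 43 tables; probabilities sum to 1
p_obs = tables[(5, 4, 1)]                     # probability of the observed table
# Two-sided exact p-value: all tables no more probable than the observed one
p_two_sided = sum(p for p in tables.values() if p <= p_obs + 1e-15)
```

The resulting p_two_sided matches the 0.0642 reported for this problem.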
Table 9 is the observed table, and the EXACT option in PROC FREQ in SAS computes a two-sided p-value of 0.0642 by adding the probabilities for tables 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 19, 25, 30, 31, 32, 34, 35, 36, 38, 39, 40, 41, 42, 43. The fisher.test( ) function in SPLUS yields the same p-value by summing the probabilities for all tables that occur with probability no larger than the probability of the observed table. This is not necessarily a good way to define a critical region or define a p-value. This set of tables includes many tables that have fewer treated patients that either improve or stay the same than in the observed table, so it does not provide an appropriate p-value with respect to the objective of this study.

One reasonable criterion is that tables provide more evidence than the observed table that the IFN-B treatment is better if at least 5 of the treated patients show improvement and at least 9 of the treated patients either improve or stay the same (or no more than one of the treated patients becomes worse). This includes tables 2, 4, 6, 9, and the p-value is 0.01766. Another criterion is that the difference between the number of treated patients that improve and the number of treated patients that get worse should be at least 5 − 1 = 4. This results in a p-value of 0.0222. Many students failed to clearly describe the criterion used to identify the possible tables that were included in the evaluation of the p-value.

Problem 7

(a) Suppose skin cancer patients were randomly sampled with replacement from a population of skin cancer patients (or simple random sampling without replacement could be used if the population of skin cancer patients is large). Each patient in the sample is classified according to the stage of their disease and their reaction to the DCNB allergen. Then, the counts in the table would have a multinomial distribution with six categories and sample size n = 173.
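The two directional p-values for Problem 6 given above can be verified by enumeration. This sketch assumes row totals (10, 10) and column totals (6, 8, 6) for the (improved, same, worse) categories, inferred from the listed probabilities.

```python
from math import comb

ROWS, COLS = (10, 10), (6, 8, 6)   # (improved, same, worse) column totals

def treated_rows():
    """Yield each possible (improved, same, worse) row for the IFN-B group,
    with its exact probability under the null hypothesis."""
    denom = comb(sum(COLS), ROWS[0])          # C(20, 10) = 184756
    for y1 in range(COLS[0] + 1):
        for y2 in range(COLS[1] + 1):
            y3 = ROWS[0] - y1 - y2
            if 0 <= y3 <= COLS[2]:
                p = (comb(COLS[0], y1) * comb(COLS[1], y2)
                     * comb(COLS[2], y3)) / denom
                yield (y1, y2, y3), p

# Criterion 1: at least 5 improved and at least 9 improved or stayed the same
p1 = sum(p for (y1, y2, y3), p in treated_rows() if y1 >= 5 and y1 + y2 >= 9)

# Criterion 2: (number improved) - (number worse) at least 5 - 1 = 4
p2 = sum(p for (y1, y2, y3), p in treated_rows() if y1 - y3 >= 4)
```

Under these assumptions, p1 reproduces 0.01766 and p2 reproduces 0.0222.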
(b) The value of the Pearson X² test is 15.365 with 2 d.f. and p-value = 0.00046. The value of the deviance test is G² = 15.445 with 2 d.f. and p-value = 0.00044.

(c) Both tests in part (b) show strong evidence that the allergic reaction to DCNB is related to the stage of skin cancer. Stage III skin cancer patients are more likely to have negative reactions to DCNB exposure than the other skin cancer patients.

(d) The results are the same as in part (b). As shown in class, for a two-way contingency table, both the multinomial distribution and independent binomial distributions yield the same m.l.e.'s for the expected counts and the same degrees of freedom for testing the fit of the independence model. Consequently, both procedures yield the same value of the test statistic and the same p-value. The columns of the table would correspond to independent binomial distributions if separate simple random samples of patients were taken from patients at the three different stages of skin cancer.

Problems 8-11

The objective of this exercise was to show you how easy it is to use simulation methods to investigate small sample properties of test statistics. Here we concentrated on the Type I error levels of the Pearson X² test and the G² test. The table in Problem 8 satisfies Cochran's rule: only one expected count is smaller than 5 and none are smaller than 1. Cochran's rule is more severely violated as we move through Problems 9, 10, and 11. Results in Problems 8, 9, and 10 illustrate that the presence of expected counts in the range from 0.5 through 5 tends to inflate the Type I error level of the G² test, while the Pearson X² test maintains a Type I error level closer to the nominal .05 level. This is seen in the Q-Q plots, where quantiles of the null distribution of the G² statistic tend to be above the 45 degree line.
For the table in Problem 11, however, points on the Q-Q plot for the G² statistic tend to be below the 45-degree line, and the Type I error level is smaller than the nominal .05 level. This illustrates that using the large sample chi-square approximation to the null distribution of the G² statistic can produce a conservative test with very little power when the table contains many very small expected counts. In such cases, the table will contain many observed zero counts, and cells with zero counts contribute zero to the value of the G² statistic. In such cases the null distribution of the Pearson statistic places more probability on smaller values than indicated by the large sample chi-squared approximation, but there are also sizeable probabilities of some large positive values of X². These patterns would have been more severe in a table with more cells. Overall, the large sample chi-square approximation to the null distribution of the test statistics provides more accurate Type I error levels for the Pearson X² statistic than for the G² statistic when there are moderate violations of Cochran's rule. For more severe violations of Cochran's rule, the large sample chi-squared approximation may not provide the desired Type I error level for either test statistic.

The code you used also provided "exact" p-values for the X² and G² statistics. This was done by simulating 10,000 tables for which the null hypothesis of independence is true, using the overall success rate from the observed table to generate four independent binomial counts with sample sizes corresponding to the column totals in the original table of counts. Is this a reasonable thing to do? It does not provide the same p-value as Fisher's exact test, where both the row and column totals are fixed. The Fisher exact test, which is appropriate for randomized experiments, considers only a small subset of the possible tables we could have seen in simulating 10,000 tables.
Which approach is better in this situation? Most students failed to address these issues.
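The simulation described above can be sketched as follows. The column totals and null success rate here are hypothetical placeholders, since the tables from Problems 8-11 are not reproduced in these solutions; for a 2 × 4 table of independent binomial columns, the statistics are referred to a chi-square distribution with k − 1 = 3 d.f. (critical value 7.815 at the .05 level).

```python
import math
import random

def x2_g2(succ, total):
    """Pearson X^2 and deviance G^2 for H0: a common success probability
    across k independent binomial columns (a 2 x k table)."""
    p0 = sum(succ) / sum(total)              # pooled success rate under H0
    X2 = G2 = 0.0
    for y, m in zip(succ, total):
        for obs, exp in ((y, m * p0), (m - y, m * (1 - p0))):
            X2 += (obs - exp) ** 2 / exp
            if obs > 0:                      # zero cells contribute 0 to G^2
                G2 += 2 * obs * math.log(obs / exp)
    return X2, G2

def type1_error(p, totals, nsim=10000, crit=7.815, seed=1):
    """Monte Carlo Type I error of X^2 and G^2 at the nominal .05 level,
    using the chi-square critical value on k - 1 = 3 d.f."""
    rng = random.Random(seed)
    rx2 = rg2 = 0
    for _ in range(nsim):
        succ = [sum(rng.random() < p for _ in range(m)) for m in totals]
        if 0 < sum(succ) < sum(totals):      # statistics are defined
            x2, g2 = x2_g2(succ, totals)
            rx2 += x2 > crit
            rg2 += g2 > crit
    return rx2 / nsim, rg2 / nsim

# Hypothetical column totals and null success rate (placeholders)
rates = type1_error(p=0.2, totals=[10, 15, 20, 25], nsim=2000)
```

Comparing the two estimated rejection rates to .05 under various expected-count configurations reproduces the kind of comparison made in Problems 8 through 11.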