Stat 557
Midterm Exam Solutions
Fall 2002

Some problems may have more than one reasonable solution, although better solutions receive higher scores. Not all of the reasonable solutions may be mentioned below. Also, you should give only the solution you think is best. If you gave more than one solution, you were awarded the score for the weakest solution you gave. You should find a score written on your paper for each part of each problem. If not, check with the instructor. Also check that your score was correctly tabulated. A stem-leaf display of scores is given on the last page.

1. (A) (10 points) Since the observed numbers of tumors are small, the normal approximation to the null distribution of

$$Z = \frac{p_1 - p_2}{\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \text{where } p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2},$$

may not provide an accurate p-value. Also, the chi-squared approximation to the null distribution of the Pearson chi-square test statistic, or the likelihood ratio test statistic, may not provide an accurate p-value, and those tests are for two-sided alternatives. Use Fisher's exact randomization test to compute the one-sided p-value as

$$\frac{\binom{5}{4}\binom{27}{12}}{\binom{32}{16}} + \frac{\binom{5}{5}\binom{27}{11}}{\binom{32}{16}} = \frac{86919300}{601080390} + \frac{13037895}{601080390} = .166$$

The null hypothesis is not rejected. These data do not provide strong evidence in support of the claim that exposure to Avadex increases the incidence of lung cancer.

(B) (10 points) In a larger study, the normal approximation to the Z-test could provide accurate p-values. We will base our power calculation on the normal approximation to that test statistic for obtaining power .90 of detecting a difference of (.25 − .0625) = .1875 using a .05 level test against a one-sided alternative. Using

$$\bar{\pi} = \frac{.25 + .0625}{2} = .15625 \quad \text{and} \quad R = \sqrt{\frac{2(.15625)(.84375)}{(.25)(.75) + (.0625)(.9375)}} = 1.035$$

we have

$$n = \frac{\left(1.282 + (1.035)(1.645)\right)^2 \left[(.25)(.75) + (.0625)(.9375)\right]}{(.25 - .0625)^2} = 63$$

mice in each treatment group.
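The two calculations in problem 1 can be checked numerically. The following is a minimal sketch using only the Python standard library. The margins (16 mice per group, 5 tumors in total, 4 of them in the Avadex group) are read off the binomial coefficients and proportions above, and the function names are illustrative assumptions, not part of the exam.

```python
from math import ceil, comb, sqrt


def fisher_upper_tail(tumors, no_tumors, group_size, observed):
    """One-sided Fisher exact p-value: P(X >= observed) under the
    hypergeometric null distribution with all margins fixed."""
    denom = comb(tumors + no_tumors, group_size)
    upper = min(tumors, group_size)
    numer = sum(
        comb(tumors, x) * comb(no_tumors, group_size - x)
        for x in range(observed, upper + 1)
    )
    return numer / denom


def sample_size_per_group(p1, p2, z_alpha, z_beta):
    """Per-group sample size from the normal approximation in part (B)."""
    pbar = (p1 + p2) / 2
    var_sum = p1 * (1 - p1) + p2 * (1 - p2)
    r = sqrt(2 * pbar * (1 - pbar) / var_sum)
    n = (z_beta + r * z_alpha) ** 2 * var_sum / (p1 - p2) ** 2
    return ceil(n)  # round up to a whole number of mice


p_value = fisher_upper_tail(tumors=5, no_tumors=27, group_size=16, observed=4)
n_per_group = sample_size_per_group(0.25, 0.0625, z_alpha=1.645, z_beta=1.282)
print(round(p_value, 3), n_per_group)
```

Rounding the sample size up rather than to the nearest integer keeps the power at or above the .90 target.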
2. (A) (8 points) The five counts $Y_1, Y_2, Y_3, Y_4, Y_5$ have a multinomial distribution with probabilities $\pi_i = m^{i-1} e^{-m}/(i-1)!$ for $i = 1, 2, 3, 4$ and $\pi_5 = 1 - \pi_1 - \pi_2 - \pi_3 - \pi_4$. Then, the log-likelihood function for these counts is

$$\ell(m) = \log(n!) - \sum_{i=1}^{5} \log(Y_i!) + \sum_{i=1}^{5} Y_i \log(\pi_i),$$

where $n = 400$.

(B) (8 points) $df = (5 - 1) - 1 = 3$. Since $9.02 > \chi^2_{3,.05} = 7.81$, the Poisson model is rejected at the .05 level of significance.

(C) (i) (8 points) Since

$$X^2 = \sum_{i=1}^{5} \frac{(Y_i - \hat{m}_i)^2}{\hat{m}_i} = 3.634$$

is smaller than $\chi^2_{2,.10} = 4.61$, the data are consistent with the proposed model. Also,

$$G^2 = 2 \sum_{i=1}^{5} Y_i \log\left(\frac{Y_i}{\hat{m}_i}\right) = 3.558$$

is smaller than $\chi^2_{2,.10} = 4.61$.

(ii) (8 points) Assuming this model is correct, the mle for the expected number of compartments that contain exactly one yeast cell is $\hat{m}_1 = 400\,\hat{\theta}(1 - \hat{\lambda}) e^{-\hat{\theta}} = 119.62$. To apply the delta method, compute a vector of estimates of first partial derivatives

$$\hat{G} = \left[\frac{\partial m_1}{\partial \theta} \;\; \frac{\partial m_1}{\partial \lambda}\right] = \left[400(1 - \hat{\theta})(1 - \hat{\lambda}) e^{-\hat{\theta}} \;\; -400\,\hat{\theta} e^{-\hat{\theta}}\right] = [39.027 \;\; -141.897]$$

and estimate the variance of $\hat{m}_1$ as $Var(\hat{m}_1) = \hat{G}\hat{V}\hat{G}^T = 75.133$. Then, an approximate 95% confidence interval is

119.62 ± (1.96)(8.668) ⇒ 119.62 ± 16.99 ⇒ (102.6, 136.6)

Alternatively, you could construct an approximate 95% confidence interval for $\log(m_1) = \log(n) + \log(\theta) + \log(1 - \lambda) - \theta$ using the delta method to obtain the large sample normal approximation to the distribution of $\log(\hat{m}_1) = \log(n) + \log(\hat{\theta}) + \log(1 - \hat{\lambda}) - \hat{\theta}$. Compute

$$\hat{G} = \left[\frac{\partial \log(m_1)}{\partial \theta} \;\; \frac{\partial \log(m_1)}{\partial \lambda}\right] = \left[\hat{\theta}^{-1} - 1 \;\; -(1 - \hat{\lambda})^{-1}\right] = [0.326 \;\; -1.186].$$

Then, $Var(\log(\hat{m}_1)) = \hat{G}\hat{V}\hat{G}^T = .00525$ and an approximate 95% confidence interval for $\log(m_1)$ is

$$\log(119.62) \pm (1.96)\sqrt{.00525} \Rightarrow (4.642, 4.926).$$

Applying the exponential function to the endpoints of this interval yields (103.7, 137.8) as an approximate 95% confidence interval for $m_1$.

3. (A) (4 points)

$$\hat{\alpha} = \frac{(51)(10)}{(77)(13)} = 0.5095$$

(B) (8 points) Compute $\log(\hat{\alpha}) = -0.6743$ and

$$S_{\log(\hat{\alpha})} = \sqrt{\frac{1}{51} + \frac{1}{77} + \frac{1}{13} + \frac{1}{10}} = .4577.$$

Then, an approximate 95% confidence interval for $\log(\alpha)$ is

$$\log(\hat{\alpha}) \pm (1.96) S_{\log(\hat{\alpha})} \Rightarrow -.6743 \pm (1.96)(.4577) \Rightarrow (-1.5714, 0.2229).$$

Then, an approximate 95% confidence interval for $\alpha$ is $(e^{-1.5714}, e^{0.2229}) \Rightarrow (0.2078, 1.2497)$. For women under 50 in Tokyo, the odds of three year survival for women diagnosed with malignant tumors are not significantly different from the odds of three year survival for women diagnosed with benign tumors.

A confidence interval constructed as $\hat{\alpha} \pm (1.96) S_{\hat{\alpha}}$ would not come as close to achieving a 95% coverage probability as the confidence interval based on the normal approximation to $\log(\hat{\alpha})$. Alternatively, one could have described a bootstrap procedure for constructing a confidence interval.

(C) (6 points) The largest log-linear model that satisfies the null hypothesis that 3-year survival is conditionally independent of treatment center, given the age of the women and tumor status at time of diagnosis, is

$$\log(m_{ijk\ell}) = \lambda + \lambda^{C}_{i} + \lambda^{A}_{j} + \lambda^{S}_{k} + \lambda^{T}_{\ell} + \lambda^{CA}_{ij} + \lambda^{CT}_{i\ell} + \lambda^{AS}_{jk} + \lambda^{AT}_{j\ell} + \lambda^{ST}_{k\ell} + \lambda^{CAT}_{ij\ell} + \lambda^{AST}_{jk\ell}$$

(D) (4 points) There are 12 degrees of freedom for testing the fit of this model.

(E) (14 points) The log-linear model

$$\log(m_{ijk\ell}) = \lambda + \lambda^{C}_{i} + \lambda^{A}_{j} + \lambda^{S}_{k} + \lambda^{T}_{\ell} + \lambda^{CA}_{ij} + \lambda^{CT}_{i\ell} + \lambda^{AS}_{jk} + \lambda^{AT}_{j\ell} + \lambda^{ST}_{k\ell} + \lambda^{AST}_{jk\ell}$$

was fit to these data, where

C = treatment center (Tokyo, Boston, London)
A = age group
S = survival for 3 years (yes, no)
T = tumor status at time of diagnosis (malignant, benign)

Maximum likelihood estimates of the parameters in this model, along with their standard errors, were displayed on page 8 of this exam. These estimates were obtained using the restrictions that any λ term is zero when any factor involved in that λ term is at its highest level.
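The odds ratio and interval in parts (A) and (B) of problem 3 can be reproduced with a few lines of Python. This is a sketch only: the arrangement of the four counts into cells a, b, c, d is an assumption chosen to match the cross-product $(51)(10)/((77)(13))$ above, and the function name is hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Sample odds ratio (a*d)/(b*c) with a Wald interval built on the
    log scale and then back-transformed with exp()."""
    alpha_hat = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    center = log(alpha_hat)
    ci = (exp(center - z * se_log), exp(center + z * se_log))
    return alpha_hat, se_log, ci


# Assumed cell layout: only the cross-product is given in the solution.
alpha_hat, se_log, ci = odds_ratio_wald_ci(a=51, b=77, c=13, d=10)
print(round(alpha_hat, 4), round(se_log, 4), [round(x, 4) for x in ci])
```

Because the interval is computed for $\log(\alpha)$ and then exponentiated, it is not symmetric about $\hat{\alpha}$ and its endpoints can never be negative.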
Since $\lambda^{CS}_{ik} = 0$ for all (i,k) in this model, the model implies that within each of the subgroups formed by the six possible combinations of the age and tumor status categories the probability of surviving three years is the same at all three locations.

The presence of a significant three factor interaction involving age, tumor status and survival rates suggests that at each treatment center the association between 3-year survival and tumor status changes across age groups. To examine this in more detail, compute the values of $\lambda^{ST}_{k\ell} + \lambda^{AST}_{jk\ell}$ for each age group:

                          < 50 years            50-69 years           > 69 years
3-year survival (S)    Malignant  Benign     Malignant  Benign     Malignant  Benign
Yes                     -.9103      0         -.4786      0          .1341      0
No                       0          0          0          0          0          0

Note that the log of a conditional odds ratio, the odds of 3-year survival when diagnosed with a malignant tumor divided by the odds of 3-year survival when diagnosed with a benign tumor for a particular age group and any center, is obtained as

$$(\lambda^{ST}_{11} + \lambda^{AST}_{j11}) - (\lambda^{ST}_{12} + \lambda^{AST}_{j12}) - (\lambda^{ST}_{21} + \lambda^{AST}_{j21}) + (\lambda^{ST}_{22} + \lambda^{AST}_{j22}) = (\lambda^{ST}_{11} + \lambda^{AST}_{j11})$$

under the imposed parameter constraints. Then, the estimated conditional odds ratios are

Under 50:     $\hat{\alpha}_{ST \bullet A=1} = \exp(\hat{\lambda}^{ST}_{11} + \hat{\lambda}^{AST}_{111}) = \exp(-.9103) = .4024$
50-69:        $\hat{\alpha}_{ST \bullet A=2} = \exp(\hat{\lambda}^{ST}_{11} + \hat{\lambda}^{AST}_{211}) = \exp(-.4786) = .6197$
70 or older:  $\hat{\alpha}_{ST \bullet A=3} = \exp(\hat{\lambda}^{ST}_{11}) = \exp(.1341) = 1.143$

Since $\hat{\lambda}^{AST}_{111} = -1.044$ is significantly different from zero (p-value = .0287), we can conclude that $\alpha_{ST \bullet A=1}$ is smaller than $\alpha_{ST \bullet A=3}$. However, the data do not show that $\alpha_{ST \bullet A=2}$ is significantly different from $\alpha_{ST \bullet A=3}$. An approximate 95% confidence interval for $\alpha_{ST \bullet A=3}$ is (.543, 2.41). Confidence intervals for $\alpha_{ST \bullet A=1}$ and $\alpha_{ST \bullet A=2}$ cannot be constructed because covariance estimates were not reported.

For younger women, the odds of 3-year survival for women diagnosed with a malignant tumor are only about 40% of the odds of survival for women diagnosed with a benign tumor.
For women 50-69, the odds of 3-year survival for women diagnosed with a malignant tumor are estimated to be about 60% of the odds of survival for women diagnosed with a benign tumor. For women over 69, the odds of 3-year survival are about the same for women diagnosed with either malignant or benign tumors.

Conditional associations between 3-year survival and age, conditional on tumor status and center, are consistent across treatment centers, but are different for malignant and benign tumors. Estimated values of $\lambda^{AS}_{jk} + \lambda^{AST}_{jk\ell}$ are shown below.

                     Malignant tumor        Benign tumor
Age group (A)      Survived    Died       Survived    Died
<50                 0.1384      0          1.1828      0
50-69               0.3887      0          1.0014      0
>69                 0           0          0           0

The log of a conditional odds ratio, the odds of 3-year survival for a particular age group divided by the odds of 3-year survival for the oldest age group, for the $\ell$-th tumor status and any center, is obtained as

$$(\lambda^{AS}_{j1} + \lambda^{AST}_{j1\ell}) - (\lambda^{AS}_{j2} + \lambda^{AST}_{j2\ell}) - (\lambda^{AS}_{31} + \lambda^{AST}_{31\ell}) + (\lambda^{AS}_{32} + \lambda^{AST}_{32\ell}) = (\lambda^{AS}_{j1} + \lambda^{AST}_{j1\ell})$$

under the imposed parameter constraints. Then, the estimated conditional odds ratios are

Malignant tumor: under 50 vs over 69:  $\exp(\hat{\lambda}^{AS}_{11} + \hat{\lambda}^{AST}_{111}) = \exp(.1384) = 1.148$
Malignant tumor: 50-69 vs over 69:     $\exp(\hat{\lambda}^{AS}_{21} + \hat{\lambda}^{AST}_{211}) = \exp(.3887) = 1.475$
Malignant tumor: under 50 vs 50-69:    $\exp(.1384 - .3887) = 0.779$
Benign tumor: under 50 vs over 69:     $\exp(\hat{\lambda}^{AS}_{11}) = \exp(1.1828) = 3.263$
Benign tumor: 50-69 vs over 69:        $\exp(\hat{\lambda}^{AS}_{21}) = \exp(1.0014) = 2.722$
Benign tumor: under 50 vs 50-69:       $\exp(1.1828 - 1.0014) = 1.199$

For women diagnosed with malignant tumors, the odds of 3-year survival are estimated to be 15% greater for women under 50 than for women over 69. We cannot construct a confidence interval or perform a test of significance because the required covariance estimate was not reported. For women diagnosed with benign tumors, the odds of 3-year survival are about 3 times greater for women under 50 than for women over 69.
An approximate 95% confidence interval for this odds ratio is (1.68, 6.32). Since $\hat{\lambda}^{AST}_{111} = -1.044$ is significantly different from zero (p-value = .0287), we conclude that the conditional odds ratio (age effect on 3-year survival) for women with benign tumors is significantly greater than the conditional odds ratio for women with malignant tumors.

For women diagnosed with malignant tumors, the odds of 3-year survival are estimated to be 47% greater for women 50-69 than for women over 69. We cannot construct a confidence interval or perform a test of significance because the required covariance estimate was not reported. For women diagnosed with benign tumors, the odds of 3-year survival are about 2.7 times greater for women 50-69 than for women over 69. An approximate 95% confidence interval for this odds ratio is (1.50, 4.94). Since $\hat{\lambda}^{AST}_{211} = -.6127$ (the difference .3887 − 1.0014 of the tabled values) is not significantly different from zero (p-value = .1737), we cannot conclude that the conditional odds ratio (age effect on 3-year survival) for benign tumors is significantly greater than the conditional odds ratio for malignant tumors.

Differences in odds of 3-year survival between the youngest and middle age groups are not large, if they exist at all. We cannot perform tests of significance or construct confidence intervals because needed covariance estimates were not reported.

You could also comment on the associations between treatment centers and age, and between treatment centers and rates of malignant and benign tumors, implied by the significant $\lambda^{CA}_{ij}$ and $\lambda^{CT}_{i\ell}$ terms in the model, although you were not asked for this information and it does not affect the score you received for your answer. $\hat{\lambda}^{CT}_{11} = -.4051$ and $\hat{\lambda}^{CT}_{21} = -.6740$ indicate that there were lower proportions of malignant tumors in the samples taken from the Tokyo and Boston treatment centers than in the sample taken from the London treatment centers. $\hat{\lambda}^{AC}_{11} = 1.4390$ and $\hat{\lambda}^{AC}_{12} = -0.7642$ indicate that the samples taken from the Tokyo treatment centers had the highest proportion of women under 50 and the samples taken from the Boston treatment centers had the lowest proportion of women under 50, within each combination of the survival and tumor status categories. $\hat{\lambda}^{AC}_{21} = 0.6049$ and $\hat{\lambda}^{AC}_{22} = -0.4831$ indicate a similar pattern for 50-69 year old women, but not as pronounced.

(F) (4 points) Since the model in part (E) indicates that both 3-year survival (S) and treatment centers (C) have significant associations with age, inference about the conditional association between 3-year survival (S) and treatment centers (C) could change if the table is collapsed across the age levels.

4. (8 points) A confidence interval based on the large sample normal approximation to the distribution of a sample proportion would not be appropriate in this case. It would produce 0 ± (1.96)(0) ⇒ (0, 0). An "exact" confidence interval can be obtained by finding the values of the proportion $\pi_0$ that are consistent with the observed data in the sense that the null hypothesis $H_0: \pi = \pi_0$ would not be rejected at the .05 level of significance. Then, the upper bound of the interval corresponds to the largest value of $\pi_0$ for which

$$.025 \le \Pr\{Y \le 0\} = \Pr\{Y = 0\} = \binom{10}{0} \pi_0^{0} (1 - \pi_0)^{10-0} = (1 - \pi_0)^{10}.$$

This yields $\pi_0 = .308$ as the upper bound of the confidence interval. The lower bound is the smallest value of $\pi_0$ for which $.025 \le \Pr\{Y \ge 0\}$. This is taken to be $\pi_0 = 0$ in this case because $\Pr\{Y \ge 0\} = 1$ for any $\pi_0 > 0$. The confidence interval is (0, .308).

A bootstrap confidence interval was suggested by some students, but this is no more accurate than the large sample normal approximation in this case. Since you observed no failures in 10 cases, resampling from the observed sample would always produce no failures in 10 cases and a confidence interval of (0, 0).

The total number of points for this exam is 100.
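The exact upper bound in problem 4 can be verified numerically. The sketch below finds, by bisection, the largest $\pi_0$ whose lower tail probability $\Pr\{Y \le y\}$ is still at least .025, and compares it with the closed form $1 - (.025)^{1/10}$ that applies when no failures are observed. The function names are illustrative assumptions, and only the Python standard library is used.

```python
from math import comb


def binom_cdf(y, n, pi):
    """Pr{Y <= y} for Y ~ Binomial(n, pi)."""
    return sum(comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(y + 1))


def exact_upper_bound(y, n, tail=0.025, tol=1e-10):
    """Largest pi with Pr{Y <= y | pi} >= tail, located by bisection.
    The cdf is decreasing in pi, so the boundary is unique."""
    if y == n:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(y, n, mid) >= tail:
            lo = mid  # mid is still consistent with the data
        else:
            hi = mid  # mid would be rejected
    return lo


upper = exact_upper_bound(y=0, n=10)
closed_form = 1 - 0.025 ** (1 / 10)
print(round(upper, 4), round(closed_form, 4))
```

The bisection version also handles observed counts other than zero, where no simple closed form is available.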
A stem-leaf display of the scores is shown below.

9 | 02
8 | 68
8 | 0001113344
7 | 5678
7 | 011113444
6 | 5555667778
6 | 001344
5 |
5 |
4 |
4 | 3