STAT 557                                        FALL 2000

Assignment #3

Name ______________

Reading Assignment:

Lloyd: The delta method is reviewed in Section 1.6. The analysis of several 2x2 tables is discussed in Sections 3.4-3.6.

Written Assignment:

On campus: Due Friday, October 6, in class.
Off campus: Put in the mail by Saturday, October 14.

1. In 1974, the Danish National Institute for Social Science Research interviewed a random sample of Danes between 20 and 69 years old in order to investigate the general welfare in Denmark. The following two tables (Andersen, 1990) cross-classify workers with respect to the physical and psychological demands of their employment. There are separate tables for females and males.

Table 1: Females

                                 Work is psychologically demanding
Work is physically demanding     Usually    Sometimes    Seldom
Usually                            100         109         202
Sometimes                           33          89         179
Seldom                             100         179         542

Table 2: Males

                                 Work is psychologically demanding
Work is physically demanding     Usually    Sometimes    Seldom
Usually                            113         163         370
Sometimes                           45         106         280
Seldom                             229         343         568

Use the Goodman-Kruskal gamma statistic to quantify the level of association between attitudes about the physical and psychological demands of work for females and males. Report a standard error for each estimate and use the large sample normal approximation to the distribution of the gamma statistic to construct approximate 95% confidence intervals.

                                                 95% Confidence Interval
            Gamma Estimate    Standard Error    Lower Limit    Upper Limit
Females:    ______________    ______________    ___________    ___________
Males:      ______________    ______________    ___________    ___________

2. The sample size may need to be very large before the sampling distribution of the Goodman-Kruskal gamma statistic (γ̂) is reasonably well approximated by its limiting normal distribution, especially if γ is large. A transformation that approaches its asymptotic normal distribution more rapidly is

ξ̂ = (1/2) log[(1 + γ̂) / (1 − γ̂)].

(Note that this is a transformation R.A. Fisher proposed for correlation coefficients.)
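
As a computational aid for Problems 1 and 2, here is a minimal Python sketch (not part of the original assignment) of how gamma can be computed from counts of concordant and discordant pairs, and of the transformation above, which is the inverse hyperbolic tangent of γ̂. The function names and the small 2x2 table are hypothetical and chosen only for illustration; no standard errors are computed.

```python
import math


def goodman_kruskal_gamma(table):
    """Return (gamma, concordant, discordant) for a two-way table of counts
    whose rows and columns are both in their natural order."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = 0
    discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair cell (i, j) with every cell in a later row.
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:      # later row and later column: concordant
                        concordant += table[i][j] * table[k][l]
                    elif l < j:    # later row but earlier column: discordant
                        discordant += table[i][j] * table[k][l]
    gamma = (concordant - discordant) / (concordant + discordant)
    return gamma, concordant, discordant


def gamma_to_xi(gamma):
    """xi = (1/2) log[(1 + gamma) / (1 - gamma)], i.e. atanh(gamma)."""
    return 0.5 * math.log((1 + gamma) / (1 - gamma))


def xi_to_gamma(xi):
    """Inverse of gamma_to_xi; useful for mapping interval limits back."""
    return math.tanh(xi)


# Hypothetical 2x2 table, used only to exercise the functions.
toy_table = [[20, 10],
             [5, 15]]
gamma_hat, n_conc, n_disc = goodman_kruskal_gamma(toy_table)
xi_hat = gamma_to_xi(gamma_hat)
print(gamma_hat, n_conc, n_disc, xi_hat)
```

Because the transformation is monotone, limits computed on the ξ scale can be mapped back to the γ scale with `xi_to_gamma`.
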

(a) Use the delta method to obtain a formula for the large sample variance of ξ̂ as a function of the large sample variance for γ̂.

(b) ξ̂ also has a limiting normal distribution. Use this fact and the result for Part (a) to construct approximate 95% confidence intervals for γ for the two tables in Problem 1.

females: lower limit = _________    upper limit = _________
males:   lower limit = _________    upper limit = _________

(c) Test the null hypothesis that the level of association between attitudes toward physical and psychological demands of employment, as measured by gamma, is the same for females and males. Give a formula and a value for your test statistic and a p-value. State your conclusions. (In answering this question you may assume that the counts for females and males have independent multinomial distributions.)

3. If continuous data are classified into categories, the choice of the number of categories can affect the value of measures of association or agreement. The following set of tables was obtained by cross-classifying 173 Boston area female registered nurses aged 34-39 according to sucrose intake levels from a food consumption questionnaire administered twice to each nurse, about one year apart, in 1980 and 1981 (Maclure, M. and Willett, W. C. 1987, Am. J. Epidemiology, 122, 51-65). In the first table the nurses are cross-classified according to whether they are above or below the median of the sample for each questionnaire. In the second table they are cross-classified by quartiles, in the third table by sextiles, and in the fourth table by dodeciles.

For each table compute

P = proportion of cases on the main diagonal
K = unweighted kappa
ρ_s = Spearman's rho
γ = Goodman-Kruskal gamma
λ_R|C = proportional reduction in error for predicting the row category from the column category

(a)
               1981
1980       1     2
   1      67    20
   2      19    67

(b)
                    1981
1980       1     2     3     4
   1      24    10     8     1
   2      12    21     7     3
   3       5    12    14    12
   4       2     1    14    27

(c)
                          1981
1980       1     2     3     4     5     6
   1      17     5     3     1     2     1
   2       5     9     6     4     3     1
   3       4     8    10     6     0     1
   4       1     5     7     4     9     2
   5       0     2     2    11     7     8
   6       1     1     1     3     8    15

(d)
                                  1981
1980       1    2    3    4    5    6    7    8    9   10   11   12
   1       7    3    1    1    0    0    1    0    1    0    0    0
   2       4    3    0    3    2    1    0    0    1    0    1    0
   3       0    4    2    2    1    0    1    2    2    0    0    0
   4       1    0    2    3    2    3    0    1    0    1    0    1
   5       1    3    2    3    1    1    1    2    0    0    0    0
   6       0    0    3    0    5    3    2    1    0    0    1    0
   7       1    0    1    1    0    3    3    0    1    4    0    0
   8       0    0    2    1    1    3    0    1    2    2    1    1
   9       0    0    1    1    2    0    3    2    2    3    0    1
  10       0    0    0    0    0    0    1    5    1    1    5    2
  11       0    1    1    0    0    1    1    0    3    1    1    5
  12       0    0    0    0    0    0    1    1    1    3    5    4

(e) After reviewing the results for Parts (a), (b), (c), and (d), what advice would you give to the researchers on summarizing the level of agreement or association between the sucrose intake values from the first and second questionnaire?

4. Research in the 1970s indicated that the Epstein-Barr virus (EBV) was a cause of infectious mononucleosis. Some investigators felt that EBV resides and replicates in the oropharynx with transmission via the buccal fluids. Tonsillectomy and adenoidectomy might eliminate or reduce infection rates for mononucleosis. To check this hypothesis, Richard Goode and Donald Coursey reviewed charts of students treated for infectious mononucleosis at Stanford University's Cowell Student Health Center between January 1968 and May 1973 for confirmation of the disease and history of tonsillectomy. This constitutes the IM group. The control group consists of students seen at the Cowell Health Center between April and September 1973 for any other ailment and who were willing to divulge whether or not they had a tonsillectomy. The following data were reported by Miller (1980) for 18 to 24 year old patients.

(a) Consider the overall table

                       IM Cases    Controls
Tonsillectomy              40         235
No Tonsillectomy          145         420

Construct a 95% confidence interval for the odds ratio (or approximate relative risk). Report

lower bound = __________    upper bound = _____________

(b) There was some concern that the ages of students might affect the comparison because the control group has a larger proportion of older students who have had more opportunity to undergo a tonsillectomy. Also, medical wisdom on the value of tonsillectomy has varied over the years. (Read the discussion of Simpson's Paradox in Section 3.6 of Lloyd's book.) The following tables provide a stratification of the data by age. Compute an estimate of the odds ratio and an approximate 95% confidence interval for the odds ratio at each age level.

Age
(in years)                Tonsillectomy    No Tonsillectomy    Odds Ratio
   18       IM Cases            6                 17            _______
            Controls           17                 32            _______
   19       IM Cases            3                 39            _______
            Controls           26                 70            _______
   20       IM Cases           12                 29            _______
            Controls           34                 78            _______
   21       IM Cases            8                 38            _______
            Controls           48                 89            _______
   22       IM Cases            5                 10            _______
            Controls           45                 73            _______
   23       IM Cases            2                  7            _______
            Controls           29                 37            _______
   24       IM Cases            4                  5            _______
            Controls           36                 39            _______

(c) Compute the value of the Mantel-Haenszel estimator of the common odds ratio and also obtain an approximate 95% confidence interval.

α̂_MH = ________

95% C.I. for Odds Ratio: lower limit = _________    upper limit = _________

(d) The Mantel-Haenszel estimator in Part (c) is appropriate when the odds ratios are the same for all 7 age levels. Compute the values of the Breslow-Day test and the T4 test for homogeneity of odds ratios. Report

Breslow-Day test statistic = _______    p-value = _________
T4 = _________                          p-value = _________
d.f. = _______

(e) State your conclusion from Part (d). Do the odds ratios appear to be homogeneous? If not, describe how the odds ratios differ across age groups.

(f) Compute the value of the Cochran-Mantel-Haenszel test statistic for the null hypothesis that tonsillectomy rates are independent of case/control status within each age group. Report

X² = ________    d.f. = _________    p-value = _________

and state your conclusion.

5. Do Parts (a), (b), and (c) of Problem 2.14 on Pages 112-113 of Lloyd's book. In each of those parts, the alternative hypothesis is the general product multinomial model.

(d) Consider the independence model where the multinomial distributions of plant counts, across the four leaf shape / size categories, are the same for all three districts. Test this null hypothesis against the general alternative.

(e) The models in Parts (a), (b), and (c) are a set of nested models. Write out an analysis of deviance table for this set of models.

(f) The models in Parts (a), (c), and (d) form a set of nested models. Write out an analysis of deviance table for this set of models.

(g) Do the models in (a), (b), (c), and (d) form a set of nested models? Explain.

6. Consider the data in Problem 3.24 on Page 173 in Lloyd's book.

(a) Use the odds ratio to quantify the effect of loss of a sibling on the risk of being a "problem" child within each of the three birth order categories. Construct an approximate 95% confidence interval for each odds ratio.

(b) Use the Breslow-Day test to test the null hypothesis of homogeneous odds ratios in Part (a). State your conclusion. Is there a trend in the logarithms of the odds ratios? Why is it appropriate to use the Breslow-Day test in this case?

(c) Obtain a 2x2 table by collapsing across the birth order categories. Analyze this table. Is this an example of Simpson's paradox? Explain.

7. Return to the analysis of smooth cavities in 12 year old children performed in the lecture.

(a) Use the maximum likelihood estimates for the parameters in the negative binomial model to compute a maximum likelihood estimate of the proportion of 12 year old children with no cavities.

(b) Derive a formula for a large sample approximation to the variance of your estimate in Part (a).

(c) Evaluate the variance formula in Part (b) and use it to obtain an approximate 95% confidence interval for the proportion of 12 year old children with no cavities.

(d) Describe a method for assessing the true coverage probability of the confidence interval constructed in Part (c). Do not perform any calculations, just outline what you would do.
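
As a computational aid for the 2x2 analyses in Problems 4 and 6, here is a minimal Python sketch (not part of the original assignment) of a sample odds ratio with a large-sample interval and of the Mantel-Haenszel estimate of a common odds ratio. It assumes the usual Woolf-type interval on the log odds ratio scale, which may differ from the interval presented in lecture or in Lloyd's book. The function names and the counts are hypothetical; the interval for the Mantel-Haenszel estimate and the Breslow-Day, T4, and Cochran-Mantel-Haenszel tests are not covered.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% interval for the 2x2 table

                  cases   controls
        exposed     a        b
        unexposed   c        d

    using the large-sample standard error of the log odds ratio.
    Assumes all four counts are positive."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


def mantel_haenszel_or(strata):
    """Mantel-Haenszel estimate of a common odds ratio.

    `strata` is a list of (a, b, c, d) tuples, one per stratum, laid out
    as in odds_ratio_ci."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts for two strata, used only to exercise the functions.
toy_strata = [(10, 20, 5, 40), (8, 12, 6, 24)]
or_hat, lower, upper = odds_ratio_ci(*toy_strata[0])
mh_hat = mantel_haenszel_or(toy_strata)
print(or_hat, lower, upper, mh_hat)
```

With a single stratum the Mantel-Haenszel estimate reduces to the ordinary sample odds ratio, which gives a quick check of the cell layout.
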