STAT 557                    Assignment #3                    FALL 2000
Name ______________

Reading Assignment: Lloyd: The delta method is reviewed in Section 1.6. The analysis of
several 2x2 tables is discussed in Sections 3.4-3.6.

Written Assignment: On campus: Due Friday, October 6, in class.
                    Off campus: Put in the mail by Saturday, October 14.

1. In 1974, the Danish National Institute for Social Science Research interviewed a random sample
of Danes between 20 and 69 years old in order to investigate the general welfare in Denmark.
The following two tables (Andersen, 1990) cross-classify workers with respect to the physical
and psychological demands of their employment. There are separate tables for males and
females.
Table 1: Females
                                     Work is psychologically demanding
Work is physically demanding         Usually     Sometimes     Seldom
        Usually                        100          109          202
        Sometimes                       33           89          179
        Seldom                         100          179          542

Table 2: Males
                                     Work is psychologically demanding
Work is physically demanding         Usually     Sometimes     Seldom
        Usually                        113          163          370
        Sometimes                       45          106          280
        Seldom                         229          343          568
Use the Goodman-Kruskal gamma statistic to quantify the level of association between attitudes
about the physical and psychological demands of work for females and males. Report a
standard error for each estimate and use the large sample normal approximation to the
distribution of the gamma statistic to construct approximate 95% confidence intervals.
                Gamma                            95% Confidence Interval
                Estimate     Standard Error      Lower Limit    Upper Limit
Females:        ________     ______________      ___________    ___________
Males:          ________     ______________      ___________    ___________
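The calculation requested in Problem 1 can be sketched as follows. This is a minimal illustration, not the course's prescribed method: it assumes the usual large-sample (delta method) standard error for gamma under multinomial sampling, and the names `gk_gamma`, `FEMALES`, and `MALES` are introduced here for illustration. The counts are entered in the order they appear in Tables 1 and 2; gamma and its standard error are unchanged if rows and columns are interchanged.

```python
import math

def gk_gamma(table):
    """Return (gamma, ase) for a two-way table of counts with ordered categories."""
    rows, cols = len(table), len(table[0])
    conc = [[0.0] * cols for _ in range(rows)]  # counts concordant with cell (i, j)
    disc = [[0.0] * cols for _ in range(rows)]  # counts discordant with cell (i, j)
    p_tot = q_tot = 0.0
    for i in range(rows):
        for j in range(cols):
            for k in range(rows):
                for l in range(cols):
                    sign = (k - i) * (l - j)
                    if sign > 0:
                        conc[i][j] += table[k][l]
                    elif sign < 0:
                        disc[i][j] += table[k][l]
            p_tot += table[i][j] * conc[i][j]  # twice the number of concordant pairs
            q_tot += table[i][j] * disc[i][j]  # twice the number of discordant pairs
    gamma = (p_tot - q_tot) / (p_tot + q_tot)
    # Large-sample standard error (assumed formula, multinomial sampling).
    ss = sum(table[i][j] * (q_tot * conc[i][j] - p_tot * disc[i][j]) ** 2
             for i in range(rows) for j in range(cols))
    ase = 4.0 * math.sqrt(ss) / (p_tot + q_tot) ** 2
    return gamma, ase

# Counts as listed in Tables 1 and 2 (three counts per line of the table).
FEMALES = [[100, 109, 202], [33, 89, 179], [100, 179, 542]]
MALES = [[113, 163, 370], [45, 106, 280], [229, 343, 568]]

for label, tab in (("Females", FEMALES), ("Males", MALES)):
    g, se = gk_gamma(tab)
    # Normal-approximation 95% interval: estimate +/- 1.96 standard errors.
    print(label, round(g, 4), round(se, 4),
          round(g - 1.96 * se, 4), round(g + 1.96 * se, 4))
```

For a 2x2 table this standard error reduces to the familiar one for Yule's Q, which gives a quick check on the formula.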
2. The sample size may need to be very large before the sampling distribution of the
Goodman-Kruskal gamma statistic (γ̂) is reasonably well approximated by its limiting normal
distribution, especially if γ is large. A transformation that approaches its asymptotic normal
distribution more rapidly is ξ̂ = (1/2) log[(1 + γ̂) / (1 − γ̂)]. (Note that this is a transformation
R.A. Fisher proposed for correlation coefficients.)
(a)
Use the delta method to obtain a formula for the large sample variance of ξ̂ as a
function of the large sample variance for γ̂.
(b)
ξ̂ also has a limiting normal distribution. Use this fact and the result for Part (a) to
construct approximate 95% confidence intervals for γ for the two tables in Problem 1.
females:    lower limit = _________    upper limit = _________
males:      lower limit = _________    upper limit = _________
(c)
Test the null hypothesis that the level of association between attitudes toward physical and
psychological demands of employment, as measured by gamma, is the same for females
and males. Give a formula and a value for your test statistic and a p-value. State your
conclusions. (In answering this question you may assume that the counts for females and
males have independent multinomial distributions.)
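The interval in Part (b) and the comparison in Part (c) can be sketched as follows. This is only an illustration under stated assumptions: the derivative of ξ with respect to γ is 1/(1 − γ²), so the delta method scales the standard error of γ̂ by that factor; the test in `equal_gamma_test` is a two-sided z test on the gamma scale, which is one reasonable choice (a test on the ξ scale is another). The function names and the numeric inputs in the usage lines are hypothetical.

```python
import math

def xi_interval(gamma_hat, se_gamma, z=1.96):
    """Interval for gamma built on the xi = atanh(gamma) scale, then back-transformed."""
    xi = math.atanh(gamma_hat)                    # 0.5 * log((1 + g) / (1 - g))
    se_xi = se_gamma / (1.0 - gamma_hat ** 2)     # delta method: d(xi)/d(gamma) = 1 / (1 - g^2)
    lower, upper = xi - z * se_xi, xi + z * se_xi
    return math.tanh(lower), math.tanh(upper)     # tanh inverts the transformation

def equal_gamma_test(g1, se1, g2, se2):
    """Two-sided z test of equal gammas for two independent samples."""
    z = (g1 - g2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical inputs, for illustration only.
print(xi_interval(0.5, 0.1))
print(equal_gamma_test(0.5, 0.06, 0.3, 0.08))
```

Because the back-transformation is monotone, the resulting interval always stays inside (−1, 1), which the symmetric interval on the gamma scale does not guarantee.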
3. If continuous data are classified into categories, the choice of the number of categories can affect
the value of measures of association or agreement. The following set of tables were obtained by
cross-classifying 173 Boston area female registered nurses aged 34-39 according to sucrose
intake levels from a food consumption questionnaire administered twice to each nurse, about
one year apart, in 1980 and 1981 (Maclure, M. and Willett, W. C. 1987, Am. J. Epidemiology,
122, 51-65). In the first table the nurses are cross-classified according to whether they are above
or below the median of the sample for each questionnaire. In the second table they are
cross-classified by quartiles, and in the fourth table they are cross-classified by dodeciles. For each
table compute
P = proportion of cases on the main diagonal
K = unweighted kappa
ρs = Spearman's rho
γ = Goodman-Kruskal Gamma
λ R|C = proportional reduction in error for predicting the row category from the
column category
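Three of the measures listed above (P, unweighted kappa, and λ R|C) can be sketched for a square table as follows. This is a minimal illustration, not a full solution: Spearman's rho and gamma are not computed here, the function name `agreement_measures` is introduced for illustration, and the 2x2 counts are those of table (a) read with 1980 as rows and 1981 as columns, which is an assumption about the layout. For this particular table the three values are the same under the transposed reading.

```python
def agreement_measures(table):
    """Return (P, kappa, lambda_R_given_C) for a square table of counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    # P: observed proportion of cases on the main diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance if rows and columns were independent.
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (p_obs - p_exp) / (1.0 - p_exp)

    # lambda R|C: proportional reduction in errors when the row category is
    # predicted from the column category instead of from the row margin alone.
    best_within_cols = sum(max(table[i][j] for i in range(k)) for j in range(k))
    lam = (best_within_cols - max(row_tot)) / (n - max(row_tot))
    return p_obs, kappa, lam

TABLE_A = [[67, 20], [19, 67]]  # counts from table (a); layout assumed
print(agreement_measures(TABLE_A))
```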
(a)
                      1981
                    1       2
1980        1      67      20
            2      19      67
(b)
1981
1
24
10
8
1
1980
2
3
12 5
21 12
7
14
3
12
4
2
1
14
27
1
17
5
3
1
2
1
2
5
9
6
4
3
1
1980
3
4
4
1
8
5
10 7
6
4
0
9
1
2
5
0
2
2
11
7
8
4
1
0
2
3
2
3
0
1
0
1
0
1
5
1
3
2
3
1
1
1
2
0
0
0
0
1
2
3
4
(c)
1
2
1981 3
4
5
6
6
1
1
1
3
8
15
(d)
1981
1
1
2
3
4
5
6
7
8
9
10
11
12
1
7
3
1
1
0
0
1
0
1
0
0
0
2
4
3
0
3
2
1
0
0
1
0
1
0
3
0
4
2
2
1
0
1
2
2
0
0
0
6
0
0
3
0
5
3
2
1
0
0
1
0
1980
7
1
0
1
1
0
3
3
0
1
4
0
0
8
0
0
2
1
1
3
0
1
2
2
1
1
9
0
0
1
1
2
0
3
2
2
3
0
1
10
0
0
0
0
0
0
1
5
1
1
5
2
11
0
1
1
0
0
1
1
0
3
1
1
5
12
0
0
0
0
0
0
1
1
1
3
5
4
4
(e)
After reviewing the results for Parts (a), (b), (c), and (d), what advice would you give to
the researchers on summarizing the level of agreement or association between the sucrose
intake values from the first and second questionnaires?
4. Research in the 1970s indicated that the Epstein-Barr virus (EBV) was a cause of infectious
mononucleosis. Some investigators felt that EBV resides and replicates in the oropharynx with
transmission via the buccal fluids. Tonsillectomy and adenoidectomy might eliminate or reduce
infection rates for mononucleosis. To check this hypothesis Richard Goode and Donald
Coursey reviewed charts of students treated for infectious mononucleosis at Stanford
University's Cowell Student Health Center between January 1968 and May 1973 for
confirmation of the disease and history of tonsillectomy. This constitutes the IM group. The
control group consists of students seen at the Cowell Health Center between April and
September 1973 for any other ailment and who were willing to divulge whether or not they had
a tonsillectomy. The following data were reported by Miller (1980) for 18 to 24 year old
patients.
(a)
Consider the overall table
                  Tonsillectomy    No Tonsillectomy
IM Cases                40                145
Controls               235                420
Construct a 95% confidence interval for the odds ratio (or approximate relative risk).
Report
lower bound = __________    upper bound = _____________
(b)
There was some concern that the ages of the students might affect the comparison because the
control group has a larger proportion of older students who have had more opportunity to
undergo a tonsillectomy. Also, medical wisdom on the value of tonsillectomy has varied
over the years. (Read the discussion of Simpson's Paradox in Section 3.6 of Lloyd's
book.) The following tables provide a stratification of the data by age. Compute an
estimate of the odds ratio and an approximate 95% confidence interval for the odds ratio
at each age level.
Age (in years)               Tonsillectomy    No Tonsillectomy    Odds Ratio
18             IM Cases             6                17           _______  _______
               Controls            17                32
19             IM Cases             3                39           _______  _______
               Controls            26                70
20             IM Cases            12                29           _______  _______
               Controls            34                78
21             IM Cases             8                38           _______  _______
               Controls            48                89
22             IM Cases             5                10           _______  _______
               Controls            45                73
23             IM Cases             2                 7           _______  _______
               Controls            29                37
24             IM Cases             4                 5           _______  _______
               Controls            36                39
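The odds ratios and intervals asked for in Parts (a) and (b) can be sketched as follows. This is a minimal illustration that assumes the usual log odds ratio (Woolf) interval; the course may intend a different interval. The names `odds_ratio_ci`, `OVERALL`, and `STRATA` are introduced here, and each tuple is read as (IM cases with tonsillectomy, IM cases without, controls with, controls without), which is an assumption about how the tables are laid out.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio a*d/(b*c) with a log-scale (Woolf) confidence interval."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper

# (cases with, cases without, controls with, controls without) tonsillectomy
OVERALL = (40, 145, 235, 420)
STRATA = {
    18: (6, 17, 17, 32),
    19: (3, 39, 26, 70),
    20: (12, 29, 34, 78),
    21: (8, 38, 48, 89),
    22: (5, 10, 45, 73),
    23: (2, 7, 29, 37),
    24: (4, 5, 36, 39),
}

print("overall", [round(x, 3) for x in odds_ratio_ci(*OVERALL)])
for age, counts in sorted(STRATA.items()):
    print(age, [round(x, 3) for x in odds_ratio_ci(*counts)])
```

Several strata have cell counts of 5 or fewer, so the large-sample intervals for those age levels should be read as rough.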
(c)
Compute the value of the Mantel-Haenszel estimator of the common odds ratio and also
obtain an approximate 95% confidence interval.
α̂_MH = ________    95% C.I. for Odds Ratio:  lower limit = _________    upper limit = _________
(d)
The Mantel-Haenszel estimator in Part (c) is appropriate when the odds ratios are the
same for all 7 age levels. Compute the values of the Breslow-Day test and the T4 test for
homogeneity of odds ratios. Report
Breslow-Day test statistic = _______    d.f. = _______    p-value = _________
T4 = _________                                            p-value = _________
(e)
State your conclusion from Part (d). Do the odds ratios appear to be homogeneous? If
not, describe how the odds ratios differ across age groups.
(f)
Compute the value of the Cochran-Mantel-Haenszel test statistic for the null hypothesis
that tonsillectomy rates are independent of case/control status within each age group.
Report
X² = ________
d.f. = _________
p-value = _________
and state your conclusion.
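The Mantel-Haenszel estimate in Part (c) and the Cochran-Mantel-Haenszel statistic in Part (f) can be sketched as follows. This is an illustration under stated assumptions rather than the course's prescribed solution: the interval uses the Robins-Breslow-Greenland variance for the log of the estimator, the test statistic is computed without a continuity correction, and the Breslow-Day and T4 tests of Part (d) are not covered. The names `mantel_haenszel`, `cmh_test`, and `STRATA` are introduced here, with each tuple read as (IM cases with tonsillectomy, IM cases without, controls with, controls without).

```python
import math

# One tuple per age level, 18 through 24 (layout assumed as described above).
STRATA = [(6, 17, 17, 32), (3, 39, 26, 70), (12, 29, 34, 78), (8, 38, 48, 89),
          (5, 10, 45, 73), (2, 7, 29, 37), (4, 5, 36, 39)]

def mantel_haenszel(strata, z=1.96):
    """Mantel-Haenszel common odds ratio with a Robins-Breslow-Greenland interval."""
    r_sum = s_sum = pr = ps_qr = qs = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        r, s = a * d / n, b * c / n
        p, q = (a + d) / n, (b + c) / n
        r_sum += r
        s_sum += s
        pr += p * r
        ps_qr += p * s + q * r
        qs += q * s
    or_mh = r_sum / s_sum
    var_log = (pr / (2 * r_sum ** 2) + ps_qr / (2 * r_sum * s_sum)
               + qs / (2 * s_sum ** 2))
    half = z * math.sqrt(var_log)
    return or_mh, or_mh * math.exp(-half), or_mh * math.exp(half)

def cmh_test(strata):
    """Cochran-Mantel-Haenszel chi-square (1 d.f., no continuity correction)."""
    diff = var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        diff += a - row1 * col1 / n                      # observed minus expected
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    stat = diff * diff / var
    p_value = math.erfc(math.sqrt(stat / 2.0))           # upper tail of chi-square, 1 d.f.
    return stat, p_value

print(mantel_haenszel(STRATA))
print(cmh_test(STRATA))
```

With a single stratum the estimator and its interval reduce to the ordinary odds ratio and its log-scale interval, which is a convenient check.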
5. Do Parts (a), (b), and (c) of Problem 2.14 on Pages 112-113 of Lloyd’s book. In each of those
parts, the alternative hypothesis is the general product multinomial model.
(d)
Consider the independence model where the multinomial distributions of plant counts,
across the four leaf shape / size categories, are the same for all three districts. Test this
null hypothesis against the general alternative.
(e)
The models in Parts (a), (b), and (c) are a set of nested models. Write out an analysis of
deviance table for this set of models.
(f)
The models in Parts (a), (c) and (d) form a set of nested models. Write out an analysis
of deviance table for this set of models.
(g)
Do the models in (a), (b), (c), and (d) form a set of nested models? Explain.
6. Consider the data in Problem 3.24 on Page 173 in Lloyd’s book.
(a)
Use the odds ratio to quantify the effect of loss of a sibling on the risk of being a
“problem” child within each of the three birth order categories. Construct an
approximate 95% confidence interval for each odds ratio.
(b)
Use the Breslow-Day test to test the null hypothesis of homogeneous odds ratios in Part
(a). State your conclusion. Is there a trend in the logarithms of the odds ratios? Why is
it appropriate to use the Breslow-Day test in this case?
(c)
Obtain a 2x2 table by collapsing across the birth order categories. Analyze this table. Is
this an example of Simpson’s paradox? Explain.
7. Return to the analysis of smooth cavities in 12 year old children performed in the lecture.
(a)
Use the maximum likelihood estimates for the parameters in the negative binomial model
to compute a maximum likelihood estimate of the proportion of 12 year old children with
no cavities.
(b)
Derive a formula for a large sample approximation to the variance of your estimate in
Part (a).
(c)
Evaluate the variance formula in Part (b) and use it to obtain an approximate 95%
confidence interval for the proportion of 12 year old children with no cavities.
(d)
Describe a method for assessing the true coverage probability of the confidence interval
constructed in Part (c). Do not perform any calculations, just outline what you would do.
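The kind of calculation involved in Parts (a) through (c) can be sketched as follows. This is only an illustration under loudly stated assumptions: it assumes the negative binomial is parameterized by a mean μ and a dispersion parameter k, so that P(Y = 0) = (k / (k + μ))^k, which may differ from the parameterization used in the lecture; the function names are introduced here; and the estimates and covariance matrix in the usage lines are hypothetical, not values from the lecture.

```python
import math

def nb_zero_prob(mu, k):
    """P(Y = 0) for a negative binomial with mean mu and dispersion k (assumed form)."""
    return (k / (k + mu)) ** k

def nb_zero_prob_var(mu, k, cov):
    """Delta-method variance of the estimated P(Y = 0).

    cov is the 2x2 estimated covariance matrix of (mu_hat, k_hat).
    """
    p0 = nb_zero_prob(mu, k)
    # Partial derivatives of p0 with respect to mu and k.
    grad = (-p0 * k / (k + mu),
            p0 * (math.log(k / (k + mu)) + mu / (k + mu)))
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(2) for j in range(2))
    return p0, var

# Hypothetical estimates and covariance matrix, for illustration only.
p0, var = nb_zero_prob_var(2.0, 0.5, [[0.04, 0.001], [0.001, 0.01]])
half = 1.96 * math.sqrt(var)
print(round(p0, 4), round(p0 - half, 4), round(p0 + half, 4))
```

For Part (d), one common approach is to simulate many samples from the fitted model, recompute the interval for each, and record how often it covers the value of P(Y = 0) used in the simulation.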