Biost 513

Spring 2000
Professor Breslow
HOMEWORK #3
(Due Friday, April 21 in class)
Reading:

Rosner 11.8, 11.14
For Reference:

“Use of the logistic and related models in longitudinal studies of chronic disease risk”
(Coursepak Readings)
Problems:
1. The two previous homework assignments used the grouped data from the Ille-et-Vilaine
case-control study (either tuyns.dat or esoph.raw) to study the relationship
between alcohol and tobacco consumption and esophageal cancer. These analyses
ignored age. Yet we know from the class notes that age is strongly related to the
cancer outcome (p. 32715) and is at least moderately associated with both alcohol
and tobacco consumption in the population at risk (p. 32717). This suggests
that, as in many such situations, age should be treated as a confounder. (Age may not be so
much a causal risk factor for cancer as a surrogate for the cumulative effects of many
unmeasured risk factors.)
Create a binary “exposure” variable tobexp coded 0 for 0-19 cigarettes per day and 1
for 20+ cigarettes per day (see p. 32913 of the class notes for one way to do this).
Similarly, create a binary alcohol exposure variable alcexp coded 0 for 0-79 gm/day
and 1 for 80+ gm/day (see p. 40507). Denote cancer status by cc=1 for cases and cc=0
for controls.
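The recoding and stratification steps can be sketched in plain Python. This is a minimal sketch under stated assumptions: the record layout (age group, alcohol group, tobacco group, cc, freq) and the group codes (tobacco groups 3-4 meaning 20+ per day, alcohol groups 3-4 meaning 80+ gm/day) are hypothetical and may not match tuyns.dat or esoph.raw, so check the actual file first. The helper names are illustrative. The function collapses grouped records into one 2 × 2 table per stratum, which is the input the stratified analyses below need.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# HYPOTHETICAL record layout, not necessarily that of tuyns.dat or esoph.raw:
# (age_group 1-6, alc_group 1-4, tob_group 1-4, cc 0/1, freq)
Record = Tuple[int, int, int, int, int]

# ASSUMED group codes: tobacco groups 3-4 are 20+ per day, alcohol groups 3-4 are 80+ gm/day.
TOB_HIGH_FROM = 3
ALC_HIGH_FROM = 3

# (cc, tobexp) -> position in the table [a, b, c, d]
_CELL = {(1, 1): 0, (1, 0): 1, (0, 1): 2, (0, 0): 3}


def tobexp(tob_group: int) -> int:
    """1 for 20+ per day, 0 for 0-19 (under the assumed coding)."""
    return int(tob_group >= TOB_HIGH_FROM)


def alcexp(alc_group: int) -> int:
    """1 for 80+ gm/day, 0 for 0-79 (under the assumed coding)."""
    return int(alc_group >= ALC_HIGH_FROM)


def stratified_tables(records: Iterable[Record], by_alcohol: bool = False) -> Dict[tuple, List[int]]:
    """Collapse grouped records into one 2x2 table [a, b, c, d] per stratum.

    a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed
    controls, with "exposed" meaning tobexp == 1. Strata are age groups, or
    age x alcexp when by_alcohol is True.
    """
    tables: Dict[tuple, List[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for age, alc, tob, cc, freq in records:
        key = (age, alcexp(alc)) if by_alcohol else (age,)
        tables[key][_CELL[(cc, tobexp(tob))]] += freq
    return dict(tables)
```

Stratifying on age alone gives 6 tables; passing `by_alcohol=True` gives up to 12 (age × alcexp) tables for simultaneous adjustment.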
a) Analyze the relationship between cancer (cc) and tobacco (tobexp) by creating a
single 2 × 2 table. Quote and interpret the odds ratio estimate and its 95%
confidence limits. Why is this called the “crude” estimate?
b) Now adjust for age by stratifying on the 6 age categories. Using the
Mantel-Haenszel method, determine and interpret an age-adjusted odds ratio for
tobexp and its 95% confidence limits. State in simple terms how the meaning of
this estimate differs from that calculated in (a).
c) Is the assumption of a common odds ratio, which implicitly underlies the
calculations in (b), a plausible assumption? Present evidence to support your
conclusions.
d) Repeat parts (b) and (c), but this time using simultaneous adjustment for age and
alcexp.
e) Is there evidence that alcohol and tobacco consumption are associated after
adjustment for age? Why is it best to examine this association using the control
population only?
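The estimators behind parts (a) to (c) can be sketched in plain Python. This is an illustrative sketch, not the method prescribed by the course: it assumes the Woolf (logit) interval for the crude odds ratio, the Robins-Breslow-Greenland variance for the Mantel-Haenszel estimate, and the Breslow-Day statistic (without Tarone's correction) as one way to check the common-odds-ratio assumption. Each table is assumed to be laid out as (a, b, c, d) = (exposed cases, unexposed cases, exposed controls, unexposed controls); the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A 2x2 table (a, b, c, d): exposed cases, unexposed cases, exposed controls, unexposed controls.
Table = Sequence[float]
Z975 = 1.959964  # 97.5th percentile of the standard normal


def woolf_or(t: Table) -> Tuple[float, float, float]:
    """Crude odds ratio ad/bc with Woolf (logit) 95% limits; all cells must be > 0."""
    a, b, c, d = t
    or_hat = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_hat, or_hat * math.exp(-Z975 * se), or_hat * math.exp(Z975 * se)


def mantel_haenszel(tables: List[Table]) -> Tuple[float, float, float]:
    """MH common odds ratio with 95% limits from the Robins-Breslow-Greenland variance."""
    sr = ss = spr = sps_qr = sqs = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r, s = a * d / n, b * c / n
        p, q = (a + d) / n, (b + c) / n
        sr, ss = sr + r, ss + s
        spr += p * r
        sps_qr += p * s + q * r
        sqs += q * s
    or_mh = sr / ss
    var_log = spr / (2 * sr**2) + sps_qr / (2 * sr * ss) + sqs / (2 * ss**2)
    se = math.sqrt(var_log)
    return or_mh, or_mh * math.exp(-Z975 * se), or_mh * math.exp(Z975 * se)


def breslow_day(tables: List[Table], psi: float) -> Tuple[float, int]:
    """Breslow-Day homogeneity statistic (no Tarone correction) and its degrees of freedom.

    For each stratum, A solves A*D / (B*C) = psi with the stratum margins held fixed.
    Strata whose margins admit only one table carry no information and are skipped.
    """
    stat, k = 0.0, 0
    for a, b, c, d in tables:
        n1, m1, n = a + b, a + c, a + b + c + d  # cases, exposed, total
        lo, hi = max(0.0, n1 + m1 - n), min(n1, m1)
        if hi - lo < 1e-12:
            continue
        k += 1
        # (1 - psi) A^2 + (n - n1 - m1 + psi (n1 + m1)) A - psi n1 m1 = 0
        qa, qb, qc = 1.0 - psi, (n - n1 - m1) + psi * (n1 + m1), -psi * n1 * m1
        if abs(qa) < 1e-12:
            big_a = -qc / qb  # psi == 1: expected count under independence
        else:
            # numerically stable quadratic roots; keep the one inside [lo, hi]
            q = -0.5 * (qb + math.copysign(math.sqrt(qb * qb - 4 * qa * qc), qb))
            big_a = next(x for x in (q / qa, qc / q) if lo - 1e-9 <= x <= hi + 1e-9)
        cells = (big_a, n1 - big_a, m1 - big_a, n - n1 - m1 + big_a)
        var_a = 1.0 / sum(1.0 / x for x in cells)
        stat += (a - big_a) ** 2 / var_a
    return stat, k - 1
```

A typical call passes `psi=mantel_haenszel(tables)[0]` to `breslow_day`; the statistic is then referred to a chi-square distribution on the returned degrees of freedom (p-value from a table or a statistics package). Zero cells break the Woolf interval, whereas the Mantel-Haenszel estimate only needs the summed terms to be nonzero. The same functions apply to age-specific tables that cross-classify alcexp by tobexp among controls.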
2. Logistic regression can be used to compute odds ratio estimates after adjusting for
other variables. Consider the analyses in question 1 that focused on tobexp as the
exposure variable of interest:
a) What would be the dependent variable in a logistic regression for the Ille-et-Vilaine data?
b) Define (write down the equation for) a logistic regression model that would
characterize the unadjusted (crude) odds ratio that was measured in question 1(a).
c) The output below is from a logistic regression of cc on both tobexp and age, with
5 dummy variables used for the effects of each higher age group relative to the
baseline group (Iage_1). Compute and interpret the estimated odds ratio for tobexp
and its 95% confidence limits. Compare the point and interval estimates to those
obtained in question 1(b). Are they similar? Do they have similar interpretations?
Why or why not?
    . xi: logit cc tobexp i.age [fweight=freq]
    i.age             Iage_1-6            (naturally coded; Iage_1 omitted)

    Logit estimates                                   Number of obs   =        975
                                                      LR chi2(6)      =     139.27
                                                      Prob > chi2     =     0.0000
    Log likelihood = -425.10698                       Pseudo R2       =     0.1408

    ------------------------------------------------------------
              cc |      Coef.   Std. Err.   [95% Conf. Interval]
    -------------+----------------------------------------------
          tobexp |   .8339739   .1929275     .455843    1.212105
          Iage_2 |   1.713168   1.061829   -.3679782    3.794314
          Iage_3 |    3.47941   1.019266    1.481686    5.477135
          Iage_4 |   3.999044   1.015121    2.009444    5.988644
          Iage_5 |   4.217314   1.020075    2.218005    6.216624
          Iage_6 |   3.990266   1.060016    1.912672     6.06786
           _cons |  -5.008187   1.008153   -6.984131   -3.032244
    ------------------------------------------------------------
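The link between logistic regression and the odds ratio can be checked numerically. The sketch below is a minimal plain-Python illustration, assuming frequency-weighted data and a Newton-Raphson fit (the helper names `fit_logit`, `or_ci` and `_solve` are hypothetical, not Stata's). With an intercept and a single binary covariate, exp of the slope reproduces the 2 × 2 cross-product ratio ad/bc; `or_ci` turns a coefficient and its standard error into an odds ratio with Wald limits.

```python
import math
from typing import List, Sequence, Tuple


def _solve(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def fit_logit(rows: Sequence[Tuple[Sequence[float], int, float]], iters: int = 25) -> Tuple[List[float], List[float]]:
    """Frequency-weighted logistic regression by Newton-Raphson.

    Each row is (x, y, freq), with x already containing a leading 1 for the intercept.
    Returns the coefficients and standard errors from the inverse information matrix
    evaluated at the last iteration.
    """
    p = len(rows[0][0])
    beta = [0.0] * p
    for _ in range(iters):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for x, y, w in rows:
            mu = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for j in range(p):
                score[j] += w * (y - mu) * x[j]
                for k in range(p):
                    info[j][k] += w * mu * (1 - mu) * x[j] * x[k]
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    ses = [math.sqrt(_solve(info, [float(j == k) for k in range(p)])[j]) for j in range(p)]
    return beta, ses


def or_ci(coef: float, se: float, z: float = 1.959964) -> Tuple[float, float, float]:
    """Odds ratio exp(coef) with Wald limits exp(coef -/+ z * se)."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)
```

For example, `or_ci(0.8339739, 0.1929275)` exponentiates the tobexp coefficient from the output above; its limits agree, up to rounding, with the exponentiated printed interval.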
3. Suppose you are interested in describing whether social status, as measured by a (0,1)
variable called SOC, is associated with cardiovascular disease mortality (within 10
years), as defined by a (0,1) variable called CVD. Suppose further that you have
carried out a 12-year follow-up study of 200 men who are 60 years old or older. In
assessing the relationship between SOC and CVD, you decide that you want to
control for smoking status [SMK, a (0,1) variable] and systolic blood pressure (SBP,
a continuous variable).
In analyzing your data, you decide to fit two logistic models, each involving the
dependent variable CVD, but with different sets of independent variables. The
variables involved in each model and their estimated coefficients are listed below:
Model 1
    Variable      Coefficient
    Intercept        -1.180
    SOC              -0.520
    SBP               0.040
    SMK              -0.560
    SOC × SBP        -0.033
    SOC × SMK         0.175

Model 2
    Variable      Coefficient
    Intercept        -1.190
    SOC              -0.500
    SBP               0.010
    SMK              -0.420
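The fitted models can be evaluated directly from these tables. A minimal Python sketch, using the coefficients listed above; the dictionary keys and function names are illustrative, not part of the assignment. The estimated risk is the inverse logit of the estimated log odds.

```python
import math
from typing import Dict

# Coefficients copied from the Model 1 and Model 2 tables above.
MODEL_1: Dict[str, float] = {
    "intercept": -1.180, "SOC": -0.520, "SBP": 0.040, "SMK": -0.560,
    "SOCxSBP": -0.033, "SOCxSMK": 0.175,
}
MODEL_2: Dict[str, float] = {"intercept": -1.190, "SOC": -0.500, "SBP": 0.010, "SMK": -0.420}


def logit_hat(model: Dict[str, float], soc: int, sbp: float, smk: int) -> float:
    """Estimated log odds; terms a model does not contain simply do not appear in the sum."""
    terms = {
        "intercept": 1.0, "SOC": soc, "SBP": sbp, "SMK": smk,
        "SOCxSBP": soc * sbp, "SOCxSMK": soc * smk,
    }
    return sum(coef * terms[name] for name, coef in model.items())


def risk_hat(model: Dict[str, float], soc: int, sbp: float, smk: int) -> float:
    """Estimated probability pi-hat(X) = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit_hat(model, soc, sbp, smk)))
```

For instance, `risk_hat(MODEL_1, soc=1, sbp=150, smk=1) / risk_hat(MODEL_1, soc=0, sbp=150, smk=1)` is an estimated relative risk of the kind asked for below; swapping in `MODEL_2` repeats the calculation under the model without interactions.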
a) For each of the models fitted above, state the form of the logistic model that was
used, including the dependent variable, the interpretation of the probability $\pi(X)$,
and the model for $\pi(X)$ in terms of the (unknown) population parameters and the
independent variables.
b) For each of the models in (a), state the form of the estimated log odds function,
$\operatorname{logit}\,\hat{\pi}(X)$.
c) Using Model 1, compute the estimated risk for CVD death (i.e., CVD=1) for a
high social class (SOC=1) smoker (SMK=1) with SBP=150 (person 1), and a low
social class (SOC=0) smoker (SMK=1) with SBP=150 (person 2). What is the
estimated relative risk comparing these individuals?
d) Repeat part (c) using Model 2. Why is the estimate different?
e) What is the estimated odds ratio comparing SOC=1 to SOC=0 for non-smokers
(SMK=0) with SBP=150, under Model 1 and under Model 2? (Note: use the
coefficients directly rather than calculating $\hat{\pi}(X)$.)
f) If the study design had been a case-control (retrospective) study, which risk
estimate would you report (RR or OR)? Justify your answer.
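For part (e), it may help to write Model 1's log odds with generic parameters. The labels $\beta_0, \dots, \beta_5$ (in the order of the Model 1 table) and $\beta_1^{*}$ (Model 2's SOC coefficient) are notation introduced here, not the assignment's. Subtracting the log odds at SOC=0 from those at SOC=1, with SBP and SMK held fixed, shows that under Model 1 the odds ratio depends on SBP and SMK through the interaction terms, whereas under Model 2 it does not:

```latex
\begin{aligned}
\operatorname{logit}\pi(X) &= \beta_0 + \beta_1\,\mathrm{SOC} + \beta_2\,\mathrm{SBP} + \beta_3\,\mathrm{SMK}
  + \beta_4\,(\mathrm{SOC}\times\mathrm{SBP}) + \beta_5\,(\mathrm{SOC}\times\mathrm{SMK}) \\
\mathrm{OR}_{\text{Model 1}}(\mathrm{SBP},\mathrm{SMK}) &= \exp\!\left(\beta_1 + \beta_4\,\mathrm{SBP} + \beta_5\,\mathrm{SMK}\right) \\
\mathrm{OR}_{\text{Model 2}} &= \exp\!\left(\beta_1^{*}\right)
\end{aligned}
```

Replacing each parameter by its estimate from the tables gives the estimated odds ratios.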