Biostat 513

Homework 3 Key
Note to students:
The STATA output has been edited to eliminate the presentation of information
unrelated to the question of interest. If you must include STATA output,
please edit it accordingly. In this key, the STATA commands are included with
our tables for your convenience. You may include STATA output and commands in
an appendix if you think they will be helpful to the graders.
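* Read the Ille-et-Vilaine (Tuyns) data: one row per age/alc/tob cell,
* with case (freq1) and control (freq0) counts
infile age alc tob freq1 freq0 using a:\tuyns_dat.txt
* One row per cell and case status; cc = 1 for cases, 0 for controls
reshape long freq, i(age alc tob) j(cc)
* Dichotomize tobacco and alcohol: categories 1-2 -> 0, categories 3-4 -> 1
gen tobexp=tob
recode tobexp (1/2=0) (3/4=1)
gen alcexp=alc
recode alcexp (1/2=0) (3/4=1)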
1. (a) Analyze the relationship between cancer (cc) and tobacco (tobexp) by
creating a 2X2 table. Quote and interpret the odds ratio estimate and
a 95% confidence limit for the odds ratio. Why is this called the “crude”
estimate?
. cs cc tobexp [freq=freq], or

                 |        tobexp          |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+-----------
           Cases |        64         136  |        200
        Noncases |       150         625  |        775
-----------------+------------------------+-----------
           Total |       214         761  |        975
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         1.960784       |   1.387991    2.770272  (Cornfield)
                 +-------------------------------------------------
                               chi2(1) =    14.84  Pr>chi2 = 0.0001
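The crude odds ratio is 1.96 (95% CI: 1.39 to 2.77). The odds of cancer are
about twice as high for people who use 20 or more grams of tobacco per day
(tobexp=1) as for those who use less than 20 grams per day (tobexp=0), and the
interval excludes 1 (chi2(1) = 14.84, p = 0.0001).
This odds ratio is called the crude estimate because it does not adjust for
any other covariates, such as age or alcohol consumption.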
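As a cross-check, the crude OR can be recomputed directly from the 2x2 counts. This minimal Python sketch uses the Woolf (log-OR) interval, which is our choice and not what STATA printed: the output above reports the Cornfield interval, so the limits agree only approximately.

```python
import math

# Cell counts from the 2x2 table in 1(a)
a, b = 64, 136   # cases:    tobexp = 1, tobexp = 0
c, d = 150, 625  # noncases: tobexp = 1, tobexp = 0

odds_ratio = (a * d) / (b * c)

# Woolf standard error of log(OR) and a 95% Wald interval on the log scale
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = 1.959964  # 97.5th percentile of the standard normal distribution
lower = math.exp(math.log(odds_ratio) - z * se_log_or)
upper = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"crude OR = {odds_ratio:.6f}, Woolf 95% CI = ({lower:.3f}, {upper:.3f})")
```

The Woolf limits come out at roughly 1.39 to 2.77, very close to the Cornfield limits in the STATA output.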
1. (b) Now adjust for age by stratification into the 6 age categories.
Determine and interpret an adjusted odds ratio for tobexp and a 95%
confidence limit using the Mantel-Haenszel method. State in simple terms how
the meaning of this estimate differs from that calculated in (a).
. cs cc tobexp [freq=freq], by(age) or

    age in years |        OR       [95% Conf. Interval]
-----------------+-------------------------------------
           25-34 |         0            0           .
           35-44 |  1.817073     .4776855    6.966522
           45-54 |  2.857464     1.429078    5.721081
           55-64 |  2.442708      1.33919    4.457753
           65-74 |  2.186047     .9238396    5.176556
             75+ |  .9454545            0    5.039992
-----------------+-------------------------------------
           Crude |  1.960784     1.387991    2.770272
    M-H combined |  2.302855     1.578173    3.360306
-----------------+-------------------------------------
Test of homogeneity (M-H)      chi2(5) =  1.477  Pr>chi2 = 0.9157

                  Test that combined OR = 1:
                                Mantel-Haenszel chi2(1) =     19.16
                                                Pr>chi2 =    0.0000
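The Mantel-Haenszel summary odds ratio for cancer, adjusted for age, is 2.30
(95% CI: 1.58 to 3.36). Within any given age group, the odds of cancer are
estimated to be 2.3 times as high for people who use 20 or more grams of
tobacco per day as for those who use less than 20 grams per day. Unlike the
crude OR in (a), which compares heavy and light users regardless of age, this
estimate compares people of the same age, so it is not confounded by age
(assuming the OR is the same in every age group).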
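The M-H summary OR pools the stratum tables as OR_MH = sum(a_i*d_i/n_i) / sum(b_i*c_i/n_i), with a_i, b_i, c_i, d_i laid out as in the 1(a) table for stratum i. The age-specific cell counts are not printed in this key, so this Python sketch only checks the function on cases where the answer is known; reproducing 2.30 would require passing the six age-specific tables. The function and type names are ours.

```python
from typing import Sequence, Tuple

# One stratum = (a, b, c, d):
#   a = exposed cases,    b = unexposed cases,
#   c = exposed noncases, d = unexposed noncases
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_or(strata: Sequence[Stratum]) -> float:
    """Mantel-Haenszel summary odds ratio across 2x2 strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


crude_table = (64, 136, 150, 625)

# With a single stratum, the M-H estimate reduces to the ordinary (crude) OR.
single = mantel_haenszel_or([crude_table])
# Repeating the same stratum twice leaves the summary estimate unchanged.
doubled = mantel_haenszel_or([crude_table, crude_table])
```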
1. (c) Is the assumption of a common odds ratio, which implicitly underlies
the calculations in (b), a plausible assumption? Present evidence to support
your conclusions.
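The M-H test of homogeneity from part (b) gives chi2(5) = 1.477, p = 0.9157,
which provides no evidence that the age-specific odds ratios differ, so the
assumption of a common odds ratio is plausible. The age-specific estimates
(1.82, 2.86, 2.44, 2.19, 0.95) are also broadly similar, with wide, overlapping
confidence intervals.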
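The p-values in the STATA output come from the chi-square distribution. To avoid depending on a statistics library, this minimal Python sketch uses the closed-form chi-square tail probabilities for integer degrees of freedom (our choice of method; the function name is ours) and reproduces the homogeneity p-values reported in this key.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        half = x / 2.0
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: 2*(1 - Phi(r)) + 2*phi(r) * sum_{k=1}^{(df-1)/2} r^(2k-1) / (1*3*...*(2k-1)),
    # where r = sqrt(x)
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))                  # 2*(1 - Phi(root))
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # phi(root)
    term = root
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return tail + 2.0 * density * total


# Homogeneity tests reported in this key: (statistic, df)
p_1b = chi2_sf(1.477, 5)  # 1(b)/(c): age strata
p_1d = chi2_sf(3.738, 9)  # 1(d): age x alcohol strata
p_1e = chi2_sf(5.47, 4)   # 1(e): controls, age strata
```

Because STATA prints rounded statistics, agreement with the printed p-values is to about three or four decimals.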
1. (d) Repeat parts (b) and (c), but this time using simultaneous adjustment
for age and alcexp.
. egen age_alc=group(age alcexp)
. cs cc tobexp [freq=freq], by(age_alc) or

age in years/alcohol consumption |        OR       [95% Conf. Interval]
---------------------------------+-------------------------------------
            25-34 / 0-79 gms/day |         .            .           .
             25-34 / 80+ gms/day |         0            0           .
            35-44 / 0-79 gms/day |  .8888889            0    6.180248
             35-44 / 80+ gms/day |       4.2     .5867284    30.99466
            45-54 / 0-79 gms/day |  3.916084     1.535386    10.01481
             45-54 / 80+ gms/day |  1.767857     .5572697    5.599932
            55-64 / 0-79 gms/day |  2.903704     1.316374    6.419849
             55-64 / 80+ gms/day |       2.2     .7067541    6.768626
            65-74 / 0-79 gms/day |  1.689655     .6156516    4.661245
             65-74 / 80+ gms/day |  6.071429     .8038395           .
              75+ / 0-79 gms/day |  1.733333     .3163389    10.09754
               75+ / 80+ gms/day |         .            .           .
---------------------------------+-------------------------------------
                           Crude |  1.960784     1.387991    2.770272
                    M-H combined |  2.382241     1.591432    3.566017
---------------------------------+-------------------------------------
Test of homogeneity (M-H)      chi2(9) =  3.738  Pr>chi2 = 0.9278

                  Test that combined OR = 1:
                                Mantel-Haenszel chi2(1) =     18.47
                                                Pr>chi2 =    0.0000
part (b): The M-H summary odds ratio for cancer, holding age and alcohol
consumption constant, is 2.38 (95% CI: 1.59 to 3.57). Within strata of age and
alcohol consumption, the odds of cancer are estimated to be 2.38 times as high
for people who use 20 or more grams of tobacco per day as for those who use
less than 20 grams per day.
part (c): The M-H test of homogeneity gives chi2(9) = 3.738, p = 0.9278, which
provides no evidence that the stratum-specific odds ratios differ, so the
assumption of a common odds ratio is plausible.
1. (e) Is there evidence that alcohol and tobacco consumption are associated?
After adjustment for age? Why is it best to examine this association using
the control population only?
To check if alcohol and tobacco consumption are associated:
. cc alcexp tobexp if cc==0 [freq=freq]

                 |        tobexp          |             Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
         Exposed |        23          86  |        109       0.2110
       Unexposed |       127         539  |        666       0.1907
-----------------+------------------------+------------------------
           Total |       150         625  |        775       0.1935
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         1.135049       |   .6918794    1.863025  (Cornfield)
                 +-------------------------------------------------
                               chi2(1) =     0.25  Pr>chi2 = 0.6187
Because this is a 2x2 table, the chi-square statistic is a test of
association, with H0: no association between alcohol and tobacco consumption,
and H1: an association exists. The statistic is not significant (chi2(1) =
0.25, p = 0.62; OR = 1.14, 95% CI: 0.69 to 1.86), so there is no evidence of an
association between alcohol and tobacco consumption among the controls.
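The chi-square statistic for a 2x2 table has the closed form n(ad - bc)^2 / (row and column totals multiplied together). This Python sketch assumes STATA's default uncorrected Pearson statistic (no continuity correction) and uses the fact that, for 1 degree of freedom, P(X > x) = erfc(sqrt(x/2)); the function names are ours.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Controls only (cc == 0): rows alcexp exposed / unexposed,
# columns tobexp exposed / unexposed
stat = pearson_chi2_2x2(23, 86, 127, 539)
p_value = chi2_sf_df1(stat)
```

The same function applied to the 1(a) table (64, 136, 150, 625) gives about 14.84, matching the crude test there.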
To check whether alcohol and tobacco consumption are associated after
adjusting for age:
. cc alcexp tobexp if cc==0 [freq=freq], by(age)

    age in years |        OR       [95% Conf. Interval]
-----------------+-------------------------------------
           25-34 |  4.772727     1.269772    17.91537
           35-44 |  .8465608     .3096488    2.330879
           45-54 |  1.370629     .5427856    3.482617
           55-64 |  .9427609      .339913    2.636674
           65-74 |  .4117647            0    2.698141
             75+ |         .            .           .
-----------------+-------------------------------------
           Crude |  1.135049     .6918794    1.863025
    M-H combined |  1.170469     .7067113    1.938553
-----------------+-------------------------------------
Test of homogeneity (M-H)      chi2(4) =   5.47  Pr>chi2 = 0.2424

                  Test that combined OR = 1:
                                Mantel-Haenszel chi2(1) =      0.37
                                                Pr>chi2 =    0.5410
To test for an association between alcohol and tobacco consumption after
adjusting for age, first use the M-H test of homogeneity. The test statistic
is not significant (chi2(4) = 5.47, p = 0.24), so there is no evidence that
the ORs differ across age groups. The M-H test that the combined OR equals 1
is also not significant (chi2(1) = 0.37, p = 0.54; M-H OR = 1.17, 95% CI: 0.71
to 1.94), so there is no evidence that the common OR differs from 1.
In conclusion, there is no evidence to suggest an association between
alcohol and tobacco consumption in the control population, with and without
adjusting for age.
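It is best to examine this association in the controls because, in a
case-control study, the controls are meant to represent the source population
to which the results of the study apply. Cases are selected on disease status,
and because both exposures are related to cancer, an association between them
among the cases (or in the pooled sample, where cases are over-represented)
can be created or distorted by the sampling rather than reflect an association
in the population.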
2. (a) What would be the dependent variable in a logistic regression for the
Ille-et-Vilaine data?
Cancer (cc) is the dependent variable.
2. (b) Define (write down the equation for) a logistic regression model that
would characterize the unadjusted (crude) odds ratio that was measured in
question 1(a).
pi(X) = expit(b0 + b1*tobexp) = [exp(b0 + b1*tobexp)]/[1 + exp(b0 + b1*tobexp)]
or
logit[pi(X)] = b0 + b1*tobexp
Here pi(X) is the probability of being a case (cc=1) given tobexp, and exp(b1)
is the crude odds ratio from 1(a).
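With a single binary covariate the model is saturated, so its maximum-likelihood fit reproduces the observed case proportions in each tobexp group. That lets b0 and b1 be written directly from the 1(a) counts, and exp(b1) is the crude OR. This Python sketch computes those closed-form estimates rather than calling a model-fitting routine; the variable names are ours.

```python
import math


def expit(z: float) -> float:
    """Inverse logit: converts log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# 2x2 counts from 1(a)
cases_exp, noncases_exp = 64, 150      # tobexp = 1
cases_unexp, noncases_unexp = 136, 625  # tobexp = 0

# Saturated-model estimates: b0 is the log odds of being a case when tobexp = 0,
# b1 is the difference in log odds (the log odds ratio) for tobexp = 1
b0 = math.log(cases_unexp / noncases_unexp)
b1 = math.log(cases_exp / noncases_exp) - b0

print(round(math.exp(b1), 6))
```

The fitted probabilities expit(b0) and expit(b0 + b1) equal the observed case fractions 136/761 and 64/214.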
2. (c) Compute and interpret the estimated odds ratio for tobexp, with
adjustment for age, and its 95% confidence limit. Compare the point and
interval estimates to those obtained in question 1(b). Are they similar? Do
they have similar interpretations? Why or why not?
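OR = exp(0.83397) = 2.30
95% CI: [exp(0.455), exp(1.212)] = [1.58, 3.36]
The odds ratio and 95% CI are essentially the same as the M-H point and
interval estimates obtained in 1(b) (2.30; 1.58 to 3.36), and the
interpretation is identical: both estimate the age-specific OR for tobexp,
assumed constant across age groups. They agree because both adjust for age by
comparing heavy and light tobacco users of the same age and both assume a
common OR; small differences can arise because logistic regression uses
maximum likelihood, whereas the M-H estimator is a weighted combination of the
stratum tables.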
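The back-transformation above is just exponentiation of the coefficient and its limits on the log-odds scale. This Python sketch takes the coefficient and limits quoted in 2(c) as given (the model output itself is not shown in this key) and compares the result with the M-H estimate from 1(b).

```python
import math

# Age-adjusted coefficient for tobexp and its 95% limits on the log-odds scale (from 2(c))
beta_tobexp = 0.83397
log_lower, log_upper = 0.455, 1.212

adjusted_or = math.exp(beta_tobexp)
ci = (math.exp(log_lower), math.exp(log_upper))

mh_or = 2.302855  # M-H combined estimate from 1(b), for comparison
print(f"OR = {adjusted_or:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), M-H OR = {mh_or:.3f}")
```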
3. (a) For each of the models fitted above, state the form of the logistic
model that was used – stating the dependent variable, the interpretation of the
probability pi(X), and the model for pi(X) in terms of the (unknown) population
parameters and the independent variables.
The dependent variable is CVD mortality (1=death from CVD, 0 otherwise).
pi(X) is the probability of dying from Cardiovascular Disease within ten
years for a group of subjects with covariate values X.
The hypothesized models are:
*Model 1:
pi(X) = exp(b0 + b1*SOC + b2*SBP + b3*SMK + b4*SOC*SBP + b5*SOC*SMK) /
        (1 + exp(b0 + b1*SOC + b2*SBP + b3*SMK + b4*SOC*SBP + b5*SOC*SMK))
*Model 2:
pi(X) = ( exp( b0 + b1*SOC + b2*SBP + b3*SMK) ) /
(1 + ( exp( b0 + b1*SOC + b2*SBP + b3*SMK) ))
The fitted models are:
*Model 1:
pi(X) = exp(-1.180 - .520*SOC + .040*SBP - .560*SMK - .033*SOC*SBP + .175*SOC*SMK) /
        (1 + exp(-1.180 - .520*SOC + .040*SBP - .560*SMK - .033*SOC*SBP + .175*SOC*SMK))
*Model 2:
pi(X) = ( exp(-1.19 - .500*SOC + .010*SBP - .420*SMK) ) /
(1 + ( exp(-1.19 - .500*SOC + .010*SBP - .420*SMK) ))
3. (b) For each of the models in (a) state the form of the estimated log odds
functions: logit[pi(X)]= …
Model 1:
logit[pi(X)] = -1.180 - .520*SOC + .040*SBP - .560*SMK - .033*SOC*SBP
+ .175*SOC*SMK
Model 2:
logit[pi(X)] = -1.19 - .500*SOC + .010*SBP - .420*SMK
3. (c) Using model 1, compute the estimated risk for CVD death (i.e. CVD=1)
for a high social class (SOC=1) smoker (SMK=1) with SBP=150 (person 1), and a
low social class (SOC=0) smoker (SMK=1) with SBP=150 (person 2). What is the
estimated relative risk comparing these individuals?
High Social Class:
pi(X) = exp(-1.180 - .520*1 + .040*150 - .560*1 - .033*1*150 + .175*1*1) /
        (1 + exp(-1.180 - .520*1 + .040*150 - .560*1 - .033*1*150 + .175*1*1))
= exp(-1.035) / (1 + exp(-1.035))
= 0.262
Low Social Class:
pi(X) = exp(-1.180 + .040*150 -.56*1) / (1 + exp(-1.180 + .040*150 -.56*1))
= exp(4.26) / (1 + exp(4.26))
= .986
RR = .262 / .986 = .266
Smokers with an SBP of 150 in a high social class have about .27 times the
risk of CVD mortality of smokers with an SBP of 150 in a low social class.
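The hand calculation can be reproduced by coding the fitted model 1 log odds from 3(b) and applying the inverse logit. In this Python sketch the helper names (expit, logit_model1) are ours; the coefficients are the fitted model 1 values quoted above.

```python
import math


def expit(z: float) -> float:
    """Inverse logit: converts log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit_model1(soc: int, sbp: float, smk: int) -> float:
    """Fitted log odds of CVD death under model 1 (coefficients from 3(b))."""
    return (-1.180 - 0.520 * soc + 0.040 * sbp - 0.560 * smk
            - 0.033 * soc * sbp + 0.175 * soc * smk)


risk_person1 = expit(logit_model1(soc=1, sbp=150, smk=1))  # high social class smoker
risk_person2 = expit(logit_model1(soc=0, sbp=150, smk=1))  # low social class smoker
relative_risk = risk_person1 / risk_person2
```

Carrying the unrounded risks through gives an RR of about .266, in line with the hand calculation.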
3. (d) Repeat part (c) using model 2. Why is the estimate so different?
High Social Class:
pi(X) = exp(-1.19 - .5*1 + .01*150 - .42) / (1 + exp(-1.19 - .5*1 +
.01*150 - .42) )
= exp(-.61) / ( 1 + exp(-.61))
= .352
Low Social Class:
pi(X) = exp(-1.19 + .01*150 - .42) / (1 + exp(-1.19 + .01*150 - .42))
= exp(-.11) / (1 + exp(-.11))
= .4725
RR = .352 / .4725 = .745
Smokers with an SBP of 150 in a high social class have .745 times the
risk of CVD mortality of smokers with an SBP of 150 in a low social class.
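The estimates differ because model 1 includes the SOC*SBP and SOC*SMK
interaction terms, so the effect of social class depends on SBP and smoking.
At SBP=150 the SOC*SBP term contributes -.033*150 = -4.95 to the log odds of
the high social class, far outweighing the other SOC terms, whereas model 2
assumes a single SOC effect (-.500) at every SBP. The SBP coefficient also
differs (.040 in model 1 vs .010 in model 2), which makes the estimated risk
for the low social class much higher under model 1 (.986 vs .4725).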
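The same calculation for model 2, which has no interaction terms, can be sketched in Python as follows; the helper names are ours and the coefficients are the fitted model 2 values from 3(b).

```python
import math


def expit(z: float) -> float:
    """Inverse logit: converts log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit_model2(soc: int, sbp: float, smk: int) -> float:
    """Fitted log odds of CVD death under model 2 (no interaction terms)."""
    return -1.19 - 0.500 * soc + 0.010 * sbp - 0.420 * smk


risk_high = expit(logit_model2(soc=1, sbp=150, smk=1))  # high social class smoker
risk_low = expit(logit_model2(soc=0, sbp=150, smk=1))   # low social class smoker
relative_risk = risk_high / risk_low
```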
3. (e) What is the estimated odds ratio comparing SOC=1 to SOC=0 for nonsmokers (SMK=0) with SBP=150 under model 1 and under model 2?
Model 1:
OR = exp(-.52 - .033*150) = exp(-5.47) = .0042
Model 2:
OR = exp(-.5) = .607
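Each OR is the exponentiated difference between the fitted log odds of the two covariate patterns, so the SMK and SBP main-effect terms cancel. This Python sketch (helper names ours, coefficients from 3(b)) makes that explicit for nonsmokers with SBP=150.

```python
import math


def logit_model1(soc: int, sbp: float, smk: int) -> float:
    """Fitted log odds under model 1 (with SOC*SBP and SOC*SMK interactions)."""
    return (-1.180 - 0.520 * soc + 0.040 * sbp - 0.560 * smk
            - 0.033 * soc * sbp + 0.175 * soc * smk)


def logit_model2(soc: int, sbp: float, smk: int) -> float:
    """Fitted log odds under model 2 (main effects only)."""
    return -1.19 - 0.500 * soc + 0.010 * sbp - 0.420 * smk


# OR for SOC = 1 vs SOC = 0 among nonsmokers (SMK = 0) with SBP = 150
or_model1 = math.exp(logit_model1(1, 150, 0) - logit_model1(0, 150, 0))  # exp(-.52 - .033*150)
or_model2 = math.exp(logit_model2(1, 150, 0) - logit_model2(0, 150, 0))  # exp(-.5)
```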
3. (f) If the study design had been a case-control study (retrospective) which
risk estimate would you report (RR or OR)? Justify.
We would report the OR. From a case-control study we cannot estimate the
disease relative risk (comparing exposed to unexposed), because the proportion
of cases is fixed by the design rather than reflecting disease risk. We can,
however, estimate the exposure odds ratio, which equals the disease odds
ratio. For rare diseases, the OR approximates the RR.
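The rare-disease approximation can be illustrated numerically. This Python sketch compares the OR and RR implied by the model 2 risks in 3(d), where CVD death is not rare, with a pair of hypothetical rare-outcome risks (0.002 vs 0.001) that are made up for illustration and do not come from the study.

```python
import math


def expit(z: float) -> float:
    """Inverse logit: converts log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p: float) -> float:
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Model 2 risks from 3(d): high vs low social class smokers with SBP = 150
p_high = expit(-1.19 - 0.500 + 0.010 * 150 - 0.420)
p_low = expit(-1.19 + 0.010 * 150 - 0.420)
rr_common = p_high / p_low
or_common = odds(p_high) / odds(p_low)  # equals exp(-.5) under model 2

# Hypothetical rare-outcome risks (illustration only, not study data)
p1_rare, p0_rare = 0.002, 0.001
rr_rare = p1_rare / p0_rare
or_rare = odds(p1_rare) / odds(p0_rare)
```

With common outcomes the OR (.607) and RR (.745) differ noticeably, while for the rare-outcome pair both are close to 2.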