Biostat 513 Homework 3 Key

Note to students: The STATA output has been edited to remove information unrelated to the question of interest. If you must include STATA output, please edit it accordingly. In this key, the STATA commands are included with our tables for your convenience. You may include STATA output and commands in an appendix if you think they will be helpful to the graders.

. infile age alc tob freq1 freq0 using a:\tuyns_dat.txt
. reshape long freq, i(age alc tob) j(cc)
. gen tobexp=tob
. recode tobexp 1/2=0 3/4=1
. gen alcexp=alc
. recode alcexp 1/2=0 3/4=1

1. (a) Analyze the relationship between cancer (cc) and tobacco (tobexp) by creating a 2X2 table. Quote and interpret the odds ratio estimate and a 95% confidence limit for the odds ratio. Why is this called the “crude” estimate?

. cs cc tobexp [freq=freq], or

                 | tobexp                 |
                 |   Exposed   Unexposed  |     Total
-----------------+------------------------+----------
           Cases |        64         136  |       200
        Noncases |       150         625  |       775
-----------------+------------------------+----------
           Total |       214         761  |       975

                 |     Point estimate     |  [95% Conf. Interval]
                 |------------------------+----------------------
      Odds ratio |        1.960784        |  1.387991   2.770272  (Cornfield)
                 +-----------------------------------------------
                          chi2(1) =    14.84   Pr>chi2 = 0.0001

The estimated odds ratio is 1.96 (95% CI: [1.39, 2.77]). The odds of cancer are approximately 2 times as high for people who smoke 20+ cigarettes per day as for those who smoke fewer than 20 cigarettes per day. This odds ratio is the crude estimate because we did not adjust for any other covariates.

1. (b) Now adjust for age by stratification into the 6 age categories. Determine and interpret an adjusted odds ratio for tobexp and a 95% confidence limit using the Mantel-Haenszel method. State in simple terms how the meaning of this estimate differs from that calculated in (a).

. cs cc tobexp [freq=freq], by(age) or

    age in years |         OR     [95% Conf. Interval]
-----------------+------------------------------------
           25-34 |          0            0          .
           35-44 |   1.817073     .4776855   6.966522
           45-54 |   2.857464     1.429078   5.721081
           55-64 |   2.442708      1.33919   4.457753
           65-74 |   2.186047     .9238396   5.176556
             75+ |   .9454545            0   5.039992
-----------------+------------------------------------
           Crude |   1.960784     1.387991   2.770272
    M-H combined |   2.302855     1.578173   3.360306
-----------------+------------------------------------
Test of homogeneity (M-H)    chi2(5) =   1.477   Pr>chi2 = 0.9157
Test that combined OR = 1:
              Mantel-Haenszel chi2(1) =   19.16   Pr>chi2 = 0.0000

The summary odds ratio for getting cancer, holding age constant, is 2.30 (95% CI: [1.58, 3.36]). This implies that within each age group, the odds of getting cancer are 2.3 times as high for people who smoke 20+ cigarettes per day as for those who smoke fewer than 20 cigarettes per day. Unlike the crude estimate in (a), this estimate compares people of the same age.

1. (c) Is the assumption of a common odds ratio, which implicitly underlies the calculations in (b), a plausible assumption? Present evidence to support your conclusions.

By looking at the M-H test of homogeneity (from part (b)) we can conclude that the assumption of a common odds ratio is reasonable. The p-value is 0.9157, which provides no evidence that the odds ratios differ across age groups.

1. (d) Repeat parts (b) and (c), but this time using simultaneous adjustment for age and alcexp.

. egen age_alc=group(age alcexp)
. cs cc tobexp [freq=freq], by(age_alc) or

age in years / alcohol consumption |         OR     [95% Conf. Interval]
-----------------------------------+------------------------------------
            25-34 / 0-79 gms/day   |          .            .          .
            25-34 / 80+ gms/day    |          0            0          .
            35-44 / 0-79 gms/day   |   .8888889            0   6.180248
            35-44 / 80+ gms/day    |        4.2     .5867284   30.99466
            45-54 / 0-79 gms/day   |   3.916084     1.535386   10.01481
            45-54 / 80+ gms/day    |   1.767857     .5572697   5.599932
            55-64 / 0-79 gms/day   |   2.903704     1.316374   6.419849
            55-64 / 80+ gms/day    |        2.2     .7067541   6.768626
            65-74 / 0-79 gms/day   |   1.689655     .6156516   4.661245
            65-74 / 80+ gms/day    |   6.071429     .8038395          .
            75+ / 0-79 gms/day     |   1.733333     .3163389   10.09754
            75+ / 80+ gms/day      |          .            .          .
-----------------------------------+------------------------------------
                             Crude |   1.960784     1.387991   2.770272
                      M-H combined |   2.382241     1.591432   3.566017
-----------------------------------+------------------------------------
Test of homogeneity (M-H)    chi2(9) =   3.738   Pr>chi2 = 0.9278
Test that combined OR = 1:
              Mantel-Haenszel chi2(1) =   18.47   Pr>chi2 = 0.0000

part (b): The summary odds ratio for getting cancer, holding age and alcohol consumption constant, is 2.38 (95% CI: [1.59, 3.57]). Within each age and alcohol stratum, the odds of getting cancer are 2.38 times as high for people who smoke 20+ cigarettes per day as for those who smoke fewer than 20 cigarettes per day.

part (c): By looking at the M-H test of homogeneity, we can conclude that the assumption of a common odds ratio is reasonable. The p-value is 0.9278, which provides no evidence that the odds ratios are different.

1. (e) Is there evidence that alcohol and tobacco consumption are associated? After adjustment for age? Why is it best to examine this association using the control population only?

To check if alcohol and tobacco consumption are associated:

. cc alcexp tobexp if cc==0 [freq=freq]

                 | tobexp                 |             Proportion
                 |   Exposed   Unexposed  |     Total     Exposed
-----------------+------------------------+----------------------
         Exposed |        23          86  |       109      0.2110
       Unexposed |       127         539  |       666      0.1907
-----------------+------------------------+----------------------
           Total |       150         625  |       775      0.1935

                 |     Point estimate     |  [95% Conf. Interval]
                 |------------------------+----------------------
      Odds ratio |        1.135049        |  .6918794   1.863025  (Cornfield)
                 +------------------------+----------------------
                          chi2(1) =     0.25   Pr>chi2 = 0.6187

Since this is a 2x2 table, the chi-square statistic can be interpreted as the result of a chi-square test of association, with H0: no association exists and H1: there is an association between alcohol and tobacco consumption. The statistic is not significant (p=0.62), so the conclusion is that there is no evidence for an association between alcohol and tobacco consumption in the controls.
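The crude odds ratios and chi-square statistics quoted for the 2x2 tables in 1(a) and 1(e) can be checked by hand. The sketch below is in Python rather than STATA and is only an arithmetic check, not the routine STATA uses; the function name `odds_ratio_and_chi2` is hypothetical, and it assumes the reported chi2(1) is the uncorrected Pearson statistic, which the numbers are consistent with.

```python
import math

def odds_ratio_and_chi2(a, b, c, d):
    """2x2 table laid out as [[a, b], [c, d]].

    Returns the cross-product odds ratio, the uncorrected Pearson
    chi-square statistic (1 df), and its p-value.
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value

# 1(a): cancer by tobacco, all subjects (cell counts from the table in 1(a)).
or_a, chi2_a, p_a = odds_ratio_and_chi2(64, 136, 150, 625)

# 1(e): alcohol by tobacco, controls only (cell counts from the table in 1(e)).
or_e, chi2_e, p_e = odds_ratio_and_chi2(23, 86, 127, 539)

print(f"1(a): OR = {or_a:.4f}, chi2 = {chi2_a:.2f}, p = {p_a:.4f}")
print(f"1(e): OR = {or_e:.4f}, chi2 = {chi2_e:.2f}, p = {p_e:.4f}")
```

The confidence limits are not reproduced here because the Cornfield limits reported by STATA require an iterative calculation.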
To check if alcohol and tobacco consumption are associated after adjusting for age:

. cc alcexp tobexp if cc==0 [freq=freq], by(age)

    age in years |         OR     [95% Conf. Interval]
-----------------+------------------------------------
           25-34 |   4.772727     1.269772   17.91537
           35-44 |   .8465608     .3096488   2.330879
           45-54 |   1.370629     .5427856   3.482617
           55-64 |   .9427609      .339913   2.636674
           65-74 |   .4117647            0   2.698141
             75+ |          .            .          .
-----------------+------------------------------------
           Crude |   1.135049     .6918794   1.863025
    M-H combined |   1.170469     .7067113   1.938553
-----------------+------------------------------------
Test of homogeneity (M-H)    chi2(4) =    5.47   Pr>chi2 = 0.2424
Test that combined OR = 1:
              Mantel-Haenszel chi2(1) =    0.37   Pr>chi2 = 0.5410

To test if there is an association between alcohol and tobacco consumption after adjusting for age, first the M-H test of homogeneity is used. The test statistic is not significant (p=0.24), indicating that there is no evidence to suggest that the ORs differ across the age groups. The M-H test that the combined OR equals 1 is also non-significant (p=0.54), indicating that there is no evidence that the common age-adjusted OR differs from 1. In conclusion, there is no evidence to suggest an association between alcohol and tobacco consumption in the control population, with or without adjusting for age.

It is best to examine this association in the control population because the control population reflects the population to which the results of the study will be applied. The diseased population is more likely to show an association between alcohol consumption and tobacco consumption.

2. (a) What would be the dependent variable in a logistic regression for the Ille-et-Vilaine data?

Cancer (cc) is the dependent variable.

2. (b) Define (write down the equation for) a logistic regression model that would characterize the unadjusted (crude) odds ratio that was measured in question 1(a).
pi(X) = expit(b0 + b1*tobexp) = exp(b0 + b1*tobexp) / [1 + exp(b0 + b1*tobexp)]
or
logit[pi(X)] = b0 + b1*tobexp

Here pi(X) is the probability of being a case, and exp(b1) is the crude odds ratio for tobexp.

2. (c) Compute and interpret the estimated odds ratio for tobexp, with adjustment for age, and its 95% confidence limit. Compare the point and interval estimates to those obtained in question 1(b). Are they similar? Do they have similar interpretations? Why or why not?

OR = exp(0.83397) = 2.30
95% CI: [exp(0.455), exp(1.212)] = [1.58, 3.36]

To two decimal places, the odds ratio and 95% CI are the same as the point and interval estimates obtained in 1(b). The interpretation is identical: 2.30 is an estimate of the age-specific OR, which is assumed to be constant across age groups.

3. (a) For each of the models fitted above, state the form of the logistic model that was used, stating the dependent variable, the interpretation of the probability pi(X), and the model for pi(X) in terms of the (unknown) population parameters and the independent variables.

The dependent variable is CVD mortality (1=death from CVD, 0 otherwise). pi(X) is the probability of dying from cardiovascular disease within ten years for a group of subjects with covariate values X.

The hypothesized models are:
*Model 1: pi(X) = exp(b0 + b1*SOC + b2*SBP + b3*SMK + b4*SOC*SBP + b5*SOC*SMK) / (1 + exp(b0 + b1*SOC + b2*SBP + b3*SMK + b4*SOC*SBP + b5*SOC*SMK))
*Model 2: pi(X) = exp(b0 + b1*SOC + b2*SBP + b3*SMK) / (1 + exp(b0 + b1*SOC + b2*SBP + b3*SMK))

The fitted models are:
*Model 1: pi(X) = exp(-1.180 - .520*SOC + .040*SBP - .560*SMK - .033*SOC*SBP + .175*SOC*SMK) / (1 + exp(-1.180 - .520*SOC + .040*SBP - .560*SMK - .033*SOC*SBP + .175*SOC*SMK))
*Model 2: pi(X) = exp(-1.19 - .500*SOC + .010*SBP - .420*SMK) / (1 + exp(-1.19 - .500*SOC + .010*SBP - .420*SMK))

3. (b) For each of the models in (a) state the form of the estimated log odds functions: logit[pi(X)]= …

Model 1: logit[pi(X)] = -1.180 - .520*SOC + .040*SBP - .560*SMK - .033*SOC*SBP + .175*SOC*SMK
Model 2: logit[pi(X)] = -1.19 - .500*SOC + .010*SBP - .420*SMK

3. (c) Using model 1, compute the estimated risk for CVD death (i.e. CVD=1) for a high social class (SOC=1) smoker (SMK=1) with SBP=150 (person 1), and a low social class (SOC=0) smoker (SMK=1) with SBP=150 (person 2). What is the estimated relative risk comparing these individuals?

High Social Class:
pi(X) = exp(-1.180 - .520*1 + .040*150 - .560*1 - .033*1*150 + .175*1*1) / (1 + exp(-1.180 - .520*1 + .040*150 - .560*1 - .033*1*150 + .175*1*1))
      = exp(-1.035) / (1 + exp(-1.035)) = .262

Low Social Class:
pi(X) = exp(-1.180 + .040*150 - .560*1) / (1 + exp(-1.180 + .040*150 - .560*1))
      = exp(4.26) / (1 + exp(4.26)) = .986

RR = .262 / .986 = .2657

Smokers with a SBP of 150 in a high social class have .2657 times the risk of CVD mortality compared to smokers with a SBP of 150 in a low social class.

3. (d) Repeat part (c) using model 2. Why is the estimate so different?

High Social Class:
pi(X) = exp(-1.19 - .5*1 + .01*150 - .42) / (1 + exp(-1.19 - .5*1 + .01*150 - .42))
      = exp(-.61) / (1 + exp(-.61)) = .352

Low Social Class:
pi(X) = exp(-1.19 + .01*150 - .42) / (1 + exp(-1.19 + .01*150 - .42))
      = exp(-.11) / (1 + exp(-.11)) = .4725

RR = .352 / .4725 = .745

Smokers with a SBP of 150 in a high social class have .745 times the risk of CVD mortality compared to smokers with a SBP of 150 in a low social class.

The estimates differ because of the interaction terms in model 1. In model 1 the effect of SOC depends on SBP and SMK: at SBP=150 the SOC*SBP term alone contributes -.033*150 = -4.95 to the log odds for the high social class. Model 2 has no interaction terms, so the effect of SOC is the same at every level of SBP and SMK.

3. (e) What is the estimated odds ratio comparing SOC=1 to SOC=0 for nonsmokers (SMK=0) with SBP=150 under model 1 and under model 2?

Model 1: OR = exp(-.52 - .033*150) = exp(-5.47) = .0042
Model 2: OR = exp(-.5) = .607

3. (f) If the study design had been a case-control study (retrospective) which risk estimate would you report (RR or OR)? Justify.

Report the OR. From case-control studies we cannot estimate the disease relative risk (comparing exposed to unexposed). However, from this study design we can estimate the exposure odds ratio, which equals the disease odds ratio. For rare diseases the OR approximates the RR.
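The risk, relative risk, and odds ratio calculations in 3(c) through 3(e) can be checked numerically, which also shows how far the OR can sit from the RR when the outcome is not rare. The sketch below is in Python rather than STATA and uses the fitted coefficients quoted in 3(b); the function names (`expit`, `logit_model1`, `logit_model2`, `compare_soc`) are hypothetical helpers introduced only for this illustration.

```python
import math

def expit(z):
    """Inverse logit: converts a log odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_model1(soc, sbp, smk):
    """Estimated log odds under model 1 (coefficients from 3(b))."""
    return (-1.180 - 0.520 * soc + 0.040 * sbp - 0.560 * smk
            - 0.033 * soc * sbp + 0.175 * soc * smk)

def logit_model2(soc, sbp, smk):
    """Estimated log odds under model 2 (coefficients from 3(b))."""
    return -1.19 - 0.500 * soc + 0.010 * sbp - 0.420 * smk

def compare_soc(logit_fn, sbp, smk):
    """Compare SOC=1 to SOC=0 at fixed SBP and SMK: returns (RR, OR)."""
    z_high = logit_fn(1, sbp, smk)
    z_low = logit_fn(0, sbp, smk)
    relative_risk = expit(z_high) / expit(z_low)
    odds_ratio = math.exp(z_high - z_low)
    return relative_risk, odds_ratio

# 3(c) and 3(d): smokers with SBP = 150.
rr1_smk, or1_smk = compare_soc(logit_model1, 150, 1)
rr2_smk, or2_smk = compare_soc(logit_model2, 150, 1)

# 3(e): nonsmokers with SBP = 150.
_, or1_nonsmk = compare_soc(logit_model1, 150, 0)
_, or2_nonsmk = compare_soc(logit_model2, 150, 0)

print(f"Model 1, smokers:    RR = {rr1_smk:.4f}, OR = {or1_smk:.4f}")
print(f"Model 2, smokers:    RR = {rr2_smk:.4f}, OR = {or2_smk:.4f}")
print(f"Model 1, nonsmokers: OR = {or1_nonsmk:.4f}")
print(f"Model 2, nonsmokers: OR = {or2_nonsmk:.4f}")
```

The estimated risks in this example are far from small, so the OR and RR for the same comparison differ noticeably; the rare-disease approximation in 3(f) applies only when the risks in both groups are low.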