Biost 536 (Autumn 2014)
Instructor: Scott Emerson
Homework 2
Questions 1-5 refer to analyses of the data in the file mri.txt that is located on the class webpages. In
those questions we are interested in associations between 5 year mortality and prevalence of
atherosclerotic cardiovascular disease (ASCVD) as defined by history of prior angina, myocardial
infarction, transient ischemic attacks, or stroke. You will likely find it useful to create a new variable
indicating ASCVD. This variable can be derived from the chd and stroke variables.
For these questions, we will be considering adjustment for age and sex using both stratified and
regression analyses. For the stratified analyses, it will be necessary to use an appropriate categorization
of age.
1. We are interested in analyzing associations between 5 year mortality and prevalence of ASCVD
at study enrollment using statistical methods appropriate for binary response variables. The
observation time for death among these subjects is potentially subject to censoring. Provide a
statistical analysis demonstrating that such methods as logistic regression can be used to answer
this question.
Methods: Five year mortality was compared by ASCVD status at study enrollment using the odds ratio
comparing the odds of 5 year mortality among subjects with ASCVD to the odds of 5 year
mortality among those without ASCVD. The odds ratio and its 95% confidence interval were
calculated using a Wald statistic and its standard error based on the approximate normal
distribution for the maximum likelihood estimates for a binomial distribution. A corresponding
two-sided p-value was computed to test the null hypothesis of no difference in 5 year mortality
between ASCVD arms.
Results: Among the 735 subjects enrolled in this study, 5 year mortality was observed in 53 of
465 subjects (11.4%) with ASCVD and 68 of 149 subjects (45.6%) without ASCVD. We estimate
that the odds of death within 5 years among those with ASCVD is 4.00 times higher than it is in
those without ASCVD. Based on a 95% confidence interval, such an observation is consistent
with the true odds of 5 year mortality being anywhere between 2.67 and 6.00 times higher
among subjects with ASCVD than those without. We therefore reject the null hypothesis of
equal likelihood of 5 year mortality between subjects with and without ASCVD (two-sided
p-value < 0.001).
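The question also asks for evidence that censoring does not prevent a binary analysis. A minimal sketch of that check follows, assuming each record carries an observation time in days and a death indicator (1 = death observed, 0 = censored); the function names and record layout are hypothetical and are not taken from mri.txt. If no censored subject was followed for less than 5 years, every subject's 5 year vital status is known and the outcome can be treated as binary.

```python
from typing import Iterable, Tuple

FIVE_YEARS_DAYS = 5 * 365.25  # assumed horizon, expressed in days


def five_year_status_known(records: Iterable[Tuple[float, int]],
                           horizon_days: float = FIVE_YEARS_DAYS) -> bool:
    """Return True if every subject's vital status at the horizon is determined.

    Each record is (observation_time_days, death_indicator).
    """
    for obs_time, died in records:
        # A censored subject followed for less than the horizon has unknown status.
        if died == 0 and obs_time < horizon_days:
            return False
    return True


def death_within_horizon(obs_time: float, died: int,
                         horizon_days: float = FIVE_YEARS_DAYS) -> int:
    """Binary response: 1 if death was observed on or before the horizon."""
    return int(died == 1 and obs_time <= horizon_days)
```

If `five_year_status_known` returns True for the data, the indicator from `death_within_horizon` is a fully observed binary response and logistic regression is applicable.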
2. Using the risk difference (RD) as a measure of association, provide statistical inference regarding
an association between 5 year survival and baseline prevalence of ASCVD, adjusting for age and
sex.
a. Answer the question using a stratified analysis (e.g., using Stata command cs or an
equivalent analysis in R).
Methods: Five year mortality was compared by ASCVD status at study enrollment using the risk
difference comparing the probability of 5 year mortality among subjects with ASCVD
minus the probability of 5 year mortality among those of the same age category and sex
without ASCVD. Age categories were defined as 65-70, 71-75, 76-80, 81-85, and 86-99.
Stratum weights were determined according to ASCVD distribution. The age-and-sex-
adjusted difference and its 95% confidence interval were calculated using a Wald
statistic and its standard error based on the approximate normal distribution for the
maximum likelihood estimates for a binomial distribution. A two-sided p-value was
computed to test the null hypothesis of no difference in 5 year mortality between
ASCVD arms using the chi-squared test.
Results: Among the 735 subjects enrolled in this study, 5 year mortality was observed
in 53 of 465 subjects (11.4%) with ASCVD and 68 of 149 subjects (45.6%) without
ASCVD. After adjusting for age and sex, we estimate that 5 year mortality is an absolute
difference of 19.3% higher among those with ASCVD than it is in those of the same age
category and sex without ASCVD. Based on a 95% confidence interval, such an
observation is consistent with a true difference of 5 year mortality being anywhere
between 12.5% and 26.0% higher among subjects with ASCVD compared to those
without ASCVD of the same sex and age category. We therefore reject the null
hypothesis of equal likelihood of 5 year mortality between subjects of the same sex and
age category with and without ASCVD (two-sided p-value < 0.01).
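The stratified risk difference described above can be sketched as a weighted average of stratum-specific differences. This is an illustration under stated assumptions, not the exact computation Stata performs: the weights here are the number of subjects with ASCVD in each stratum (one reading of "according to ASCVD distribution"), and the function name and input layout are hypothetical.

```python
import math


def standardized_risk_difference(strata, z_crit=1.96):
    """Weighted risk difference with a Wald confidence interval.

    strata: list of (deaths_exposed, n_exposed, deaths_unexposed, n_unexposed),
    one tuple per age-by-sex stratum.
    """
    total_weight = 0.0
    weighted_rd = 0.0
    weighted_var = 0.0
    for d1, n1, d0, n0 in strata:
        p1, p0 = d1 / n1, d0 / n0
        w = n1  # assumed weight: number of exposed subjects in the stratum
        weighted_rd += w * (p1 - p0)
        # Binomial variance of the stratum-specific difference, scaled by w^2
        weighted_var += w ** 2 * (p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
        total_weight += w
    rd = weighted_rd / total_weight
    se = math.sqrt(weighted_var) / total_weight
    return rd, rd - z_crit * se, rd + z_crit * se
```

Because each stratum is estimated separately, no information is borrowed across strata; sparse strata simply contribute noisy differences.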
b. Answer the question using an appropriate regression model.
Methods: Five year mortality was compared by ASCVD status at study enrollment using adjusted
robust linear regression to model difference in risk of 5 year mortality among subjects
with ASCVD to those of the same age and sex without ASCVD. The age-and-sex-adjusted
difference and its 95% confidence interval were calculated using a Wald statistic and its
standard error based on the approximate normal distribution for the maximum
likelihood estimates for a binomial distribution. A corresponding two-sided p-value was
computed to test the null hypothesis of no difference in 5 year mortality between
ASCVD arms.
Results: Among the 735 subjects enrolled in this study, 5 year mortality was observed
in 53 of 465 subjects (11.4%) with ASCVD and 68 of 149 subjects (45.6%) without
ASCVD. After adjusting for age and sex, we estimate that 5 year mortality is an absolute
difference of 18.9% higher among those with ASCVD than it is in those of the same age
and sex without ASCVD. Based on a 95% confidence interval, such an observation is
consistent with a true difference of 5 year mortality being anywhere between 12.1%
and 25.7% higher among subjects with ASCVD compared to those without ASCVD of the
same sex and age. We therefore reject the null hypothesis of equal likelihood of 5 year
mortality between subjects of the same sex and age with and without ASCVD (two-sided
p-value < 0.001).
c. What is the difference in the statistical models you used? That is, how would you
explain any differences between the two analysis approaches?
Answer: Differences in results obtained from analyses A and B can be attributed to the
categorization of age in analysis A (I kept age as a continuous variable in my regression
model for analysis B). Moreover, within my regression model, analysis B involves a
borrowing of information across groups, which cannot be said for analysis A. Finally,
stratified analyses automatically adjust for interactions between variables, whereas I
chose not to adjust for interaction in my regression model.
3. Using the odds ratio (OR) as a measure of association, provide statistical inference regarding an
association between 5 year survival and baseline prevalence of ASCVD, adjusting for age and
sex.
a. Answer the question using a stratified analysis (e.g., using Stata command cc or an
equivalent analysis in R).
Methods: Five year mortality was compared by ASCVD status at study enrollment using the odds
ratio comparing the odds of 5 year mortality among subjects with ASCVD to the odds of
5 year mortality among those of the same age category and sex without ASCVD. Age
categories were defined as 65-70, 71-75, 76-80, 81-85, and 86-99. The Mantel-Haenszel
statistic was used to determine stratum weights. The age-and-sex-adjusted odds ratio
and its 95% confidence interval were calculated using a Wald statistic and its standard
error based on the approximate normal distribution for the maximum likelihood
estimates for a binomial distribution. A two-sided p-value was computed to test the null
hypothesis of no difference in 5 year mortality between ASCVD arms using the chi-squared test.
Results: Among the 735 subjects enrolled in this study, 5 year mortality was observed
in 53 of 465 subjects (11.4%) with ASCVD and 68 of 149 subjects (45.6%) without
ASCVD. We estimate that the odds of death within 5 years among those with ASCVD is
3.50 times higher than it is in those subjects of the same age category and sex without
ASCVD. Based on a 95% confidence interval, such an observation is consistent with the
true odds of 5 year mortality being anywhere between 2.31 and 5.29 times higher
among subjects with ASCVD compared to those of the same sex and age category
without ASCVD. We therefore reject the null hypothesis of equal likelihood of 5 year
mortality between subjects with and without ASCVD (two-sided p-value < 0.001).
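For reference, the Mantel-Haenszel pooled odds ratio named in the Methods can be sketched as follows. Only the point estimate is shown; the function name and the tuple layout are illustrative assumptions, and the confidence interval computation is omitted.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel pooled odds ratio across 2x2 tables.

    strata: list of (a, b, c, d) per stratum, where
      a = exposed deaths,   b = exposed survivors,
      c = unexposed deaths, d = unexposed survivors.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n    # concordant cross-product, weighted by 1/n
        denominator += b * c / n  # discordant cross-product, weighted by 1/n
    return numerator / denominator
```

With a single stratum this reduces to the ordinary cross-product ratio ad/bc.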
b. Answer the question using an appropriate regression model.
Methods: Five year mortality was compared by ASCVD status at study enrollment using adjusted
logistic regression to model the odds ratio comparing the odds of 5 year mortality
among subjects with ASCVD to the odds of 5 year mortality among those of the same
age and sex without ASCVD. The age-and-sex-adjusted odds ratio and its 95%
confidence interval were calculated using a Wald statistic and its standard error based
on the approximate normal distribution for the maximum likelihood estimates for a
binomial distribution. A corresponding two-sided p-value was computed to test the null
hypothesis of no difference in 5 year mortality between ASCVD arms.
Results: Among the 735 subjects enrolled in this study, 5 year mortality was observed
in 53 of 465 subjects (11.4%) with ASCVD and 68 of 149 subjects (45.6%) without
ASCVD. We estimate that the odds of death within 5 years among those with ASCVD is
3.57 times higher than it is in those subjects of the same age and sex without ASCVD.
Based on a 95% confidence interval, such an observation is consistent with the true
odds of 5 year mortality being anywhere between 2.36 and 5.38 times higher among
subjects with ASCVD compared to those of the same age and sex without ASCVD. We
therefore reject the null hypothesis of equal likelihood of 5 year mortality between
subjects with and without ASCVD (two-sided p-value < 0.001).
c. What is the difference in the statistical models you used? That is, how would you
explain any differences between the two analysis approaches?
Answer: Differences in results obtained from analyses A and B can be attributed to the
categorization of age in analysis A (I kept age as a continuous variable in my regression
model for analysis B). Moreover, within my regression model, analysis B involves a
borrowing of information across groups, which cannot be said for analysis A. Finally,
stratified analyses automatically adjust for interactions between variables, whereas I
chose not to adjust for interaction in my regression model.
4. Using the relative risk (RR) as a measure of association, provide statistical inference regarding an
association between 5 year survival and baseline prevalence of ASCVD, adjusting for age and
sex.
a. Answer the question using a stratified analysis (e.g., using Stata command ir or an
equivalent analysis in R).
Methods: Five year mortality was compared by ASCVD status at study enrollment using the
relative risk comparing the probability of 5 year mortality among subjects with ASCVD to
the probability of 5 year mortality among those of the same age category and sex
without ASCVD. Age categories were defined as 65-70, 71-75, 76-80, 81-85, and 86-99.
The Mantel-Haenszel statistic was used to determine stratum weights. The age-and-sex-adjusted relative risk and its 95% confidence interval were calculated using a Wald
statistic and its standard error based on the approximate normal distribution for the
maximum likelihood estimates for a binomial distribution. A two-sided p-value was
computed to test the null hypothesis of no difference in 5 year mortality between
ASCVD arms using the chi-squared test.
Results: Among the 735 subjects enrolled in this study, 5 year mortality was observed
in 53 of 465 subjects (11.4%) with ASCVD and 68 of 149 subjects (45.6%) without
ASCVD. We estimate that the risk of death within 5 years among those with ASCVD is
2.63 times higher than it is in those subjects of the same age category and sex without
ASCVD. Based on a 95% confidence interval, such an observation is consistent with the
true risk of 5 year mortality being anywhere between 1.92 and 3.62 times higher among
subjects with ASCVD compared to those of the same sex and age category without
ASCVD. We therefore reject the null hypothesis of equal likelihood of 5 year mortality
between subjects with and without ASCVD (two-sided p-value < 0.01).
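The Mantel-Haenszel pooled relative risk uses different stratum weights from the pooled odds ratio. A minimal sketch of the point estimate, with a hypothetical function name and input layout:

```python
def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel pooled risk ratio for cumulative incidence data.

    strata: list of (deaths_exposed, n_exposed, deaths_unexposed, n_unexposed).
    """
    numerator = 0.0
    denominator = 0.0
    for d1, n1, d0, n0 in strata:
        n = n1 + n0
        numerator += d1 * n0 / n    # exposed deaths weighted by unexposed size
        denominator += d0 * n1 / n  # unexposed deaths weighted by exposed size
    return numerator / denominator
```

For example, two strata with risks of 20% versus 10% in each give a pooled relative risk of 2 regardless of stratum size.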
b. Answer the question using an appropriate regression model.
Methods: Five year mortality was compared by ASCVD status at study enrollment using adjusted
Poisson regression to model the relative risk comparing the probability of 5 year
mortality among subjects with ASCVD to the probability of 5 year mortality among those
of the same age and sex without ASCVD. The age-and-sex-adjusted relative risk and its
95% confidence interval were calculated using a Wald statistic and its standard error
based on the approximate normal distribution for the maximum likelihood estimates for
a binomial distribution. A corresponding two-sided p-value was computed to test the
null hypothesis of no difference in 5 year mortality between ASCVD arms.
Results: Among the 735 subjects enrolled in this study, 5 year mortality was observed
in 53 of 465 subjects (11.4%) with ASCVD and 68 of 149 subjects (45.6%) without
ASCVD. We estimate that the risk of death within 5 years among those with ASCVD is
2.72 times higher than it is in those subjects of the same age and sex without
ASCVD. Based on a 95% confidence interval, such an observation is consistent with the
true risk of 5 year mortality being anywhere between 1.89 and 3.91 times higher among
subjects with ASCVD compared to those of the same sex and age without
ASCVD. We therefore reject the null hypothesis of equal likelihood of 5 year mortality
between subjects with and without ASCVD (two-sided p-value < 0.001).
c. What is the difference in the statistical models you used? That is, how would you
explain any differences between the two analysis approaches?
Answer: Differences in results obtained from analyses A and B can be attributed to the
categorization of age in analysis A (I kept age as a continuous variable in my regression
model for analysis B). Moreover, within my regression model, analysis B involves a
borrowing of information across groups, which cannot be said for analysis A. Finally,
stratified analyses automatically adjust for interactions between variables, whereas I
chose not to adjust for interaction in my regression model.
5. Comment very briefly on the similarity or differences among the three approaches. Which
would you tend to prefer in general? Why?
Answer: The stratified analyses performed in questions 2, 3, and 4 are similar in that age was
treated as a categorical variable and there was no borrowing of information across strata. The
regression analyses performed in questions 2, 3, and 4 are similar in that age was treated as a
continuous variable, and borrowing of information occurred between strata. A priori, I would
have chosen to model the risk difference using a robust linear regression model (analysis 2b), to
maximize the amount of information drawn from our strata. Moreover, there are potential issues
of collapsibility when dealing with odds ratios or risk ratios (multiplicative processes) that do
not arise with the risk difference (an additive process).
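The collapsibility point can be illustrated for the odds ratio with a small numerical example. The risks below are invented for illustration only: two equal-sized strata, with exposure equally common in each so that there is no confounding by stratum.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# (risk if exposed, risk if unexposed) in two equal-sized strata
strata = [(0.8, 0.5), (0.5, 0.2)]

stratum_ors = [odds(p1) / odds(p0) for p1, p0 in strata]
stratum_rds = [p1 - p0 for p1, p0 in strata]

# Marginal risks are simple averages because strata are equal-sized
# and exposure is balanced within each stratum.
marginal_p1 = sum(p1 for p1, _ in strata) / len(strata)
marginal_p0 = sum(p0 for _, p0 in strata) / len(strata)

marginal_or = odds(marginal_p1) / odds(marginal_p0)
marginal_rd = marginal_p1 - marginal_p0
```

Both strata have an odds ratio of 4 and a risk difference of 0.3. The marginal risk difference is still 0.3, but the marginal odds ratio is about 3.45, even though the stratifying variable is not a confounder in this construction.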
Question 6 pertains to the analysis of colorectal cancer incidence for whites living in the U.S. as a
function of birthplace (U.S. born vs foreign born) (see datafile surveillance.txt and documentation
surveillance.doc on the class web pages).
6. Using the incidence ratio as a measure of association, provide inference for an association
between incidence of colorectal cancer and birthplace, after adjustment for age, sex, and SEER.
a. Answer the question using directly standardized rates, with standardization to the U.S.
population.
Methods: The incidence of colorectal cancer among whites living in the U.S. was
compared between foreign born versus U.S. born subjects using the number of
colorectal cancer diagnoses made at various SEER sites over the course of 12 years and
adjusting for age, sex, and SEER site. To adjust for varying lengths of follow-up within
each SEER site, observation time was converted to person-years at risk. Note that the
number of colorectal cancers is negligible compared to the overall population, so we did
not subtract out people once they were diagnosed with colorectal cancer. Age was
measured in five year categories (e.g. 0-5 years, 5-10 years, etc.). Incidence rates were
weighted according to directly standardized rates, with standardization to the U.S.
population.
Results: Among the 972 subjects enrolled in this study, birthplace was known in 324
subjects born in the U.S. and 324 subjects born outside of the U.S. We estimate that the
incidence of colorectal cancer among subjects with known birthplace is 1.02 times
higher among foreign born subjects than it is in subjects of the same age, sex, and SEER
site who were U.S. born. Based on a 95% confidence interval, such an observation is
consistent with the true incidence of colorectal cancer among foreign born subjects being
anywhere between 0.987 and 1.046 times that among subjects of
the same age, sex, and SEER site who were U.S. born. We therefore fail to reject the null
hypothesis of no difference in colorectal cancer incidence by subjects’ birthplace (two-sided p-value > 0.05).
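Direct standardization, as used in this part, can be sketched as follows: stratum-specific rates in each birthplace group are averaged using the same standard population weights, and the two standardized rates are then divided. The function names and argument layout are hypothetical, and the interval estimate is not shown.

```python
def directly_standardized_rate(cases, person_years, standard_pop):
    """Weighted average of stratum-specific rates using standard population weights.

    cases, person_years, standard_pop: equal-length sequences, one entry per
    age-by-sex-by-site stratum.
    """
    total_standard = sum(standard_pop)
    weighted = sum((c / py) * w
                   for c, py, w in zip(cases, person_years, standard_pop))
    return weighted / total_standard


def standardized_rate_ratio(cases_a, py_a, cases_b, py_b, standard_pop):
    """Ratio of directly standardized rates, group A relative to group B."""
    rate_a = directly_standardized_rate(cases_a, py_a, standard_pop)
    rate_b = directly_standardized_rate(cases_b, py_b, standard_pop)
    return rate_a / rate_b
```

Because both groups are weighted to the same standard population, differences in their age, sex, and site distributions do not contribute to the ratio.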
b. Answer the question using an appropriate regression model.
Methods: The incidence of colorectal cancer among whites living in the U.S. was
compared between foreign born versus U.S. born subjects using a robust Poisson
regression model adjusting for age, sex, and SEER site. To adjust for varying lengths of
follow-up within each SEER site, observation time was converted to person-years at risk.
Note that the number of colorectal cancers is negligible compared to the overall
population, so we did not subtract out people once they were diagnosed with colorectal
cancer. Age was measured in five year categories (e.g. 0-5 years, 5-10 years, etc.).
Statistical inference on the ratio of the probabilities of colorectal cancer was based on
the Wald statistic computed from the regression slope parameter and its standard error
as estimated using the Huber-White sandwich estimator. A two-sided p-value and 95%
confidence interval were computed using the approximate normal distribution for Poisson
regression parameter estimates.
Results: Among the 972 subjects enrolled in this study, birthplace was known in 324
subjects born in the U.S. and 324 subjects born outside of the U.S. We estimate that the
incidence of colorectal cancer among subjects with known birthplace is 0.988 times
as high among foreign born subjects as it is in subjects of the same age, sex, and SEER
site who were U.S. born. Based on a 95% confidence interval, such an observation is
consistent with the true incidence of colorectal cancer among foreign born subjects being
anywhere between 0.892 and 1.09 times that among subjects of the
same age, sex, and SEER site who were U.S. born. We therefore fail to reject the null
hypothesis of no difference in colorectal cancer incidence by subjects’ birthplace (two-sided p-value = 0.811).
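The Wald inference described in the Methods works on the log scale: the regression coefficient is the log incidence ratio, and the interval is exponentiated back. A minimal sketch, assuming the coefficient and its (robust) standard error have already been obtained from the fitted model; the function name is hypothetical.

```python
import math


def wald_ratio_inference(log_ratio, std_error, z_crit=1.96):
    """Ratio estimate, Wald confidence interval, and two-sided p-value.

    log_ratio: regression coefficient on the log scale.
    std_error: its standard error (for example, a sandwich estimate).
    """
    z = log_ratio / std_error
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return {
        "ratio": math.exp(log_ratio),
        "lower": math.exp(log_ratio - z_crit * std_error),
        "upper": math.exp(log_ratio + z_crit * std_error),
        "p_value": p_value,
    }
```

The same back-transformation applies to the odds ratios and relative risks from the logistic and Poisson regression models in questions 3 and 4.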
c. What is the difference in the statistical models you used? That is, how would you
explain any differences between the two analysis approaches?
Answer: The models from parts A and B both return similar, statistically non-significant results. The
main difference between the two analyses can be attributed to the inherent
consideration of interaction terms between strata in model A. We chose to ignore such
interaction terms in model B.