Biost 536: Categorical Data Analysis in Epidemiology
Emerson, Fall 2014
Homework #4
November 4, 2014
Written problems: To be submitted as a MS-Word compatible file to the class Catalyst dropbox by
11:30 pm on Sunday, November 9, 2014. See the instructions for peer grading of the homework that are
posted on the web pages.
Questions refer to analyses of the data in the file infarcts.txt that is located on the class webpages.
We are interested in associations between prevalence of infarct-like lesions on MRI and various
predictors. For this homework, we will presume that any missing data is missing completely at random
(MCAR) in this dataset and hence ignorable.
1. Fit a logistic regression model investigating prevalence of infarcts as a function of age
(modeled continuously) and coronary heart disease (modeled as dummy variables).
Provide a scientific interpretation of each of the regression coefficients, including a
description of the intercept in the model. (You do not need to describe the methods, or
provide CI or p values.)
Age Coefficient Interpretation: Among adults ≥65 years of the same coronary heart
disease status enrolled in the MRI portion of the Cardiovascular Health Study, the odds
of having an infarct-like lesion on MRI are 5.3% higher for each additional year of age
(95% CI 3.8%-6.8%). There is significant evidence to reject the null hypothesis of no
difference in the odds of infarct-like lesion on MRI across ages (p-value <0.001).
CHD Coefficient Interpretation: Among adults ≥65 years of the same age enrolled in
the MRI portion of the Cardiovascular Health Study, there is significant evidence to
reject the null hypothesis that subjects with a history of angina have the same odds of
having an infarct-like lesion on MRI as subjects with no history of coronary heart
disease (p-value=0.008). We can also reject the null hypothesis that subjects with a
history of myocardial infarction have the same odds as subjects with no history of
coronary heart disease (p-value <0.001). Among adults of the same age, the odds of
having an infarct-like lesion are 36.7% higher (95% CI 8.5%-72.2%) for those with a
history of angina, and 81.6% higher (95% CI 43.5%-129.9%) for those with a history of
myocardial infarction, when each group is compared to subjects with no history of
coronary heart disease.
Intercept Coefficient Interpretation: Among newborns (age 0) who have no history of
coronary heart disease, the estimated odds of having an infarct-like lesion on MRI are
0.008. The intercept has no useful scientific meaning in this case because age 0 is far
outside the range of our data; we have no subjects who are <65 years old.
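The multiplicative reading of these coefficients can be checked with a little arithmetic. The following is a minimal sketch that plugs the rounded estimates reported above (intercept odds 0.008, age odds ratio 1.053, angina odds ratio 1.367, myocardial infarction odds ratio 1.816) into the fitted model. The function names and the example age of 70 are illustrative assumptions, and because the inputs are rounded the outputs are only approximate.

```python
# Rounded estimates taken from the interpretations above (not re-fitted here).
BASELINE_ODDS = 0.008   # odds at age 0 with no history of CHD (exp of the intercept)
OR_PER_YEAR = 1.053     # odds ratio per one-year difference in age
OR_CHD = {"none": 1.0, "angina": 1.367, "mi": 1.816}  # each vs. no history of CHD


def fitted_odds(age, chd="none"):
    """Odds of an infarct-like lesion implied by the rounded estimates."""
    return BASELINE_ODDS * OR_PER_YEAR ** age * OR_CHD[chd]


def odds_to_probability(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Age 70 is an arbitrary example within the observed age range (>=65).
    for group in ("none", "angina", "mi"):
        odds = fitted_odds(70, group)
        print(f"age 70, CHD={group}: odds={odds:.3f}, "
              f"probability={odds_to_probability(odds):.3f}")
```

Comparing two ages within the same CHD group always gives a ratio of 1.053 per year, which is what holding CHD status fixed means here. The age-0 value is an extrapolation far below the youngest subjects in the data.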
2. Fit a logistic regression model investigating prevalence of infarcts as a function of age
(modeled continuously), coronary heart disease (modeled as dummy variables), and their
multiplicative interaction. Provide a scientific interpretation of each of the regression
coefficients, including a description of the intercept in the model. (You do not need to
describe the methods, or provide CI or p values.)
Interaction Coefficient Interpretation: In a graph of the fitted values from the model
that allows CHD history to modify the association between age and having an infarct-like
lesion on MRI, the association between age and the odds of having an infarct-like lesion
appears to differ across the three CHD groups. However, we cannot reject the null
hypothesis that the age association is the same when comparing subjects with angina to
those with no history of CHD (p-value 0.950), or when comparing subjects with
myocardial infarction to those with no history of CHD (p-value 0.442); our model may
lack sufficient statistical power to detect such differences.
Therefore, an interaction term will not be included in the model for interpreting the other
coefficients.
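For reference, one way to write the model in this problem is sketched below. The symbols (the coefficients $\beta_0,\dots,\beta_5$ and the indicators $\mathrm{ang}$ and $\mathrm{mi}$ for a history of angina and of myocardial infarction) are notation introduced here for illustration and are not taken from the original analysis.

```latex
\operatorname{logit} \Pr(\text{infarct})
  = \beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{ang} + \beta_3\,\mathrm{mi}
  + \beta_4\,(\mathrm{age}\times\mathrm{ang}) + \beta_5\,(\mathrm{age}\times\mathrm{mi})
```

Under this parameterization the per-year odds ratio for age is $e^{\beta_1}$ with no history of CHD, $e^{\beta_1+\beta_4}$ with angina, and $e^{\beta_1+\beta_5}$ with myocardial infarction. The two reported p-values (0.950 and 0.442) would then correspond to tests of $\beta_4 = 0$ and $\beta_5 = 0$, and $e^{\beta_2}$ and $e^{\beta_3}$ would compare the CHD groups at age 0 only.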
Age Coefficient Interpretation: Among adults ≥65 years of the same coronary heart
disease status enrolled in the MRI portion of the Cardiovascular Health Study, the odds
of having an infarct-like lesion on MRI are 5.3% higher for each additional year of age
(95% CI 3.8%-6.8%). There is significant evidence to reject the null hypothesis of no
difference in the odds of infarct-like lesion on MRI across ages (p-value <0.001).
CHD Coefficient Interpretation: Among adults ≥65 years of the same age enrolled in
the MRI portion of the Cardiovascular Health Study, there is significant evidence to
reject the null hypothesis that subjects with a history of angina have the same odds of
having an infarct-like lesion on MRI as subjects with no history of coronary heart
disease (p-value=0.008). We can also reject the null hypothesis that subjects with a
history of myocardial infarction have the same odds as subjects with no history of
coronary heart disease (p-value <0.001). Among adults of the same age, the odds of
having an infarct-like lesion are 36.7% higher (95% CI 8.5%-72.2%) for those with a
history of angina, and 81.6% higher (95% CI 43.5%-129.9%) for those with a history of
myocardial infarction, when each group is compared to subjects with no history of
coronary heart disease.
Intercept Coefficient Interpretation: Among newborns (age 0) who have no history of
coronary heart disease, the estimated odds of having an infarct-like lesion on MRI are
0.008. The intercept has no useful scientific meaning in this case because age 0 is far
outside the range of our data; we have no subjects who are <65 years old.
3. Fit a logistic regression model that investigates the linearity of the association between
the log odds of presence of infarcts and age, after adjustment for coronary heart disease.
(Here you do need to describe your methods and results as they relate to the specific
question.)
Methods: The binary indicator of infarct-like lesion on MRI was analyzed using logistic
regression on the continuous predictor age (in years), adjusted for the potential
confounder CHD (no history, history of angina, history of myocardial infarction) modeled
as dummy variables, in order to assess the log odds of having an infarct-like lesion across
ages. The logit regression slope was used to estimate the average linear trend in the log
odds associated with every 1 year difference in age. The estimated standard errors of the
regression parameters were used with asymptotic normal theory to compute a two-sided
p-value from a Wald test of association and to compute a 95% confidence interval. A 0.05
threshold was used for statistical significance.
Inference: Logistic regression analysis estimates that when comparing two groups of the
same CHD status, a one-year difference in age is associated with a statistically significant
0.051 unit difference in the log odds of having an infarct-like lesion on MRI (95% CI
0.038-0.066, p-value <0.001).
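To relate this log-odds result to the odds-ratio scale, the coefficient and its interval can be exponentiated. The sketch below uses only the three numbers reported above; the function name `percent_change_in_odds` and the 10-year comparison are illustrative assumptions.

```python
import math

# Estimate and interval for the age coefficient as reported above (log-odds scale).
BETA_AGE = 0.051
CI_LOW, CI_HIGH = 0.038, 0.066


def percent_change_in_odds(log_odds_coef, years=1):
    """Percent difference in odds for a `years`-year difference in age."""
    return (math.exp(log_odds_coef * years) - 1.0) * 100.0


if __name__ == "__main__":
    for label, value in (("estimate", BETA_AGE), ("lower", CI_LOW), ("upper", CI_HIGH)):
        print(f"{label}: OR={math.exp(value):.3f}, "
              f"{percent_change_in_odds(value):.1f}% higher odds per year")
    # On the log-odds scale a 10-year difference is 10 times the coefficient.
    print(f"10-year difference: {percent_change_in_odds(BETA_AGE, 10):.1f}% higher odds")
```

The per-year figure comes out near 5.2%. The small difference from the 5.3% quoted in the earlier problems presumably reflects rounding of the reported coefficient.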
4. Fit a logistic regression model that investigates whether there is a U-shaped association
between the log odds of presence of infarcts and ldl, after adjustment for age. (Here you
do need to describe your methods and results as they relate to the specific question.)
Methods: The binary indicator of having an infarct-like lesion on MRI was analyzed
using logistic regression on a polynomial model of low-density lipoprotein (ldl) that
included both a linear continuous term and a term equal to ldl squared. Age in years was
included in the model as a continuous confounding adjustment variable. Ratios of the
odds of having an infarct-like lesion on MRI across ldl were then evaluated by
simultaneously testing that both regression coefficients (ldl and ldl squared) were equal to 0. The
estimate of the standard error of the regression parameters was used with asymptotic
normal theory to compute a two-sided p value from a likelihood ratio test of association.
A hierarchical testing scheme was predefined such that in the presence of a statistically
significant primary test for association, a secondary test for linearity of association would
be performed using the coefficient for the squared term: if that coefficient for the squared
term was significantly different from zero, that would be interpreted as evidence that the
association between having an infarct-like lesion on MRI and ldl was not linear across
ldl levels. Because the overall test of association is used as a “gate-keeper” in this testing
strategy, the experiment-wise type 1 error of the test for nonlinearity is preserved.
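The decision logic of this gate-keeper strategy can be sketched as a small function. This is only an illustration: the function name and return strings are hypothetical, the 0.05 threshold is assumed to carry over from the previous problem, and 0.001 is used as a stand-in for a p-value reported only as "<0.001".

```python
def hierarchical_test(p_overall, p_squared, alpha=0.05):
    """Gate-keeper strategy: test nonlinearity only if the overall test rejects."""
    if p_overall >= alpha:
        # The secondary test is never examined, which is what protects
        # the experiment-wise type 1 error.
        return "no evidence of association; nonlinearity not tested"
    if p_squared < alpha:
        return "association detected; evidence of nonlinearity"
    return "association detected; no evidence against linearity"


if __name__ == "__main__":
    # p_overall: stand-in for "<0.001"; p_squared: p-value for the squared term.
    print(hierarchical_test(p_overall=0.001, p_squared=0.042))
```

The squared-term p-value only matters once the joint test of both ldl coefficients has rejected, so a small p-value for the squared term is ignored when the overall test is not significant.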
Inference: Logistic regression analysis of the log odds of having an infarct-like lesion on
MRI, adjusted for age and using a polynomial model for ldl, finds a significant association
between having an infarct-like lesion on MRI and ldl (two-sided p-value <0.001).
Because we found a statistically significant association between having an infarct-like
lesion on MRI and ldl, we further considered whether the regression model presented
evidence of a nonlinear association. In that analysis, the regression coefficient for the
squared term was statistically significant (two-sided p=0.042). This suggests that the
association between lesion on MRI and ldl is not well-described by a purely linear
relationship in the log odds ratio (see plot of fitted values vs ldl below).
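Because the question asks specifically about a U-shaped association, it may help to spell out what the quadratic model implies. In the notation below (the $\gamma$ symbols are introduced here for illustration and do not appear in the original analysis), the fitted log odds are a parabola in ldl with turning point $\mathrm{ldl}^{*}$:

```latex
\operatorname{logit} \Pr(\text{infarct})
  = \gamma_0 + \gamma_1\,\mathrm{ldl} + \gamma_2\,\mathrm{ldl}^2 + \gamma_3\,\mathrm{age},
\qquad
\mathrm{ldl}^{*} = -\frac{\gamma_1}{2\gamma_2}
```

The curve is U-shaped when $\gamma_2 > 0$ and $\mathrm{ldl}^{*}$ falls within the observed range of ldl. A significant squared term by itself indicates curvature but not its direction, so the sign of the estimated squared-term coefficient and the location of the turning point would also need to be checked; neither is reported above.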