Biost 536: Categorical Data Analysis in Epidemiology
Emerson, Fall 2014

Homework #4
November 4, 2014

Written problems: To be submitted as a MS-Word compatible file to the class Catalyst dropbox by 11:30 pm on Sunday, November 9, 2014. See the instructions for peer grading of the homework that are posted on the web pages.

Questions refer to analyses of the data in the file infarcts.txt that is located on the class webpages. We are interested in associations between prevalence of infarct-like lesions on MRI and various predictors. For this homework, we will presume that any missing data are missing completely at random (MCAR) in this dataset and hence ignorable.

1. Fit a logistic regression model investigating prevalence of infarcts as a function of age (modeled continuously) and coronary heart disease (modeled as dummy variables). Provide a scientific interpretation of each of the regression coefficients, including a description of the intercept in the model. (You do not need to describe the methods, or provide CI or p values.)

Age Coefficient Interpretation: Among adults ≥65 years of the same coronary heart disease status enrolled in the MRI portion of the Cardiovascular Health Study, the odds of having an infarct on MRI increase by 5.3% with each increasing year of age (95% CI 3.8-6.8). There is significant evidence that we can reject the null of no difference in the odds of infarct-like lesion on MRI among different age groups (p-value <0.001).

CHD Coefficient Interpretation: Among adults ≥65 years of the same age enrolled in the MRI portion of the Cardiovascular Health Study, there is significant evidence that we can reject the null that subjects with a history of angina have the same odds of having an infarct-like lesion on MRI as subjects with no history of coronary heart disease (p-value=0.008). We can also reject the null that subjects with a history of myocardial infarction have the same odds of having an infarct-like lesion on MRI as subjects with no history of coronary heart disease (p-value <0.001).
Among adults of the same age, the odds of having an infarct-like lesion are 36.7% higher (95% CI 8.5-72.2) for those with a history of angina, and 81.6% higher (95% CI 43.5-129.9) for those with a history of myocardial infarction, when each group is compared to subjects with no history of coronary heart disease.

Intercept Coefficient Interpretation: Among newborns (age 0) who have no history of coronary heart disease, the estimated odds of having an infarct-like lesion on MRI are 0.008. The intercept has no scientific relevance in this case because it lies outside the range of our data; we have no patients who are <65 years old.

2. Fit a logistic regression model investigating prevalence of infarcts as a function of age (modeled continuously), coronary heart disease (modeled as dummy variables), and their multiplicative interaction. Provide a scientific interpretation of each of the regression coefficients, including a description of the intercept in the model. (You do not need to describe the methods, or provide CI or p values.)

Interaction Coefficient Interpretation: In a graph of the fitted values from the model that includes effect modification by CHD history of the relationship between age and having an infarct-like lesion on MRI, it appears that the association between age and the odds of having an infarct-like lesion on MRI does differ across the three CHD groups. However, our model may lack sufficient statistical power to detect such differences: we cannot reject the null hypothesis that the association with age is the same when comparing subjects with angina to those with no history of CHD (p-value 0.950), or when comparing subjects with myocardial infarction to those with no history of CHD (p-value 0.442). Therefore, an interaction term will not be included in the model for interpreting the other coefficients.
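The interaction model in question 2 implies a separate age slope for each CHD group: the age coefficient applies to the reference group (no history of CHD), and each age-by-CHD product term is the difference in slope for that group relative to the reference. A minimal Python sketch of that arithmetic follows. The coefficient values are hypothetical placeholders on the log-odds scale, not the estimates fitted from infarcts.txt, and the function name is illustrative only.

```python
import math


def age_odds_ratio_by_group(beta_age, interaction_terms):
    """Per-year odds ratio for age within each CHD group.

    beta_age: age slope (log-odds scale) in the reference group.
    interaction_terms: dict mapping group name to the coefficient of
        the age x group product term (difference in slope vs. reference).
    """
    ratios = {"no CHD": math.exp(beta_age)}
    for group, delta in interaction_terms.items():
        # Slope in this group = reference slope + product-term coefficient.
        ratios[group] = math.exp(beta_age + delta)
    return ratios


# Hypothetical coefficients (assumed for illustration, NOT fitted values).
beta_age = 0.050
interactions = {"angina": 0.002, "MI": 0.020}

ratios = age_odds_ratio_by_group(beta_age, interactions)
for group, ratio in ratios.items():
    print(f"{group}: odds ratio per year of age = {ratio:.3f}")
```

If both product-term coefficients were exactly zero, the three odds ratios would coincide, which is the null hypothesis addressed by the two interaction p-values.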
Age Coefficient Interpretation: Among adults ≥65 years of the same coronary heart disease status enrolled in the MRI portion of the Cardiovascular Health Study, the odds of having an infarct on MRI increase by 5.3% with each increasing year of age (95% CI 3.8-6.8). There is significant evidence that we can reject the null of no difference in the odds of infarct-like lesion on MRI among different age groups (p-value <0.001).

CHD Coefficient Interpretation: Among adults ≥65 years of the same age enrolled in the MRI portion of the Cardiovascular Health Study, there is significant evidence that we can reject the null that subjects with a history of angina have the same odds of having an infarct-like lesion on MRI as subjects with no history of coronary heart disease (p-value=0.008). We can also reject the null that subjects with a history of myocardial infarction have the same odds of having an infarct-like lesion on MRI as subjects with no history of coronary heart disease (p-value <0.001). Among adults of the same age, the odds of having an infarct-like lesion are 36.7% higher (95% CI 8.5-72.2) for those with a history of angina, and 81.6% higher (95% CI 43.5-129.9) for those with a history of myocardial infarction, when each group is compared to subjects with no history of coronary heart disease.

Intercept Coefficient Interpretation: Among newborns (age 0) who have no history of coronary heart disease, the estimated odds of having an infarct-like lesion on MRI are 0.008. The intercept has no scientific relevance in this case because it lies outside the range of our data; we have no patients who are <65 years old.

3. Fit a logistic regression model that investigates the linearity of the association between the log odds of presence of infarcts and age, after adjustment for coronary heart disease. (Here you do need to describe your methods and results as they relate to the specific question.)
Methods: The binary indicator of infarct-like lesion on MRI was analyzed using logistic regression on the continuous predictor age (in years) and the confounder CHD, modeled as dummy variables (no history, history of angina, history of myocardial infarction), in order to assess the log odds of having an infarct-like lesion across age groups. The logit regression slope was used to estimate the average linear trend in the log odds associated with every 1 year difference in age. The estimate of the standard error of the regression parameters was used with asymptotic normal theory to compute a two-sided p-value from a Wald test of association and to compute a 95% confidence interval. A 0.05 threshold was used for statistical significance.

Inference: Logistic regression analysis estimates that when comparing two groups of the same CHD status, a one-year difference in age is associated with a statistically significant 0.051 unit difference in the log odds (95% CI 0.038-0.066, p-value <0.001).

4. Fit a logistic regression model that investigates whether there is a U-shaped association between the log odds of presence of infarcts and ldl, after adjustment for age. (Here you do need to describe your methods and results as they relate to the specific question.)

Methods: The binary indicator of having an infarct-like lesion on MRI was analyzed using logistic regression on a polynomial model of low-density lipoprotein (ldl) that included both a linear continuous term and a term equal to ldl squared. Age in years was included in the model as a continuous confounding adjustment variable. Ratios of the odds of having an infarct-like lesion on MRI across ldl were then evaluated by simultaneously testing that both regression coefficients (ldl, ldl squared) were equal to 0. The estimate of the standard error of the regression parameters was used with asymptotic normal theory to compute a two-sided p-value from a likelihood ratio test of association.
A hierarchical testing scheme was predefined such that in the presence of a statistically significant primary test for association, a secondary test for linearity of association would be performed using the coefficient for the squared term: if the coefficient for the squared term was significantly different from zero, that would be interpreted as evidence that the association between having an infarct-like lesion on MRI and ldl was not linear across ldl levels. Because the overall test of association is used as a “gate-keeper” in this testing strategy, the experiment-wise type 1 error of the test for nonlinearity is preserved.

Inference: Logistic regression analysis using a polynomial model for ldl, adjusted for age, finds a statistically significant association between having an infarct-like lesion on MRI and ldl (two-sided p-value <0.001). Because we found a statistically significant association between having an infarct-like lesion on MRI and ldl, we further considered whether the regression model presented evidence of a nonlinear association. In that analysis, the regression coefficient for the squared term was statistically significant (two-sided p=0.042). This suggests that the association between lesion on MRI and ldl is not well described by a purely linear relationship in the log odds (see plot of fitted values vs ldl below).
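The hierarchical ("gate-keeper") scheme described in the methods can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis actually run: the log-likelihood values are hypothetical placeholders, the function names are invented for this sketch, and the secondary p-value is taken here from a likelihood ratio comparison for simplicity, whereas the text bases it on the test of the squared-term coefficient.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail chi-square probability, closed forms for df = 1 or 2."""
    if stat < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


def lrt_pvalue(loglik_full, loglik_reduced, df):
    """Likelihood ratio test p-value for nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return chi2_sf(max(stat, 0.0), df)


def gatekeeper(p_overall, p_quadratic, alpha=0.05):
    """Test nonlinearity only if the overall association is significant."""
    if p_overall >= alpha:
        return "no evidence of association; nonlinearity not tested"
    if p_quadratic < alpha:
        return "association detected; evidence of nonlinearity"
    return "association detected; no evidence of nonlinearity"


# Hypothetical maximized log-likelihoods (assumed, NOT from infarcts.txt).
ll_age_only = -400.0    # age
ll_linear = -395.0      # age + ldl
ll_quadratic = -392.0   # age + ldl + ldl squared

p_overall = lrt_pvalue(ll_quadratic, ll_age_only, df=2)  # both ldl terms = 0
p_quadratic = lrt_pvalue(ll_quadratic, ll_linear, df=1)  # squared term = 0
decision = gatekeeper(p_overall, p_quadratic)
print(f"overall p = {p_overall:.4g}, squared-term p = {p_quadratic:.4g}")
print(decision)
```

Because the secondary test is only consulted when the primary test rejects, a nonlinearity claim can never be made without first passing the overall test at level alpha, which is what keeps the experiment-wise type 1 error at alpha.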