Biost 536 (Autumn 2014) Instructor: Scott Emerson
Homework 2
Questions 1-5 refer to analyses of the data in the file mri.txt that is located on the class webpages. In those questions we are interested in associations between 5 year mortality and prevalence of atherosclerotic cardiovascular disease (ASCVD) as defined by history of prior angina, myocardial infarction, transient ischemic attacks, or stroke. You will likely find it useful to create a new variable indicating ASCVD. This variable can be derived from the chd and stroke variables. For these questions, we will be considering adjustment for age and sex using both stratified and regression analyses. For the stratified analyses, it will be necessary to use an appropriate categorization of age.
1. We are interested in analyzing associations between 5 year mortality and prevalence of ASCVD at study enrollment using statistical methods appropriate for binary response variables. The observation time for death among these subjects is potentially subject to censoring. Provide a statistical analysis demonstrating that such methods as logistic regression can be used to answer this question.
Methods: The association between 5 year mortality and ASCVD at study enrollment was assessed using the odds ratio comparing the odds of 5 year mortality among subjects with ASCVD to the odds of 5 year mortality among those without ASCVD. The odds ratio and its 95% confidence interval were calculated using a Wald statistic and its standard error based on the approximate normal distribution for the maximum likelihood estimates for a binomial distribution. A corresponding two-sided p-value was computed to test the null hypothesis of no difference in 5 year mortality between ASCVD groups.
Results: Among the 735 subjects enrolled in this study, 5 year mortality was observed in 53 of 465 subjects (11.4%) with ASCVD and 68 of 149 subjects (45.6%) without ASCVD.
We estimate that the odds of death within 5 years among those with ASCVD are 4.00 times as high as in those without ASCVD. Based on a 95% confidence interval, such an observation is consistent with the true odds of 5 year mortality being anywhere between 2.67 and 6.00 times as high among subjects with ASCVD as among those without. We therefore reject the null hypothesis of equal likelihood of 5 year mortality between subjects with and without ASCVD (two-sided p-value < 0.001).
2. Using the risk difference (RD) as a measure of association, provide statistical inference regarding an association between 5 year survival and baseline prevalence of ASCVD, adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command cs or an equivalent analysis in R).
Methods: The association between 5 year mortality and ASCVD at study enrollment was assessed using the risk difference, defined as the probability of 5 year mortality among subjects with ASCVD minus the probability of 5 year mortality among those of the same age category and sex without ASCVD. Age categories were defined as 65-70, 71-75, 76-80, 81-85, and 86-99. Stratum weights were determined according to ASCVD distribution. The age-and-sex-adjusted difference and its 95% confidence interval were calculated using a Wald statistic and its standard error based on the approximate normal distribution for the maximum likelihood estimates for a binomial distribution. A two-sided p-value was computed to test the null hypothesis of no difference in 5 year mortality between ASCVD groups using the chi-squared test.
Results: Among the 735 subjects enrolled in this study, 5 year mortality was observed in 53 of 465 subjects (11.4%) with ASCVD and 68 of 149 subjects (45.6%) without ASCVD. After adjusting for age and sex, we estimate that 5 year mortality is 19.3% higher (absolute difference) among those with ASCVD than it is in those of the same age category and sex without ASCVD.
Based on a 95% confidence interval, such an observation is consistent with true 5 year mortality being anywhere between 12.5% and 26.0% higher (absolute difference) among subjects with ASCVD compared to those without ASCVD of the same sex and age category. We therefore reject the null hypothesis of equal likelihood of 5 year mortality between subjects of the same sex and age category with and without ASCVD (two-sided p-value < 0.01).
b. Answer the question using an appropriate regression model.
Methods: The association between 5 year mortality and ASCVD at study enrollment was assessed using linear regression with robust standard errors to model the difference in risk of 5 year mortality between subjects with ASCVD and those of the same age and sex without ASCVD. The age-and-sex-adjusted difference and its 95% confidence interval were calculated using a Wald statistic and its standard error based on the approximate normal distribution for the maximum likelihood estimates for a binomial distribution. A corresponding two-sided p-value was computed to test the null hypothesis of no difference in 5 year mortality between ASCVD groups.
Results: Among the 735 subjects enrolled in this study, 5 year mortality was observed in 53 of 465 subjects (11.4%) with ASCVD and 68 of 149 subjects (45.6%) without ASCVD. After adjusting for age and sex, we estimate that 5 year mortality is 18.9% higher (absolute difference) among those with ASCVD than it is in those of the same age and sex without ASCVD. Based on a 95% confidence interval, such an observation is consistent with true 5 year mortality being anywhere between 12.1% and 25.7% higher (absolute difference) among subjects with ASCVD compared to those without ASCVD of the same sex and age. We therefore reject the null hypothesis of equal likelihood of 5 year mortality between subjects of the same sex and age with and without ASCVD (two-sided p-value < 0.001).
c. What is the difference in the statistical models you used?
That is, how would you explain any differences between the two analysis approaches?
Answer: Differences in results obtained from analyses A and B can be attributed to the categorization of age in analysis A (I kept age as a continuous variable in my regression model for analysis B). Moreover, the regression model in analysis B borrows information across groups, which the stratified analysis A does not. Finally, stratified analyses automatically adjust for interactions between variables, whereas I chose not to adjust for interaction in my regression model.
3. Using the odds ratio (OR) as a measure of association, provide statistical inference regarding an association between 5 year survival and baseline prevalence of ASCVD, adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command cc or an equivalent analysis in R).
Methods: The association between 5 year mortality and ASCVD at study enrollment was assessed using the odds ratio comparing the odds of 5 year mortality among subjects with ASCVD to the odds of 5 year mortality among those of the same age category and sex without ASCVD. Age categories were defined as 65-70, 71-75, 76-80, 81-85, and 86-99. The Mantel-Haenszel statistic was used to determine stratum weights. The age-and-sex-adjusted odds ratio and its 95% confidence interval were calculated using a Wald statistic and its standard error based on the approximate normal distribution for the maximum likelihood estimates for a binomial distribution. A two-sided p-value was computed to test the null hypothesis of no difference in 5 year mortality between ASCVD groups using the chi-squared test.
Results: Among the 735 subjects enrolled in this study, 5 year mortality was observed in 53 of 465 subjects (11.4%) with ASCVD and 68 of 149 subjects (45.6%) without ASCVD.
We estimate that the odds of death within 5 years among those with ASCVD are 3.50 times as high as in those subjects of the same age category and sex without ASCVD. Based on a 95% confidence interval, such an observation is consistent with the true odds of 5 year mortality being anywhere between 2.31 and 5.29 times as high among subjects with ASCVD compared to those of the same sex and age category without ASCVD. We therefore reject the null hypothesis of equal likelihood of 5 year mortality between subjects with and without ASCVD (two-sided p-value < 0.001).
b. Answer the question using an appropriate regression model.
Methods: The association between 5 year mortality and ASCVD at study enrollment was assessed using adjusted logistic regression to model the odds ratio comparing the odds of 5 year mortality among subjects with ASCVD to the odds of 5 year mortality among those of the same age and sex without ASCVD. The age-and-sex-adjusted odds ratio and its 95% confidence interval were calculated using a Wald statistic and its standard error based on the approximate normal distribution for the maximum likelihood estimates for a binomial distribution. A corresponding two-sided p-value was computed to test the null hypothesis of no difference in 5 year mortality between ASCVD groups.
Results: Among the 735 subjects enrolled in this study, 5 year mortality was observed in 53 of 465 subjects (11.4%) with ASCVD and 68 of 149 subjects (45.6%) without ASCVD. We estimate that the odds of death within 5 years among those with ASCVD are 3.57 times as high as in those subjects of the same age and sex without ASCVD. Based on a 95% confidence interval, such an observation is consistent with the true odds of 5 year mortality being anywhere between 2.36 and 5.38 times as high among subjects with ASCVD compared to those of the same age and sex without ASCVD.
We therefore reject the null hypothesis of equal likelihood of 5 year mortality between subjects with and without ASCVD (two-sided p-value < 0.001).
c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?
Answer: Differences in results obtained from analyses A and B can be attributed to the categorization of age in analysis A (I kept age as a continuous variable in my regression model for analysis B). Moreover, the regression model in analysis B borrows information across groups, which the stratified analysis A does not. Finally, stratified analyses automatically adjust for interactions between variables, whereas I chose not to adjust for interaction in my regression model.
4. Using the relative risk (RR) as a measure of association, provide statistical inference regarding an association between 5 year survival and baseline prevalence of ASCVD, adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command ir or an equivalent analysis in R).
Methods: The association between 5 year mortality and ASCVD at study enrollment was assessed using the relative risk comparing the probability of 5 year mortality among subjects with ASCVD to the probability of 5 year mortality among those of the same age category and sex without ASCVD. Age categories were defined as 65-70, 71-75, 76-80, 81-85, and 86-99. The Mantel-Haenszel statistic was used to determine stratum weights. The age-and-sex-adjusted relative risk and its 95% confidence interval were calculated using a Wald statistic and its standard error based on the approximate normal distribution for the maximum likelihood estimates for a binomial distribution. A two-sided p-value was computed to test the null hypothesis of no difference in 5 year mortality between ASCVD groups using the chi-squared test.
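As an illustrative sketch of the Mantel-Haenszel pooling described in the Methods above (not the Stata or R code actually used for this homework), the following Python function computes a stratum-weighted relative risk with a Wald interval on the log scale. The function name, the tuple layout, and the example counts are hypothetical, and the use of the Greenland-Robins variance formula is an assumption about what the software reports.

```python
import math

def mantel_haenszel_rr(strata, z=1.96):
    """Pooled relative risk across strata with Mantel-Haenszel weights.

    Each stratum is a tuple (a, n1, c, n0):
      a  = deaths among exposed,   n1 = number exposed
      c  = deaths among unexposed, n0 = number unexposed
    Returns (rr, lower, upper), with a Wald interval on the log scale
    using the Greenland-Robins variance (assumed here for illustration).
    """
    num = den = var_num = 0.0
    for a, n1, c, n0 in strata:
        n = n1 + n0
        num += a * n0 / n          # weighted exposed risk
        den += c * n1 / n          # weighted unexposed risk
        var_num += ((a + c) * n1 * n0 - a * c * n) / n**2
    rr = num / den
    se_log = math.sqrt(var_num / (num * den))
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

# Hypothetical counts for two age-by-sex strata (NOT taken from mri.txt).
example = [(10, 100, 5, 100), (20, 80, 12, 120)]
rr, lo, hi = mantel_haenszel_rr(example)
print(f"MH RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With a single stratum the pooled estimate reduces to the crude relative risk, which is a quick way to check the weighting.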
Results: Among the 735 subjects enrolled in this study, 5 year mortality was observed in 53 of 465 subjects (11.4%) with ASCVD and 68 of 149 subjects (45.6%) without ASCVD. We estimate that the risk of death within 5 years among those with ASCVD is 2.63 times as high as in those subjects of the same age category and sex without ASCVD. Based on a 95% confidence interval, such an observation is consistent with the true risk of 5 year mortality being anywhere between 1.92 and 3.62 times as high among subjects with ASCVD compared to those of the same sex and age category without ASCVD. We therefore reject the null hypothesis of equal likelihood of 5 year mortality between subjects with and without ASCVD (two-sided p-value < 0.01).
b. Answer the question using an appropriate regression model.
Methods: The association between 5 year mortality and ASCVD at study enrollment was assessed using adjusted Poisson regression to model the relative risk comparing the probability of 5 year mortality among subjects with ASCVD to the probability of 5 year mortality among those of the same age and sex without ASCVD. The age-and-sex-adjusted relative risk and its 95% confidence interval were calculated using a Wald statistic and its standard error based on the approximate normal distribution for the maximum likelihood estimates for a binomial distribution. A corresponding two-sided p-value was computed to test the null hypothesis of no difference in 5 year mortality between ASCVD groups.
Results: Among the 735 subjects enrolled in this study, 5 year mortality was observed in 53 of 465 subjects (11.4%) with ASCVD and 68 of 149 subjects (45.6%) without ASCVD. We estimate that the risk of death within 5 years among those with ASCVD is 2.72 times as high as in those subjects of the same age and sex without ASCVD.
Based on a 95% confidence interval, such an observation is consistent with the true risk of 5 year mortality being anywhere between 1.89 and 3.91 times as high among subjects with ASCVD compared to those of the same sex and age without ASCVD. We therefore reject the null hypothesis of equal likelihood of 5 year mortality between subjects with and without ASCVD (two-sided p-value < 0.001).
c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?
Answer: Differences in results obtained from analyses A and B can be attributed to the categorization of age in analysis A (I kept age as a continuous variable in my regression model for analysis B). Moreover, the regression model in analysis B borrows information across groups, which the stratified analysis A does not. Finally, stratified analyses automatically adjust for interactions between variables, whereas I chose not to adjust for interaction in my regression model.
5. Comment very briefly on the similarity or differences among the three approaches. Which would you tend to prefer in general? Why?
Answer: The stratified analyses performed in questions 2, 3, and 4 are similar in that age was treated as a categorical variable and there was no borrowing of information across strata. The regression analyses performed in questions 2, 3, and 4 are similar in that age was treated as a continuous variable, and borrowing of information occurred across strata. A priori, I would have chosen to model the risk difference using a robust linear regression model (analysis 2b), to maximize the amount of information obtained from our strata. Moreover, the risk difference reflects an additive process, whereas there are potential issues of collapsibility when dealing with odds ratios or risk ratios (multiplicative processes).
Question 6 pertains to the analysis of colorectal cancer incidence for whites living in the U.S. as a function of birthplace (U.S.
born vs foreign born) (see datafile surveillance.txt and documentation surveillance.doc on the class web pages).
6. Using the incidence ratio as a measure of association, provide inference for an association between incidence of colorectal cancer and birthplace, after adjustment for age, sex, and SEER.
a. Answer the question using directly standardized rates, with standardization to the U.S. population.
Methods: The incidence of colorectal cancer among whites living in the U.S. was compared between foreign born and U.S. born subjects using the number of colorectal cancer diagnoses made at various SEER sites over the course of 12 years and adjusting for age, sex, and SEER site. To adjust for varying lengths of follow-up within each SEER site, observation time was converted to person-years at risk. Note that the number of colorectal cancers is negligible compared to the overall population, so we did not subtract out people once they were diagnosed with colorectal cancer. Age was measured in five year categories (e.g. 0-5 years, 5-10 years, etc.). Stratum-specific incidence rates were combined by direct standardization, with the U.S. population as the standard.
Results: Among the 972 subjects enrolled in this study, birthplace was known in 324 subjects born in the U.S. and 324 subjects born outside of the U.S. We estimate that the incidence of colorectal cancer among subjects with known birthplace is 1.02 times as high among foreign born subjects as it is in subjects of the same age, sex, and SEER site who were U.S. born. Based on a 95% confidence interval, such an observation is consistent with the true incidence of colorectal cancer among foreign born subjects being anywhere between 0.987 and 1.046 times that among subjects of the same age, sex, and SEER site who were U.S. born. We therefore fail to reject the null hypothesis of no difference in colorectal cancer incidence by subjects' birthplace (two-sided p-value > 0.05).
b.
Answer the question using an appropriate regression model.
Methods: The incidence of colorectal cancer among whites living in the U.S. was compared between foreign born and U.S. born subjects using a robust Poisson regression model adjusting for age, sex, and SEER site. To adjust for varying lengths of follow-up within each SEER site, observation time was converted to person-years at risk. Note that the number of colorectal cancers is negligible compared to the overall population, so we did not subtract out people once they were diagnosed with colorectal cancer. Age was measured in five year categories (e.g. 0-5 years, 5-10 years, etc.). Statistical inference on the ratio of the incidence rates of colorectal cancer was based on the Wald statistic computed from the regression slope parameter and its standard error as estimated using the Huber-White sandwich estimator. A two-sided p-value and 95% confidence interval were computed using the approximate normal distribution for Poisson regression parameter estimates.
Results: Among the 972 subjects enrolled in this study, birthplace was known in 324 subjects born in the U.S. and 324 subjects born outside of the U.S. We estimate that the incidence of colorectal cancer among subjects with known birthplace among foreign born subjects is 0.988 times that in subjects of the same age, sex, and SEER site who were U.S. born. Based on a 95% confidence interval, such an observation is consistent with the true incidence of colorectal cancer among foreign born subjects being anywhere between 0.892 and 1.09 times that among subjects of the same age, sex, and SEER site who were U.S. born. We therefore fail to reject the null hypothesis of no difference in colorectal cancer incidence by subjects' birthplace (two-sided p-value = 0.811).
c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?
Answer: The models from parts A and B return similar results, neither of which is statistically significant. The main difference between the two analyses can be attributed to the inherent consideration of interaction terms between strata in model A. We chose to ignore such interaction terms in model B.
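To make the part A calculation concrete, the directly standardized comparison can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the analysis code used for the homework: the function names, the three-stratum layout, and all counts, person-years, and standard-population weights are hypothetical and are not values from surveillance.txt.

```python
def directly_standardized_rate(cases, person_years, std_weights):
    """Weighted average of stratum-specific rates, weighted by the standard population."""
    total_weight = sum(std_weights)
    weighted = sum(
        w * (d / py) for d, py, w in zip(cases, person_years, std_weights)
    )
    return weighted / total_weight

def standardized_rate_ratio(group_a, group_b, std_weights):
    """Ratio of directly standardized rates; each group is (cases, person_years)."""
    rate_a = directly_standardized_rate(group_a[0], group_a[1], std_weights)
    rate_b = directly_standardized_rate(group_b[0], group_b[1], std_weights)
    return rate_a / rate_b

# Hypothetical strata (e.g. age-by-sex-by-site cells), NOT real SEER data.
std_pop = [5000, 3000, 2000]                      # standard population counts
foreign_born = ([2, 9, 33], [10000, 9000, 6000])  # (cases, person-years)
us_born = ([4, 20, 50], [20000, 20000, 10000])
ratio = standardized_rate_ratio(foreign_born, us_born, std_pop)
print(f"standardized incidence ratio = {ratio:.3f}")
```

Because every stratum receives its own rate before weighting, this approach makes no assumption that the birthplace effect is constant across strata, which is the contrast with the main-effects Poisson model drawn in the answer above.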