Biost 536: Categorical Data Analysis in Epidemiology
Emerson, Fall 2014
Homework #2
October 5, 2014

Written problems: In all problems requesting “statistical analyses” (either descriptive or inferential), you should present both

Methods: A brief sentence or paragraph describing the statistical methods you used. This should use wording suitable for a scientific journal, though it might be a little more detailed. A reader should be able to reproduce your analysis. DO NOT PROVIDE Stata OR R CODE.

Inference: A paragraph providing full statistical inference in answer to the question. Please see the supplementary document relating to “Reporting Associations” for details.

For these questions, we will be considering adjustment for age and sex using both stratified and regression analyses. For the stratified analyses, it will be necessary to use an appropriate categorization of age.

1. We are interested in analyzing associations between 5 year mortality and prevalence of ASCVD at study enrollment using statistical methods appropriate for binary response variables. The observation time for death among these subjects is potentially subject to censoring. Provide a statistical analysis demonstrating that such methods as logistic regression can be used to answer this question.

Methods: To evaluate the association between 5-year mortality and prevalence of ASCVD, logistic regression was used to model the odds of death within 5 years (a binary outcome) as a function of the binary predictor of interest, history of CHD (ASCVD). Robust standard errors were used to account for the heteroscedasticity of the binary outcome variable, mortality in 5 years. 95% confidence intervals were Wald-type intervals based on the approximate normal distribution of the maximum likelihood estimates for a binomial distribution.
Age, as a continuous variable, and sex, as a binary variable, were included in the model to adjust for age and sex differences in the relationship between ASCVD and 5-year mortality. To account for right-censored data (i.e., subjects whose follow-up ended before 5 years without an observed death), post hoc analyses were completed using tobit regression.

Inference: Adjusted for sex and age, the odds of death in 5 years in individuals with a history of ASCVD are 3.57 times the odds of death in 5 years in individuals without a history of ASCVD (95% CI 2.36-5.39). Based on these data, we can reject the null hypothesis that the odds of death in 5 years are equal among individuals with and without a history of ASCVD (p-value <.001), after adjusting for both age and sex.

**Given question 3b below, I am wondering whether, despite the introductory paragraphs indicating that all analyses should be adjusted for age and sex, we were supposed to present unadjusted estimates here. The unadjusted estimate would be 4.0 (2.6-6.1), p-value <.001.

2. Using the risk difference (RD) as a measure of association, provide statistical inference regarding an association between 5 year survival and baseline prevalence of ASCVD, adjusting for age and sex.

a. Answer the question using a stratified analysis (e.g., using Stata command cs or an equivalent analysis in R).

Methods: A standard Mantel-Haenszel analysis was used to calculate stratum-specific risk estimates for the dependent variable, 5-year mortality, based on the binary POI, history of ASCVD, stratified by the binary variable, sex, and the categorized variable, age (in 5-year increments). 95% confidence intervals were Wald-type intervals based on the approximate normal distribution of the maximum likelihood estimates for a binomial distribution, and exact methods were used to calculate the p-value because of small sample sizes in some stratum-specific age categories.
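As a rough illustration of the stratified approach described in these Methods, the Python sketch below pools stratum-specific risk differences using Mantel-Haenszel weights. The function name `mh_risk_difference` and all counts are hypothetical and are not taken from the study data; the sketch computes only the point estimate and makes no attempt at the confidence interval or p-value.

```python
# Sketch: Mantel-Haenszel pooled risk difference across strata.
# All names and counts here are hypothetical, not the study data.

def mh_risk_difference(strata):
    """Pool stratum-specific risk differences with Mantel-Haenszel weights.

    Each stratum is a tuple:
        (exposed_deaths, exposed_total, unexposed_deaths, unexposed_total)
    The weight for a stratum is n1 * n0 / (n1 + n0), which depends only on
    the group sizes and not on the estimated risks.
    """
    numerator = 0.0
    denominator = 0.0
    for d1, n1, d0, n0 in strata:
        if n1 == 0 or n0 == 0:
            # A stratum with an empty group has weight zero; skip it.
            continue
        weight = n1 * n0 / (n1 + n0)
        risk_difference = d1 / n1 - d0 / n0
        numerator += weight * risk_difference
        denominator += weight
    return numerator / denominator


# Two hypothetical age-sex strata.
hypothetical_strata = [
    (10, 50, 5, 50),     # risk difference 0.20 - 0.10
    (30, 100, 10, 100),  # risk difference 0.30 - 0.10
]
pooled = mh_risk_difference(hypothetical_strata)
print(f"Pooled MH risk difference: {pooled:.4f}")
```

Because the weights are built from group sizes alone, larger strata dominate the pooled estimate regardless of how variable the risk estimates within them are.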
Inference: The absolute risk difference was 21.1% (95% CI 14.4%-27.8%) in the stratum-specific adjusted model. That is to say, adjusted for sex and age, the absolute risk of death in 5 years in individuals with a history of ASCVD is 21.1 percentage points higher than the risk of death in 5 years in individuals without a history of ASCVD (95% CI 14.4%-27.8%).

b. Answer the question using an appropriate regression model.

Methods: To evaluate the association between 5-year mortality and prevalence of ASCVD, linear regression was used to estimate the absolute risk difference in mortality at 5 years by the binary predictor of interest, history of CHD (ASCVD). Robust standard errors were used to account for the heteroscedasticity of the binary outcome variable, mortality in 5 years. 95% confidence intervals were Wald-type intervals based on the approximate normal distribution of the regression coefficient estimates. Age, as a continuous variable, and sex, as a binary variable, were included in the model to adjust for age and sex differences in the relationship between ASCVD and 5-year mortality. To account for right-censored data (i.e., subjects whose follow-up ended before 5 years without an observed death), post hoc analyses were completed using tobit regression.

Inference: Adjusted for sex and age, the absolute risk of death in 5 years in individuals with a history of ASCVD is 18.9 percentage points higher than the risk of death in 5 years in individuals without a history of ASCVD (95% CI 12.2%-25.7%). Based on these data, we can reject the null hypothesis that there is no difference in the risk of death in 5 years among individuals with and without a history of ASCVD (p-value <.001), after adjusting for both age and sex.

c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?
The MH approach (the former) weights the data by sample sizes, but ignores the mean-variance relationship in choosing those weights. The regression approach takes the mean-variance relationship into account when weighting the data.

3. Using the odds ratio (OR) as a measure of association, provide statistical inference regarding an association between 5 year survival and baseline prevalence of ASCVD, adjusting for age and sex.

a. Answer the question using a stratified analysis (e.g., using Stata command cc or an equivalent analysis in R).

Methods: A standard Mantel-Haenszel analysis was used to calculate stratum-specific odds ratios for mortality in 5 years based on the binary POI, history of ASCVD. The analysis was stratified by the binary variable, sex, and the categorized variable, age (in 5-year increments). 95% confidence intervals were Wald-type intervals based on the approximate normal distribution of the maximum likelihood estimates for a binomial distribution, and exact methods were used to calculate the p-value because of small sample sizes in some stratum-specific age categories.

Inference: Using this approach, adjusted for age, the odds of death in 5 years in females with a history of ASCVD are 3.36 times the odds of death in 5 years in females without a history of ASCVD (95% CI 1.64-6.81); among males, the corresponding odds ratio is 4.02 (95% CI 2.31-7.03). Based on these data, we can reject the null hypothesis that the odds of death in 5 years are equal among those with and without a history of ASCVD, in both males and females (p-value <.001), after adjusting for age. (I could not figure out the Stata commands to calculate these together. cc with istandard, estandard, or given weights did not work.)

b. Answer the question using an appropriate regression model.
Methods: To evaluate the association between 5-year mortality and prevalence of ASCVD, logistic regression was used to model the odds of death within 5 years (a binary outcome) as a function of the binary predictor of interest, history of CHD (ASCVD). Robust standard errors were used to account for the heteroscedasticity of the binary outcome variable, mortality in 5 years. 95% confidence intervals were Wald-type intervals based on the approximate normal distribution of the maximum likelihood estimates for a binomial distribution. Age, as a continuous variable, and sex, as a binary variable, were included in the model to adjust for age and sex differences in the relationship between ASCVD and 5-year mortality.

Inference: Adjusted for sex and age, the odds of death in 5 years in individuals with a history of ASCVD are 3.57 times the odds of death in 5 years in individuals without a history of ASCVD (95% CI 2.36-5.39). Based on these data, we can reject the null hypothesis that the odds of death in 5 years are equal among individuals with and without a history of ASCVD (p-value <.001), after adjusting for both age and sex.

c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?

The MH approach weights the data by sample sizes, but ignores the mean-variance relationship in choosing those weights. The regression approach takes the mean-variance relationship into account when weighting the data.

4. Using the risk ratio (RR) as a measure of association, provide statistical inference regarding an association between 5 year survival and baseline prevalence of ASCVD, adjusting for age and sex.

a. Answer the question using a stratified analysis (e.g., using Stata command ir or an equivalent analysis in R).

Methods: A standard Mantel-Haenszel analysis was used to calculate stratum-specific risk ratios for mortality in 5 years based on the binary POI, history of ASCVD.
The analysis was stratified by the binary variable, sex, and the categorized variable, age (in 5-year increments). 95% confidence intervals were Wald-type intervals based on the approximate normal distribution of the maximum likelihood estimates for a binomial distribution.

Inference: Using this approach, adjusted for age, the risk of death in 5 years in females with a history of ASCVD is 2.5 times the risk of death in 5 years in females without a history of ASCVD (95% CI 1.64-6.81); among males, the corresponding relative risk is 3.16 (95% CI 2.04-5.38). Based on these data, we can reject the null hypothesis that the risk of death in 5 years is equal among those with and without a history of ASCVD, in both males and females (p-value <.001), after adjusting for age. (As above, with MH analyses, I could not find the Stata commands to calculate point estimates with more than one stratification variable.)

b. Answer the question using an appropriate regression model.

Methods: Poisson regression was used to calculate the relative risk of the binary dependent variable, death in 5 years, in individuals with a history of ASCVD compared to those without, adjusting for age (continuous, modeled linearly) and sex (binary). Poisson regression was chosen because it approximates the binomial model when the outcome is rare. Robust standard errors were used to account for the mean-variance relationship.

Inference: Adjusting for age and sex, the risk of death in 5 years in individuals with a history of ASCVD is 2.72 (95% CI 1.89-3.91) times the risk of death in 5 years among individuals without a history of ASCVD. These data provide evidence to reject the null hypothesis that there is no difference in risk of death based on ASCVD status (p-value <.001).

c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?
The Poisson regression approach accounts for the mean-variance relationship with robust SE estimates, while the MH technique ignores the mean-variance relationship of binary variables.

5. Comment very briefly on the similarity or differences among the three approaches. Which would you tend to prefer in general? Why?

They all estimate differential degrees of risk. I prefer the RR or OR from a clinician's perspective, as patients can understand “relative risks” easily. I think RDs help from an epidemiologic and public health/policy perspective. I tend to prefer the regression approach to analyses in that you don’t lose characteristics of the data by categorizing continuous variables.

Question 6 pertains to the analysis of colorectal cancer incidence for whites living in the U.S. as a function of birthplace (U.S. born vs foreign born) (see datafile surveillance.txt and documentation surveillance.doc on the class web pages).

6. Using the incidence ratio as a measure of association, provide inference for an association between incidence of colorectal cancer and birthplace, after adjustment for age, sex, and SEER.

a. Answer the question using directly standardized rates, with standardization to the U.S. population.

Methods: Using stratified analyses, an incidence rate ratio and its 95% confidence interval were calculated for the incidence of CRC in whites living in the US as a function of birthplace. An incidence rate was chosen as the estimand of interest because the data are counts with a person-year denominator. Strata of interest included age and sex. The data were weighted with the sum over the values within strata.

Inference: The standardized incidence rate ratio (IRR) comparing CRC incidence among US-born individuals to that among foreign-born individuals was 1.02 (95% CI .99-1.05) over combined sex strata.
Based on these data, we fail to reject the null hypothesis that there is no difference in incidence rates among all individuals when adjusted for age. When stratified by sex, the IRR for males was 1.05 (95% CI 1.00-1.09) and the IRR for females was .99 (95% CI .95-1.03); again, we fail to reject the null hypothesis within each sex stratum as well as in the combined sex strata.

b. Answer the question using an appropriate regression model.

Methods: Poisson regression was used to calculate the IRR for our outcome of interest, CRC incidence, based on our binary predictor of interest, birthplace, adjusted for the covariates age, sex, and SEER site. 95% Wald-type CIs were calculated with 3 degrees of freedom.

Inference: After adjusting for age, sex, and SEER site, the incidence of CRC in foreign-born individuals is .99 times the incidence of CRC in US-born individuals (95% CI .89-1.09). These data do not provide evidence to support the alternative hypothesis of a difference in incidence, so we fail to reject the null hypothesis of no difference (p-value = .811).

c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?

The stratified analysis used the standardization weights to compute weighted averages of the US-born and foreign-born CRC incidence rates and then took the ratio of those averages. Conversely, the Poisson regression used a log link to first estimate the rates, then averaged the differences on the log scale, then exponentiated that difference to estimate the geometric mean (rather than the arithmetic mean) of the stratum-specific IRRs. In the first analysis above, I initially averaged over possible effect modification (EM) by sex; when the analysis was repeated within individual sex strata, there was no meaningful difference in the adjusted IRR.
Hence, averaging over EM by sex, as done in Poisson regression, is not a scientifically meaningful issue here, and I will thus conclude it is “ok”: Poisson regression is an acceptable method for this analysis.
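The contrast drawn in 6c between a ratio of standardized (arithmetic-mean) rates and a geometric mean of stratum-specific rate ratios can be sketched numerically. The Python below is only an illustration under stated assumptions: the function names, weights, and rates are hypothetical, and the same standard-population weights are used for both summaries purely for comparison. A fitted Poisson model would weight strata by their information rather than by a standard population.

```python
import math

# Sketch: two ways to summarize a rate ratio across strata.
# All names, weights, and rates are hypothetical illustrations.

def standardized_rate_ratio(weights, rates_a, rates_b):
    """Ratio of directly standardized rates (arithmetic means of rates)."""
    std_a = sum(w * r for w, r in zip(weights, rates_a))
    std_b = sum(w * r for w, r in zip(weights, rates_b))
    return std_a / std_b


def geometric_mean_rate_ratio(weights, rates_a, rates_b):
    """Weighted average of log rate ratios, exponentiated back to a ratio."""
    total = sum(weights)
    mean_log_ratio = sum(
        w * math.log(ra / rb) for w, ra, rb in zip(weights, rates_a, rates_b)
    ) / total
    return math.exp(mean_log_ratio)


# Hypothetical standard-population weights and rates per 1,000 person-years.
weights = [0.5, 0.5]
rates_group_a = [2.0, 8.0]
rates_group_b = [1.0, 8.0]

print(standardized_rate_ratio(weights, rates_group_a, rates_group_b))
print(geometric_mean_rate_ratio(weights, rates_group_a, rates_group_b))
```

When the stratum-specific rate ratios are identical, the two summaries coincide; they diverge only to the extent that the ratio varies across strata, which is consistent with the conclusion that the choice matters little when there is no meaningful effect modification.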