Code: 2542
Categorical Analysis Homework #2
Due: Sunday, October 12 by 5:30 pm

Question 1. Methods: The data were stratified on mortality, and the mean, standard deviation, minimum, and maximum observation time for each group are reported in Table 1. Observation time was the period between enrollment and either death or September 16, 1997, whichever came first.

Table 1: Comparison of mean and range of observation time stratified by outcome

                            5-yr Survival, n=614 (83.5%)    Died within 5 yrs, n=121 (16.5%)
                            Mean (SD)      Range            Mean (SD)      Range
Observation Time (years)    5.33 (0.29)    5.00-5.91        2.97 (1.40)    0.19-4.98

Inference: Seven hundred and thirty-five patients were enrolled in this prospective cohort study to investigate the incidence of cardiovascular (ASCVD) and cerebrovascular disease. To investigate associations between 5-year mortality and prevalence of ASCVD, we need complete information on all participants for at least 5 years of follow-up. We would be concerned about censoring if individuals in the group that survived were not observed for at least 5 years, because this would leave their 5-year survival status uncertain. If the data were censored, categorical analyses, including logistic regression, would not be sufficient; alternative methods such as Kaplan-Meier survival analysis would be necessary. Table 1 shows observation time stratified by survivor status. There were 614 (83.5%) individuals who survived to the end of the observation period and 121 (16.5%) who died. Of those who survived, the minimum observation time was 5.00 years, indicating that all participants met the follow-up criterion and that categorical analysis is an appropriate statistical approach for investigating the association between the ASCVD risk factor and 5-year mortality.

Question 2A. Methods: The study protocol only included Medicare enrollees aged 65 and older.
Age was categorized a priori as a binary variable (65-74 years and ≥ 75 years) based on the understanding that people must have been healthy enough to reach age 65, and that age 75 is a clinically relevant age at which health is expected to decline. ASCVD was defined as a history of prior angina, myocardial infarction, transient ischemic attack, or stroke. Survival was compared using the difference in 5-year survival probability between participants with and without ASCVD. An age- and sex-adjusted estimate of the risk difference was calculated using internal standardization. A Wald-type 95% confidence interval for the difference in survival probabilities was calculated using the normal approximation. The alpha level for significance is 0.05, and p-values are calculated using the Mantel-Haenszel chi-square test of the null hypothesis of no absolute difference in survival probability between those with and without ASCVD.

Inference: In this cohort study of 735 Medicare enrollees, the observed (unadjusted) probability of 5-year survival was 68.7% (149/217) among those with an ASCVD diagnosis at study enrollment compared to 89.8% (465/518) among those without an ASCVD diagnosis. After adjustment for age and sex, an ASCVD diagnosis was associated with a 19.4 percentage-point higher absolute probability of death within 5 years. Based on the 95% confidence interval, this is consistent with a true absolute difference in survival between 12.5% and 26.2%. We reject the null hypothesis of equal survival probability by ASCVD diagnosis at enrollment (p<0.0001).

Question 2B. Methods: The study protocol only included Medicare enrollees aged 65 and older. ASCVD was defined as a history of prior angina, myocardial infarction, transient ischemic attack, or stroke. Survival was compared using the difference in 5-year survival probability between participants with and without ASCVD.
Because age and mortality are known a priori to have a curvilinear relationship, a higher-order age term was included to account for this non-linearity. A risk difference adjusted for continuous age and binary sex was estimated using linear regression with robust standard errors. The alpha level for significance is 0.05. The risk difference and 2-sided 95% confidence interval are reported.

Inference: In this cohort study of 735 Medicare enrollees, the observed (unadjusted) probability of 5-year survival was 68.7% (149/217) among those with an ASCVD diagnosis at study enrollment compared to 89.8% (465/518) among those without an ASCVD diagnosis. After adjustment for age and sex, an ASCVD diagnosis was associated with a 19.1 percentage-point higher absolute probability of death within 5 years. Based on the 95% confidence interval, this is consistent with a true absolute difference in survival between 12.3% and 25.8%. A two-sided p-value of <0.0001 indicates that we can reject, with high confidence, the null hypothesis of equal survival probability by ASCVD diagnosis at enrollment.

Question 2C. The regression model allowed more flexibility in modeling the age covariate, which can be entered as continuous or discrete. In the stratified analysis, age was binary (65-74 years and ≥ 75 years), which implies a stepwise relationship with 5-year mortality across the 4 age and sex subgroups. The linear regression allowed the relationship to vary smoothly, reflecting the curvilinear relationship between age and mortality. Furthermore, regression modeling allows robust estimation of the standard errors to account for unequal variance in the exposed and unexposed groups.
Finally, the linear regression makes a comparison across all observations and accounts for differences in the mean-variance relationship, because the estimator borrows information across all strata more efficiently than the stratified analysis. The measure of association, the risk difference, was similar between the two analyses: 19.4% for the stratified analysis compared to 19.1% for the linear regression with robust standard errors. The statistical inference for the association between the predictor of interest, ASCVD, and the outcome, 5-year mortality, is the same in both analyses.

Question 3A. Methods: Age was categorized a priori as a binary variable (65-74 years and ≥ 75 years) based on the understanding that people must have been healthy enough to reach age 65, and that age 75 is a clinically relevant age at which health is expected to decline. ASCVD was defined as a history of prior angina, myocardial infarction, transient ischemic attack, or stroke. Survival was compared using the ratio of the odds of ASCVD among participants who died to the odds of ASCVD among participants who survived. An age- and sex-adjusted estimate of the odds ratio was calculated using Mantel-Haenszel weights. An exact 95% confidence interval for the odds ratio was calculated. The alpha level for significance is 0.05, and p-values are calculated using the Mantel-Haenszel chi-square test of the null hypothesis of no difference in the odds of survival (OR=1) between those with and without ASCVD.

Inference: In this cohort study of 735 Medicare enrollees, the observed (unadjusted) probability of 5-year survival was 68.7% (149/217) among those with an ASCVD diagnosis at study enrollment compared to 89.8% (465/518) among those without an ASCVD diagnosis. After adjustment for age and sex, an ASCVD diagnosis was associated with 3.28-fold higher odds of death within 5 years.
Based on the exact 95% confidence interval, this is consistent with 2.21- to 4.87-fold higher odds of mortality with an ASCVD diagnosis. We reject the null hypothesis of equal odds of survival by ASCVD diagnosis at enrollment (p<0.0001).

Question 3B. Methods: ASCVD was defined as a history of prior angina, myocardial infarction, transient ischemic attack, or stroke. Survival was compared between participants with and without ASCVD using the odds ratio. Because age and mortality are known a priori to have a curvilinear relationship, a higher-order age term was included to account for this non-linearity. An odds ratio adjusted for continuous age and binary sex was estimated using logistic regression with robust standard errors. The alpha level for significance is 0.05. The point estimate for the odds ratio and 2-sided 95% confidence interval are reported.

Inference: In this cohort study of 735 Medicare enrollees, the observed (unadjusted) probability of 5-year survival was 68.7% (149/217) among those with an ASCVD diagnosis at study enrollment compared to 89.8% (465/518) among those without an ASCVD diagnosis. After adjustment for age and sex, an ASCVD diagnosis was associated with 3.58-fold higher odds of death within 5 years. Based on the 95% confidence interval, this is consistent with 2.36- to 5.44-fold higher odds of mortality. A two-sided p-value of <0.0001 indicates that we can reject, with high confidence, the null hypothesis of equal odds of survival by ASCVD diagnosis at enrollment.

Question 3C. The regression model allowed more flexibility in modeling the age covariate, which can be entered as continuous or discrete. In the stratified analysis, age was binary (65-74 years and ≥ 75 years), which implies a stepwise relationship with 5-year mortality across the 4 age and sex subgroups.
The logistic regression allowed the relationship to vary smoothly, reflecting the curvilinear relationship between age and mortality. Furthermore, regression modeling allows robust estimation of the standard errors to account for unequal variance in the exposed and unexposed groups. Finally, the regression analysis makes a comparison across all observations and accounts for differences in the mean-variance relationship, because the maximum likelihood estimator borrows information across all strata more efficiently than the stratified analysis. The measure of association, the odds ratio, was similar between the two analyses: 3.28-fold higher odds (95% CI: 2.21-4.87) for the stratified analysis compared to 3.58-fold higher odds (95% CI: 2.36-5.44) for the regression analysis with robust standard errors. The statistical inference for the association between the predictor of interest, ASCVD, and the outcome, 5-year mortality, is the same in both analyses.

Question 4A. Methods: Age was categorized a priori as a binary variable (65-74 years and ≥ 75 years) based on the understanding that people must have been healthy enough to reach age 65, and that age 75 is a clinically relevant age at which health is expected to decline. ASCVD was defined as a history of prior angina, myocardial infarction, transient ischemic attack, or stroke. Survival was compared using the ratio of the risk of death among participants with ASCVD to the risk of death among participants without ASCVD. An age- and sex-adjusted estimate of the relative risk was calculated using Mantel-Haenszel weights. A 95% confidence interval for the relative risk was calculated. The alpha level for significance is 0.05, and p-values are calculated using the Mantel-Haenszel chi-square test of the null hypothesis of no difference in survival probabilities (RR=1) between those with and without ASCVD.
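
The Mantel-Haenszel pooled relative risk described in the Methods above can be sketched as follows. This is a minimal illustration, not the analysis code used for this homework. The function name `mantel_haenszel_rr` is hypothetical, and the four stratum counts are invented: they were chosen only so that they sum to the crude totals implied by this writeup (68 deaths among 217 participants with ASCVD, 53 deaths among 518 without).

```python
from typing import Iterable, Tuple

# One stratum: (deaths_exposed, n_exposed, deaths_unexposed, n_unexposed)
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_rr(strata: Iterable[Stratum]) -> float:
    """Mantel-Haenszel pooled relative risk across strata.

    RR_MH = sum(a_i * n0_i / N_i) / sum(c_i * n1_i / N_i), where a_i and c_i
    are deaths among exposed and unexposed, n1_i and n0_i are the group
    sizes, and N_i = n1_i + n0_i is the stratum total.
    """
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata:
        total = n1 + n0
        if total == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * n0 / total
        denominator += c * n1 / total
    if denominator == 0:
        raise ValueError("no deaths among the unexposed; pooled RR undefined")
    return numerator / denominator


# HYPOTHETICAL age-by-sex strata (not the real stratified data); the columns
# sum to the crude totals 68/217 (ASCVD) and 53/518 (no ASCVD).
hypothetical_strata = [
    (12, 50, 10, 160),
    (14, 50, 12, 150),
    (20, 60, 14, 110),
    (22, 57, 17, 98),
]

pooled_rr = mantel_haenszel_rr(hypothetical_strata)
# With a single stratum the pooled estimate reduces to the crude relative risk.
crude_rr = mantel_haenszel_rr([(68, 217, 53, 518)])
```

Because the strata are invented, `pooled_rr` is not expected to reproduce the adjusted estimate reported in this homework; the sketch only shows how the weights combine stratum-specific risks.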

Inference: In this cohort study of 735 Medicare enrollees, the observed (unadjusted) probability of 5-year survival was 68.7% (149/217) among those with an ASCVD diagnosis at study enrollment compared to 89.8% (465/518) among those without an ASCVD diagnosis. After adjustment for age and sex, an ASCVD diagnosis was associated with a 2.65-fold higher risk of death within 5 years. Based on the 95% confidence interval, this is consistent with a 1.92- to 3.65-fold higher risk of mortality with an ASCVD diagnosis. We reject the null hypothesis of equal probability of survival by ASCVD diagnosis at enrollment (p<0.0001).

Question 4B. Methods: ASCVD was defined as a history of prior angina, myocardial infarction, transient ischemic attack, or stroke. Survival was compared between participants with and without ASCVD using the relative risk. Because age and mortality are known a priori to have a curvilinear relationship, a higher-order age term was included to account for this non-linearity. A relative risk adjusted for continuous age and binary sex was estimated using Poisson regression with robust standard errors. The alpha level for significance is 0.05. The point estimate for the relative risk and 2-sided 95% confidence interval are reported.

Inference: In this cohort study of 735 Medicare enrollees, the observed (unadjusted) probability of 5-year survival was 68.7% (149/217) among those with an ASCVD diagnosis at study enrollment compared to 89.8% (465/518) among those without an ASCVD diagnosis. After adjustment for age and sex, an ASCVD diagnosis was associated with a 2.71-fold higher risk of 5-year mortality. Based on the 95% confidence interval, this is consistent with a 1.95- to 3.77-fold higher risk of 5-year mortality.
A two-sided p-value of <0.0001 indicates that we can reject, with high confidence, the null hypothesis of equal survival probability by ASCVD diagnosis at enrollment.

Question 4C. The regression model allowed more flexibility in modeling the age covariate, which can be entered as continuous or discrete. In the stratified analysis, age was binary (65-74 years and ≥ 75 years), which implies a stepwise relationship with 5-year mortality across the 4 age and sex subgroups. The Poisson regression allowed the relationship to vary smoothly, reflecting the curvilinear relationship between age and mortality. Furthermore, regression modeling allows robust estimation of the standard errors to account for unequal variance in the exposed and unexposed groups. Finally, the regression analysis makes a comparison across all observations and accounts for differences in the mean-variance relationship, because the maximum likelihood estimator borrows information across all strata more efficiently than the stratified analysis. The measure of association, the relative risk, was similar between the two analyses: a 2.65-fold higher risk (95% CI: 1.92-4.22) for the stratified analysis compared to a 2.71-fold higher risk (95% CI: 1.95-3.77) for the regression analysis with robust standard errors. The statistical inference for the association between the predictor of interest, ASCVD, and the outcome, 5-year mortality, is the same in both analyses.

Question 5. In all approaches, we tested the same underlying hypothesis, that there was no difference in 5-year mortality between the participants with and without ASCVD. Table 2 shows the point estimates and 95% confidence intervals for each measure of association. The relative risk and odds ratio point estimates were not similar, indicating that the outcome, 5-year mortality, is not rare in this cohort.
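
The divergence between the odds ratio and the relative risk can be illustrated with the crude 2x2 counts reported in this writeup (149 of 217 participants with ASCVD and 465 of 518 without survived 5 years). The sketch below is unadjusted, so its values are not expected to match the age- and sex-adjusted estimates in Table 2. The function name `crude_measures` and the "rare outcome" counts are hypothetical and used only for contrast.

```python
def crude_measures(deaths_exp, n_exp, deaths_unexp, n_unexp):
    """Return crude (risk difference, relative risk, odds ratio) for death."""
    risk_exp = deaths_exp / n_exp
    risk_unexp = deaths_unexp / n_unexp
    odds_exp = deaths_exp / (n_exp - deaths_exp)
    odds_unexp = deaths_unexp / (n_unexp - deaths_unexp)
    return risk_exp - risk_unexp, risk_exp / risk_unexp, odds_exp / odds_unexp


# Counts from the text: deaths = enrolled - survivors in each ASCVD group.
rd, rr, odds_ratio = crude_measures(217 - 149, 217, 518 - 465, 518)

# HYPOTHETICAL contrast: same deaths, but denominators 100 times larger, so
# the relative risk is unchanged while the outcome becomes rare.
_, rr_rare, or_rare = crude_measures(68, 21700, 53, 51800)

# With a common outcome the odds ratio sits well above the relative risk;
# with a rare outcome the two nearly coincide.
gap_common = odds_ratio - rr
gap_rare = or_rare - rr_rare
```

Comparing `gap_common` with `gap_rare` shows why the odds ratio overstates the relative risk in this cohort, where roughly one participant in six died within 5 years.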
The differences between the stratified and regression analyses were similar for the two relative measures, the odds ratio and the relative risk: in both cases the regression analysis gave a larger point estimate than the stratified analysis, but the confidence interval for the odds ratio was wider than that for the relative risk. The relative risk is a better measure of association than the odds ratio here because this is a cohort study, so we can directly quantify the risk of 5-year mortality given the ASCVD risk factor in this population. The risk difference may be the best measure of association, because it would enable clinicians to communicate risk to an individual with a specific risk profile. The absolute difference also changed less between the stratified and regression analyses.

Table 2. Comparison of Point Estimates and 95% Confidence Intervals for each measure of association and analysis.

                          Stratified Analysis             Regression Analysis
Measure of Association    Point Estimate   95% CI         Point Estimate   95% CI
Risk Difference           19.4%            12.5%-26.2%    19.1%            12.3%-25.8%
Odds Ratio                3.44             2.28-5.18      3.58             2.36-5.44
Relative Risk             2.65             1.92-4.22      2.71             1.95-3.77

Question 6A. Methods: Incident cases of colorectal cancer were identified from reports to the Surveillance, Epidemiology and End Results Program (SEER) registry of the National Cancer Institute. SEER sites include the greater metro Atlanta region, Connecticut, Detroit region, Hawaii, Iowa, New Mexico, San Francisco region, Utah, and Western Washington State. Incidence rates are based on the number of cancers diagnosed during the years 1973 to 1984. Incidence rates are standardized to the US 1980 census population for the geographical areas covered by the SEER registries during that time period and stratified by sex, age, and SEER site. Person-time at risk also includes those individuals diagnosed with cancer, because excluding them would make a negligible difference to the overall denominator.
Since the scientific question of interest involves the association between birthplace (US born vs foreign born) and colorectal cancer incidence, the data were restricted to surveillance data with known birthplace. Data were also restricted to whites living in the United States at the time of cancer diagnosis. Standardized incidence ratios and exact 95% confidence intervals for males and females are reported.

Inference: During the 1973 to 1984 surveillance period, 94,440 incident cases of cancer diagnosed in the catchment area were reported to SEER. After restriction to cases with known birthplace, 73,694 cases were included in the stratified analysis (22% excluded). In the analysis, there were 62,668 (85.0%) incident cancer cases among the US-born and 11,026 (15.0%) among the foreign-born. The age-, sex-, and site-standardized incidence ratio among women was 0.989. This would be consistent with a true ratio between 0.949 and 1.030, indicating no difference in risk of colorectal cancer between US-born and foreign-born women. In contrast, for men the age-, sex-, and site-standardized incidence ratio was 1.047. This would be consistent with a true ratio between 1.004 and 1.091, indicating a higher level of risk for men born outside the US.

Question 6B. Methods: Incident cases of colorectal cancer were identified from reports to the Surveillance, Epidemiology and End Results Program (SEER) registry of the National Cancer Institute. SEER sites include the greater metro Atlanta region, Connecticut, Detroit region, Hawaii, Iowa, New Mexico, San Francisco region, Utah, and Western Washington State. Incidence rates are based on the number of cancers diagnosed during the years 1973 to 1984. Incidence rates are standardized to the US 1980 census population for the geographical areas covered by the SEER registries during that time period and adjusted for sex, age, and SEER site.
Person-time at risk also includes those individuals diagnosed with cancer, because excluding them would make a negligible difference to the overall denominator. Since the scientific question of interest involves the association between birthplace (US born vs foreign born) and colorectal cancer incidence, the data were restricted to surveillance data with known birthplace. Data were also restricted to whites living in the United States at the time of cancer diagnosis. Poisson regression with robust standard errors was used to estimate the standardized incidence ratio of colorectal cancer. Standardized incidence ratios and 95% confidence intervals for males and females are reported.

Inference: During the 1973 to 1984 surveillance period, 94,440 incident cases of cancer diagnosed in the catchment area were reported to SEER. After restriction to cases with known birthplace, 73,694 cases were included in the analysis (22% excluded). In the analysis, there were 62,668 (85.0%) incident cancer cases among the US-born and 11,026 (15.0%) among the foreign-born. The age-, sex-, and site-standardized incidence ratio among women was 0.955. This would be consistent with a true ratio between 0.842 and 1.082, indicating no difference in risk of colorectal cancer between US-born and foreign-born women. Similarly, for men the age-, sex-, and site-standardized incidence ratio was 1.013. This would be consistent with a true ratio between 0.870 and 1.180, indicating no difference in risk of colorectal cancer between US-born and foreign-born men.

Question 6C. In the stratified and regression approaches to standardization, we tested the same underlying hypothesis, that there was no difference in risk of colorectal cancer by place of birth.
In the stratified analysis, the stratification variables (age, sex, and SEER site) had many levels, but the standardization allowed the estimate to use more complete information and borrow across the strata using the mean-variance relationship. Regression modeling allows both robust estimation of the standard errors, to account for unequal variance in the exposed and unexposed groups, and adjustment for differences in the mean-variance relationship, because the maximum likelihood estimator borrows information across all strata more efficiently than the stratified analysis. The measure of association, the standardized incidence ratio, was similar between the two analyses. However, the conclusion for colorectal cancer incidence in men differed between the stratified and regression analyses. The regression analysis had wider confidence intervals than the stratified analysis, indicating greater uncertainty. This suggests either that the model did not account for the relationships in the data as well as the stratified analysis did, or that another type of regression model would have fit the data better.
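
As an illustration of the stratified approach compared above, the sketch below shows one common way to standardize across strata: expected cases are the sum over strata of a reference rate times person-time, and the estimate is the ratio of observed to expected cases. All numbers are hypothetical and are not SEER data. The function name `standardized_incidence_ratio` and the log-scale normal-approximation interval are assumptions made for illustration, not the exact interval method reported in Question 6A.

```python
import math


def standardized_incidence_ratio(observed, strata, z=1.96):
    """Observed/expected ratio with an approximate CI on the log scale.

    strata: iterable of (reference_rate, person_years) pairs, one per
    age-by-sex-by-site stratum.
    """
    expected = sum(rate * person_years for rate, person_years in strata)
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected counts must be positive")
    ratio = observed / expected
    # Poisson approximation: the variance of log(observed) is about 1/observed.
    se_log = 1.0 / math.sqrt(observed)
    lower = ratio * math.exp(-z * se_log)
    upper = ratio * math.exp(z * se_log)
    return ratio, lower, upper


# HYPOTHETICAL reference rates (cases per person-year) and person-years.
hypothetical_strata = [
    (0.0005, 40000.0),  # contributes 20 expected cases
    (0.0010, 30000.0),  # contributes 30 expected cases
    (0.0025, 20000.0),  # contributes 50 expected cases
]

sir, lower, upper = standardized_incidence_ratio(105, hypothetical_strata)
includes_null = lower <= 1.0 <= upper
```

An interval that contains 1, as flagged by `includes_null`, corresponds to the "no difference by place of birth" conclusions drawn in the text; a wider interval from a different variance estimate can change that conclusion even when the point estimates are similar.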