2542 - Emerson Statistics

Code: 2542
Categorical Analysis
Homework #2
Due: Sunday, October 12 by 5:30 pm
Question 1.
Methods: The data were stratified by 5-year mortality, and the mean, standard deviation, minimum, and
maximum observation time for each group are reported in Table 1. Observation time was the period
between enrollment and either death or September 16, 1997, whichever came first.
Table 1: Comparison of mean and range of observation time stratified by outcome

                           Survived 5 years (n=614, 83.5%)   Died within 5 years (n=121, 16.5%)
                           Mean (SD)      Range              Mean (SD)      Range
Observation time (years)   5.33 (0.29)    5.00-5.91          2.97 (1.40)    0.19-4.98
Inference: Seven hundred thirty-five patients were enrolled in this prospective cohort study to
investigate the incidence of cardiovascular disease (ASCVD) and cerebrovascular disease. To investigate
the association between prevalent ASCVD and 5-year mortality, we need complete information on every
participant's vital status for at least 5 years of follow-up. Censoring would be a concern if any participant
who was alive at last contact had been observed for less than 5 years, because that participant's 5-year
survival status would be unknown. If the data were censored, categorical analyses, including logistic
regression, would not be sufficient, and survival methods such as Kaplan-Meier estimation would be
necessary. Table 1 shows observation time stratified by survival status. There were 614 (83.5%)
individuals who survived at least 5 years and 121 (16.5%) who died within 5 years. Among survivors, the
minimum observation time was 5.00 years, indicating that every participant met the follow-up criterion.
Categorical analysis is therefore an appropriate approach to investigating the association between the
ASCVD risk factor and 5-year mortality.
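The censoring check described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each participant is stored as a hypothetical (observation_years, died) pair; the function name and the records below are made up and are not the study data.

```python
# Length of follow-up needed to know every participant's 5-year vital status.
HORIZON_YEARS = 5.0


def censored_before_horizon(records, horizon=HORIZON_YEARS):
    """Return participants whose vital status at `horizon` years is unknown.

    Each record is a hypothetical (observation_years, died) tuple. A participant
    is censored before the horizon if they were alive at last contact
    (died is False) but were observed for less than `horizon` years.
    """
    return [(years, died) for years, died in records if not died and years < horizon]


# Made-up records: (years observed, died during observation)
records = [(5.33, False), (5.00, False), (5.91, False), (2.97, True), (0.19, True)]
print(censored_before_horizon(records))  # → []
```

An empty result corresponds to the situation in Table 1: no survivor was observed for less than 5 years, so 5-year status is known for everyone.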
Question 2A.
Methods: The study protocol included only Medicare enrollees aged 65 and older. Age was categorized
a priori as a binary variable (65-74 years and ≥ 75 years), based on the understanding that people must
have been healthy enough to reach age 65 and that 75 is a clinically relevant age beyond which health is
expected to decline. Participants were classified as having ASCVD if they had a history of angina,
myocardial infarction, transient ischemic attack, or stroke. Survival was compared using the difference in
5-year survival probability between participants with and without ASCVD. An age- and sex-adjusted
estimate of the risk difference was calculated using internal standardization. A Wald-type 95%
confidence interval for the difference in survival probabilities was calculated based on the normal
approximation. The significance level is 0.05, and p-values were calculated using the Mantel-Haenszel
chi-square test of the null hypothesis of no absolute difference in survival probability between those
with and without ASCVD.
Inference: In this cohort study of 735 Medicare enrollees, the unadjusted probability of 5-year survival
among participants with an ASCVD diagnosis at enrollment was 68.7% (149/217), compared with 89.8%
(465/518) among those without an ASCVD diagnosis. After adjustment for age and sex, the probability of
death within 5 years was 19.4 percentage points higher among those with an ASCVD diagnosis. Based on
the 95% confidence interval, this difference is consistent with a true absolute difference in survival of
between 12.5 and 26.2 percentage points. We reject the null hypothesis of equal survival probability by
ASCVD diagnosis at enrollment (p<0.0001).
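Internal standardization of the risk difference can be sketched as a weighted average of stratum-specific differences, with weights taken from the pooled sample's age-sex distribution. This is a hypothetical illustration: the function name, the weighting by total stratum size, and the Wald variance formula are assumptions about how such an estimate is typically computed, and the four strata below are invented counts (chosen only so that they sum to 68/217 and 53/518 deaths), so the output will not reproduce the 19.4% reported above.

```python
import math


def standardized_risk_difference(strata, z=1.96):
    """Internally standardized risk difference (exposed minus unexposed) with a Wald CI.

    strata: list of (events_exposed, n_exposed, events_unexposed, n_unexposed).
    Each stratum is weighted by its share of the whole sample, so the study
    population itself is the standard population.
    """
    total = sum(n1 + n0 for _, n1, _, n0 in strata)
    rd = 0.0
    var = 0.0
    for a, n1, c, n0 in strata:
        w = (n1 + n0) / total            # weight: stratum share of the pooled sample
        p1, p0 = a / n1, c / n0          # stratum-specific risks of death
        rd += w * (p1 - p0)
        # Binomial variances, assuming independent groups and independent strata
        var += w ** 2 * (p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    se = math.sqrt(var)
    return rd, rd - z * se, rd + z * se


# Invented (deaths, n) counts for ASCVD and no-ASCVD groups in four age-sex strata
strata = [
    (10, 50, 8, 160),   # women 65-74
    (15, 45, 15, 110),  # women 75+
    (18, 62, 12, 150),  # men 65-74
    (25, 60, 18, 98),   # men 75+
]
rd, lo, hi = standardized_risk_difference(strata)
print(f"standardized RD = {rd:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

With a single stratum, the function reduces to the crude risk difference with the usual unpooled Wald interval.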
Question 2B.
Methods: The study protocol included only Medicare enrollees aged 65 and older. Participants were
classified as having ASCVD if they had a history of angina, myocardial infarction, transient ischemic
attack, or stroke. Survival was compared using the difference in 5-year survival probability between
participants with and without ASCVD. Because age is known a priori to have a curvilinear relationship
with mortality, a higher-order age term was included to allow for this non-linearity. An estimate of the
risk difference adjusted for continuous age and binary sex was calculated using linear regression with
robust standard errors. The significance level is 0.05. The risk difference and its two-sided 95%
confidence interval are reported.
Inference: In this cohort study of 735 Medicare enrollees, the unadjusted probability of 5-year survival
among participants with an ASCVD diagnosis at enrollment was 68.7% (149/217), compared with 89.8%
(465/518) among those without an ASCVD diagnosis. After adjustment for age and sex, the probability of
death within 5 years was 19.1 percentage points higher among those with an ASCVD diagnosis. Based on
the 95% confidence interval, this difference is consistent with a true absolute difference in survival of
between 12.3 and 25.8 percentage points. With a two-sided p-value of <0.0001, we reject the null
hypothesis of equal survival probability by ASCVD diagnosis at enrollment.
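A linear probability model with robust standard errors can be sketched for the simplest case, a single binary covariate. This is a minimal sketch, assuming the HC0 (White) form of the sandwich estimator; the adjusted model above also includes age, a higher-order age term, and sex, which would need a matrix implementation or a statistical package. The data here are the unadjusted counts from the inference above recoded as a death indicator, so the slope is the crude (not age- and sex-adjusted) risk difference.

```python
import math


def ols_slope_hc0(x, y):
    """Simple linear regression of y on x with an HC0 (White) robust SE for the slope."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    # Sandwich "meat": squared residual times squared centered covariate
    meat = sum(((xi - xbar) * (yi - intercept - slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, math.sqrt(meat) / sxx


# Crude data: x = 1 for ASCVD, y = 1 for death within 5 years
x = [1] * 217 + [0] * 518
y = [1] * 68 + [0] * 149 + [1] * 53 + [0] * 465
intercept, rd, se = ols_slope_hc0(x, y)
print(f"crude RD = {rd:.3f}, robust SE = {se:.4f}")
```

With a binary covariate, the slope equals the difference in death proportions, and the HC0 standard error equals the unpooled Wald standard error sqrt(p1(1 - p1)/n1 + p0(1 - p0)/n0). This is one reason robust linear regression and the stratified Wald approach tend to agree closely.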
Question 2C.
The regression model gave us more flexibility in modeling the age covariate as either continuous or
discrete. In the stratified analysis, age was binary (65-74 years and ≥ 75 years), which assumes that
mortality risk changes in a single step at age 75 and is otherwise constant within each of the four age
and sex subgroups. The linear regression instead allowed the effect of age to vary continuously,
capturing the curvilinear relationship between age and mortality. Furthermore, regression modeling
allows robust estimation of the standard errors, which accounts for unequal variance in the exposed and
unexposed groups. Finally, the regression analysis makes comparisons across all observations at once
and borrows information across strata, which can be more efficient than the stratified analysis.
The measure of association, the risk difference, was similar between the two analyses: 19.4% for the
stratified analysis compared with 19.1% for the linear regression with robust standard errors. The
statistical inference for the association between the predictor of interest, ASCVD, and the outcome,
5-year mortality, is the same in both analyses.
Question 3A.
Methods: Age was categorized a priori as a binary variable (65-74 years and ≥ 75 years), based on the
understanding that people must have been healthy enough to reach age 65 and that 75 is a clinically
relevant age beyond which health is expected to decline. Participants were classified as having ASCVD if
they had a history of angina, myocardial infarction, transient ischemic attack, or stroke. Survival was
compared using the ratio of the odds of ASCVD among participants who died to the odds of ASCVD
among participants who survived. An age- and sex-adjusted estimate of the odds ratio was calculated
using Mantel-Haenszel weights, and an exact 95% confidence interval for the odds ratio was calculated.
The significance level is 0.05, and p-values were calculated using the Mantel-Haenszel chi-square test of
the null hypothesis of no difference in the odds of survival (OR=1) between those with and without ASCVD.
Inference: In this cohort study of 735 Medicare enrollees, the unadjusted probability of 5-year survival
among participants with an ASCVD diagnosis at enrollment was 68.7% (149/217), compared with 89.8%
(465/518) among those without an ASCVD diagnosis. After adjustment for age and sex, the odds of death
within 5 years were 3.28 times higher among those with an ASCVD diagnosis. Based on the exact 95%
confidence interval, this result is consistent with odds of mortality between 2.21 and 4.87 times higher
with an ASCVD diagnosis. We reject the null hypothesis of equal odds of survival by ASCVD diagnosis at
enrollment (p<0.0001).
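The Mantel-Haenszel pooled odds ratio can be sketched as a ratio of weighted sums over strata. This is an illustration under stated assumptions: the function name is hypothetical, each stratum is laid out as (exposed deaths, exposed survivors, unexposed deaths, unexposed survivors), the four strata are invented counts rather than the study data, and the exact confidence interval is not reproduced here.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across strata.

    Each stratum is (a, b, c, d): a = exposed deaths, b = exposed survivors,
    c = unexposed deaths, d = unexposed survivors.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Invented age-sex strata: (ASCVD deaths, ASCVD survivors, no-ASCVD deaths, no-ASCVD survivors)
strata = [(10, 40, 8, 152), (15, 30, 15, 95), (18, 44, 12, 138), (25, 35, 18, 80)]
print(f"MH odds ratio = {mantel_haenszel_or(strata):.2f}")
```

Each stratum contributes in proportion to its size, and when every stratum has the same odds ratio the pooled estimate returns that common value.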
Question 3B.
Methods: Participants were classified as having ASCVD if they had a history of angina, myocardial
infarction, transient ischemic attack, or stroke. Survival was compared using the odds of 5-year
mortality between participants with and without ASCVD. Because age is known a priori to have a
curvilinear relationship with mortality, a higher-order age term was included to allow for this
non-linearity. An estimate of the odds ratio adjusted for continuous age and binary sex was calculated
using logistic regression with robust standard errors. The significance level is 0.05. The point estimate
for the odds ratio and its two-sided 95% confidence interval are reported.
Inference: In this cohort study of 735 Medicare enrollees, the unadjusted probability of 5-year survival
among participants with an ASCVD diagnosis at enrollment was 68.7% (149/217), compared with 89.8%
(465/518) among those without an ASCVD diagnosis. After adjustment for age and sex, the odds of death
within 5 years were 3.58 times higher among those with an ASCVD diagnosis. Based on the 95%
confidence interval, this result is consistent with odds of mortality between 2.36 and 5.44 times higher.
With a two-sided p-value of <0.0001, we reject the null hypothesis of equal survival probability by
ASCVD diagnosis at enrollment.
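The link between logistic regression and the odds ratio can be sketched with a Newton-Raphson fit of a model containing only the ASCVD indicator. This is a minimal, unadjusted sketch, assuming model-based (not robust) standard errors; the function name is hypothetical, and it uses the crude counts from the inference above (68 deaths among 217 with ASCVD, 53 among 518 without), so exp(b1) is the crude odds ratio rather than the adjusted 3.58.

```python
import math


def fit_logistic_binary(x, y, iters=25):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x.

    Returns (b0, b1, se_b1), where se_b1 is the model-based standard error
    from the inverse information matrix.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # score vector
        h00 = h01 = h11 = 0.0    # information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} U, using the explicit 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Var(b1) is the (1,1) entry of I^{-1}, evaluated just before the final (negligible) step
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Crude data: x = 1 for ASCVD, y = 1 for death within 5 years
x = [1] * 217 + [0] * 518
y = [1] * 68 + [0] * 149 + [1] * 53 + [0] * 465
b0, b1, se_b1 = fit_logistic_binary(x, y)
print(f"crude OR = {math.exp(b1):.2f} (model-based SE of log OR = {se_b1:.3f})")
```

With a single binary covariate the model is saturated, so exp(b1) equals the 2x2 table odds ratio (68 x 465)/(149 x 53) and the model-based standard error equals Woolf's sqrt(1/a + 1/b + 1/c + 1/d). Adding age and sex terms is what moves the estimate away from the crude value.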
Question 3C.
The regression model gave us more flexibility in modeling the age covariate as either continuous or
discrete. In the stratified analysis, age was binary (65-74 years and ≥ 75 years), which assumes that
mortality risk changes in a single step at age 75 and is otherwise constant within each of the four age
and sex subgroups. The logistic regression instead allowed the effect of age to vary continuously,
capturing the curvilinear relationship between age and mortality. Furthermore, regression modeling
allows robust estimation of the standard errors, which accounts for unequal variance in the exposed and
unexposed groups. Finally, the regression analysis makes comparisons across all observations at once,
and the maximum likelihood estimator borrows information across strata, which can be more efficient
than the stratified analysis.
The measure of association, the odds ratio, was similar between the two analyses: a 3.28-fold (95% CI:
2.21-4.87) higher odds of death in the stratified analysis compared with a 3.58-fold (95% CI: 2.36-5.44)
higher odds in the regression analysis with robust standard errors. The statistical inference for the
association between the predictor of interest, ASCVD, and the outcome, 5-year mortality, is the same in
both analyses.
Question 4A.
Methods: Age was categorized a priori as a binary variable (65-74 years and ≥ 75 years), based on the
understanding that people must have been healthy enough to reach age 65 and that 75 is a clinically
relevant age beyond which health is expected to decline. Participants were classified as having ASCVD if
they had a history of angina, myocardial infarction, transient ischemic attack, or stroke. Survival was
compared using the ratio of the risk of death among participants with ASCVD to the risk of death among
participants without ASCVD. An age- and sex-adjusted estimate of the relative risk was calculated using
Mantel-Haenszel weights, and a 95% confidence interval for the relative risk was calculated. The
significance level is 0.05, and p-values were calculated using the Mantel-Haenszel chi-square test of the
null hypothesis of no difference in survival probability (RR=1) between those with and without ASCVD.
Inference: In this cohort study of 735 Medicare enrollees, the unadjusted probability of 5-year survival
among participants with an ASCVD diagnosis at enrollment was 68.7% (149/217), compared with 89.8%
(465/518) among those without an ASCVD diagnosis. After adjustment for age and sex, the risk of death
within 5 years was 2.65 times higher among those with an ASCVD diagnosis. Based on the 95%
confidence interval, this result is consistent with a risk of mortality between 1.92 and 3.65 times higher
with an ASCVD diagnosis. We reject the null hypothesis of equal probability of survival by ASCVD
diagnosis at enrollment (p<0.0001).
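The Mantel-Haenszel pooled relative risk can be sketched as a ratio of weighted sums over strata. This is an illustration only: the function name is hypothetical, each stratum is laid out as (exposed deaths, exposed total, unexposed deaths, unexposed total), the four strata are invented counts rather than the study data, and the confidence interval is omitted.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled relative risk across strata.

    Each stratum is (a, n1, c, n0): a deaths among n1 exposed,
    c deaths among n0 unexposed.
    """
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den


# Invented age-sex strata: (ASCVD deaths, ASCVD total, no-ASCVD deaths, no-ASCVD total)
strata = [(10, 50, 8, 160), (15, 45, 15, 110), (18, 62, 12, 150), (25, 60, 18, 98)]
print(f"MH relative risk = {mantel_haenszel_rr(strata):.2f}")
```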
Question 4B.
Methods: Participants were classified as having ASCVD if they had a history of angina, myocardial
infarction, transient ischemic attack, or stroke. Survival was compared using the relative risk of 5-year
mortality between participants with and without ASCVD. Because age is known a priori to have a
curvilinear relationship with mortality, a higher-order age term was included to allow for this
non-linearity. An estimate of the relative risk adjusted for continuous age and binary sex was calculated
using Poisson regression with robust standard errors. The significance level is 0.05. The point estimate
for the relative risk and its two-sided 95% confidence interval are reported.
Inference: In this cohort study of 735 Medicare enrollees, the unadjusted probability of 5-year survival
among participants with an ASCVD diagnosis at enrollment was 68.7% (149/217), compared with 89.8%
(465/518) among those without an ASCVD diagnosis. After adjustment for age and sex, the risk of death
within 5 years was 2.71 times higher among those with an ASCVD diagnosis. Based on the 95%
confidence interval, this result is consistent with a risk of 5-year mortality between 1.95 and 3.77 times
higher. With a two-sided p-value of <0.0001, we reject the null hypothesis of equal survival probability
by ASCVD diagnosis at enrollment.
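For intuition about why Poisson regression with robust standard errors estimates a relative risk, consider the special case of a single binary exposure. There the log-link model is saturated, exp(b1) equals the crude risk ratio, and the robust (sandwich) variance of b1 reduces to the delta-method expression 1/a - 1/n1 + 1/c - 1/n0. The sketch below computes that closed form directly from the crude counts in the inference above, so it gives the unadjusted risk ratio rather than the age- and sex-adjusted 2.71; the function name is hypothetical, and this covers only the special case, not a general Poisson regression fit.

```python
import math


def risk_ratio_wald(a, n1, c, n0, z=1.96):
    """Crude risk ratio with a Wald CI computed on the log scale.

    a deaths among n1 exposed, c deaths among n0 unexposed. For a single binary
    exposure this matches log-link Poisson regression with a robust SE.
    """
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    log_rr = math.log(rr)
    return rr, math.exp(log_rr - z * se_log), math.exp(log_rr + z * se_log)


rr, lo, hi = risk_ratio_wald(68, 217, 53, 518)  # crude ASCVD vs no-ASCVD
print(f"crude RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```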
Question 4C.
The regression model gave us more flexibility in modeling the age covariate as either continuous or
discrete. In the stratified analysis, age was binary (65-74 years and ≥ 75 years), which assumes that
mortality risk changes in a single step at age 75 and is otherwise constant within each of the four age
and sex subgroups. The Poisson regression instead allowed the effect of age to vary continuously,
capturing the curvilinear relationship between age and mortality. Furthermore, regression modeling
allows robust estimation of the standard errors, which accounts for unequal variance in the exposed and
unexposed groups. Finally, the regression analysis makes comparisons across all observations at once,
and the maximum likelihood estimator borrows information across strata, which can be more efficient
than the stratified analysis.
The measure of association, the relative risk, was similar between the two analyses: a 2.65-fold (95% CI:
1.92-3.65) higher risk of death in the stratified analysis compared with a 2.71-fold (95% CI: 1.95-3.77)
higher risk in the regression analysis with robust standard errors. The statistical inference for the
association between the predictor of interest, ASCVD, and the outcome, 5-year mortality, is the same in
both analyses.
Question 5.
In all approaches, we tested the same underlying hypothesis: that there was no difference in 5-year
mortality between participants with and without ASCVD. Table 2 shows the point estimates and 95%
confidence intervals for each measure of association. The relative risk and odds ratio point estimates
were not similar because 5-year mortality (16.5%) is not a rare outcome, so the odds ratio does not
approximate the relative risk. For both relative measures, the stratified and regression estimates were
similar, with the regression analysis giving a slightly higher measure of association; the confidence
intervals for the odds ratio were wider than those for the relative risk. Because this is a cohort study,
risk can be estimated directly, so the relative risk is a better measure of association than the odds ratio
for quantifying the risk of 5-year mortality associated with ASCVD in this population.
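The divergence between the odds ratio and the relative risk when the outcome is common can be checked with a few lines of arithmetic. This sketch holds the relative risk fixed at a hypothetical value of 2 and shows how the corresponding odds ratio grows as the baseline risk rises; it then applies the same formulas to the crude counts reported in the inference sections (68/217 deaths with ASCVD, 53/518 without).

```python
def risk_ratio(p1, p0):
    return p1 / p0


def odds_ratio(p1, p0):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Hypothetical baseline risks with the relative risk held at 2
for p0 in (0.01, 0.10, 0.30):
    p1 = 2 * p0
    print(f"baseline risk {p0:.2f}: RR = {risk_ratio(p1, p0):.2f}, OR = {odds_ratio(p1, p0):.2f}")

# Crude 5-year mortality by ASCVD status
p_ascvd, p_none = 68 / 217, 53 / 518
print(f"crude RR = {risk_ratio(p_ascvd, p_none):.2f}, crude OR = {odds_ratio(p_ascvd, p_none):.2f}")
```

The loop shows the odds ratio drifting from 2.02 to 3.50 while the relative risk stays at 2. At the study's mortality levels, the crude odds ratio (about 4.00) likewise exceeds the crude relative risk (about 3.06).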
The risk difference may be the best measure of association, because an absolute difference enables
clinicians to communicate risk to an individual with a specific risk profile. The risk difference also
changed the least between the stratified and regression analyses (19.4% vs 19.1%).
Table 2. Comparison of point estimates and 95% confidence intervals for each measure of association
and analysis.

                          Stratified Analysis            Regression Analysis
Measure of Association    Point Estimate   95% CI        Point Estimate   95% CI
Risk Difference           19.4%            12.5%-26.2%   19.1%            12.3%-25.8%
Odds Ratio                3.44             2.28-5.18     3.58             2.36-5.44
Relative Risk             2.65             1.92-3.65     2.71             1.95-3.77
Question 6A.
Methods: Data consisted of incident cases of colorectal cancer reported to the Surveillance, Epidemiology
and End Results Program (SEER) registry of the National Cancer Institute. SEER sites include the greater
metro Atlanta region, Connecticut, the Detroit region, Hawaii, Iowa, New Mexico, the San Francisco
region, Utah, and Western Washington State. Incidence rates are defined as the number of cancers
diagnosed during the years 1973 to 1984 per person-year at risk. Incidence rates are standardized to the
US 1980 census population for the geographical areas covered by the SEER registries during that time
period and stratified by sex, age, and SEER site. Person-time at risk also includes individuals diagnosed
with cancer, because excluding them would change the overall denominator negligibly. Because the
scientific question of interest involves the association between birthplace (US-born vs foreign-born)
and colorectal cancer incidence, the data were restricted to surveillance records with known birthplace.
Data were also restricted to whites living in the United States at the time of cancer diagnosis. Ratios of
standardized incidence rates (foreign-born vs US-born) and exact 95% confidence intervals are reported
separately for males and females.
Inference: During the 1973 to 1984 surveillance period, 94,440 incident cases of colorectal cancer
diagnosed in the catchment area were reported to SEER. After restriction to cases with known
birthplace, 73,694 (22% excluded) were included in the stratified analysis: 62,668 (85.0%) incident
cancer cases among US-born and 11,026 (15.0%) among foreign-born individuals. Among women, the
age- and site-adjusted standardized incidence rate ratio (foreign-born vs US-born) was 0.989. This is
consistent with a true rate ratio between 0.949 and 1.030, indicating no difference in the risk of
colorectal cancer between US-born and foreign-born women. In contrast, for men, the standardized
incidence rate ratio was 1.047, consistent with a true rate ratio between 1.004 and 1.091, indicating a
higher risk for men born outside the US.
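Direct standardization and the resulting standardized rate ratio can be sketched as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, the stratum counts, person-years, and standard population are invented (the real analysis used SEER strata and the 1980 US census), and the confidence interval uses a log-scale normal approximation with Poisson variances rather than the exact interval reported above.

```python
import math


def direct_standardized_rate(cases, person_years, standard_pop):
    """Directly standardized rate and its Poisson variance.

    All three sequences are aligned by stratum (e.g., age group by SEER site).
    """
    total_std = sum(standard_pop)
    rate = 0.0
    var = 0.0
    for d, py, w in zip(cases, person_years, standard_pop):
        share = w / total_std
        rate += share * d / py
        var += share ** 2 * d / py ** 2   # Poisson: Var(d) estimated by d
    return rate, var


def standardized_rate_ratio(group1, group0, standard_pop, z=1.96):
    """Ratio of directly standardized rates, group1 vs group0, with a log-scale CI."""
    r1, v1 = direct_standardized_rate(*group1, standard_pop)
    r0, v0 = direct_standardized_rate(*group0, standard_pop)
    ratio = r1 / r0
    # Delta method: Var(log r) is approximately Var(r) / r^2
    se_log = math.sqrt(v1 / r1 ** 2 + v0 / r0 ** 2)
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


# Invented data for two age strata: (cases, person-years) by birthplace
foreign_born = ([30, 90], [100_000, 60_000])
us_born = ([200, 700], [800_000, 500_000])
standard = [120_000, 80_000]  # invented standard population counts
ratio, lo, hi = standardized_rate_ratio(foreign_born, us_born, standard)
print(f"standardized rate ratio = {ratio:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Both groups are weighted by the same standard population, so differences in age and site composition between US-born and foreign-born residents do not drive the ratio.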
Question 6B.
Methods: Data consisted of incident cases of colorectal cancer reported to the Surveillance, Epidemiology
and End Results Program (SEER) registry of the National Cancer Institute. SEER sites include the greater
metro Atlanta region, Connecticut, the Detroit region, Hawaii, Iowa, New Mexico, the San Francisco
region, Utah, and Western Washington State. Incidence rates are defined as the number of cancers
diagnosed during the years 1973 to 1984 per person-year at risk. Incidence rates are standardized to the
US 1980 census population for the geographical areas covered by the SEER registries during that time
period and adjusted for sex, age, and SEER site. Person-time at risk also includes individuals diagnosed
with cancer, because excluding them would change the overall denominator negligibly. Because the
scientific question of interest involves the association between birthplace (US-born vs foreign-born)
and colorectal cancer incidence, the data were restricted to surveillance records with known birthplace.
Data were also restricted to whites living in the United States at the time of cancer diagnosis. Poisson
regression with robust standard errors was used to estimate the incidence rate ratio of colorectal cancer
(foreign-born vs US-born). Incidence rate ratios and 95% confidence intervals are reported separately for
males and females.
Inference: During the 1973 to 1984 surveillance period, 94,440 incident cases of colorectal cancer
diagnosed in the catchment area were reported to SEER. After restriction to cases with known
birthplace, 73,694 (22% excluded) were included in the analysis: 62,668 (85.0%) incident cancer cases
among US-born and 11,026 (15.0%) among foreign-born individuals. Among women, the age- and
site-adjusted incidence rate ratio (foreign-born vs US-born) was 0.955, consistent with a true rate ratio
between 0.842 and 1.082, indicating no difference in the risk of colorectal cancer between US-born and
foreign-born women. Similarly, for men, the adjusted incidence rate ratio was 1.013, consistent with a
true rate ratio between 0.870 and 1.180, indicating no difference in the risk of colorectal cancer between
US-born and foreign-born men.
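A Poisson regression for rates, with log person-years as an offset, can be sketched with Newton-Raphson in plain Python. This is an illustrative sketch only: the function names are hypothetical, it fits point estimates without the robust standard errors used above, it assumes the first column of the design matrix is an intercept, and it uses an invented two-age-group example in which foreign-born rates are exactly twice US-born rates, so exponentiating the birthplace coefficient should recover a rate ratio of 2.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_poisson_rates(X, cases, person_years, iters=50):
    """Fit log E[cases] = log(person_years) + X @ beta by Newton-Raphson.

    Column 0 of X is assumed to be an intercept; it starts at the log of the
    overall crude rate so the iterations begin near the solution.
    """
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(cases) / sum(person_years))
    for _ in range(iters):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, d, t in zip(X, cases, person_years):
            mu = t * math.exp(sum(b * x for b, x in zip(beta, xi)))  # expected cases
            for j in range(p):
                score[j] += (d - mu) * xi[j]
                for k in range(p):
                    info[j][k] += mu * xi[j] * xi[k]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Invented data. Columns: intercept, older age group, foreign-born
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
cases = [100, 400, 20, 80]
person_years = [100_000, 100_000, 10_000, 10_000]
beta = fit_poisson_rates(X, cases, person_years)
print(f"rate ratio, foreign-born vs US-born = {math.exp(beta[2]):.3f}")
```

The offset makes each coefficient a log rate ratio, and including age-group indicators adjusts the birthplace comparison for age in the same way the standardization does.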
Question 6C.
In the stratified and regression approaches to standardized rates, we tested the same underlying
hypothesis: that there was no difference in the risk of colorectal cancer by place of birth. In the
stratified analysis, the stratification variables for age, sex, and SEER site had many levels, but
standardization allowed the estimate to combine information from all strata into a single weighted
summary.
Regression modeling allows robust estimation of the standard errors, which accounts for unequal
variance in the exposed and unexposed groups and for departures from the assumed mean-variance
relationship, and the maximum likelihood estimator borrows information across all strata, which can be
more efficient than the stratified analysis.
The measure of association, the incidence rate ratio, was similar between the two analyses. However,
the conclusion for colorectal cancer incidence in men differed between the stratified and regression
analyses. The regression analysis had wider confidence intervals than the stratified analysis, suggesting
that the regression model did not capture the relationships in the data as well as the stratified analysis,
and that another type of regression model might have fit the data better.