2630 - Emerson Statistics

Biost 536: Categorical Data Analysis in Epidemiology
Emerson, Fall 2014
Homework #2
October 5, 2014
Written problems: To be submitted as a MS-Word compatible file to the class Catalyst dropbox by 5:30
pm on Sunday, October 12, 2014. See the instructions for peer grading of the homework that are posted
on the web pages.
On this (as all homeworks) Stata / R code and unedited Stata / R output is TOTALLY
unacceptable. Instead, prepare a table of statistics gleaned from the Stata output. The table
should be appropriate for inclusion in a scientific report, with all statistics rounded to a
reasonable number of significant digits. (I am interested in how statistics are used to answer the
scientific question.)
In all problems requesting “statistical analyses” (either descriptive or inferential), you should
present both
• Methods: A brief sentence or paragraph describing the statistical methods you used.
This should be using wording suitable for a scientific journal, though it might be a
little more detailed. A reader should be able to reproduce your analysis. DO NOT
PROVIDE Stata OR R CODE.
• Inference: A paragraph providing full statistical inference in answer to the question.
Please see the supplementary document relating to “Reporting Associations” for
details.
Questions 1-5 refer to analyses of the data in the file mri.txt that is located on the class webpages.
In those questions we are interested in associations between 5 year mortality and prevalence of
atherosclerotic cardiovascular disease (ASCVD) as defined by history of prior angina, myocardial
infarction, transient ischemic attacks, or stroke. You will likely find it useful to create a new variable
indicating ASCVD. This variable can be derived from the chd and stroke variables.
For these questions, we will be considering adjustment for age and sex using both stratified and
regression analyses. For the stratified analyses, it will be necessary to use an appropriate categorization of
age.
1. We are interested in analyzing associations between 5 year mortality and prevalence of
ASCVD at study enrollment using statistical methods appropriate for binary response
variables. The observation time for death among these subjects is potentially subject to
censoring. Provide a statistical analysis demonstrating that such methods as logistic
regression can be used to answer this question.
Answer:
Method: To check if there is censoring before the 5-year time point, we summarize the
observation time of subjects with and without death.
Results: Among the 735 subjects, 121 (16.5%) died within 5 years after
MRI. The mean observation time among those who died within 5 years after MRI is
2.97 years (1083 days), with a minimum of 0.19 years (68 days) and a maximum of
4.98 years (1819 days). The mean observation time among those who did not die within
5 years after MRI is 5.33 years (1946 days), with a minimum of 5 years (1827 days)
and a maximum of 5.91 years (2159 days).
Every subject without an observed death was followed for at least 5 years, so no
subject was censored before the 5-year time point. There is therefore no
censoring with respect to our scientific question of interest. Given that, death
within 5 years can be treated as a fully observed binary outcome, and methods for
binary responses such as logistic regression are suitable to answer this question.
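The censoring check described above can be sketched in Python. This is only an illustration under stated assumptions: the records are made up rather than read from mri.txt, the tuple layout (observation time in days, death flag) is hypothetical, and 5 years is taken as 5 * 365.25 days.

```python
# Sketch only: records are hypothetical (observation time in days, death flag).
FIVE_YEARS_DAYS = 5 * 365.25  # assumed conversion; about 1826 days

def censored_before(records, cutoff=FIVE_YEARS_DAYS):
    """Subjects alive at last contact whose follow-up ended before the cutoff."""
    return [(t, d) for t, d in records if d == 0 and t < cutoff]

def died_within(records, cutoff=FIVE_YEARS_DAYS):
    """Binary outcome: 1 if death was observed before the cutoff, else 0."""
    return [1 if d == 1 and t < cutoff else 0 for t, d in records]

records = [(68, 1), (1819, 1), (1827, 0), (2159, 0), (2000, 1)]
problem_rows = censored_before(records)  # an empty list means no early censoring
outcome = died_within(records)           # a death after 5 years counts as 0
```

If `problem_rows` is empty, every subject's 5-year vital status is known and the dichotomized outcome can be analyzed with methods for binary data.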
2. Using the risk difference (RD) as a measure of association, provide statistical inference
regarding an association between 5 year survival and baseline prevalence of ASCVD,
adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command cs or
an equivalent analysis in R).
b. Answer the question using an appropriate regression model.
c. What is the difference in the statistical models you used? That is, how would you
explain any differences between the two analysis approaches?
Answer:
a. Methods: For each subject, if he/she had been diagnosed with coronary heart
disease or a cerebrovascular disease prior to MRI, he/she is considered to
have prevalent ASCVD, which is dichotomous in the analysis. For each
subject, if the vital status is dead and the observation time is shorter than 5
years, he/she is considered to be dead within 5 years, and the 5 year mortality
for each subject is also dichotomous.
In order to do stratified analysis, we also categorized age in 5-year
intervals (65-69, 70-74, 75-79, 80-84, and >=85); subjects who are 90 or
older are combined with those who are 85 to 89 because of the small sample size in
the oldest group. The 5-year intervals strike a balance among capturing
the age trend with adequate precision, keeping sufficient cell counts in
each stratum, and keeping the number of categories small for stratified analysis.
The association between 5-year mortality and ASCVD prevalence adjusted
for sex and age is calculated using a Cochran-Mantel-Haenszel approach.
The risk difference is calculated within each stratum defined by sex and age
category. The weight for each stratum is calculated based on the age and
sex distribution of subjects with ASCVD prior to MRI. The overall risk
difference is then calculated as the weighted average of the stratum-specific
risk differences. The corresponding 95% confidence interval for the
adjusted risk difference is also computed.
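As a rough sketch of how stratum-specific risk differences are pooled, the Python below computes a weighted average with two weighting choices. The function name, the tuple layout, and the strata counts are hypothetical. Reading the Methods as "weights proportional to the number of exposed subjects per stratum" is an assumption; the Mantel-Haenszel weights n1*n0/(n1+n0) are shown for comparison.

```python
def pooled_risk_difference(strata, weights="exposed"):
    """Weighted average of stratum risk differences.

    strata: list of (d1, n1, d0, n0) = exposed deaths, exposed total,
            unexposed deaths, unexposed total (hypothetical layout).
    weights: "exposed" uses n1; "mh" uses n1 * n0 / (n1 + n0).
    """
    num = 0.0
    den = 0.0
    for d1, n1, d0, n0 in strata:
        rd_k = d1 / n1 - d0 / n0          # stratum-specific risk difference
        w_k = n1 if weights == "exposed" else n1 * n0 / (n1 + n0)
        num += w_k * rd_k
        den += w_k
    return num / den

# Illustrative strata only; not the mri.txt age-by-sex table.
strata = [(10, 50, 5, 100), (20, 100, 10, 100)]
rd_exposed = pooled_risk_difference(strata, "exposed")
rd_mh = pooled_risk_difference(strata, "mh")
```

The two weighting schemes generally give different pooled estimates when the stratum-specific risk differences are not equal, which is one reason to state the weights in the Methods.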
Results:
Of the 217 subjects with ASCVD prior to MRI, 68 (31.34%) died within 5 years,
compared with 53 of the 518 subjects without ASCVD prior to MRI (10.23%).
Among subjects with ASCVD prior to MRI, 48 of 130 males (36.92%) and 20 of
87 females (22.99%) died within 5 years after MRI. Among subjects without
ASCVD prior to MRI, 30 of 236 males (12.71%) and 23 of 282 females (8.16%)
died within 5 years after MRI.
Subjects with ASCVD have a higher risk of 5-year mortality in each age
group, compared to subjects without ASCVD.
The crude risk difference is 0.211, with a 95% confidence interval of 0.144 to
0.278. Without adjustment for age and sex, subjects with ASCVD prior to
MRI have an absolute 21.1% higher risk of death within 5 years after MRI, compared
to subjects without ASCVD.
Based on estimates by the Cochran-Mantel-Haenszel method adjusting for sex
and 5-year age category, the risk difference adjusted for age and sex is 0.181,
with a 95% confidence interval of 0.115 to 0.246. This indicates that, after
adjusting for age and sex, subjects with ASCVD prior to MRI have an absolute 18.1%
higher probability of death within 5 years, compared to subjects
without ASCVD prior to MRI. With 95% confidence, such an observation is
judged to be consistent with a true absolute increase in 5-year mortality
between 11.5% and 24.6%. Based on the adjusted risk difference and
corresponding 95% confidence interval, we conclude that prevalence of
ASCVD is associated with 5-year mortality.
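The crude risk difference and its interval can be checked from the reported counts (68 deaths among 217 with ASCVD, 53 among 518 without). The sketch below assumes a Wald interval with unpooled variances; the function name is hypothetical, and the software used for the answer may have used a slightly different method.

```python
import math

def risk_difference_ci(d1, n1, d0, n0, z=1.959964):
    """Crude risk difference with a Wald 95% interval (unpooled variance)."""
    p1, p0 = d1 / n1, d0 / n0
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return rd, rd - z * se, rd + z * se

# Counts taken from the Results paragraph above.
rd, lo, hi = risk_difference_ci(68, 217, 53, 518)
```

Rounded to three decimals, this reproduces the crude estimate of 0.211 and the interval of 0.144 to 0.278.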
b. Methods: The sex and age adjusted estimate of the difference in 5-year
mortality by ASCVD status was based on the linear regression parameter for the
indicator of ASCVD in a model that included age, sex, and the interaction between
age and sex as covariates. Age was modeled as a continuous variable for both
its main effect and the interaction term with sex. A 95% confidence interval
for the sex and age adjusted difference in 5-year mortality was
calculated using Wald type inference based on the Huber-White sandwich
estimator of standard errors in order to allow for the mean-variance
relationship associated with binary random variables. Confidence intervals
and p values assumed an approximate normal distribution for the regression
parameter estimates.
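To illustrate what the sandwich estimator does for a binary outcome, the sketch below works through the simplest case: a linear regression of a 0/1 outcome on a single 0/1 exposure, with no age or sex terms. This is a deliberately simplified stand-in for the adjusted model, and the function name is hypothetical. It uses the HC0 form of the Huber-White estimator, written out with 2x2 matrices so that no external library is needed.

```python
import math

def ols_binary_robust(d1, n1, d0, n0):
    """OLS slope and HC0 sandwich SE for a 0/1 outcome on a 0/1 exposure."""
    n = n1 + n0
    p1, p0 = d1 / n1, d0 / n0
    slope = p1 - p0  # OLS slope equals the difference in group proportions
    # (X'X)^{-1} for X = [1, x], where sum(x) = sum(x^2) = n1
    det = n * n1 - n1 * n1
    inv = [[n1 / det, -n1 / det], [-n1 / det, n / det]]
    # Sum of squared residuals within each exposure group
    ss1 = d1 * (1 - p1) ** 2 + (n1 - d1) * p1 ** 2
    ss0 = d0 * (1 - p0) ** 2 + (n0 - d0) * p0 ** 2
    # "Meat" of the sandwich: sum over subjects of e_i^2 * x_i x_i'
    meat = [[ss1 + ss0, ss1], [ss1, ss1]]
    # Slope variance: (second row of inv) * meat * (second column of inv)
    row = inv[1]
    tmp = [row[0] * meat[0][0] + row[1] * meat[1][0],
           row[0] * meat[0][1] + row[1] * meat[1][1]]
    var_slope = tmp[0] * inv[0][1] + tmp[1] * inv[1][1]
    return slope, math.sqrt(var_slope)

slope, robust_se = ols_binary_robust(68, 217, 53, 518)
```

In this two-group case the robust standard error equals the unpooled Wald standard error sqrt(p1(1-p1)/n1 + p0(1-p0)/n0), which shows how the sandwich estimator lets the variance differ with the mean.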
Results:
Of the 217 subjects with ASCVD prior to MRI, 68 (31.34%) died within 5 years,
compared with 53 of the 518 subjects without ASCVD prior to MRI (10.23%).
Among subjects with ASCVD prior to MRI, 48 of 130 males (36.92%) and 20 of
87 females (22.99%) died within 5 years after MRI. Among subjects without
ASCVD prior to MRI, 30 of 236 males (12.71%) and 23 of 282 females (8.16%)
died within 5 years after MRI.
Subjects with ASCVD have a higher risk of 5-year mortality in each age
group, compared to subjects without ASCVD.
Based on a linear regression model adjusting for sex, age and the interaction
between sex and age, prevalence of ASCVD is estimated to be
associated with an absolute 18.7% higher probability of death
within 5 years. With 95% confidence, such an estimate is judged to be
consistent with a true absolute increase in 5-year mortality between 12.0%
and 25.4%. Based on a two-sided p value of less than 0.001, we reject the null
hypothesis that the 5-year mortality is equal among subjects with and
without ASCVD prior to MRI.
c. Comparing the results from problems 2a and 2b, we can see that the estimates
of the age and sex adjusted risk difference are close to each other, although the 95%
confidence intervals are slightly different. The estimates are close to each
other because we adjusted for both potential confounding and effect
modification in both the stratified analysis and the linear regression analysis.
By including the interaction term in the linear regression, we do not borrow
information across subgroups, which we would do if we adjusted only for the main
effects of age and sex. However, the estimates are not exactly the same because we
are using 5-year age categories in the stratified analysis, while
continuous age is used in the linear regression analysis, which
captures the trend in risk across ages more precisely. Last
but not least, the linear regression model takes into account the mean-variance
relationship when calculating standard errors, while the Mantel-Haenszel
method ignores that; thus the interval estimate from the linear
regression analysis has higher precision than that from the stratified analysis.
3. Using the odds ratio (OR) as a measure of association, provide statistical inference
regarding an association between 5 year survival and baseline prevalence of ASCVD,
adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command cc or
an equivalent analysis in R).
b. Answer the question using an appropriate regression model.
c. What is the difference in the statistical models you used? That is, how would you
explain any differences between the two analysis approaches?
Answer:
a. Methods: For each subject, if he/she had been diagnosed with coronary heart
disease or a cerebrovascular disease prior to MRI, he/she is considered to
have prevalent ASCVD, which is dichotomous in the analysis. For each
subject, if the vital status is dead and the observation time is shorter than 5
years, he/she is considered to be dead within 5 years, and the 5 year mortality
for each subject is also dichotomous.
In order to do stratified analysis, we also categorized age in 5-year
intervals (65-69, 70-74, 75-79, 80-84, and >=85); subjects who are 90 or
older are combined with those who are 85 to 89 because of the small sample size in
the oldest group. The 5-year intervals strike a balance among capturing
the age trend with adequate precision, keeping sufficient cell counts in
each stratum, and keeping the number of categories small for stratified analysis.
The association between 5-year mortality and ASCVD prevalence adjusted
for age and sex is calculated using a Cochran-Mantel-Haenszel approach.
The odds ratio is calculated within each stratum defined by sex and age category.
The weight for each stratum is calculated based on the marginal counts for
each stratum. The overall odds ratio is then calculated. The corresponding 95%
confidence interval for the adjusted odds ratio is also computed using the
Mantel-Haenszel method.
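The Mantel-Haenszel pooled odds ratio can be sketched in a few lines of Python. The function name, the tuple layout, and the two illustrative strata are hypothetical; only the single-stratum example uses counts reported in this document (68 and 53 deaths among 217 and 518 subjects).

```python
def mh_odds_ratio(strata):
    """Mantel-Haenszel odds ratio.

    strata: list of (a, b, c, d) = exposed deaths, exposed survivors,
            unexposed deaths, unexposed survivors (hypothetical layout).
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Illustrative strata only; not the mri.txt age-by-sex table.
or_pooled = mh_odds_ratio([(10, 40, 5, 95), (20, 80, 10, 90)])

# With a single stratum the estimator reduces to the crude odds ratio.
or_crude = mh_odds_ratio([(68, 217 - 68, 53, 518 - 53)])
```

Each stratum contributes a*d/n to the numerator and b*c/n to the denominator, so strata with more subjects carry more weight without any stratum-specific odds ratio having to be computed on its own.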
Results:
Of the 217 subjects with ASCVD prior to MRI, 68 (31.34%) died within 5 years,
compared with 53 of the 518 subjects without ASCVD prior to MRI (10.23%).
Among subjects with ASCVD prior to MRI, 48 of 130 males (36.92%) and 20 of
87 females (22.99%) died within 5 years after MRI. Among subjects without
ASCVD prior to MRI, 30 of 236 males (12.71%) and 23 of 282 females (8.16%)
died within 5 years after MRI.
Subjects with ASCVD have a higher risk of 5-year mortality in each age
group, compared to subjects without ASCVD.
The crude odds ratio is 4.00, with a 95% confidence interval of 2.62 to 6.12.
Without adjustment for age and sex, subjects with ASCVD prior to MRI
have 4-fold higher odds of death within 5 years after MRI, compared to
subjects without ASCVD. Such an observation is judged to be consistent with
a true 2.62-fold to 6.12-fold relative increase in the odds of 5-year mortality.
Based on estimates by the Cochran-Mantel-Haenszel method adjusting for sex
and 5-year age category, the odds ratio adjusted for age and sex is 3.50, with
a 95% confidence interval of 2.31 to 5.29. This indicates that, after adjusting
for age and sex, subjects with ASCVD prior to MRI have 3.5-fold higher
odds of death within 5 years, compared to subjects without ASCVD
prior to MRI. With 95% confidence, such an observation is judged to be
consistent with a true 2.31-fold to 5.29-fold relative increase in the odds of 5-year
mortality. Based on the adjusted odds ratio and corresponding 95%
confidence interval, we conclude that prevalence of ASCVD is associated
with 5-year mortality.
b. Methods: The sex and age adjusted estimate of the odds ratio for 5-year
mortality by ASCVD status was based on a logistic regression of the binary indicator
of 5-year mortality on a model that included an indicator of ASCVD, age,
sex, and the interaction between age and sex as covariates. Age was
modeled as a continuous variable for both its main effect and the
interaction term with sex. The adjusted odds ratio was obtained by
exponentiating the regression parameter for the indicator of ASCVD.
A 95% confidence interval for the sex and age adjusted odds ratio
was calculated assuming an approximate normal
distribution for the regression parameter estimates.
Results:
Of the 217 subjects with ASCVD prior to MRI, 68 (31.34%) died within 5 years,
compared with 53 of the 518 subjects without ASCVD prior to MRI (10.23%).
Among subjects with ASCVD prior to MRI, 48 of 130 males (36.92%) and 20 of
87 females (22.99%) died within 5 years after MRI. Among subjects without
ASCVD prior to MRI, 30 of 236 males (12.71%) and 23 of 282 females (8.16%)
died within 5 years after MRI.
Subjects with ASCVD have a higher risk of 5-year mortality in each age
group, compared to subjects without ASCVD.
Based on a logistic regression model adjusting for sex and age, prevalence of
ASCVD is estimated to be associated with 3.50-fold higher odds of death
within 5 years after MRI. With 95% confidence, such an estimate is judged
to be consistent with a true odds ratio for 5-year mortality between 2.32 and
5.29. Based on a two-sided p value of less than 0.001, we reject the null
hypothesis that the 5-year mortality is equal among subjects with and without
ASCVD prior to MRI.
c. Comparing the results from problems 3a and 3b, we can see that the estimates
of the age and sex adjusted odds ratio are close to each other, although the 95%
confidence intervals are slightly different. The estimates are close to each
other because we adjusted for both potential confounding and effect
modification in both the stratified analysis and the logistic regression analysis.
By including the interaction term in the logistic regression, we do not
borrow information across subgroups, which we would do if we adjusted only for the
main effects of age and sex. However, the estimates are not exactly the same
because the underlying mathematical methods are different. For the stratified
analysis, we average the odds of the exposed and unexposed groups
separately across strata and then take the ratio, while in the regression
analysis, we calculate the odds ratio for each stratum first, then take the
average of the log odds ratios, and convert back to the overall odds ratio. In addition,
we are using 5-year age categories in the stratified analysis, while
continuous age is used in the logistic regression analysis, which
is more precise.
4. Using the rate ratio (RR) as a measure of association, provide statistical inference
regarding an association between 5 year survival and baseline prevalence of ASCVD,
adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command ir or
an equivalent analysis in R).
b. Answer the question using an appropriate regression model.
c. What is the difference in the statistical models you used? That is, how would you
explain any differences between the two analysis approaches?
Answer:
a. Methods: For each subject, if he/she had been diagnosed with coronary heart
disease or a cerebrovascular disease prior to MRI, he/she is considered to
have prevalent ASCVD, which is dichotomous in the analysis. For each
subject, if the vital status is dead and the observation time is shorter than 5
years, he/she is considered to be dead within 5 years, and the 5 year mortality
for each subject is also dichotomous.
In order to do stratified analysis, we also categorized age in 5-year
intervals (65-69, 70-74, 75-79, 80-84, and >=85); subjects who are 90 or
older are combined with those who are 85 to 89 because of the small sample size in
the oldest group. The 5-year intervals strike a balance among capturing
the age trend with adequate precision, keeping sufficient cell counts in
each stratum, and keeping the number of categories small for stratified analysis.
The association between 5-year mortality and ASCVD prevalence adjusted
for age and sex is calculated using a Cochran-Mantel-Haenszel approach.
The rate ratio is calculated within each stratum defined by sex and age category,
using observation time as the time variable. The weight for each stratum is
calculated based on the marginal counts for each stratum. The overall rate
ratio is then calculated. The corresponding 95% confidence interval for the adjusted
rate ratio is also computed using the Mantel-Haenszel method.
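For person-time data, the Mantel-Haenszel pooled rate ratio has a similar form to the pooled odds ratio. The sketch below is illustrative only: the function name and tuple layout are hypothetical, and the strata are made-up numbers because stratum-level person-time from mri.txt is not shown in this document.

```python
def mh_rate_ratio(strata):
    """Mantel-Haenszel rate ratio for person-time data.

    strata: list of (a, t1, c, t0) = exposed events, exposed person-time,
            unexposed events, unexposed person-time (hypothetical layout).
    """
    num = sum(a * t0 / (t1 + t0) for a, t1, c, t0 in strata)
    den = sum(c * t1 / (t1 + t0) for a, t1, c, t0 in strata)
    return num / den

# Illustrative strata: stratum-specific rate ratios are 2.0 and 6.0.
rr_pooled = mh_rate_ratio([(10, 100.0, 5, 100.0), (20, 100.0, 10, 300.0)])
```

The pooled value falls between the stratum-specific rate ratios, with each stratum weighted by its events and person-time.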
Results:
Of the 217 subjects with ASCVD prior to MRI, 68 (31.34%) died within 5 years,
compared with 53 of the 518 subjects without ASCVD prior to MRI (10.23%).
Among subjects with ASCVD prior to MRI, 48 of 130 males (36.92%) and 20 of
87 females (22.99%) died within 5 years after MRI. Among subjects without
ASCVD prior to MRI, 30 of 236 males (12.71%) and 23 of 282 females (8.16%)
died within 5 years after MRI.
Subjects with ASCVD have a higher risk of 5-year mortality in each age
group, compared to subjects without ASCVD.
The crude rate ratio is 3.32, with a 95% confidence interval of 2.28 to 4.85.
Without adjustment for age and sex, subjects with ASCVD prior to MRI
have a 3.32-fold higher rate of death within 5 years after MRI, compared to
subjects without ASCVD. With 95% confidence, such an observation is
judged to be consistent with a true 2.28-fold to 4.85-fold higher rate
of death within 5 years.
The rate ratio adjusted for age and sex is 2.88, with a 95% confidence
interval of 2.01 to 4.12. This indicates that, after adjusting for age and sex,
subjects with ASCVD prior to MRI have a 2.88-fold higher rate of death
within 5 years, compared to subjects without ASCVD prior to MRI.
With 95% confidence, such an observation is judged to be consistent with a
true rate ratio for 5-year mortality between 2.01 and 4.12. Based on the
adjusted rate ratio and corresponding 95% confidence interval, we conclude
that prevalence of ASCVD is associated with 5-year mortality.
b. Methods: The sex and age adjusted estimate of the rate ratio for 5-year
mortality by ASCVD status and the corresponding confidence interval were based on a
Poisson regression of the binary indicator of 5-year mortality on a model that
included an indicator of ASCVD, age, sex, and the interaction
between age and sex as covariates, with observation time as the time variable. Age
was modeled as a continuous variable for both its main effect and the
interaction term with sex. The adjusted rate ratio was obtained by
exponentiating the regression parameter for the indicator of ASCVD.
A 95% confidence interval for the sex and age adjusted rate ratio
was calculated assuming an approximate normal
distribution for the regression parameter estimates, with standard
errors based on the Huber-White sandwich estimator in order to allow for the
mean-variance relationship of binary response variables.
Results:
Of the 217 subjects with ASCVD prior to MRI, 68 (31.34%) died within 5 years,
compared with 53 of the 518 subjects without ASCVD prior to MRI (10.23%).
Among subjects with ASCVD prior to MRI, 48 of 130 males (36.92%) and 20 of
87 females (22.99%) died within 5 years after MRI. Among subjects without
ASCVD prior to MRI, 30 of 236 males (12.71%) and 23 of 282 females (8.16%)
died within 5 years after MRI.
Subjects with ASCVD have a higher risk of 5-year mortality in each age
group, compared to subjects without ASCVD.
Based on a Poisson regression model adjusting for sex and age, prevalence of
ASCVD is estimated to be associated with a 2.89-fold higher rate of death
within 5 years. With 95% confidence, such an estimate is judged to be
consistent with a true 2.02-fold to 4.12-fold higher rate of death within
5 years. Based on a two-sided p value of less than 0.001, we reject the null
hypothesis that the 5-year mortality is equal among subjects with and without
ASCVD prior to MRI.
c. Comparing the results from problems 4a and 4b, we can see that the point and
interval estimates of the age and sex adjusted rate ratio are similar to each
other. The estimates are close to each other because we adjusted for both
potential confounding and effect modification in both the stratified
analysis and the Poisson regression analysis. By including the interaction term
in the Poisson regression, we do not borrow information across subgroups,
which we would do if we adjusted only for the main effects of age and sex. However, the
estimates are not exactly the same because the underlying mathematical
methods are different. For the stratified analysis, we average the
rates of the exposed and unexposed groups separately across strata and then take
the ratio, while in the regression analysis, we calculate the rate ratio for each
stratum first, then take the average of the log rate ratios, and convert back to the
overall rate ratio. In addition, we are using 5-year age categories in
the stratified analysis, while continuous age is used in the
Poisson regression analysis, which is more precise. Last but not least, the
Poisson regression model takes into account the mean-variance relationship
when calculating standard errors, while the Mantel-Haenszel method
ignores that. Thus, the regression interval estimate has higher precision than
that from the stratified analysis. But in the case of the incidence rate ratio, the
Mantel-Haenszel method handles it well, so the estimates from both
methods are similar.
5. Comment very briefly on the similarity or differences among the three approaches.
Which would you tend to prefer in general? Why?
Answer:
All three methods compare 5-year mortality between subjects with
and without ASCVD prior to MRI, and we observe a significant association between
ASCVD and 5-year mortality regardless of the measure used. The RD expresses the
comparison on an absolute scale, while the RR and OR express it on a
relative scale, with the OR comparing odds instead of risks. The risk difference
needs risk estimates for each group, the rate ratio takes
into account the time to event, and the odds ratio needs neither absolute risk
estimates nor the time to event. The estimate of the
risk difference is more relevant to quantifying the excess risk of 5-year mortality due to
ASCVD, and the relative estimates are more relevant to quantifying the strength of the
association between ASCVD and 5-year mortality.
Since 5-year mortality is not a rare event in either the exposed or the unexposed group
(31.34% and 10.23%), the odds ratio may overstate the relative risk relating
5-year mortality to ASCVD. And since our scientific question concerns
mortality at the 5-year time point, the time to event is of
little interest, so we are less interested in the rate ratio. Thus,
we would prefer to use the risk difference as our estimate for this question.
Furthermore, the Mantel-Haenszel method does not take into account the mean-variance
relationship of proportions, while linear regression analysis with robust standard errors
does. So we would prefer the risk difference estimated from a linear regression
model with robust standard errors.
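The point about non-rare outcomes can be illustrated with the crude counts reported earlier (68 of 217 and 53 of 518). The short Python sketch below computes the crude risk ratio and crude odds ratio side by side; the variable names are hypothetical and nothing is adjusted for age or sex.

```python
# Crude counts from the Results paragraphs: deaths / total by ASCVD status.
d1, n1 = 68, 217   # with ASCVD prior to MRI
d0, n0 = 53, 518   # without ASCVD prior to MRI

risk1, risk0 = d1 / n1, d0 / n0
risk_ratio = risk1 / risk0

odds1 = d1 / (n1 - d1)
odds0 = d0 / (n0 - d0)
odds_ratio = odds1 / odds0
```

With these counts the crude odds ratio is about 4.00 while the crude risk ratio is about 3.06, so reading the odds ratio as if it were a risk ratio would overstate the relative increase in risk.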
Question 6 pertains to the analysis of colorectal cancer incidence for whites living in the U.S. as
a function of birthplace (U.S. born vs foreign born) (see datafile surveillance.txt and
documentation surveillance.doc on the class web pages).
6. Using the incidence ratio as a measure of association, provide inference for an association
between incidence of colorectal cancer and birthplace, after adjustment for age, sex, and
SEER.
a. Answer the question using directly standardized rates, with standardization to the
U.S. population.
b. Answer the question using an appropriate regression model.
c. What is the difference in the statistical models you used? That is, how would you
explain any differences between the two analysis approaches?
Answer:
a. Methods: For the standardization analysis, only subjects with a known birth
place are included. Subjects whose birth place was reported as unknown are
excluded from any further analysis. An indicator variable is created for
subjects who were born in foreign countries, and US born subjects are
treated as the unexposed group in this analysis.
Strata are first defined by age and SEER site, and the
standardization analysis of the incidence ratio is conducted among males and
females separately. Strata are then defined by
age, sex and SEER site, and a summary estimate of the incidence ratio is
calculated.
The 1980 US census population is used as the reference population for
standardization. The sum of person-years in the census by age, sex and
SEER site is used as the weights in the standardization.
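Direct standardization as described here amounts to a weighted average of stratum-specific rates, using the reference population as weights. The Python sketch below is a minimal illustration: the function name, the two strata, and all counts, person-years, and weights are hypothetical, not values from surveillance.txt or the 1980 census.

```python
def directly_standardized_rate(cases, person_years, reference_weights):
    """Weighted average of stratum rates using reference-population weights."""
    total_w = sum(reference_weights)
    return sum((c / py) * w
               for c, py, w in zip(cases, person_years, reference_weights)) / total_w

# Hypothetical data: two age strata (young, old); all numbers are made up.
ref_w = [0.7, 0.3]                        # reference population shares
foreign_cases, foreign_py = [2, 30], [10000, 20000]
us_cases, us_py = [10, 60], [100000, 40000]

std_foreign = directly_standardized_rate(foreign_cases, foreign_py, ref_w)
std_us = directly_standardized_rate(us_cases, us_py, ref_w)
std_ratio = std_foreign / std_us

# Crude ratio ignores the different age structure of the two groups.
crude_ratio = (sum(foreign_cases) / sum(foreign_py)) / (sum(us_cases) / sum(us_py))
```

In this made-up example the foreign born group is older, so the crude ratio is much larger than the standardized ratio; applying the same reference weights to both groups removes the difference in age structure from the comparison.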
Results:
The crude incidence rate of colorectal cancer among subjects who were born
in the US is 325.9 cases per 1,000,000 person-years, and the crude incidence
rate of colorectal cancer among subjects who were born in foreign countries is 892.7
cases per 1,000,000 person-years. Subjects who were born in foreign
countries are estimated to have a 2.74-fold higher incidence rate of colorectal
cancer, compared to those who were born in the US. Such an
observation is estimated to be consistent with a true 2.68-fold to 2.80-fold
incidence ratio with 95% confidence.
Among males, the crude incidence rate of colorectal cancer was 309.5 cases
per 1,000,000 person-years among US born subjects, and 972.6 cases per
1,000,000 person-years among foreign born subjects. Males who are foreign
born have a 3.14-fold relative higher incidence rate of colorectal cancer,
compared to US born males. And such an observation is estimated to be
consistent with a true 3.05-fold to 3.23-fold incidence ratio with 95%
confidence.
Among female subjects, the crude incidence rate of colorectal cancer was
341.7 cases per 1,000,000 person-years among US born subjects and 825.2
cases per 1,000,000 person-years among foreign born subjects. Females who
are foreign born have a 2.41-fold relative higher incidence rate of colorectal
cancer, compared to US born females. And such an observation is estimated
to be consistent with a true 2.35-fold to 2.49-fold incidence ratio with 95%
confidence.
After adjustment for age and SEER sites by standardization to the US 1980
census data, foreign birth place is estimated to be associated
with a 1.047-fold higher incidence rate of colorectal cancer, compared to
US birth place, among male subjects. Such an observation is estimated to
be consistent with a true 1.004-fold to 1.091-fold incidence rate ratio with 95%
confidence. Based on the point and interval estimates, we conclude
that there is a marginal association between birth place and incidence of
colorectal cancer among males after adjusting for age and SEER sites.
After adjustment for age and SEER sites by standardization to the US 1980
census data, foreign birth place is estimated to be associated
with a 0.989-fold incidence rate ratio of colorectal cancer, compared to
US birth place, among female subjects. Such an observation is estimated to
be consistent with a true 0.949-fold to 1.030-fold incidence rate ratio with 95%
confidence. Based on the point and interval estimates, we conclude
that there is no association between birth place and incidence of colorectal
cancer among females after adjusting for age and SEER sites.
For the summary estimate of all subjects, after adjustment for age, sex and SEER
sites by standardization to the US 1980 census data, foreign birth place
is estimated to be associated with a 1.02-fold incidence rate ratio of
colorectal cancer, compared to US birth place. Such an observation is
estimated to be consistent with a true 0.99-fold to 1.05-fold incidence rate ratio
with 95% confidence. Based on the point and interval estimates, we
conclude that there is no association between birth place and incidence of
colorectal cancer after adjusting for age, sex and SEER sites.
In addition, since colorectal cancer is extremely rare
among children, we would also like to restrict the analysis to adults. After
this further restriction, male adults who are foreign born have a 1.05-fold
higher incidence rate of colorectal cancer, compared to US born male adults
(95% CI: 1.00-1.09). Female adults who are foreign born have an
incidence rate ratio of 0.99 for colorectal cancer, compared to US born female
adults (95% CI: 0.95-1.03). The overall estimate among adults after adjusting
for age, sex and SEER sites is a 1.02-fold incidence ratio for
colorectal cancer (95% CI: 0.99-1.05). The results do not materially change.
Since the direction of association differs between males and
females, we will report the standardized incidence ratio separately for males
and females.
b. Methods: The age and SEER sites adjusted estimates of birth place and
colorectal cancer incidence rate for males and females, along with
corresponding confidence interval were based on a Poisson regression of the
binary indicator of colorectal cancer incidence on a model that included an
indicator of birth place, age, SEER sites and the interaction term between
age and SEER sites as covariates and person-year observation time as time
variable. The adjusted estimate of colorectal cancer was based on the
regression parameter for the indicator of birth place. A 95% confidence
interval for the age and SEER sites adjusted difference in colorectal cancer
was calculated assuming the approximate normal distribution for the
regression parameter estimates along with the standard errors based on
Huber-White sandwich estimator in order to allow for the mean-variance
relationship of binary response variables.
In addition, an overall estimate of the association for all subjects was
calculated based on a Poisson regression of the binary indicator of
colorectal cancer on a model that included an indicator of birth place, age,
sex, SEER sites, and the interactions among sex, age, and SEER sites as
covariates, with person-years of observation as the exposure time. The
adjusted incidence rate ratio was based on the regression parameter for the
indicator of birth place. A 95% confidence interval for the sex, age, and
SEER sites adjusted incidence rate ratio was calculated assuming an
approximate normal distribution for the regression parameter estimates, with
standard errors based on the Huber-White sandwich estimator in order to allow
for the mean-variance relationship of binary response variables.
Results:
The crude incidence rate of colorectal cancer among subjects who were born
in the US is 325.9 cases per 1,000,000 person-years, and the crude incidence
rate of colorectal cancer among subjects who were born in foreign countries
is 892.7 cases per 1,000,000 person-years. Subjects who were born in foreign
countries are estimated to have a 2.74-fold higher incidence rate of
colorectal cancer compared to those who were born in the US. Such an
observation is estimated to be consistent with a true 2.68-fold to 2.80-fold
incidence rate ratio with 95% confidence.
Among males, the crude incidence rate of colorectal cancer was 309.5 cases
per 1,000,000 person-years among US born subjects and 972.6 cases per
1,000,000 person-years among foreign born subjects. Males who are foreign
born have a 3.14-fold higher incidence rate of colorectal cancer compared to
US born males. Such an observation is estimated to be consistent with a true
3.05-fold to 3.23-fold incidence rate ratio with 95% confidence.
Among females, the crude incidence rate of colorectal cancer was 341.7 cases
per 1,000,000 person-years among US born subjects and 825.2 cases per
1,000,000 person-years among foreign born subjects. Females who are foreign
born have a 2.41-fold higher incidence rate of colorectal cancer compared to
US born females. Such an observation is estimated to be consistent with a
true 2.35-fold to 2.49-fold incidence rate ratio with 95% confidence.
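As a quick arithmetic check, the crude incidence rate ratios quoted above can
be reproduced from the reported crude rates. The following is a minimal
sketch that uses only the rates given in the text; the dictionary keys and
variable names are illustrative and not part of the original analysis.

```python
# Crude incidence rates per 1,000,000 person-years, as reported in the text.
# Keys and variable names are illustrative, not from the original analysis.
crude_rates = {
    "all":     {"us_born": 325.9, "foreign_born": 892.7},
    "males":   {"us_born": 309.5, "foreign_born": 972.6},
    "females": {"us_born": 341.7, "foreign_born": 825.2},
}

# Crude incidence rate ratio: foreign born rate divided by US born rate.
crude_irr = {
    group: rates["foreign_born"] / rates["us_born"]
    for group, rates in crude_rates.items()
}

for group, irr in crude_irr.items():
    print(f"{group}: crude incidence rate ratio = {irr:.2f}")
```

The ratios agree with the reported 2.74, 3.14, and 2.41 to two decimal
places. The confidence intervals cannot be checked this way, since they
require the underlying case counts and person-years.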
After adjusting for age and SEER sites among males, foreign birth is
estimated to be associated with a 1.062-fold higher incidence rate of
colorectal cancer compared to US birth.
Such an observation is estimated to be consistent with a true
1.026 to 1.100-fold incidence rate ratio with 95% confidence. Based
on the p-value of 0.001, we conclude that there is an association between
birth place and incidence of colorectal cancer among males after adjusting
for age and SEER sites.
After adjusting for age and SEER sites among females, foreign birth is
estimated to be associated with a 2.3% lower incidence rate of colorectal
cancer compared to US birth. Such an observation is estimated to be
consistent with a true 0.941 to 1.014-fold incidence rate ratio with 95%
confidence. Based on the p-value of 0.213, we do not find statistically
significant evidence of an association between birth place and incidence of
colorectal cancer among females after adjusting for age and SEER sites.
For the overall estimate of the incidence rate ratio, after adjusting for
sex, age, and SEER sites, foreign birth is estimated to be associated with a
1.017-fold higher incidence rate of colorectal cancer compared to US birth.
Such an observation is estimated to be consistent with a true 0.989 to
1.045-fold incidence rate ratio with 95% confidence. Based on the p-value of
0.242, we do not find statistically significant evidence of an association
between birth place and incidence of colorectal cancer after adjusting for
sex, age, and SEER sites.
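The interval and p-value for the overall estimate can be related to each
other through the normal approximation on the log scale. The sketch below is
an illustration only: it assumes a symmetric Wald interval for the log
incidence rate ratio with critical value 1.96, and the function name
wald_summary is hypothetical. It recovers the implied standard error from the
reported interval and then the two-sided p-value.

```python
import math

def wald_summary(irr, ci_low, ci_high, z_crit=1.96):
    """Recover the log-scale standard error, Wald z statistic, and two-sided
    p-value implied by a reported rate ratio and its 95% confidence interval.
    Assumes the interval is symmetric on the log scale (an assumption)."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(irr) / se
    # Two-sided standard normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Overall adjusted estimate and interval as reported in the text.
se, z, p = wald_summary(1.017, 0.989, 1.045)
print(f"SE(log IRR) = {se:.4f}, z = {z:.2f}, p = {p:.2f}")
```

Because the inputs are rounded to three decimals, the implied p-value only
approximates the reported 0.242; it is a consistency check, not a
recomputation of the original analysis.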
Based on the results for males and females, we suspect that the association
between birth place and colorectal cancer incidence differs between males
and females. We would therefore report the association for males and
females separately.
c. Comparing the results from problems 6a and 6b, we can see that the point
and interval estimates of the age and sex adjusted rate ratio are similar,
though not exactly the same. The estimates are close to each other because
we adjusted for both potential confounding and effect modification in both
the stratified analysis and the Poisson regression analysis. By including
the interaction terms in the Poisson regression, we do not borrow
information across subgroups, which we would do if we adjusted only for the
main effects of age and sex. However, the estimates are not exactly the same
because the underlying mathematical methods are different. In the stratified
analysis, we average the rates of the exposed and unexposed groups
separately across strata and then take the ratio, while in the regression
analysis we calculate the rate ratio for each stratum first, take the
average of the log ratios, and convert back to the overall rate ratio. In
addition, the Poisson regression model takes into account the mean-variance
relationship when calculating standard errors, while the Mantel-Haenszel
method ignores it. Thus, the interval estimates from the regression have
higher precision than those from the stratified analysis. However, the
Mantel-Haenszel method handles incidence rate ratios well, so the estimates
from the two methods are similar.
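The distinction between pooling first and then taking a ratio, versus taking
stratum-specific ratios and then averaging on the log scale, can be
illustrated with a small sketch. The strata below are hypothetical numbers
made up for illustration, not the SEER data, and the inverse-variance weights
are an assumption standing in for what the regression fit does.

```python
import math

# Hypothetical strata: (exposed cases, exposed person-years,
#                       unexposed cases, unexposed person-years).
# These numbers are made up for illustration only.
strata = [
    (30, 10_000, 60, 40_000),
    (80, 20_000, 90, 30_000),
    (150, 15_000, 100, 12_000),
]

def mantel_haenszel_irr(strata):
    """Ratio of weighted sums: pool across strata first, then take the ratio."""
    num = sum(a * t0 / (t1 + t0) for a, t1, b, t0 in strata)
    den = sum(b * t1 / (t1 + t0) for a, t1, b, t0 in strata)
    return num / den

def pooled_log_irr(strata):
    """Weighted average of stratum-specific log rate ratios, then exponentiate.
    Weights are inverse variances of the log rate ratio, 1 / (1/a + 1/b)."""
    weights = [1 / (1 / a + 1 / b) for a, t1, b, t0 in strata]
    logs = [math.log((a / t1) / (b / t0)) for a, t1, b, t0 in strata]
    return math.exp(sum(w * l for w, l in zip(weights, logs)) / sum(weights))

mh = mantel_haenszel_irr(strata)
iv = pooled_log_irr(strata)
print(f"Mantel-Haenszel IRR: {mh:.3f}; pooled log-scale IRR: {iv:.3f}")
```

With these made-up strata the two summaries are close but not identical,
which mirrors the pattern described above for problems 6a and 6b.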