2902 - Emerson Statistics

BIOST 536
Homework #2
Due Date: 10-12-2014
Question 1:
We are interested in analyzing associations between 5-year mortality and prevalence of
ASCVD at study enrollment using statistical methods appropriate for binary response
variables. The observation time for death among these subjects is potentially subject to
censoring. Provide a statistical analysis demonstrating that such methods as logistic
regression can be used to answer this question.
ANSWER:
Methods:
First, I converted the units for the variable “obstime” from days to years (“obstime_yrs”). Then I
checked to see when the first observed censoring time was for patients without mortality.
Second, I had to create a new variable called ASCVD and identify patients who had chd (angina
and myocardial infarction) and stroke (transient ischemic attack and stroke).
To evaluate the association between mortality and ASCVD, a logistic regression was performed
with the dependent variable defined as mortality (“death”) and the predictor of interest defined
as ASCVD. 95% confidence intervals were estimated using Huber-White sandwich estimators
for standard errors.
Results:
The first censoring time for patients without mortality was 5.00 years, with a maximum of 5.91
years. The average censoring time was 5.33 years (SD, 0.297). Because no patient was censored
before 5 years, vital status at 5 years was known for every patient, so it was appropriate to use
logistic regression on an indicator of death within 5 years without worrying about censoring.
There were a total of 735 patients in the study (37 with ASCVD and 698 without). Among the
ASCVD patients, 14 (37.84%) experienced mortality. Among the non-ASCVD patients, 119
(17.05%) experienced mortality.
In the logistic regression model, the odds of death were higher in ASCVD patients
compared to non-ASCVD patients (OR=2.962). Based on the 95% confidence interval, this
observation was not atypical if the true odds ratio was anywhere between 1.480 and 5.926. The
two-tailed p-value was 0.002. Therefore, we have enough evidence to reject the null hypothesis
that there was no association between ASCVD and mortality.
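As a rough check on the crude odds ratio, the sketch below computes it directly from the 2x2 cell counts (summed over the four age-by-sex tables in Question 2(a)) with a Woolf log-scale interval. The function name is illustrative, and the Woolf interval is an assumption that only approximates the sandwich-based interval reported above.

```python
import math

def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale Wald) interval for a 2x2 table.

    a = exposed with event,   b = exposed without event,
    c = unexposed with event, d = unexposed without event.
    """
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_hat) - z * se_log)
    hi = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lo, hi

# Counts summed over the four strata in Question 2(a):
# ASCVD: 14 deaths, 23 survivors; no ASCVD: 119 deaths, 579 survivors.
crude_or, crude_lo, crude_hi = odds_ratio_woolf(14, 23, 119, 579)
print(f"OR = {crude_or:.3f}, 95% CI {crude_lo:.3f} to {crude_hi:.3f}")
```

The point estimate agrees with the logistic regression because a model with a single binary predictor is saturated; the interval differs slightly in the third decimal place.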
Question 2:
Using the risk difference (RD) as a measure of association, provide statistical inference
regarding an association between 5-year survival and baseline prevalence of ASCVD,
adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command cs or an
equivalent analysis in R).
ANSWER:
Method:
I evaluated the risk difference of mortality between patients with and without ASCVD, adjusting
for age and sex. In order to have enough observations in the age categories, I grouped them as
age < 85 and age >= 85 years old. Sex was categorized as male (1==yes, and 0==no). I presented
the results in four connected tables: (1) Females less than 85 years old, (2) Females 85 years old
and greater, (3) Males less than 85 years old, and (4) Males 85 years old and greater.
Since the summary measure was risk difference, I evaluated the difference in proportion of death
within each stratum. However, I needed to define my weights since we could not perform the
Mantel-Haenszel test. Since I was concerned with what would happen if the exposure (a history
of ASCVD) were removed from the exposed patients, I based the weights on the exposed
patients. 95% confidence intervals were generated based on Wald statistics for standard errors.
Statistical inference was based on whether the 95% confidence interval for the risk difference
included 0.
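A minimal sketch of the stratum-level calculation described above: the risk difference and its Wald interval from the counts in one stratum. The function name is hypothetical; the counts are those of the females less than 85 years old in the tables in the Results.

```python
import math

def risk_difference_wald(deaths_exp, n_exp, deaths_unexp, n_unexp, z=1.96):
    """Risk difference (exposed minus unexposed) with a Wald interval."""
    p1 = deaths_exp / n_exp
    p0 = deaths_unexp / n_unexp
    rd = p1 - p0
    # Unpooled binomial variance of each sample proportion.
    se = math.sqrt(p1 * (1 - p1) / n_exp + p0 * (1 - p0) / n_unexp)
    return rd, rd - z * se, rd + z * se

# Females younger than 85: 2 of 11 with ASCVD died, 38 of 340 without ASCVD died.
rd, lo, hi = risk_difference_wald(2, 11, 38, 340)
print(f"RD = {rd:.4f}, 95% CI {lo:.4f} to {hi:.4f}")
```

With these counts the sketch gives a difference of about 7.00 percentage points and an interval of roughly -16.03 to 30.04 percentage points.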
Results:
Females less than 85 years old (percentages are column percentages)
                     Death
History of ASCVD     No              Yes             Total
No                   302 (97.11%)    38 (95.00%)     340
Yes                  9 (2.89%)       2 (5.00%)       11
Total                311             40              351

Males less than 85 years old
                     Death
History of ASCVD     No              Yes             Total
No                   252 (94.74%)    64 (87.67%)     316
Yes                  14 (5.26%)      9 (12.33%)      23
Total                266             73              339
Females 85 years old and greater
                     Death
History of ASCVD     No              Yes             Total
No                   11 (100%)       6 (85.71%)      17
Yes                  0 (0%)          1 (14.29%)      1
Total                11              7               18

Males 85 years old and greater
                     Death
History of ASCVD     No              Yes             Total
No                   14 (100%)       11 (84.62%)     25
Yes                  0 (0%)          2 (15.38%)      2
Total                14              13              27

For female patients less than 85 years old, the risk of mortality was 7.00% higher in those with
ASCVD compared to those without ASCVD. Based on the 95% confidence interval, this
observation was not atypical if the true difference was anywhere between 16.03% lower and
30.04% higher in ASCVD females that were less than 85 years old compared to non-ASCVD
females that were less than 85 years old. Since the 95% confidence interval crossed 0, we cannot
reject the null that there is no difference in mortality risk between patients with and without a
history of ASCVD in females who were less than 85 years old.
Female patients that were 85 years old and greater had a 64.71% increased risk of mortality if
they had ASCVD compared to those without ASCVD. Based on the 95% confidence interval,
this observation was not atypical if the true difference in mortality risk was anywhere between
41.99% higher and 87.42% higher in ASCVD females that were 85 years old and greater
compared to non-ASCVD females that were 85 years old and greater. Since the 95% confidence
interval did not cross 0, we reject the null that there is no difference in mortality risk between
patients with and without a history of ASCVD in females 85 years old and greater.
Male patients less than 85 years old with ASCVD had an 18.88% increased mortality risk
compared to similar patients without ASCVD. Based on the 95% confidence interval, this
observation was not atypical if the true difference in mortality risk was anywhere between 1.55%
lower and 39.31% higher in male patients less than 85 years old with ASCVD relative to similar
patients with no ASCVD. Since the 95% confidence interval crossed 0, we cannot reject the null
that there is no difference in mortality risk between patients with and without a history of
ASCVD in males that were less than 85 years old.
Male patients that were 85 years old and greater with ASCVD had a 56% increased mortality risk
compared to similar patients without ASCVD. Based on the 95% confidence interval, this
observation was not atypical if the true difference in mortality risk was anywhere between
36.54% higher and 75.46% higher in male patients that were 85 years old and greater with
ASCVD compared to similar patients without ASCVD. Since the 95% confidence interval did
not cross 0, we reject the null that there is no difference in mortality risk between patients with
and without a history of ASCVD in males 85 years old and greater.
The overall weighted mortality risk difference was 18.60% higher in patients with a history of
ASCVD. Based on the 95% confidence interval, this observation was not atypical if the true
mortality risk for patients with ASCVD was anywhere between 4.11% higher and 33.07% higher
relative to patients with no history of ASCVD. The unadjusted risk difference was 20.79% higher
in patients with a history of ASCVD compared to those without.
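The overall weighted estimate can be sketched as a weighted average of the four stratum risk differences, with each stratum weighted by its share of the exposed (ASCVD) patients. The dictionary layout and function name are illustrative assumptions; the counts come from the four tables above.

```python
import math

# (deaths_exposed, n_exposed, deaths_unexposed, n_unexposed) per stratum
strata = {
    "female <85": (2, 11, 38, 340),
    "male <85": (9, 23, 64, 316),
    "female >=85": (1, 1, 6, 17),
    "male >=85": (2, 2, 11, 25),
}

def weighted_risk_difference(strata, z=1.96):
    """Risk difference averaged over strata, weighted by exposed counts."""
    total_exposed = sum(n1 for _, n1, _, _ in strata.values())
    rd_sum = 0.0
    var_sum = 0.0
    for d1, n1, d0, n0 in strata.values():
        p1, p0 = d1 / n1, d0 / n0
        w = n1 / total_exposed  # weight = share of exposed patients
        rd_sum += w * (p1 - p0)
        var_sum += w ** 2 * (p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    se = math.sqrt(var_sum)
    return rd_sum, rd_sum - z * se, rd_sum + z * se

overall_rd, overall_lo, overall_hi = weighted_risk_difference(strata)
print(f"RD = {overall_rd:.4f}, 95% CI {overall_lo:.4f} to {overall_hi:.4f}")
```

Under these assumptions the sketch gives a weighted difference of about 18.6 percentage points with an interval of roughly 4.11 to 33.07 percentage points. The two strata with a 100% observed risk contribute zero variance from the exposed group, which is one reason the Wald intervals in those strata should be read cautiously.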
b. Answer the question using an appropriate regression model.
ANSWER:
Method:
The summary measure is risk difference; therefore, I opted to use a multiple linear regression
model. The dependent variable was death, with the predictor of interest being ASCVD,
controlling for sex and age. Thus, patients with a history of ASCVD were compared to patients
without a history of ASCVD using the difference in probability of mortality. Crude (unadjusted)
estimates of the probability of mortality were obtained for patients with and without a
history of ASCVD using sample proportions. Point and interval estimates for the difference in
mortality probabilities were based on a linear regression of the binary indicator of mortality on a
model that included an indicator variable of history of ASCVD, an indicator for sex, and an
indicator for being 85 years old or older. The sex and age-adjusted estimate of the ASCVD
effect was based on the regression parameter for the indicator of history of ASCVD. The 95%
confidence intervals for sex and age-adjusted difference in mortality were determined using
Wald type inference based on the Huber-White sandwich estimator for standard errors in order to
allow for the mean-variance relationship associated with binary random variables. Confidence
intervals and p-values assumed the approximate normal distribution for the regression parameter
estimates.
Results:
In the crude linear regression model, patients with a history of ASCVD had a 20.79% higher
mortality risk compared to patients without a history of ASCVD. Based on the 95% confidence
interval, this observation was not atypical if the true difference in mortality risk for patients with
a history of ASCVD was anywhere between 4.87% higher and 36.71% higher compared to
patients without a history of ASCVD. The two-sided p-value was 0.011. Therefore we can reject
the null hypothesis that there is no association between having a history of ASCVD and
mortality.
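A minimal sketch of the crude model: a simple linear regression of the death indicator on the ASCVD indicator, with a sandwich standard error for the slope. The patient-level rows are rebuilt from the crude counts (14 of 37 ASCVD patients and 119 of 698 non-ASCVD patients died). The HC1 small-sample factor and the normal critical value are assumptions made here, so the interval may differ from the reported one in the last decimal place.

```python
import math

def ols_slope_robust(x, y, z=1.96):
    """Simple linear regression slope with an HC1 sandwich standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    # Sandwich "meat": squared residuals weighted by the squared centered predictor.
    meat = sum(((xi - x_bar) * (yi - intercept - slope * xi)) ** 2
               for xi, yi in zip(x, y))
    var_hc1 = (n / (n - 2)) * meat / sxx ** 2
    se = math.sqrt(var_hc1)
    return slope, slope - z * se, slope + z * se

# One row per patient: ASCVD indicator and death indicator.
x = [1] * 37 + [0] * 698
y = [1] * 14 + [0] * 23 + [1] * 119 + [0] * 579
slope, slope_lo, slope_hi = ols_slope_robust(x, y)
print(f"RD = {slope:.4f}, 95% CI {slope_lo:.4f} to {slope_hi:.4f}")
```

Because the predictor is binary, the slope equals the difference in sample proportions, which is why the crude regression and the pooled table give the same 20.79% estimate.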
In the sex and age-adjusted multiple linear regression, patients with ASCVD had an 18.47%
higher risk of mortality compared to patients without ASCVD controlling for age and sex. Based
on the 95% confidence interval, this observation was not atypical if the true difference was
anywhere between 3.41% higher and 33.52% higher for ASCVD patients compared to
non-ASCVD patients controlling for age and sex. The two-tailed p-value was 0.016. Therefore, I am
confident that I can reject the null hypothesis that there is no difference in risk between patients
with ASCVD and no ASCVD while controlling for age and sex.
The crude mortality risk was 20.79% higher in patients with a history of ASCVD compared to
those without. The adjusted model reported a mortality risk that was 18.47% higher in patients
with a history of ASCVD compared to those without. The adjusted model is not that different in
terms of mortality risk. Both estimates report similar magnitude and direction of the ASCVD
effect on mortality.
c. What is the difference in the statistical models you used? That is, how would you explain
any differences between the two analysis approaches?
ANSWER:
The difference depends on how the strata are weighted, that is, on whether internal or external
weights are used. I opted to go with internal weights when performing the stratified analysis.
The pooled weight-adjusted mortality risk in the stratified analysis was 18.60% higher in patients
with ASCVD compared to non-ASCVD patients, with a 95% confidence interval of 4.11% to
33.07% higher in the ASCVD group compared to the non-ASCVD group. This was slightly higher
compared to the multiple linear regression where the increased risk was 18.47%. The difference
may be due to the weights used for the stratified analysis. In addition, the differences could also
be due to the methods in the regression models. Linear regression models borrow information
from the other variables. However, the two adjusted estimates differed by only 0.13 percentage
points.
In the crude analysis, the stratified pooled risk was 20.79% higher in patients with a history of
ASCVD compared to those without. This was exactly the same for the unadjusted linear
regression model. This is because the crude model was a saturated model and I expected to get
the same results as the stratified analysis, which pooled the data (no internal weights were
applied).
Question 3:
Using the odds ratio (OR) as a measure of association, provide statistical inference
regarding an association between 5-year survival and baseline prevalence of ASCVD,
adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command cc or an
equivalent analysis in R).
ANSWER:
Method:
In the stratified analysis, I was interested in studying the association between history of ASCVD
and mortality. However, I stratified on both age (defined as 85 years and older) and sex to
investigate any change in the ASCVD effect across the different levels of sex and age. Using the
odds ratio as my summary measure, statistical inference was determined using the
Mantel-Haenszel chi square test. Estimates and 95% confidence intervals were determined using
the Cornfield method and weights for each stratum were determined by Mantel-Haenszel weights.
The Mantel-Haenszel test for homogeneity was performed to see if the stratum-specific estimates
were significantly different at different levels of age and sex. However, one would need to be
concerned about multiple comparisons, since performing too many statistical tests inflates the
type I error rate.
Please see Question 2 part (a) for methods and descriptive results which include proportion death
and alive after 5 years across patients with and without a history of ASCVD stratified by age and
sex.
Results:
Because both the female and male strata aged 85 years old and greater had 0 counts for patients
with a history of ASCVD who did not die, I was unable to provide odds ratio estimates and 95%
confidence intervals for all the strata.
However, the pooled and Mantel-Haenszel weight-adjusted odds ratios and 95% confidence
intervals were estimated using Cornfield method. In the pooled results, patients with a history of
ASCVD had higher odds of mortality (OR=2.96) compared to those without. Based on the 95%
confidence interval, this observation was not atypical if the odds ratio for mortality between
patients with a history of ASCVD and without was anywhere between 1.50 and 5.86.
In the Mantel-Haenszel weight-adjusted analysis, patients who had a history of ASCVD had
higher odds of mortality compared to those without (OR=2.78). Based on the 95% confidence
interval, the observation was not atypical if the true odds ratio for mortality between
patients with a history of ASCVD and without was anywhere between 1.34 and 5.77.
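The Mantel-Haenszel odds ratio can be sketched from the four stratum tables in Question 2(a) as the ratio of two weighted sums. The tuple layout and function name are illustrative assumptions; the sketch covers only the point estimate, not the Cornfield interval.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio.

    Each table is (a, b, c, d):
    a = exposed deaths,   b = exposed survivors,
    c = unexposed deaths, d = unexposed survivors.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

tables = [
    (2, 9, 38, 302),   # females < 85
    (9, 14, 64, 252),  # males < 85
    (1, 0, 6, 11),     # females >= 85
    (2, 0, 11, 14),    # males >= 85
]

# Stratum-specific odds ratios are undefined when b * c == 0.
stratum_ors = [a * d / (b * c) if b * c > 0 else None for a, b, c, d in tables]
or_mh = mantel_haenszel_or(tables)
print(stratum_ors, round(or_mh, 2))
```

The two strata aged 85 and older have no ASCVD survivors, so their own odds ratios cannot be computed, yet they still contribute to the numerator of the pooled estimate.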
The Mantel-Haenszel test for homogeneity was not statistically significant at the 5% level
(two-sided p-value = 0.5455); therefore, we cannot reject the null hypothesis that the association
between having a history of ASCVD and mortality was the same across the different levels of age
and sex.
In the combined test that evaluated the association between history of ASCVD and mortality, the
two-sided chi square p-value was 0.0057; therefore, I was able to reject the null hypothesis that
there was no association between history of ASCVD and mortality.
b. Answer the question using an appropriate regression model.
ANSWER:
Methods:
Because the summary measure of interest is the odds ratio, I opted to use a logistic regression
model. Death is a binary outcome, and ASCVD is a binary predictor; sex and age were controlled
for in the model. Crude (unadjusted) estimates for the probability of mortality with and without
a history of ASCVD were estimated using sample proportions. Point and interval estimates for
the ASCVD odds ratio were based on a logistic regression of the binary indicator for mortality
on a model that included an indicator of ASCVD history, an indicator for sex, and an indicator
for being 85 years old and older. The sex and age-adjusted estimate of the ASCVD effect was
based on the regression parameter for the indicator of ASCVD history. The 95% confidence
intervals were determined using the Huber-White sandwich estimator for standard errors.
Results:
In the crude logistic regression model, patients with a history of ASCVD had higher odds of
mortality compared to patients without (OR=2.96). Based on the 95% confidence interval, this
observation was not atypical if the true odds ratio for mortality comparing patients with ASCVD
history to those without was anywhere between 1.48 and 5.93. The two-sided p-value was 0.002;
therefore, based on the results of the crude model, I can reject the null hypothesis that there is no
association between ASCVD history and death.
In the adjusted logistic regression model, patients with ASCVD had higher odds of mortality
compared to patients without ASCVD (OR=2.68) while controlling for age and sex. Based on
the 95% confidence interval, this observation was not atypical if the true odds ratio of mortality
was anywhere between 1.35 and 5.31 comparing ASCVD patients to non-ASCVD
patients controlling for age and sex. The two-sided p-value was 0.005; therefore, I had enough
evidence to reject the null hypothesis that there was no difference in odds of mortality between
patients with ASCVD and without ASCVD while controlling for age and sex.
c. What is the difference in the statistical models you used? That is, how would you explain
any differences between the two analysis approaches?
ANSWER:
The Mantel-Haenszel OR in the stratified analysis was 2.78, which was higher than the
adjusted-regression OR (OR=2.68). The point and interval estimates from the stratified analysis
were the weighted average of stratum-specific estimates. Unlike the adjusted model, the stratified
analysis evaluated each stratum separately. Unfortunately, some of the cells had zero counts, so
the odds ratios for those strata could not be estimated.
The adjusted-regression model borrowed information across the different strata to estimate
the point and interval estimates. In other words, the regression model averaged the ASCVD
effects across all strata, yielding a single point and interval estimate. Since the magnitude and
direction for the stratified and regression results were similar, I was not too concerned about
interaction effects. Instead, I was concerned with confounding, which was addressed in the
adjusted regression model.
Question 4:
Using the risk ratio (RR) as a measure of association, provide statistical inference
regarding an association between 5-year survival and baseline prevalence of ASCVD,
adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command ir or an
equivalent analysis in R).
ANSWER:
Method:
In the stratified analysis, I was interested in studying the association between history of ASCVD
and mortality. However, I stratified on both age (defined as 85 years and older) and sex. Using
the risk ratio as my summary measure, statistical inference was determined using the
Mantel-Haenszel chi square test. Estimates and 95% confidence intervals were determined using
the Cornfield method and weights for each stratum were determined by Mantel-Haenszel weights.
The Mantel-Haenszel test for homogeneity was performed to see if the stratum-specific estimates
were significantly different at different levels of age and sex. However, one would need to be
concerned about multiple comparisons, since performing too many statistical tests inflates the
type I error rate.
Please see Question 2 part (a) for methods and descriptive results which include proportion death
and alive after 5 years across patients with and without a history of ASCVD stratified by age and
sex.
Results:
In patients who were female and less than 85 years old, there was a 62.68% increased risk of
mortality if they had a history of ASCVD compared to those without (RR=1.63). Based on the
95% confidence interval, the observed mortality risk was not atypical if the true risk ratio
was anywhere between 0.448 and 5.903 in females less than 85 years old with a
history of ASCVD compared to those without. Because the 95% CI
includes 1, I don’t have enough evidence to reject the null hypothesis that there is no association
between ASCVD history and mortality risk in females less than 85 years old.
In patients who were female 85 years old and older, there was a 183.3% increased risk of
mortality if they had a history of ASCVD compared to those without (RR=2.83). Based on the
95% confidence interval, the observed mortality risk was not atypical if the true mortality risk
was anywhere between 48.86% higher and 439.3% higher in females 85 years old and older with
a history of ASCVD compared to those without (95% CI: 1.489, 5.393). Because the 95% CI
does not include 1, I have enough evidence to reject the null hypothesis that there is no
association between ASCVD history and mortality risk in females 85 years old and older.
In patients who were male and less than 85 years old, there was a 93.21% increased risk of
mortality if they had a history of ASCVD compared to those without (RR=1.93). Based on the
95% confidence interval, the observed mortality risk was not atypical if the true mortality risk
was anywhere between 10.95% higher and 236.4% higher in males less than 85 years old with a
history of ASCVD compared to those without (95% CI: 1.109, 3.364). Because the 95% CI
does not include 1, I have enough evidence to reject the null hypothesis that there is no
association between ASCVD history and mortality risk in males less than 85 years old.
In patients who were male 85 years old and older, there was a 127.3% increased risk of mortality
if they had a history of ASCVD compared to those without (RR=2.27). Based on the 95%
confidence interval, the observed mortality risk was not atypical if the true mortality risk was
anywhere between 46.05% higher and 253.7% higher in males 85 years old and older with a
history of ASCVD compared to those without (95% CI: 1.460, 3.537). Because the 95% CI does
not include 1, I have enough evidence to reject the null hypothesis that there is no association
between ASCVD history and mortality risk in males 85 years old and older.
The crude risk ratio was 2.219, which indicates that patients with a history of ASCVD had
a 121.9% increased risk of mortality compared to those without. Based on the 95% confidence
interval, this observation was not atypical if the true risk ratio was anywhere between 1.42 and
3.46 comparing all patients with a history of ASCVD to those without.
The adjusted risk ratio was 1.964, which indicates that patients with a history of ASCVD had
a 96.4% increased risk of mortality compared to those without. Based on the 95% confidence
interval, this observation was not atypical if the true risk ratio was anywhere between 1.29 and
2.99 comparing all patients with a history of ASCVD to those without.
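The risk ratios above can be sketched from the counts in the Question 2(a) tables. The log-scale Wald interval and the function names below are assumptions for illustration; the pooled estimate uses the usual Mantel-Haenszel risk-ratio weights.

```python
import math

def risk_ratio_ci(a, n1, c, n0, z=1.96):
    """Risk ratio with a log-scale Wald interval.

    a of n1 exposed patients died; c of n0 unexposed patients died.
    """
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio over (a, n1, c, n0) strata."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den

strata = [
    (2, 11, 38, 340),  # females < 85
    (9, 23, 64, 316),  # males < 85
    (1, 1, 6, 17),     # females >= 85
    (2, 2, 11, 25),    # males >= 85
]

crude_rr, crude_lo, crude_hi = risk_ratio_ci(14, 37, 119, 698)
rr_mh = mantel_haenszel_rr(strata)
print(f"crude RR = {crude_rr:.3f} ({crude_lo:.2f}, {crude_hi:.2f}); MH RR = {rr_mh:.3f}")
```

A risk ratio of 2.219 corresponds to a (2.219 - 1) x 100 = 121.9% relative increase, and a ratio of 1.964 to a 96.4% increase.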
b. Answer the question using an appropriate regression model.
ANSWER:
Methods:
Exposure arms were compared using the risk ratio comparing the probability of mortality for
patients with and without a history of ASCVD. Crude (unadjusted) estimates of the probability
of mortality were estimated for patients with a history of ASCVD and without using the
sample proportions. Point and interval estimates for the risk ratio for patients with a history of
ASCVD experiencing mortality compared to those without were based on a Poisson regression
of the binary indicator of mortality on a model that included an indicator for history of ASCVD,
an indicator for sex, and an indicator for being 85 years and older. The age and sex-adjusted
estimate of the exposure effect was based on the regression parameter for the indicator of having
a history of ASCVD. The 95% confidence interval and p-value were estimated using the
Huber-White sandwich estimator for the standard errors.
Results:
In the crude (unadjusted) Poisson regression model, patients with a history of ASCVD had a
higher risk of mortality compared to those without (RR=2.22). Based on the 95% confidence
interval, this observation was not atypical if the true risk ratio for patients with a history of
ASCVD experiencing mortality compared to those without was anywhere between 1.42 and
3.46. The two-sided p-value was <0.0001; therefore, I have enough evidence to reject the null
hypothesis that there was no difference in risk in patients with and without a history of ASCVD
and mortality.
In the adjusted Poisson regression model, patients with ASCVD had a 97.02% higher risk of
mortality compared to patients without ASCVD controlling for age and sex (RR=1.97). Based on
the 95% confidence interval, this increased risk of mortality was not atypical if the true risk was
anywhere between 30.09% higher and 198.4% higher with ASCVD compared to non-ASCVD
patients controlling for age and sex (or 95% CI for RR: 1.30, 2.98). The two-sided p-value was
0.001; therefore, I had enough evidence to reject the null that there was no difference in risk
between patients with ASCVD and without ASCVD controlling for age and sex.
c. What is the difference in the statistical models you used? That is, how would you explain
any differences between the two analysis approaches?
ANSWER:
In both the stratified analysis and the Poisson regression model, the point and interval estimates
were similar in magnitude and direction. Although, in the stratified analysis, females who were
less than 85 years old did not achieve statistical significance, the magnitude and direction were
similar with other strata.
With the adjusted model, the point and interval estimates were determined by borrowing
information across the strata. In the stratified analysis, by contrast, I can see the different risk
ratios for the different strata as well as the pooled crude and adjusted estimates, which were
computed with Mantel-Haenszel weights. The regression model provides me with the average
relative difference in mortality risk between all patients with and without ASCVD history while
averaging over the effects of the sex and age covariates.
Since the stratified and regression analysis had similar direction and magnitude of ASCVD
effects, I was not concerned about interaction.
Question 5:
Comment very briefly on the similarity or differences among the three approaches. Which
would you tend to prefer in general? Why?
ANSWER:
The risk of mortality was higher for patients with ASCVD compared to non-ASCVD
patients. This was apparent from the different types of summary measures used in the three
regression models and stratified analyses. With multiple linear regression, the adjusted risk
difference was 18.47 percentage points, whereas the Poisson regression reported a relative
increase in the risk of mortality of 97.02%. The multiple linear regression model provided a
summary measure that was focused on the absolute risk difference between ASCVD and
non-ASCVD patients, whereas the Poisson regression provides us with a risk ratio of mortality
between ASCVD and non-ASCVD patients. Accordingly, the numbers are vastly different between
the two methods because they are on different scales: the absolute difference from the multiple
linear regression appears much smaller than the relative increase reported using the Poisson
regression method.
The logistic regression method uses an odds ratio, which makes interpretation and relative
comparison to the risk difference and risk ratio difficult unless we know that the events are rare,
in which case we can roughly estimate the risk ratio from the odds ratio. In this case the events
were not rare (17.05% of non-ASCVD patients and 37.84% of ASCVD patients died), and the
OR of 2.68 is much larger than the RR of 1.97 from the Poisson regression.
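The gap between the odds ratio and the risk ratio can be made concrete with the crude proportions from Question 1. The sketch below relies on the identity OR = RR x (1 - p0) / (1 - p1); the variable names and the rare-event figures in the last lines are hypothetical and are included only for contrast.

```python
def odds_ratio_from_risks(p1, p0):
    """Odds ratio comparing risk p1 (exposed) with risk p0 (unexposed)."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

p1 = 14 / 37    # crude 5-year mortality with ASCVD
p0 = 119 / 698  # crude 5-year mortality without ASCVD
rr = p1 / p0
odds_ratio = odds_ratio_from_risks(p1, p0)
inflation = (1 - p0) / (1 - p1)  # OR = RR * inflation
print(f"RR = {rr:.3f}, OR = {odds_ratio:.3f}, inflation = {inflation:.3f}")

# Hypothetical rare-event contrast: risks of 2% versus 1%.
rare_rr = 0.02 / 0.01
rare_or = odds_ratio_from_risks(0.02, 0.01)
print(f"rare events: RR = {rare_rr:.3f}, OR = {rare_or:.3f}")
```

With mortality this common, the odds ratio exceeds the risk ratio by about a third; in the hypothetical rare-event case the two are nearly equal.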
Stratified analysis provides me with the ASCVD effects in the different strata of sex and age along
with the pooled crude and adjusted estimates and intervals. However, the pooled estimates were
derived using weights. In the stratified analysis where risk difference was the summary measure, I
had to choose between internal and external weights (or use the exact weights) based on the type
of hypothesis I intended to test. In this case, I wanted to measure the effect of taking
patients with a history of ASCVD and removing that factor, which meant that I used the internal
weights. In the case where the summary measures were OR and RR, the Mantel-Haenszel
weights were applied. In regression models, the effects in all strata were combined to
give an average effect on mortality for patients with and without ASCVD history by borrowing
information from within each stratum.
What I noticed between the stratified analysis and the regression models was the similarity in
the magnitude and direction of the point and interval estimates. This was important in telling me
whether or not I had some issue with interactions. Since the direction and magnitude of the
stratified and regression results were similar, I did not have to concern myself with interaction.
Rather, I had to be wary about confounding, which the regression models were good at
controlling for.
As for preference, I would opt to use a method that was easy to communicate and interpret.
Therefore, the risk difference summary measure from the multiple linear regression model would
adequately fulfill this role. Another reason why I prefer multiple linear regression with risk
difference as a summary measure comes from the results of the stratified analysis. I noticed
that the effects differed within each stratum; however, the direction was similar
(the overall effect of increased risk with a history of ASCVD) in all strata. Therefore, using
the regression model will average all these effects into a combined point estimate and interval
that summarizes the effect across all strata. If I felt there were important differences between the
strata, then I would need to include an interaction term in the regression models. I did not because I
did not believe that there was any interaction with age and sex on the ASCVD history to
mortality relationship.
Question 6:
Question 6 pertains to the analysis of colorectal cancer incidence for whites living in the
U.S. as a function of birthplace (U.S. born vs foreign born) (see datafile
surveillance.txt and documentation surveillance.doc on the class web pages).
Using the incidence ratio as a measure of association, provide inference for an association
between incidence of colorectal cancer and birthplace, after adjustment for age, sex, and
SEER.
a. Answer the question using directly standardized rates, with standardization to the U.S.
population.
ANSWER:
Method:
Since I am using the incidence ratio as the measure of association, I can look at the risk ratio as
my summary measure. The response variable is having a case of colorectal cancer. The main
predictor of interest is birthplace (1==US or 0==non-US (foreign)). For part (a), I am interested
in seeing how this relationship differs across several strata: SEER site, age, and sex. An indicator
variable was created to represent this three-way interaction. The indicator variable took into account
SEER site, sex, and age using a formula where the first digit was coded as 1 for male and “_”
for female; the second term was for SEER site (range 1 to 9), and the last term was for the age
group. The indicator variable was coded as “mSEERage.”
US standardized weights were estimated based on the number of person-years of the US
population within each stratum of SEER site, sex, and age group.
The crude and adjusted point estimates and confidence intervals were computed, with the
adjusted estimates based on the internal weights, because the question I was interested in was
what would happen to cancer incidence if patients who were born in the US had, instead, been
born outside the US. Since a p-value was not generated, I used the inclusion of 1 in the 95%
confidence interval for the risk ratio as an indication of non-statistical significance at the 5% level.
I restricted my analysis to patients who were 18 years old and greater and excluded patients
whose place of birth was unknown.
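Direct standardization can be sketched as follows: stratum-specific rates in each birthplace group are averaged using weights proportional to a standard population's person-years, and the two standardized rates are then compared as a ratio. All numbers below are hypothetical toy values, not the SEER data, and the function name is illustrative.

```python
def directly_standardized_rate(cases, person_years, std_person_years):
    """Weighted average of stratum rates using standard-population weights."""
    total_std = sum(std_person_years)
    return sum(
        (w / total_std) * (c / py)
        for c, py, w in zip(cases, person_years, std_person_years)
    )

# Hypothetical two-stratum example (e.g., younger and older age groups).
std_py = [5000, 5000]                       # standard population person-years
us_cases, us_py = [10, 40], [1000, 2000]    # US-born
fb_cases, fb_py = [30, 30], [3000, 1000]    # foreign-born

crude_ratio = (sum(us_cases) / sum(us_py)) / (sum(fb_cases) / sum(fb_py))
std_ratio = (directly_standardized_rate(us_cases, us_py, std_py)
             / directly_standardized_rate(fb_cases, fb_py, std_py))
print(f"crude ratio = {crude_ratio:.3f}, standardized ratio = {std_ratio:.3f}")
```

In this toy example the crude ratio is above 1 while the standardized ratio is below 1, because the two groups have different person-time distributions across strata. This is the kind of shift between crude and adjusted estimates that standardization is meant to expose.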
Results:
In the stratified analysis for the adults (18 years and older), the crude risk of having colorectal
cancer was 53.5% lower for US-born whites compared to foreign-born whites (IRR=0.465).
Based on the 95% confidence interval, this observation was not atypical if
the true incidence risk ratio for having colorectal cancer was anywhere between 0.456 and 0.475.
In the adjusted analysis using internal weights, the risk of developing colorectal cancer was
1.62% lower in US-born relative to foreign-born whites (IRR=0.984). Based on the 95%
confidence interval, this observation was not atypical if the true incidence risk ratio for
developing colorectal cancer was anywhere between 0.953 and 1.02. The two analyses yield
different statistical conclusions for males and females combined. In the crude analysis, the
confidence interval did not include 1; whereas, in the adjusted analysis, it did.
When the analysis was restricted to females, the crude risk was 47.3% lower in US-born
compared to non-US-born whites for developing colorectal cancer (IRR=0.527). Based on the
95% confidence interval, this observation was not atypical if the true incidence risk ratio was
anywhere between 0.512 and 0.542. In the adjusted analysis using internal weights, the risk was
similar between US- and non-US-born whites in developing colorectal cancer (IRR=1.008).
Based on the 95% confidence interval, this observation was not atypical if the true incidence risk
ratio was anywhere between 0.964 and 1.054. The two analyses yielded different statistical
conclusions for females. In the crude analysis, the confidence interval did not include 1,
whereas in the adjusted analysis, it did.
When the analysis was restricted to males, the crude risk was 59.4% lower in US-born compared
to non-US-born whites for developing colorectal cancer (IRR=0.406). Based on the 95%
confidence interval, this observation was not atypical if the true incidence risk ratio was
anywhere between 0.394 and 0.418. In the adjusted analysis using internal weights, the risk was
similar between US- and non-US-born whites in developing colorectal cancer (IRR=0.957).
Based on the 95% confidence interval, this observation was not atypical if the true incidence risk
ratio was anywhere between 0.915 and 1.001. The two analyses yielded different statistical
conclusions for males. In the crude analysis, the confidence interval did not include 1,
whereas in the adjusted analysis, it did.
The crude risk ratios were significant at the 5% level; however, the adjusted risk ratios
were not. The magnitude and direction of the estimates were similar for males and females. This
indicates that there is some kind of confounding going on when sex is included.
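The percent differences and the significance check reported above follow from simple arithmetic on the ratio and its confidence limits. As a small sketch (the helper names `percent_change` and `excludes_null` are illustrative only):

```python
def percent_change(ratio: float) -> float:
    """Percent difference implied by a ratio; negative means lower risk."""
    return (ratio - 1.0) * 100.0


def excludes_null(lower: float, upper: float, null: float = 1.0) -> bool:
    """True if the confidence interval does not contain the null value."""
    return not (lower <= null <= upper)


# Crude and adjusted estimates for all adults, as reported in the text
print(round(percent_change(0.465), 1), excludes_null(0.456, 0.475))
print(round(percent_change(0.984), 1), excludes_null(0.953, 1.02))
```

An interval that excludes 1 corresponds to significance at the 5% level. An interval that contains 1 does not.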
b. Answer the question using an appropriate regression model.
ANSWER:
Method:
The association between birthplace (US-born vs. non-US-born) and incidence of colorectal
cancer among whites in the SEER sites was evaluated using a Poisson regression model. Crude
(unadjusted) analysis was performed using the outcome (cases), an indicator variable for the
exposure (birthplace), and person-years contributed by the population. In the adjusted model,
the same outcome, exposure, and time variables were used along with indicator variables for
sex (male; 1==yes, 0==no), age group, and SEER site. Point and interval estimates and P-values
were determined using the Huber-White sandwich estimator for standard errors.
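To make the crude model concrete, here is a minimal sketch of a Poisson rate model with a log person-years offset and a single binary exposure, fit by Newton-Raphson in plain Python. The data are hypothetical, the function name is illustrative, and the Huber-White standard errors used in the actual analysis are not reproduced here. In practice a statistical package would be used.

```python
import math


def fit_crude_poisson(cases, person_years, exposed, n_iter=50, tol=1e-12):
    """Fit log E[cases] = log(person_years) + b0 + b1 * exposed.

    Returns (b0, b1); exp(b1) is the estimated rate ratio.
    """
    # Start from the overall rate and no exposure effect
    b0 = math.log(sum(cases) / sum(person_years))
    b1 = 0.0
    for _ in range(n_iter):
        u0 = u1 = 0.0             # score vector
        i00 = i01 = i11 = 0.0     # information matrix entries
        for y, t, x in zip(cases, person_years, exposed):
            mu = t * math.exp(b0 + b1 * x)
            u0 += y - mu
            u1 += (y - mu) * x
            i00 += mu
            i01 += mu * x
            i11 += mu * x * x
        # Newton step: solve the 2x2 system by explicit inversion
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (-i01 * u0 + i00 * u1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical grouped data: two exposed rows, two unexposed rows
cases = [30, 45, 20, 25]
person_years = [10000, 12000, 8000, 9000]
exposed = [1, 1, 0, 0]

b0, b1 = fit_crude_poisson(cases, person_years, exposed)
print(round(math.exp(b1), 4))
```

With one binary exposure and no other covariates, the fitted rate ratio equals the ratio of the two crude rates, which is a useful check on the fit. The adjusted model adds indicator columns for sex, age group, and SEER site to the same linear predictor.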
Results:
In the crude analysis, the risk ratio was 94.2% higher in US-born whites for developing
colorectal cancer compared to non-US-born whites (RR=1.942). Based on the 95% confidence
interval, this observation was not atypical if the true risk ratio was anywhere between 1.628 and
2.557. The two-sided p-value was <0.0001; therefore, I had enough evidence to reject the null
hypothesis that there was no association between birthplace and incidence of colorectal cancer.
In the adjusted analysis, the risk ratio was 73.7% higher in US-born whites for developing
colorectal cancer compared to non-US-born whites (RR=1.737). Based on the 95% confidence
interval, this observation was not atypical if the true risk ratio was anywhere between 1.492 and
1.982. The two-sided p-value was <0.0001; therefore, I had enough evidence to reject the null
hypothesis that there was no association between birthplace and incidence of colorectal cancer.
c. What is the difference in the statistical models you used? That is, how would you explain
any differences between the two analysis approaches?
ANSWER:
The stratified analysis yielded crude estimates that were significant for both males and females,
indicating that US-born whites had a lower risk of developing colorectal cancer compared to
foreign-born whites. However, the adjusted estimates did not yield significant associations. In
the Poisson regression model, the crude model reported a significantly higher risk of developing
colorectal cancer in US-born whites compared to non-US-born whites. Even after adjusting for
sex, age, and SEER site, the association remained significant.
In the Poisson regression model, the estimates were averaged across the different strata.
The stratified analysis did not do this; rather, the risk was reported for the different
combinations of strata for sex, age, and SEER site. The Poisson regression model would not be
able to detect interaction effects without including an interaction term. In my analysis I opted
not to include the interaction term to see how the model would average the point and interval
estimates. It’s clear that the average effects do not take into account potential interactions
between age, sex, and SEER site. A better model may need to include these terms along with all
2-way combinations and a 3-way interaction term.
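A toy example can show why a single averaged effect may hide an interaction. The counts below are invented purely for illustration: the exposure doubles the rate in one stratum and halves it in the other, yet the pooled ratio shows no effect at all.

```python
def rate_ratio(cases_exp, py_exp, cases_unexp, py_unexp):
    """Rate in the exposed group divided by rate in the unexposed group."""
    return (cases_exp / py_exp) / (cases_unexp / py_unexp)


# Hypothetical strata: (cases_exp, py_exp, cases_unexp, py_unexp)
strata = {
    "stratum A": (20, 1000, 10, 1000),
    "stratum B": (10, 1000, 20, 1000),
}

# Stratum-specific ratios keep the effect in each stratum visible
stratum_rr = {name: rate_ratio(*counts) for name, counts in strata.items()}

# Pooling the counts collapses both strata into a single ratio
pooled_counts = [sum(column) for column in zip(*strata.values())]
pooled_rr = rate_ratio(*pooled_counts)

print(stratum_rr)
print(pooled_rr)
```

A regression model without interaction terms estimates one common ratio, which is closer in spirit to the pooled value. Interaction terms let the ratio differ by stratum.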