BIOST 536 Homework #2
Due Date: 10-12-2014

Question 1: We are interested in analyzing associations between 5-year mortality and prevalence of ASCVD at study enrollment using statistical methods appropriate for binary response variables. The observation time for death among these subjects is potentially subject to censoring. Provide a statistical analysis demonstrating that such methods as logistic regression can be used to answer this question.

ANSWER:

Methods: First, I converted the units for the variable “obstime” from days to years (“obstime_yrs”). I then checked the earliest observed censoring time among patients who did not die. Second, I created a new variable called ASCVD to identify patients who had CHD (angina and myocardial infarction) and stroke (transient ischemic attack and stroke). To evaluate the association between mortality and ASCVD, a logistic regression was performed with the dependent variable defined as mortality (“death”) and the predictor of interest defined as ASCVD. 95% confidence intervals were estimated using Huber-White sandwich estimators for the standard errors.

Results: The earliest censoring time among patients who did not die was 5.00 years, with a maximum of 5.91 years. The average censoring time was 5.33 years (SD, 0.297). Because no patient was censored before 5 years, 5-year vital status was known for every patient, so it was appropriate to use a logistic regression for death within 5 years without worrying about censoring. There were a total of 735 patients in the study (37 with ASCVD and 698 without). Among the ASCVD patients, 14 (37.84%) died. Among the non-ASCVD patients, 119 (17.05%) died. In the logistic regression model, the odds of death were higher for ASCVD patients compared to non-ASCVD patients (OR=2.962). Based on the 95% confidence interval, this observation was not atypical if the true odds ratio was anywhere between 1.480 and 5.926. The two-tailed p-value was 0.002.
Therefore, we have enough evidence to reject the null hypothesis that there was no association between ASCVD and mortality.

Question 2: Using the risk difference (RD) as a measure of association, provide statistical inference regarding an association between 5-year survival and baseline prevalence of ASCVD, adjusting for age and sex.

a. Answer the question using a stratified analysis (e.g., using Stata command cs or an equivalent analysis in R).

ANSWER:

Method: I evaluated the risk difference in mortality between patients with and without ASCVD, adjusting for age and sex. In order to have enough observations in the age categories, I grouped patients as age < 85 and age >= 85 years old. Sex was categorized as male (1==yes, and 0==no). I presented the results in four connected tables: (1) Females less than 85 years old, (2) Females 85 years old and greater, (3) Males less than 85 years old, and (4) Males 85 years old and greater. Since the summary measure was the risk difference, I evaluated the difference in the proportion of deaths within each stratum. However, I needed to define my own weights since the Mantel-Haenszel method could not be used. Since I was concerned with what would happen if the patients with ASCVD had not had ASCVD, I based the weights on the exposed patients. 95% confidence intervals were generated based on Wald statistics for the standard errors. Statistical inference was based on whether the 95% confidence interval for the risk difference included 0.
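As a rough illustration of the exposed-weighted risk difference described in the Method, the sketch below takes the stratum counts from the Results tables and weights each stratum's risk difference by its share of ASCVD patients. The function and variable names are hypothetical, and reading "weights based on the exposed patients" as weights proportional to the number of ASCVD patients per stratum is an assumption about what was done, not a description of Stata's internals. Confidence intervals are omitted.

```python
# Each stratum: (deaths with ASCVD, total with ASCVD,
#                deaths without ASCVD, total without ASCVD).
# Counts are copied from the stratum tables in the Results.
strata = {
    "female <85":  (2, 11, 38, 340),
    "female >=85": (1, 1, 6, 17),
    "male <85":    (9, 23, 64, 316),
    "male >=85":   (2, 2, 11, 25),
}


def risk_difference(d1, n1, d0, n0):
    """Risk of death with ASCVD minus risk of death without ASCVD."""
    return d1 / n1 - d0 / n0


def exposed_weighted_rd(strata):
    """Average the stratum RDs, weighting by the share of ASCVD patients.

    Assumption: 'internal' weights are proportional to the number of
    exposed (ASCVD) patients in each stratum.
    """
    total_exposed = sum(n1 for _, n1, _, _ in strata.values())
    return sum(
        (n1 / total_exposed) * risk_difference(d1, n1, d0, n0)
        for d1, n1, d0, n0 in strata.values()
    )


for name, counts in strata.items():
    print(f"{name}: RD = {risk_difference(*counts):.4f}")
print(f"exposed-weighted RD = {exposed_weighted_rd(strata):.4f}")
```

Run as written, the weighted estimate comes out near 0.186, in line with the overall weighted risk difference reported in the Results.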
Results:

Females less than 85 years old
                     Death
History of ASCVD     No              Yes             Total
No                   302 (97.11%)    38 (95.00%)     340
Yes                  9 (2.89%)       2 (5.00%)       11
Total                311             40              351

Males less than 85 years old
                     Death
History of ASCVD     No              Yes             Total
No                   252 (94.74%)    64 (87.67%)     316
Yes                  14 (5.26%)      9 (12.33%)      23
Total                266             73              339

Females 85 years old and greater
                     Death
History of ASCVD     No              Yes             Total
No                   11 (100%)       6 (85.71%)      17
Yes                  0 (0%)          1 (14.29%)      1
Total                11              7               18

Males 85 years old and greater
                     Death
History of ASCVD     No              Yes             Total
No                   14 (100%)       11 (84.62%)     25
Yes                  0 (0%)          2 (15.38%)      2
Total                14              13              27

(Percentages are column percentages, i.e., within each death category.)

For female patients less than 85 years old, the risk of mortality was 7.00 percentage points higher in those with ASCVD compared to those without ASCVD. Based on the 95% confidence interval, this observation was not atypical if the true difference was anywhere between 16.03 percentage points lower and 30.04 percentage points higher in ASCVD females less than 85 years old compared to non-ASCVD females less than 85 years old. Since the 95% confidence interval crossed 0, we cannot reject the null that there is no difference in mortality risk between patients with and without a history of ASCVD in females who were less than 85 years old.

Female patients who were 85 years old and greater had a mortality risk that was 64.71 percentage points higher if they had ASCVD compared to those without ASCVD. Based on the 95% confidence interval, this observation was not atypical if the true difference in mortality risk was anywhere between 41.99 and 87.42 percentage points higher in ASCVD females 85 years old and greater compared to non-ASCVD females 85 years old and greater. Since the 95% confidence interval did not cross 0, we reject the null that there is no difference in mortality risk between patients with and without a history of ASCVD in females 85 years old and greater.

Male patients less than 85 years old with ASCVD had a mortality risk that was 18.88 percentage points higher compared to similar patients without ASCVD.
Based on the 95% confidence interval, this observation was not atypical if the true difference in mortality risk was anywhere between 1.55 percentage points lower and 39.31 percentage points higher in male patients less than 85 years old with ASCVD relative to similar patients with no ASCVD. Since the 95% confidence interval crossed 0, we cannot reject the null that there is no difference in mortality risk between patients with and without a history of ASCVD in males who were less than 85 years old.

Male patients who were 85 years old and greater with ASCVD had a mortality risk that was 56 percentage points higher compared to similar patients without ASCVD. Based on the 95% confidence interval, this observation was not atypical if the true difference in mortality risk was anywhere between 36.54 and 75.46 percentage points higher in male patients 85 years old and greater with ASCVD compared to similar patients without ASCVD. Since the 95% confidence interval did not cross 0, we reject the null that there is no difference in mortality risk between patients with and without a history of ASCVD in males 85 years old and greater.

The overall weighted mortality risk difference was 18.60 percentage points higher in patients with a history of ASCVD. Based on the 95% confidence interval, this observation was not atypical if the true mortality risk for patients with ASCVD was anywhere between 4.11 and 33.07 percentage points higher relative to patients with no history of ASCVD. The unadjusted risk difference was 20.79 percentage points higher in patients with a history of ASCVD compared to those without.

b. Answer the question using an appropriate regression model.

ANSWER:

Method: Because the summary measure is the risk difference, I opted to use a multiple linear regression model. The dependent variable was death and the predictor of interest was ASCVD, controlling for sex and age. Thus, patients with a history of ASCVD were compared to patients without a history of ASCVD using the difference in probability of mortality.
Crude (unadjusted) estimates of the probability of mortality were obtained for patients with and without a history of ASCVD using sample proportions. Point and interval estimates for the difference in mortality probabilities were based on a linear regression of the binary indicator of mortality on a model that included an indicator for history of ASCVD, an indicator for sex, and an indicator for being 85 years old or older. The sex and age-adjusted estimate of the exposure effect was based on the regression parameter for the indicator of history of ASCVD. The 95% confidence intervals for the sex and age-adjusted difference in mortality were determined using Wald-type inference based on the Huber-White sandwich estimator for the standard errors, in order to allow for the mean-variance relationship associated with binary random variables. Confidence intervals and p-values assumed an approximately normal distribution for the regression parameter estimates.

Results: In the crude linear regression model, patients with a history of ASCVD had a mortality risk that was 20.79 percentage points higher compared to patients without a history of ASCVD. Based on the 95% confidence interval, this observation was not atypical if the true difference in mortality risk for patients with a history of ASCVD was anywhere between 4.87 and 36.71 percentage points higher compared to patients without a history of ASCVD. The two-sided p-value was 0.011. Therefore, we can reject the null hypothesis that there is no association between having a history of ASCVD and mortality.

In the sex and age-adjusted multiple linear regression, patients with ASCVD had a mortality risk that was 18.47 percentage points higher compared to patients without ASCVD, controlling for age and sex. Based on the 95% confidence interval, this observation was not atypical if the true difference was anywhere between 3.41 and 33.52 percentage points higher for ASCVD patients compared to non-ASCVD patients, controlling for age and sex. The two-tailed p-value was 0.016.
Therefore, I am confident that I can reject the null hypothesis that there is no difference in risk between patients with ASCVD and no ASCVD while controlling for age and sex.

The crude mortality risk was 20.79 percentage points higher in patients with a history of ASCVD compared to those without. The adjusted model reported a mortality risk that was 18.47 percentage points higher in patients with a history of ASCVD compared to those without. The adjusted estimate is not that different from the crude estimate: both report a similar magnitude and direction of the ASCVD effect on mortality.

c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?

ANSWER: Any difference depends on the weights used in the stratified analysis (internal versus external). I opted to go with internal weights when performing the stratified analysis. The weight-adjusted mortality risk in the stratified analysis was 18.60 percentage points higher in patients with ASCVD compared to non-ASCVD patients, with a 95% confidence interval of 4.11 to 33.07 percentage points higher in the ASCVD group. This was slightly higher than the estimate from the multiple linear regression, where the increase in risk was 18.47 percentage points. The difference may be due to the weights used for the stratified analysis. It could also be due to the way regression models work: linear regression models borrow information across the levels of the other variables. However, the two adjusted estimates differed by only 0.13 percentage points.

In the crude analysis, the pooled risk in the stratified analysis was 20.79 percentage points higher in patients with a history of ASCVD compared to those without. This was exactly the same as in the unadjusted linear regression model. This is because the crude model was a saturated model, so I expected it to give the same result as the pooled (unstratified) analysis, in which no internal weights were applied.
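The claim that the crude (saturated) linear regression reproduces the pooled risk difference can be checked by hand. The sketch below rebuilds one row per patient from the crude counts implied by the stratum tables (14 deaths among 37 ASCVD patients, 119 among 698 non-ASCVD patients), fits a simple least-squares line, and computes a basic sandwich (HC0) standard error for the slope. Variable names are hypothetical, and HC0 with no small-sample correction is an assumption, so the interval it implies will not match the software output digit for digit.

```python
from math import sqrt

# Crude counts: ASCVD total/deaths, non-ASCVD total/deaths.
n1, d1, n0, d0 = 37, 14, 698, 119

# One row per patient: x = ASCVD indicator, y = death indicator.
x = [1] * n1 + [0] * n0
y = [1] * d1 + [0] * (n1 - d1) + [1] * d0 + [0] * (n0 - d0)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least squares for y = intercept + slope * x.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# HC0 sandwich variance of the slope:
#   sum((x_i - x_bar)^2 * e_i^2) / (sum((x_i - x_bar)^2))^2
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
robust_se = sqrt(
    sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
)

# Difference in sample proportions, computed directly.
p1, p0 = d1 / n1, d0 / n0
rd_by_hand = p1 - p0

print(f"OLS slope          = {slope:.4f}")
print(f"difference in risk = {rd_by_hand:.4f}")
print(f"robust SE (HC0)    = {robust_se:.4f}")
```

With a single binary predictor the slope is exactly the difference in sample proportions, and the HC0 standard error reduces to the familiar unpooled formula sqrt(p1(1-p1)/n1 + p0(1-p0)/n0).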
Question 3: Using the odds ratio (OR) as a measure of association, provide statistical inference regarding an association between 5-year survival and baseline prevalence of ASCVD, adjusting for age and sex.

a. Answer the question using a stratified analysis (e.g., using Stata command cc or an equivalent analysis in R).

ANSWER:

Method: In the stratified analysis, I was interested in studying the association between history of ASCVD and mortality. I stratified on both age (85 years and older versus younger) and sex to investigate any change in the exposure effect across the different levels of sex and age. Using the odds ratio as my summary measure, statistical inference was based on the Mantel-Haenszel chi-square test. Estimates and 95% confidence intervals were determined using the Cornfield method, and the weights for each stratum were the Mantel-Haenszel weights. The Mantel-Haenszel test for homogeneity was performed to see if the stratum-specific estimates were significantly different across levels of age and sex. One would, however, need to be concerned about multiple comparisons: performing many statistical tests inflates the chance of a falsely significant p-value. Please see Question 2 part (a) for methods and descriptive results, which include the proportions dead and alive after 5 years among patients with and without a history of ASCVD, stratified by age and sex.

Results: In both the female and male strata aged 85 years and older, there were no patients with a history of ASCVD who survived. As a result, I was unable to provide odds ratio estimates and 95% confidence intervals for all the strata. However, the pooled and Mantel-Haenszel weight-adjusted odds ratios and 95% confidence intervals were estimated using the Cornfield method. In the pooled results, patients with a history of ASCVD had higher odds of mortality (OR=2.96) compared to those without.
Based on the 95% confidence interval, this observation was not atypical if the true odds ratio for mortality between patients with and without a history of ASCVD was anywhere between 1.50 and 5.86. For the Mantel-Haenszel weight-adjusted odds ratio, patients who had a history of ASCVD had higher odds of mortality compared to those without (OR=2.78). Based on the 95% confidence interval, this observation was not atypical if the true odds ratio for mortality between patients with and without a history of ASCVD was anywhere between 1.34 and 5.77. The Mantel-Haenszel test for homogeneity was not statistically significant at the 5% level (two-sided p-value = 0.5455); therefore, we cannot reject the null hypothesis that the odds ratio relating history of ASCVD to mortality is the same across the levels of age and sex. In the combined test that evaluated the association between history of ASCVD and mortality, the two-sided chi-square p-value was 0.0057; therefore, I was able to reject the null hypothesis that there was no association between history of ASCVD and mortality.

b. Answer the question using an appropriate regression model.

ANSWER:

Methods: Because the summary measure of interest is the odds ratio, I opted to use a logistic regression model. Death is a binary outcome and ASCVD is a binary predictor; sex and age were controlled for in the model. Crude (unadjusted) estimates of the probability of mortality for patients with and without a history of ASCVD were obtained using sample proportions. Point and interval estimates for the exposure odds ratio were based on a logistic regression of the binary indicator for mortality on a model that included an indicator of ASCVD history, an indicator for sex, and an indicator for being 85 years old and older. The sex and age-adjusted estimate of the exposure effect was based on the regression parameter for the indicator of ASCVD history.
The 95% confidence intervals were determined using the Huber-White sandwich estimator for the standard errors.

Results: In the crude logistic regression model, patients with a history of ASCVD had higher odds of mortality compared to patients without (OR=2.96). Based on the 95% confidence interval, this observation was not atypical if the true odds ratio for mortality comparing patients with ASCVD history to those without was anywhere between 1.48 and 5.93. The two-sided p-value was 0.002; therefore, based on the results of the crude model, I can reject the null hypothesis that there is no association between ASCVD history and death.

In the adjusted logistic regression model, patients with ASCVD had higher odds of mortality compared to patients without ASCVD (OR=2.68) while controlling for age and sex. Based on the 95% confidence interval, this observation was not atypical if the true odds ratio of mortality comparing ASCVD patients to non-ASCVD patients, controlling for age and sex, was anywhere between 1.35 and 5.31. The two-sided p-value was 0.005; therefore, I had enough evidence to reject the null hypothesis that there was no difference in odds of mortality between patients with ASCVD and without ASCVD while controlling for age and sex.

c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?

ANSWER: The Mantel-Haenszel OR in the stratified analysis was 2.78, which was higher than the adjusted regression OR (OR=2.68). The point and interval estimates from the stratified analysis were a weighted average of the stratum-specific estimates. Unlike the adjusted model, the stratified analysis evaluated each stratum separately. Unfortunately, some of the cells were empty, which yielded unintuitive stratum-specific results. The adjusted regression model borrowed information across the different strata to produce the point and interval estimates.
In other words, the regression model averaged the exposure effects across all strata, yielding a single point estimate and interval. Since the magnitude and direction of the stratified and regression results were similar, I was not too concerned about interaction effects. Instead, I was concerned with confounding, which was addressed in the adjusted regression model.

Question 4: Using the risk ratio (RR) as a measure of association, provide statistical inference regarding an association between 5-year survival and baseline prevalence of ASCVD, adjusting for age and sex.

a. Answer the question using a stratified analysis (e.g., using Stata command ir or an equivalent analysis in R).

ANSWER:

Method: In the stratified analysis, I was interested in studying the association between history of ASCVD and mortality. I stratified on both age (85 years and older versus younger) and sex. Using the risk ratio as my summary measure, statistical inference was based on the Mantel-Haenszel chi-square test. Estimates and 95% confidence intervals were determined using the Cornfield method, and the weights for each stratum were the Mantel-Haenszel weights. The Mantel-Haenszel test for homogeneity was performed to see if the stratum-specific estimates were significantly different across levels of age and sex. One would, however, need to be concerned about multiple comparisons: performing many statistical tests inflates the chance of a falsely significant p-value. Please see Question 2 part (a) for methods and descriptive results, which include the proportions dead and alive after 5 years among patients with and without a history of ASCVD, stratified by age and sex.

Results: In patients who were female and less than 85 years old, there was a 62.68% increased risk of mortality if they had a history of ASCVD compared to those without (RR=1.63).
Based on the 95% confidence interval, the observed mortality risk was not atypical if the true mortality risk was anywhere between about 55% lower and 490.3% higher in females less than 85 years old with a history of ASCVD compared to those without (95% CI: 0.448, 5.903). Because the 95% CI includes 1, I don’t have enough evidence to reject the null hypothesis that there is no association between ASCVD history and mortality risk in females less than 85 years old.

In patients who were female and 85 years old and older, there was a 183.3% increased risk of mortality if they had a history of ASCVD compared to those without (RR=2.83). Based on the 95% confidence interval, the observed mortality risk was not atypical if the true mortality risk was anywhere between 48.86% higher and 439.3% higher in females 85 years old and older with a history of ASCVD compared to those without (95% CI: 1.489, 5.393). Because the 95% CI does not include 1, I have enough evidence to reject the null hypothesis that there is no association between ASCVD history and mortality risk in females 85 years old and older.

In patients who were male and less than 85 years old, there was a 93.21% increased risk of mortality if they had a history of ASCVD compared to those without (RR=1.93). Based on the 95% confidence interval, the observed mortality risk was not atypical if the true mortality risk was anywhere between 10.95% higher and 236.4% higher in males less than 85 years old with a history of ASCVD compared to those without (95% CI: 1.109, 3.364). Because the 95% CI does not include 1, I have enough evidence to reject the null hypothesis that there is no association between ASCVD history and mortality risk in males less than 85 years old.

In patients who were male and 85 years old and older, there was a 127.3% increased risk of mortality if they had a history of ASCVD compared to those without (RR=2.27).
Based on the 95% confidence interval, the observed mortality risk was not atypical if the true mortality risk was anywhere between 46.05% higher and 253.7% higher in males 85 years old and older with a history of ASCVD compared to those without (95% CI: 1.460, 3.537). Because the 95% CI does not include 1, I have enough evidence to reject the null hypothesis that there is no association between ASCVD history and mortality risk in males 85 years old and older.

The crude risk ratio was 2.219, which indicates that patients with a history of ASCVD had a 121.9% increased risk of mortality compared to those without. Based on the 95% confidence interval, this observation was not atypical if the true risk ratio was anywhere between 1.42 and 3.46 comparing all patients with a history of ASCVD to those without. The adjusted risk ratio was 1.964, which indicates that patients with a history of ASCVD had a 96.4% increased risk of mortality compared to those without. Based on the 95% confidence interval, this observation was not atypical if the true risk ratio was anywhere between 1.29 and 2.99 comparing all patients with a history of ASCVD to those without.

b. Answer the question using an appropriate regression model.

ANSWER:

Methods: Exposure groups were compared using the risk ratio, comparing the probability of mortality for patients with and without a history of ASCVD. Crude (unadjusted) estimates of the probability of mortality were obtained for patients with and without a history of ASCVD using the sample proportions. Point and interval estimates for the risk ratio of mortality, comparing patients with a history of ASCVD to those without, were based on a Poisson regression of the binary indicator of mortality on a model that included an indicator for history of ASCVD, an indicator for sex, and an indicator for being 85 years and older.
The age and sex-adjusted estimate of the exposure effect was based on the regression parameter for the indicator of having a history of ASCVD. The 95% confidence interval and p-value were estimated using the Huber-White sandwich estimator for the standard errors.

Results: In the crude (unadjusted) Poisson regression model, patients with a history of ASCVD had a higher risk of mortality compared to those without (RR=2.22). Based on the 95% confidence interval, this observation was not atypical if the true risk ratio of mortality comparing patients with a history of ASCVD to those without was anywhere between 1.42 and 3.46. The two-sided p-value was <0.0001; therefore, I have enough evidence to reject the null hypothesis that there was no difference in mortality risk between patients with and without a history of ASCVD.

In the adjusted Poisson regression model, patients with ASCVD had a 97.02% higher risk of mortality compared to patients without ASCVD, controlling for age and sex (RR=1.97). Based on the 95% confidence interval, this increased risk of mortality was not atypical if the true risk was anywhere between 30.09% higher and 198.4% higher with ASCVD compared to non-ASCVD patients, controlling for age and sex (or 95% CI for RR: 1.30, 2.98). The two-sided p-value was 0.001; therefore, I had enough evidence to reject the null that there was no difference in risk between patients with ASCVD and without ASCVD controlling for age and sex.

c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?

ANSWER: In both the stratified analysis and the Poisson regression model, the point and interval estimates were similar in magnitude and direction. Although females who were less than 85 years old did not achieve statistical significance in the stratified analysis, the magnitude and direction of their estimate were similar to those of the other strata.
With the adjusted model, the point and interval estimates were determined by borrowing information across the strata. In the stratified analysis, by contrast, I can see the risk ratio for each stratum as well as the pooled crude and adjusted estimates, the latter obtained with Mantel-Haenszel weights. The regression model provides me with the average relative difference in mortality risk between all patients with and without ASCVD history while averaging over the effects of the sex and age covariates. Since the stratified and regression analyses had a similar direction and magnitude of ASCVD effects, I was not concerned about interaction.

Question 5: Comment very briefly on the similarity or differences among the three approaches. Which would you tend to prefer in general? Why?

ANSWER: The risk of mortality was higher for patients with ASCVD compared to non-ASCVD patients. This was apparent from each of the summary measures used in the three regression models and stratified analyses. With multiple linear regression, the risk difference was 18.47 percentage points higher, whereas the Poisson regression reported an increased risk of mortality of 97.02%. The multiple linear regression model provided a summary measure focused on the absolute risk difference between ASCVD and non-ASCVD patients, whereas the Poisson regression provides the risk ratio of mortality between ASCVD and non-ASCVD patients. Accordingly, the two numbers are on different scales and are not directly comparable: the absolute difference from the multiple linear regression appears much smaller than the relative increase in risk reported by the Poisson regression. The logistic regression method uses an odds ratio, which makes interpretation and comparison with the risk difference and risk ratio difficult unless we know that the events are rare, in which case we can roughly estimate the risk ratio from the odds ratio.
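To make the rare-event remark concrete, the sketch below compares the odds ratio and risk ratio for the crude risks in this study (14/37 with ASCVD, 119/698 without) and then for the same risks divided by 100, a purely hypothetical rare-event scenario. It relies on the identity OR = RR × (1 − p0) / (1 − p1), so the two measures agree only when both risks are small. Function names are illustrative.

```python
def risk_ratio(p1, p0):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return p1 / p0


def odds_ratio(p1, p0):
    """Odds in the exposed group divided by odds in the unexposed group."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Crude risks of death from the study counts.
p1, p0 = 14 / 37, 119 / 698

scenarios = {
    "observed risks": (p1, p0),
    # Hypothetical: same risk ratio, both risks 100 times smaller.
    "hypothetical rare events": (p1 / 100, p0 / 100),
}

for label, (a, b) in scenarios.items():
    print(f"{label}: RR = {risk_ratio(a, b):.3f}, OR = {odds_ratio(a, b):.3f}")
```

With the observed risks the odds ratio is clearly larger than the risk ratio, while in the hypothetical rare-event scenario the two nearly coincide.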
In this case the events were not rare (133 of 735 patients, about 18%, died), so the OR of 2.68 is much larger than the RR of 1.97 from the Poisson regression.

Stratified analysis provides me with the ASCVD effects in the different strata of sex and age along with the pooled crude and adjusted estimates and intervals. However, the pooled estimates were derived using weights. In the stratified analysis where the risk difference was the summary measure, I had to choose between internal and external weights (or use the exact weights) based on the type of hypothesis I intended to test. In this case, I wanted to measure the effect of taking patients with a history of ASCVD and removing that factor, which meant that I used the internal weights. In the cases where the summary measures were the OR and RR, the Mantel-Haenszel weights were applied. In regression models, the strata are combined to give an average effect on mortality for patients with and without ASCVD history by borrowing information across strata.

What I noticed between the stratified analysis and the regression models was the similarity in the magnitude and direction of the point and interval estimates. This was important in telling me whether or not I had some issue with interactions. Since the direction and magnitude of the stratified and regression results were similar, I did not have to concern myself with interaction. Rather, I had to be wary about confounding, which the regression models were good at controlling for.

As for preference, I would opt to use a method that is easy to communicate and interpret. The risk difference summary measure from the multiple linear regression model adequately fulfills this role. Another reason why I prefer multiple linear regression with the risk difference as a summary measure comes from the results of the stratified analysis.
I noticed that the effects differed across strata; however, the direction was the same in all strata (an overall increased risk with a history of ASCVD). Therefore, the regression model averages these effects into a single point estimate and interval that summarizes all strata. If I felt the effects truly differed across strata, then I would need to include an interaction term in the regression models. I did not because I did not believe that age or sex modified the relationship between ASCVD history and mortality.

Question 6: Question 6 pertains to the analysis of colorectal cancer incidence for whites living in the U.S. as a function of birthplace (U.S. born vs foreign born) (see datafile surveillance.txt and documentation surveillance.doc on the class web pages). Using the incidence ratio as a measure of association, provide inference for an association between incidence of colorectal cancer and birthplace, after adjustment for age, sex, and SEER.

a. Answer the question using directly standardized rates, with standardization to the U.S. population.

ANSWER:

Method: Since the measure of association is the incidence ratio, I used the risk ratio as my summary measure. The response variable is having a case of colorectal cancer. The main predictor of interest is birthplace (1==US or 0==non-US (foreign)). For part (a), I am interested in seeing how this relationship differs across several strata: SEER site, age, and sex. An indicator variable was created to represent this three-way cross-classification. The indicator variable took into account sex, SEER site, and age group using a formula where the first digit was coded as 1 for male and “_” for female, the second term was for SEER site (range 1 to 9), and the last term was for the age group.
The indicator variable was coded as “mSEERage.” US standardized weights were estimated based on the number of person-years of the US population within each stratum of SEER site, sex, and age group. The crude and adjusted point estimates and confidence intervals were computed based on the internal weights, because the question I was interested in was what the cancer incidence would have been if the patients who were born in the US had, instead, been born outside the US. Since a p-value was not generated, I used the inclusion of 1 in the confidence interval for the risk ratio as an indication of non-significance at the 5% level. I restricted my analysis to patients who were 18 years old and greater and excluded patients whose place of birth was unknown.

Results: In the stratified analysis for adults (18 years and older), the crude risk of colorectal cancer was 53.5% lower in US-born whites compared to foreign-born whites (IRR=0.465). Based on the 95% confidence interval, this observation was not atypical if the true incidence risk ratio for having colorectal cancer was anywhere between 0.456 and 0.475. In the adjusted analysis using internal weights, the risk of developing colorectal cancer was 1.62% lower in US-born relative to foreign-born whites (IRR=0.984). Based on the 95% confidence interval, this observation was not atypical if the true incidence risk ratio for developing colorectal cancer was anywhere between 0.953 and 1.02. The two analyses yield different statistical conclusions for males and females combined. In the crude analysis, the confidence interval did not include 1, whereas in the adjusted analysis it did.

When the analysis was restricted to females, the crude risk of developing colorectal cancer was 47.3% lower in US-born compared to non-US-born whites (IRR=0.527).
Based on the 95% confidence interval, this observation was not atypical if the true incidence risk ratio was anywhere between 0.512 and 0.542. In the adjusted analysis using internal weights, the risk was similar between US- and non-US-born whites in developing colorectal cancer (IRR=1.008). Based on the 95% confidence interval, this observation was not atypical if the true incidence risk ratio was anywhere between 0.964 and 1.054. The crude and adjusted analyses yield different statistical conclusions for females. In the crude analysis, the confidence interval did not include 1, whereas in the adjusted analysis it did.

When the analysis was restricted to males, the crude risk was 59.4% lower in US-born compared to non-US-born whites for developing colorectal cancer (IRR=0.406). Based on the 95% confidence interval, this observation was not atypical if the true incidence risk ratio was anywhere between 0.394 and 0.418. In the adjusted analysis using internal weights, the risk was similar between US- and non-US-born whites in developing colorectal cancer (IRR=0.957). Based on the 95% confidence interval, this observation was not atypical if the true incidence risk ratio was anywhere between 0.915 and 1.001. The crude and adjusted analyses yield different statistical conclusions for males. In the crude analysis, the confidence interval did not include 1, whereas in the adjusted analysis it did.

The crude risk ratios were significant at the 5% level; however, the adjusted risk ratios were not. The magnitude and direction were similar for males and females. This indicates that there is some kind of confounding going on when sex is included.

b. Answer the question using an appropriate regression model.

ANSWER:

Method: The association between birthplace (US- vs non-US-born whites) in the SEER sites and incidence of colorectal cancer was evaluated using a Poisson regression model.
The crude (unadjusted) analysis was performed using the outcome (cases), an indicator variable for the exposure (birthplace), and the person-years contributed by the population. In the adjusted model, the same outcome, exposure, and time variables were used along with indicator variables for sex (male; 1==yes, 0==no), age group, and SEER site. Point and interval estimates and p-values were determined using the Huber-White sandwich estimator for standard errors.

Results: In the crude analysis, the risk of developing colorectal cancer was 94.2% higher in US-born whites compared to non-US-born whites (RR=1.942). Based on the 95% confidence interval, this observation was not atypical if the true risk ratio was anywhere between 1.628 and 2.557. The two-sided p-value was <0.0001; therefore, I had enough evidence to reject the null hypothesis that there was no association between birthplace and incidence of colorectal cancer. In the adjusted analysis, the risk of developing colorectal cancer was 73.7% higher in US-born whites compared to non-US-born whites (RR=1.737). Based on the 95% confidence interval, this observation was not atypical if the true risk ratio was anywhere between 1.492 and 1.982. The two-sided p-value was <0.0001; therefore, I had enough evidence to reject the null hypothesis that there was no association between birthplace and incidence of colorectal cancer.

c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?

ANSWER: The stratified analysis yielded crude estimates that were significant for both males and females, indicating that US-born whites had a lower risk of developing colorectal cancer compared to foreign-born whites. However, the adjusted estimates did not yield significant associations. In the Poisson regression model, the crude model reported a significantly higher risk of developing colorectal cancer in US-born whites compared to non-US-born whites.
Even after adjusting for sex, age, and SEER site, the association remained significant. In the Poisson regression model, the estimates were averaged across the different strata. The stratified analysis did not do this; rather, the risk was reported for the different combinations of strata for sex, age, and SEER site. The Poisson regression model would not be able to detect interaction effects without including an interaction term. In my analysis I opted not to include the interaction term to see how the model would average the point and interval estimates. It’s clear that the averaged effects do not take into account potential interactions between age, sex, and SEER site. A better model may need to include these terms along with all 2-way combinations and a 3-way interaction term.
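
To illustrate how a crude and a stratum-adjusted incidence ratio can disagree, as they did in the stratified analysis for part (a), here is a minimal Python sketch of a crude rate ratio next to a directly standardized rate ratio. The stratum labels, case counts, person-years, and standard weights are all invented for illustration and are not taken from surveillance.txt. The function names are hypothetical as well. The sketch uses generic standard weights, not the internal weights from my analysis, and it does not compute confidence intervals.

```python
# Hypothetical strata: invented counts, NOT taken from surveillance.txt.
# Each entry: (cases_us, person_years_us,
#              cases_foreign, person_years_foreign, standard_weight)
STRATA = {
    "younger": (20, 100_000, 2, 10_000, 0.6),
    "older": (30, 20_000, 150, 100_000, 0.4),
}


def crude_rate_ratio(strata):
    """Ratio of the overall rates (US-born / foreign-born), ignoring strata."""
    cases_us = sum(s[0] for s in strata.values())
    py_us = sum(s[1] for s in strata.values())
    cases_fb = sum(s[2] for s in strata.values())
    py_fb = sum(s[3] for s in strata.values())
    return (cases_us / py_us) / (cases_fb / py_fb)


def standardized_rate_ratio(strata):
    """Ratio of directly standardized rates, using the same weights for both groups."""
    total_weight = sum(s[4] for s in strata.values())
    # Weighted average of the stratum-specific rates in each birthplace group.
    rate_us = sum(s[4] * s[0] / s[1] for s in strata.values()) / total_weight
    rate_fb = sum(s[4] * s[2] / s[3] for s in strata.values()) / total_weight
    return rate_us / rate_fb


crude = crude_rate_ratio(STRATA)
standardized = standardized_rate_ratio(STRATA)
print(f"crude rate ratio: {crude:.3f}")
print(f"standardized rate ratio: {standardized:.3f}")
```

In this invented example the stratum-specific rates are identical in the two birthplace groups, so the standardized ratio is 1, while the crude ratio is about 0.30 because the US-born person-years are concentrated in the low-rate stratum. This is the same pattern of confounding by the stratifying variables that would pull a crude estimate away from an adjusted one.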