Biost 536: Categorical Data Analysis in Epidemiology
Emerson, Fall 2014

Homework #2
October 5, 2014

Written problems: To be submitted as a MS-Word compatible file to the class Catalyst dropbox by 5:30 pm on Sunday, October 12, 2014. See the instructions for peer grading of the homework that are posted on the web pages.

On this (as on all homeworks) Stata / R code and unedited Stata / R output is TOTALLY unacceptable. Instead, prepare a table of statistics gleaned from the Stata output. The table should be appropriate for inclusion in a scientific report, with all statistics rounded to a reasonable number of significant digits. (I am interested in how statistics are used to answer the scientific question.)

In all problems requesting “statistical analyses” (either descriptive or inferential), you should present both

Methods: A brief sentence or paragraph describing the statistical methods you used. This should use wording suitable for a scientific journal, though it might be a little more detailed. A reader should be able to reproduce your analysis. DO NOT PROVIDE Stata OR R CODE.

Inference: A paragraph providing full statistical inference in answer to the question. Please see the supplementary document relating to “Reporting Associations” for details.

Questions 1-5 refer to analyses of the data in the file mri.txt that is located on the class webpages. In those questions we are interested in associations between 5 year mortality and prevalence of atherosclerotic cardiovascular disease (ASCVD) as defined by history of prior angina, myocardial infarction, transient ischemic attacks, or stroke. You will likely find it useful to create a new variable indicating ASCVD. This variable can be derived from the chd and stroke variables. For these questions, we will be considering adjustment for age and sex using both stratified and regression analyses. For the stratified analyses, it will be necessary to use an appropriate categorization of age.

1.
We are interested in analyzing associations between 5 year mortality and prevalence of ASCVD at study enrollment using statistical methods appropriate for binary response variables. The observation time for death among these subjects is potentially subject to censoring. Provide a statistical analysis demonstrating that such methods as logistic regression can be used to answer this question.

Answer:

Methods: To check whether any subject was censored before the 5-year time point, we summarized the observation time of subjects who did and did not die within 5 years.

Results: Among the 735 subjects, 121 (16.5%) died within 5 years after MRI. The average observation time among those who died within 5 years after MRI was 2.97 years (1083 days), with a minimum of 0.19 years (68 days) and a maximum of 4.98 years (1819 days). The average observation time among those who did not die within 5 years after MRI was 5.33 years (1946 days), with a minimum of 5 years (1827 days) and a maximum of 5.91 years (2159 days). Every subject who was not observed to die within 5 years was followed for at least 5 years, so no subject was censored before the 5-year time point and there is no censoring with respect to our scientific question. Given that, logistic regression on a dichotomized indicator of death within 5 years, regardless of the exact observation time, is suitable for answering this question.

2. Using the risk difference (RD) as a measure of association, provide statistical inference regarding an association between 5 year survival and baseline prevalence of ASCVD, adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command cs or an equivalent analysis in R).
b. Answer the question using an appropriate regression model.
c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?

Answer:

a.
Methods: A subject was considered to have prevalent ASCVD if he or she had been diagnosed with coronary heart disease or cerebrovascular disease prior to MRI, so ASCVD is dichotomous in the analysis. A subject was considered to have died within 5 years if the vital status was dead and the observation time was shorter than 5 years, so 5-year mortality is also dichotomous. For the stratified analysis, we categorized age in 5-year intervals (65-69, 70-74, 75-79, 80-84, and >=85), combining subjects aged 90 or older with those aged 85 to 89 because of the small sample size in the oldest group. Five-year age intervals balance capturing the age trend accurately against keeping sufficient cell counts in each stratum and a small number of strata. The association between 5-year mortality and ASCVD prevalence adjusted for sex and age was estimated using a Cochran-Mantel-Haenszel approach. The risk difference was calculated within each stratum defined by sex and age category. The weight for each stratum was based on the age and sex distribution of subjects with ASCVD prior to MRI. The overall risk difference was then calculated, along with the corresponding 95% confidence interval for the adjusted risk difference.

Results: Of the 217 subjects with ASCVD prior to MRI, 68 (31.34%) died within 5 years, compared with 53 of the 518 subjects without ASCVD prior to MRI (10.23%). Among subjects with ASCVD prior to MRI, 48 of 130 males (36.92%) and 20 of 87 females (22.99%) died within 5 years after MRI. Among subjects without ASCVD prior to MRI, 30 of 236 males (12.71%) and 23 of 282 females (8.16%) died within 5 years after MRI. Subjects with ASCVD had a higher risk of 5-year mortality in each age group, compared to subjects without ASCVD.
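As an arithmetic check on the crude comparison, the counts above (68 deaths among 217 subjects with ASCVD, 53 among 518 without) are enough to compute an unadjusted risk difference. The following is a minimal Python sketch, not the analysis actually run for this homework: the function name is hypothetical, and it assumes a Wald interval with unpooled binomial variances.

```python
from math import sqrt
from statistics import NormalDist


def crude_risk_difference(d1, n1, d0, n0, level=0.95):
    """Unadjusted risk difference (exposed minus unexposed) with a Wald CI.

    Assumption: the interval uses the unpooled binomial variance
    p(1 - p)/n in each group; other interval methods give slightly
    different limits.
    """
    p1, p0 = d1 / n1, d0 / n0
    rd = p1 - p0
    se = sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return rd, rd - z * se, rd + z * se


# Counts reported in the Results paragraph above.
rd, lower, upper = crude_risk_difference(68, 217, 53, 518)
print(f"RD = {rd:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

Under these assumptions the sketch gives a risk difference of about 0.211 with limits of about 0.144 and 0.278. It does not perform the age and sex adjustment, which needs the stratum-level counts.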
The crude risk difference is 0.211, with a 95% confidence interval of 0.144 to 0.278. Without adjustment for age and sex, subjects with ASCVD prior to MRI have a 21.1% higher absolute risk of death within 5 years after MRI, compared to subjects without ASCVD. Based on the Cochran-Mantel-Haenszel estimate adjusting for sex and 5-year age category, the risk difference adjusted for age and sex is 0.181, with a 95% confidence interval of 0.115 to 0.246. This indicates that, after adjusting for age and sex, subjects with ASCVD prior to MRI have an 18.1% higher absolute risk of death within 5 years, compared to subjects without ASCVD prior to MRI. With 95% confidence, such an observation is judged to be consistent with a true absolute higher risk of 5-year mortality between 11.5% and 24.6%. Because the 95% confidence interval for the adjusted risk difference excludes zero, we conclude that prevalence of ASCVD is associated with 5-year mortality.

b.

Methods: The sex- and age-adjusted estimate of the risk difference in 5-year mortality by ASCVD status was based on the linear regression parameter for the indicator of ASCVD, in a model that included age, sex, and the interaction between age and sex as covariates. Age was treated as a continuous variable for both its main effect and its interaction with sex. A 95% confidence interval for the sex- and age-adjusted risk difference was calculated using Wald-type inference based on the Huber-White sandwich estimator of standard errors, in order to allow for the mean-variance relationship associated with binary random variables. Confidence intervals and p values assumed an approximately normal distribution for the regression parameter estimates.

Results: Of the 217 subjects with ASCVD prior to MRI, 68 (31.34%) died within 5 years, compared with 53 of the 518 subjects without ASCVD prior to MRI (10.23%).
Among subjects with ASCVD prior to MRI, 48 of 130 males (36.92%) and 20 of 87 females (22.99%) died within 5 years after MRI. Among subjects without ASCVD prior to MRI, 30 of 236 males (12.71%) and 23 of 282 females (8.16%) died within 5 years after MRI. Subjects with ASCVD had a higher risk of 5-year mortality in each age group, compared to subjects without ASCVD. Based on a linear regression model adjusting for sex, age, and the interaction between sex and age, prevalence of ASCVD is estimated to be associated with an absolute 18.7% higher probability of death within 5 years. With 95% confidence, such an estimate is judged to be consistent with a true absolute higher risk of 5-year mortality between 12.0% and 25.4%. Based on a two-sided p value < 0.001, we reject the null hypothesis that 5-year mortality is equal among subjects with and without ASCVD prior to MRI.

c. Comparing the results from problems 2a and 2b, the estimates of the age- and sex-adjusted risk difference are close to each other, although the 95% confidence intervals differ slightly. The estimates are close because we adjusted for both potential confounding and effect modification in both the stratified analysis and the linear regression analysis. By including the interaction term in the linear regression, we do not borrow information across subgroups, as we would if we adjusted only for the main effects of age and sex. However, the estimates are not exactly the same because the stratified analysis uses 5-year age categories, while the linear regression uses age as a continuous variable, which captures the trend in risk across ages more precisely.
Last but not least, the linear regression model takes into account the mean-variance relationship when calculating standard errors, while the Mantel-Haenszel approach ignores it, so the interval estimate from the linear regression analysis has higher precision than that from the stratified analysis.

3. Using the odds ratio (OR) as a measure of association, provide statistical inference regarding an association between 5 year survival and baseline prevalence of ASCVD, adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command cc or an equivalent analysis in R).
b. Answer the question using an appropriate regression model.
c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?

Answer:

a.

Methods: A subject was considered to have prevalent ASCVD if he or she had been diagnosed with coronary heart disease or cerebrovascular disease prior to MRI, so ASCVD is dichotomous in the analysis. A subject was considered to have died within 5 years if the vital status was dead and the observation time was shorter than 5 years, so 5-year mortality is also dichotomous. For the stratified analysis, we categorized age in 5-year intervals (65-69, 70-74, 75-79, 80-84, and >=85), combining subjects aged 90 or older with those aged 85 to 89 because of the small sample size in the oldest group. Five-year age intervals balance capturing the age trend accurately against keeping sufficient cell counts in each stratum and a small number of strata. The association between 5-year mortality and ASCVD prevalence adjusted for age and sex was estimated using a Cochran-Mantel-Haenszel approach. The odds ratio was calculated within each stratum defined by sex and age category.
The weight for each stratum was based on the marginal counts for that stratum. The overall odds ratio was then calculated, and the corresponding 95% confidence interval for the adjusted odds ratio was computed using the Mantel-Haenszel method.

Results: Of the 217 subjects with ASCVD prior to MRI, 68 (31.34%) died within 5 years, compared with 53 of the 518 subjects without ASCVD prior to MRI (10.23%). Among subjects with ASCVD prior to MRI, 48 of 130 males (36.92%) and 20 of 87 females (22.99%) died within 5 years after MRI. Among subjects without ASCVD prior to MRI, 30 of 236 males (12.71%) and 23 of 282 females (8.16%) died within 5 years after MRI. Subjects with ASCVD had a higher risk of 5-year mortality in each age group, compared to subjects without ASCVD. The crude odds ratio is 4.00, with a 95% confidence interval of 2.62 to 6.12. Without adjustment for age and sex, subjects with ASCVD prior to MRI have 4-fold higher odds of death within 5 years after MRI, compared to subjects without ASCVD. Such an observation is judged to be consistent with a true 2.62-fold to 6.12-fold relative increase in the odds of 5-year mortality. Based on the Cochran-Mantel-Haenszel estimate adjusting for sex and 5-year age category, the odds ratio adjusted for age and sex is 3.50, with a 95% confidence interval of 2.31 to 5.29. This indicates that, after adjusting for age and sex, subjects with ASCVD prior to MRI have 3.5-fold higher odds of death within 5 years, compared to subjects without ASCVD prior to MRI. With 95% confidence, such an observation is judged to be consistent with a true 2.31-fold to 5.29-fold relative increase in the odds of 5-year mortality. Because the 95% confidence interval for the adjusted odds ratio excludes one, we conclude that prevalence of ASCVD is associated with 5-year mortality.

b.
Methods: The sex- and age-adjusted estimate of the odds ratio for 5-year mortality by ASCVD status was based on a logistic regression of the binary indicator of 5-year mortality on an indicator of ASCVD, age, sex, and the interaction between age and sex. Age was treated as a continuous variable for both its main effect and its interaction with sex. The adjusted odds ratio was obtained from the regression parameter for the indicator of ASCVD. A 95% confidence interval for the sex- and age-adjusted odds ratio was calculated assuming an approximately normal distribution for the regression parameter estimates.

Results: Of the 217 subjects with ASCVD prior to MRI, 68 (31.34%) died within 5 years, compared with 53 of the 518 subjects without ASCVD prior to MRI (10.23%). Among subjects with ASCVD prior to MRI, 48 of 130 males (36.92%) and 20 of 87 females (22.99%) died within 5 years after MRI. Among subjects without ASCVD prior to MRI, 30 of 236 males (12.71%) and 23 of 282 females (8.16%) died within 5 years after MRI. Subjects with ASCVD had a higher risk of 5-year mortality in each age group, compared to subjects without ASCVD. Based on a logistic regression model adjusting for sex and age, prevalence of ASCVD is estimated to be associated with 3.50-fold higher odds of death within 5 years after MRI. With 95% confidence, such an estimate is judged to be consistent with a true odds ratio of 5-year mortality between 2.32 and 5.29. Based on a two-sided p value < 0.001, we reject the null hypothesis that 5-year mortality is equal among subjects with and without ASCVD prior to MRI.

c. Comparing the results from problems 3a and 3b, the estimates of the age- and sex-adjusted odds ratio are close to each other, although the 95% confidence intervals differ slightly.
The estimates are close because we adjusted for both potential confounding and effect modification in both the stratified analysis and the logistic regression analysis. By including the interaction term in the logistic regression, we do not borrow information across subgroups, as we would if we adjusted only for the main effects of age and sex. However, the estimates are not exactly the same because the underlying mathematical methods differ. In the stratified analysis, we average the odds of the exposed and unexposed groups separately across strata and then take the ratio, while in the regression analysis we calculate the odds ratio within each stratum first, average the log odds ratios, and convert back to an overall odds ratio. In addition, the stratified analysis uses 5-year age categories, while the logistic regression uses age as a continuous variable, which is more precise.

4. Using the rate ratio (RR) as a measure of association, provide statistical inference regarding an association between 5 year survival and baseline prevalence of ASCVD, adjusting for age and sex.
a. Answer the question using a stratified analysis (e.g., using Stata command ir or an equivalent analysis in R).
b. Answer the question using an appropriate regression model.
c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?

Answer:

a.

Methods: A subject was considered to have prevalent ASCVD if he or she had been diagnosed with coronary heart disease or cerebrovascular disease prior to MRI, so ASCVD is dichotomous in the analysis. A subject was considered to have died within 5 years if the vital status was dead and the observation time was shorter than 5 years, so 5-year mortality is also dichotomous.
For the stratified analysis, we categorized age in 5-year intervals (65-69, 70-74, 75-79, 80-84, and >=85), combining subjects aged 90 or older with those aged 85 to 89 because of the small sample size in the oldest group. Five-year age intervals balance capturing the age trend accurately against keeping sufficient cell counts in each stratum and a small number of strata. The association between 5-year mortality and ASCVD prevalence adjusted for age and sex was estimated using a Cochran-Mantel-Haenszel approach. The rate ratio was calculated within each stratum defined by sex and age category, using observation time as the time variable. The weight for each stratum was based on the marginal counts for that stratum. The overall rate ratio was then calculated, and the corresponding 95% confidence interval for the adjusted rate ratio was computed using the Mantel-Haenszel method.

Results: Of the 217 subjects with ASCVD prior to MRI, 68 (31.34%) died within 5 years, compared with 53 of the 518 subjects without ASCVD prior to MRI (10.23%). Among subjects with ASCVD prior to MRI, 48 of 130 males (36.92%) and 20 of 87 females (22.99%) died within 5 years after MRI. Among subjects without ASCVD prior to MRI, 30 of 236 males (12.71%) and 23 of 282 females (8.16%) died within 5 years after MRI. Subjects with ASCVD had a higher risk of 5-year mortality in each age group, compared to subjects without ASCVD. The crude rate ratio is 3.32, with a 95% confidence interval of 2.28 to 4.85. Without adjustment for age and sex, subjects with ASCVD prior to MRI have a 3.32-fold higher rate of death within 5 years after MRI, compared to subjects without ASCVD. With 95% confidence, such an observation is judged to be consistent with a true 2.28-fold to 4.85-fold relatively higher rate of 5-year mortality.
The rate ratio adjusted for age and sex is 2.88, with a 95% confidence interval of 2.01 to 4.12. This indicates that, after adjusting for age and sex, subjects with ASCVD prior to MRI have a 2.88-fold higher rate of death within 5 years, compared to subjects without ASCVD prior to MRI. With 95% confidence, such an observation is judged to be consistent with a true rate ratio of 5-year mortality between 2.01 and 4.12. Because the 95% confidence interval for the adjusted rate ratio excludes one, we conclude that prevalence of ASCVD is associated with 5-year mortality.

b.

Methods: The sex- and age-adjusted estimate of the rate ratio for 5-year mortality by ASCVD status, and the corresponding confidence interval, were based on a Poisson regression of the binary indicator of 5-year mortality on an indicator of ASCVD, age, sex, and the interaction between age and sex, with observation time as the time variable. Age was treated as a continuous variable for both its main effect and its interaction with sex. The adjusted rate ratio was obtained from the regression parameter for the indicator of ASCVD. A 95% confidence interval for the sex- and age-adjusted rate ratio was calculated assuming an approximately normal distribution for the regression parameter estimates, with standard errors based on the Huber-White sandwich estimator in order to allow for the mean-variance relationship of binary response variables.

Results: Of the 217 subjects with ASCVD prior to MRI, 68 (31.34%) died within 5 years, compared with 53 of the 518 subjects without ASCVD prior to MRI (10.23%). Among subjects with ASCVD prior to MRI, 48 of 130 males (36.92%) and 20 of 87 females (22.99%) died within 5 years after MRI.
Among subjects without ASCVD prior to MRI, 30 of 236 males (12.71%) and 23 of 282 females (8.16%) died within 5 years after MRI. Subjects with ASCVD had a higher risk of 5-year mortality in each age group, compared to subjects without ASCVD. Based on a Poisson regression model adjusting for sex and age, prevalence of ASCVD is estimated to be associated with a 2.89-fold higher rate of death within 5 years. With 95% confidence, such an estimate is judged to be consistent with a true 2.02-fold to 4.12-fold relatively higher rate of 5-year mortality. Based on a two-sided p value < 0.001, we reject the null hypothesis that 5-year mortality is equal among subjects with and without ASCVD prior to MRI.

c. Comparing the results from problems 4a and 4b, the point and interval estimates of the age- and sex-adjusted rate ratio are similar to each other. The estimates are close because we adjusted for both potential confounding and effect modification in both the stratified analysis and the Poisson regression analysis. By including the interaction term in the Poisson regression, we do not borrow information across subgroups, as we would if we adjusted only for the main effects of age and sex. However, the estimates are not exactly the same because the underlying mathematical methods differ. In the stratified analysis, we average the rates of the exposed and unexposed groups separately across strata and then take the ratio, while in the regression analysis we calculate the rate ratio within each stratum first, average the log rate ratios, and convert back to an overall rate ratio. In addition, the stratified analysis uses 5-year age categories, while the Poisson regression uses age as a continuous variable, which is more precise.
Last but not least, the Poisson regression model takes into account the mean-variance relationship when calculating standard errors, while the Mantel-Haenszel method ignores it. Thus, the interval estimate from the regression has higher precision than that from the stratified analysis. For the incidence rate ratio, however, the Mantel-Haenszel method handles this well, so the estimates from the two methods are similar.

5. Comment very briefly on the similarity or differences among the three approaches. Which would you tend to prefer in general? Why?

Answer: All three approaches compare 5-year mortality between subjects with and without ASCVD prior to MRI, and we observe a significant association between ASCVD and 5-year mortality regardless of the measure used. The RD measures the association on an absolute scale and the RR on a relative scale, while the OR compares odds, rather than risks, on a relative scale. The risk difference requires risk estimates for each group, the rate ratio takes the time of the event into account, and the odds ratio requires neither absolute risk estimates nor the time to event. The risk difference is more relevant to quantifying the excess risk of 5-year mortality due to ASCVD, whereas the relative measures are more relevant to quantifying the strength of the association between ASCVD and 5-year mortality. Since 5-year mortality is not a rare event in either the exposed or the unexposed group (31.34% and 10.23%), the odds ratio may overestimate the relative association between 5-year mortality and ASCVD. And since our scientific question concerns mortality at the 5-year time point, the time to event is of little interest, so we are less interested in the rate ratio. Thus, we would prefer the risk difference as our estimate for this question of interest.
Furthermore, the Mantel-Haenszel approach does not take into account the mean-variance relationship of proportions, while linear regression with robust standard errors does. So we would prefer the risk difference estimated from a linear regression model with robust standard errors.

Question 6 pertains to the analysis of colorectal cancer incidence for whites living in the U.S. as a function of birthplace (U.S. born vs foreign born) (see datafile surveillance.txt and documentation surveillance.doc on the class web pages).

6. Using the incidence ratio as a measure of association, provide inference for an association between incidence of colorectal cancer and birthplace, after adjustment for age, sex, and SEER.
a. Answer the question using directly standardized rates, with standardization to the U.S. population.
b. Answer the question using an appropriate regression model.
c. What is the difference in the statistical models you used? That is, how would you explain any differences between the two analysis approaches?

Answer:

a.

Methods: For the standardization analysis, only subjects with a known birthplace were included; subjects whose birthplace was reported as unknown were excluded from all further analysis. An indicator variable was created for foreign-born subjects, and US-born subjects were treated as the unexposed group. Strata were first defined by age and SEER site, and the standardized incidence ratio was estimated separately for males and females. Strata were then defined by age, sex, and SEER site to calculate a summary estimate of the incidence ratio. The 1980 US census population was used as the reference population for standardization, with the sum of person-years in the census by age, sex, and SEER site used as the weights.
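The direct standardization described in these Methods amounts to a weighted average of stratum-specific rates, with weights taken from the standard population. A minimal Python sketch of that calculation follows. The function name is hypothetical, and the two-stratum numbers are made up for illustration; they are not values from surveillance.txt or the 1980 census. The sketch also omits the confidence interval.

```python
def directly_standardized_rate(cases, person_years, std_weights):
    """Weighted average of stratum-specific rates.

    cases, person_years: one entry per stratum for the group of interest.
    std_weights: standard-population person-years for the same strata.
    Assumes every stratum has positive person-years.
    """
    if not (len(cases) == len(person_years) == len(std_weights)):
        raise ValueError("inputs must have one entry per stratum")
    total_weight = sum(std_weights)
    return sum(
        (w / total_weight) * (d / t)
        for d, t, w in zip(cases, person_years, std_weights)
    )


# Hypothetical strata (younger, older); illustration only.
std_weights = [700_000, 300_000]
us_cases, us_py = [10, 90], [100_000, 100_000]
fb_cases, fb_py = [2, 180], [20_000, 200_000]

# Crude ratio ignores the different age mix of the two groups.
crude_ratio = (sum(fb_cases) / sum(fb_py)) / (sum(us_cases) / sum(us_py))

# Standardized ratio applies the same weights to both groups.
std_ratio = directly_standardized_rate(
    fb_cases, fb_py, std_weights
) / directly_standardized_rate(us_cases, us_py, std_weights)

print(f"crude ratio {crude_ratio:.2f}, standardized ratio {std_ratio:.2f}")
```

In this made-up example the two groups have identical stratum-specific rates, so the standardized ratio is 1 even though the crude ratio is well above 1, simply because the hypothetical foreign-born group has more person-years in the older stratum.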
Results: The crude incidence rate of colorectal cancer is 325.9 cases per 1,000,000 person-years among subjects who were born in the US and 892.7 cases per 1,000,000 person-years among subjects who were born in foreign countries. Subjects who were born in foreign countries are estimated to have a 2.74-fold higher incidence rate of colorectal cancer, compared to those who were born in the US. Such an observation is judged to be consistent with a true 2.68-fold to 2.80-fold incidence ratio with 95% confidence. Among males, the crude incidence rate of colorectal cancer was 309.5 cases per 1,000,000 person-years among US-born subjects and 972.6 cases per 1,000,000 person-years among foreign-born subjects. Foreign-born males have a 3.14-fold higher incidence rate of colorectal cancer, compared to US-born males. Such an observation is judged to be consistent with a true 3.05-fold to 3.23-fold incidence ratio with 95% confidence. Among females, the crude incidence rate of colorectal cancer was 341.7 cases per 1,000,000 person-years among US-born subjects and 825.2 cases per 1,000,000 person-years among foreign-born subjects. Foreign-born females have a 2.41-fold higher incidence rate of colorectal cancer, compared to US-born females. Such an observation is judged to be consistent with a true 2.35-fold to 2.49-fold incidence ratio with 95% confidence.

After adjustment for age and SEER site by standardization to the 1980 US census data, foreign birthplace is estimated to be associated with a 1.047-fold higher incidence rate of colorectal cancer among males, compared to US birthplace. Such an observation is judged to be consistent with a true incidence rate ratio of 1.004 to 1.091 with 95% confidence.
Based on the point and interval estimates, we conclude that there is a marginal association between birthplace and incidence of colorectal cancer among males after adjusting for age and SEER site.

After adjustment for age and SEER site by standardization to the 1980 US census data, foreign birthplace is estimated to be associated with an incidence rate ratio of 0.989 for colorectal cancer among females, compared to US birthplace. Such an observation is judged to be consistent with a true incidence rate ratio of 0.949 to 1.030 with 95% confidence. Based on the point and interval estimates, we conclude that there is no association between birthplace and incidence of colorectal cancer among females after adjusting for age and SEER site.

For the summary estimate over all subjects, after adjustment for age, sex, and SEER site by standardization to the 1980 US census data, foreign birthplace is estimated to be associated with an incidence rate ratio of 1.02 for colorectal cancer, compared to US birthplace. Such an observation is judged to be consistent with a true incidence rate ratio of 0.99 to 1.05 with 95% confidence. Based on the point and interval estimates, we conclude that there is no association between birthplace and incidence of colorectal cancer after adjusting for age, sex, and SEER site.

In addition, since colorectal cancer is extremely rare among children, we also restricted the analysis to adults. After this restriction, foreign-born adult males have a 1.05-fold higher incidence rate of colorectal cancer, compared to US-born adult males (95% CI: 1.00-1.09), and foreign-born adult females have an incidence rate ratio of 0.99 for colorectal cancer, compared to US-born adult females (95% CI: 0.95-1.03).
The overall estimate among adults after adjusting for age, sex, and SEER site is an incidence ratio of 1.02 for colorectal cancer (95% CI: 0.99-1.05). The results do not materially change. Since the direction of the association differs between males and females, we report the standardized incidence ratio separately for males and females.

b.

Methods: The age- and SEER-site-adjusted estimates of the incidence rate ratio for colorectal cancer by birthplace, for males and for females, along with the corresponding confidence intervals, were based on a Poisson regression of the binary indicator of colorectal cancer incidence on an indicator of birthplace, age, SEER site, and the interaction between age and SEER site, with person-years of observation as the time variable. The adjusted incidence rate ratio was obtained from the regression parameter for the indicator of birthplace. A 95% confidence interval for the age- and SEER-site-adjusted incidence rate ratio was calculated assuming an approximately normal distribution for the regression parameter estimates, with standard errors based on the Huber-White sandwich estimator in order to allow for the mean-variance relationship of binary response variables. In addition, an overall estimate for all subjects was calculated based on a Poisson regression of the binary indicator of colorectal cancer on an indicator of birthplace, age, sex, SEER site, and the interaction between sex, age, and SEER site, again with person-years of observation as the time variable. The adjusted incidence rate ratio was obtained from the regression parameter for the indicator of birthplace.
A 95% confidence interval for the sex, age and SEER site adjusted incidence rate ratio was calculated assuming an approximate normal distribution for the regression parameter estimates, with standard errors based on the Huber-White sandwich estimator in order to allow for the mean-variance relationship of binary response variables.

Results: The crude incidence rate of colorectal cancer is 325.9 cases per 1,000,000 person-years among subjects who were born in the US, and 892.7 cases per 1,000,000 person-years among subjects who were born in foreign countries. Subjects who were born in foreign countries are estimated to have an incidence rate of colorectal cancer 2.74 times that of subjects who were born in the US. Such an observation is estimated to be consistent with a true incidence rate ratio of 2.68 to 2.80 with 95% confidence.

Among males, the crude incidence rate of colorectal cancer was 309.5 cases per 1,000,000 person-years among US born subjects, and 972.6 cases per 1,000,000 person-years among foreign born subjects. Foreign born males have an incidence rate of colorectal cancer 3.14 times that of US born males. Such an observation is estimated to be consistent with a true incidence rate ratio of 3.05 to 3.23 with 95% confidence.

Among females, the crude incidence rate of colorectal cancer was 341.7 cases per 1,000,000 person-years among US born subjects and 825.2 cases per 1,000,000 person-years among foreign born subjects. Foreign born females have an incidence rate of colorectal cancer 2.41 times that of US born females. Such an observation is estimated to be consistent with a true incidence rate ratio of 2.35 to 2.49 with 95% confidence.
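As a quick arithmetic check on the crude results above, the following Python sketch recomputes each crude incidence rate ratio as the foreign born rate divided by the US born rate. The rates are the ones reported in the text; the dictionary and function names are illustrative only and are not part of the original analysis.

```python
# Crude incidence rates per 1,000,000 person-years, as reported in the Results.
crude_rates = {
    "all":    {"us_born": 325.9, "foreign_born": 892.7},
    "male":   {"us_born": 309.5, "foreign_born": 972.6},
    "female": {"us_born": 341.7, "foreign_born": 825.2},
}


def rate_ratio(foreign_rate, us_rate):
    """Incidence rate ratio comparing foreign born to US born subjects."""
    return foreign_rate / us_rate


# Ratio for each group, rounded to the two decimals used in the text.
crude_irr = {
    group: round(rate_ratio(r["foreign_born"], r["us_born"]), 2)
    for group, r in crude_rates.items()
}

for group, irr in crude_irr.items():
    print(f"{group}: crude incidence rate ratio = {irr}")
```

The confidence intervals cannot be reproduced this way, because they depend on the underlying case counts and person-years, which are not listed here.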
After adjusting for age and SEER site among males, foreign birth is estimated to be associated with an incidence rate of colorectal cancer 1.062 times that of US birth. Such an observation is estimated to be consistent with a true incidence rate ratio of 1.026 to 1.100 with 95% confidence. Based on the p-value of 0.001, we conclude that there is an association between birth place and incidence of colorectal cancer among males after adjusting for age and SEER site.

After adjusting for age and SEER site among females, foreign birth is estimated to be associated with a 2.3% lower incidence rate of colorectal cancer compared to US birth. Such an observation is estimated to be consistent with a true incidence rate ratio of 0.941 to 1.014 with 95% confidence. Based on the p-value of 0.213, we find no statistically significant association between birth place and incidence of colorectal cancer among females after adjusting for age and SEER site.

For the overall estimate, after adjusting for sex, age and SEER site, foreign birth is estimated to be associated with an incidence rate of colorectal cancer 1.017 times that of US birth. Such an observation is estimated to be consistent with a true incidence rate ratio of 0.989 to 1.045 with 95% confidence. Based on the p-value of 0.242, we find no statistically significant association between birth place and incidence of colorectal cancer after adjusting for sex, age and SEER site.

Based on the separate results for males and females, we suspect that the association between birth place and colorectal cancer incidence differs between males and females. We therefore report the association for males and females separately.

c.
Comparing the results from problems 6a and 6b, we can see that the point and interval estimates of the adjusted incidence rate ratio are similar to each other, though not exactly the same. The estimates are close because we adjusted for both potential confounding and effect modification among the adjustment variables in both the stratified analysis and the Poisson regression analysis. By including the interaction terms in the Poisson regression, we do not borrow information across subgroups, which we would do if we adjusted only for the main effects of the adjustment variables.

However, the estimates are not exactly the same because the underlying mathematical methods are different. In the stratified analysis, we average the rates of the exposed and unexposed groups separately across strata and then take the ratio of the two averages. In the regression analysis, we in effect calculate the rate ratio within each stratum first, average the log rate ratios, and then convert back to an overall rate ratio.

In addition, the Poisson regression model takes the mean-variance relationship into account when calculating standard errors, while the Mantel-Haenszel method ignores it. Thus, the interval estimates from the regression have higher precision than those from the stratified analysis. For the incidence rate ratio, however, the Mantel-Haenszel method performs well, so the estimates from the two methods are similar.
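The contrast between "average the rates, then take the ratio" and "take the ratios, then average the logs" can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the stratum counts, person-years and standard weights are hypothetical (they are not the SEER data), and inverse-variance weighting of the log rate ratios is used only as a simple stand-in for the averaging that a regression fit performs.

```python
import math

# Hypothetical strata, NOT the SEER data. Each tuple holds:
# (foreign_cases, foreign_person_years, us_cases, us_person_years, std_weight)
strata = [
    (12, 200_000, 30, 600_000, 0.5),   # stratum rate ratio 1.2
    (90, 150_000, 200, 400_000, 0.3),  # stratum rate ratio 1.2
    (300, 100_000, 600, 200_000, 0.2), # stratum rate ratio 1.0
]


def standardized_rate_ratio(strata):
    """Average the stratum rates with the standard weights, then take the ratio."""
    foreign = sum(w * fc / fpy for fc, fpy, uc, upy, w in strata)
    us = sum(w * uc / upy for fc, fpy, uc, upy, w in strata)
    return foreign / us


def pooled_log_rate_ratio(strata):
    """Average the stratum log rate ratios (inverse-variance weights), then exponentiate."""
    num = 0.0
    den = 0.0
    for fc, fpy, uc, upy, _ in strata:
        log_rr = math.log((fc / fpy) / (uc / upy))
        var = 1 / fc + 1 / uc  # approximate variance of a log rate ratio
        num += log_rr / var
        den += 1 / var
    return math.exp(num / den)


srr = standardized_rate_ratio(strata)
pooled = pooled_log_rate_ratio(strata)
print(f"ratio of standardized rates:   {srr:.4f}")
print(f"pooled from log rate ratios:   {pooled:.4f}")
```

With these made-up inputs the two summaries are close but not identical, which mirrors the pattern described above: both lie between the smallest and largest stratum-specific rate ratios, but they weight the strata differently.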