Race and Breast Cancer Survival Disparities

1. Abstract

This paper explores the effect of race on the white-black breast cancer survival disparity in the US between 1973 and 2002. To do so, grade and stage of cancer, type of district, income, education level, marital status, age at diagnosis, year of diagnosis and race are used to create control and treatment groups that differ only by race. A weighted multiple linear regression shows that race has an effect on cause-specific survival (p < 0.001). Compared to their white counterparts, black patients show a decrease of about 4.361 percentage points in cause-specific survival. All factors other than type of district show medium to strong effects on cause-specific survival. Higher values of stage and grade decrease cause-specific survival (p < 0.001). An increase in income correlates with an increase in cause-specific survival (p = 0.002), whereas education below the high level shows a slight decrease in survival, with p-values ranging from 0.014 to 0.641. Being married increases survival by 0.979 percentage points compared to being single (p < 0.001). Cause-specific survival mostly increases with age, with p-values ranging from 0.001 to 0.028. Year of diagnosis also shows a strong effect: cause-specific survival increases with year of diagnosis, with p < 0.001 for every cohort except the second (1978-1982), which has a p-value of 0.625. While there may be omitted variables, such as type of treatment, the disparity in survival that remains after the above factors are controlled implies that race affects cause-specific survival significantly. Moreover, other factors such as age at diagnosis and year of diagnosis have even larger effects.

2. Background and Introduction

Studies of cancer incidence or mortality, even when age adjusted, show a marked difference between races.
In particular, black women have a 30%-40% higher rate of breast cancer than white women [1]. It is a matter of great interest to understand whether, or to what extent, this is a genetic effect, an environmental effect, or has socioeconomic origins. A recent study of cancer survival from this group [2] also found a significant difference between the survival of black patients and that of white patients. This racial disparity appears in most cancers but is pronounced in breast cancer, which is the leading women's health concern, especially in industrialized countries [3], where more data also exist. The “raw” data suggest that 5-year survival for white women diagnosed with breast cancer was 97%, compared to 86% among black women. Other recent studies of breast cancer incidence, mortality and survival are divided [1, 4, 5]. In general it is of great interest to understand the reasons for this disparity. Is it genetic?
Or are there societal issues which cause it? Of the various socioeconomic effects that might be a cause, many may be surrogates for a deeper cause. This paper describes a multivariate study of a number of socioeconomic and cancer-specific factors that have been suggested. When these are taken into account, the survival disparity is reduced to about 4.4% but remains statistically significant. We suggest that this residual difference may be genetic.

3. Methods and Variables

Data

The data used in this paper come from the Surveillance, Epidemiology, and End Results (SEER) database, accessed through SEER*Stat. SEER*Stat is statistical software that provides a mechanism for the analysis of SEER and other cancer-related databases. Data collection for the program began in 1973 and currently covers a wide range of the United States. The data set used spans 1973-2002, which includes all years available in the database for which the corresponding five-year survival has been recorded.

In order to compare the breast cancer survival of black and white women, control and treatment groups are constructed so that the two groups being compared are similar socioeconomically and in the advancement of breast cancer, but are of different races. In this study the treatment group is black patients, and the groups they are compared with serve as controls. Whether a disparity in survival remains after controlling for these factors should address the research question.

Most of the factors below are categorical variables, meaning that each patient falls into exactly one of a fixed set of levels. A dummy variable is the special case with only two levels, coded 0 and 1. The factors being considered are the following.

Race: This is a categorical variable that will include all races, white and black. It is the primary variable whose effect on survival we are interested in seeing.
The definitions of white and black are those of SEER.

Grade: Grade, a measure of the progress of tumors and neoplasms, is a categorical variable. The grades are those in SEER. Several known grades are included in this study: well differentiated, moderately differentiated, poorly differentiated, undifferentiated, T-cell, B-cell, null cell and natural killer cell.

Stage: Stage, a description of the extent to which a cancer has spread, is also a categorical variable. The list of stages includes in situ, localized, regional, distant, and localized/regional.

Marital status: This is a dummy variable, where married individuals are assigned a value of 0 and non-married individuals are assigned a value of 1. Marital status is added because of the potential emotional and economic advantages it may confer on patients.

Education level: This is another categorical factor. Patients are grouped into high education, medium education and low education. Education is assumed to be highly correlated with knowledge of cancer and its treatment, which may result in increased survival.

Income: This factor is divided into high, medium and low income. Increased income is assumed to allow better treatment after diagnosis.

Type of district: This is a measure of the size and type of district the patients live in. In this study it is a dummy variable. Only two types are distinguished: metropolitan areas and rural areas.

Age at diagnosis: This factor is divided into 7 groups: ages 0-19, 20-29, 30-39, 40-49, 50-59, 60-69, and 70+. It is considered to be the period in which the cancer occurs, and the distinction between early and late diagnosis is ignored.

Year of diagnosis: This factor is assumed to correlate strongly with developments in medicine and technology. In this study, the available years are divided into 6 cohorts spanning five years each: 1973-1977, 1978-1982, 1983-1987, 1988-1992, 1993-1997, and 1998-2002.
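The coding of the factors above can be illustrated with a minimal sketch. It assumes reference-category coding, in which the first listed level of a factor is the baseline and every other level gets its own 0/1 indicator. The function names and the choice of "high" as the education baseline are illustrative assumptions (the baseline matches the zero rows of the tables), not the software actually used in the study.

```python
# Illustrative sketch of categorical (reference-category) and dummy coding.
# Names are hypothetical; this is not the code used to produce the results.

EDUCATION_LEVELS = ["high", "medium", "low"]  # first level is the reference


def dummy_code(value, levels):
    """Return one 0/1 indicator for each non-reference level of a factor."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels[1:]]


def marital_dummy(status):
    """Two-level dummy as defined in the text: married = 0, not married = 1."""
    return 0 if status == "married" else 1


print(dummy_code("high", EDUCATION_LEVELS))    # [0, 0]  (reference level)
print(dummy_code("medium", EDUCATION_LEVELS))  # [1, 0]
print(dummy_code("low", EDUCATION_LEVELS))     # [0, 1]
print(marital_dummy("single"))                 # 1
```

With this coding, the regression coefficient attached to each indicator is read as the difference in survival between that level and the reference level, which is why reference levels appear with an effect of 0.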
Survival: Survival is defined as living more than 5 years after diagnosis. To calculate cause-specific survival, SEER*Stat uses the Kaplan-Meier method. This calculates the survival probability over a defined period of time from the survival estimate at the end of each month of that period. It allows deaths due to other causes during the specified interval to be excluded, and cases lost to follow-up to be censored. For each interval considered, the survival probability is calculated as the number of patients surviving the interval divided by the number of patients at risk.

There are 7533 data points extracted from SEER*Stat. Each data point represents N individuals who share the same values of all the factors being considered. Hence, a weighted multiple linear regression is used. For each estimate, the probability that the result is due to chance (the p-value) is determined, and statistical significance is claimed if p < 0.05.

4. Results

Grouped by the above factors, SEER*Stat gives 3381 data points for groups including all races, 3200 data points for white individuals, and 952 data points for black individuals. A weighted multiple linear regression on cause-specific survival gives the comparative difference in survival for each cohort of each factor. Before the overall result is obtained, each variable is regressed separately against cause-specific survival to see how it affects survival when the other variables are not included (Table 1). For our primary variable of interest, a simple weighted regression on race shows that, without taking other factors into account, race has a statistically significant effect on survival for black individuals. Compared to white patients, whose survival is 97%, black patients have a reduced survival of 86%, a reduction of about 10 percentage points (p < 0.001). This effect is smaller once the other variables are taken into account.
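The weighted regression described in the Methods can be sketched in a few lines. The sketch assumes that each data point is weighted by the number of individuals N it represents and solves the weighted normal equations directly. The function names and the toy numbers are invented for illustration and are not taken from the SEER data or from the software used in the study.

```python
# Minimal weighted least squares sketch in pure Python (hypothetical names).
# Each row of the design matrix is one group of identical patients and each
# weight is the assumed group size N.


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def weighted_least_squares(rows, y, weights):
    """Return b minimizing sum_i w_i * (y_i - x_i . b)^2."""
    p = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(p)] for i in range(p)]
    xtwy = [sum(w * r[i] * yi for r, yi, w in zip(rows, y, weights))
            for i in range(p)]
    return solve_linear_system(xtwx, xtwy)


# Toy data (invented): columns are intercept, black (0/1), single (0/1).
rows = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
survival = [90.0, 85.0, 88.0, 83.0]   # percent surviving five years
group_sizes = [400, 50, 250, 30]      # N per data point, used as weights

coef = weighted_least_squares(rows, survival, group_sizes)
print([round(c, 6) for c in coef])    # intercept, race effect, marital effect

# With an intercept only, the estimate is the weighted mean of the outcomes.
weighted_mean = weighted_least_squares([[1], [1]], [80.0, 100.0], [3, 1])[0]
print(weighted_mean)                  # 85.0
```

Weighting by group size means that a data point summarizing many patients pulls the fit more strongly than one summarizing a few, which is the reason for preferring a weighted fit when the data points are group summaries rather than individual records.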
Among the other variables considered individually, both grade and stage show highly statistically significant, and more dramatic, effects on survival for both white and black patients. The more a cancer has developed or progressed, the lower the survival. Marital status, income and education show moderate effects on cause-specific survival, with few statistically significant results. Single patients have a 2.7 percentage point decrease in cause-specific survival compared to their married counterparts. Income affects survival by up to 6.7 percentage points, while education has a smaller effect of 1.1 percentage points. Similarly, rural areas show a decrease of 3.9 percentage points in cause-specific survival relative to urban areas (p < 0.001).

Age at diagnosis shows a large effect on cause-specific survival. Patients aged 20-29 had survival 9.6 to 18.7 percentage points lower than older patients (p-values of 0.114 to 0.002). Similarly, the year of diagnosis shows a significant effect, with survival in the later cohorts 1.3 to 18.3 percentage points higher than in the 1973-1977 cohort. The p-values range from 0.744 to below 0.001.

Correlating the above variables reveals no significant multicollinearity, where significant is defined as a correlation coefficient of 0.5 or greater. In fact, most correlation coefficients are less than 0.10. This implies that, when a weighted multiple linear regression is applied to all the variables, “over fitting” is unlikely.

Table 2 presents the weighted multiple linear regression that shows the effect of all the variables listed above on cause-specific survival. Most cohorts show statistically significant results, the primary result of interest being the cause-specific survival of black patients. Compared to their white counterparts, black patients show a decreased survival of 4.4 percentage points (p < 0.001) even after all the other variables have been controlled.
White patients show a slight increase of 0.09 percentage points in survival relative to the group including all races (p = 0.537).

Moreover, other variables show significant effects on cause-specific survival. Stage and grade of cancer, which measure the spread and growth of the cancer in a specific patient, show a large effect on cause-specific survival for both white and black patients. Cancer at a distant stage shows survival lower by 62 percentage points than cancer in situ, while undifferentiated cancer shows a decrease of 9.3 percentage points compared to well-differentiated cancer. All values in the stage and grade categories are highly significant (p < 0.001).

For both white and black patients, age at diagnosis also shows a statistically significant result, with patients aged 60-69 showing an increase of 7.7 percentage points compared to their 20-29 counterparts (p = 0.001). Survival increases with age except for the last age group, 70+ years, which shows a decrease compared to ages 40-69. Survival in that group is nevertheless 0.72 percentage points greater than for patients aged 30-39 (p = 0.01).

The calendar year of diagnosis also shows a large effect on cause-specific survival, with survival improving in the later years. Compared to the first cohort, 1973-1977, cause-specific survival increases by 0.75 percentage points in the second cohort, building up to 9.9 percentage points by the last cohort, 1998-2002. The second cohort (1978-1982) has a p-value of 0.6, but all other cohorts have p < 0.001.

Other socioeconomic factors also show some effect on cause-specific survival. Patients from rural areas show a slight increase of 0.46 percentage points compared to patients from metropolitan areas (p = 0.299), which is not statistically significant. Marital status also shows a slight effect, with single patients having a 0.98 percentage point decrease in survival compared to married patients (p < 0.001).
Increases in education and income also show positive effects on cause-specific survival. Income increases cause-specific survival by 1.7 percentage points per income group (p = 0.002). Compared to patients with high education, patients with medium education have a survival rate 0.42 percentage points lower (p = 0.014). The difference ceases to be statistically significant for patients with low education, who show only a 0.16 percentage point decrease (p = 0.641).

The regression accounts for 85% of the variation in cause-specific survival across the groups (R-squared of 85%). The t-values in the table are not small, giving no reason to remove any of the variables. Lastly, since Prob > F = 0.000, the variables listed in the table are jointly significant.

5. Conclusions and Discussion

One postulate about the black-white disparity in breast cancer is that black women are diagnosed later in life than white women, after the cancer has developed to a later stage for which survival is lower. But this postulate does not seem to be borne out by the survival data. The weighted multiple linear regression indicates that the white-black breast cancer survival disparity might be directly affected by race. There is a 4.3 percentage point decrease from the control group (all patients) to the treatment group (black patients) (p < 0.001). White patients show an increase of 0.09 percentage points (p = 0.537). However, stage and grade affect survival more than race, also with p < 0.001. Socioeconomic factors show a moderate, but mostly statistically significant, effect on survival. For example, patients with high education have a survival rate 0.4 percentage points higher than patients with medium education. Table 2 shows which of the variables have more of an effect on breast cancer survival than others. The regression gives a high R-squared value of 85%.

A potential source of error in this study is that important variables that affect survival, such as type of treatment, may have been omitted.
While we expect that type of treatment might be strongly correlated with income level, it cannot be concluded that accounting for this variable would have closed the white-black breast cancer survival disparity. Regardless, even after controlling for significant cancer-specific and socioeconomic factors, there is a 4.4 percentage point gap in survival between black and white patients that is significant at the 95% confidence level. This gives a necessary, but not sufficient, reason to believe that differences in breast cancer survival might have biological origins.

We note that the general problem with all observational studies is that they cannot directly prove what is happening; but if a theory (model) can be developed, observational studies can disprove its validity. At the present time we see no model of women’s breast cancer that fits the data, including incidence, mortality and cause-specific survival, without assuming some direct racial (biological) effect, albeit smaller than the raw data suggest.

6. References

1. Jatoi, J., Anderson, W.F., Rao, S.K., and Devesa, S.S., “Breast Cancer Trends Among Black and White Women in the United States,” J. Clin. Oncol. 23:7836-7841.
2. Bassily, M.N., Wilson, R., Pompei, F., Burmistrov, D., “Cancer Survival as a Function of Age at Diagnosis: A Study of the Surveillance, Epidemiology and End Results Database,” International Journal of Cancer Epidemiology 34(6):667-681 (2010).
3. Ghafoor, A., Jemal, A., Ward, E., Cokkinides, V., Smith, R., Thun, M., “Trends in Breast Cancer by Race and Ethnicity,” CA: A Cancer Journal for Clinicians 53:342-355 (2003).
4. Otis, B., Freeman, H., “Race and Outcomes: Is This the End of the Beginning for Minority Health Research?” Downloaded from jnci.oxfordjournals.org on April 25, 2011.
5. Idan, M., Anderson, W., Jatoi, I., and Rosenberg, P., “Underlying Causes of the Black-White Racial Disparity in Breast Cancer Mortality: A Population-Based Analysis.” Downloaded from jnci.oxfordjournals.org on April 25, 2011.

7. Tables

In both tables, a cohort with an effect of 0 is the reference category for its variable. Every other effect is the difference in five-year cause-specific survival, in percentage points, between that cohort and the reference category, and the constant is the estimated survival for the reference categories. Table 1 reports a separate regression for each variable; Table 2 reports a single regression that includes all variables together.

Table 1 Summary of Individual Regression Results

Variable            Cohort                       Effect on cause-     Standard   p-value
                                                 specific survival    error
Race                All races                       0
                    White                           0.322             0.381      0.398
                    Black                         -10.007             1.730      0.000
                    Constant                       96.398             0.264      0.000
Age at diagnosis    20-29                           0
                    30-39                           9.613             6.074      0.114
                    40-49                          17.833             5.942      0.003
                    50-59                          18.735             5.936      0.002
                    60-69                          18.005             5.939      0.002
                    70+                            15.353             5.944      0.010
                    Constant                       78.871             5.928      0.000
Type of district    Metropolitan                    0
                    Rural                          -3.926             0.959      0.000
                    Constant                       96.591             0.194      0.000
Stage               In situ                         0
                    Regional                       -9.808             0.361      0.000
                    Distant                       -69.350             1.362      0.000
                    Constant                       97.959             0.119      0.000
Grade               Well differentiated             0
                    Moderately differentiated      -2.898             0.342      0.000
                    Poorly differentiated         -14.204             0.522      0.000
                    Undifferentiated              -14.745             1.679      0.000
                    Constant                       99.180             0.219      0.000
Education           High                            0
                    Medium                          0.264             0.419      0.529
                    Low                            -1.097             0.579      0.058
                    Constant                       96.472             0.315      0.000
Marital status      Married                         0
                    Single                         -2.72              0.421      0.000
                    Constant                       97.184             0.222      0.000
Year at diagnosis   1973-1977                       0
                    1978-1982                       1.250             3.829      0.744
                    1983-1987                       9.332             3.254      0.004
                    1988-1992                      14.878             2.991      0.000
                    1993-1997                      17.328             2.972      0.000
                    1998-2002                      18.311             2.963      0.000
Income              Low                             0
                    Medium                          4.470             2.486      0.072
                    High                            6.655             2.467      0.007
                    Constant                       90.363             2.457      0.000

Table 2 Summary of Weighted Multiple Linear Regression Results

Variable            Cohort                       Effect on cause-     Standard   p-value
                                                 specific survival    error
Race                All races                       0
                    White                           0.091             0.147      0.537
                    Black                          -4.270             0.676      0.000
Age at diagnosis    20-29                           0
                    30-39                           5.226             2.380      0.028
                    40-49                           7.797             2.333      0.001
                    50-59                           7.599             2.332      0.001
                    60-69                           7.691             2.333      0.001
                    70+                             5.942             2.334      0.011
Type of district    Metropolitan                    0
                    Rural                           0.459             0.442      0.299
Stage               In situ                         0
                    Regional                       -7.668             0.242      0.000
                    Distant                       -62.147             0.903      0.000
Grade               Well differentiated             0
                    Moderately differentiated      -1.536             0.167      0.000
                    Poorly differentiated          -8.874             0.265      0.000
                    Undifferentiated               -9.265             0.795      0.000
Education           High                            0
                    Medium                         -0.417             0.169      0.014
                    Low                            -0.163             0.349      0.641
Marital status      Married                         0
                    Single                         -0.979             0.170      0.000
Year at diagnosis   1973-1977                       0
                    1978-1982                       0.754             1.542      0.625
                    1983-1987                       5.239             1.315      0.000
                    1988-1992                       8.202             1.236      0.000
                    1993-1997                       9.139             1.216      0.000
                    1998-2002                       9.921             1.210      0.000
Income              Per income group                1.673             0.545      0.002
Constant                                           79.726             2.852      0.000
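As a worked illustration of how the Table 2 effects combine, the sketch below computes the adjusted difference between two cohorts of the same variable by subtracting their effects. The effect values are copied from Table 2; the dictionary layout and the function name are hypothetical choices made for this example.

```python
# Effects on five-year cause-specific survival (percentage points),
# copied from Table 2. The data structure itself is an illustrative choice.
TABLE2_EFFECTS = {
    "race": {"all": 0.0, "white": 0.091, "black": -4.270},
    "age": {"20-29": 0.0, "30-39": 5.226, "40-49": 7.797,
            "50-59": 7.599, "60-69": 7.691, "70+": 5.942},
    "stage": {"in situ": 0.0, "regional": -7.668, "distant": -62.147},
}


def contrast(variable, level, baseline):
    """Adjusted survival difference (level minus baseline) for one variable."""
    effects = TABLE2_EFFECTS[variable]
    return round(effects[level] - effects[baseline], 3)


print(contrast("race", "black", "white"))  # -4.361
print(contrast("race", "black", "all"))    # -4.27
print(contrast("age", "70+", "30-39"))     # 0.716
```

Read this way, the black-white gap of about 4.4 percentage points quoted in the text is the black effect minus the white effect, while the 4.3 percentage point figure is the black effect relative to the all-races reference group.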