Survival Disparities rw 07-07

Race and Breast Cancer Survival Disparities
1. Abstract
Modify abstract at end when all else is finished
This paper explores the effect of race on the white-black breast cancer survival disparity in the US
between 1973 and 2002. To do so, grade and stage of cancer, type of district, income, education level,
marital status, age at diagnosis, year of diagnosis and race are used to create a control and a treatment
group that differ only by race. A weighted multiple linear regression shows that race does have an effect
on cause-specific survival (p < 0.001). Compared to their white counterparts, black patients show a
decrease of about 4.361 percent in cause-specific survival.
All factors besides type of district show a strong to medium effect on cause-specific survival. Higher
values of stage and grade decrease cause-specific survival (p < 0.001). An increase in income correlates
with an increase in cause-specific survival (p = 0.002), whereas medium and low education show a slight
decrease in survival relative to high education, with p-values ranging from 0.014 to 0.641. Being married
increases survival by 0.979 percent compared to being single (p < 0.001). Cause-specific survival mostly
increases with age, with p-values ranging from 0.001 to 0.028. Year of diagnosis also shows a strong effect:
cause-specific survival increases with year of diagnosis, with p < 0.001 for every cohort except the second
(years 1978-1982), which has a p-value of 0.625.
While there may be omitted variables such as type of treatment, the disparity in survival that remains after
the above factors are controlled implies that race affects cause-specific survival significantly. Moreover,
it shows that other factors such as age at diagnosis and year of diagnosis have even larger effects.
2. Background and Introduction
Studies of cancer incidence or mortality, even when age adjusted, show a marked difference
between races.[1] In particular, black women have a 30%-40% higher rate of breast cancer than white
women. It is a matter of great interest to understand whether, or to what extent, this is a genetic
effect, an environmental effect, or has socioeconomic origins. A recent study of cancer survival from this
group [2] also found a significant difference between the survival of black patients and that of white
patients. This racial disparity appears in most cancers but is pronounced in breast cancer among women,
which is the highest women’s health concern, especially in industrialized countries,[3] where more data
also exist. The “raw” data suggested that 5-year survival for white women diagnosed with breast cancer
was 97%, compared to 86% among black women. Other recent studies of breast cancer incidence, mortality
and survival are divided.[5][6][7][8] In general it is of great interest to understand the reasons
for this disparity. Is it genetic? Or are there societal issues which cause it? Of the various socioeconomic
effects which might be a cause, many may be surrogates for a deeper cause. This paper describes a
multivariate study of a number of socioeconomic and cancer-specific factors which have been suggested.
When account is taken of these, the health disparity is reduced to 4.7% but is still statistically
significant. We suggest that this residual difference may be genetic.
[1] Jatoi, J., Anderson, W.F., Rao, S.K., and Devesa, S.S., “Breast Cancer Trends Among Black and White
Women in the United States,” J. Clin. Oncol. 23:7836-7841.
[2] Bassily, M.N., Wilson, R., Pompei, F., Burmistrov, D., “Cancer Survival as a Function of Age at
Diagnosis: A Study of the Surveillance, Epidemiology and End Results Database,” International Journal of
Cancer Epidemiology, 34(6):667-681 (2010).
[3] Ghafoor, A., Jemal, A., Ward, E., Cokkinides, V., Smith, R., Thun, M., “Trends in Breast Cancer by Race
and Ethnicity,” CA: A Cancer Journal for Clinicians 53 (2003): 342-355.
[5] Otis, B., Freeman, H., “Race and Outcomes: Is This the End of the Beginning for Minority Health
Research?” Downloaded from jnci.oxfordjournals.org by guest on April 25, 2011.
[6] Idan, M., Anderson, W., Jatoi, I., and Rosenberg, P., “Underlying Causes of the Black-White Racial
Disparity in Breast Cancer Mortality: A Population-Based Analysis.” Downloaded from
jnci.oxfordjournals.org by guest on April 25, 2011.
[7] Jatoi, J., Anderson, W.F., Rao, S.K., and Devesa, S.S., “Breast Cancer Trends Among Black and White
Women in the United States,” J. Clin. Oncol. 23:7836-7841.
3. Methods and Variables
Data
The data used in this paper come from the Surveillance, Epidemiology, and End Results (SEER) database,
accessed through SEER*Stat, statistical software that provides a mechanism for the analysis of SEER and
other cancer-related databases. Data collection for this program began in 1973 and currently spans a wide
range of the United States. The data set used ranges from 1973 to 2002, which includes all years available
in the database with the appropriate five-year survival recorded. In order to compare breast cancer
survival of black and white women, control and treatment groups are constructed such that the two groups
being compared are similar socioeconomically and in advancement of breast cancer, but are of different
races. After controlling for these factors, whether a disparity in survival remains should address the
research question. The factors being considered are the following:
CAN WE HAVE A SPECIFIC REFERENCE to the procedure for multivariate analysis
what are control and treatment groups in our context?
WHAT IS A CATEGORICAL VARIABLE?
WHAT is a DUMMY variable and the distinction?
When we have variables on
Race is a categorical variable with three groups: all races, white and black. It is the primary variable
whose effect on survival we are interested in seeing. The definitions of white and black are those of SEER.
Grade, a measure of the progress of tumors and neoplasms, is a categorical variable. The grades are those
in SEER: well differentiated, moderately differentiated, poorly differentiated, undifferentiated, T-cell,
B-cell, null cell and natural killer cell.
Stage, a description of the extent to which a cancer has spread, is also a categorical variable. The list
of stages includes in situ, localized, regional, distant, and localized/regional.
Marital status is a dummy variable, where married individuals are assigned a value of 0 and non-married
individuals are assigned a value of 1. Marital status is included because of the potential emotional and
economic advantages marriage may give patients.
Education level is another categorical factor. Patients are grouped into high education, medium education
and low education. Education is assumed to be highly correlated with knowledge of cancer and its
treatment, which may result in an increase in survival.
Income is a factor that is divided into high, medium and low income. Increased income is assumed to
allow better treatment after diagnosis.
Type of district is a measure of the size and type of district the patients live in. In this study, it is
a dummy variable. Only two types are distinguished: metropolitan areas and rural areas.
Age at diagnosis is a factor that is divided into 7 groups: ages 0-19, 20-29, 30-39, 40-49, 50-59, 60-69,
and 70+. This is considered as the period in which the cancer occurs, and the distinction between early
and late diagnosis is ignored.
Year of diagnosis is a factor that is assumed to correlate strongly with developments in medicine and
technology. In this study, the available years are divided into 6 cohorts spanning five years each:
1973-1977, 1978-1982, 1983-1987, 1988-1992, 1993-1997, and 1998-2002.
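To make the distinction between a categorical variable and a dummy variable concrete, the following is a minimal sketch of how a categorical factor can be expanded into 0/1 dummy variables relative to a reference cohort. The category labels follow the text above; the function name dummy_code and the encoding scheme are illustrative assumptions, not the procedure actually used to prepare the SEER extract.

```python
# Hypothetical illustration: labels follow the text, but the encoding
# itself is an assumption, not the authors' actual preparation code.

def dummy_code(value, categories, reference):
    """Return a 0/1 indicator for every category except the reference one."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {c: int(value == c) for c in categories if c != reference}

RACE = ["all races", "white", "black"]   # categorical: three cohorts
MARITAL = ["married", "single"]          # dummy: married = 0, single = 1

# One data point: black, single patients.
row = {}
row.update(dummy_code("black", RACE, reference="all races"))
row.update(dummy_code("single", MARITAL, reference="married"))
print(row)  # {'white': 0, 'black': 1, 'single': 1}
```

A two-level factor such as marital status yields a single dummy variable, while a factor with k cohorts yields k-1 dummies, with the reference cohort represented by all zeros. This is why the reference cohorts appear in the tables with an effect of 0.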
Survival is defined as living more than 5 years after diagnosis. To calculate cause-specific survival,
SEER*Stat uses the Kaplan-Meier method. This calculates the survival probability over a defined period of
time from survival estimates made at the end of each month of that period. This allows for the exclusion
of deaths due to other causes during the specified time interval, as well as the censoring of cases lost
to follow-up. For each interval being considered, survival probability is calculated as the number of
patients surviving the interval divided by the number of patients at risk.
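The product-limit calculation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the SEER*Stat implementation: the function kaplan_meier and the six follow-up records are hypothetical, and deaths from other causes are treated in the same way as cases lost to follow-up (both censored).

```python
# Simplified Kaplan-Meier (product-limit) sketch on made-up records.
# event = 1: death from breast cancer; event = 0: censored
# (lost to follow-up, death from another cause, or alive at study end).

def kaplan_meier(records):
    """records: list of (month, event). Returns [(month, survival)] at each death time."""
    at_risk = len(records)
    survival = 1.0
    curve = []
    for month in sorted({m for m, _ in records}):
        deaths = sum(1 for m, e in records if m == month and e == 1)
        leaving = sum(1 for m, _ in records if m == month)
        if deaths:
            # Fraction of those still at risk who survive this month.
            survival *= (at_risk - deaths) / at_risk
            curve.append((month, survival))
        at_risk -= leaving  # deaths and censored cases both leave the risk set
    return curve

records = [(12, 1), (20, 0), (30, 1), (45, 0), (60, 0), (60, 0)]
curve = kaplan_meier(records)
five_year = [s for m, s in curve if m <= 60][-1]
print(round(five_year, 3))  # 0.625
```

In this toy example the censored case at month 20 reduces the number at risk for the death at month 30 (3 of 4 survive) without itself counting as a death, which is the behaviour the text refers to as censoring.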
There are 7533 data points extracted from SEER*Stat. Each data point represents N individuals
who are the same in all the factors being considered. Hence, a weighted multiple linear
regression is used. For each case the probability that the result is due to chance (p-value) is
determined. Statistical significance is claimed if p < 0.05.
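A weighted multiple linear regression of this kind can be sketched as below. The sketch assumes that each data point is weighted by its group size N, which the text implies but does not state explicitly; the four data points, the helper names solve and weighted_least_squares, and the two-variable model are hypothetical and are not drawn from SEER.

```python
# Minimal weighted least squares sketch. The four groups below are
# made-up toy data, not SEER values; weights are assumed to be group sizes N.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def weighted_least_squares(rows, y, weights):
    """Minimise sum_i w_i * (y_i - x_i . beta)^2 via the normal equations."""
    k = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * t for r, t, w in zip(rows, y, weights)) for i in range(k)]
    return solve(xtwx, xtwy)

# Columns: constant, black (0/1), single (0/1).
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
survival = [90.0, 85.0, 88.0, 83.0]   # percent surviving five years (made up)
group_size = [100, 40, 60, 20]        # N individuals per data point (made up)

beta = weighted_least_squares(X, survival, group_size)
print([round(b, 3) for b in beta])    # constant, effect of black, effect of single
```

In this toy case the fitted constant is the survival of the reference group (not black, married), and each remaining coefficient is the change in survival, in percentage points, associated with that cohort when the other variable is held fixed. Weighting by N means that a data point summarizing many patients influences the fit more than one summarizing few.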
4. Results
After controlling for the above factors,
what does this mean? I thought that controlling was after the multiple regression?
SEER*Stat gives 3381 data points for groups including all races, 3200 data points for white
individuals, and 952 data points for black individuals. A weighted multiple linear regression for
cause-specific survival gives the comparative difference in survival of each cohort of each factor. Prior
to obtaining the overall result, each variable is regressed on its own against cause-specific survival to
see how it affects survival when the other variables are not included.
For our primary variable of interest, a simple weighted regression on race shows that, without taking
other factors into account, race does have a statistically significant effect on survival for black
individuals. Compared to white patients, whose survival is 97%, black patients have a reduced survival of
86% (a reduction of 10 percent, p < 0.001). This effect is smaller when the other variables are taken into
account. Of these, both grade and stage show highly statistically significant, and more dramatic, effects
on survival for both white and black patients. The more a cancer has developed or progressed, the lower
the survival will be.
Marital status, income and education show moderate effects on cause-specific survival, with few
statistically significant results. Single patients have a 2.7 percent decrease in cause-specific survival
compared to their married counterparts. Income affects survival by up to 6.7 percent, while education
affects survival by a lower 1.1 percent. Similarly, rural areas show a decrease of 3.9% from urban areas in
cause-specific survival (p < 0.001).
Age at diagnosis showed a large effect on cause-specific survival. Patients aged 20-29 had a survival
9.6 to 18.7% lower than that of older patients (p-values of 0.114 to 0.002). Similarly, the year of
diagnosis showed a significant effect on survival, with an increase in survival of 1.3 to 19.3% in the
later cohorts compared to the cohort 1973-1977. P-values ranged from 0.74 to 0.00.
Correlating the above variables with one another reveals that there is no significant multicollinearity,
where significant is defined as a correlation coefficient of 0.5 or greater. In fact, most correlation
coefficients are less than 0.10. This implies that, when a weighted multiple linear regression is applied
to all the variables, “overfitting” is unlikely.
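The multicollinearity screen described above can be sketched as a pairwise correlation check against the 0.5 cutoff. The three 0/1 columns below are hypothetical toy data, the names pearson and flag_collinear are illustrative, and comparing the absolute value of the coefficient with the cutoff is our assumption, since the text does not say how negative correlations were treated.

```python
# Pairwise correlation screen on made-up dummy variables (not SEER data).
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def flag_collinear(columns, threshold=0.5):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

columns = {
    "black":      [0, 1, 0, 1, 0, 1, 0, 1],
    "single":     [0, 0, 1, 1, 0, 0, 1, 1],
    "low_income": [0, 1, 0, 1, 0, 1, 1, 1],
}
print(flag_collinear(columns))  # [('black', 'low_income', 0.775)]
```

An empty list from such a screen corresponds to the situation reported in the text, where no pair of variables reaches the 0.5 cutoff.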
Table 2 presents the weighted multiple linear regression that shows the effect of all the variables
listed above on cause-specific survival. Most cohorts show statistically significant results, with the
primary variable of interest being cause-specific survival of black patients. Compared to their white
counterparts, black patients show a decreased survival of 4.4 percent (p < 0.001) even after all the other
variables have been controlled. White patients show a slight increase of 0.09 percent in survival relative
to the group including all races, with a p-value of 0.537.
Moreover, other variables show significant effects on cause-specific survival. Stage and grade of
cancer, which measure the spread and growth of the cancer in a specific patient, show a large effect on
cause-specific survival for both white and black patients. Cancer at a distant stage shows a survival
62 percent lower than cancer in situ, while undifferentiated cancer shows a decrease of 9.3 percent
compared to well-differentiated cancer. All values in the stage and grade categories are highly
significant (p < 0.001).
This paper is about the B-W difference. The above paragraph seems written for all. WHAT IS THE B-W DIFFERENCE IF ONLY LATE STAGE AND HIGH GRADE ARE USED?
For both white and black patients, age at diagnosis also shows a statistically significant result, with
patients of ages 60-69 showing an increase of 7.7 percent compared to their 20-29 counterparts
(p < 0.01). Survival increases with age except for the last age group of 70+ years, which shows a decrease
compared to ages 40-69. But it is still 0.72% greater than for patients of ages 30-39 (p = 0.01).
The calendar year at diagnosis also shows a large effect on cause-specific survival, with survival
improving (increasing) in the later years. Compared to the first cohort, 1973-1977, cause-specific
survival increases by 0.76 percent in the second cohort, building up to 9.9 percent by the last cohort,
1998-2002. The second cohort, 1978-1982, has a p-value of 0.6, but all other cohorts have a p-value of
0.000.
Other socioeconomic factors also show some effect on cause-specific survival. Patients from rural
areas show a slight increase of 0.46 percent compared to patients from metropolitan areas, with a p-value
of 0.299. Marital status also shows a slight effect on cause-specific survival, with single patients
having a 0.98 percent decrease in survival compared to married patients (p < 0.001). Increases in education
and income also show positive effects on cause-specific survival. Income increases cause-specific survival
by 1.7 percent per group (p = 0.002). Compared to patients with high education, patients with medium
education have a 0.42 percent lower survival rate (p = 0.014). The difference ceases to be statistically
significant for patients with low education, who show only a 0.16 percent decrease.
The regression summarized in the table has an R-squared of 85%, that is, it accounts for 85% of the
variation in cause-specific survival between the groups. The t-values in this table are not small, giving
no reason to remove any of the variables. Lastly, since Prob > F = 0.000, the study concludes that the
variables listed in the table are jointly significant and should all be retained.
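As a guide to reading the regression tables, each coefficient is the change in cause-specific survival, in percentage points, relative to the reference cohort of the same variable (the cohort listed with an effect of 0), with all other variables held fixed. The difference between any two cohorts of one variable is therefore the difference of their coefficients. The short sketch below works this out for a few Table 2 entries; the dictionary TABLE2 and the function effect_difference are illustrative names only.

```python
# Coefficients transcribed from Table 2 (percentage points relative to
# the reference cohort of each variable). Names here are illustrative.
TABLE2 = {
    "race": {"all races": 0.0, "white": 0.091, "black": -4.270},
    "marital status": {"married": 0.0, "single": -0.979},
    "stage": {"in situ": 0.0, "regional": -7.668, "distant": -62.147},
}

def effect_difference(variable, cohort_a, cohort_b):
    """Adjusted survival of cohort_a minus cohort_b, other variables held fixed."""
    effects = TABLE2[variable]
    return effects[cohort_a] - effects[cohort_b]

# Black versus white: both are measured against the all-races reference,
# so the black-white gap is the difference of the two coefficients.
gap = effect_difference("race", "black", "white")
print(round(gap, 3))  # -4.361
```

This reproduces the black-white gap of about 4.4 percent quoted in the text, whereas the coefficient of -4.270 on its own is the gap between black patients and the all-races reference group.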
5. Conclusions and Discussion
One postulate about the black-white disparity in breast cancer incidence is that black women are
diagnosed later in life than white women, after a cancer has developed to a later stage for which survival
is lower. But this postulate does not seem to be borne out by the survival data. The weighted multiple
linear regression indicates that the white-black breast cancer survival disparity might be directly
affected by race. There is a 4.3% decrease from the control group (all patients) to the treatment group
(black patients) (p < 0.001). White patients show an increase of 0.09% (p = 0.537). However, stage and
grade affect survival more than race, also with p < 0.001. Socioeconomic factors show a moderate, but
mostly statistically significant, effect on survival. For example, patients with high education have a
0.4% higher survival rate than patients with medium education. Table 2 shows which of the variables have
more of an effect on breast cancer survival than others.
Which result?)
The weighted multiple linear regression gives a high R-squared value of 85%.
A potential limitation of this study is that important variables that affect survival, such as type of
treatment, might be omitted. While we expect that type of treatment might be strongly correlated
with income level, it cannot be concluded that accounting for this variable would have closed the
white-black breast cancer survival disparity. Regardless, even after controlling for significant
cancer-specific and socioeconomic factors, there is a 4.4% gap in survival between black and white
patients that is significant at the 95% confidence level. This gives a necessary, but not sufficient,
reason to believe that differences in breast cancer survival might have biological origins.
We note the general problem with all observational studies is that they cannot directly prove what is
happening, but if a theory (model) can be developed observational studies can disprove its validity. At the
present time we see no model of women’s breast cancer that fits the data, including incidence, mortality
and cause specific survival without assuming some direct racial (biological) effect, albeit smaller than the
raw data suggest.
6. References
initials needed see JNCI guidelines
get these straight at end
1. Jatoi, J., Anderson, W.F., Rao, S.K., and Devesa, S.S., “Breast Cancer Trends Among Black and White
Women in the United States,” J. Clin. Oncol. 23:7836-7841.
2. Bassily, M.N., Wilson, R., Pompei, F., Burmistrov, D., “Cancer Survival as a Function of Age at
Diagnosis: A Study of the Surveillance, Epidemiology and End Results Database,” International Journal of
Cancer Epidemiology, 34(6):667-681 (2010).
3. Ghafoor, A., Jemal, A., Ward, E., Cokkinides, V., Smith, R., Thun, M., “Trends in Breast Cancer by Race
and Ethnicity,” CA: A Cancer Journal for Clinicians 53 (2003): 342-355.
4. Otis, B., Freeman, H., “Race and Outcomes: Is This the End of the Beginning for Minority Health
Research?” Downloaded from jnci.oxfordjournals.org by guest on April 25, 2011.
5. Idan, M., Anderson, W., Jatoi, I., and Rosenberg, P., “Underlying Causes of the Black-White Racial
Disparity in Breast Cancer Mortality: A Population-Based Analysis.” Downloaded from
jnci.oxfordjournals.org by guest on April 25, 2011.
7. Tables
Variable            Cohort                      Effect on Cause-Specific Survival   Standard Error   P-value
Race                All Race                    0
                    White                       0.322                               0.381            0.398
                    Black                       -10.007                             1.730            0.000
                    Constant                    96.398                              0.264            0.000
Age at Diagnosis    20-29                       0
                    30-39                       9.613                               6.074            0.114
                    40-49                       17.833                              5.942            0.003
                    50-59                       18.735                              5.936            0.002
                    60-69                       18.005                              5.939            0.002
                    70+                         15.353                              5.944            0.010
                    Constant                    78.871                              5.928            0.000
Type of district    Metropolitan                0
                    Rural                       -3.926                              0.959            0.000
                    Constant                    96.591                              0.194            0.000
Stage               In situ                     0
                    Regional                    -9.808                              0.361            0.000
                    Distant                     -69.350                             1.362            0.000
                    Constant                    97.959                              0.119            0.000
Grade               Well differentiated         0
                    Moderately differentiated   -2.898                              0.342            0.000
                    Poorly differentiated       -14.204                             0.522            0.000
                    Undifferentiated            -14.745                             1.679            0.000
                    Constant                    99.180                              0.219            0.000
Education           High                        0
                    Medium                      0.264                               0.419            0.529
                    Low                         -1.097                              0.579            0.058
                    Constant                    96.472                              0.315            0.000
Marital Status      Married                     0
                    Single                      -2.72                               0.421            0.000
                    Constant                    97.184                              0.222            0.000
Year at Diagnosis   1973-1977                   0
                    1978-1982                   1.250                               3.829            0.744
                    1983-1987                   9.332                               3.254            0.004
                    1988-1992                   14.878                              2.991            0.000
                    1993-1997                   17.328                              2.972            0.000
                    1998-2002                   18.311                              2.963            0.000
Income              Low                         0
                    Medium                      4.470                               2.486            0.072
                    High                        6.655                               2.467            0.007
                    Constant                    90.363                              2.457            0.000
Table 1 Summary of Individual Regression Results
Variable            Cohort                      Effect on Cause-Specific Survival   Standard Error   P-value
Race                All Race                    0
                    White                       0.091                               0.147            0.537
                    Black                       -4.270                              0.676            0.000
Age at Diagnosis    20-29                       0
                    30-39                       5.226                               2.380            0.028
                    40-49                       7.797                               2.333            0.001
                    50-59                       7.599                               2.332            0.001
                    60-69                       7.691                               2.333            0.001
                    70+                         5.942                               2.334            0.011
Type of district    Metropolitan                0
                    Rural                       0.459                               0.442            0.299
Stage               In situ                     0
                    Regional                    -7.668                              0.242            0.000
                    Distant                     -62.147                             0.903            0.000
Grade               Well differentiated         0
                    Moderately differentiated   -1.536                              0.167            0.000
                    Poorly differentiated       -8.874                              0.265            0.000
                    Undifferentiated            -9.265                              0.795            0.000
Education           High                        0
                    Medium                      -0.417                              0.169            0.014
                    Low                         -0.163                              0.349            0.641
Marital Status      Married                     0
                    Single                      -0.979                              0.170            0.000
Year at Diagnosis   1973-1977                   0
                    1978-1982                   0.754                               1.542            0.625
                    1983-1987                   5.239                               1.315            0.000
                    1988-1992                   8.202                               1.236            0.000
                    1993-1997                   9.139                               1.216            0.000
                    1998-2002                   9.921                               1.210            0.000
Income                                          1.673                               0.545            0.002
Constant                                        79.726                              2.852            0.000
Table 2 Summary of ??? Results
MISSING FROM TEXT IS A DESCRIPTION OF HOW TO READ THESE TABLES