Selective Sampling to Overcome Skewed a priori Probabilities

advertisement
Ennett CM, Frize M. Selective sampling to overcome skewed a priori probabilities with neural networks. Proc
AMIA Symp 2000;(20 Suppl):225-9.
Selective Sampling to Overcome Skewed a priori Probabilities
with Neural Networks
Colleen M. Ennett1, MASc, Monique Frize1,2, PhD, PEng
Systems & Computer Engineering, Carleton University, Ottawa, ON, Canada
2
School of Information Technology & Engineering, University of Ottawa, Ottawa, ON, Canada
1
Highly skewed a priori probabilities present
challenges for researchers developing medical
decision aids due to a lack of information on the rare
outcome of interest. This paper attempts to overcome
this obstacle by artificially increasing the mortality
rate of the training sets. A weight pruning technique
called weight-elimination is also applied to this
coronary artery bypass grafting (CABG) database to
assess its impact on the artificial neural network’s
(ANN) performance. The results showed that
increasing the mortality rate improved the sensitivity
rates at the cost of the other performance measures,
and the weight-elimination cost function improved
the sensitivity rate without seriously affecting the
other performance measures.
Keywords
Artificial neural networks; Coronary
artery bypass grafting; Cardiac surgery; Weightelimination; Outcomes estimation.
INTRODUCTION
A challenge associated with medical databases is the
ability to predict rare outcomes. Certainly, rare
outcomes are of great interest since their occurrence
is generally undesirable such as the presence of a
disease or post-surgical mortality. Given that their
occurrence is rare, there is limited data upon which to
draw conclusions about their causes. Post-surgical
mortality of coronary artery bypass graft (CABG)
patients is an example of a rare outcome. Generally,
CABG mortality rates reported in journal articles are
less than 5 percent. Although this is a desirable rate
from a medical point of view, the low mortality rate
makes accurate mortality predictions quite
challenging.
Previous risk models have been developed for
predicting CABG surgery mortality using additive
[1,2] and statistical models [3-8], but in recent years
artificial neural networks (ANNs) have entered into
this area of medical research [9,10].
The question posed in this paper is: What is the effect
of repeatedly presenting rare outcomes to an ANN
using random selective sampling? In other words, the
objective is to study the effect of changing the
mortality rate distribution of the training datasets by
entering the cases of the under-represented class
multiple times into an artificial dataset. Artificially
changing the a priori statistics (i.e., increasing the
mortality rate) of the datasets is an attempt to combat
the problems that arise with the low representation of
the nonsurvivors (those who die) of CABG surgery
when predicting in-hospital mortality using ANNs
[11].
The highly skewed distribution poses serious
difficulties for the intended ANN to be investigated,
as shown by preliminary results done by the Medical
IDEAS (Intelligent DEcision Aid Systems) Research
Group [MIRG] researchers using an adult intensive
care database [12,13]. Ennett and Frize [13]
discovered a region of inconsistent network
performance (i.e., sometimes the network would
learn patterns and other times it would classify
everything as belonging to the group with the highest
a priori probability) when the underrepresented class
made up approximately 15 percent of the cases. At
representations of less than 15 percent, the ANN
failed to learn anything. Instead, it classified all
patients as belonging to the larger outcome class (i.e.,
those patients requiring less than 8 hours of artificial
ventilation). In theory, this technique of changing the
a priori statistics of the training set may provide a
solution to the challenges faced when dealing with a
drastically underrepresented outcome in a two-class
problem.
The application of this ANN algorithm with weightelimination to a medical database has given
successful results in a previous study [12,14], but it
also put into question whether the weight-elimination
cost function would prove useful when applied to
another database. The problem involved in the
research by Trigg [12] and Frize et al. [14] was to
identify intensive care unit (ICU) patients who
require 8 or more hours of mechanical ventilation.
This involved an approximately 70-30 percent
distribution of the outcomes (i.e., 30 percent of the
patients required 8 or more hours of ventilation). This
is a much larger percentage of the underrepresented
category than the problem presented by the CABG
database. The second question was: Will the weight-
Ennett CM, Frize M. Selective sampling to overcome skewed a priori probabilities with neural networks. Proc
AMIA Symp 2000;(20 Suppl):225-9.
elimination cost function (developed by Weigend et
al. [15,16]) improve the performance of a
feedforward backpropagation ANN used to classify
CABG mortality?
METHODOLOGY
The experiments carried out for this paper used the
cardiac patient database from the San Francisco Heart
Institute (SFHI) in Daly City, CA. The original
database contained 7050 patients who underwent all
types of open-heart surgery between January 1, 1985
and June 30, 1994. The only surgical cases included
in this analysis were patients who underwent CABG
surgery, CABG plus valve surgery, or CABG plus
repair surgery, leaving 6325 cases. There were only 9
cases collected in 1985 so those cases were
eliminated from consideration because they were not
representative of the surgical practice in 1985. Also,
some technical advances were introduced at the
hospital during 1990-91 that would have influenced
the outcomes in 1992 and afterwards, so the data
from 1992-94 were also removed. This left a database
of 3526 patient cases dating from 1987 to 1991.
Using 32 commonly chosen CABG mortality risk
factors as inputs (listed in Table 1), there were 3427
complete patient cases.
Table 1: Input variables
Variable
date of surgery
patient’s age
female gender
mitral valve disease operation
aortic valve disease operation
emergent/urgent priority for surgery
failed PTCA prior to surgery
reoperation
renal disease
ventricular aneurysm
left main disease
ejection fraction:
normal
moderate
severe
mitral valve regurgitation
aortic valve stenosis
tricuspid valve disease
hypertension
pre-operative intraaortic balloon pump
previous myocardial infarction (MI)
Prevalence
N/A
N/A
0.247
0.021
0.031
0.229
0.047
0.1
0.062
0.003
0.195
0.660
0.250
0.090
0.041
0.025
0.001
0.621
0.042
0.532
evolving MI
history of congestive heart failure
unstable angina
cerebrovascular disease
peripheral vascular disease
triple vessel disease
obesity
small stature
chronic obstructive pulmonary disease
diabetes
cardiogenic shock
hypercholesterolemia
previous cerebrovascular accident
anemia
0.013
0.102
0.629
0.119
0.121
0.74
0.107
0.018
0.132
0.228
0.007
0.531
0.039
0.165
The database was randomly separated into training
and test sets such that two-thirds of the database was
used for training (2254) and the remaining third
became the test set (1173). The mortality rate of the
training set was 3.7 percent, and that of the test set
was 3.8 percent.
A feed-forward neural network with one hidden layer
(i.e., a three-layer ANN) was trained using the
backprogation training algorithm with the hyperbolic
tangent transfer function for use in the experiments
for this paper. The weight-elimination cost function
developed by Weigend et al. [15,16] was added to the
error function. The weight-elimination function
(Equation 1) is a penalty term that attempts to reduce
the weights of the least important variables to zero
(or near zero) to ultimately remove their influence
from the network. It is a weight-pruning technique.
wij 2
E (W )  E 0 (W )   
ij
w0 2
wij 2
1
w0 2
(1)
E(W) is the combined cost function that includes the
initial cost function, E0(W) (typically, the sum-ofsquared-errors (SSE)), and the weight-elimination
term (the second term in Equation 1). Here, W
represents the weight vector,  is the weight-decay
constant, and wij indicates the individual weight of
the ANN.
The objective was to artificially increase the number
of nonsurvivors (and hence the mortality rate) in the
training sets. To achieve this goal, the datasets were
separated according to their outcome: death or
Ennett CM, Frize M. Selective sampling to overcome skewed a priori probabilities with neural networks. Proc
AMIA Symp 2000;(20 Suppl):225-9.
survival. This approach was necessary due to the
small number of nonsurvivors, and to ensure that the
desired distribution could be achieved. By combining
the idea of separating the patient records according to
the outcome used by Katz et al. [17] with the random
sampling technique so that a specific mortality rate
could be obtained, the artificial datasets for the SFHI
coronary artery surgery database were created.
Therefore, once the original database was divided
into training and test sets, these sets were further
subdivided into those who survived the surgery and
those who did not. The artificial sets were formulated
from these “categorized” datasets.
when classifying CABG patients. The no weightelimination networks are set to the same network
parameters (i.e., number of hidden nodes,
momentum, learning rate, and weighting factor) as
the weight-elimination networks but with the weightelimination function deactivated. The objective of the
weight-elimination cost function is to reduce the size
of the connection weights to eliminate less useful
input variables. Theoretically, this course of action of
reducing the number of weights in the network, and
hence the network’s complexity, is expected to
improve the network’s classification performance
[12,14,15,18].
To artificially increase the percentage of
nonsurvivors in the datasets, a simple program that
performed random sampling with replacement was
used. Since the sample size can affect the
performance of a model, the total number of cases in
the training and test sets was kept constant. Given the
number of cases, and the desired percentage of
nonsurvivors for the particular dataset, it was
possible to determine how many patients who did not
survive the surgery would be included in that
particular set. The artificial training sets had
mortality rates of 10, 20, 30, 40 and 50 percent. This
offered a range of training set mortality distributions
for the experiments to be performed. Table 2
describes each dataset with the number of
nonsurvivor and survivor patient cases. The
nonsurvivor and survivor cases were chosen
separately, but in each situation, the patient records
were chosen randomly with replacement.
The networks were compared on the basis of their
sensitivity, specificity and correct classification rate
(CCR) when using zero as the cut-off value (because
the hyperbolic tangent function ranges from -1 to 1),
and the area under the Receiver Operating
Characteristic (ROC) curve. As well, the
performance was contrasted against the CCR of a
constant predictor (CP). A CP is a simple statistical
benchmark that classifies all test set cases as
belonging to the outcome class with the highest a
priori probability of the training set.
Table 2: Distributions of the training datasets
# cases
# cases
Dataset
nonsurvivor survivors Total
3.7 % true
distribution
83
RESULTS
The results of the weight-elimination ANN
experiments where the training set a priori
distributions were altered are presented in Table 3
according to their sensitivity, specificity, CCR and
area under the ROC curve (C-index). Table 4
presents the performance of the weight-elimination
and no weight-elimination ANNs at the point of best
performance of the weight-elimination networks. The
CCR of the CP in these experiments was 96.2 percent
because the higher a priori class of the training set
was the survivors, so the constant predictor would be
correct 96.2 percent of the time if it classified every
test case as a survivor.
2171
2254
10% mortality rate 225
2029
2254
20% mortality rate 451
1803
2254
30% mortality rate 676
1578
2254
40% mortality rate 902
1352
2254
Table 3: Performance measures of the test set for the
best-performing double-layered weight-elimination
networks
50% mortality rate 1127
1127
2254
Expt
MR=3.7 MR=10 MR=20 MR=30 MR=40 MR=50
Sens
(%)
10.85
18.18
43.55
59.09
70.45
72.73
Spec
(%)
98.26
96.38
93.83
82.37
74.67
68.47
CCR
(%)
94.98
93.01
91.94
81.5
74.51
66.47
Cindex
0.9387
0.907
0.8978
0.8054
0.7345
0.6737
The performance of the weight-elimination ANNs
that were trained using the true mortality rate
(MR=3.7 percent)) and trained on the set with an
artificial mortality rate of 20 percent (MR=20
percent) were compared with their no weightelimination counterparts (using only the SSE cost
function) to identify the effect of the weightelimination penalty term on the ANN’s performance
* MR = mortality rate
Ennett CM, Frize M. Selective sampling to overcome skewed a priori probabilities with neural networks. Proc
AMIA Symp 2000;(20 Suppl):225-9.
Table 4: Comparison of ANNs with and without
weight-elimination (WE) using the best-performing
networks
Expt
Sens (%)
Spec (%)
CCR (%)
C-index
MR=3.7
WE
10.85
98.26
94.98
0.9387
MR=3.7
no WE
8.94
99.31
95.92
0.9453
MR=20
WE
43.55
93.83
91.94
0.8978
MR=20
no WE
24.05
93.22
90.63
0.8844
* MR = mortality rate
DISCUSSION
Given the results presented in Table 3, it is clear that
training with a higher-than-normal prevalence of the
under-represented outcome (here, in-hospital death)
improves the sensitivity of the ANN using the test
data at the cost of poorer results for the other
performance measures. Typical C-index values for
CABG mortality risk models range from 0.70 to 0.80.
The some of the results found in these experiments
exceeded these C-index values, however, their
clinical relevance may be questionable given that few
of the nonsurvivors are correctly classified (as noted
by the sensitivity rate). The ANNs trained on higher
prevalences are of more clinical value because more
nonsurvivors are correctly classified. The higher risk
patients are always more difficult to classify using
risk stratification models, therefore, preference
should be given to a model that is better able to
classify these individuals [19].
The results of Table 4 indicate that the use of the
weight-elimination cost function improved the
ANN’s
classification
performance
of
the
nonsurvivors (the sensitivity) without a dramatic
effect (if any) on the other performance parameters.
One of the limitations of this study is the decreased
specificity that occurred when the mortality rate was
increased, and sometimes when weight-elimination
was used. Certainly, the best model would have high
sensitivity and high specificity, whereas the models
presented here permitted a lower specificity at the
cost of a higher sensitivity.
The weight-elimination cost function improved the
sensitivity of the ANNs while only having a
moderate effect on the other performance measures.
FUTURE WORK
Researchers from MIRG continue to work on
applications of the weight-elimination technique to
different medical databases including the neonatal
ICU [20,21]. Future work will also focus on
developing a useful model that has a better balance
between sensitivity and specificity.
Acknowledgments
This work was completed with the support of grants
from the Medical Research Council (MRC) of
Canada and the Natural Sciences and Engineering
Research Council (NSERC) of Canada. The authors
would also like to acknowledge and thank Dr.
Richard E. Shaw for providing access to the San
Francisco Heart Institute patient database.
References
1.
2.
3.
4.
5.
CONCLUSION
6.
Increasing the mortality rate of the training set when
the a priori probabilities are skewed will increase the
sensitivity of the weight-elimination ANN at the cost
of the other performance measures. A model with a
higher sensitivity may be of more clinical value since
the nonsurvivors are more difficult to identify. Such a
model would highlight the patients who may be at a
higher risk of death following CABG surgery.
7.
Parsonnet V, Dean D, Bernstein AD. A method
of uniform stratification of risk for evaluating the
results of surgery in acquired adult heart disease.
Circulation 1989;79(Suppl 1):I3-I12.
Higgins TI, Estanganous FG, Loop FD, Beck GJ,
Blum JM, Paranandi L. Stratification of
morbidity and mortality outcome by preoperative
risk factors in coronary artery bypass patients. A
clinical severity score. JAMA 1992;267:23442348.
Edwards FH, Graeber GM. The theorem of
Bayes as a clinical research tool. Surgery,
Gynecology & Obstetrics 1987;165:127-129.
Edwards FH, Albus RA, Zaitchuk R, et al. Use
of a Bayesian statistical model for risk
assessment in coronary artery surgery. Ann
Thorac Surg 1988;45:437-440.
Edwards FH, Clark RE, Schwartz M. Coronary
artery bypass grafting: The Society of Thoracic
Surgeons national database experience. Ann
Thorac Surg 1994;57:12-19.
Pliam MB, Shaw RE, Zapolanski A.
Comparative analysis of coronary surgery risk
stratification
models,
J
Invas
Cardiol
1997;9:203-222.
Shroyer ALW, Grover FL, Edwards FH. 1995
coronary artery bypass risk model: the Society of
Thoracic Surgeons adult cardiac national
database. Ann Thorac Surg 1998;65:879-884.
Ennett CM, Frize M. Selective sampling to overcome skewed a priori probabilities with neural networks. Proc
AMIA Symp 2000;(20 Suppl):225-9.
8.
9.
10.
11.
12.
13.
14.
15.
Shroyer ALW, Plomondon ME, Grover FL,
Edwards FH. The 1996 coronary artery bypass
risk model: The Society of Thoracic Surgeons
adult cardiac national database. Ann Thorac Surg
1999;67:1205-1208.
Lippmann RP, Shahian DM. Coronary artery
bypass risk prediction using neural networks.
Ann Thorac Surg 1997;63:1635-1643.
Orr RK. Use of a probabilistic neural network to
estimate the risk of mortality after cardiac
surgery. Med Decis Making 1997;17(2):178-185.
Ennett CM. Coronary surgery mortality
prediction using artificial neural networks.
Master’s thesis, School of Information
Technology and Engineering, University of
Ottawa: Ottawa, ON, 1999.
Trigg HCE. An investigation of methods to
enhance the performance of artificial neural
networks used to estimate ICU outcomes.
Master’s thesis, Department of Electrical
Engineering, University of New Brunswick:
Fredericton, NB, 1997.
Ennett CM, Frize M. An investigation into the
strengths and limitations of artificial neural
networks: an application to an adult ICU patient
database. Proc of AMIA’98 Fall Symposium,
Orlando, FL, Nov 7-11, 1998:998.
Frize M, Trigg HCE, Solven FG, Stevenson M,
Nickerson BG. Decision-support systems
designed for critical care. Proceedings of the
1997 AMIA Fall Symposium. 1997:855.
Weigend AS, Rumelhart DE, Huberman BA.
Back-propagation, weight-elimination and time
series prediction. In DS Tourestzky, JL Elman,
TJ Sejnowski, GE Hinton, eds. Proc of the 1990
16.
17.
18.
19.
20.
21.
Connectionist Models Summer School. Morgan
Kaufmann: San Mateo. 1990:105-116.
Weigend AS, Rumelhart DE, Huberman BA.
Generalization by weight-elimination with
application to forecasting. In RP Lippmann, J
Moody, DS Touretzky, eds. Advances in Neural
Information Processing Systems (NIPS’90),
Volume 3. Morgan Kaufmann: San Mateo.
1991;3:875-882.
Katz S, Katz AS, Lowe N, Quijano RC. Neural
net_bootstrap hybrid methods for prediction of
complications in patients implanted with
artificial heart valves. J Heart Valve Dis 1994
Jan;3(1):49_52.
Frize M, Wang L, Ennett CM, Nickerson B,
Solven FG, Stevenson M. New advances and
validation of knowledge management tools for
critical care using classifier techniques.
Proceedings of the AMIA '98 Annual
Symposium, Orlando, FL: Nov 7_11,
1998:553_558.
Clark RE. Calculating risk and outcome: The
Society of Thoracic Surgeons database. Ann
Thorac Surg 1996;62:S2-S5.
Tong Y, Frize M, Walker R. Estimating
ventilation using artificial neural networks in
intensive care units. First Joint BMES/EMBS
Conference, Atlanta, GA: Oct 12_16, 1999.
Walker R, Frize M, Tong Y. Data analysis using
artificial neural networks and computer_aided
decision_making in the neonatal intensive care
unit. Paediatric Academy Society 1999 Annual
Meeting, San Francisco, CA: May 2, Paediatric
Research 1999;45:231A.
Download