From: AAAI Technical Report SS-94-01. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.
Predicting risk of an adverse event in complex medical
data sets using fuzzy ARTMAP
network.
Natalya Markuzonl,* 2,
Stephan A. Gaehde2JArlene S. Ash
2Gail A. Carpenter1{ MarkA. Moskowitz
IDepartment of Cognitive & Neural Systems,
Boston University, 111 CummingtonSt., Boston, MA02215
2Health Care Research Unit, Section of General Internal Medicine,
Evans Memorial Department of Medicine,
Boston University Medical Center, Boston, MA02118
Abstract
emerging patterns. Here we introduce a neural
network similar to fuzzy ARTMAP
(Carpenter et
al., 1992). Fuzzy ARTMAP
is a self-organizing
Fuzzy ARTMAP
is a supervised learning syspattern classification modelfor supervised learntem which includes nonlinear dynamics in the ing and prediction. The performance of the netlearning process. Weintroduce a new testing
workis comparedto that of the logistic regresprocedure which allows the system to estimate
sion model, using a cholecystectomy and diabetes
the probability of an outcome. Simulations il- data set. The task for each model is to predict
lustrate the system performance in estimating
the occurrence of an adverse event based on inrisk in medical procedures. The results are com- formation about the patient provided by an input
pared to the performance of the logistic regres- vector. Anadverse event is a complication followsion model. It is shown that both models have ing a medical procedure. Fuzzy ARTMAP
was
similar explanatory power.
modified to provide probabilistic predictions, as
does the logistic regression model. Performance
of the two models was similar on these data. This
Introduction
paper includes a short description of the network
architecture, the data organization, and the simThe application of neural networks to large com- ulation results.
plex biomedical data sets can improve predictive
performance compared to regression modeling.
Neural networks can model both linear and nonFuzzy ARTMAP and probalinear relationships and update by incorporating
bilistic
predictions
*Supportedin part hy British Petroleum(BP 89A1204).
tSupportedin part by VAFellowshipTrainingGrant. Fuzzy ARTMAP
(Figure 1) includes a pair
tSupported in part by APd)A(ONRN00014-92-J- Fuzzy ARTmodules (ART~ and ARTb) (Carpen4015), the National Science Foundation(NSFIRI-9000530),and the ()ffice of NavalResearch(ONR
N00014-ter, Grossberg and Rosen, 1991) linkedab together
via an inter-ART associative memoryF that is
91-J-4100).
87
called a mapfield. During supervised learning,
ART,receives a stream {a (p)} of input patterns
and ARTbreceives a stream {b(P)} of patterns,
where b(P) is tile correct prediction given a(P).
These modulesare linked by an associative learning network and an internal controller that ensures autonomoussystem operation in real time.
Tile controller is designed to create the minimal
numberof ART,recognition categories needed to
meet accuracy criteria.
ab
mapfield F
.......
.........
i7 ........
match
tracking
;:
ARTt,
...................
T
wzX I rN~l
Vigilance parameter p‘‘ calibrates the mini,
mumconfidence that ART‘‘must have in a recognition category, or hypothesis, activated by an input a (p) in order to ART,to accept that category,
I uI
rather than search for a better one through an automatically controlled process of hypothesis testFigure 1: Fuzzy ARTMAP
Architecture
ing. Lowervalues of p‘‘ enable larger categories
to form. These lower p, values lead to a broader
generalization and a higher degree of (:ode com- time. This feature combined with complement
pression. A predictive failure at ARTbincreases coding and fuzzy logic, lead to increasing sizes of
category.
Pa by the minimum amount needed to trigger
hypothesis testing at ART‘‘, using a mechanism A new testing procedure is introduced which
called match tracking. Matchtracking sacrifices allows the networkto, instead of predicting one of
the minimumamount of generalization necessary manypossible outcomes, provide probabilities for
to correct the predictive error. Hypothesis test- each outcome. During testing, instead of selecting leads to the selection of a new ART‘‘ cate- ing just one winner in the ART,, module, K segory, wh{chfocuses attention on a newcluster of lected winners combinetheir predictions to yield
a(p) input features that is better able to predict a probabilistic prediction score. The influence of
b(v). Match tracking allows a single ARTMAPeach category participating in the prediction is
system to learn a different prediction for a rare weightedby the level of the category’s activation,
event than for a cloud of similar frequent events as well as by the total numberof training patin which it is embedded.
terns which participated in the formation of the
The general ARTMAP
network learns to clas- category. Layer G" stores the numberof training
sify both binary and analogue vectors. This is inputs for each F~" category. Improvedprediction
accomplished by incorporating fuzzy set-theory is achieved by training the system several times
using different orderings of the input set.
operations (Zadeh, 1965) into the dynamics
ARTmodules which is possible due to close formal similarity between tile computations of fuzzy
subsethood and ARTcategory choice, resonance,
and learning. A normalization procedure called
complementcoding uses on cells and off cells to
represent the input pattern, and preserves the
individual feature amplitudes while normalizing
the total on cell/off cell vector. Learningis stable
because all adaptive weights can only decrease in
88
An ARTMAP
voting strategy is based on the
observation that fast learlfing typically leads to
different adaptive weights and recognition categories for different orderings of a given training
set, even whenthe predictive accuracy of all simulations is similar. The different internal category structures cause the set of test set items
where errors occur to vary from one simulation to
the next. The voting strategy uses an ARTMAP
system that is trained several times on one input
set with different orderings. Tile final prediction
for a given test set item is tile one madeby the
largest numberof simulations.
glucose level in the evening, is binary. Data from
different patients were not combinedso that tile
network was trained individually for each patient.
The network was tested on three different patients.
The two data sets described above offer different approaches- in the first one weare predicting
the behavior of a new patient based on the exTwodata sets were tested with the network. The perience with previously observed patients, while
first database was a randomsample of cholecys- in the diabetes data we make predictions based
tectomy cases from 1985 and 1986, selected from on a patient’s previous history only.
tile Medicare files of seven states. Abstracted
data composed of tile presence or absence of
about 250 key clinical findings (KCF’s) collected
Results
using a modified Medis(Iroups protocol (Iezzoni
and Moskowitz, 1988), were provided for each
of 3182 patient. Sixteen types of severe adverse Both the neural network and the logistic regresevents (including mortality within 30 days of ad- sion modelprovide a probabilistic estimate of the
mission), which are major complications occur- occurrence of an adverse event while the actual
ring after cholecystectomy, were defined. The oc- outcome in both data sets is binary. Model percurrence of any severe adverse event was marked formance was evaluated by the C-statistic and Ras positive in the binary outcome vector. 16.4% squared. The C-statistic maybe interpreted as
of cases had at least one adverse event. The di- the likelihood that the modelcorrectly assigns a
mensionality of the input vector was reduced to higher problem rank when comparing a randomly
16 from tile 62 most important KCF’s by auto- selected pair of problem/non-problemcases. The
also is the area under the Receiver
matic feature extraction preprocessing. KCF’s (;-statistic
significantly associated (p < .05) with each of the Operating Characteristic curve which shows the
16 types of adverse events were summed.These true-positive rate against the false-positive rate
16 sums and all age variable were used to con- for a given test. R-squared is a measure of the
struct all input vector. The models task was to fraction of variation in outcomethat is accounted
predict tile probability of occurrence of a positive for by a given model. It is an estimate of the explanatory power of a model.
outcomeafter presenting tile input vector.
Data Description
For the cholecystectomy database both modTile other database contained data from diabetes patients. Wepredict an abnormally high els were trained on 4/5 of the data and tested
eveI~ing glucose level based on measurementsof on the remaining 1/5 of the data. Cross valipatient’s blood glucose level and insulin doses dation was performed by randomly dividing the
during preceding 24 hour period. Weemployed data into 5 equal parts and successively traindata about previous evening, morningand tile af- ing models to fit each 4/5. The results of tile
ternoon blood glucose level and regular and NPH neural network and tile logistic regression model
insulin doses taken at these times as input in- were very similar in terms of measures provided
formation to models. Thus tile temporal infor- (Table 1). (J-index for the regression model
mation about changes in blood glucose level and equal to 0.68, and r "2 = 0.065. The C-index for
model is equal to 0.68, and
insulin doses are combined to construct a posi- the fuzzy ARTMAP
2
r
=
0.062.
This
is
a useful level of explanatory
tionally organized analogue input vector. The
outcome, abnormally high versus normal blood powerfor this task.
89
regression modeling ARTMAPmodeling
2C-index
2C-index
R
R
0.065
0.68
0.062
0.68
Table 1:
database
Results with tile
patient 1
patient 2
patient 3
regression
modeling
C-index 2R
0.88
0.30
0.76
0.07
0.03
0.65
analog multidimensional maps. IEEE Transactions on Neural Networks, 3, 698-713.
cholecystectomy
Iezzoni, L.I., and Moskowitz, M.A. (1988).
MedisGroups:a clinical assessment. JAMA,
26O, p. 3195.
Zadeh, L. (1965). Fuzzy sets. Information Control, 8, 338-353.
ARTMAP
modeling
C-index R2
0.91
0.27
0.71
0.09
0.17
0.73
Table 2: Results for different patients with the
diabetes database
Tile model performance for different patients
in diabetes data set are very different (Table 2).
This may partially be explained by physiologic
variability amongpatients. Both tile logistic and
neural network models were able to successfully
predict the elevated blood glucose level based on
previously learned variations over the preceding
month. The neural network shows superior performance for patient 3, and performance similar
to tile logistic: regression modelfor patient 1 and
patient 2.
Weconclude that modified fuzzy ARTMAP
is
a good alternative for estimating of risk in medical procedures.
References
Carpenter, G.A., Grossberg, S., and Rosen, D.B.
(1991). Fuzzy ART: Fast stable learning
and categorization of analog patterns by
an adaptive resonance system. Neural Networks, 4, 759-771.
Carpenter, G.A., Grossberg, S., Markuzon, N.,
Reynolds, J.H., and Rosen, D.B. (1992).
Fuzzy A RTMA P: A neural network architecture for incremental supervised learning of
90