Combining Expert Representations and Neural

From: AAAI Technical Report SS-94-01. Compilation copyright © 1994, AAAI. All rights reserved.
Stephen I. Gallant
Channing H. Russell
David M. Fram
Michael F. Johnston
Belmont Research Inc.
Cambridge, Massachusetts
Neural network visualization and learning algorithms provide an attractive approach to displaying clinical data and predicting patient outcomes. However, the raw data in medical databases is typically not in a form that permits easy application of such algorithms. For example, data may consist of scattered and noisy individual measurements that are determined by available tests and instrumentation. In order to use this data for visualization or for prediction by human or machine, it may be necessary to combine several measurements over appropriate time periods.

One way to overcome this problem is by having a human expert create an expert representation consisting of a fixed-length feature vector. Each component of the vector is a relevant scalar, Boolean, or enumeration (e.g. "meal size") value that is useful for the problem at hand. Vector components are determined by a knowledgeable human expert, where each component might be derived from several different types of raw clinical data and/or data from several time periods. For example, with the diabetes dataset used here, the components included the total insulin dose over the past three days, several blood glucose measurements, etc.

Vector components can be partly redundant, because subsequent clustering and learning algorithms can weight components appropriately. This eases the task of the expert, because he or she can concentrate on features sufficient for prediction tasks, with little concern over independence of features.

Once an appropriate set of features has been determined and software produced to transform raw data to feature vectors, all remaining processing is automatic. The feature vector representation is well suited for building neural network models and for prediction, alarming, and display based upon those models. Note that it may be helpful to tune the feature set based upon insight gained from the models and their predictions.

For this paper we decided to use a diabetes dataset to experiment with an expert representation for cluster-based visualization. The next section gives a quick overview of our work and some important details. An example is then presented, followed by concluding remarks.

2 Experiments
One of the authors (Johnston), an MD, constructed a set of features for characterizing the status of a patient on a particular day. We then used Kohonen’s Topology Preserving Map algorithm [Kohonen 82a,b, 88; see also Gallant 93] to cluster the data. Finally, we wrote code to display the set of clusters with respect to particular queries.

Feature Selection

The data set consisted of reports from 70 patients. From 8 to 166 days of data were provided for each patient, consisting of blood glucose measurements, insulin type and dosage, meals, exercise, and hypoglycemic symptoms. The completeness of the data varied widely among patients.

One of our primary interests is the visualization and interpretation of time-oriented clinical data. From the raw data, we computed daily parameters representing the current day, the previous day, and averages over the previous three days. The feature vectors for input to the learning algorithm each represented one patient-day and contained the following fields for each of the three time intervals:

hypoglycemic symptoms present/unknown
change in insulin dose from previous day
patient number
maximum blood glucose
minimum blood glucose
mean blood glucose
total insulin dose
regular insulin dose
NPH insulin dose
UltraLente insulin dose

By this method, we constructed 3721 example vectors of 39 elements. Each field in the feature vectors was then normalized by subtracting the mean for the data set and dividing by the standard deviation.

Kohonen’s Topology Preserving Maps

Kohonen’s algorithm is especially interesting for visualization, because it simultaneously clusters the data while it arranges the clusters into a planar grid such that neighboring clusters are similar. The resulting grid of clusters is then well suited for various types of visualization [Ferran and Ferrara 92; Hudson et al.

Kohonen’s Topology Preserving Map algorithm is an iterative neural network algorithm that assigns cluster centroids to a predefined grid of clusters. It works by selecting a training example at random, and then moving the closest cluster centroid and its neighboring centroids in the grid a step toward the training example. For this experiment we used 5000 iterations to cluster the 3000+ cases into a 6 by 6 grid (represented by 36 centroids).
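A minimal sketch of the normalization and clustering steps, in Python. The paper fixes the grid size (6 by 6), the iteration count (5000), per-field mean/standard-deviation normalization, and the update rule of moving the closest centroid and its grid neighbors a step toward a randomly chosen example. The initialization from random examples, the linearly decaying step size, and the shrinking square neighborhood are our assumptions, as are the helper names `zscore`, `sq_dist`, and `train_som`.

```python
import math
import random


def zscore(rows):
    """Normalize each field: subtract its mean, divide by its standard deviation."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(dim)]
    stds = []
    for k in range(dim):
        var = sum((r[k] - means[k]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # constant fields become all zeros
    return [[(r[k] - means[k]) / stds[k] for k in range(dim)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, rows=6, cols=6, iterations=5000, seed=0):
    """Kohonen-style map: returns {(row, col): centroid} for a rows x cols grid."""
    rng = random.Random(seed)
    # Assumption: centroids start as copies of randomly chosen examples.
    grid = {(i, j): list(rng.choice(data)) for i in range(rows) for j in range(cols)}
    start_radius = max(rows, cols) // 2
    for t in range(iterations):
        frac = t / iterations
        rate = 0.5 * (1.0 - frac)                  # assumed decaying step size
        radius = int(start_radius * (1.0 - frac))  # assumed shrinking neighborhood
        x = rng.choice(data)                       # training example chosen at random
        wi, wj = min(grid, key=lambda cell: sq_dist(grid[cell], x))  # closest centroid
        for (i, j), w in grid.items():
            # Move the winner and its grid neighbors a step toward the example.
            if max(abs(i - wi), abs(j - wj)) <= radius:
                for k in range(len(w)):
                    w[k] += rate * (x[k] - w[k])
    return grid
```

With the 3721 normalized 39-element vectors passed as `data`, `train_som(data)` would return 36 centroids keyed by grid position; because neighbors are updated together, adjacent keys should tend to hold similar centroids, which is the property the displays rely on.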
Once centroids have been determined, we can "query" the clusters using a single feature (e.g. "total insulin dose") or any subset of features. We simply form a query vector by setting features of interest to 1.0 and others to 0, take dot products with all cluster centroids, and finally display this information. More generally, we can select a subset of cases of interest and sum their individual case vectors to form a query vector, or use a cluster centroid as a query vector to identify similar clusters.
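The three kinds of query vector described here, and the dot-product step, can be sketched in a few lines of Python. The function names are hypothetical, and centroids are assumed to be stored as a dictionary from grid position to vector.

```python
def unit_query(dim, feature_indices):
    """Query vector with 1.0 on the features of interest and 0 elsewhere."""
    q = [0.0] * dim
    for k in feature_indices:
        q[k] = 1.0
    return q


def sum_query(cases):
    """Query vector formed by summing the selected case vectors."""
    return [sum(column) for column in zip(*cases)]


def query_responses(centroids, q):
    """Dot product of the query vector with every cluster centroid."""
    return {cell: sum(w_k * q_k for w_k, q_k in zip(w, q))
            for cell, w in centroids.items()}
```

Using a centroid itself as the query, e.g. `query_responses(grid, grid[(2, 3)])`, scores every cluster by its similarity to cluster (2, 3).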
For visualization purposes, we can display clusters as circles, where each diameter corresponds to the dot product of the query vector with the corresponding cluster. Each circle contains "spokes" corresponding to individual patients or groups of patients. (In the example, each spoke represents patients.) In an interactive environment, it is possible to "browse" clusters by clicking on individual cases or small groups, as represented by the spokes.
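A sketch of the geometry behind this display, under assumptions the paper does not state: negative dot products (possible with normalized data) are drawn as empty circles, diameters scale linearly so the strongest response gets `max_diameter`, and spokes are spaced evenly around the circle. `patients_per_spoke` is a hypothetical parameter, since the number of patients per spoke is not given.

```python
import math


def circle_diameters(responses, max_diameter=40.0):
    """Map each cluster's query response to a circle diameter."""
    top = max(responses.values())
    if top <= 0:
        return {cell: 0.0 for cell in responses}
    # Assumption: clip negative responses to zero, scale the rest linearly.
    return {cell: max(r, 0.0) / top * max_diameter for cell, r in responses.items()}


def spoke_angles(n_patients, patients_per_spoke=1):
    """Evenly spaced spoke angles (radians), one spoke per group of patients."""
    n_spokes = math.ceil(n_patients / patients_per_spoke)
    return [2 * math.pi * k / n_spokes for k in range(n_spokes)]
```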
Software Environment
All programming was done in Dynamic C++ using BTL (Belmont Toolkit Language). This is a simplified version of C++ with convenient graphics libraries, as well as an integrated run-time debugger, class browser, and automatic garbage collector for quick prototyping. Although Dynamic C++ is an interpreted language, it was easily able to compute clusters for the 3000+ training examples. Code development and graphics interfaces were greatly aided by the built-in tools and libraries.
It is also possible to "label" regions of the grid according to the queries that make them "light up." This produces a set of characterizations for clusters according to the regions in which they appear.
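One way to make "light up" concrete, sketched in Python: a cell is labeled with a query's name when its dot product reaches a fixed fraction of that query's strongest response. The threshold rule, the `fraction` parameter, and the name `label_regions` are our assumptions, not the paper's.

```python
def label_regions(centroids, queries, fraction=0.5):
    """Attach to each grid cell the names of the queries that light it up."""
    labels = {cell: [] for cell in centroids}
    for name, q in queries.items():
        responses = {cell: sum(a * b for a, b in zip(w, q))
                     for cell, w in centroids.items()}
        top = max(responses.values())
        if top <= 0:
            continue  # this query lights up nothing
        for cell, r in responses.items():
            # Assumed rule: at least `fraction` of the query's best response.
            if r >= fraction * top:
                labels[cell].append(name)
    return labels
```

A cell can carry several labels, so overlapping regions show where different characterizations coincide.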
Query Creation
Several strategies can be used to create probes to query the cluster array. Unit vectors on the feature axes can highlight clusters based on single parameters. One or more example vectors may be selected and combined to form more complex queries. We constructed queries by both methods.
It is also possible to use the expert-representation feature vectors for prediction. For example, we might be interested in predicting pre-dinner blood glucose levels or possibly hypoglycemic episodes. Because the data has been represented by feature vectors, we can use standard neural network algorithms (or other machine learning techniques) for such tasks.
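As an illustration only, the simplest such learner is a single sigmoid unit (logistic regression) trained by stochastic gradient descent on the feature vectors. This is a stand-in we chose, not a method reported in the paper; the labels (for example, whether a hypoglycemic episode followed a given patient-day) and the names `train_logistic` and `predict` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, epochs=200, lr=0.1):
    """Single sigmoid unit trained by stochastic gradient descent on log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            err = sigmoid(b + sum(wk * xk for wk, xk in zip(w, x))) - target
            b -= lr * err
            w = [wk - lr * err * xk for wk, xk in zip(w, x)]
    return w, b


def predict(w, b, x):
    """Predicted probability of the outcome for feature vector x."""
    return sigmoid(b + sum(wk * xk for wk, xk in zip(w, x)))
```

A multilayer network or another classifier could be substituted without changing the feature vectors, which is the point the paragraph makes.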
Queries based on selected example vectors were formed by sorting the normalized input dataset using one parameter as a sort key, then combining the first or last N vectors as a probe, where N was usually 100 (out of 3721). When these probes were compared to unit vectors with a single nonzero parameter, the example-based queries usually showed more concentrated cluster groups with larger dot products, but both produced qualitatively similar results.
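The example-based probe can be sketched as follows, assuming "combining" means summing the selected vectors, as in the summed-case queries described earlier. The name `example_probe` and its parameters are hypothetical.

```python
def example_probe(data, key_index, n=100, highest=False):
    """Sum the N lowest (or highest) vectors after sorting on one field."""
    ordered = sorted(data, key=lambda row: row[key_index])
    # First N vectors for the lowest values, last N for the highest.
    chosen = ordered[max(len(ordered) - n, 0):] if highest else ordered[:n]
    return [sum(column) for column in zip(*chosen)]
```

For the mean blood glucose field, `example_probe(data, k)` and `example_probe(data, k, highest=True)` would give the low- and high-glucose probes; the resulting vectors can be scored against the centroids exactly like unit-vector queries.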
The combination of expert representations, Kohonen-style displays, and the notion of querying clusters gives an appealing approach to visualizing and browsing clinical data.
The two figures are plots created with query vectors based on the lowest mean blood glucose measurements (top) and the highest mean blood glucose measurements (bottom). They illustrate the general properties of the cluster set, using segregation by glucose level as an example. Other related probes were consistent with this topology.
Each of these procedures plays a distinct role. The expert representation makes the data accessible to machine learning algorithms, the clustering arranges the data and reduces dimensionality, and the query focuses the output on similar cases of interest. In a clinical setting, the clustering corresponds to identifying classes or variants of a disease or syndrome and to assigning patients to a particular cohort based on their
Queries based upon a single patient may also be informative. When the clustering results in clinically relevant subdivisions within a population, it provides an automatic patient classification tool.
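Both ideas can be sketched in Python under our own assumptions: a single-patient query is the sum of that patient's case vectors (mirroring the summed-case queries described earlier), and a patient is profiled by counting which grid cell (nearest centroid) each of his or her patient-days falls into. Storing cases as `(patient_id, vector)` pairs and the function names are hypothetical.

```python
from collections import Counter


def best_matching_cell(centroids, x):
    """Grid cell whose centroid is closest (squared Euclidean) to x."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


def patient_query(cases, patient_id):
    """Query vector: sum of all case vectors belonging to one patient."""
    rows = [v for pid, v in cases if pid == patient_id]
    return [sum(column) for column in zip(*rows)]


def patient_profile(centroids, cases, patient_id):
    """How many of the patient's days fall in each cluster."""
    return Counter(best_matching_cell(centroids, v)
                   for pid, v in cases if pid == patient_id)
```

If the clusters correspond to clinically meaningful subgroups, the dominant cell in a patient's profile acts as an automatic classification.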
Although we decided to concentrate on cluster-based visualization in this paper, the same data can be used for predicting clinical outcomes by neural networks, and for subsequent alarming/reminding based upon those predictions.

References

Kohonen, T (1982a). "Clustering, taxonomy, and topological maps of patterns." Proceedings of the 6th International Conference on Pattern Recognition, October 1982: 114-128.
Kohonen, T (1982b). "Self-organized formation of topologically correct feature maps." Biol Cybern 43: 59-69.
Ferran, EA and Ferrara, P (1992). "Clustering proteins into families using artificial neural networks." CABIOS
Kohonen, T (1988). Self-Organization and Associative Memory, 2nd Ed. Berlin,
Gallant, S (1993). Neural Network Learning and Expert Systems. MIT Press.
Spilker, B, Crusan, C, Pool, J, Russell, C and Fram, D (1992). "New software technology for visualizing clinical trial data." Drug News & Perspectives 5(5), June 1992: 298-305.
Hudson, B, Livingstone, DJ, et al. (1989). "Pattern recognition display methods for the analysis of computed molecular properties." Journal of Computer-Aided Molecular Design 3: 55-65.