From: AAAI Technical Report SS-94-01. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

Combining Expert Representations and Neural Networks for Visualization of Clinical Data

Stephen I. Gallant
Channing H. Russell
David M. Fram
Michael F. Johnston

Belmont Research Inc.
Cambridge, Massachusetts

1 Introduction

Neural network visualization and learning algorithms provide an attractive approach to displaying clinical data and predicting patient outcomes. However, the raw data in medical databases is typically not in a form that permits easy application of such algorithms. For example, data may consist of scattered and noisy individual measurements that are determined by available tests and instrumentation. In order to use this data for visualization or for prediction by human or machine, it may be necessary to combine several measurements over appropriate time periods.

One way to overcome this problem is by having a human expert create an expert representation consisting of a fixed-length feature vector. Each component of the vector is a relevant scalar, Boolean, or enumeration (e.g. "meal size") value that is useful for the problem at hand.

Vector components are determined by a knowledgeable human expert, where each component might be derived from several different types of raw clinical data and/or data from several time periods. For example, with the diabetes dataset used here, the components included the total insulin dose over the past three days, several blood glucose measurements, etc. Vector components can be partly redundant, because subsequent clustering and learning algorithms can weight components appropriately. This eases the task of the expert, because he or she can concentrate on features sufficient for prediction tasks, with little concern over independence of features.

Once an appropriate set of features has been determined and software produced to transform raw data to feature vectors, all remaining processing is automatic. The feature vector representation is well suited for building neural network models and for prediction, alarming and display based upon those models. Note that it may be helpful to tune the feature set based upon insight gained from models and their predictions.

For this paper we decided to use a diabetes dataset to experiment with an expert representation for cluster-based visualization. The next Section gives a quick overview of our work and some important details. An example is then presented, followed by concluding remarks.

2 Experiments

One of the authors (Johnston), an MD, constructed a set of features for characterizing the status of a patient on a particular day. We then used Kohonen’s Topology Preserving Map algorithm [Kohonen 82a,b,88; see also Gallant 93] to cluster the data. Finally, we wrote code to display the set of clusters with respect to particular queries.

Feature Selection

The data set consisted of reports from 70 patients. From 8 days to 166 days of data were provided for each patient, consisting of blood glucose measurements, insulin type and dosage, meals, exercise, and hypoglycemic symptoms. The completeness of the data varied widely among patients.

One of our primary interests is the visualization and interpretation of time-oriented clinical data. From the raw data, we computed daily parameters representing the current day, the previous day, and averages over the previous three days.

The feature vectors for input to the learning algorithm each represented one patient-day and contained the following fields for each of the three time intervals:

    Field                                      Values / units
    patient number                             (ignored)
    day                                        weekday/weekend
    maximum blood glucose                      mg/dl
    minimum blood glucose                      mg/dl
    mean blood glucose                         mg/dl
    total insulin dose                         units
    regular insulin dose                       units
    NPH insulin dose                           units
    UltraLente insulin dose                    units
    breakfast                                  large/small/typical/unknown
    lunch                                      large/small/typical/unknown
    dinner                                     large/small/typical/unknown
    exercise                                   more/less/typical/unknown
    hypoglycemic symptoms                      present/unknown
    change in insulin dose from previous day   units

By this method, we constructed 3721 example vectors of 39 elements. Each field in the feature vectors was then normalized by subtracting the mean for the data set and dividing by the standard deviation.

Kohonen’s Topology Preserving Maps

Kohonen’s algorithm is especially interesting for visualization, because it simultaneously clusters the data while it arranges the clusters into a planar grid such that neighboring clusters are similar. The resulting grid of clusters is then well suited for various types of visualization [Ferran and Ferrara 92; Hudson et al. 89].

Kohonen’s Topology Preserving Maps is an iterative neural network algorithm that assigns cluster centroids to a predefined grid of clusters. It works by selecting a training example at random, and then moving the closest cluster centroid and its neighboring centroids in the grid a step toward the training example. For this experiment we used 5000 iterations to cluster the 3000+ cases into a 6 by 6 grid (represented by 36 centroids).

Query Visualization

Once centroids have been determined, we can "query" the clusters using a single feature (e.g. "total insulin dose") or any subset of features. We simply form a query vector by setting features of interest to 1.0 and others to 0, take dot products with all cluster centroids, and finally display this information. More generally, we can select a subset of cases of interest and sum their individual case vectors to form a query vector, or use a cluster centroid as a query vector to identify similar clusters.

For visualization purposes, we can display clusters as circles where each diameter corresponds to the dot product of the query vector with that corresponding cluster. Each circle contains "spokes" corresponding to individual patients or groups of patients. (In the example, each spoke represents patients.)
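The normalization, clustering, and query steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Dynamic C++ code used for the experiments. The function names, the random initialization, the linearly decaying learning rate, the shrinking neighborhood radius, and the use of the population standard deviation are all assumptions, since the text specifies only the grid size, the iteration count, and the basic update rule.

```python
import math
import random


def zscore_columns(rows):
    """Subtract each column's mean and divide by its standard deviation."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n  # population variance (assumed)
        stds.append(math.sqrt(var) or 1.0)  # leave constant columns at zero
    return [[(r[j] - means[j]) / stds[j] for j in range(dims)] for r in rows]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_map(rows, grid=6, iterations=5000, seed=0):
    """Return a dict mapping (row, col) grid cells to centroid vectors."""
    rng = random.Random(seed)
    dims = len(rows[0])
    cells = [(r, c) for r in range(grid) for c in range(grid)]
    # Small random starting centroids (assumed; the paper does not say).
    centroids = {cell: [rng.uniform(-0.1, 0.1) for _ in range(dims)] for cell in cells}
    for t in range(iterations):
        remaining = 1.0 - t / iterations
        rate = 0.5 * remaining                       # assumed step-size schedule
        radius = max(1.0, (grid / 2.0) * remaining)  # assumed neighborhood schedule
        example = rng.choice(rows)                   # pick a training example at random
        winner = min(cells, key=lambda cell: squared_distance(centroids[cell], example))
        # Move the closest centroid and its grid neighbors a step toward the example.
        for cell in cells:
            if math.hypot(cell[0] - winner[0], cell[1] - winner[1]) <= radius:
                vec = centroids[cell]
                for j in range(dims):
                    vec[j] += rate * (example[j] - vec[j])
    return centroids


def query_clusters(centroids, query):
    """Dot product of the query vector with every cluster centroid."""
    return {cell: sum(q * w for q, w in zip(query, vec)) for cell, vec in centroids.items()}
```

With feature vectors held as lists of floats, `train_map(zscore_columns(vectors))` would yield the 36 centroids of a 6 by 6 grid, and `query_clusters` returns one score per grid cell, which could then be mapped to circle diameters for display.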
In an interactive environment, it is possible to "browse" clusters by clicking on individual cases or small groups, as represented by the spokes. It is also possible to "label" regions of the grid according to the queries that make them "light up." This produces a set of characterizations for clusters according to those regions in which they appear.

[Figure (top). Probe: /users/mfj/UL/kohonen/probes/lomeangl.top.probe]
[Figure (bottom). Probe: /users/mfj/UL/kohonen/probes/himeangl.top.probe]

Software Environment

All programming was done in Dynamic C++ using BTL (Belmont Toolkit Language). This is a simplified version of C++ with convenient graphics libraries, as well as an integrated run-time debugger, class browser, and automatic garbage collector for quick prototyping. Although Dynamic C++ is an interpreted language, it was easily able to compute clusters for the 3000+ training examples. Code development and graphics interfaces were greatly aided by the built-in tools and libraries.

Prediction

It is also possible to use the expert representation feature vectors for prediction. For example, we might be interested in predicting pre-dinner blood glucose levels or possibly hypoglycemic episodes. Because the data has been represented by feature vectors, we can use standard neural network algorithms (or other machine learning techniques) for such tasks.

Query Creation

Several strategies can be used to create probes to query the cluster array. Unit vectors on the feature axes can highlight clusters based on single parameters. One or more example vectors may be selected and combined to form more complex queries. We constructed queries by both methods.

Queries based on selected example vectors were formed by sorting the normalized input dataset using one parameter as a sort key, then combining the first or last N vectors as a probe, where N was usually 100 (out of 3721).
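The two probe constructions described above can be illustrated with a short Python sketch. The function names and the `highest` flag are hypothetical, and combining the selected vectors by summation is an assumption that follows the earlier description of summing case vectors to form a query vector.

```python
def unit_probe(dims, feature_index):
    """Unit vector on one feature axis: 1.0 for the feature of interest, 0 elsewhere."""
    probe = [0.0] * dims
    probe[feature_index] = 1.0
    return probe


def example_probe(rows, feature_index, n=100, highest=True):
    """Sort the normalized vectors on one feature and sum the first or last n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ordered = sorted(rows, key=lambda row: row[feature_index])
    chosen = ordered[-n:] if highest else ordered[:n]
    # Combine the selected example vectors component by component (summation assumed).
    return [sum(column) for column in zip(*chosen)]
```

For instance, a "highest mean blood glucose" probe would correspond to `example_probe(vectors, k, n=100, highest=True)`, where `k` is the (hypothetical) index of the mean blood glucose field in each normalized vector.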
When these probes were compared to unit vectors with a single nonzero parameter, the example-based queries usually showed more concentrated cluster groups with larger dot products, but both produced qualitatively similar results.

The two figures are plots created with query vectors based on lowest mean blood glucose measurements (top) and highest mean blood glucose measurements (bottom). This illustrates the general properties of the cluster set, using segregation on glucose level as an example. Other related probes were consistent with this topology.

3 Discussion

The combination of expert representations, Kohonen-style displays, and the notion of querying clusters gives an appealing approach to visualization and browsing clinical data.

The roles of these procedures are important. The expert representation makes the data accessible to machine learning algorithms, the clustering arranges data and reduces dimensionality, and the query focuses the output on similar cases of interest.

In a clinical setting the clustering corresponds to identifying classes or variants of a disease or syndrome and to assigning patients to a particular cohort based on their symptoms. Queries based upon a single patient may also be informative. When the clustering results in clinically relevant subdivisions within a population, this gives an automatic patient classification tool.

Although we decided to concentrate on cluster-based visualization in this paper, the same data can be used for predicting clinical outcomes by neural networks, and subsequent alarming/reminding based upon those predictions.

References

Ferran, EA and Ferrara, P (1992). "Clustering proteins into families using artificial neural networks." CABIOS 8(1): 39-44.

Gallant, S (1993). Neural Network Learning and Expert Systems. MIT Press.

Hudson, B, Livingstone, DJ, et al. (1989). "Pattern recognition display methods for the analysis of computed molecular properties." Journal of Computer-Aided Molecular Design 3: 55-65.

Kohonen, T (1982a). "Clustering, taxonomy, and topological maps of patterns." Proceedings of the 6th International Conference on Pattern Recognition, October 1982: 114-128.

Kohonen, T (1982b). "Self-organized formation of topologically correct feature maps." Biol Cybern 43: 59-69.

Kohonen, T (1988). Self-Organization and Associative Memory, 2nd Ed. Berlin, Springer-Verlag.

Spilker, B, Crusan, C, Pool, J, Russell, C and Fram, D (1992). "New Software Technology for Visualizing Clinical Trial Data." Drug News & Perspectives 5(5), June 1992: 298-305.