From: AAAI Technical Report SS-94-01. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

Extracting Temporal Knowledge

Christopher Cimino, M.D.1 and Sheila A. Turken, M.D.2
1 Office of Computer Based Education and Department of Neurology, Albert Einstein College of Medicine
2 Family Medical Care, Inc.

We have attempted to examine the value of temporal knowledge for two physicians making predictions about blood glucose levels in diabetic patients. Our hypothesis is that physicians evaluating sequential data about a patient derive an abstract description of the patient that is not temporally related (atemporal knowledge). For example, a patient might be identified as a "brittle diabetic". Atemporal knowledge should allow a physician to make isolated predictions about a patient that are better than random chance. A combination of temporal and atemporal knowledge should allow a physician to make better predictions than atemporal knowledge alone. We used the AIM-94 diabetes datasets supplied by Dr. Michael Kahn. These datasets consist predominately of blood glucose measurements and insulin dosages. We failed to show any difference in predictive ability between temporal and atemporal data, between the two physicians, or between the individual physicians and three different controls. We did show that by grouping all 20 physician cases (temporal and atemporal) their predictions had an average error of 69 mg/dl of blood glucose. These predictions were better than the average of one of the controls by 30 mg/dl (standard deviation of 6.5). While this is statistically significant, we do not feel it is clinically significant. It is our conclusion that physicians may not make accurate predictions based solely on insulin dosages and glucose measurements.

Introduction

Close control of blood glucose levels is important in avoiding complications of diabetes mellitus.
Diabetics attempt to achieve this control by making multiple measurements of their blood glucose level, and use this information to adjust their insulin dosage. Insulin pumps have been developed in an attempt to automate this process. Algorithms have been developed to analyze glucose data and predict insulin needs (Fisher) in an attempt to further automate the process.

It is not clear to what extent detailed temporal data is useful as compared to abstract knowledge about a patient. For example, a patient might be identified as a "brittle diabetic". This knowledge along with a single day’s data can be used to make predictions that are better than random chance. It seems likely that adding temporal information, in the form of several days’ worth of data, will improve the prediction even more. We would like to try to quantify the relative worth of temporal and atemporal knowledge.

Dataset

The 70-case AIM-94 diabetes dataset supplied by Dr. Michael Kahn was used. In the diabetes data, there are events at discrete times that can be viewed as treatment events (insulin injections, meals, and activity levels) and diagnostic events (symptoms and glucose measurements). The vast majority of the events in the cases are insulin dosages and glucose measurements. Events describing exercise, hypoglycemic symptoms, and meal intakes are rare.

Our primary assumption is that clinicians use their diagnostic assessment of the patient to make treatment decisions and that their diagnostic assessment is based on readily observable data.
In a normal treatment situation, the clinician would alter treatment based on his or her impression of the patient’s condition. The datasets do not permit any alterations in treatment, so we will ask the expert clinician to give a diagnostic assessment in the form of a prediction about the patient’s glucose measurement. We therefore need to make a secondary assumption that diagnostic knowledge is closely related to therapeutic knowledge in the diabetes domain; in other words, knowledge used to choose appropriate insulin dosage is closely related to knowledge used to predict glucose levels.

Five cases were selected at random from the dataset of 70 cases. Coded data was converted into English phrases such as "Pre-supper Glucose Measurement" or "NPH Insulin Dosage".

Subjects

The subjects were two physicians. One physician is board certified in Neurology and the other is board certified in Endocrinology.

Methods

Each physician was presented with data from the beginning of the case up until the first glucose measurement. The physician was told the time of this glucose measurement and asked to predict the value. This was repeated for the first twenty glucose measurements. The physician was allowed to review all previously presented data. The first ten measurements were considered a learning period. The second ten measurements were considered predictions made with temporal knowledge. The data typically covered the first 10 days of the case. The physician was permitted to make notes about the patient during this period.

Next the physician was presented with samples of data taken from random places within the case. Each random sample included exactly one glucose measurement and would end with the next glucose measurement. The physician would be asked to predict the second measurement. No previous data was available and no feedback was given concerning the accuracy of the predictions. This was repeated until 10 predictions were collected.
These predictions were considered to have been made without temporal knowledge. This procedure was repeated with each of the five cases for both physicians.

Three sets of control predictions were developed. One control used no knowledge and predicted a value of 100 mg/dl for all the measurements. The second control predicted a value of 100 mg/dl for the first measurement and predicted that each subsequent measurement would be the same as the previous measured value. The third control predicted the value would be a random number between 80 and 300.

Each set of ten predictions was scored by averaging the absolute difference between the actual glucose measurement and the predicted glucose measurement. A lower score means a better average prediction.

Results

We collected a total of 60 samples: 30 physician samples and 30 control samples. The average score was 89.3. The best score by a physician was 34.4. The best control score was 59.4. Seven physician scores were better than all control scores. The worst score by a physician was 117.8. The worst control score was 146.6. Four control scores were worse than all physician scores.

Learning, temporal, and atemporal scores were compared for physicians individually and combined on a case-matched basis. There was a trend toward improved scores between the learning phase and the temporal phase. This trend was not statistically significant. Assuming the trend continued, we would need to collect another 20 cases for each physician to achieve significance. There was a smaller trend for improved performance on the atemporal scores as compared to the temporal scores. Assuming this trend continued, we would need to collect at least 50 more cases for each physician to achieve significance.

There was a small trend for the endocrinologist’s temporal scores to be better than the neurologist’s and for the neurologist’s atemporal scores to be better than the endocrinologist’s. We estimate that at least 50 cases would be required for these differences to reach significance.
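
The three controls and the scoring rule described in the Methods can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' procedure as implemented: the function names are hypothetical, the glucose values in the usage note are invented, and the paper does not say how the random control drew its numbers, so a uniform draw between 80 and 300 is assumed here.

```python
import random
from statistics import mean


def no_knowledge_control(actual, baseline=100.0):
    """Predict the same fixed value (100 mg/dl) for every measurement."""
    return [baseline] * len(actual)


def previous_value_control(actual, first_guess=100.0):
    """Predict 100 mg/dl first, then repeat the previous measured value."""
    return [first_guess] + list(actual[:-1])


def random_control(actual, low=80.0, high=300.0, rng=None):
    """Predict a random value in [low, high]; uniform draw is an assumption."""
    if rng is None:
        rng = random.Random()
    return [rng.uniform(low, high) for _ in actual]


def score(actual, predicted):
    """Average absolute difference between actual and predicted glucose."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))
```

For a hypothetical series of measurements such as `[150, 220, 90, 180]`, `score(series, no_knowledge_control(series))` gives 65 and `score(series, previous_value_control(series))` gives 85; as in the paper, a lower score means a better average prediction.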
Of the controls, the average score for the "no knowledge" control was the best (98.6). The average score for the random control was second best (105.3). The average score for the "previous value" control was the worst (107.2). None of these differences were significant.

The largest trends were between the physicians and the "previous value" control. When combined, the physicians’ temporal scores averaged 28.8 mg/dl better (s.d. 27.9). The physicians’ atemporal scores averaged 32.4 mg/dl better (s.d. 25.8). We estimate 5 to 10 additional cases would be required for these differences to reach significance. When the physicians’ temporal and atemporal scores were grouped, they were significantly better than the "previous value" scores by an average of 30.6 mg/dl (s.d. 6.5).

Discussion

There are three possible explanations for the lack of difference between the samples in this study. The most obvious explanation is that more cases should have been used to reach statistical significance. While it is probably true that we could show a statistically significant difference between physicians and the controls, it is not likely that this difference would be clinically significant. At best, it seems the physicians would have made errors on the order of 50-80 mg/dl of blood glucose while the controls would make errors of 80-110 mg/dl. If a physician cannot easily and clearly outperform a simple algorithm on five cases, then there is something wrong that goes beyond just the number of samples collected.

A second explanation is that while physicians may be very good at choosing appropriate therapy, they are very poor at predicting blood glucose levels. There may be no correlation between these two abilities. Their poor performance goes undocumented because they are never called on to predict glucose levels. This explanation seems unlikely to us.

The third explanation is that the available data is insufficient for physicians to accurately predict blood glucose levels.
The vast majority of the data consisted of insulin dosages and glucose measurements. There were only rare pieces of information concerning exercise and meals. Diabetes mellitus, in its most reductionist definition, is a disorder characterized by a deficiency of endogenous insulin action. As such, it can be treated by insulin administration. The two parameters of insulin dose and resultant glucose level are obviously two key descriptive characteristics of a patient at any given time. On the other hand, diabetes management can be complex and ultimately unpredictable. There are a multitude of other informational factors that are essential in determining a blood sugar level in an individual. Some factors in a patient are knowable, such as the age, weight, height, additional medical history, concurrent medications, activity level, and diet; other factors are not so discrete and quantifiable, such as the complex interaction of insulin action with that of the counterregulatory hormone cascade on any given occasion.

It is interesting to note that the atemporal predictions were slightly better than the temporal predictions. This seems to suggest that temporal knowledge plays little role in these predictions. We cannot make this conclusion in the face of the poor predictions overall. It is our conclusion that physicians cannot make accurate predictions based solely on insulin dosages and glucose measurements.

References

Fisher, ME. A semi-closed loop algorithm for control of blood glucose levels in diabetics. IEEE Transactions on Biomedical Engineering. 38(1):57-61, 1991.

Correspondence

Please address correspondence to:
Christopher Cimino, M.D.
Albert Einstein College of Medicine
1300 Morris Park Avenue
Bronx, NY 10461
e-mail: cimino@aecom.yu.edu