Extracting Temporal Knowledge.

From: AAAI Technical Report SS-94-01. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.
Extracting TemporalKnowledge.
Christopher Cimino, M.D.1and Sheila A. Turken, M.D.2
1Office of ComputerBased Education and Departmentof Neurology
Albert Einstein College of Medicine
2FamilyMedicalCare, Inc.
Wehave attempted to examinethe value of
temporal knowledge for two physicians making
predictions about blood glucoselevels in diabetic
patients. Ourhypothesis is that physicians
evaluatingsequential data about a patient derive an
abstractdescriptionof the patient that is not
temporally related (atemporal knowledge). For
example,a patient mightbe identified as a "brittle
diabetic". AtemporalknowledgeshouMallow a
physician to makeisolated predictions about a
patient that are better than randomchance.A
combination of temporal and atemporal knowledge
shouMallow a physician to makebetter predictions
than atemporal knowledgealone. We used the
AIM-94diabetes datasets supplied by Dr. Michael
Kahn.These datasets consist predominatelyof
blood glucose measurementsand insulin
dosages.Wefailed to showany different in
predictive ability betweentemporaland atemporal
data, betweenthe two physicians, and betweenthe
individual physiciansand three different controls.
Wedid showthat by groupingall 20 physician
cases (temporal andatemporal)their predictions
had an averageerror of 69 mg/dl of blood glucose.
These predictions werebetter than the averageof
one of the controls by 30 mg/dl (standarddeviation
of 6.5). Whilethis is statistically significant, wedo
not feel it is clinicallysignificant.It is our
conclusion that physicians maynot makeaccurate
predictions based solely on insulin dosagesand
glucose measurements.
Introduction
Close control of blood glucose levels is important
in avoidingcomplicationsof diabetes mellitus.
Diabetics attempt to achievethis control by making
multiple measurements
of their blood glucose level,
and use this informationto adjust their insulin
dosage. Insulin pumpshave been developed in an
attempt to automatethis process. Algorithmshave
been developedto analyze glucose data and predict
insulin needs (Fisher) in an attempt to further
automatethe process.
It is not clear to whatextent detailed temporaldata
is useful as comparedto abstract knowledgeabout
a patient. For example,a patient mightbe identified
as a "brittle diabetic". This knowledgealong with a
single day’s data can used to makepredictions that
are better than randomchance. It seemslikely that
addingtemporalinformation, in the form of several
days worth of data, will improvethe prediction
even more. Wewouldlike to try to quantify the
relative worth of temporal and atemporal
knowledge.
Dataset
The70 case AIM-94diabetes dataset supplied by
Dr. MichaelKahnwasused. In the diabetes data,
there are events at discrete times that can be viewed
as treatment events (insulin injections, meals, and
activity levels) and diagnostic events (symptoms
and glucose measurements).The vast majority of
the events in the cases are insulin dosagesand
glucose measurements.Events describing exercise,
hypoglycemicsymptoms,and meal intakes are
rare. Ourprimaryassumptionis that clinicians use
their diagnostic assessmentof the patient to make
treatmentdecisions and that their diagnostic
assessmentis based on readily observable data.
In a normaltreatment situation, the clinician would
alter treatment based on his or her impressionof
the patient’s condition.Thedatasets do not permit
any alterations in treatment so wewill ask the
expert clinician to give a diagnostic assessmentin
the formof a prediction about the patient’s glucose
measurement.Wetherefore need to makea
secondaryassumptionthat diagnostic knowledgeis
closely related to therapeutic knowledgein the
diabetes domain; in other words, knowledgeused
to chooseappropriate insulin dosageis closely
related to knowledgeused to predict glucose levels.
Five cases were selected at randomfrom the dataset
of 70 cases. Codeddata was converted into
English phrases such as "Pre-supper Glucose
Measurement"or "NPHInsulin Dosage".
Subjects
The subjects were two physicians. Onephysician
is board certified in Neurologyand the other is
boardcel~tified in Endocrinology.
Methods
Eachphysician waspresented with data from the
beginningof the case up until the first glucose
measurement.The physician was told the time of
this glucose measurementand asked to predict the
value. This wasrepeated for the first twenty
glucose measurements.The physician was allowed
to reviewall previouslypresenteddata. Thefirst
ten measurementswere considered a learning
period. The second ten measurementswere
considered predictions madewith temporal
knowledge.Thedata typically coveredthe fh-st 10
days of the case. Thephysician waspermitted to
makenotes about the patient during this period.
Next the physician was presented with samples of
data taken from randomplaces within the case.
Each randomsample included exactly one glucose
measurementand would end with the next glucose
measurement.The physician would be asked to
predict the second measurement.Noprevious data
was available and no feedback was given
concerningthe accuracy of the predictions. This
wasrepeated until 10 predictions werecollected.
These predictions were considered to have been
madewithout temporal knowledge.This procedure
wasrepeated with each of the five cases for both
physicians.
Threesets of control predictions were developed.
Onecontrol used no knowledgeand predicted a
value of 100 md/dl for all the measurements.The
second control predicted a value of 100 mg/dlfor
the first measurement
and predicted all subsequent
measurementswouldbe the sameas the previous
measuredvalue. Thethird control predicted the
value would be a randomnumberbetween 80 and
300.
Eachset of ten predictions were scored by
averaging the absolute difference betweenthe
actual glucose measurementand the predicted
glucose measurement.A lower score meansa
better averageprediction.
Results
Wecollected a total of 60 samples: 30 physician
samplesand 30 control samples. The average score
was89.3. The best score by a physician was 34.4.
The best control score was 59.4. Sevenphysician
scores werebetter than all control scores. The
worst score by a physician was 117.8. The worst
control score was146,6. Four control scores were
worsethan all physician scores.
Learning, temporal, and atemporal scores were
comparedfor physicians individually and
combinedon a case-matched basis. There was a
trend toward improvedscores betweenthe learning
phase and the temporal phase. This trend wasnot
statistically significant. Assuming
the trend
continued, we wouldneed to collect another 20
cases for each physician to achievesignificance.
There was a smaller trend for improved
performance on the atemporal scores as compared
to the temporalscores. Assumingthis trend
continued, wewouldneedto collect at least 50
morecases for each physician to achieve
significance. Therewasa small trend for the
endocrinologist’s temporalscores to be better than
the neurologist’s and for the neurologist’s
atemporalscores to be better than the
endocrinologist’s. Weestimate at least 50 cases for
these differences to reach significance.
Of the controls, the averagescore for the "no
knowledge"control was the best (98.6). The
average score for the randomcontrol was second
best (105.3). Theaverage score for the "previous
value" control was the worst (107.2). None
these differences weresignificant. Thelargest
trends were betweenthe physicians and the
"previous value" control. Whencombined, the
physicians temporal scores averaged 28.8 mg/dl
better (s.d. 27.9). Thephysicians atemporalscores
averaged 32.4 mg/dl better (s.d. 25.8).
estimate 5 to 10 additional cases wouldbe required
for these differences to reach significance. When
grouping the physician temporal and atemporal
scores they weresignificantly better than the
"previous value" scores by an average of 30.6
mg/dl(s.d. 6.5).
Discussion
Thereare three possible explanationsfor the lack of
difference betweenthe samplesin this study. The
most obvious explanation is that morecases should
havebeenusedto reach statistical significance.
Whileit is probablytrue that wecould showa
statistically difference betweenphysiciansand the
controls, it is not likely that this differencewould
be clinically significant. At best, its seemsthe
physicians wouldhave madeerrors on the order of
50-80 mg/dl of blood glucose while the controls
wouldmakeelTors of 80-110mg/dl. If a physician
can not easily and clearly outperforma simple
algorithmon five cases, then there is something
wrongthat goes beyondjust the numberof
samplescollected.
A second explanation is that while physicians may
be very goodat choosing appropriate therapy, they
are very poor at predicting blood glucoselevels.
There maybe no correlation betweenthese two
abilities. Their poor performancegoes
undocumented
because they are never called on to
predict glucose levels. This explanation seems
unlikely to us.
Thethird explanationis that the available data is
insufficient for physiciansto accurately predict
blood glucose levels. Thevast majority of the data
consisted of insulin dosages and glucose
measurements.There were only rare pieces of
information concerning exercise and meals.
Diabetesmellitus, in its mostreductionist
definition, is a disorder characterizedby a
deficiency of endogenousinsulin action. As such,
it can be treated by insulin administration. Thetwo
parametersof insulin dose and resultant glucose
level are obviously two key descriptive
characteristics of a patient at any giventime. Onthe
other hand, diabetes managementcan be complex
and ultimately unpredictable. Thereare a multitude
of other informationalfactors that are essential in
determininga blood sugar level in an individual.
Somefactors in a patient are knowable,such as the
age, weight, height, additional medicalhistory,
concurrent medications,activity level, and diet-other factors are not so discrete and quantifiable,
such as the complexinteraction of insulin action
with that of the counterregulatory hormonecascade
on any given occasion.
8
It is interesting to note that the atemporal
predictions wereslightly better than the temporal
predictions. This seemsto suggest that temporal
knowledgeplays little role in these predictions. We
cannot makethis conclusionin the face of the poor
predictions overall. It is our conclusionthat
physicians can not makeaccurate predictions based
solely on insulin dosagesand glucose
measurements.
References
Fisher, ME.A semi-closed loop algorithm for
control of blood glucoselevels in diabetics. IEEE
Transactions on biomedical engineering. 38(1):5761, 1991.
Correspondence
Please address correspondenceto:
Christopher Cimino, M.D.
Albert Einstein College of Medicine
1300 Morris Park Avenue
Bronx, NY10461
e-mail: cimino@aecom.yu.edu