Slides - University of Sheffield NLP Group

BabyTalk: Generating English
Summaries of Clinical Data
Ehud Reiter
Univ of Aberdeen, CS Dept
Dr. Ehud Reiter, Computing Science, University of Aberdeen
Structure
Background: data-to-text
BabyTalk project
Results of first evaluation
Current work

What is data-to-text?

Goal: generate English summaries of
non-linguistic data
» Numerical weather predictions
» Medical records
» Statistics
» Etc
Simple Example:
Weather Forecasts

Input: numerical weather predictions
» From supercomputer running a numerical weather
simulation


Output: textual weather forecast
We’ve developed several systems
» Two used commercially (oil rig, road gritting)
– Users prefer some generated texts to human texts!
» Demo of pollen system on our webpage

So have others (FoG, MultiMeteo, …)
Pollen forecasts

Grass pollen levels for Tuesday have decreased from the high
levels of yesterday with values of around 4 to 5 across most
parts of the country. However, in South Eastern areas, pollen
levels will be high with values of 6.
Other data-text apps
Medical: to be discussed
Assistive technology: help blind people
access statistical data
Financial: summarise stock-market data
Education: summarise assessment
results, help write stories
Engineering: summarise gas-turbine data
Etc

Why is data-to-text useful?

The world is drowning in data
» NLP researchers talk about problems of
too much text, but data problems are
worse
– Texts are at least read by someone (the writer)
– Most data is automatically collected and never
looked at by a human
Data overload

Sensor recording 2 bytes/second
» 170KB/day
» 63MB/year
» Millions of sensors in hospitals, jet engines, …

Simulations
» Weather: 30MB for one day in one UK county,
from one model
» Climate models: petabytes of data

Too much data: need better tools for using it!
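A quick check of the sensor arithmetic above, as a minimal Python sketch. It assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes) and a 365-day year; under these assumptions the results round to the figures on the slide (172.8 KB/day is roughly 170KB).

```python
# Data volume produced by one sensor recording 2 bytes per second.
# Assumption: decimal units (1 KB = 1,000 B, 1 MB = 1,000,000 B), 365-day year.
BYTES_PER_SECOND = 2
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

bytes_per_day = BYTES_PER_SECOND * SECONDS_PER_DAY  # 172,800 bytes
bytes_per_year = bytes_per_day * 365                # 63,072,000 bytes

print(f"{bytes_per_day / 1_000:.1f} KB/day")        # 172.8 KB/day
print(f"{bytes_per_year / 1_000_000:.1f} MB/year")  # 63.1 MB/year
```

Multiplying by the "millions of sensors" on the slide turns this into tens of terabytes per year, which is the scale the slide is pointing at.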
Decision Support

Data often used for decision support
» Medical: help doctors make decisions
» Weather: help staff on offshore oil rigs plan their
operations
» Engineering: help plan maintenance
» Etc

Often under time pressure
» Make a decision in 3 min, here is 30MB of data to
help you
Using data for decision support

Alarming
» Trigger alarm if value exceeds threshold
– Or other such simple rule
» Works, but doesn’t get full value from the data
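The simple alarm rule can be sketched in a few lines of Python. The limits (100 and 200 beats per minute) and the heart-rate trace are illustrative assumptions, not clinical thresholds. The rule flags individual samples only: it cannot say that two flagged samples form one event, or why the event happened, which is one way in which alarms fail to get full value from the data.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (time in seconds, value)

def threshold_alarms(samples: List[Sample], low: float, high: float) -> List[Sample]:
    """Return every sample whose value lies outside the [low, high] band."""
    return [(t, v) for t, v in samples if v < low or v > high]

# Illustrative heart-rate trace, one sample per second (made-up values).
heart_rate = [(0, 150.0), (1, 145.0), (2, 92.0), (3, 88.0), (4, 140.0)]
alarms = threshold_alarms(heart_rate, low=100.0, high=200.0)
print(alarms)  # [(2, 92.0), (3, 88.0)]
```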

Visualisation
» Show data to experts visually
– People like this, but unclear how much it helps,
especially with massive amounts of data
Using data for decision support

Knowledge-based systems
» Feed data into an expert system which
makes recommendations based on it
» Can work in some contexts, but problems
– Domain experts dislike being told what to do
– Often key data not available to KBS
– Can be brittle, fragile
Data-text for decision support
Idea: use KBS and NLP technology to generate a
short text summary of a data set
Intermediate between KBS and
visualisation

» Use domain reasoning to highlight key info,
infer causal links, add background knowledge
» But stick to describing data, don’t tell
experts what to do!
Data-text for decision support
vs alarms: deeper info
vs visualisation

» Just key facts, not everything
» Supplemented with causal links, etc

vs KBS
» More acceptable to users
» More robust, since still useful if some
key data or knowledge is missing
Data-text for decision support
The above is still somewhat speculative
But people in many domains are
interested in exploring the concept to
see if it works

» Especially since the current situation is so bad!

Of course, there are other uses of data-to-text
» Assistive technology, education
Language and World
How does language relate to the world?
Data-to-text is a great way of exploring
this

» The real reason I got into this…
BabyTalk

Goal: summarise clinical data about
premature babies in a neonatal ICU
Input: sensor data; records of
actions/observations by medical staff
Output: multi-paragraph texts summarising the data
» BT45: 45 mins data, for doctors (completed)
» BT-Nurse: 12 hrs data, for nurses
» BT-Family: 24 hrs data, for parents
» BT-Clan: 24 hrs data, for other friends, family
» BT-Doc: several hrs data, for doctors
Neonatal ICU
Baby Monitoring
» SpO2 (SO, HS)
» ECG (HR)
» Peripheral Temperature (TP)
» Arterial Line (Blood Pressure)
» Transcutaneous Probe (CO, OX)
» Core Temperature (TC)
Input: Sensor Data
Input: Action Records
FullDescriptor                              Time
SETTING;VENTILATOR;FiO2 (36%)               10.30
MEDICATION;Morphine                         10.44
ACTION;CARE;TURN/CHANGE POSITION;SUPINE     10.46-10.47
ACTION;RESPIRATION;HANDBAG BABY             10.47-10.51
SETTING;VENTILATOR;FiO2 (60%)               10.47
ACTION;RESPIRATION;INTUBATE                 10.51-10.52
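Records in this layout are easy to turn into structured events. The Python sketch below assumes exactly the format shown in the table: a `;`-separated descriptor whose first field is the record category, and a time written either as `HH.MM` or as an `HH.MM-HH.MM` span. The `ActionRecord` class and the helper functions are hypothetical, not BabyTalk code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ActionRecord:
    category: str        # first descriptor field, e.g. "SETTING", "ACTION"
    details: List[str]   # remaining ';'-separated fields
    start: int           # minutes since midnight
    end: Optional[int]   # None for instantaneous records

def to_minutes(hhmm: str) -> int:
    """Convert 'HH.MM' to minutes since midnight."""
    hours, minutes = hhmm.strip().split(".")
    return int(hours) * 60 + int(minutes)

def parse_span(span: str) -> Tuple[int, Optional[int]]:
    """Parse either 'HH.MM' or 'HH.MM-HH.MM'."""
    if "-" in span:
        start, end = span.split("-", 1)
        return to_minutes(start), to_minutes(end)
    return to_minutes(span), None

def parse_record(descriptor: str, time_span: str) -> ActionRecord:
    fields = [f.strip() for f in descriptor.split(";")]
    start, end = parse_span(time_span)
    return ActionRecord(fields[0], fields[1:], start, end)

# The rows of the table above.
rows = [
    ("SETTING;VENTILATOR;FiO2 (36%)", "10.30"),
    ("MEDICATION;Morphine", "10.44"),
    ("ACTION;CARE;TURN/CHANGE POSITION;SUPINE", "10.46-10.47"),
    ("ACTION;RESPIRATION;HANDBAG BABY", "10.47-10.51"),
    ("SETTING;VENTILATOR;FiO2 (60%)", "10.47"),
    ("ACTION;RESPIRATION;INTUBATE", "10.51-10.52"),
]
records = [parse_record(d, t) for d, t in rows]
for record in records:
    print(record)
```

Converting times to minutes up front makes later steps, such as checking whether two events overlap, simple integer comparisons.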
BT45 texts
Human corpus text
At 1046 the baby is turned for re-intubation and re-intubation is
complete by 1100 the baby being bagged with 60% oxygen
between tubes. During the re-intubation there have been some
significant bradycardias down to 60/min, but the sats have
remained OK. The mean BP has varied between 23 and 56, but
has now settled at 30. The central temperature has fallen to
36.1°C and the peripheral temperature to 33.7°C. The baby has
needed up to 80% oxygen to keep the sats up.
Computer-generated text
By 11:00 the baby had been hand-bagged a number of times
causing 2 successive bradycardias. She was successfully reintubated after 2 attempts. The baby was sucked out twice.
At 11:02 FIO2 was raised to 79%.
BabyTalk architecture
Signal analysis: patterns, trends
Data interpretation: based on medical
knowledge (like an expert system)
Document planning: select and structure
events to be mentioned
Microplanning: choose words, syntactic
structures, referring expressions
Realisation: generate actual text

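The five stages form a pipeline in which each stage consumes the output of the previous one. The toy Python sketch below illustrates only that data flow. Every stage is a deliberately trivial stand-in (a fixed 100 bpm threshold, importance equal to the depth of the drop, template sentences), and none of it reflects how BabyTalk actually implements these stages.

```python
from typing import Callable, Dict, List

Event = Dict[str, int]

def signal_analysis(hr: List[int]) -> List[Event]:
    """Find runs of heart-rate samples below 100 (toy downward-spike detector)."""
    events, start = [], None
    for i, value in enumerate(hr + [100]):  # sentinel value closes a trailing run
        if value < 100 and start is None:
            start = i
        elif value >= 100 and start is not None:
            events.append({"start": start, "end": i - 1, "min": min(hr[start:i])})
            start = None
    return events

def data_interpretation(events: List[Event]) -> List[Event]:
    """Toy medical knowledge: the deeper the drop, the more important the event."""
    return [{**e, "importance": 100 - e["min"]} for e in events]

def document_planning(events: List[Event], threshold: int = 20) -> List[Event]:
    """Select sufficiently important events and order them by time."""
    chosen = (e for e in events if e["importance"] >= threshold)
    return sorted(chosen, key=lambda e: e["start"])

def microplanning(events: List[Event]) -> List[Dict[str, str]]:
    """Choose words for each event: a time phrase plus a clause."""
    return [{"time": f"At sample {e['start']}",
             "clause": f"the heart rate fell to {e['min']}"} for e in events]

def realisation(specs: List[Dict[str, str]]) -> str:
    """Produce the final text from the sentence specifications."""
    return " ".join(f"{s['time']}, {s['clause']}." for s in specs)

STAGES: List[Callable] = [signal_analysis, data_interpretation, document_planning,
                          microplanning, realisation]

def run_pipeline(data, stages=STAGES):
    for stage in stages:
        data = stage(data)
    return data

text = run_pipeline([150, 148, 95, 90, 140, 145, 75, 70, 72, 150])
print(text)  # At sample 6, the heart rate fell to 70.
```

The shallow drop to 90 is detected by signal analysis but dropped at document planning, which is where content selection happens in this architecture.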
Signal Analysis

Detect trends, patterns, events, etc
» Blood oxygen levels increasing
» Downward spike in heart rate

Detect artefacts
» Changes due to sensor problems
Plenty of algorithms exist for this
Will not discuss further here

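The slides leave the choice of algorithm open. As one simple illustration (not the method BabyTalk used), a trend can be estimated as the least-squares slope of the signal, and a downward spike as a sample that falls well below the median of the preceding few samples. The window size and drop threshold are arbitrary assumptions, and artefact detection is not shown.

```python
from statistics import median
from typing import List

def trend_slope(values: List[float]) -> float:
    """Least-squares slope of the values against their sample index (needs >= 2 values)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def downward_spikes(values: List[float], window: int = 5, drop: float = 30) -> List[int]:
    """Indices where a value falls more than `drop` below the median
    of the preceding `window` samples."""
    return [i for i in range(window, len(values))
            if median(values[i - window:i]) - values[i] > drop]

spo2 = [88, 89, 90, 91, 92, 93]                 # blood oxygen rising steadily
hr = [150, 152, 149, 151, 150, 110, 148, 150]   # one brief drop at index 5

print(trend_slope(spo2))    # 1.0, i.e. increasing
print(downward_spikes(hr))  # [5]
```

Using the median rather than the mean of the window keeps one earlier spike from masking the next one.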
Data Abstraction

Detect higher-level events in the data
» Sequence of bradycardias (downward
spikes in HR)

Determine medical importance
» Bradycardia more important if
simultaneous desaturation (downward
spike in SO)

Medical KBS
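A rule-based Python sketch of both ideas: scoring a bradycardia as more important when it overlaps a desaturation, and grouping bradycardias that occur close together into a higher-level sequence event. The scores, the 60-second gap and the `Event` class are hypothetical illustrations; BabyTalk's actual medical knowledge base is not shown in the slides.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    kind: str            # e.g. "bradycardia", "desaturation"
    start: int           # seconds
    end: int
    importance: int = 0

def overlaps(a: Event, b: Event) -> bool:
    return a.start <= b.end and b.start <= a.end

def score_bradycardias(events: List[Event], base: int = 50, boost: int = 30) -> None:
    """Set importance in place: a simultaneous desaturation adds `boost`."""
    desats = [e for e in events if e.kind == "desaturation"]
    for e in events:
        if e.kind == "bradycardia":
            linked = any(overlaps(e, d) for d in desats)
            e.importance = base + (boost if linked else 0)

def bradycardia_sequences(events: List[Event], max_gap: int = 60,
                          min_count: int = 2) -> List[Event]:
    """Group bradycardias separated by at most `max_gap` seconds into sequence events."""
    bradys = sorted((e for e in events if e.kind == "bradycardia"), key=lambda e: e.start)
    groups, current = [], []
    for e in bradys:
        if current and e.start - current[-1].end > max_gap:
            groups.append(current)
            current = []
        current.append(e)
    groups.append(current)
    return [Event("bradycardia-sequence", g[0].start, g[-1].end,
                  max(m.importance for m in g))
            for g in groups if g and len(g) >= min_count]

events = [
    Event("bradycardia", 100, 110),
    Event("bradycardia", 140, 150),
    Event("desaturation", 145, 170),
    Event("bradycardia", 600, 610),
]
score_bradycardias(events)
sequences = bradycardia_sequences(events)
print([e.importance for e in events if e.kind == "bradycardia"])  # [50, 80, 50]
print(sequences)
```

The sequence inherits the importance of its most important member, so one desaturation-linked bradycardia makes the whole episode prominent.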
Data Abs: Links Between Events

Infer links between events
» Blood O2 falls, therefore O2 level in
incubator is increased
» HR up because baby is being handled
» Morphine given as part of the intubation
procedure

Very important: much of the value added by the text
» Helps readers build good mental model of
what is happening to the baby
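One simple way to approximate such links, assumed here purely for illustration, is a table of cause/effect rules with a maximum delay: if an event of the cause type is followed closely enough by an event of the effect type, the two are linked. Real medical reasoning needs far more than temporal proximity; the rule table, event kinds and times are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Event:
    kind: str
    start: int  # minutes since midnight

# Hypothetical rules: (cause kind, effect kind, max minutes from cause to effect)
RULES = [
    ("desaturation", "fio2-increase", 5),  # blood O2 falls, so incubator O2 is raised
    ("handling", "hr-increase", 2),        # HR rises because the baby is being handled
]

def infer_links(events: List[Event]) -> List[Tuple[Event, Event]]:
    """Return (cause, effect) pairs that match a rule within its time window."""
    links = []
    for cause_kind, effect_kind, max_delay in RULES:
        for cause in events:
            if cause.kind != cause_kind:
                continue
            for effect in events:
                if effect.kind == effect_kind and 0 <= effect.start - cause.start <= max_delay:
                    links.append((cause, effect))
    return links

events = [
    Event("desaturation", 640), Event("fio2-increase", 642),
    Event("handling", 646), Event("hr-increase", 647),
    Event("fio2-increase", 700),  # too late to be explained by the desaturation
]
for cause, effect in infer_links(events):
    print(f"{effect.kind} at {effect.start} because of {cause.kind} at {cause.start}")
```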
Document Planning
First NLP stage
Decide what events to mention
Decide how these are ordered and
organised

Content Determination

First approach: Include most medically
important events
» Also include moderately important events
which are linked to very important events

Doesn’t always work
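The first approach translates directly into a two-threshold selection rule. In the Python sketch below, the importance scores, thresholds and event names are hypothetical. Its weakness is the continuity problem discussed next: an event that is neither important nor linked to an important one (here a gradual TcPO2 rise) is silently dropped, even when readers need it to follow the story.

```python
from typing import Dict, List, Set, Tuple

def select_content(importance: Dict[str, int], links: List[Tuple[str, str]],
                   high: int = 70, moderate: int = 40) -> Set[str]:
    """Very important events, plus moderately important events linked to one of them."""
    very_important = {e for e, score in importance.items() if score >= high}
    selected = set(very_important)
    for a, b in links:
        for event, partner in ((a, b), (b, a)):
            if partner in very_important and moderate <= importance[event] < high:
                selected.add(event)
    return selected

importance = {"bradycardia-sequence": 90, "fio2-increase": 50,
              "suction": 20, "tcpo2-gradual-rise": 30}
links = [("fio2-increase", "bradycardia-sequence"),
         ("suction", "bradycardia-sequence")]
print(sorted(select_content(importance, links)))
# ['bradycardia-sequence', 'fio2-increase']
```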
Problem: Continuity

Omitting intermediate events confuses
readers
» Example: TcPO2 suddenly decreased to
8.1. SaO2 increased to 92. TcPO2
suddenly decreased to 9.3
» There is a gradual rise in TcPO2 between
the sudden falls
– This is less important medically
– But important for reader’s comprehension
Document Structure

How do we order/group events
» By time
» By medical importance
» By body subsystem (eg, respiration)

Initially focused on time, but users wanted more
emphasis on subsystem
» Eg, first a “scene” about respiration, then a
“scene” about thermoregulation
– Not constant shifting between the two
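Grouping by subsystem can be sketched as follows: events are bucketed into one "scene" per subsystem, each scene keeps time order internally, and scenes are ordered by when their first event occurs. This ordering policy is an assumption for illustration (scenes could equally be ordered by importance); the event times are made up, and the descriptions loosely follow the human corpus text shown earlier.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, str, str]  # (minutes since midnight, subsystem, description)

def plan_scenes(events: List[Event]) -> List[Tuple[str, List[Event]]]:
    """Group events into one scene per subsystem, keeping time order within each scene."""
    scenes: Dict[str, List[Event]] = defaultdict(list)
    for event in sorted(events):  # tuples sort by time first
        scenes[event[1]].append(event)
    # Order scenes by the time of their first event.
    return sorted(scenes.items(), key=lambda item: item[1][0][0])

events = [
    (646, "respiration", "baby re-intubated"),
    (650, "thermoregulation", "core temperature fell"),
    (655, "respiration", "FiO2 raised"),
    (660, "thermoregulation", "peripheral temperature fell"),
]
for subsystem, scene in plan_scenes(events):
    print(subsystem, [description for _, _, description in scene])
```

A purely time-ordered plan would alternate respiration, thermoregulation, respiration, thermoregulation, which is the constant shifting users disliked.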
Doc Planning: Narrative

High-level analysis: need to do a better
job of generating a “story” from the data
» Link events together
» Include events needed for story
progression even if not important
» “Scene” structure

Qualitative observation by users
Microplanning
Second NLP stage
Choose words and syntactic structure to
express information
Aggregation
Reference

Challenge: Time

Need to communicate temporal info
» Enough so that readers can interpret the
data
» Not too much, or the text becomes unreadable
– Imagine story with “At 10.14 John left home. At
10.28 he met Mary in the pub. At 10.39…”
Tenses

Use Reichenbach model
» Speech time: time of report being read
» Event time: time of event being described
» Reference time: determined using a
salience model
– Similar to resolving anaphoric reference

Usually worked, sometimes failed
» Need better model for reference time
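Once the three times are fixed, Reichenbach's scheme maps their ordering to a tense: the position of reference time R relative to speech time S gives past, present or future, and the position of event time E relative to R gives simple or perfect (or prospective, when E follows R). A minimal Python sketch with times as minutes since midnight; choosing R via a salience model, the hard part noted above, is not modelled, and the example times are illustrative.

```python
def reichenbach_tense(e: int, r: int, s: int) -> str:
    """Map event (E), reference (R) and speech (S) times to a tense label."""
    if r < s:
        tense = "past"
    elif r == s:
        tense = "present"
    else:
        tense = "future"
    if e < r:
        aspect = "perfect"      # E before R: "had fallen", "has fallen"
    elif e == r:
        aspect = "simple"       # E at R: "fell", "falls"
    else:
        aspect = "prospective"  # E after R: "was going to fall"
    return f"{tense} {aspect}"

# "By 11:00 the baby had been hand-bagged": E = 10:47, R = 11:00, S = time of reading
print(reichenbach_tense(e=647, r=660, s=665))  # past perfect
print(reichenbach_tense(e=662, r=662, s=665))  # past simple
print(reichenbach_tense(e=650, r=665, s=665))  # present perfect
```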
What does event time mean?

Sometimes explicit time given for event
» Supposed to be start time of event, sometimes
misinterpreted

Ex: “After three attempts, at 13.53 a peripheral
venous line was inserted successfully.”
» 13.53 refers to time of first (failed) attempt
– Start of LINE-INSERT-ATTEMPTS event
» Readers interpret as time of final (successful) attempt

Need better linguistic model of time
» Linguistic temporal ontology (Moens and Steedman)?
Lexical Choice
Need mechanism to map domain
events (instances in a Protégé ontology)
to linguistic structures
Use JESS rules

» Lexical info from VerbNet, NIH lexicon

Engineering challenge
» Relate to Sheffield work on NLG/ontologies
Vague language

Human texts are full of vague language
» Ex: There is a momentary bradycardia
» What does “momentary” mean?

Our models of this are very crude, need
to be improved!
Realisation
Last NLG stage
Generate actual text, once choices
made
Use Aberdeen SimpleNLG package
Will not discuss further here

BT45 Evaluation

Showed 35 medical professionals 24
scenarios in 3 conditions (8 of each)
» Visualisation of medical data
» Textual summary (manually written)
» Textual summary (from BT45)

Asked to make a treatment decision
» Limited to 3 minutes
» Measured correctness (against gold standard)

Off-ward, using historical data
» So no other knowledge about the baby
Free-text comments
Comments were not solicited, but were
recorded if made
Most important were

» Better layout (eg, bullet lists)
» Continuity (as mentioned before)
Decision-Support results
No significant difference in time taken
Average decision quality (scale -1 to 1)

» Human texts: 0.39
» Computer texts: 0.34
» Visualisation: 0.33
Human texts significantly better than computer texts and visualisation
No significant difference between computer texts and visualisation

Results by subject type

Analysis by type of subjects
» Human texts especially good for junior
nurses (ie, least experienced subjects)
Results by scenario

Each scenario had a main target action
» 8 different ones

Computer texts as good as human texts
for five of these; worse for three
» No action, manage temperature, monitor
equipment
» These relate to specific problems in the
system, which can be fixed
Target Actions with Poor Perf
No action: needs high-level summary,
not blow-by-blow event description
Manage temperature: two temperature
channels, need to describe together
Monitor equipment: need to mention
(not ignore) sensor artefacts

Summary

Good performance with human texts
shows textual presentation is effective
» Also seen in previous study

BabyTalk texts as good as visualisation; could
be made better by addressing the above issues
» Even now, giving users BabyTalk texts as a
supplement to visualisations could help
Current Work

BT-Nurse: shift summaries for nurses
» Use live data from current babies
» Evaluate on ward, using babies that
subjects (nurses) are actually looking after
» Focus on info relevant to nurse shift
planning, not real-time decision support
» Longer time period (12 hrs)
– Need more sensor abstraction
» Longer texts (multi-page)
Current Work

BT-Family: information for parents
» Estimate how stressed parents are, use
this to control content, phrasing
– High stress means less content
– Relate to Sheffield work on personality??
» Express information in language which
parents can understand, not medicalese
Current Work

BT-Clan: Information for friends, family
» Social networking perspective: encourage
useful support, minimise hassle of dealing
with numerous inquiries
– Parents decide what to tell people
– Intentional deceit: if granny is frail, don’t tell her
bad news
» Info about parents as well as baby
Research agenda
Detecting complex events in the data
Integration with medical guidelines
Better use of vague language
Better stories
Role of text in interactive multimodal
information presentation system
Try in domain of assisted living
