
HELP!
Intelligent Help (or lack thereof)
in Spoken Dialog Systems
Dialogs on Dialogs discussion
Stefanie Tomko
20-Feb-04
lti
Papers
• Adding intelligent help to mixed-initiative
spoken dialogue systems. G. Gorrell, I.
Lewin & M. Rayner. In Proc. of ICSLP, 2002.
• Targeted help for spoken dialogue
systems: intelligent feedback improves
naive users' performance. B.A. Hockey, O.
Lemon, E. Campana, L. Hiatt, G. Aist, J.
Hieronymus, A. Gruenstein, J. Dowding. In
Proc. of EACL, 2003.
• ?????  there isn't a lot out there about this!
We need Help!
• 56% of NL system users in experiment
asked for help without explicit knowledge
that they could do so
• Speech Graffiti users knew about various
help/orientation keywords
– 91% used options
– 70% used where was I?
– 48% used help
What is Help?
• "I didn't understand what you said…"
• "How do I do <something>?"
• "How do I say that?"
• "What can I do?"
• "Where was I?"
User-initiated Help examples
• NL Movieline:
– Wordy, general
• This system allows you to obtain movie and theater information for
Pittsburgh. You can ask for the location, phone number, and movie
listing for a certain theater. Or, you can ask about a particular movie
to get the rating and or genre find out where it is playing. Specify
both a movie and theater to obtain showtimes. If you get stuck, you
can say Reset, to start over.
• Jupiter
– Example based
• You can ask about general weather forecasts as well as information
on temperature, windspeed,…
• Try saying one of the following: 'what's the weather for Denver?'
'what cities do you know about?' 'what do you know about besides
weather?' 'what can I say?'
• Try saying one of the following 'are there any advisories for the
United States?' 'what is the extended forecast for Boston?' 'will it
rain in Toronto?'…
User-initiated Help examples
• Speech Graffiti
– Somewhat "state" based
• slot + options: you can say, rating is... G, PG, PG-13, R,
NC-17, not rated, or you can ask, what is the rating?
• options: you can specify or ask about title, show time, day…
• help: gives list of keywords on 1st round, then gives explanation
of keyword functions
• TellMe
– Orientation, or, at main level, lots of general system info
• You're in Sports, in the NHL section.
These are all kind of "dumb"
• They might not take system state into account
• They aren't smart about what users really want to do
• They might not tell users exactly how to speak
• They might not orient users to where they are in the system
• But at least they give users some information…
System-initiated "Help" examples
• NL Movieline:
– Excuse me?
– Didn't catch that.
• Jupiter
– Pardon me?
• Speech Graffiti
– I'm sorry, I'm having trouble understanding you
• TellMe
– I'm sorry, I didn't get that. Please say a category in
Travel.
• These are really dumb!
Intelligent/Targeted Help
• Makes system-initiated help a little smarter
• Goal: provide immediate feedback, tailored to
what the user said, for cases in which the
system was not able to understand an
utterance
• Kind of different perspective compared to
traditional error handling
– Traditional error handling: "What should I do to deal with this error?"
– Targeted help: "How can I help the user not make this error in the future?"
Gorrell et al ICSLP paper
• Grammar-based vs. statistical LMs
– Grammars easy to create (?)
– GB performs better if users know what to say
– SLMs better for unusual & less constrained utts
• 1st attempt – recognition only (i.e. no help)
– Run all utts through GBLM & SLM, choose
based on confidence scores
– Not reliable enough
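The first attempt above, running every utterance through both recognizers and keeping the more confident result, might be sketched roughly as follows. The `Hypothesis` type, the example utterances, and the scores are hypothetical; the slides do not say how the confidence scores were compared.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One recognizer's output (hypothetical structure for illustration)."""
    source: str        # "GBLM" or "SLM"
    text: str
    confidence: float  # assumed to be comparable across recognizers


def choose_by_confidence(gb: Hypothesis, slm: Hypothesis) -> Hypothesis:
    """Keep whichever hypothesis has the higher confidence score.

    Ties go to the grammar-based result (an arbitrary choice made here).
    """
    return gb if gb.confidence >= slm.confidence else slm


gb = Hypothesis("GBLM", "turn on the kitchen light", 0.42)
slm = Hypothesis("SLM", "turn off the light in the bathroom", 0.61)
print(choose_by_confidence(gb, slm).source)
```

The sketch assumes the two scores are on a comparable scale, which need not hold for two different recognizers.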
On/Off House
• User initiative
• Natural language
– Turn off the light in the bathroom
– Are the hall and kitchen lights switched on?
– Could you tell me which lights are on?
Targeted Help
Grammar-based LM: parsable?
– yes => Play regular output
– no => Send to SLM => Classify result => Play appropriate help message
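A minimal sketch of this routing, assuming each component can be passed in as a plain function. The function names, the stand-in recognizers, and the bracketed output strings are all hypothetical; only the order of the steps comes from the slide.

```python
from typing import Callable, Dict, Optional


def respond(
    audio: bytes,
    gb_recognize: Callable[[bytes], Optional[str]],
    slm_recognize: Callable[[bytes], str],
    classify: Callable[[str], str],
    help_messages: Dict[str, str],
) -> str:
    """Route one utterance through the flow on this slide."""
    parsed = gb_recognize(audio)
    if parsed is not None:
        # Parsable by the grammar-based LM: play the regular output.
        return f"[regular output for: {parsed}]"
    # Not parsable: send to the SLM, classify the result,
    # and play the help message for that class.
    hypothesis = slm_recognize(audio)
    return help_messages[classify(hypothesis)]


# Stand-in components so the sketch runs end to end (all hypothetical).
def fake_gb(audio: bytes) -> Optional[str]:
    return "turn on the kitchen light" if audio == b"in-grammar" else None


def fake_slm(audio: bytes) -> str:
    return "could you maybe switch it off"


def fake_classify(hypothesis: str) -> str:
    return "PRON_COMMAND"


messages = {"PRON_COMMAND": "[help message for PRON_COMMAND]"}

print(respond(b"in-grammar", fake_gb, fake_slm, fake_classify, messages))
print(respond(b"something else", fake_gb, fake_slm, fake_classify, messages))
```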
Classification
• Hand-classified training set
• 12 classes
• 24 features
• Most common classes
– REFEXP_COMMAND (35%)
• I didn't quite catch that. To turn a device on or off, you could
try something like 'turn on the kitchen light.'
– LONG_COMMAND (13%)
• I didn't quite catch that. Long commands can be difficult to
understand. Perhaps try giving separate commands for each
device.
– PRON_COMMAND (11%)
• I didn't quite catch that. To change the status of a device or
group of devices you've just referred to, you could try for
example 'turn it on' or 'turn them off.'
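To make the class-to-message mapping concrete, here is a small sketch using the three help messages quoted above. The `toy_classify` rules (a word-count threshold and a pronoun check) are invented stand-ins, not the trained classifier from the paper; the 24 features it used are not listed on the slides.

```python
# Help messages for the three most common classes, as quoted on this slide.
HELP_MESSAGES = {
    "REFEXP_COMMAND": (
        "I didn't quite catch that. To turn a device on or off, you could "
        "try something like 'turn on the kitchen light.'"
    ),
    "LONG_COMMAND": (
        "I didn't quite catch that. Long commands can be difficult to "
        "understand. Perhaps try giving separate commands for each device."
    ),
    "PRON_COMMAND": (
        "I didn't quite catch that. To change the status of a device or "
        "group of devices you've just referred to, you could try for "
        "example 'turn it on' or 'turn them off.'"
    ),
}

PRONOUNS = {"it", "them"}     # hypothetical cue for PRON_COMMAND
LONG_COMMAND_MIN_WORDS = 10   # arbitrary threshold, chosen for illustration


def toy_classify(slm_hypothesis: str) -> str:
    """Hand-written stand-in for the classifier (NOT the real feature set)."""
    words = slm_hypothesis.lower().split()
    if len(words) >= LONG_COMMAND_MIN_WORDS:
        return "LONG_COMMAND"
    if PRONOUNS & set(words):
        return "PRON_COMMAND"
    return "REFEXP_COMMAND"


hypothesis = "please switch it off"
print(HELP_MESSAGES[toy_classify(hypothesis)])
```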
Evaluation
• Baseline classification error: 65%
• Cross-validated final decision tree error: 12%
• Between-subjects user study task
– call a voice-enabled house & leave it in a secure
state
• No training
• Targeted help (N=16) vs. control help (N=15)
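One plausible reading of the 65% baseline is a classifier that always guesses the most common class. With REFEXP_COMMAND at 35% of the training set, that gives 100% - 35% = 65% error. This is an inference from the class frequencies on the Classification slide, not something the slides state outright. A short check, with the function name chosen here for illustration:

```python
from typing import Dict


def majority_baseline_error(class_shares: Dict[str, float]) -> float:
    """Error rate of always predicting the most frequent class."""
    return 1.0 - max(class_shares.values())


# Shares of the three most common classes, from the Classification slide.
# The other nine classes are each less frequent, so they do not change the max.
shares = {
    "REFEXP_COMMAND": 0.35,
    "LONG_COMMAND": 0.13,
    "PRON_COMMAND": 0.11,
}

print(f"{majority_baseline_error(shares):.0%}")  # prints 65%
```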
Results
                      Targeted help   Control help
WER (GB only?)        39%             55%
Grammaticality        47%             36%
WER(?): 1st 5 utts    45%             76%
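For a sense of scale, the figures above can be expressed as changes relative to the control group. These relative values are derived here from the reported percentages; they are not reported on the slide.

```python
# Percentages from the Results slide: (targeted help, control help).
results = {
    "WER (GB only?)": (39, 55),
    "Grammaticality": (47, 36),
    "WER(?): 1st 5 utts": (45, 76),
}


def relative_change(targeted: float, control: float) -> float:
    """Change of the targeted-help figure relative to control, as a fraction."""
    return (targeted - control) / control


for measure, (targeted, control) in results.items():
    print(f"{measure}: {relative_change(targeted, control):+.1%}")
# WER (GB only?): -29.1%
# Grammaticality: +30.6%
# WER(?): 1st 5 utts: -40.8%
```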
Results (2)
• Targeted help group had more variety in
constructions
• Targeted help users requested help more often
– Six TH users vs. only one (!) control user
• Longer dialogs in TH groups
– Some of this is system exploration
• No significant differences in awareness of final
house state or perception of systems' abilities
• No comparison of task completion
Hockey et al EACL paper
• Domain: WITAS command & control for
robotic helicopter
• Targeted Help is an independent module
Grammar-based LM: parsable?
– yes => Play regular output
– no => Send to SLM: parsable?
• yes => Play regular output
• no => Create & play appropriate help message
Help message content
• Message contains one or more of
– A. What the system heard
• A report of the backup SLM recognition hypothesis
– B. What the problem was (diagnostic)
• A description of the problem with the user's utterance
– C. What you might say instead
• A similar in-grammar example
• Rule-based determination of exact content
for B & C
• Not clear how often A, B & C appear & in what
combinations
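As a rough illustration of a message built from "one or more of" A, B, and C, the sketch below joins whichever parts are available. The sentence templates and the function name are hypothetical; the slides do not give the system's actual wording.

```python
from typing import Optional


def build_help_message(
    heard: Optional[str] = None,       # A: backup SLM recognition hypothesis
    diagnostic: Optional[str] = None,  # B: description of the problem
    example: Optional[str] = None,     # C: similar in-grammar example
) -> str:
    """Assemble a help message from whichever of A, B, C are present."""
    parts = []
    if heard:
        parts.append(f"The system heard '{heard}'.")
    if diagnostic:
        parts.append(diagnostic)
    if example:
        parts.append(f"You could try saying '{example}'.")
    if not parts:
        raise ValueError("a help message needs at least one of A, B, or C")
    return " ".join(parts)


print(build_help_message(heard="fly hospital", example="fly to the hospital"))
```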
B. Diagnostic
• Endpointing
– Check if the first recognized word is a valid first
word of a parsable input
• Out-of-vocabulary
– Compare SLM vocab to GBLM vocab
• Subcategorization
– Check features of verbs in SLM hypothesis
• Zoom in [+intrans]
• => ! Zoom in on the red car
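The three checks could be sketched as simple rules over the SLM hypothesis. Everything below is an assumption made for illustration: the word lists, the representation of intransitive verbs as word tuples, the order of the checks, and the wording of the returned diagnoses.

```python
from typing import Optional, Sequence, Set, Tuple


def diagnose(
    slm_words: Sequence[str],
    valid_initial_words: Set[str],
    gb_vocab: Set[str],
    intransitive_verbs: Set[Tuple[str, ...]],
) -> Optional[str]:
    """Return a short diagnosis for an SLM hypothesis, or None if no rule fires."""
    if not slm_words:
        return None
    # Endpointing: the first word cannot start a parsable input.
    if slm_words[0] not in valid_initial_words:
        return "endpointing"
    # Out-of-vocabulary: words the SLM knows but the grammar does not.
    unknown = [w for w in slm_words if w not in gb_vocab]
    if unknown:
        return "out-of-vocabulary: " + ", ".join(unknown)
    # Subcategorization: an intransitive verb followed by more material.
    for verb in intransitive_verbs:
        n = len(verb)
        if tuple(slm_words[:n]) == verb and len(slm_words) > n:
            return "subcategorization: '" + " ".join(verb) + "' is intransitive"
    return None


# Hypothetical toy resources.
initial = {"zoom", "fly"}
vocab = {"zoom", "in", "on", "the", "red", "car", "fly", "to", "hospital"}
intrans = {("zoom", "in")}

print(diagnose("zoom in on the red car".split(), initial, vocab, intrans))
```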
C. In-grammar example
• Try to use words & dialog-move type from user's
original utterance
– wh-question
– yn-question
– answer
– command
• Example
– User: Fly over to the hospital
– GBLM: [reject]
– SLM: fly hospital
– TH: fly to the hospital
(how does TH know this is a command?)
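One way to picture the example selection is as a word-overlap match among in-grammar examples of the same dialog-move type. The sketch below assumes that approach; the example inventory and the function name are hypothetical, and the move type is simply passed in, since how TH determines it is the open question above.

```python
from typing import List, Optional, Sequence, Tuple


def pick_example(
    slm_words: Sequence[str],
    move_type: str,
    examples: List[Tuple[str, str]],
) -> Optional[str]:
    """Pick the in-grammar example of the given move type that shares
    the most words with the SLM hypothesis (first one wins on ties)."""
    candidates = [text for mt, text in examples if mt == move_type]
    if not candidates:
        return None
    hyp = set(slm_words)
    return max(candidates, key=lambda text: len(hyp & set(text.split())))


# Hypothetical inventory of (dialog-move type, in-grammar example).
examples = [
    ("command", "land at the tower"),
    ("command", "fly to the hospital"),
    ("wh-question", "where is the hospital"),
]

print(pick_example(["fly", "hospital"], "command", examples))
# prints: fly to the hospital
```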
Evaluation
• Between-groups user study
– Targeted help vs. no help
– Was user-initiated help available?
• N=20, 5 tasks each
– Only T1 & T5 assessed
– Locate an x and then land at the y
Results
• Significantly fewer TH users gave up on tasks
– Control users gave up on 39% of tasks
– TH users gave up on only 6%
• Time to completion effects
– Hard to measure "completion!"
– Task (=> users get better over time)
– Help x Task
– Help alone
• (p<.1 in "lenient" analysis)
Discussion
• Definitely an improvement over "dumb" options
• How easy are these options to automate and
port to new domains/systems?
– Classifier version needs training data
– Rule-based version needs… rules
• Is there such a thing as too smart?
• The system doesn't understand the word X
• The system doesn't understand the word X used with the red
car
Discussion (2)
• Do grammaticality improvements fostered
by TH persist?
• How frequently is TH activated?
– Does frequency decrease over time?
• At a faster rate cf. plain-old help?
– In rule-based system, how often do both LMs
fail?
Discussion (3)
• How often does either system (esp. rule-based)
provide inappropriate help?
– Wrong dialogue-move type?
– Wrong vocabulary?
• What % of 1st-utt-after-TH are grammatical?
– cf. plain-old help
• Are there other ways to implement/supplement
TH?
– State information?
– Back-off to directed dialog? (in worst case…)
Anything else?
• Let me know if you come across any more
references to this sort of thing…