Error Awareness and Recovery in Task-Oriented Spoken Dialogue Systems

Thesis Proposal
Dan Bohus
Carnegie Mellon University, January 2004
Thesis Committee
Alex Rudnicky (Chair)
Roni Rosenfeld
Jeff Schneider
Eric Horvitz (Microsoft Research)
Problem

Lack of robustness when faced with understanding errors
- Spans most domains and interaction types
- Has a significant impact on performance
An example

S: Are you a registered user?
U: No I'm not. No [NO I'M NOT NO]
S: What is your full name?
U: Adam Schumacher [ATHENS IN AKRON]
S: There is an Athens in Georgia and in Greece. Which destination did you want?
U: Neither [NEITHER]
S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
U: Georgia [GEORGIA]
S: A flight from Athens... Where do you want to go?
U: Start over [START OVER]
S: Do you really want to start over?
U: Yes [YES]
S: What city are you leaving from?
U: Hamilton, Ontario [HILTON ONTARIO]
S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
U: Hamilton [HILTON]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Toronto [TORONTO]
Some Statistics …

Semantic error rates:
- CMU Communicator [CMU]: 32%
- CU Communicator [CU]: 27%
- How May I Help You? [AT&T]: 36%
- Jupiter [MIT]: 28%
- SpeechActs [SRI]: 25%

Corrections [Krahmer, Swerts, Litman, Levow]:
- 30% of utterances correct system mistakes
- 2-3 times more likely to be misrecognized
Significant Impact on Interaction

[Pie charts over sessions]
- CMU Communicator: 26% of sessions failed; 40% contain understanding errors
- Multi-site Communicator Corpus [Shin et al]: 37% of sessions failed
Outline

- Problem
- Approach
- Infrastructure
- Research Program
- Timeline & Summary
problem : approach : infrastructure : indicators : strategies : decision process : summary
Increasing Robustness …

- Increase the accuracy of speech recognition

- Assume recognition is unreliable, and create the mechanisms for acting robustly at the dialogue management level
  - ASR performance increases / demands increase
  - More general
Snapshot of Existing Work: Slide 1

- Theoretical models of grounding
  - Contribution Model [Clark], Grounding Acts [Traum]
  - Analytical/descriptive, not decision oriented

- Practice: heuristic rules
  - Misunderstandings: threshold(s) on confidence scores
  - Non-understandings
  - Ad-hoc, lack generality, not easy to extend
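The threshold-based handling of misunderstandings mentioned above typically looks like the following sketch; the threshold values and action names are illustrative, not taken from any particular system:

```python
def choose_action(confidence: float) -> str:
    """Typical heuristic rule for handling potential misunderstandings:
    pick an error handling action by comparing the confidence score
    against hand-tuned thresholds (the values here are illustrative)."""
    if confidence < 0.3:        # very unreliable: discard the hypothesis
        return "reject"
    elif confidence < 0.6:      # doubtful: ask the user directly
        return "explicit_confirm"
    elif confidence < 0.9:      # probable: confirm while moving on
        return "implicit_confirm"
    else:                       # reliable: accept without confirmation
        return "no_action"
```

Such rules are easy to write but, as noted above, the thresholds must be re-tuned for each system, which is precisely the lack of generality this proposal targets.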
Snapshot of Existing Work: Slide 2

- Conversation as Action under Uncertainty [Paek and Horvitz]
  - Belief networks to model uncertainties
  - Decisions based on expected utility, VOI-analysis

- Reinforcement learning for dialogue control policies [Singh, Kearns, Litman, Walker, Levin, Pieraccini, Young, Scheffler, etc]
  - Formulate dialogue control as an MDP
  - Learn optimal control policy from data

- Do not scale up to complex, real-world domains
Research Program: Goals & Approach

A task-independent, adaptive and scalable framework for error recovery in task-oriented spoken dialogue systems

Approach:
- Decision making under uncertainty
Three Components

0. Infrastructure
1. Error awareness
   - Develop indicators that assess reliability of information
   - ... and assess how well the dialogue is advancing
2. Error recovery strategies
   - Develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process
   - Develop a scalable reinforcement-learning based approach for error recovery in spoken dialogue systems
Infrastructure

- RavenClaw [completed]
  - Modern dialogue management framework for complex, task-oriented domains

- RavenClaw spoken dialogue systems [completed]
  - Test-bed for evaluation
RavenClaw

[Architecture diagram]
- Dialogue Task (Specification): a tree of dialogue agents (RoomLine -> Login: Welcome, AskRegistered, GreetUser, AskName; GetQuery: DateTime, Location, Properties: Network, Projector, Whiteboard; GetResults; DiscussResults), with associated concepts (registered, user_name, query, results)
- Domain-Independent Dialogue Engine:
  - Dialogue Stack (e.g., RoomLine, Login, AskRegistered, ExplicitConfirm)
  - Expectation Agenda, e.g.:
    registered: [No] -> false, [Yes] -> true
    user_name: [UserName]
    query.date_time: [DateTime]
    query.location: [Location]
    query.network: [Network]
  - Error handling: Indicators, Strategies, Error Handling Decision Process
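The expectation agenda entries above (e.g., registered: [No] -> false) can be read as mappings from grammar slots to concepts, ordered from the current question outward. A minimal sketch of how such bindings might be applied, using hypothetical data structures rather than RavenClaw's actual API:

```python
# Each agenda level maps a grammar slot to a (concept, value transform)
# pair, e.g. registered: [No] -> false, [Yes] -> true. Levels are ordered
# from the current question (top) to the whole task (bottom).
agenda = [
    {"No": ("registered", False), "Yes": ("registered", True)},   # current question
    {"UserName": ("user_name", None)},                            # current topic
    {"DateTime": ("query.date_time", None),                       # whole task
     "Location": ("query.location", None),
     "Network": ("query.network", None)},
]

def bind(parsed_slots: dict) -> dict:
    """Bind parsed slots to concepts, preferring agenda levels closer
    to the current question (the top of the agenda)."""
    bound, used = {}, set()
    for level in agenda:                          # top level first
        for slot, value in parsed_slots.items():
            if slot in level and slot not in used:
                concept, transform = level[slot]
                bound[concept] = transform if transform is not None else value
                used.add(slot)
    return bound
```

For example, bind({"Yes": "yes"}) binds the registered concept to True, while bind({"Location": "wean hall"}) binds query.location even though the system did not just ask for it.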
RavenClaw-based Systems

- RoomLine
- CMU Let’s Go!! Bus Information System
- LARRI [Symphony]
- TeamTalk [11-741]
- Eureka [11-743]
Three Components

0. Infrastructure
1. Error awareness
   - Develop indicators that assess reliability of information
   - ... and assess how well the dialogue is advancing
2. Error recovery strategies
   - Develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process
   - Develop a scalable reinforcement-learning based approach for error recovery in spoken dialogue systems
Existing Work

- Confidence annotation
  - Traditionally focused on the speech recognizer [Bansal, Chase, Cox, and others]
  - Recently, multiple sources of knowledge: recognition, parsing, dialogue management [San-Segundo, Walker, Bosch, Bohus, and others]
  - Detect misunderstandings: ~80-90% accuracy

- Correction and aware site detection [Swerts, Litman, Levow and others]
  - Multiple sources of knowledge
  - Detect corrections: ~80-90% accuracy
Proposed: Belief Updating

- Continuously assess beliefs in light of initial confidence and subsequent events

- An example:
  S: Where are you flying from?
  U: [CityName={Aspen/0.6; Austin/0.2}]              <- initial belief
  S: Did you say you wanted to fly out of Aspen?     <- system action
  U: [No] [CityName={Boston/0.8}]                    <- user response
  [CityName={Aspen/?; Austin/?; Boston/?}]           <- updated belief
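One way to picture the example above is as a renormalized merge of the old belief with the new evidence. The sketch below is a simple heuristic stand-in, not the learned belief updating model proposed here; the discount factor is invented:

```python
def update_belief(belief, confirmed, answered_no, new_hyps, discount=0.1):
    """Heuristic belief update after an explicit confirmation.
    belief:      dict mapping value -> probability mass
    confirmed:   the value the system asked about
    answered_no: True if the user rejected that value
    new_hyps:    dict of newly recognized values -> confidence
    (The discount of a rejected value is an illustrative choice.)"""
    updated = dict(belief)
    if answered_no and confirmed in updated:
        updated[confirmed] *= discount          # user rejected this value
    for value, conf in new_hyps.items():        # merge in new evidence
        updated[value] = updated.get(value, 0.0) + conf
    total = sum(updated.values())
    return {v: p / total for v, p in updated.items()}
```

On the example, update_belief({"Aspen": 0.6, "Austin": 0.2}, "Aspen", True, {"Boston": 0.8}) shifts most of the mass onto Boston, which is the qualitative behavior the dynamic belief network is meant to learn from data rather than hard-code.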
Belief Updating: Approach

- Model the update in a dynamic belief network

[Dynamic belief network: the belief over a concept C at time t (initial belief) and at time t+1 (updated belief), conditioned on the system action and on user response features: CurrentTop, Current2nd and Current3rd contents, Confidence, Yes, No, Correction, Utterance Length, Positive Markers, Negative Markers]
Three Components

0. Infrastructure
1. Error awareness
   - Develop indicators that assess reliability of information
   - ... and assess how well the dialogue is advancing
2. Error recovery strategies
   - Develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process
   - Develop a scalable reinforcement-learning based approach for error recovery in spoken dialogue systems
Is the Dialogue Advancing Normally?

- Locally, turn-level: non-understanding indicators
  - Non-understanding flag directly available
  - Develop additional indicators (recognition, understanding, interpretation)

- Globally, discourse-level: dialogue-on-track indicators
  - Summary statistics of non-understanding indicators
  - Rate of dialogue advance
Three Components

0. Infrastructure
1. Error awareness
   - Develop indicators that assess reliability of information
   - ... and assess how well the dialogue is advancing
2. Error recovery strategies
   - Develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process
   - Develop a scalable reinforcement-learning based approach for error recovery in spoken dialogue systems
Error Recovery Strategies

- Identify
  - Identify and define an extended set of error handling strategies

- Implement
  - Construct task-decoupled implementations of a large number of strategies

- Evaluate
  - Evaluate performance and bring further refinements
List of Error Recovery Strategies

User initiated:
- Help
- Where are we?
- Start over
- Scratch concept value
- Go back
- Channel establishment
- Suspend/Resume
- Repeat
- Summarize
- Quit

System initiated:
- Ensure that the system has reliable information (misunderstandings):
  - Explicit confirmation
  - Implicit confirmation
  - Disambiguation
  - Ask repeat concept
  - Reject concept
- Ensure that the dialogue is on track:
  - Local problems (non-understandings):
    - Switch input modality
    - SNR repair
    - Ask repeat turn
    - Ask rephrase turn
    - Notify non-understanding
    - Explicit confirm turn
    - Targeted help
    - WH-reformulation
    - Keep-a-word reformulation
    - Generic help
    - You can say
  - Global problems (compounded, discourse-level problems):
    - Restart subtask plan
    - Select alternative plan
    - Start over
    - Terminate session / Direct to operator
Error Recovery Strategies: Evaluation

- Reusability
  - Deploy in different spoken dialogue systems

- Efficiency of non-understanding strategies
  - Simple metric: is the next utterance understood?
  - Efficiency depends on the decision process
  - Construct upper and lower bounds for efficiency:
    - Lower bound: a decision process that chooses strategies uniformly at random
    - Upper bound: a human performs the decision process (WOZ)
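The simple efficiency metric above can be tabulated per strategy from interaction logs; the log format in this sketch is hypothetical:

```python
def recovery_efficiency(log):
    """Fraction of non-understanding recoveries, per strategy, where the
    user's next utterance was understood. `log` is a list of
    (strategy, next_utterance_understood) pairs -- a hypothetical format."""
    totals, successes = {}, {}
    for strategy, understood in log:
        totals[strategy] = totals.get(strategy, 0) + 1
        successes[strategy] = successes.get(strategy, 0) + int(understood)
    return {s: successes[s] / totals[s] for s in totals}
```

Running the same tabulation over logs from the uniformly-random and wizard-operated conditions would yield the lower and upper bounds described above.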
Three Components

0. Infrastructure
1. Error awareness
   - Develop indicators that assess reliability of information
   - ... and assess how well the dialogue is advancing
2. Error recovery strategies
   - Develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process
   - Develop a scalable reinforcement-learning based approach for error recovery in spoken dialogue systems
Previous Reinforcement Learning Work

- Dialogue control ~ Markov Decision Process
  - States
  - Actions
  - Rewards
  [MDP diagram: action A leading from state S1 to states S2, S3]

- Previous work: successes in small domains
  - NJFun [Singh, Kearns, Litman, Walker et al]

- Problems
  - Lack of scalability
  - Once learned, policies are not reusable
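To make the MDP formulation concrete, here is value iteration on a toy confirmation problem. The states, actions, transition probabilities and rewards are invented for illustration; they are not from NJFun or from this proposal:

```python
# Toy MDP: states are dialogue situations, actions are control choices.
# T[s][a] is a list of (probability, next_state, reward); "done" is terminal.
T = {
    "unconfirmed": {
        "explicit_confirm": [(0.9, "confirmed", -1.0), (0.1, "unconfirmed", -1.0)],
        "accept":           [(0.7, "confirmed",  0.0), (0.3, "wrong",      0.0)],
    },
    "confirmed": {"continue": [(1.0, "done",  10.0)]},
    "wrong":     {"continue": [(1.0, "done", -10.0)]},
    "done": {},
}

def value_iteration(T, gamma=0.95, iters=100):
    """Compute state values by repeatedly backing up expected returns."""
    V = {s: 0.0 for s in T}
    for _ in range(iters):
        for s, actions in T.items():
            if actions:
                V[s] = max(sum(p * (r + gamma * V[s2]) for p, s2, r in outs)
                           for outs in actions.values())
    return V

def greedy_policy(T, V, gamma=0.95):
    """Extract the policy that is greedy with respect to the values V."""
    return {s: max(acts, key=lambda a: sum(p * (r + gamma * V[s2])
                                           for p, s2, r in acts[a]))
            for s, acts in T.items() if acts}
```

Under these particular numbers the learned policy confirms explicitly rather than accepting blindly, because the small cost of an extra turn outweighs the risk of acting on a wrong value. Scaling such a state space to a full task is exactly the problem the gated approach below is meant to address.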
Proposed Approach

Overcome previous shortcomings:

1. Focus learning only on error handling
   - Reduces the size of the learning problem
   - Favors reusability of learned policies
   - Lessens the system development effort

2. Use a “divide-and-conquer” approach
   - Leverage independences in dialogue
Gated Markov Decision Processes

[Diagram: the dialogue task tree (RoomLine -> Login: Welcome, GreetUser, AskRegistered, AskName) with a Topic-MDP for each topic (RoomLine, Login) and a Concept-MDP for each concept (registered, user_name); each MDP proposes an action (e.g., No Action, Explicit Confirmation), and a Gating Mechanism selects which one to execute]

- Small-size models
- Parameters can be tied across models
- Easy to design initial policies
- Decoupling favors reusability of policies
- Accommodates dynamic task generation
- Independence assumption
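A minimal sketch of the gated arrangement: each small MDP proposes an action for its topic or concept, and the gate picks one, preferring topics over concepts and entities closer to the conversational focus. The policies, priorities and numbers here are illustrative stand-ins, not the learned models proposed in this work:

```python
# Stand-in policies: in the proposed framework these would be the learned
# policies of the individual Concept-MDPs and Topic-MDPs.
def concept_policy(confidence):
    return "explicit_confirm" if confidence < 0.6 else "no_action"

def topic_policy(consecutive_nonunderstandings):
    return "targeted_help" if consecutive_nonunderstandings >= 2 else "no_action"

def gate(proposals):
    """Pick one action per turn: prefer topic proposals over concept
    proposals, and entities closer to the conversational focus
    (smaller focus_distance)."""
    active = [p for p in proposals if p["action"] != "no_action"]
    if not active:
        return {"action": "no_action"}
    rank = {"topic": 0, "concept": 1}
    return min(active, key=lambda p: (rank[p["kind"]], p["focus_distance"]))

proposals = [
    {"kind": "concept", "entity": "registered", "focus_distance": 0,
     "action": concept_policy(0.4)},
    {"kind": "concept", "entity": "user_name", "focus_distance": 1,
     "action": concept_policy(0.9)},
    {"kind": "topic", "entity": "Login", "focus_distance": 0,
     "action": topic_policy(0)},
]
```

On this turn only the low-confidence "registered" concept proposes an action, so the gate selects an explicit confirmation on it; the point of the decomposition is that each proposer sees only its own small state space.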
Reward structure & learning

- Global, post-gate rewards
  [Diagram: the Gating Mechanism emits the action; a single reward feeds back to all MDPs]
  - Rewards based on any dialogue performance metric
  - Atypical, multi-agent reinforcement learning setting

- Local rewards
  [Diagram: each MDP receives its own reward]
  - Multiple, standard RL problems
  - Model-based approaches
Evaluation

- Performance
  - Compare learned policies with initial heuristic policies
  - Metrics:
    - Task completion
    - Efficiency
    - Number and lengths of error segments
    - User satisfaction

- Scalability
  - Deploy in a system operating with a sizable task
  - Theoretical analysis
Outline

- Problem
- Approach
- Infrastructure
- Research Program
- Summary & Timeline
Summary of Contributions

Overall goal: develop a task-independent, adaptive and scalable framework for error recovery in task-oriented spoken dialogue systems

- Modern dialogue management framework
- Belief updating framework
- Investigation of an extended set of error handling strategies
- Scalable data-driven approach for learning error handling policies
Timeline

[Timeline chart: proposal (now) -> milestone 1 -> milestone 2 (end of year 4) -> milestone 3 (end of year 5) -> defense (5.5 years)]

- Indicators: data collection for belief updating and WOZ study; develop and evaluate the belief updating models; implement dialogue-on-track indicators
- Strategies: misunderstanding and non-understanding strategies; evaluate non-understanding strategies; develop remaining strategies
- Decisions: investigate theoretical aspects of proposed reinforcement learning model; data collection for RL training; error handling decision process: reinforcement learning experiments; data collection for RL evaluation
- Contingency data collection efforts; additional experiments: extensions or contingency work
Thank You!

Questions & Comments
(committee members, then floor)
Indicators: Goals

- Goal: increase awareness and capacity to detect problems
  - Develop indicators which can inform the error handling process about potential problems

[Diagram: outcomes of the understanding process]
- System acquires correct information -> OK
- System acquires incorrect information -> Misunderstanding
- System does not acquire information -> Non-understanding
Data Collection and Experiments: Detailed Timeline

[Gantt chart spanning Spring ’04 through Winter ’05-06 (months 1-24, years 4-6), with milestones M1, M2, M3; rows: Experiments, Indicators, Strategies, Decision Process, Writing]

Experiments:
- BDC: background data collection (ongoing)
- [5] DC-1 (4 months): data collection for belief updating and non-understanding strategies evaluation
- WOZ: wizard-of-oz experiment for non-understanding strategies
- [9] DC-2L (3 months): data collection for decision process training and baselines
- [11] DC-2E (2 months): data collection for decision process evaluation
- [14] (6 months): contingency (or extension work items) data collections / experiments

Indicators (belief updating; non-understanding and dialogue-on-track indicators):
- [6] (5 months): build and evaluate belief updating models, integrate in RavenClaw
- [7] (3 months): implement dialogue-on-track indicators

Strategies (error prevention and recovery strategies):
- [1] (4 months): finish RavenClaw implementations for the misunderstanding and non-understanding strategies
- [4] (6 months): evaluate non-understanding strategies in random exploration mode and in a WOZ setting; develop the rest of the error handling strategies

Decision process (reinforcement learning work):
- [2] (12 months): investigate further the theoretical aspects of the proposed RL model, establish the final structure for the topic and concept MDPs, design initial policies, and finalize the structure of the gating function; implement the models in the RavenClaw dialogue management framework
- [10] (6 months): perform reinforcement learning experiments/evaluation for the decision process
- [15] (6 months): refinements of the proposed model; follow-up work evaluating adaptability and reusability of policies
- [16] (6 months, contingency time): alternative data-driven models

Writing:
- [3] (3 months): write paper on RavenClaw conversational strategies for error handling
- [8] (3 months): write belief updating paper
- [13] (3 months): write decision process paper
- [12] (10 months): write thesis document
Three Desired Properties

- Task-independence
  - Reuse the proposed architecture across different spoken dialogue systems with a minimal amount of authoring effort

- Adaptability
  - Learn from experience how to adapt to the characteristics of various domains

- Scalability
  - Applicable in spoken dialogue systems operating with large, practical tasks
[Backup slide: diagram of an example error handling policy, mapping confidence levels (LC / MC / HC) to actions (ExplConf, ImplConf, NoAct)]
Belief Updating: Approach

- Model the update in a dynamic belief network
- Top-N values
- Fixed structure; learn parameters
- Data collection
- Evaluation
  - Accuracy
  - Soft-error

[Dynamic belief network: belief over a concept C at time t and t+1, conditioned on the system action and on user response features: CurrentTop, Current2nd, Current3rd, Confidence, Yes, No, Utterance Length, Positive Markers, Negative Markers]
Gated Markov Decision Processes

[Diagram: the dialogue task tree (RoomLine -> Login: Welcome, GreetUser, AskRegistered, AskName) with Topic-MDPs (RoomLine, Login) and Concept-MDPs (registered, user_name) each proposing an action (No Action, Explicit Confirmation, Explicit Confirm); a Gating Mechanism selects among them]

Issues:
- Structure of individual MDPs
- Gating mechanism
- Reward structure and learning
Structure for individual MDPs

- State-space: an informative subset of the corresponding indicators
  - Concept-MDPs: confidence / beliefs
  - Topic-MDPs: non-understanding, dialogue-on-track indicators

- Action-space: the corresponding system-initiated error handling strategies
Gating Mechanism

- Heuristic derived from domain-independent dialogue principles
  - Give priority to topics over concepts
  - Give priority to entities closer to the conversational focus