Error Awareness and Recovery in Task-Oriented Spoken Dialogue Systems
Thesis Proposal
Dan Bohus
Carnegie Mellon University, January 2004
Thesis Committee: Alex Rudnicky (Chair), Roni Rosenfeld, Jeff Schneider, Eric Horvitz (Microsoft Research)

Problem
- Lack of robustness when faced with understanding errors
- Spans most domains and interaction types
- Has a significant impact on performance

An example (speech recognition results in brackets)
S: Are you a registered user?
U: No I'm not. No [NO I'M NOT NO]
S: What is your full name?
U: Adam Schumacher [ATHENS IN AKRON]
S: There is an Athens in Georgia and in Greece. Which destination did you want?
U: Neither [NEITHER]
S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
U: Georgia [GEORGIA]
S: A flight from Athens... Where do you want to go?
U: Start over [START OVER]
S: Do you really want to start over?
U: Yes [YES]
S: What city are you leaving from?
U: Hamilton, Ontario [HILTON ONTARIO]
S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
U: Hamilton [HILTON]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Toronto [TORONTO]

Some Statistics
Semantic error rates:
- CMU Communicator [CMU]: 32%
- CU Communicator [CU]: 27%
- How May I Help You? [AT&T]: 36%
- Jupiter [MIT]: 28%
- SpeechActs [SRI]: 25%
Corrections [Krahmer, Swerts, Litman, Levow]:
- 30% of utterances correct system mistakes
- Corrections are 2-3 times more likely to be misrecognized

Significant Impact on Interaction
- CMU Communicator: [chart of failed sessions and sessions that contain understanding errors; values shown: 26%, 40%, 33%]
- Multi-site Communicator Corpus [Shin et al]: [chart of failed sessions; values shown: 37%, 63%]

Outline
- Problem
- Approach
- Infrastructure
- Research Program
- Timeline & Summary

Increasing Robustness
- Increase the accuracy of speech recognition (but as ASR performance increases, the demands placed on systems increase too)
- Assume recognition is unreliable, and create mechanisms for acting robustly at the dialogue management level (the more general approach)

Snapshot of Existing Work: Slide 1
- Theoretical models of grounding: Contribution Model [Clark], Grounding Acts [Traum]
  - Analytical/descriptive, not decision-oriented
- Practice: heuristic rules
  - Misunderstandings: threshold(s) on confidence scores
  - Non-understandings: ad hoc
  - Rules lack generality and are not easy to extend

Snapshot of Existing Work: Slide 2
- Conversation as Action under Uncertainty [Paek and Horvitz]
  - Belief networks to model uncertainties
  - Decisions based on expected utility and value-of-information (VOI) analysis
- Reinforcement learning for dialogue control policies [Singh, Kearns, Litman, Walker, Levin, Pieraccini, Young, Scheffler, etc.]
  - Formulate dialogue control as an MDP
  - Learn the optimal control policy from data
  - Do not scale up to complex, real-world domains

Research Program: Goals & Approach
- Goal: a task-independent, adaptive and scalable framework for error recovery in task-oriented spoken dialogue systems
- Approach: decision making under uncertainty
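To make the contrast between hand-set confidence thresholds and expected-utility decisions concrete, here is a small sketch that picks a grounding action for one concept value by maximizing expected utility. The action names and the utility table are invented for illustration; they are not taken from [Paek and Horvitz] or from any deployed system.

```python
from typing import Dict, Tuple

# Hypothetical utilities (arbitrary units) for each grounding action, as a pair:
# (utility if the recognized value is correct, utility if it is wrong).
UTILITIES: Dict[str, Tuple[float, float]] = {
    "accept":           (0.0, -10.0),  # use the value silently; expensive if wrong
    "implicit_confirm": (-0.5, -6.0),  # echo the value while moving on
    "explicit_confirm": (-1.0, -4.0),  # spend a turn asking "Did you say ...?"
    "reject":           (-3.0, -3.0),  # discard the value and ask again
}


def expected_utility(action: str, p_correct: float) -> float:
    """EU(a) = p * U(a, correct) + (1 - p) * U(a, wrong)."""
    u_correct, u_wrong = UTILITIES[action]
    return p_correct * u_correct + (1.0 - p_correct) * u_wrong


def choose_action(p_correct: float) -> str:
    """Return the action with the highest expected utility at this confidence."""
    return max(UTILITIES, key=lambda a: expected_utility(a, p_correct))


if __name__ == "__main__":
    for p in (0.2, 0.6, 0.85, 0.95):
        print(f"confidence {p:.2f} -> {choose_action(p)}")
```

With fixed utilities, the rule splits the confidence axis into bands. Here those bands are: reject below about 0.33, explicit confirmation up to 0.8, implicit confirmation up to about 0.89, and accept above that. The thresholds follow from the utilities rather than being tuned by hand.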
Three Components
0. Infrastructure
1. Error awareness: develop indicators that
   - assess the reliability of information
   - assess how well the dialogue is advancing
2. Error recovery strategies: develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process: develop a scalable, reinforcement-learning-based approach for error recovery in spoken dialogue systems

Infrastructure (completed)
- RavenClaw: a modern dialogue management framework for complex, task-oriented domains (completed)
- RavenClaw-based spoken dialogue systems: a test-bed for evaluation

RavenClaw
[Architecture diagram]
- Dialogue Task (Specification): the RoomLine task tree, with agents such as RoomLine, Login, GetQuery, GetResults, DiscussResults, Welcome, GreetUser, AskRegistered and AskName, and concepts such as registered, user_name, query (DateTime, Location, Network, Properties: Projector, Whiteboard) and results
- Domain-Independent Dialogue Engine
  - Dialogue Stack (top to bottom): ExplicitConfirm (an error handling strategy), AskRegistered, Login, RoomLine
  - Expectation Agenda (top level first):
    registered: [No] -> false, [Yes] -> true
    user_name: [UserName]
    query.date_time: [DateTime], query.location: [Location], query.network: [Network]
  - Error handling: Indicators, Strategies, Error Handling Decision Process

RavenClaw-based Systems
- RoomLine
- CMU Let’s Go!! Bus Information System
- LARRI [Symphony]
- TeamTalk [11-741]
- Eureka [11-743]
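Here is a rough sketch of how an expectation agenda like the one pictured above might be built and used. Each agent on the dialogue stack contributes the concepts it expects. The agenda is ordered from the top of the stack (the conversational focus) downward. Each grammar slot in the parse is bound to the highest-ranked expectation that accepts it. The `Expectation` and `Agent` classes and the `build_agenda` and `bind` functions are hypothetical simplifications written for this example, not RavenClaw's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Expectation:
    concept: str                    # concept to fill, e.g. "registered"
    slot: str                       # grammar slot that can fill it, e.g. "[Yes]"
    value: Optional[object] = None  # fixed value to bind; None means "use the slot's text"


@dataclass
class Agent:
    name: str
    expectations: List[Expectation] = field(default_factory=list)


def build_agenda(stack: List[Agent]) -> List[List[Expectation]]:
    """One agenda level per agent; the top of the stack (last element) comes first."""
    return [agent.expectations for agent in reversed(stack)]


def bind(agenda: List[List[Expectation]], parse: Dict[str, str]) -> Dict[str, object]:
    """Bind each parsed slot to the first matching expectation, scanning from the focus down."""
    bindings: Dict[str, object] = {}
    used_slots = set()
    for level in agenda:
        for exp in level:
            if exp.slot in parse and exp.slot not in used_slots and exp.concept not in bindings:
                bindings[exp.concept] = exp.value if exp.value is not None else parse[exp.slot]
                used_slots.add(exp.slot)
    return bindings


# Hypothetical stack for the RoomLine example; AskRegistered is in focus.
stack = [
    Agent("RoomLine", [Expectation("query.date_time", "[DateTime]"),
                       Expectation("query.location", "[Location]"),
                       Expectation("query.network", "[Network]")]),
    Agent("Login", [Expectation("user_name", "[UserName]")]),
    Agent("AskRegistered", [Expectation("registered", "[No]", False),
                            Expectation("registered", "[Yes]", True)]),
]

if __name__ == "__main__":
    agenda = build_agenda(stack)
    print(bind(agenda, {"[No]": "no"}))  # {'registered': False}
    print(bind(agenda, {"[UserName]": "adam schumacher", "[DateTime]": "tomorrow"}))
```

Input that matches an expectation deeper in the agenda, such as a date while the system asks about registration, still gets bound. Expectations closer to the focus simply get the first chance.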
Existing Work
Confidence annotation
- Traditionally focused on the speech recognizer [Bansal, Chase, Cox, and others]
- Recently, multiple sources of knowledge [San-Segundo, Walker, Bosch, Bohus, and others]: recognition, parsing, dialogue management
- Detect misunderstandings: ~80-90% accuracy
Correction and aware site detection [Swerts, Litman, Levow and others]
- Multiple sources of knowledge
- Detect corrections: ~80-90% accuracy

Proposed: Belief Updating
Continuously assess beliefs in light of the initial confidence and subsequent events. An example:
S: Where are you flying from?
U: [CityName={Aspen/0.6; Austin/0.2}]   (initial belief)
S: Did you say you wanted to fly out of Aspen?   (system action)
U: [No] [CityName={Boston/0.8}]   (user response)
Updated belief: [CityName={Aspen/?; Austin/?; Boston/?}]

Belief Updating: Approach
Model the update in a dynamic belief network, from time t to time t+1:
- Initial belief C (time t): CurrentTop, Current2nd and Current3rd values with their confidences
- System action
- User response features: Contents, Confidence, Yes, No, Correction, Utterance Length, Positive Markers, Negative Markers
- Updated belief C (time t+1)
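The proposal learns this update as a dynamic belief network. As a stand-in that only shows the shape of the computation, the sketch below applies a hand-set naive-Bayes update to the Aspen example. Several pieces are assumptions made for this example, not parameters from the proposal: the likelihood values, the `<other>` bucket, and the rule that gives a newly heard value a small share of the `<other>` mass.

```python
from typing import Dict, Optional

OTHER = "<other>"  # bucket for all values not explicitly tracked


def normalize(belief: Dict[str, float]) -> Dict[str, float]:
    total = sum(belief.values())
    return {value: p / total for value, p in belief.items()}


def update_after_explicit_confirm(
    prior: Dict[str, float],
    confirmed: str,
    answer: str,                      # "yes" or "no"
    new_value: Optional[str] = None,  # a value heard in the same response, if any
    new_conf: float = 0.0,            # recognizer confidence for new_value
    p_wrong_answer: float = 0.1,      # assumed P(the yes/no answer is wrong)
    p_false_alarm: float = 0.05,      # assumed P(hearing new_value if it is not the true value)
    new_value_prior: float = 0.05,    # assumed prior carved out of OTHER for a newly heard value
) -> Dict[str, float]:
    """Naive-Bayes update of a concept's belief after an explicit confirmation.

    `prior` must contain an OTHER entry holding the mass of untracked values.
    """
    belief = dict(prior)  # do not mutate the caller's belief
    if new_value is not None and new_value not in belief:
        share = min(new_value_prior, belief[OTHER])
        belief[OTHER] -= share
        belief[new_value] = share

    posterior = {}
    for value, p in belief.items():
        # Evidence 1: the yes/no answer about the confirmed value.
        answer_matches = (answer == "yes") == (value == confirmed)
        likelihood = 1 - p_wrong_answer if answer_matches else p_wrong_answer
        # Evidence 2: a new value heard in the same response
        # (recognizer confidence used as a stand-in likelihood).
        if new_value is not None:
            likelihood *= new_conf if value == new_value else p_false_alarm
        posterior[value] = p * likelihood
    return normalize(posterior)


if __name__ == "__main__":
    initial = {"Aspen": 0.6, "Austin": 0.2, OTHER: 0.2}
    updated = update_after_explicit_confirm(initial, "Aspen", "no", "Boston", 0.8)
    for value, p in sorted(updated.items(), key=lambda kv: -kv[1]):
        print(f"{value:8s} {p:.3f}")
```

With these made-up numbers, the "No" answer pushes Aspen down to about 0.05, and the corrected value Boston rises to about 0.66. In the proposed approach, the corresponding probabilities would instead be learned from data. They would also be conditioned on the richer response features listed above, such as utterance length and positive and negative markers.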
Is the Dialogue Advancing Normally?
- Locally, at the turn level: non-understanding indicators
  - A non-understanding flag is directly available
  - Develop additional indicators at the recognition, understanding and interpretation levels
- Globally, at the discourse level: dialogue-on-track indicators
  - Summary statistics of non-understanding indicators
  - Rate of dialogue advance

Error Recovery Strategies
- Identify: identify and define an extended set of error handling strategies
- Implement: construct task-decoupled implementations of a large number of strategies
- Evaluate: evaluate performance and bring further refinements
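One way to read "task-decoupled implementations" is this: a strategy is written once against a small, generic interface, and each task supplies only task-specific material, such as how to phrase a concept value. The `Task` class, the strategy functions and the template strings below are hypothetical. They were chosen so that the flight example reproduces the confirmation prompt from the belief-updating example.

```python
from typing import Dict


class Task:
    """Task-specific knowledge supplied by a (hypothetical) system author:
    only how to phrase a concept's value, nothing about error handling."""

    def __init__(self, phrases: Dict[str, str]):
        self.phrases = phrases

    def phrase(self, concept: str, value: str) -> str:
        # Fall back to a generic phrasing for concepts without a template.
        template = self.phrases.get(concept, "the " + concept.replace("_", " ") + " is {value}")
        return template.format(value=value)


# Task-independent strategies: written once, reused by every task.
def explicit_confirm(task: Task, concept: str, value: str) -> str:
    return f"Did you say {task.phrase(concept, value)}?"


def implicit_confirm(task: Task, concept: str, value: str, next_prompt: str) -> str:
    return f"Okay, {task.phrase(concept, value)}. {next_prompt}"


def notify_nonunderstanding(last_prompt: str) -> str:
    return f"I'm sorry, I didn't catch that. {last_prompt}"


if __name__ == "__main__":
    flights = Task({"origin": "you wanted to fly out of {value}"})
    roomline = Task({"date_time": "you need a room {value}"})
    print(explicit_confirm(flights, "origin", "Aspen"))
    print(implicit_confirm(flights, "origin", "Aspen", "Where do you want to go?"))
    print(explicit_confirm(roomline, "network", "wireless"))  # generic fallback
    print(notify_nonunderstanding("Which destination did you want?"))
```

Adding a new task then means adding phrasing templates, not new strategy code. That is the property the Evaluate step can test by deploying the same strategies in different systems.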
List of Error Recovery Strategies
User-initiated:
- Help
- Where are we?
- Start over
- Scratch concept value
- Go back
- Channel establishment
- Suspend/Resume
- Repeat
- Summarize
- Quit
System-initiated:
- Ensure that the system has reliable information (misunderstandings):
  Explicit confirmation; Implicit confirmation; Disambiguation; Ask repeat concept; Reject concept
- Ensure that the dialogue is on track
  - Local problems (non-understandings):
    Switch input modality; SNR repair; Ask repeat turn; Ask rephrase turn; Notify non-understanding; Explicit confirm turn; Targeted help; WH-reformulation; Keep-a-word reformulation; Generic help; You can say
  - Global problems (compounded, discourse-level problems):
    Restart subtask plan; Select alternative plan; Start over; Terminate session / Direct to operator

Error Recovery Strategies: Evaluation
- Reusability: deploy in different spoken dialogue systems
- Efficiency of non-understanding strategies
  - Simple metric: is the next utterance understood?
  - Efficiency depends on the decision process, so construct lower and upper bounds for it:
    - Lower bound: a decision process that chooses among strategies uniformly at random
    - Upper bound: a human performs the decision process (Wizard-of-Oz)

Previous Reinforcement Learning Work
- Dialogue control ~ Markov Decision Process: states, actions, rewards [diagram: in state S1, action A leads to state S2 or S3]
- Previous work: successes in small domains, e.g. NJFun [Singh, Kearns, Litman, Walker et al]
- Problems
  - Lack of scalability
  - Once learned, policies are not reusable

Proposed Approach
Overcome previous shortcomings:
1. Focus learning only on error handling
   - Reduces the size of the learning problem
   - Favors reusability of learned policies
   - Lessens the system development effort
2. Use a “divide-and-conquer” approach
   - Leverage independences in dialogue

Gated Markov Decision Processes
[Diagram: the RoomLine task tree (agents RoomLine, Login, Welcome, GreetUser, AskRegistered, AskName) with a Topic-MDP attached to topic agents such as RoomLine and Login and a Concept-MDP attached to concepts such as registered and user_name; each MDP proposes an action (e.g. No Action, Explicit Confirmation) and a gating mechanism arbitrates among the proposals.]
- Small-size models
- Parameters can be tied across models
- Initial policies are easy to design
- Decoupling favors reusability of policies
- Accommodates dynamic task generation
- Relies on an independence assumption between the MDPs

Reward Structure & Learning
- Rewards can be based on any dialogue performance metric
- Global, post-gate rewards: a single reward follows the action chosen by the gating mechanism; this yields an atypical, multi-agent reinforcement learning setting
- Local rewards: each MDP receives its own reward; this yields multiple standard RL problems, amenable to model-based approaches

Evaluation
- Performance: compare learned policies with the initial heuristic policies
  - Metrics: task completion; efficiency; number and lengths of error segments; user satisfaction
- Scalability
  - Deploy in a system operating with a sizable task
  - Theoretical analysis

Summary of Contributions
Overall goal: develop a task-independent, adaptive and scalable framework for error recovery in task-oriented spoken dialogue systems
- Modern dialogue management framework
- Belief updating framework
- Investigation of an extended set of error handling strategies
- Scalable data-driven approach for learning error handling policies
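The "number and lengths of error segments" metric listed under Evaluation can be computed from a per-turn log. For illustration, the sketch assumes that an error segment is a maximal run of consecutive turns flagged as erroneous (misunderstood or not understood). The slides do not fix this definition, and the log below is made up.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def error_segments(turn_has_error: Sequence[bool]) -> List[int]:
    """Lengths, in turns, of maximal runs of consecutive erroneous turns."""
    return [len(list(run)) for is_error, run in groupby(turn_has_error) if is_error]


def segment_metrics(turn_has_error: Sequence[bool]) -> Dict[str, float]:
    segments = error_segments(turn_has_error)
    return {
        "num_segments": len(segments),
        "mean_length": sum(segments) / len(segments) if segments else 0.0,
        "max_length": max(segments, default=0),
    }


if __name__ == "__main__":
    # Hypothetical per-turn flags: True = misunderstanding or non-understanding.
    log = [False, True, True, False, True, False, False, True, True, True]
    print(error_segments(log))   # [2, 1, 3]
    print(segment_metrics(log))  # {'num_segments': 3, 'mean_length': 2.0, 'max_length': 3}
```

Comparing these statistics between the initial heuristic policies and the learned policies then reduces to running the same function over both sets of logs.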
Timeline
Tracks, in order, from the proposal (now) to the defense (5.5 years), with markers at the end of year 4 and the end of year 5 and milestones 1, 2 and 3:
- Strategies: misunderstanding and non-understanding strategies; evaluate non-understanding strategies; develop remaining strategies
- Indicators: develop and evaluate the belief updating models; implement dialogue-on-track indicators
- Data: data collection for belief updating and WOZ study; data collection for RL training; data collection for RL evaluation; contingency data collection efforts
- Decisions: investigate theoretical aspects of the proposed reinforcement learning model; error handling decision process: reinforcement learning experiments
- Additional experiments: extensions or contingency work

Thank You! Questions & Comments
(Committee members first, then the floor.)

Indicators: Goals
- Goal: increase awareness and the capacity to detect problems
- Develop indicators that can inform the error handling process about potential problems
Outcomes of the understanding process:
- System acquires correct information: OK
- System acquires incorrect information: misunderstanding
- System does not acquire information: non-understanding

Detailed Schedule
24-month Gantt chart starting in 2004; milestones M1, M2 and M3 fall at months 7, 12 and 18. Work items by track ([item number], duration):
Data collection and experiments
- BDC: background data collection
- [5] 4 months: DC-1, data collection for belief updating and non-understanding strategies evaluation
- WOZ: Wizard-of-Oz experiment for non-understanding strategies
- [9] 3 months: DC-2L, data collection for decision process training and baselines
- [11] 2 months: DC-2E, data collection for decision process evaluation
- [14] 6 months: contingency (or extension work items) data collections / experiments
Indicators
- Belief Updating (Work Item 5): [6] 5 months, build and evaluate belief updating models, integrate in RavenClaw
- Non-understanding and Dialogue-On-Track Indicators (Work Item 6): [7] 3 months, implement dialogue-on-track indicators
Strategies: error prevention and recovery strategies (Work Item 8)
- [1] 4 months: finish RavenClaw implementations for the misunderstanding and non-understanding strategies
- [4] 6 months: evaluate non-understanding strategies in random exploration mode and in a WOZ setting; develop the rest of the error handling strategies
Decision process: reinforcement learning work (Work Item 9)
- [2] 12 months: investigate further the theoretical aspects of the proposed RL model, establish the final structure for the topic and concept MDPs, design initial policies, and finalize the structure of the gating function; implement the models in the RavenClaw dialogue management framework
- [10] 6 months: perform reinforcement learning experiments/evaluation for the decision process
- [15] 6 months: refinements of the proposed model; follow-up work evaluating adaptability and reusability of policies
- [16] 6 months (contingency time): alternative data-driven models
Writing
- [3] 3 months: paper on RavenClaw conversational strategies for error handling
- [8] 3 months: belief updating paper
- [13] 3 months: decision process paper
- [12] 10 months: thesis document

Three Desired Properties
- Task-independence: reuse the proposed architecture across different spoken dialogue systems with a minimal amount of authoring effort
- Adaptability: learn from experience how to adapt to the characteristics of various domains
- Scalability: applicable in spoken dialogue systems operating with large, practical tasks

[Backup figure: diagram relating the states LC, MC and HC to the actions ExplConf, ImplConf and NoAct.]
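As a toy illustration of learning a concept-level policy with a model-based approach, the sketch below runs value iteration on a three-state concept-MDP. Its states are confidence levels (LC, MC, HC, read here as low, medium and high confidence), and its actions are ExplConf, ImplConf and NoAct. The per-state probabilities of correctness and all rewards are invented. In a model-based approach, they would be estimated from collected dialogue data.

```python
from typing import Dict, List, Optional, Tuple

# (probability, next state or None if the episode ends, reward)
Outcome = Tuple[float, Optional[str], float]
Model = Dict[str, Dict[str, List[Outcome]]]

# Hypothetical probability that the concept's top value is correct in each state.
P_CORRECT = {"LC": 0.3, "MC": 0.7, "HC": 0.95}


def build_concept_mdp() -> Model:
    """Toy concept-MDP; all probabilities and rewards are invented for illustration."""
    model: Model = {}
    for state, p in P_CORRECT.items():
        model[state] = {
            # Accept the value: free if correct, a costly downstream error if wrong.
            "NoAct": [(p, None, 0.0), (1 - p, None, -10.0)],
            # Ask "Did you say ...?": one turn if correct, a longer repair if wrong.
            "ExplConf": [(p, None, -1.0), (1 - p, None, -3.0)],
            # Echo the value and move on; if wrong, the user's correction leaves
            # the concept in the low-confidence state.
            "ImplConf": [(p, None, -0.5), (1 - p, "LC", -1.0)],
        }
    return model


def q_value(values: Dict[str, float], outcomes: List[Outcome], gamma: float) -> float:
    return sum(
        prob * (reward + (gamma * values[nxt] if nxt is not None else 0.0))
        for prob, nxt, reward in outcomes
    )


def value_iteration(model: Model, gamma: float = 0.95, tol: float = 1e-9):
    """In-place value iteration; returns state values and the greedy policy."""
    values = {state: 0.0 for state in model}
    while True:
        delta = 0.0
        for state, actions in model.items():
            best = max(q_value(values, outs, gamma) for outs in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    policy = {
        state: max(actions, key=lambda a: q_value(values, actions[a], gamma))
        for state, actions in model.items()
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration(build_concept_mdp())
    print(policy)  # {'LC': 'ExplConf', 'MC': 'ImplConf', 'HC': 'NoAct'}
    print({s: round(v, 3) for s, v in values.items()})
```

Under these assumptions, the optimal policy is explicit confirmation at low confidence, implicit confirmation at medium confidence, and no action at high confidence. That is the kind of heuristic that could also serve as an initial policy before learning. Changing the invented rewards or probabilities moves the boundaries between the bands.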
Belief Updating: Approach (backup)
Model the update in a dynamic belief network:
- Top-N values; fixed structure; learn parameters
- Nodes: belief C at time t (CurrentTop, Current2nd, Current3rd), the System Action, user response features (Confidence, Yes, No, Utterance Length, Positive Markers, Negative Markers), and the updated belief C at time t+1
- Data collection
- Evaluation: accuracy, soft-error

Gated Markov Decision Processes (backup)
[Diagram: Topic-MDPs for RoomLine and Login and Concept-MDPs for registered and user_name, each proposing an action such as No Action or Explicit Confirm(ation), feeding a gating mechanism.]
Issues:
- Structure of individual MDPs
- Gating mechanism
- Reward structure and learning

Structure for Individual MDPs
- State space: an informative subset of the corresponding indicators
  - Concept-MDPs: confidence / beliefs
  - Topic-MDPs: non-understanding and dialogue-on-track indicators
- Action space: the corresponding system-initiated error handling strategies

Gating Mechanism
Heuristic derived from domain-independent dialogue principles:
- Give priority to topics over concepts
- Give priority to entities closer to the conversational focus
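Here is a minimal sketch of the gating heuristic. Each topic-MDP and concept-MDP proposes an action, and proposals of No Action are ignored. The remaining proposals are ranked first by kind (topics before concepts) and then by distance from the conversational focus. Several details are assumptions for illustration: the `Proposal` fields, the integer focus distance, and the choice to apply the two principles lexicographically in that order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proposal:
    owner: str           # the topic agent or concept whose MDP made the proposal
    kind: str            # "topic" or "concept"
    focus_distance: int  # hypothetical distance from the conversational focus (0 = in focus)
    action: str          # e.g. "NoAction", "ExplicitConfirm", "AskRephrase"


def gate(proposals: List[Proposal]) -> Optional[Proposal]:
    """Pick one proposal: ignore NoAction, prefer topics over concepts,
    then prefer entities closer to the focus."""
    active = [p for p in proposals if p.action != "NoAction"]
    if not active:
        return None
    return min(active, key=lambda p: (p.kind != "topic", p.focus_distance))


if __name__ == "__main__":
    proposals = [
        Proposal("RoomLine", "topic", 2, "NoAction"),
        Proposal("Login", "topic", 1, "NoAction"),
        Proposal("registered", "concept", 0, "ExplicitConfirm"),
        Proposal("user_name", "concept", 1, "ExplicitConfirm"),
    ]
    print(gate(proposals).owner)  # registered
    proposals[1] = Proposal("Login", "topic", 1, "AskRephrase")
    print(gate(proposals).owner)  # Login
```

Applying focus distance before kind would be an equally simple variant. Which ordering works better is one of the questions the gating design leaves open.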