RavenClaw
An improved dialog management architecture for task-oriented spoken dialog systems

Presented by: Dan Bohus (dbohus@cs.cmu.edu)
Work by: Dan Bohus, Alex Rudnicky, Andrew Hoskins
Carnegie Mellon University, 2002

New DM Architecture: Goals
- Able to handle complex, goal-directed dialogs
- Easy to develop and maintain systems
- Go beyond (information access systems and) the slot-filling paradigm
- Developer focuses only on dialog task
- Automatically ensure a minimum set of task-independent, conversational skills
- Open to learning (hopefully both at task and discourse levels)
- Open to dynamic SDS generation
- More careful, more structured code, logs, etc.: provide a robust basis for future research.

05-22-2002 RavenClaw: a new DM architecture

A View from far, far away
[Figure: three layers labeled Backend, Dialog Task Specification, and Conversational Skills Core, with sample utterances: "SELECT * WHERE …", "Try opening that hatch", "Since that failed, I need you to push button B", "Can you repeat that, please?", "Suspend… Resume…", "What did you just say?"]
- Let the developer focus only on the dialog task spec: don’t worry about misunderstandings, the accuracy of concepts, repeats, focus shifts, barge-ins, etc.; merely describe (program) the task, assuming perfect knowledge of the world
- Automatically generate the conversational mechanisms

Outline
- Goals
- A view from far away
- Main ideas
  - Dialog Task Specification / Execution
  - Conversational skills
- In more detail
  - Dialog Task Specification / Execution
  - Conversational skills

Dialog Task Spec & Execution
[Figure: dialog task tree (DTS) for Communicator, with agents Welcome, Login, Travel, Locals, Bye, AskRegistered, AskName, GreetUser, GetProfile, DepartLocation, Leg1, ArriveLocation]
- Dialog Task implemented by a hierarchy of agents
- Agents handle and operate on concepts
- Execution with interleaved Input Passes.
- Execute the agents by top-down “planning”
- Do input passes when information is required
REMEMBER: This is just the dialog task

Handling inputs
[Figure: Communicator dialog task tree (DTS)]
Input Pass
- Assemble an agenda of expectations (open concepts)
- Bind values from the input to the concepts
- Process non-understandings (if any), analyze need for focus shifts
- Continue execution

Conversational Skills / Mechanisms
- A lot of problems in SDS are generated by a lack of conversational skills. “It’s all in the little details!”
  - Dealing with misunderstandings
  - Generic channel/dialog mechanisms: repeats, focus shift, context establishment, help, start over, etc.
  - Timing
- Even when these mechanisms are in, they lack uniformity and consistency.
- Development and maintenance are time consuming.

Conversational Skills / Mechanisms
- The core takes care of these by dynamically inserting appropriate agencies in the task tree
- A list of (more or less) task-independent mechanisms:
  - Implicit/Explicit Confirmations, Clarifications, Disambiguation = the whole Misunderstandings problem
  - Context reestablishment
  - Timeout and Barge-in control
  - Back-channel absorption
  - Generic dialog mechanisms: Repeat, Suspend… Resume, Help, Start over, Summarize, Undo, Querying the system’s belief

Outline
- Goals
- A view from far away
- Main ideas
  - Dialog Task Specification / Execution
  - Conversational skills
- In more detail
  - Dialog Task Specification / Execution
  - Conversational skills

Dialog Task Specification
- Goal: able to handle complex domains, beyond information access, frame-based, slot-filling systems, e.g. Symphony, Intelligent checklists, Navigation, Route planning
- We need a powerful enough formalism to describe all these tasks:
  - C++ code?
  - Declarative would be nice… but is it powerful enough?
  - Templatized C++ code…?

Dialog Task Specification
- Tree of predefined agent types: Inform, Request, Expect, Execute
- Each agent has:
  - A set of concepts
  - Preconditions
  - Success Criteria
  - Effects
  - Focus Criteria (triggers)
- Concepts
  - Data, Type (basic, struct, array)
  - Confidence/Value, Availability, Ambiguousness, Groundedness, System/User, TurnAcquired, TurnConveyed, etc…

An example DTS

UserLogin: AGENCY
  concepts: registered(BOOL), name(STRING), id(STRING), profile(PROFILE), profile_found(BOOL)
  achieves_when: profile || InformProfileNotFound

  AskRegistered: REQUEST(registered)
    grammar: {[yes]->true,[no]->false,[guest]->false}
  AskName: REQUEST(name)
    precond: registered==no
    grammar: [user_name]
    max_attempts: 2
  InformGreetUser: INFORM
    precond: name
  AskID: REQUEST(id)
    precond: registered==yes
    mapping: [user_id]
  DoProfileRetrieval: EXECUTE
    precond: name || id
    call: ABEProfile.Call >name, >id, <profile, <profile_found
  InformProfileNotFound: INFORM
    precond: !profile_found

Given that the baseline is 259 lines of C++ code, this is pretty good.

Can a formalism cut it?
- People have repeatedly tried formalizing dialog… and failed
- We’re focusing only on the task (like in robotics/execution)
- Actually, these agents are all C++ classes, so we can back off to code; the hope is that most of the behaviors can be easily expressed as above.
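
To make the example DTS more concrete, here is a minimal Python sketch of how the UserLogin agency could be expressed as classes and run. It is an illustration only, not RavenClaw's actual C++ classes. The names Agent, Agency, run, build_user_login and fake_profile_backend are hypothetical, the user's answers are scripted in a dictionary instead of coming from real input passes, and the backend is a stub standing in for ABEProfile.Call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Concepts = Dict[str, object]


@dataclass
class Agent:
    """One leaf of the task tree: a REQUEST, INFORM, or EXECUTE agent."""
    name: str
    kind: str
    precond: Callable[[Concepts], bool] = lambda c: True
    concept: Optional[str] = None  # concept filled by a REQUEST agent
    action: Optional[Callable[[Concepts], None]] = None  # body of an EXECUTE agent
    done: bool = False


@dataclass
class Agency:
    """A non-leaf agent that schedules its subagents (hypothetical API)."""
    name: str
    agents: List[Agent]
    achieves_when: Callable[["Agency"], bool]
    concepts: Concepts = field(default_factory=dict)

    def completed(self, agent_name: str) -> bool:
        return any(a.name == agent_name and a.done for a in self.agents)

    def next_agent(self) -> Optional[Agent]:
        # Left-to-right policy: first unfinished agent whose precondition holds.
        for agent in self.agents:
            if not agent.done and agent.precond(self.concepts):
                return agent
        return None

    def run(self, answers: Concepts) -> List[str]:
        """Execute until the success criterion holds; return the agents run."""
        trace = []
        while not self.achieves_when(self):
            agent = self.next_agent()
            if agent is None:
                break  # nothing is executable, so stop
            if agent.kind == "REQUEST":
                # Stand-in for an input pass: take the scripted user answer.
                self.concepts[agent.concept] = answers[agent.concept]
            elif agent.kind == "EXECUTE":
                agent.action(self.concepts)
            agent.done = True  # an INFORM agent would only speak a prompt here
            trace.append(agent.name)
        return trace


def fake_profile_backend(c: Concepts) -> None:
    """Hypothetical stub for the ABEProfile.Call backend request."""
    profiles = {"u42": {"name": "Dan"}}
    profile = profiles.get(c.get("id"))
    c["profile_found"] = profile is not None
    if profile is not None:
        c["profile"] = profile


def build_user_login() -> Agency:
    return Agency(
        name="UserLogin",
        agents=[
            Agent("AskRegistered", "REQUEST", concept="registered"),
            Agent("AskName", "REQUEST", concept="name",
                  precond=lambda c: c.get("registered") is False),
            Agent("InformGreetUser", "INFORM", precond=lambda c: "name" in c),
            Agent("AskID", "REQUEST", concept="id",
                  precond=lambda c: c.get("registered") is True),
            Agent("DoProfileRetrieval", "EXECUTE", action=fake_profile_backend,
                  precond=lambda c: "name" in c or "id" in c),
            Agent("InformProfileNotFound", "INFORM",
                  precond=lambda c: c.get("profile_found") is False),
        ],
        # achieves_when: profile || InformProfileNotFound
        achieves_when=lambda a: "profile" in a.concepts
        or a.completed("InformProfileNotFound"),
    )
```

Calling build_user_login().run({"registered": True, "id": "u42"}) walks AskRegistered, AskID and DoProfileRetrieval and then stops because the profile concept is available. A guest answer takes the AskName branch and ends with InformProfileNotFound.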

DTS execution
- Agency.Execute() decides which subagent is executed next, based on preconditions
- Various simple policies can be implemented: left-to-right (open/closed), choice, etc.
- But free to do more sophisticated things (MDPs, etc.) ~ learning at the task level

Libraries of DTS agencies?
- Provide a library of “common task” and “common discourse” agencies
  - Frame agency
  - List browse agency
  - Choose agency
  - Disambiguate agency, Ground agency, …
  - Etc.

Input Pass
1. Construct an agenda of expectations
- (Partially?) ordered list of concepts expected by the system
[Figure: the Communicator task tree with the focused agent marked, next to the agenda [DepartureCity] [ArrivalCity] [Name] [Registered] [Hotel] [Bye]]

Input Pass (continued)
2. Bind values/confidences to concepts
- The System <> Mixed Initiative spectrum can be expressed in terms of the way the agenda is constructed and the binding policies, independent of task
[Figure: the utterance "I’m flying to San Francisco and I need a hotel there." shown against the agenda [DepartureCity] [ArrivalCity] [Name] [Registered] [Hotel] [Bye]]

Input Pass (continued)
3. Process non-understandings (if any): try to detect the source and inform the user
- Channel (SNR, clipping)
- Decoding (confidence score, prosody)
- Parsing (parsing scores)
- Dialog level (parse ok, but no expectation match)

Input Pass (continued)
4. Focus shifts
- Focus shifts seem to be task dependent. The decision to shift focus is taken by the task (DTS)
- But they also have a task-independent (TI) side (sub-dialog size, context reestablishment).
- Context reestablishment is handled automatically, in the Core (see later)

Outline
- Goals
- A view from far away
- Main ideas
  - Dialog Task Specification / Execution
  - Conversational skills
- In more detail
  - Dialog Task Specification / Execution
  - Conversational skills

Task-Independent, Conversational Mechanisms
- Should be transparently handled by the core
- However, the developer should be able to write their own customized mechanisms if needed
- Most cases handled by inserting extra “discourse” agents on the fly in the dialog task tree

Conversational Skills: A List
- The grounding / misunderstandings problems
- Universal dialog mechanisms: Repeat, Suspend… Resume, Help, Start over, Summarize, Undo, Querying the system’s belief
- Timing and Barge-in control
- Focus Shifts, Context Establishment
- Back-channel absorption
Q: To what extent can we abstract these away from the Dialog Task?

UDM: Repeat
- Repeat (simple)
- Repeat (with referents)
- The dialog task tree (DTT) is adorned with a “Repeat” agency automatically at start-up
  - Which calls upon the OutputManager
- Not all outputs are “repeatable” (e.g. implicit confirms, GUI)… which ones exactly…?
- only 3%, they are mostly [summarize]
- User-defined custom repeat agency

UDM: Help
- DTT adorned at start-up with a help agency
- Can capture and issue:
  - Local help (obtained from focused agent)
  - ExplainMore help (obtained from focused agent)
  - What can I say?
  - Contextual help (obtained from main topic)
  - Generic help (give_me_tips)
- Obtains help prompts from the focused agent and the main topic (defaults provided)
- Default help agency can be overridden by user

UDM: Suspend … Resume
- DTT adorned with a SuspendResume agency.
- Context reestablishment
  - Automatically when focusing back after a subdialog
  - Construct a model for that (given size of subdialog, time issues, etc.)
  - Prompts problem shifted to the NLG

UDM: Start over, Summarize
- Start over: DTT adorned with a Start-Over agency
- Summarize: DTT adorned with a Summarize agency
  - prompt generated automatically
  - problem shifted to NLG…

Timing & barge-in control
- Knowledge of barge-in location
- Information on what got conveyed is fed back to the DM
- Special agencies can take special action based on that (e.g. List Browsing)
- Can we determine which utterances are non-barge-in-able in a task-independent manner?

Confirmation, Clarif., Disamb., Misunderstandings, Grounding…
- Largely unsolved: this is next!
- 2 components:
  - Confidence scores/computation on concepts
    - Obtaining them
    - Updating them
  - Taking the “right” decision based on those scores:
    - Insert appropriate agencies on the fly in the dialog task tree: opportunity for learning
    - What’s the set of decisions / agencies?
    - How do you decide?

Confidence scores
- Obtaining conf. scores: from annotator
- Updating them, from different sources:
  - (Un)Attacked implicit/explicit confirms
  - Correction detector
  - Elapsed time?
  - Domain knowledge
  - Priors?
- But how do you integrate all these in a principled way?

Mechanisms
DepartureCity = <Seattle,0.71><SF,0.29>
- Implicit / Explicit confirmations
- Clarifications
- Disambiguation
- Example prompts:
  - Did you say you were leaving from Seattle?
  - When do you leave from Seattle?
  - So you’re leaving from Seattle… When?
  - I’m sorry, was that Seattle or San Francisco?
- How do you decide which? Learning?

Software Engineering
- Provide a robust basis for future research.
- Modularity
  - Separability between task and discourse
  - Separability of concepts and confidence computations
- Portability
- Multiple servers
- Aggressive, structured, timed logging

Conclusion
- New DM framework
  - separation of dialog task from conversational mechanisms
  - developer can focus only on dialog task
  - conversational mechanisms generated automatically
  - easier development/maintenance
  - robust platform for future research
- Most of the implementation completed
- Symphony/LARRI reimplemented
- Next: back to misunderstandings!
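
As a pointer toward that next step, the decision problem from the Mechanisms slide (given DepartureCity = <Seattle,0.71><SF,0.29>, which grounding action should be taken?) can be sketched as a simple threshold policy. This is only one possible baseline, not the method the slides propose: the slides leave the decision open and suggest learning it. The function name choose_grounding_action, the action labels, and the accept, implicit and margin thresholds are all assumptions made for illustration.

```python
from typing import List, Tuple

Hypothesis = Tuple[str, float]  # (value, confidence)


def choose_grounding_action(
    hyps: List[Hypothesis],
    accept: float = 0.9,    # assumed threshold: trust the value silently
    implicit: float = 0.6,  # assumed threshold: implicit confirmation suffices
    margin: float = 0.2,    # assumed minimum lead of the top hypothesis
) -> Tuple[str, List[str]]:
    """Pick a grounding action and the value(s) it should mention."""
    if not hyps:
        return ("request", [])
    ranked = sorted(hyps, key=lambda h: h[1], reverse=True)
    top_value, top_conf = ranked[0]
    # Two close competitors: ask the user to choose between them.
    if len(ranked) > 1 and top_conf - ranked[1][1] < margin:
        return ("disambiguate", [top_value, ranked[1][0]])
    if top_conf >= accept:
        return ("accept", [top_value])
    if top_conf >= implicit:
        return ("implicit_confirm", [top_value])
    return ("explicit_confirm", [top_value])


# The example from the Mechanisms slide.
action, values = choose_grounding_action([("Seattle", 0.71), ("SF", 0.29)])
```

With these assumed thresholds the slide's example yields an implicit confirmation of Seattle, while a closer pair such as 0.55 versus 0.45 yields a disambiguation question. A learned policy would replace the fixed thresholds.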