RavenClaw: Dialog Management Using Hierarchical Task Decomposition and an Expectation Agenda

Dan Bohus, Alex Rudnicky
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213

Abstract

RavenClaw is a new dialog management framework developed as the successor to the Agenda architecture used in the CMU Communicator. RavenClaw introduces a clear separation between the specification of task and discourse behaviors, and allows rapid development of dialog management components for spoken dialog systems operating in complex, task-oriented domains. For a new system, the development effort is focused entirely on the specification of the dialog task, while a rich set of domain-independent conversational behaviors is transparently generated by the dialog engine. To date, RavenClaw has been applied to five different domains, allowing us to draw some preliminary conclusions as to the generality of the approach. We briefly describe our experience in developing these systems.

1. Overall design

Goals

RavenClaw is a framework aimed at the rapid development of dialog managers for complex, task-oriented dialog domains.
- Handle a variety of complex domains
- Easy to develop and maintain systems
  - Developer focuses only on specifying the dialog task
  - Dialog engine handles the rest automatically
- Architecture supports:
  - Learning (at both task and discourse levels)
  - Dynamic generation of dialog tasks
  - Grounding mechanisms

RavenClaw is a 2-tier architecture (see below):
- Dialog Task Specification Layer
  - Captures all the domain-specific dialog (task) logic
  - The system development effort is entirely focused here
- Domain-independent Dialog Engine
  - Manages dialog by executing the Dialog Task Specification
  - Provides domain-independent conversational strategies

2. Key architectural details

The Dialog Task Specification

The Dialog Task Specification is a tree of dialog agents, with each agent handling the corresponding part of the dialog task.

Advantages of the hierarchical representation:
- Dialog task structure naturally lends itself to hierarchical description
- Ease of maintenance and design; good scalability
- Implicitly captures context in dialog

Dialog Task Specification (sample)

    // Agency (non-terminal node) for the login sub-dialog
    DEFINE_AGENCY(CLogin,
        IS_MAIN_TOPIC()
        DEFINE_SUBAGENTS(
            SUBAGENT(Welcome, CWelcome)
            SUBAGENT(AskRegistered, CAskRegistered)
            SUBAGENT(AskName, CAskName)
            SUBAGENT(GreetUser, CGreetUser)
        )
        DEFINE_CONCEPTS(
            STRING_USER_CONCEPT(user_name)
            BOOL_USER_CONCEPT(registered)
        )
        SUCCEEDS_WHEN(COMPLETED(GreetUser))
        PROMPT_ESTABLISH_CONTEXT("establish_context login")
    )

    // Inform agent: sends an output
    DEFINE_INFORM_AGENT(CWelcome,
        PROMPT(":non-interruptable inform welcome")
    )

    // Request agent: requests the `registered` concept and maps grammar slots to its values
    DEFINE_REQUEST_AGENT(CAskRegistered,
        REQUEST_CONCEPT(registered)
        GRAMMAR_MAPPING("[Yes]>true, [No]>false")
    )

    // Request agent that is eligible only when `registered` is true
    DEFINE_REQUEST_AGENT(CAskName,
        PRECONDITION(IS_TRUE(registered))
        REQUEST_CONCEPT(user_name)
        MAX_ATTEMPTS(2)
        GRAMMAR_MAPPING("[UserName]")
    )

[Figure: RoomLine dialog task tree. Agents shown: RoomLine, Login (Welcome, AskRegistered, AskName, GreetUser), GetQuery, DateTime, Location, Properties, Network, Projector, Whiteboard, GetResults, DiscussResults, Suspend. Concepts shown: user_name, registered, query, results.]

Rich concept representation
- Set of confidence / value pairs (e.g. John Doe / 0.46, Joe Down / 0.33)
- History of previous values
- Flags indicating grounding, availability, conveyance status, etc.

Dialog Task Agents

Fundamental Dialog Agents (on leaves)
- Inform: sends an output
- Request: requests and listens for information
- Expect: expects (listens for) information
- DomainOperation: performs domain operations (e.g. back-end calls)

Dialog Agencies (non-terminal nodes)
- Control the execution of the subsumed agents

Agent properties / functionalities:
- Execute routine
- Preconditions and triggers
- Completion criteria (successful / unsuccessful)
- Effects
- Hold concepts

The Dialog Engine

Domain-independent component that executes the Dialog Task Specification. Dialog flow is generated by alternating Execution Phases and Input Phases.

Execution Phase

The dialog agents in the task tree are executed and generate the system’s behavior. The Dialog Engine uses a stack structure to execute the agents in the task tree:
- Repeatedly execute the agent on top of the stack
- When agencies execute, they plan one of their subsumed agents for execution (according to preconditions and policies)
- Completed agents are removed from the stack
- Request-type fundamental agents can interrupt an Execution Phase and solicit an Input Phase

(3-Stage) Input Phase
1. Assemble an Expectation Agenda: the Expectation Agenda models the system’s input expectations at that point in time
2. Bind values from input to concepts: inputs are matched to system expectations
3. Analyze focus shifts: establish whether the focus of the conversation should be shifted in light of the recent input
… then, continue with another Execution Phase.

[Figure: Dialog Engine example for the Dialog Task Specification above, showing the Dialog Stack / Agents Execution over five steps (agents RoomLine, Login, Welcome, AskRegistered) and the Expectation Agenda.]

Expectation Agenda (three levels, in the order shown in the figure):
    registered: [No] → false, [Yes] → true
    registered: [No] → false, [Yes] → true; user_name: [UserName]
    registered: [No] → false, [Yes] → true; user_name: [UserName]; query.date_time: [DateTime]; query.location: [Location]; query.network: [Network]; query.projector: [Projector]; query.whiteboard: [Whiteboard]

User Input:
    System: Are you a registered user?
    User: Yes, this is John Doe
    Parse: [Yes](yes / 0.87) [UserName](john doe / 0.46)

3. Conversational behaviors

The Dialog Engine automatically provides a basic set of domain-independent conversational behaviors.

Generics
- Generic dialog mechanisms: Help, Repeat, Suspend, Start over, etc.
- Turn-taking behavior

Grounding behaviors
- Explicit and implicit verifications, disambiguations, context reestablishment, etc.

4. RavenClaw-based systems

LARRI [Symphony Project, CMU]
A multi-modal conversational agent that provides support for F/A-18 aircraft mechanics performing maintenance tasks.
- Guidance & information browsing domain
- Tree-based decomposition is very well suited to this domain; portions of the dialog task tree are generated dynamically based on the task to be performed

Intelligent Procedure Assistant [NASA Ames]
Multi-modal system that provides assistance to astronauts on the International Space Station in the execution of procedural tasks and checklists.
- Guidance & information browsing domain
- RavenClaw interfaced in Open Agent Architecture (with Gemini inputs / output)

BusLine [Let’s Go! Project, CMU]
Information search interface to Pittsburgh bus schedules.
- Information exploration domain
- Static dialog task tree

RoomLine [CMU]
Assistance for conference room reservation and scheduling within the School of Computer Science at CMU.
- Information management domain
- Static dialog task tree

TeamTalk [11-741, CMU]
Spoken command and control for a team of robots.
- Command and control domain
- Challenges: multi-way conversations, (complex) asynchronous behaviors
- Static dialog task tree

5. Conclusions

- RavenClaw is a dialog management framework that focuses system development effort on creating a description of the underlying dialog task
- The Dialog Engine drives the dialog towards its goals, and uses generic conversational strategies to maintain dialog flow and coherence
- 5 systems built to date, spanning various domains and task complexities
- RavenClaw adapted easily, indicating high versatility and good scalability properties

School of Computer Science, Carnegie Mellon University, 2003, Pittsburgh, PA, 15213.
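The first two stages of the Input Phase (assembling an Expectation Agenda from the dialog stack, then binding parsed input to concepts) can be illustrated with a minimal C++ sketch. It is not RavenClaw's actual implementation: every type and function name here (Expectation, Agent, ParsedSlot, Binding, assemble_agenda, find_expectation, bind_input, poster_example) is hypothetical. Two simplifications are assumptions rather than statements from the poster: each agent contributes only its own expectations to its agenda level, and the level nearest the top of the stack wins when several levels expect the same grammar slot. Focus-shift analysis (stage 3) is left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified types; these are not RavenClaw's real classes.
struct Expectation {
    std::string slot;          // grammar slot, e.g. "[Yes]"
    std::string concept_name;  // concept to bind, e.g. "registered"
    std::string value;         // fixed value to store; empty means "use the parsed text"
};

struct Agent {
    std::string name;
    std::vector<Expectation> expectations;
};

struct ParsedSlot {
    std::string slot;   // grammar slot found by the parser
    std::string text;   // recognized text for that slot
    double confidence;  // recognition confidence
};

struct Binding {
    std::string value;
    double confidence;
};

using Agenda = std::vector<std::vector<Expectation>>;

// Stage 1: build one agenda level per agent, starting from the agent on top
// of the stack (stored last in the vector) and ending at the root.
Agenda assemble_agenda(const std::vector<Agent>& stack) {
    assert(!stack.empty());
    Agenda agenda;
    for (auto it = stack.rbegin(); it != stack.rend(); ++it) {
        agenda.push_back(it->expectations);
    }
    return agenda;
}

// Returns the first expectation for `slot`, searching levels in agenda order
// (assumption: levels closer to the top of the stack take priority).
const Expectation* find_expectation(const Agenda& agenda, const std::string& slot) {
    for (const auto& level : agenda) {
        for (const Expectation& e : level) {
            if (e.slot == slot) return &e;
        }
    }
    return nullptr;
}

// Stage 2: bind each parsed slot to the concept that expects it.
// Slots that no agent expects are ignored.
std::map<std::string, Binding> bind_input(const Agenda& agenda,
                                          const std::vector<ParsedSlot>& parse) {
    std::map<std::string, Binding> concepts;
    for (const ParsedSlot& p : parse) {
        const Expectation* e = find_expectation(agenda, p.slot);
        if (e != nullptr) {
            concepts[e->concept_name] =
                Binding{e->value.empty() ? p.text : e->value, p.confidence};
        }
    }
    return concepts;
}

// Rebuilds the poster's example: AskRegistered on top of Login on top of
// RoomLine, with the parse of "Yes, this is John Doe".
std::map<std::string, Binding> poster_example() {
    std::vector<Agent> stack = {
        {"RoomLine", {{"[DateTime]", "query.date_time", ""},
                      {"[Location]", "query.location", ""}}},
        {"Login", {{"[UserName]", "user_name", ""}}},
        {"AskRegistered", {{"[Yes]", "registered", "true"},
                           {"[No]", "registered", "false"}}},
    };
    std::vector<ParsedSlot> parse = {{"[Yes]", "yes", 0.87},
                                     {"[UserName]", "john doe", 0.46}};
    return bind_input(assemble_agenda(stack), parse);
}
```

Calling poster_example() binds registered to "true" with confidence 0.87 and user_name to "john doe" with confidence 0.46. The second binding comes from the Login level rather than from the focused AskRegistered agent, which is how the agenda lets the system accept information the user volunteers beyond the question just asked.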