Developing Spoken Dialogue Systems in the Communicator / RavenClaw Framework

Sphinx Lunch Talk
Carnegie Mellon University, October 2004

Presented by: Dan Bohus
Special appearances: Antoine Raux, Jahanzeb Sherwani, Thomas Harris

Examples
- RoomLine: conference room reservations within SCS; system can access schedules of 13 conf rooms in Wean Hall and NSH
- Let’s Go! Bus Information System: bus schedule information system for Port Authority buses in Oakland and Squirrel Hill [Let’s Go! Project]
- Sublime: personalized information management system
- TeamTalk: an investigation into human and multi-robot spoken language communication in unstructured environments

More Systems
- LARRI: multimodal system that assists F/A-18 aircraft maintenance personnel throughout the execution of procedural tasks [Symphony]
- Madeleine: text-based prototype for medical diagnosis system [MITRE workshop]
- Eureka: dialogue interface to the Vivisimo web search engine

The Communicator / RavenClaw Spoken Dialogue Systems Framework
- Examples
- Overall Architecture
- System Development
- Components & Resources
- Miscellaneous
- Current Research

examples : architecture : development : components : miscellaneous : research

Overall Architecture
Classical pipeline architecture
- Recognition: SPHINX
- Lang. Understand.: PHOENIX/HELIOS
- Dialog Manag.: RAVENCLAW
- Back-end: (various)
- Lang. Generation: ROSETTA
- Synthesis: THETA

Galaxy HUB
[Diagram: the same components, all connected through the Galaxy HUB]
- Generic centralized, message-passing communication architecture
- Developed at MIT, used in Communicator program
- Competitor: OAA

Getting Even Closer
[Diagram: the full system around the HUB]
- Recognition: multiple, parallel SPHINX decoders; inputs from other modalities
- Lang. Understand.: parsing (PHOENIX) and confidence server (HELIOS)
- Text I/O: TTYServer
- Other domain agents: DateTime
- Synthesis: THETA
- Lang. Generation: ROSETTA Galaxy stub, in front of the actual ROSETTA (Perl)
- Dialog Manag.: RAVENCLAW
- Back-end: Galaxy stub, in front of the actual Perl back-end
- PROCESS MONITOR

Building a Spoken Dialogue System
[Diagram: the resources to author for each component]
- Recognition (SPHINX): Language, Acoustic, Lexical Models
- Lang. Understand. (PHOENIX/HELIOS): Grammar
- Dialog Manag. (RAVENCLAW): RavenClaw Dialog Task Specification
- Back-end: (perl)
- Lang. Generation (ROSETTA): Templates
- Synthesis (THETA): (Limited Domain) Voice

So How Long Will It Take?
- MITRE Workshop on Dialogue Management (Fall 2003)
- Develop a text-based SDS for medical diagnosis (provided backend)
- Madeleine (22 hours)
[Pie chart: development time breakdown. Labels: RC Fixes, Templates, RavenClaw, Design, Backend, Grammar, Setup. Slices: 2h15 (11%), 4h (19%), 4h (18%), 2h45 (13%), 3h20 (16%), 1h10 (5%), 3h45 (18%)]

Okay, How Long Will It Really Take?
To get a system running with a reasonable performance [poll amongst 3 RavenClaw developers]
- 1 month to get a working system up and running
- 1 month to fine-tune performance
- Further iterative improvements will continue as more data accumulates
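As a rough illustration of the hub-centered message passing described in the Overall Architecture slides, the sketch below routes one utterance through a pipeline. It is a toy model in Python, not the Galaxy API: the `Hub` class, its `register` and `run` methods, the fixed routing order, and the stand-in component functions are all assumptions made for this example.

```python
from typing import Callable, Dict, List

Frame = Dict[str, str]              # a message passed between components
Server = Callable[[Frame], Frame]


class Hub:
    """Toy central hub: every component talks only to the hub."""

    def __init__(self, route: List[str]) -> None:
        self.route = route                      # hypothetical fixed routing order
        self.servers: Dict[str, Server] = {}
        self.log: List[str] = []                # names of the servers visited

    def register(self, name: str, server: Server) -> None:
        self.servers[name] = server

    def run(self, frame: Frame) -> Frame:
        # The hub forwards the frame to each server in turn; the servers
        # never call each other directly.
        for name in self.route:
            frame = self.servers[name](dict(frame))
            self.log.append(name)
        return frame


# Stand-ins for the real components (assumed behavior, illustration only).
def recognizer(f: Frame) -> Frame:
    f["hypothesis"] = f["audio"].upper()
    return f


def parser(f: Frame) -> Frame:
    f["parse"] = "[NeedRoom]" if "ROOM" in f["hypothesis"] else "[Unknown]"
    return f


def dialog_manager(f: Frame) -> Frame:
    f["act"] = "request date" if f["parse"] == "[NeedRoom]" else "inform nonunderstanding"
    return f


def generator(f: Frame) -> Frame:
    prompts = {"request date": "For what day?",
               "inform nonunderstanding": "Sorry, I didn't catch that."}
    f["text"] = prompts[f["act"]]
    return f


hub = Hub(["recognition", "understanding", "dialog", "generation"])
hub.register("recognition", recognizer)
hub.register("understanding", parser)
hub.register("dialog", dialog_manager)
hub.register("generation", generator)

result = hub.run({"audio": "i need a room"})
print(result["text"])
```

Because components only exchange frames with the hub, swapping one of them (for example a different parser) means registering a different function under the same name; the rest of the pipeline is unchanged.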
Components & Resources
[Diagram: each component with the resource it needs]
- Recognition (SPHINX): Language, Acoustic Models
- Lang. Understand. (PHOENIX/HELIOS): Grammar
- Dialog Manag. (RAVENCLAW): RavenClaw Dialog Task Specification
- Back-end: (perl)
- Lang. Generation (ROSETTA): Templates
- Synthesis (THETA): Limited Domain Voice

SPHINX II
- Semi-continuous acoustic models
  - Off-the-shelf 8kHz, 11.025kHz, 16kHz models
  - Scripts for building your own
- Language models
  - 2-gram & 3-gram model
  - PLSA adapted models perform better
  - CMU-Cambridge SLM Toolkit
  - Generate from Phoenix Grammar
- Finite state grammar
- Sphinx supports state-specific LMs
- Dictionary (lexical models)
  - CMU Dictionary

Sphinx II - continued
- Multiple parallel decoders [e.g., male + female]
  - Multiple hypotheses forwarded, selection done later
- Typical WER: 15-30%
  - With pronounced differences native vs. non-native
  - Lowered by retuning acoustic and language models to the domain
- Migration to SPHINX 3.x in the near future
  - Expected: big improvement in WER
  - Concern: real-time performance

Phoenix Parser / Grammar
- Phoenix: Robust Parser
- CFG Grammar
  - Manually-generated domain-specific grammar rules
  - Reusable, generic sub-grammars: [Yes], [No], [Number], [DateTime], [Help], [Repeat], [Suspend], etc…
- Parses all incoming hypotheses and passes all parses along…

Input:
    DO YOU HAVE SOMETHING A BIT LARGER?

Parse:
    [NeedRoom] ( [_i_want] (DO YOU HAVE SOMETHING) )
    [RoomSizeSpec] ( [room_size_spec] ( [rss_larger] (LARGER)))

Grammar:
    [room_size_spec]
        ([rss_large])
        ([rss_small])
        ([rss_larger])
        ([rss_smaller])
        ([rss_smallest])
        ([rss_largest])
    ;
    [rss_large]
        (large)
        (big)
        (huge)
    ;
    [rss_larger]
        (*the larger)
        (*the bigger)
        (too small)
    ;
    [rss_largest]
        (*the largest)
        (*the biggest)
    ;
    [rss_small]
        (small)
        (little)
    ;

Helios / Confidence Annotation
- Builds accurate confidence scores using features from 3 sources of knowledge:
  - Speech recognition
  - Language understanding
  - Dialogue management
- Selects hypothesis with maximum confidence score
- Research in progress on hypothesis selection, and transferability across domains
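The hypothesis selection step on the Helios slide can be sketched as follows. This is a minimal Python illustration, not Helios itself: the feature names, the logistic combination, and the weights are all hypothetical; a real annotator would learn its model from labeled data.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    asr_score: float        # speech recognition feature (assumed, in 0..1)
    parse_coverage: float   # language understanding feature (assumed, in 0..1)
    expected: bool          # dialogue management feature: was this input expected?


# Hypothetical weights, chosen by hand for the example.
WEIGHTS = {"bias": -2.0, "asr_score": 2.0, "parse_coverage": 1.5, "expected": 1.0}


def confidence(h: Hypothesis) -> float:
    """Combine features from the three knowledge sources into a score in (0, 1)."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["asr_score"] * h.asr_score
         + WEIGHTS["parse_coverage"] * h.parse_coverage
         + WEIGHTS["expected"] * float(h.expected))
    return 1.0 / (1.0 + math.exp(-z))


def select(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Return the hypothesis with the maximum confidence score."""
    return max(hypotheses, key=confidence)


# Two hypotheses for the same utterance, e.g. from two parallel decoders.
candidates = [
    Hypothesis("do you have something a bit larger", 0.7, 0.9, True),
    Hypothesis("do you have something a bit lager", 0.8, 0.4, False),
]
best = select(candidates)
print(best.text)
```

In this example the second hypothesis has the better recognition score, but the first one wins because it parses more fully and matches what the dialogue expects, which is the point of drawing features from all three sources.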
RavenClaw Architecture
- Dialog Task (Specification)
  - Captures all domain-specific dialog (task) logic using a hierarchical description
  - The authoring effort is focused entirely here
- Domain-independent Dialog Engine
  - Manages dialog by executing the dialog task specification
  - Provides a large number of domain-independent conversational strategies

RavenClaw: Dialogue Task Specification
[Diagram: the Madeleine dialog task tree]
Madeleine
- I:Welcome
- E:LoadSymptoms
- GeneralFeel (concept: general_feeling)
  - R:HowAreYou?
  - I:Glad
  - I:Sorry
- Diagnose (concept: diagnostic)
  - Fever (concept: have_fever)
    - R:AskFever
    - E:MeasureTemp
    - I:InformFever
  - Travel

- Tree of dialog agents
  - Terminals: Inform, Request, Expect, Execute
  - Non-terminals / Dialog agency: plans execution of child nodes
- Basically a Hierarchical Task Execution Network; each agent:
  - Preconditions & effects
  - Success & failure criteria
  - Trigger (focus) criteria

Sample DTS Code
[Diagram: the GeneralFeel subtree (R:HowAreYou?, I:Glad, I:Sorry) with concept general_feeling]

    // /Madeleine/GeneralFeel
    DEFINE_AGENCY(CGeneralFeel,
        DEFINE_CONCEPTS(
            STRING_USER_CONCEPT(general_feeling, none))
        DEFINE_SUBAGENTS(
            SUBAGENT(HowAreYou, CHowAreYou)
            SUBAGENT(Glad, CGlad)
            SUBAGENT(Sorry, CSorry))
        SUCCEEDS_WHEN(COMPLETED(Glad) || COMPLETED(Sorry)))

    // /Madeleine/GeneralFeel/HowAreYou
    DEFINE_REQUEST_AGENT(CHowAreYou,
        REQUEST_CONCEPT(general_feeling)
        GRAMMAR_MAPPING("![Yes]>good, ![FeelingGood]>good, "
                        "![FeelingSoSo]>soso, ![FeelingBad]>bad"))

    // /Madeleine/GeneralFeel/Glad
    DEFINE_INFORM_AGENT(CGlad,
        PRECONDITION(C("general_feeling") == CString("good"))
        PROMPT("inform glad_youre_good")
        ON_COMPLETION(FINISH(/Madeleine)))

    // /Madeleine/GeneralFeel/Sorry
    DEFINE_INFORM_AGENT(CSorry,
        PRECONDITION(C("general_feeling") != CString("good"))
        PROMPT("inform sorry_youre_bad"))

RavenClaw Execution
[Diagram: the Madeleine task tree next to the Dialog Stack and the Expectation Agenda; successive slides step through one execution]
1. Dialog Stack: Madeleine
2. Dialog Stack: Welcome, Madeleine
   System: Hi, this is Madeleine, the automated…
3. Dialog Stack: LoadSymptoms, Madeleine
   Request agents for the loaded symptoms (R:Headache, …; concept headache, …) appear in the task tree
4. Dialog Stack: GeneralFeel, Madeleine
5. Dialog Stack: HowAreYou, GeneralFeel, Madeleine
   System: How are you feeling today?

RavenClaw Execution / Input Pass
Expectation Agenda, one level per agent on the Dialog Stack:
- HowAreYou: general_feeling: [good], [bad], [soso]
- GeneralFeel: general_feeling: [good], [bad], [soso]
- Madeleine: general_feeling: [good], [bad], [soso]; have_fever: [fever], ![yes], ![no]; headache: [headache], ![yes], ![no]; cough: [cough], ![yes], ![no]; …
User: Not so good, I think I have a fever
Parse: [soso](not so good) [fever](I think I have a fever)

RavenClaw Execution
6. Dialog Stack: GeneralFeel, Madeleine
7. Dialog Stack: Sorry, GeneralFeel, Madeleine
   System: Oh, I’m sorry to hear that… Let me take your temperature…

RavenClaw – Other features
- Dialogue Engine transparently provides a set of conversational skills
  - Universal dialogue mechanisms: Repeat, Suspend / Resume, Quit
  - Help: Help!, Where are we?, What can I say?
  - Error handling: explicit and implicit confirmations; strategies for recovering from non-understandings
- Dynamic dialogue task generation
- Dynamic dialogue control policy
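The input pass from the execution slides (dialog stack, expectation agenda, concept binding) can be sketched in a few lines. This is a simplified Python model, not RavenClaw code: the `Agent` class, the way levels are collected, and the value bound for `[fever]` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Binding = Tuple[str, str]   # (concept name, value to store)


@dataclass
class Agent:
    name: str
    expects: Dict[str, Binding] = field(default_factory=dict)  # grammar slot -> binding
    children: List["Agent"] = field(default_factory=list)


def expectations(agent: Agent) -> Dict[str, Binding]:
    """Expectations of an agent plus those of all its descendants."""
    level = dict(agent.expects)
    for child in agent.children:
        for slot, binding in expectations(child).items():
            level.setdefault(slot, binding)
    return level


def build_agenda(stack: List[Agent]) -> List[Dict[str, Binding]]:
    """One agenda level per agent on the stack, top of the stack first."""
    return [expectations(agent) for agent in reversed(stack)]


def bind(parse_slots: List[str], agenda: List[Dict[str, Binding]]) -> Dict[str, str]:
    """Bind each parsed slot at the first (most focused) level that expects it."""
    concepts: Dict[str, str] = {}
    for slot in parse_slots:
        for level in agenda:
            if slot in level:
                concept, value = level[slot]
                concepts[concept] = value
                break
    return concepts


# A cut-down Madeleine tree; mapping [fever] to "yes" is an assumption.
how_are_you = Agent("HowAreYou", {"[good]": ("general_feeling", "good"),
                                  "[soso]": ("general_feeling", "soso"),
                                  "[bad]": ("general_feeling", "bad")})
general_feel = Agent("GeneralFeel", children=[how_are_you])
ask_fever = Agent("AskFever", {"[fever]": ("have_fever", "yes")})
diagnose = Agent("Diagnose", children=[Agent("Fever", children=[ask_fever])])
madeleine = Agent("Madeleine", children=[general_feel, diagnose])

stack = [madeleine, general_feel, how_are_you]      # bottom ... top
agenda = build_agenda(stack)

# "Not so good, I think I have a fever" parsed as [soso] and [fever].
concepts = bind(["[soso]", "[fever]"], agenda)
next_agent = "Glad" if concepts["general_feeling"] == "good" else "Sorry"
print(concepts, next_agent)
```

The `[soso]` slot binds at the topmost level (HowAreYou), while `[fever]` is only expected at the Madeleine level, which is how the system can pick up the fever information before it has asked about it.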
Backend & Domain Agents
Various problem-specific solutions
- RoomLine: connects to a static Perl database or to the CMU CorporateTime server
- Let’s Go! Bus Information System: connects to a Postgres database
- Sublime: connects to a MySQL database; also functions as a web server; DTW search domain agent
Basically, build your own; we provide a stub for interfacing with the Galaxy HUB

Rosetta Language Generation
- Template- and stochastic-based language generation
- Input: (act, object, {slot=value})
- Output: text (tagged with concepts)

    # welcome to the system
    "welcome" => "Welcome to RoomLine, the automated conference room " .
                 "reservation system.",

    # greet user
    "greet_user" => ("Hi, <user_name>.",
                     "Hi, <user_name>, good to hear from you again."),

    # inform the user that the system has misunderstood the times (order)
    "wrong_time_order" => sub {
        my %args = @_;
        my $time_interval_as_string =
            get_wrong_time_interval_as_string(\%args, "room_query.date_time.time");
        my $answer = "I'm sorry, I must have misunderstood the " .
                     "time you needed the room. ";
        $answer .= "I heard $time_interval_as_string. ";
        return ["$answer So, let's see ... ",
                "$answer So, let's try this again ... ",
                "$answer So, let's try this once more ... "];
    },
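To make the template mechanism above concrete, here is a small Python sketch of the same idea: a template is a fixed string, a list of variants, or a function of the slots, and one variant is picked at random. It is not Rosetta's implementation; the `generate` function, the `TEMPLATES` table, and the `time_interval` slot name are hypothetical.

```python
import random
from typing import Callable, Dict, List, Union

Slots = Dict[str, str]
Template = Union[str, List[str], Callable[[Slots], List[str]]]

TEMPLATES: Dict[str, Template] = {
    "welcome": "Welcome to RoomLine, the automated conference room reservation system.",
    "greet_user": ["Hi, <user_name>.",
                   "Hi, <user_name>, good to hear from you again."],
    # A callable template computes its candidate sentences from the slots.
    "wrong_time_order": lambda slots: [
        "I'm sorry, I must have misunderstood the time you needed the room. "
        f"I heard {slots['time_interval']}. So, let's try this again ..."
    ],
}


def generate(key: str, slots: Slots, rng: random.Random) -> str:
    """Pick one variant of the template and fill in its <slot> placeholders."""
    template = TEMPLATES[key]
    if callable(template):
        candidates = template(slots)
    elif isinstance(template, str):
        candidates = [template]
    else:
        candidates = template
    text = rng.choice(candidates)           # random choice among the variants
    for name, value in slots.items():
        text = text.replace(f"<{name}>", value)
    return text


rng = random.Random(0)
print(generate("greet_user", {"user_name": "Dan"}, rng))
print(generate("wrong_time_order", {"time_interval": "from 4 pm to 2 pm"}, rng))
```

Choosing among several variants keeps repeated prompts from sounding identical, which matters most for error messages the user may hear several times in one call.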
Synthesis
- Cepstral Theta synthesis
  - Open-domain unit-selection synthesis
  - SSML tags
  - [Currently working on barge-in location]
- Festival synthesis
  - Diphone synthesis; open-domain, limited-domain unit-selection synthesis
  - SABLE tags
  - Server running separately on a Linux box

Miscellaneous – Documentation
- Transmitted largely by oral tradition :)
- A bit of documentation available
  - Research papers, slides
  - WIKI: http://hap.speech.cs.cmu.edu/commwiki (mostly for developers, postings of updates, recent developments; hopefully more introductory materials soon)
  - More under work
  - Tutorials: 2 available, but a bit outdated

Miscellaneous – Portability
- Current systems work on PC Windows platforms
  - Galaxy has Linux version
  - Components are C, C++ (Visual Studio 6.0, Visual Studio .NET), Perl
- How about using different input / output components?
  - Modify RavenClaw DMInterface class
  - Has been done for the Gemini parser / language generator

Miscellaneous – Research Platform
Communicator / RavenClaw framework is a research platform!
- Constantly evolving
- Modular
  - Easy to change, develop and test new technologies
- Research on variety of topics in a real-world, full-blown system: Recognition, Language understanding, Dialogue management, Language generation, Synthesis
- Your work can be evaluated / reused easily across multiple existing systems

Miscellaneous – Download
- www.cs.cmu.edu/~dbohus/RavenClaw
- Download a version of RoomLine
- An installation script can seed your own project from this RoomLine version

Miscellaneous – RavenClaw Team
RavenClaw Team
- Dan Bohus (dbohus@cs)
- Antoine Raux (antoine@cs)
- Jahanzeb Sherwani (jsherwan@cs)
- Thomas Harris (tkharris@cs)
- Satanjeev Banerjee (satanjeev@cs)
- Brian Langner (blangner@cs)
More users / developers / documentation writers are always welcome!!
Dialogs on Dialogs Reading Group: www.cs.cmu.edu/~dod

Error awareness and recovery
- Problem: lack of robustness when faced with understanding errors
- Solution: build mechanisms for acting robustly at the dialogue management level
  - Error awareness: building better confidence annotators, hypothesis selection; transference across domains
  - Error recovery strategies: recovery from non-understandings
  - Error handling decision process: scalable, adaptable, task-independent architecture for making error handling decisions

Let’s Go! Research
- Speech Recognition: acoustic adaptation on non-native speech
  - WER: 50% → 30%
- Speech Synthesis: flexible and natural F0 modeling (F0 unit selection)
  - Emphasis on erroneous/uncertain words for utterance confirmation

Sublime
- Interface for personalized information management
- Narrow functionality in unrestricted domains
- Currently, handle information without understanding it
- Eventually, learn relationships and a shallow ontology

That’s all, folks!
THANK YOU!