Developing Spoken Dialogue Systems in the Communicator / RavenClaw Framework

Sphinx Lunch Talk
Carnegie Mellon University, October 2004
Presented by: Dan Bohus
Special appearances: Antoine Raux, Jahanzeb Sherwani, Thomas Harris
Examples

RoomLine
conference room reservations within SCS; system can access
schedules of 13 conf rooms in Wean-Hall and NSH

Let’s Go! Bus Information System
bus schedule information system for Port Authority buses in
Oakland and Squirrel Hill [Let’s Go! Project]

Sublime
personalized information management system

TeamTalk
an investigation into human and multi-robot spoken language
communication in unstructured environments
More Systems

LARRI
multimodal system that assists F/A-18 aircraft maintenance
personnel throughout the execution of procedural tasks
[Symphony]

Madeleine
text-based prototype for medical diagnosis system [MITRE
workshop]

Eureka
dialogue interface to the Vivisimo web search engine
The Communicator / RavenClaw
Spoken Dialogue Systems Framework






Examples
Overall Architecture
System Development
Components & Resources
Miscellaneous
Current Research
examples : architecture : development : components : miscellaneous : research
Overall Architecture

Classical pipeline architecture
- Recognition: SPHINX
- Lang. Understanding: PHOENIX/HELIOS
- Dialog Management: RAVENCLAW
- Back-end: (various)
- Lang. Generation: ROSETTA
- Synthesis: THETA
Galaxy HUB
- Generic, centralized, message-passing communication architecture
- Developed at MIT, used in the Communicator program
- Competitor: OAA
[Diagram: Recognition (SPHINX), Lang. Understanding (PHOENIX/HELIOS), Dialog Management (RAVENCLAW), Back-end (various), Lang. Generation (ROSETTA), and Synthesis (THETA), all connected through the Galaxy HUB]
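To make the hub idea concrete, here is a minimal sketch of centralized message passing: components register with a hub and never call each other directly. The `Hub` class, the frame dictionaries, the server names, and the routing table are all illustrative assumptions; this is not the Galaxy Communicator API.

```python
class Hub:
    """Toy hub: routes each frame to the next server named in a routing table."""

    def __init__(self, routes):
        self.routes = routes   # hypothetical routing rules: sender -> receiver
        self.servers = {}      # server name -> handler(frame) -> frame
        self.log = []          # (sender, receiver) pairs, in delivery order

    def register(self, name, handler):
        self.servers[name] = handler

    def send(self, sender, frame):
        """Forward the frame hop by hop until no route is left."""
        while sender in self.routes:
            receiver = self.routes[sender]
            self.log.append((sender, receiver))
            frame = self.servers[receiver](frame)
            sender = receiver
        return frame


hub = Hub({"recognizer": "parser", "parser": "dialog_manager",
           "dialog_manager": "generator", "generator": "synthesizer"})
hub.register("parser", lambda f: {**f, "parse": "[NeedRoom]"})
hub.register("dialog_manager", lambda f: {**f, "act": "request room_size"})
hub.register("generator", lambda f: {**f, "text": "What size room do you need?"})
hub.register("synthesizer", lambda f: {**f, "spoken": True})

result = hub.send("recognizer", {"hyp": "i need a room"})
```

Because every hop goes through the hub, swapping one component for another only means registering a different handler under the same name.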
Getting Even Closer
- Recognition: multiple, parallel SPHINX decoders
- Inputs from other modalities; Text I/O through TTYServer
- Lang. Understanding: parsing (PHOENIX) and confidence annotation (HELIOS)
- HUB
- Dialog Management: RAVENCLAW
- Back-end: Galaxy stub (perl) in front of the actual Perl back-end
- Lang. Generation: Galaxy stub in front of ROSETTA (Perl)
- Synthesis: THETA
- Other domain agents: DateTime
- PROCESS MONITOR
Building a Spoken Dialogue System
Resources per component:
- Recognition (SPHINX): language, acoustic, and lexical models
- Lang. Understanding (PHOENIX/HELIOS): grammar
- Dialog Management (RAVENCLAW): RavenClaw dialog task specification
- Back-end: (perl)
- Lang. Generation (ROSETTA): templates
- Synthesis (THETA): (limited-domain) voice
So How Long Will It Take?
- MITRE Workshop on Dialogue Management (Fall 2003)
- Develop a text-based SDS for medical diagnosis (backend provided)
- Madeleine (22 hours)
[Pie chart: breakdown of development time. Slice labels include RC Fixes, Templates, Design, Backend, Grammar, and Setup; slice values are 2h15 (11%), 4h (19%), 4h (18%), 2h45 (13%), 3h20 (16%), 1h10 (5%), and 3h45 (18%).]
Okay, How Long Will It Really Take?

To get a system running with reasonable performance [poll among 3 RavenClaw developers]:
- 1 month to get a working system up and running
- 1 month to fine-tune performance

Further iterative improvements will continue as more data accumulates
Components & Resources
[Diagram: the component pipeline from "Building a Spoken Dialogue System", annotated with the resources each component needs: language and acoustic models, grammar, dialog task specification, templates, limited-domain voice]
SPHINX II
- Semi-continuous acoustic models
  - Off-the-shelf 8kHz, 11.025kHz, 16kHz models
  - Scripts for building your own
- Language models
  - 2-gram & 3-gram models
    - PLSA adapted models perform better
    - CMU-Cambridge SLM Toolkit
    - Generate from Phoenix Grammar
  - Finite state grammar
  - Sphinx supports state-specific LMs
- Dictionary (lexical models)
  - CMU Dictionary
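As a rough illustration of what a 2-gram language model stores, the sketch below counts word pairs in a tiny made-up corpus and estimates P(word | previous word). The corpus, the function names, and the add-one smoothing are assumptions chosen for brevity; the CMU-Cambridge SLM Toolkit offers its own discounting methods, which are not reproduced here.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and word pairs, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(words[:-1])            # every word that acts as a history
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, word, vocab_size):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Made-up in-domain sentences (hypothetical training data).
corpus = ["i need a room", "i need a larger room", "do you have a room"]
uni, bi = train_bigram(corpus)
vocab = {w for s in corpus for w in s.split()} | {"</s>"}

p_seen = bigram_prob(uni, bi, "need", "a", len(vocab))       # pair seen twice
p_unseen = bigram_prob(uni, bi, "need", "room", len(vocab))  # pair never seen
```

Pairs that occur in the domain data get more probability mass than unseen ones, which is why retraining on in-domain text tends to help recognition.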
Sphinx II - continued
- Multiple parallel decoders [e.g., male + female]
  - Multiple hypotheses forwarded, selection done later
- Typical WER: 15-30%
  - With pronounced differences between native and non-native speakers
  - Lowered by retuning acoustic and language models to the domain
- Migration to SPHINX 3.x in the near future
  - Expected: big improvement in WER
  - Concern: real-time performance
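For readers unfamiliar with the WER figures quoted for Sphinx II, word error rate is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a standard dynamic-programming implementation; the function name and the example sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words.

    Assumes a non-empty reference transcript.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitute = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            delete = dist[i - 1][j] + 1
            insert = dist[i][j - 1] + 1
            dist[i][j] = min(substitute, delete, insert)
    return dist[len(ref)][len(hyp)] / len(ref)


wer = word_error_rate("do you have something a bit larger",
                      "do you have some thing a bit large")
```

Here two substitutions and one insertion against seven reference words give a WER of 3/7, or about 43%.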
Phoenix Parser / Grammar
- Phoenix: robust parser
- CFG grammar
  - Manually-generated domain-specific grammar rules
  - Reusable, generic sub-grammars: [Yes], [No], [Number], [DateTime], [Help], [Repeat], [Suspend], etc.
- Parses all incoming hypotheses and passes all parses along

Example parse:
    DO YOU HAVE SOMETHING A BIT LARGER?
    [NeedRoom] ( [_i_want] ( DO YOU HAVE SOMETHING ) )
    [RoomSizeSpec] ( [room_size_spec] ( [rss_larger] ( LARGER ) ) )

Grammar excerpt:
[room_size_spec]
    ([rss_large])
    ([rss_small])
    ([rss_larger])
    ([rss_smaller])
    ([rss_smallest])
    ([rss_largest])
;
[rss_large]
    (large)
    (big)
    (huge)
;
[rss_larger]
    (*the larger)
    (*the bigger)
    (too small)
;
[rss_largest]
    (*the largest)
    (*the biggest)
;
[rss_small]
    (small)
    (little)
;
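The grammar excerpt above can be exercised with a toy phrase spotter. This is only a sketch of the robust-parsing idea (skip words that match nothing, keep the fragments that do); it is not Phoenix's algorithm. It assumes that a leading `*` marks an optional word, and the `GRAMMAR` table, `match_at`, and `spot` are hypothetical names.

```python
# Toy copy of part of the grammar excerpt; "*word" is treated as optional.
GRAMMAR = {
    "rss_large": ["large", "big", "huge"],
    "rss_larger": ["*the larger", "*the bigger", "too small"],
    "rss_largest": ["*the largest", "*the biggest"],
    "rss_small": ["small", "little"],
}


def match_at(words, start, pattern):
    """Return the index just past a match of pattern at words[start:], or None."""
    if not pattern:
        return start
    head, rest = pattern[0], pattern[1:]
    optional = head.startswith("*")
    word = head.lstrip("*")
    if start < len(words) and words[start] == word:
        end = match_at(words, start + 1, rest)
        if end is not None:
            return end
    if optional:
        return match_at(words, start, rest)  # skip the optional word
    return None


def spot(utterance):
    """Scan left to right, keeping the longest rule match at each position."""
    words = utterance.lower().replace("?", "").split()
    found, i = [], 0
    while i < len(words):
        best = None
        for net, patterns in GRAMMAR.items():
            for pattern in patterns:
                end = match_at(words, i, pattern.split())
                if end is not None and end > i and (best is None or end > best[1]):
                    best = (net, end)
        if best:
            found.append((best[0], " ".join(words[i:best[1]])))
            i = best[1]
        else:
            i += 1  # no rule matches here: skip the word, as a robust parser would
    return found


slots = spot("Do you have something a bit larger?")
```

Words such as "a bit" are simply skipped, so the spotter still recovers `rss_larger` from the full question.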
Helios / Confidence Annotation
- Builds accurate confidence scores using features from 3 sources of knowledge:
  - Speech recognition
  - Language understanding
  - Dialogue management
- Selects the hypothesis with the maximum confidence score
- Research in progress on hypothesis selection and transferability across domains
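A minimal sketch of the two steps on this slide: score each hypothesis from features drawn from the three knowledge sources, then keep the highest-scoring one. The logistic form, the feature names, the weights, and the bias are all assumptions made for illustration; a real annotator would learn its model from labeled data.

```python
import math

# Hypothetical weights, one feature per knowledge source.
WEIGHTS = {
    "acoustic_score": 1.5,       # speech recognition
    "parse_coverage": 2.0,       # language understanding
    "matches_expectation": 1.0,  # dialogue management
}
BIAS = -2.0


def confidence(features):
    """Logistic combination of the features; returns a value in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def select_hypothesis(hypotheses):
    """Keep the hypothesis with the maximum confidence score."""
    return max(hypotheses, key=lambda h: confidence(h["features"]))


hypotheses = [
    {"text": "i need a larger room",
     "features": {"acoustic_score": 0.7, "parse_coverage": 1.0,
                  "matches_expectation": 1.0}},
    {"text": "i need a large broom",
     "features": {"acoustic_score": 0.8, "parse_coverage": 0.6,
                  "matches_expectation": 0.0}},
]
best = select_hypothesis(hypotheses)
```

The second hypothesis has the better acoustic score, but the first wins once parsing and dialogue expectations are taken into account.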
RavenClaw Architecture
- Dialog Task (Specification)
  - Captures all domain-specific dialog (task) logic using a hierarchical description
  - The authoring effort is focused entirely here
- Domain-independent Dialog Engine
  - Manages dialog by executing the dialog task specification
  - Provides a large number of domain-independent conversational strategies
RavenClaw: Dialogue Task Specification

Madeleine
    I:Welcome
    E:LoadSymptoms
    GeneralFeel (concept: general_feeling)
        R:HowAreYou?
        I:Glad
        I:Sorry
    Diagnose (concept: diagnostic)
        Fever (concept: have_fever)
            R:AskFever
            E:MeasureTemp
            I:InformFever
        Travel

- Tree of dialog agents
  - Terminals: Inform, Request, Expect, Execute
  - Non-terminals / dialog agencies: plan execution of child nodes
- Basically a Hierarchical Task Execution Network; each agent has:
  - Preconditions & effects
  - Success & failure criteria
  - Trigger (focus) criteria
Sample DTS Code
[Diagram: the GeneralFeel agency with concept general_feeling and children R:HowAreYou?, I:Glad, I:Sorry]

// /Madeleine/GeneralFeel
DEFINE_AGENCY(CGeneralFeel,
    DEFINE_CONCEPTS(
        STRING_USER_CONCEPT(general_feeling, none))
    DEFINE_SUBAGENTS(
        SUBAGENT(HowAreYou, CHowAreYou)
        SUBAGENT(Glad, CGlad)
        SUBAGENT(Sorry, CSorry))
    SUCCEEDS_WHEN(COMPLETED(Glad) || COMPLETED(Sorry)))

// /Madeleine/GeneralFeel/HowAreYou
DEFINE_REQUEST_AGENT(CHowAreYou,
    REQUEST_CONCEPT(general_feeling)
    GRAMMAR_MAPPING("![Yes]>good, ![FeelingGood]>good, "
                    "![FeelingSoSo]>soso, ![FeelingBad]>bad"))

// /Madeleine/GeneralFeel/Glad
DEFINE_INFORM_AGENT(CGlad,
    PRECONDITION(C("general_feeling") == CString("good"))
    PROMPT("inform glad_youre_good")
    ON_COMPLETION(FINISH(/Madeleine)))

// /Madeleine/GeneralFeel/Sorry
DEFINE_INFORM_AGENT(CSorry,
    PRECONDITION(C("general_feeling") != CString("good"))
    PROMPT("inform sorry_youre_bad"))
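The agency defined in the sample DTS code can be mirrored in a few lines of Python to show how preconditions and success criteria steer a stack-based execution. This is a simplified sketch, not the RavenClaw engine: the `Agent` class, the `execute` loop, and the simulated user answer are assumptions, and features such as `ON_COMPLETION`, failure criteria, and focus shifts are left out.

```python
class Agent:
    """Hypothetical stand-in for a dialog agent or agency."""

    def __init__(self, name, children=(), precondition=None,
                 succeeds_when=None, action=None):
        self.name = name
        self.children = list(children)
        self.precondition = precondition or (lambda concepts: True)
        self.succeeds_when = succeeds_when
        self.action = action or (lambda concepts: None)
        self.completed = False


def execute(root, concepts):
    """Run the tree with an explicit stack; record the stack before each step."""
    stack, trace = [root], []
    while stack:
        agent = stack[-1]
        trace.append([a.name for a in reversed(stack)])  # top of stack first
        if not agent.children:          # terminal agent: act, then pop
            agent.action(concepts)
            agent.completed = True
            stack.pop()
            continue
        done = agent.succeeds_when() if agent.succeeds_when else False
        candidates = (c for c in agent.children
                      if not c.completed and c.precondition(concepts))
        child = None if done else next(candidates, None)
        if child is None:               # agency finished: pop it
            agent.completed = True
            stack.pop()
        else:                           # agency plans its next child
            stack.append(child)
    return trace


# The user's answer is simulated here instead of being parsed from input.
how = Agent("HowAreYou",
            action=lambda c: c.update(general_feeling="soso"))
glad = Agent("Glad", precondition=lambda c: c.get("general_feeling") == "good")
sorry = Agent("Sorry", precondition=lambda c: c.get("general_feeling") != "good")
general = Agent("GeneralFeel", children=[how, glad, sorry],
                succeeds_when=lambda: glad.completed or sorry.completed)
root = Agent("Madeleine", children=[Agent("Welcome"), general])

trace = execute(root, {})
```

With the simulated answer "soso", the precondition on Glad fails, Sorry runs, and GeneralFeel succeeds without ever pushing Glad on the stack.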
RavenClaw Execution
The dialog engine executes the Madeleine task tree (see "RavenClaw: Dialogue Task Specification") using a Dialog Stack and an Expectation Agenda. Successive dialog stack states, top of stack first:
- Madeleine
- Welcome, Madeleine
- Madeleine
  System: Hi, this is Madeleine, the automated…
- LoadSymptoms, Madeleine
  The task tree gains new request agents (R:Headache with concept headache, and further R: agents)
- Madeleine
- GeneralFeel, Madeleine
- HowAreYou, GeneralFeel, Madeleine (input pass)
  System: How are you feeling today?
  Expectation Agenda, one level per agent on the stack:
    HowAreYou: general_feeling: [good], [bad], [soso]
    GeneralFeel: general_feeling: [good], [bad], [soso]
    Madeleine: general_feeling: [good], [bad], [soso]
               have_fever: [fever], ![yes], ![no]
               headache: [headache], ![yes], ![no]
               cough: [cough], ![yes], ![no]
               …
  User: Not so good, I think I have a fever
  Parse: [soso](not so good) [fever](I think I have a fever)
- GeneralFeel, Madeleine
- Sorry, GeneralFeel, Madeleine
  System: Oh, I’m sorry to hear that… Let me take your temperature…
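The input pass can be sketched as a lookup through the expectation agenda: each parsed slot is bound to the first concept that expects it, searching from the agent in focus down the stack. The data layout and the `find_concept` and `bind_input` helpers are hypothetical, and the sketch assumes that a `!` prefix marks a slot that is only accepted when the expecting agent is in focus.

```python
# Agenda levels copied from the slide, keyed by the agent on the stack.
EXPECTATIONS = {
    "HowAreYou": {"general_feeling": ["good", "bad", "soso"]},
    "GeneralFeel": {"general_feeling": ["good", "bad", "soso"]},
    "Madeleine": {
        "general_feeling": ["good", "bad", "soso"],
        "have_fever": ["fever", "!yes", "!no"],
        "headache": ["headache", "!yes", "!no"],
        "cough": ["cough", "!yes", "!no"],
    },
}


def find_concept(stack, expectations, slot):
    """Return the first concept expecting this slot, searching from the focus down."""
    for level, agent in enumerate(stack):  # level 0 is the agent in focus
        for concept, slots in expectations.get(agent, {}).items():
            focus_only_hit = level == 0 and "!" + slot in slots
            if slot in slots or focus_only_hit:
                return concept
    return None


def bind_input(stack, expectations, parse):
    """Bind every parsed slot that some agenda level expects."""
    concepts = {}
    for slot, _text in parse:
        concept = find_concept(stack, expectations, slot)
        if concept is not None:
            concepts.setdefault(concept, slot)
    return concepts


stack = ["HowAreYou", "GeneralFeel", "Madeleine"]  # top of stack first
parse = [("soso", "not so good"), ("fever", "I think I have a fever")]
bound = bind_input(stack, EXPECTATIONS, parse)
```

The fever slot is bound even though the system only asked about general feeling, which is how the user can volunteer information out of turn; a bare "yes" would not be bound to have_fever here because that agent is not in focus.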
RavenClaw – Other features
- Dialogue Engine transparently provides a set of conversational skills
  - Universal dialogue mechanisms: Repeat, Suspend / Resume, Quit
  - Help: Help!, Where are we?, What can I say?
  - Error handling: explicit and implicit confirmations; strategies for recovering from non-understandings
- Dynamic dialogue task generation
- Dynamic dialogue control policy
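One simple way to choose between the error-handling strategies listed above is to threshold the confidence score of the current input. The sketch below is only an illustration of that idea: the function name, the action labels, and both thresholds are made up, and the slides do not say how RavenClaw actually makes this decision.

```python
def choose_action(confidence, low=0.3, high=0.8):
    """Pick an error-handling action from a confidence score in [0, 1].

    The thresholds are hypothetical and would normally be tuned on data.
    """
    if confidence < low:
        return "reject"            # treat as a non-understanding, e.g. ask to repeat
    if confidence < high:
        return "explicit_confirm"  # "Did you say a larger room?"
    return "implicit_confirm"      # "A larger room ... for what time?"


actions = [choose_action(c) for c in (0.1, 0.5, 0.95)]
```

Explicit confirmations cost an extra turn but are safer for shaky input, while implicit confirmations keep the dialogue moving when the system is fairly sure.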
Backend & Domain Agents
- Various problem-specific solutions
  - RoomLine: connects to a static Perl database or to the CMU CorporateTime server
  - Let’s Go! Bus Information system: connects to a PostgreSQL database
  - Sublime: connects to a MySQL database; also functions as a web server; DTW search domain agent
- Basically, build your own; we provide a stub for interfacing with the Galaxy HUB
Rosetta Language Generation
- Template- and stochastic-based language generation
  - Input: (act, object, {slot=value})
  - Output: text (tagged with concepts)

    # welcome to the system
    "welcome" => "Welcome to RoomLine, the automated conference room " .
                 "reservation system.",
    # greet user (alternative phrasings go in an array reference)
    "greet_user" => [ "Hi, <user_name>.",
                      "Hi, <user_name>, good to hear from you again." ],
    # inform the user that the system has misunderstood the times (order)
    "wrong_time_order" => sub {
        my %args = @_;
        my $time_interval_as_string =
            get_wrong_time_interval_as_string(\%args,
                                              "room_query.date_time.time");
        my $answer = "I'm sorry, I must have misunderstood the " .
                     "time you needed the room. ";
        $answer .= "I heard $time_interval_as_string. ";
        return [ "$answer So, let's see ... ",
                 "$answer So, let's try this again ... ",
                 "$answer So, let's try this once more ... " ];
    },
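The input/output contract of template-based generation can be sketched in a few lines: look up the templates for (act, object), pick one variant, and fill the `<slot>` placeholders. This is a Python illustration of the idea only; the `TEMPLATES` table and the `generate` function are hypothetical, and Rosetta itself is written in Perl with richer template types (for example the code-valued template shown above).

```python
import random
import re

# Hypothetical template table keyed by (act, object).
TEMPLATES = {
    ("inform", "welcome"): [
        "Welcome to RoomLine, the automated conference room reservation system.",
    ],
    ("inform", "greet_user"): [
        "Hi, <user_name>.",
        "Hi, <user_name>, good to hear from you again.",
    ],
}


def generate(act, obj, slots, rng=random):
    """Pick one template variant at random and substitute <slot> placeholders."""
    template = rng.choice(TEMPLATES[(act, obj)])
    return re.sub(r"<(\w+)>", lambda m: str(slots[m.group(1)]), template)


greeting = generate("inform", "greet_user", {"user_name": "Dan"})
welcome = generate("inform", "welcome", {})
```

Choosing among several variants at random keeps repeated prompts from sounding identical across turns.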
Synthesis
- Cepstral Theta synthesis
  - Open-domain unit-selection synthesis
  - SSML tags
  - [Currently working on barge-in location]
- Festival synthesis
  - Diphone synthesis; open-domain and limited-domain unit-selection synthesis
  - SABLE tags
  - Server running separately on a Linux box
Miscellaneous – Documentation
- Transmitted largely by oral tradition :)
- A bit of documentation available
  - Research papers, slides
  - WIKI: http://hap.speech.cs.cmu.edu/commwiki (mostly for developers, postings of updates, recent developments; hopefully more introductory materials soon)
  - Tutorials: 2 available, but a bit outdated
- More under work
Miscellaneous – Portability
- Current systems work on PC Windows platforms
  - Galaxy has a Linux version
  - Components are C, C++ (Visual Studio 6.0, Visual Studio .NET), Perl
- How about using different input / output components?
  - Modify the RavenClaw DMInterface class
  - Has been done for the Gemini parser / language generator
Miscellaneous – Research Platform
- The Communicator / RavenClaw framework is a research platform!
  - Constantly evolving
  - Modular: easy to change, develop and test new technologies
  - Research on a variety of topics in a real-world, full-blown system: recognition, language understanding, dialogue management, language generation, synthesis
  - Your work can be evaluated / reused easily across multiple existing systems
Miscellaneous - Download
- www.cs.cmu.edu/~dbohus/RavenClaw
- Download a version of RoomLine
- An installation script can seed your own project from this RoomLine version
Miscellaneous – RavenClaw Team
- Dan Bohus (dbohus@cs)
- Antoine Raux (antoine@cs)
- Jahanzeb Sherwani (jsherwan@cs)
- Thomas Harris (tkharris@cs)
- Satanjeev Banerjee (satanjeev@cs)
- Brian Langner (blangner@cs)

More users / developers / documentation writers are always welcome!!
Dialogs on Dialogs Reading Group: www.cs.cmu.edu/~dod
Error awareness and recovery
- Problem: lack of robustness when faced with understanding errors
- Solution: build mechanisms for acting robustly at the dialogue management level
  - Error awareness: building better confidence annotators, hypothesis selection; transference across domains
  - Error recovery strategies: recovery from non-understandings
  - Error handling decision process: scalable, adaptable, task-independent architecture for making error handling decisions
Let’s Go! Research
- Speech Recognition: acoustic adaptation on non-native speech
  - WER: 50% → 30%
- Speech Synthesis: flexible and natural F0 modeling (F0 unit selection)
  - Emphasis on erroneous/uncertain words for utterance confirmation
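As a loose illustration of emphasizing uncertain words in a confirmation prompt, the sketch below wraps low-confidence words in the standard SSML `<emphasis>` element. The research on this slide works on F0 modeling inside the synthesizer; SSML markup is used here only as a stand-in, and the function name, the word-level confidence input, and the threshold are assumptions.

```python
from xml.sax.saxutils import escape


def confirmation_ssml(words, threshold=0.5):
    """Build an SSML string from (word, confidence) pairs.

    Words whose confidence falls below the (hypothetical) threshold are
    wrapped in <emphasis> so the listener can catch likely errors.
    """
    parts = []
    for word, confidence in words:
        text = escape(word)  # keep the output well-formed XML
        if confidence < threshold:
            parts.append(f"<emphasis>{text}</emphasis>")
        else:
            parts.append(text)
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = confirmation_ssml([("going", 0.9), ("to", 0.9), ("Oakland", 0.3)])
```

The resulting prompt stresses only the destination, the one word the recognizer was unsure about.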
Sublime
- Interface for personalized information management
- Narrow functionality in unrestricted domains
  - Currently, handle information without understanding it
  - Eventually, learn relationships and a shallow ontology
That’s all, folks!
THANK YOU!