A Framework For Developing Conversational User Interfaces

James Glass, Eugene Weinstein, Scott Cyphers,
Joseph Polifroni
MIT Computer Science and Artificial Intelligence Laboratory
Cambridge, MA USA
Grace Chung
Corporation for National
Research Initiatives
Reston, VA USA
Mikio Nakano
NTT Corporation
Atsugi, Japan
Conversational User Interfaces
[Figure: Human and Computer exchange speech. Input path: Speech → Recognition → Text → Understanding → Meaning. Output path: Meaning → Generation → Text → Synthesis → Speech.]
Eugene Weinstein — MIT Computer Science and Artificial Intelligence Laboratory
Computer Aided Design on User Interfaces – Jan 16th, 2004
Types of Conversational Interfaces
• Conversational systems differ in the degree to which the
human or the computer controls the conversation (initiative)
• Directed Dialogue (computer initiative)
– Computer maintains tight control
– Human is highly restricted
– e.g. C: Please say the departure city.
• Mixed Initiative Dialogue
• Free Form Dialogue (human initiative)
– Human takes complete control
– Computer is totally passive
– e.g. H: I want to visit my grandmother.
Conversational Interfaces
• Can understand verbal input
– Speech recognition
– Language understanding
(in context)
• Can engage in
dialogue with a user
during the interaction
• Can verbalize response
– Language generation
– Speech synthesis
[Figure: system components: Audio, Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Back End, Language Generation, Speech Synthesis]
The Problem With Conversational
Interfaces
• Advanced conversational systems are out there
– Both user and computer can take initiative
– Goal: conversational skill of system should approach that of
human operator
• But…
– These systems are built by experts
– Huge learning curve for novices, and
– Tremendous iterative effort required even from experts
• For this reason
– Most advanced conversational systems remain in research labs
* e.g. Jupiter weather info system (+1-888-573-TALK) : Zue
et al, IEEE Trans. SAP, 8(1), 2000
– However, we have seen limited commercial deployment
* e.g. AT&T’s “How May I Help You”, Gorin et al, Speech
Communication, 23, 1997
Simplifying Conversational System
Creation
• Goal: make it easier for both expert and novice developers
to create conversational interfaces
– But still use advanced human language technologies
• Strategy: simplify configuration process
– Automatically configure technology components based on examples
– Allow specification through web interface or unified configuration file
[Figure: a Web Interface or a Configuration File feeds the Configuration Engine, which configures Recognition, Understanding, Context Resolution, Dialogue Management, Generation, and Synthesis]
Configuring a Conversational Interface:
Knowledge Representation
• First, define example sentences for in-domain actions
Action      Examples
identify    I would like to know today’s weather in Denver
            What will the temperature be on Tuesday
set         Turn on the radio in the kitchen please
            Can you turn the dining room lights off
• Then, define the important concepts present in the actions
(attributes):
– Concept values make up recognizer vocabulary!
– Examples of attributes automatically matched to attribute classes
Attribute   Values
city        Boston, Denver, San Francisco, …
room        living room, dining room, kitchen, …
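The matching of attribute values in example sentences can be illustrated with a minimal Python sketch. It is not SpeechBuilder's implementation: the `ATTRIBUTES` table, `spot_attributes`, and `vocabulary` are hypothetical names, and simple substring matching stands in for the real grammar-based understanding.

```python
import re

# Hypothetical attribute classes, copied from the slide's examples.
ATTRIBUTES = {
    "city": ["Boston", "Denver", "San Francisco"],
    "room": ["living room", "dining room", "kitchen"],
}

def spot_attributes(sentence: str) -> dict[str, str]:
    """Return {attribute: value} for every known value found in the sentence."""
    found = {}
    lowered = sentence.lower()
    for attribute, values in ATTRIBUTES.items():
        # Try longer values first so a multi-word value wins over a shorter one.
        for value in sorted(values, key=len, reverse=True):
            if re.search(r"\b" + re.escape(value.lower()) + r"\b", lowered):
                found[attribute] = value
                break
    return found

def vocabulary() -> set[str]:
    """Words that the attribute values contribute to the recognizer vocabulary."""
    return {
        word.lower()
        for values in ATTRIBUTES.values()
        for value in values
        for word in value.split()
    }
```

For example, `spot_attributes("Can you turn the dining room lights off")` yields a `room` entry, which is the kind of attribute-value pair an action such as `set` would carry.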
Starting with a Database Table
• Provide database table to configure speech interface:
Name               Phone     Email             Office
Jim Glass          x3-1640   glass@mit.edu     601
Scott Cyphers      x3-0248   cyphers@mit.edu   604
Eugene Weinstein   x3-8569   ecoder@mit.edu    633
• Only some columns are used to access entries (e.g., Name)
– Values of those columns become values for domain concepts
– Default action sentences are automatically generated
• But, every table cell can potentially be an answer to a question
– All Names of columns become one concept – “property”
Attributes
name               Jim Glass, Scott Cyphers…
property           Name, Phone, Email, Office
Actions
request_property   What is the email for Jim Glass?
request_office     Where can I find Jim Glass?
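A rough Python sketch of how a table might be turned into attributes and default action sentences follows. The function names and the sentence template are assumptions modeled on the slide's example, not the tool's actual generation rules.

```python
# Two rows of the slide's table, as plain dictionaries.
ROWS = [
    {"Name": "Jim Glass", "Phone": "x3-1640", "Email": "glass@mit.edu", "Office": "601"},
    {"Name": "Scott Cyphers", "Phone": "x3-0248", "Email": "cyphers@mit.edu", "Office": "604"},
]

def configure_from_table(rows, access_columns):
    """Derive attributes and default action sentences from a table."""
    columns = list(rows[0].keys())
    # Values of the access columns become values of domain concepts.
    attributes = {col.lower(): [row[col] for row in rows] for col in access_columns}
    # All column names together form the single "property" concept.
    attributes["property"] = columns
    # Hypothetical default sentence template, one sentence per non-access column.
    example = rows[0][access_columns[0]]
    actions = {
        "request_property": [
            f"What is the {col.lower()} for {example}?"
            for col in columns
            if col not in access_columns
        ]
    }
    return attributes, actions

def answer(rows, name, prop):
    """Look up one table cell, since any cell can be the answer to a question."""
    for row in rows:
        if row["Name"] == name:
            return row[prop]
    return None
```

Calling `configure_from_table(ROWS, ["Name"])` gives a `name` attribute holding the two names and a `request_property` action whose examples include "What is the email for Jim Glass?".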
Dialogue Management
• Generic Dialogue Manager (Polifroni & Chung, ICSLP 2002)
[Figure: system architecture in which a Generic Dialogue Manager, within Dialogue Management, serves several domains: Hotels, Air Travel, Sports, Weather]
• Plan system responses
• Regularize common concepts
• Summarize database results
Context Resolution
Input Query: “Show me restaurants in Cambridge.”
Resolve Deixis: “What does this one serve?”
Resolve Pronouns: “What is their phone number?”
Inherit Predicates: “Are there any on Main Street?”
Incorporate Fragments: “What about Massachusetts Ave?”
Fill in Default Values: “Give me directions from MIT.”
→ Query Interpreted in Context
Human Language Technology Details
• Approach: Use same technologies as deployed in our
mainstream, more complex systems
• Speech Recognizer (Glass, Computer Speech and Language,
2003)
– Trained on 100+ hours of mostly telephone speech
– Word pronunciations supplied by large dictionary, generated by rule, or
provided by developer
• Natural Language Understanding: (Seneff, Computational
Linguistics, 1992)
– Hierarchical sentence grammar used to parse sentence hypothesis
– Back off to concept spotting when no full parse is made
• Language Generation: (Baptist & Seneff, ICSLP 2000)
– Used in: SQL (DB Query) generation, paraphrasing & URL-encoding
meaning representation, responses
Web-based Interface
Defining Actions and
Concepts (Attributes)
Web-based Interface: Viewing Sentences
Examining how sentences are reduced
to an action and a set of attribute-value pairs
Web-based Interface: Response
Generation
Domain independent system prompts
Customizing system
responses
Domain specific system prompts
Web-based Interface: Editing Pronunciations
Modifying system generated
pronunciations for the vocabulary
Web-based Interface: Context Resolution
Context Resolution configured through
Masking and Inheritance of concepts
Voice Configuration File: An Alternative to
the Web Interface
• Entire domain can be specified in single configuration file
– Allows for automated generation of conversational systems
<actions>
<request_name> = i would like a restaurant
| can you (show|give) me a Chinese restaurant in Arlington;
</actions>
<attributes>
<cuisine> = Chinese|Taiwanese;
<city> = Washington | Boston | Arlington;
</attributes>
<discourse>
name masks(city cuisine neighborhood);
</discourse>
<constraints>
<request_name> (city|neighborhood) {prompt_for_city};
</constraints>
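Because the whole domain lives in one text file, other programs can read or generate it. The Python sketch below parses only the attributes section of the fragment shown above. It assumes the grammar is exactly what that fragment displays (`<name> = a | b;`), and `parse_attributes` is a hypothetical helper, not part of SpeechBuilder.

```python
import re

# The attributes section copied from the configuration example above.
VCFG = """
<attributes>
<cuisine> = Chinese|Taiwanese;
<city> = Washington | Boston | Arlington;
</attributes>
"""

def parse_attributes(text: str) -> dict[str, list[str]]:
    """Map each attribute name to its list of values."""
    section = re.search(r"<attributes>(.*?)</attributes>", text, re.DOTALL)
    if section is None:
        return {}
    result = {}
    # Each entry looks like "<name> = value | value;" in the fragment.
    for name, body in re.findall(r"<(\w+)>\s*=\s*([^;]+);", section.group(1)):
        result[name] = [value.strip() for value in body.split("|")]
    return result
```

Running `parse_attributes(VCFG)` returns the `cuisine` and `city` value lists, which is the same information the web interface collects through its attribute forms.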
Deployment
• SpeechBuilder functional for the past three years
• Some example domains:
– Office appliance control
– Laboratory directory (auto-attendant)
– Restaurant query system
• Has been used by MIT researchers (experts) as well as novice
developers at our sponsor companies
– Used in technology transfer workshop for pervasive computing project
(Oxygen)
• SpeechBuilder has been used as an educational tool
– Computational linguistics class at Georgetown University
– Summer class at Johns Hopkins University
– Youngest SpeechBuilder developer: 9 years old
Japanese SpeechBuilder
• Created in collaboration with NTT
• Challenge: Segmentation (no spaces between words)
Example Domain
• A hotel application using the generic dialogue manager
– Compiled via SpeechBuilder using constraints shown previously
– Other generic functionality is automatically included
• Illustrated technical issues:
– Soliciting necessary information from user
– Interpreting fragments correctly in context
– Canonicalizing relative dates
– Ordering and summarizing results of query to content provider
– Resolving superlatives/updating discourse context
– Interpreting pronouns in context
– Returning and speaking specific properties
– Repeating previous replies
Another Example Domain: Object
Manipulation System
• Stock SpeechBuilder domain for spoken dialogue
• Custom back-end connected to stereo camera and person
tracking algorithm (Demirdjian, WOMOT 2003)
Ongoing and Future Work
• Incorporate speech synthesis
– Allow use of concatenative speech synthesizer (Yi et al, ICSLP 2000) in
SpeechBuilder
• Allow use of multiple modalities
– Provide functionality to incorporate multimodal input into systems
• Improve dialogue management tools and modules
– Improve ability of SpeechBuilder systems to use more sophisticated
dialogue strategies
– Provide additional generic semantic concepts for use in domains
• Allow system refinement by unsupervised learning
– Use confidence scores to improve domain language model
(Nakano & Hazen, Eurospeech 2003)
• Allow system modification in real-time
– Need ability to re-train recognizer during runtime (Schalkwyk et al,
Eurospeech 2003)
Thank You! For more information:
• http://www.sls.csail.mit.edu/
• Email us! ecoder@mit.edu
• Jupiter weather information system:
– +1-617-258-0300 (outside USA)
– 1-888-573-TALK (USA toll-free)
• Mercury flight information system:
– +1-617-258-6040 (outside USA)
– 1-877-MIT-TALK (USA toll-free)
• Pegasus flight status system:
– +1-617-258-0301 (outside USA)
– 1-877-LCS-TALK (USA toll-free)
THE END
Utility for rapid prototyping of speech-based interfaces
– Used to create demonstrations for NTT CS Labs open house
– Prototypes were developed with a few days of effort
– Three papers submitted fo…
• Only some columns are used to access entries (e.g., Name)
– Values of those columns become values for domain concepts
– Default action sentences are automatically generated
• But, every table cell can potentially be an answer to a question
– Names of non-access columns become a concept
To Configure Response Generation…
• For each concept present in the domain, define how queries
about that concept should be answered
<telephone> = “The telephone for :name is :phone”
• Define some prompts for generic events, e.g. welcome and
goodbye
<welcome> = “Welcome to the auto-attendant”
<no_data> = “Sorry, there was no data matching your request.”
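The `:name` style placeholders in these prompts can be filled by plain substitution. The Python sketch below shows one way to do that. The `TEMPLATES` table reuses the prompts above, and `render` is a hypothetical helper rather than the actual generation component.

```python
import re

# Prompts copied from the slide; keys are the template names.
TEMPLATES = {
    "telephone": "The telephone for :name is :phone",
    "welcome": "Welcome to the auto-attendant",
    "no_data": "Sorry, there was no data matching your request.",
}

def render(template_name, values=None):
    """Fill each ":key" placeholder with its value from the back end."""
    values = values or {}
    template = TEMPLATES[template_name]
    # Unknown keys are left as written so a missing value is easy to notice.
    return re.sub(
        r":(\w+)",
        lambda match: str(values.get(match.group(1), match.group(0))),
        template,
    )
```

With a database row for Jim Glass, `render("telephone", {"name": "Jim Glass", "phone": "x3-1640"})` produces the sentence that would then be handed to speech synthesis.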
Conversational User Interfaces: Input Side
[Figure: input side. Human Language Technologies: Speech → Recognition → Text → Understanding → Meaning. “Back-end” Technologies: Meaning → Action → DB.]
Text: “Find me a flight to Boston on Tuesday”
Meaning:
action=flights
to_city=Boston
day=Tuesday
Conversational User Interfaces: Output Side
[Figure: output side. DB → Action → Meaning, then Human Language Technologies: Meaning → Generation → Text → Synthesis → Speech.]
Meaning:
flight_num=55
airline=Delta
origin=LGA
dest=BOS
Text: “Delta flight, number fifty five from La Guardia to Boston…”
Conversational User Interfaces:
The Whole Picture
[Figure: the whole picture. Input: Speech → Recognition → Text → Understanding → Meaning → Action. Output: Action → Meaning → Generation → Text → Synthesis → Speech.]
The Missing Pieces: Context and Dialogue
• Context Resolution:
action=flights, to_city=Boston, day=Tuesday
+ “Last time, the user asked for a flight from LGA”
= action=flights, origin=LGA, dest=BOS, day=Tuesday
• Dialogue Management:
action=flights, to_city=Boston, day=Tuesday
+ (no departure city known)
= “Which city would you like to fly from?”
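Both steps can be sketched in a few lines of Python. This is an illustration only: the `REQUIRED` rule table, the second prompt, and the function names are hypothetical, and the frames use `origin`/`dest` keys throughout for simplicity.

```python
# Hypothetical rules: for each action, the attributes it needs and the prompt
# to speak when one of them is missing.
REQUIRED = {
    "flights": [
        ("origin", "Which city would you like to fly from?"),
        ("dest", "Which city would you like to fly to?"),
    ],
}

def merge_with_history(frame, history):
    """Context resolution: keep earlier attributes the new frame does not mention."""
    merged = dict(history)
    merged.update(frame)
    return merged

def next_prompt(frame):
    """Dialogue management: return the prompt for the first missing attribute, or None."""
    for attribute, prompt in REQUIRED.get(frame.get("action"), []):
        if attribute not in frame:
            return prompt
    return None
```

A frame with only a destination and a day makes `next_prompt` ask for the departure city. Once `merge_with_history` has added an origin remembered from an earlier turn, no prompt is needed and the query can go to the back end.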
Conversational User Interfaces:
The Whole Picture
[Figure: the whole picture with Context Resolution and Dialogue Management placed between Meaning and Action. Input: Speech → Recognition → Text → Understanding → Meaning. Output: Meaning → Generation → Text → Synthesis → Speech.]
Configuring Response Generation…
• For each concept present in the domain, define how queries
about that concept should be answered
• Configure some generic prompts for summarizing long results
• Define some prompts for generic events, e.g. welcome
Property/Condition                        Response
phone                                     The phone number for :restaurant_name is :phone
cuisine                                   :restaurant_name serves :cuisine cuisine
Welcome                                   Welcome to the restaurants domain
No matches                                I’m sorry, I couldn’t find any restaurants matching your request
Many matches                              I found five restaurants :items
item (what to return when summarizing)    :restaurant_name
Configuring Context Resolution
• Context Resolution (discourse) configured through Masking and
Inheritance of concepts
• Inheritance configures how actions remember concepts, e.g.:
– User: “What is the phone number for Jim Glass”
– System: “Jim Glass’ phone number is 3-1640”
– User: “What about his email address?”
– System: “Jim Glass’ email address is glass@mit.edu”
– Name concept is inherited
• Masking configures how certain concepts block other concepts,
even in the presence of inheritance, e.g.
– User: “Do you have any restaurants in Boston?”
– System: “In Boston, I have the following…”
– User: “What about in Times Square?”
– System: “In Times Square, New York, I have…”
– City concept is masked by Neighborhood concept
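The interplay of the two mechanisms can be sketched in Python as follows. The `MASKS` table and `resolve_in_context` are hypothetical names, and the rule shown (a concept in the new query blocks inheritance of the concepts it masks) is one plausible reading of the slide, not the tool's documented behavior.

```python
# Hypothetical masking rules: a concept in the new query blocks inheritance
# of the concepts listed for it.
MASKS = {"neighborhood": {"city"}}

def resolve_in_context(query, context):
    """Inherit concepts from the previous turn unless the new query masks them."""
    blocked = set()
    for concept in query:
        blocked |= MASKS.get(concept, set())
    # Inheritance: carry over everything that is not masked.
    resolved = {key: value for key, value in context.items() if key not in blocked}
    # Concepts stated in the new query always win.
    resolved.update(query)
    return resolved
```

In the directory dialogue, a follow-up query carrying only `property=email` inherits `name=Jim Glass`. In the restaurant dialogue, a query carrying `neighborhood=Times Square` drops the inherited `city=Boston` while keeping any other concepts.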
Voice Configuration File
• Developers can also use the Voice Configuration (VCFG) file
format to configure SpeechBuilder domains:
<actions>
<request_name> = i would like a restaurant
| can you (show|give) me a Chinese restaurant in Arlington;
</actions>
<attributes>
<cuisine> = Chinese|Taiwanese;
<city> = Washington | Boston | Arlington;
</attributes>
<discourse>
name masks(city cuisine neighborhood);
</discourse>
<constraints>
<request_name> (city|neighborhood) {prompt_for_city};
</constraints>
Deployment
• SpeechBuilder functional for the past three years
• Some example domains:
– Office appliance control
– Laboratory directory (auto-attendant)
– Restaurant query system
• Has been used by MIT researchers (experts) as well as novice
developers at our partner companies
• SpeechBuilder has been used by students in
– Computational linguistics class at Georgetown University
– Summer class at Johns Hopkins University
– Technology transfer workshop for pervasive computing project (Oxygen)
• In collaboration with NTT, we have developed a Japanese
version of SpeechBuilder. Japanese domains:
– Bus timetable system
– Weather information system
Configuring a Speech Interface with
SpeechBuilder: Knowledge Representation
• First define some concepts present in the domain (attributes):
– Concept values make up recognizer vocabulary!
Attribute   Values
city        Boston, Denver, San Francisco, …
room        living room, dining room, kitchen, …
• Then, define examples of things to do with the concepts
(actions)
– Examples of attributes automatically matched to attribute classes
Action      Examples
identify    I would like to know today’s weather in Denver
            What will the temperature be on Tuesday
set         Turn on the radio in the kitchen please
            Can you turn the dining room lights off