Spoken Dialogue Systems October 6, 2003

SIG-AI Fall 2003
Spoken Dialogue Systems
By: Sachin Kamboj
Outline
Introduction to Spoken Dialogue Systems (SDS)
Applications of SDS
Components of SDS
Speech Recognition
Language Understanding
Dialogue Manager
Response Generation
Speech Synthesis
Domain Specific Components
Classification of SDS
On the basis of dialogue control
On the basis of initiative
On the basis of the verification strategy
Dialogue Manager Components
Challenges in the Design of an SDS
Introduction
Any computer system that interacts with a human using natural
language.
Computer systems with which humans interact on a turn-by-turn
basis and in which spoken natural language plays an important
part in the communication. [Fraser 1997]
Spoken Dialogue Systems provide an interface between the user
and a computer-based application that permits spoken interaction
with the application in a relatively natural manner. [McTear 2002]
Applications
Automated reservation systems
CU Communicator System
TOOT
Mercury Flight Reservation System
NL email interfaces
ELVIS (EmaiL Voice Interactive System)
MailSec
Planning & Problem Solving Systems
TRIPS & TRAINS
Circuit-Fix-It Shop System
Virtual Immersive Worlds (Steve)
Automated Banking Systems (Nuance)
Multimodal Information Systems (MATCH)
Components
[Figure: block diagram of the system components: Speech Recognizer, Language Understanding, Dialogue Manager, Response Generator, Text-to-Speech System, Domain Specific Components]
Speech Recognition
Involves the conversion of Spoken Sounds (user
utterances) to Text (a string of words)
Requires knowledge of Phonetics and Phonology
Basic Idea:
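$\hat{W} = \arg\max_{W} P(O \mid W) \, P(W)$, where O is the observed acoustic signal and W ranges over candidate word sequences.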
Challenges:
Variability in speech signal due to the language, speaker and
channel.
Handling continuous spontaneous speech.
Handling large vocabularies.
Providing a Speaker Independent Recognition System
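The decoding rule on this slide can be illustrated with a minimal sketch. It assumes a tiny hand-made candidate list with made-up acoustic and language-model probabilities; the names `candidates` and `decode` are hypothetical, and a real recognizer searches a far larger hypothesis space.

```python
import math

# Hypothetical candidate word sequences W for one observation O.
# Each entry holds made-up values: (log P(O|W), log P(W)).
candidates = {
    "recognize speech": (math.log(0.30), math.log(0.020)),
    "wreck a nice beach": (math.log(0.35), math.log(0.001)),
}

def decode(cands):
    """Return the W that maximizes P(O|W) * P(W), computed in log space."""
    return max(cands, key=lambda w: cands[w][0] + cands[w][1])

best = decode(candidates)
print(best)
```

In this toy data the second candidate fits the acoustics slightly better, but the language model prior P(W) makes the first candidate the overall winner.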
Language Understanding
Converts a sequence of words into a Semantic
Representation that can be used by the Dialogue
Manager.
Involves the use of Morphology, Syntax and Semantics.
Example:
I want to fly to California
want(speaker, fly(_x, California))
Need robust parsing mechanisms to account for errors
in speech recognition and ungrammatical utterances.
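As a rough illustration of the mapping in the example, the sketch below spots one request pattern anywhere in the utterance, so filler words around it do not break the parse. The pattern, the nested-tuple output format and the function name `understand` are assumptions made for this example; it also assumes the recognizer output keeps capitalized place names.

```python
import re

# Hypothetical pattern: tolerates extra words before and after the request.
FLY_PATTERN = re.compile(
    r"\bwant to fly to (?P<dest>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)

def understand(utterance):
    """Map an utterance to a nested-tuple semantic form, or None."""
    match = FLY_PATTERN.search(utterance)
    if match is None:
        return None
    # Mirrors the slide's form: want(speaker, fly(_x, <destination>))
    return ("want", "speaker", ("fly", "_x", match.group("dest")))

print(understand("I want to fly to California"))
print(understand("uh I want to fly to New York please"))
```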
Dialogue Manager
“Manages” all the aspects of the dialogue.
It takes a semantic representation of the user’s utterance,
figures out how the utterance fits in the overall context and
creates a semantic representation of the system's response.
Performs all of the following:
Interprets the user's utterance within the current context.
Deals with malformed or unrecognized utterances.
Creates a user model.
Performs grounding so that the user and the system have a
common set of beliefs.
Manages initiative and system responses.
Handles issues of pragmatics in generation.
Response Generation
Involves constructing the message that is to be spoken
to the user.
Requires decisions regarding:
What information should be included.
How the information should be structured.
The form of the message
The choice of words
The syntactic structure
Current systems use simple methods such as the
insertion of retrieved data into predefined slots in a
template.
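The template method mentioned above can be sketched in a few lines. The template text, the slot names and the function `generate` are hypothetical choices for this example.

```python
# Hypothetical template with predefined slots for retrieved data.
TEMPLATE = "You have {count} new {noun}."

def generate(count):
    """Fill the template, choosing singular or plural wording."""
    noun = "email" if count == 1 else "emails"
    return TEMPLATE.format(count=count, noun=noun)

print(generate(15))
print(generate(1))
```

Even this small case shows that the decisions listed on the slide (what to include, choice of words) do not disappear with templates; they are just made once, by the template author.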
Speech Generation
Translates the message constructed by the response
generation component into spoken form.
Two approaches may be used:
Prerecorded canned speech may be used with spaces to be
filled by retrieved or previously recorded samples.
You have fifteen new emails.
Text-to-speech synthesis
Concatenative speech synthesis is one common approach.
Text-to-phoneme conversion. (spēch, dī′ə-lôg′)
Phoneme-to-speech conversion.
Domain Specific Components
The dialogue manager usually needs to interface with
some external software such as a database or an
expert system.
Queries or plans thus have to be converted from the
internal representation used by the dialogue manager to
the format used by the external domain specific system
(e.g. SQL or STRIPS style goals).
This interfacing is handled by the domain specific
components.
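A minimal sketch of such a conversion, assuming the dialogue manager's internal representation is a dictionary of filled slots and the external system is an SQL database. The table `flights`, its columns and the function `frame_to_sql` are hypothetical; an in-memory SQLite database stands in for the real application.

```python
import sqlite3

def frame_to_sql(frame):
    """Translate filled slots into a parameterized SELECT (hypothetical schema)."""
    allowed = ("destination", "day")  # only known columns may appear in the SQL
    filled = [(k, frame[k]) for k in allowed if frame.get(k) is not None]
    where = " AND ".join(f"{k} = ?" for k, _ in filled)
    sql = "SELECT flight_no FROM flights" + (f" WHERE {where}" if where else "")
    return sql, [v for _, v in filled]

# A throwaway in-memory database standing in for the external application.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (flight_no TEXT, destination TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [("XY100", "London", "Friday"), ("XY200", "Paris", "Friday")],
)

sql, params = frame_to_sql({"destination": "London", "day": "Friday"})
rows = conn.execute(sql, params).fetchall()
print(sql, rows)
```

Slot values are passed as query parameters rather than pasted into the SQL string, so misrecognized or unexpected input cannot change the structure of the query.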
Classification of SDS
Based on the method used to control the dialogue with the user:
Finite state (or graph) based systems
Frame based systems
Agent based systems
Type of initiative
User Initiative
System Initiative
Mixed Initiative
Type of verification
Explicit Verification
Implicit Verification
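The two verification types can be contrasted with a small sketch. It assumes the usual reading of the terms: explicit verification asks a dedicated yes/no question about the recognized value, while implicit verification embeds the value in the next question so the user can object if it is wrong. The function names and prompt wording are hypothetical.

```python
def explicit_prompt(destination):
    """Ask a dedicated yes/no question about the recognized value."""
    return f"Was that {destination}?"

def implicit_prompt(destination):
    """Embed the recognized value in the next question instead."""
    return f"What day do you want to travel to {destination}?"

print(explicit_prompt("London"))
print(implicit_prompt("London"))
```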
Finite State Based Systems
The user is taken through a dialogue consisting of a sequence of
predetermined steps or stages.
The dialogue flow is specified as a set of dialogue states with
transitions denoting various alternative paths through the
dialogue graph.
[Figure: dialogue graph with four states: Get Destination, Verify Destination, Get Travel Day, Verify Travel Day]
System: What is your destination?
User: London
System: Was that London?
User: Yes
System: What day do you want to travel?
User: Friday
System: Was that Sunday?
User: No
System: What day do you want to travel?
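The dialogue above can be reproduced with a small state-machine sketch. It assumes each Get state is followed by its Verify state and that a "no" answer loops back to the same Get state; the data layout and the function `run_dialogue` are hypothetical, and user turns are scripted instead of coming from a recognizer.

```python
# Prompts for the Get states, visited in a fixed, predetermined order.
PROMPTS = {
    "get_destination": "What is your destination?",
    "get_travel_day": "What day do you want to travel?",
}
ORDER = ["get_destination", "get_travel_day"]

def run_dialogue(turns):
    """Walk the graph using scripted turns of (recognized_value, user_confirms).

    Returns the list of system prompts and the confirmed values.
    """
    transcript, confirmed = [], {}
    turns = iter(turns)
    for state in ORDER:
        while state not in confirmed:
            transcript.append(PROMPTS[state])        # Get state
            heard, user_confirms = next(turns)
            transcript.append(f"Was that {heard}?")  # Verify state
            if user_confirms:
                confirmed[state] = heard             # move to the next state
            # on "no", loop back to the same Get state
    return transcript, confirmed

# "Friday" is first misrecognized as "Sunday", as in the example dialogue.
transcript, confirmed = run_dialogue(
    [("London", True), ("Sunday", False), ("Friday", True)]
)
print("\n".join(transcript))
```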
Finite State Based System (2)
Advantages:
Simple to construct
The required vocabulary and grammar for each state can be
specified in advance
Results in more constrained speech recognition and language
understanding.
Disadvantages:
Inhibits the user’s ability to ask questions and take initiative.
Does not allow over-informative answers.
Dialogues are not very natural.
Example: Nuance demo banking system.
Frame Based System
User is asked questions that enable the system to fill
slots in a template in order to perform tasks.
Dialogue flow is not predetermined but depends on:
the contents of the user’s input
the information that the system has to elicit.
System: What is your destination?
User: London
System: What day do you want to travel?
User: Friday

System: What is your destination?
User: London on Friday, October 10 around 9 in the morning.
System: I have the following connection…

Destination City: London
Departure Day: Friday
Departure Date: October 10
Departure Time: 09 am
Frame Based Systems (2)
Act like rule-based systems, taking a particular action based on
the current state of affairs.
Questions and other prompts that the system can ask should be listed along
with conditions that have to be true for that particular question.
Advantages:
User can provide over-informative answers.
Allows more natural dialogues.
Disadvantages:
Cannot handle complex dialogues.
Range of applications limited to systems that elicit information from users
and act on the basis of the same.
Example: Philips train timetable information system
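A minimal sketch of frame-based control, assuming a two-slot frame and simple keyword spotting in place of real language understanding. Each question is listed with its condition, which here is simply "this slot is still empty". The word lists and function names are hypothetical.

```python
# Questions the system can ask, each tied to the slot that must be empty.
QUESTIONS = [
    ("destination", "What is your destination?"),
    ("day", "What day do you want to travel?"),
]

# Hypothetical keyword lists standing in for language understanding.
CITIES = {"London", "Paris"}
DAYS = {"Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"}

def update_frame(frame, utterance):
    """Fill every slot the utterance mentions, not just the one asked about."""
    for word in utterance.replace(",", " ").split():
        if word in CITIES:
            frame["destination"] = word
        elif word in DAYS:
            frame["day"] = word
    return frame

def next_prompt(frame):
    """Return the first question whose slot is still empty, or None."""
    for slot, prompt in QUESTIONS:
        if frame.get(slot) is None:
            return prompt
    return None

frame = update_frame({}, "London on Friday, October 10")
print(frame, next_prompt(frame))
```

Because the next prompt depends only on which slots are empty, an over-informative answer fills several slots at once and the corresponding questions are skipped.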
Agent Based Systems
Allow complex communication between the system, the user and
the underlying application in order to solve some problem or task.
Many variations depending on the application.
User: I’m looking for a job in the Calais area. Are there any servers?
System: No, there aren’t any employment servers for Calais. However, there is an employment server for Pas-de-Calais and an employment server for Lille. Are you interested in one of these?

User: What time does the bank open?
System: 9 am but they only accept job applications at noon.

User: What time does the bank open?
System: 9 am but the guards come around 8.
Agent Based Systems (2)
Communication is viewed as interaction between two agents,
each of which is capable of reasoning about its own actions and
beliefs.
The dialogue model takes the preceding context into account
The dialogue evolves dynamically as a sequence of related steps that
build on top of each other.
Advantages:
Allow natural dialogue in complex domains.
Disadvantages:
Such agents are usually very complex.
Hard to build.
Dialogue Manager Components
Dialogue Model: contains information about:
Whether the system or the user should take the initiative
Whether explicit or implicit confirmation should be used
The kinds of speech acts that need to be generated.
User Model: contains the system's beliefs about:
What the user knows
The user's expertise, experience and ability to understand the system's utterances.
Knowledge Base: contains information about the world and the domain.
Discourse Context: contains the dialogue history and current discourse.
Reference Resolver: performs reference resolution and handles ellipsis.
Plan Recognizer and Grounding Module:
Interprets the user's utterance given the current context
Reasons about the user's goals and beliefs.
Domain Reasoner/Planner: generates plans to achieve the shared goals.
Discourse Manager: manages the flow of information between all of the above
modules.
Challenges in the Design of an SDS
Recovery from errors
Understanding pragmatically ill-formed utterances
Design of system prompts
Reference resolution
Understanding inter-sentential ellipsis
Plan recognition
Detection of conflicts
Performing grounding
And many more…
Recovery From Errors
An SDS should be able to detect errors or misunderstandings and
recover from them.
Errors may be of the following types:
Uncertainties: the speech recognition output has a low confidence score.
Inconsistencies: the utterance conflicts with the domain model or previous utterances.
Ambiguities: a sentence has more than one interpretation.
Luperfoy proposes a recovery strategy based on the following four
stage algorithm:
Detection
Diagnosis (Classification of the error)
Repair plan selection
Interactive plan execution
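The four stages can be pictured with the sketch below. It only mirrors the stage names and the three error types listed on this slide and is not Luperfoy's actual algorithm; the `Hypothesis` class, the confidence threshold and the repair prompts are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    text: str
    confidence: float                          # recognizer score in [0, 1]
    interpretations: list = field(default_factory=list)
    conflicts_with_context: bool = False

CONFIDENCE_THRESHOLD = 0.5  # assumed cut-off for "low confidence"

def diagnose(hyp):
    """Detection and diagnosis: return an error type, or None if no error."""
    if hyp.confidence < CONFIDENCE_THRESHOLD:
        return "uncertainty"
    if hyp.conflicts_with_context:
        return "inconsistency"
    if len(hyp.interpretations) > 1:
        return "ambiguity"
    return None

# Repair plan selection: one hypothetical clarification prompt per error type.
REPAIR_PLANS = {
    "uncertainty": "Sorry, could you repeat that?",
    "inconsistency": "That does not match what you said earlier. Which is correct?",
    "ambiguity": "Did you mean {options}?",
}

def recover(hyp):
    """Select a repair plan and return the prompt that starts executing it."""
    error = diagnose(hyp)
    if error is None:
        return None
    # str.format ignores the unused 'options' argument for the other plans.
    return REPAIR_PLANS[error].format(options=" or ".join(hyp.interpretations))

print(recover(Hypothesis("London", 0.2)))
```

Interactive plan execution would continue from the returned prompt: the user's reply is fed back through the same detection step until no error remains.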
Pragmatically Ill-formed Utterances
Listeners assume that their beliefs about the world match the speaker’s.
Hence, listeners interpret utterances with respect to their own beliefs.
However, the speaker’s views of the world may differ from those of
the listener:
As a result, the speaker’s utterance may be syntactically and semantically
correct, yet violate pragmatic rules.
Pragmatically Ill-formed utterances are of two types:
Extensional failures
How many women on the UD wrestling team are CIS majors?
Intensional failures
Which apartments are for sale?
What advanced placement courses did Bob take in high school?
What is Dr. Smith’s home address?
Design of System Prompts
Prompt design is important for:
Keeping conversations flowing naturally
Overcoming shortcomings in speech recognition technology
One of the most challenging aspects is implicitly letting the user know what
they can say. Users who do not know this may:
Go beyond the functionality of the system
Not utilize the system as fully as they could
Prompt design is related to initiative
This is AZ Banking. How may I help you?
This is AZ Banking. Say ‘check balance’ to check your balance, ‘pay bill’ to pay a bill
or ‘transfer funds’ to transfer funds…
Prompts should be more explicit in the case of recognition errors and less
explicit as the user shows greater familiarity with the system.
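One way to picture this adaptation is the sketch below, which switches between the open and the directed prompt from this slide. The counters `recent_errors` and `successful_turns`, the familiarity threshold of three turns and the function `choose_prompt` are assumptions for illustration only.

```python
OPEN_PROMPT = "This is AZ Banking. How may I help you?"
DIRECTED_PROMPT = (
    "This is AZ Banking. Say 'check balance' to check your balance, "
    "'pay bill' to pay a bill or 'transfer funds' to transfer funds…"
)

FAMILIARITY_THRESHOLD = 3  # assumed number of successful turns

def choose_prompt(recent_errors, successful_turns):
    """Be explicit after recognition errors or for users with little history."""
    if recent_errors > 0 or successful_turns < FAMILIARITY_THRESHOLD:
        return DIRECTED_PROMPT
    return OPEN_PROMPT

print(choose_prompt(recent_errors=0, successful_turns=5))
```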
Reference Resolution
Reference is the process by which speakers use
expressions like he and it to refer to entities salient in
the discourse.
Reference resolution is the process of determining the
referent entity of a referring expression.
For example:
John went to Bill’s car dealership to check out an Acura Integra. He
looked at it for about an hour.
Before he bought it, John checked over the Integra very carefully.
Inter-sentential Ellipsis
Inter-sentential ellipsis is the use of a syntactically incomplete sentence
fragment, along with the context in which the fragment
occurs, to communicate a complete thought and
accomplish a speech act.
Examples:
I want to cash this check. Small bills only please.
Speaker 1: Who are the candidates for the consultants?
Speaker 2: Mary Smith, Bob Jones and Ann Doe.
Speaker 1: Tom’s recommendations?
References
Carberry, Sandra: “Plan Recognition in Natural
Language Dialogue”, ACL-MIT Press Series on Natural
Language Processing, MIT Press, 1990.
Questions?