Flexible Dialog Management for In-vehicle Dialog Systems Jeongwoo Ko

Language Technologies Institute
Flexible Dialog Management
for In-vehicle Dialog Systems
Jeongwoo Ko
jko@cs.cmu.edu
Outline

- Introduction
- Approach in CAMMIA
- Dialog Management Framework
  - System Architecture
  - ScenarioTemplate/ScenarioXML
  - ScenarioXML Development Kit
- Pilot Systems & Experiments
- Current Research Focus
- Future Work
Introduction of CAMMIA Project
(Conversational Agent for Multilingual Mobile Information Access)
Sample Dialog
S1: How may I help you?
U1: I want to go to Carnegie Mellon University.
S2: Do you want to go to Carnegie Mellon University?
U2: Yes.
S3: The distance to the destination is 100 miles. It takes about 2 hours.
U3: I would like to know weather.
S4: Please tell me the area and the date.
U4: Pittsburgh
(Navigation System sends the next direction to the dialog manager)
S5: To go to Carnegie Mellon University, please make a left turn here. Please tell me the date for Pittsburgh.
U5: Tomorrow
S6: The weather for Pittsburgh tomorrow will be fine.
Some Related Research (Speech Interface)

- Pellom et al, HLT 2001 (CU-Move)
  - Route planning and navigation
  - Noise suppression front-end and back-end navigation information retrieval
  - Download driving instructions from the Internet after route planning
  - User can ask for route information during travel (ex. What’s my next turn?)
- Coletti et al, IEEE 2003
  - Hotel retrieval/reservation, POI retrieval, Simple Route Query
  - Car Wide Web module: XML-based DB interface
  - Local database for tour and geographic data
Some Related Research (Multimodal Interface)

- Minker et al, ICSLP 2002 (SmartKom Mobile)
  - Provide framework for modality control
  - Prototype on Compaq iPAQ H3630 handheld computer
  - Apply to pedestrian and driver environment
  - Display maps for route information
  - Slide shows for sight information
Approach in CAMMIA

- Asynchronous communication with navigation system
  - When getting the next direction, interrupt the current dialog
  - Notify the next turn direction
  - Resume the pending dialog
- Maintain dialog history for smooth dialog switching
- Flexible & robust Dialog Manager based on VoiceXML (Voice eXtensible Markup Language)
- Error handling
  - Correction, Explicit/Implicit confirmation
- Multilingual support (Japanese, English)
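
The interrupt, notify, and resume behaviour described above can be sketched in a few lines of Python. This is a minimal illustration only, not the CAMMIA implementation: the class `DialogManager` and its methods `ask`, `answer`, and `on_direction` are hypothetical names, and a single pending-prompt slot stands in for the real dialog history.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DialogManager:
    """Toy dialog manager (hypothetical; one pending question at a time)."""

    pending_prompt: Optional[str] = None               # question not yet answered
    history: List[str] = field(default_factory=list)   # system prompts issued so far

    def ask(self, prompt: str) -> str:
        """Issue a question and remember it as the pending dialog."""
        self.pending_prompt = prompt
        self.history.append(prompt)
        return prompt

    def answer(self) -> None:
        """Mark the pending question as answered."""
        self.pending_prompt = None

    def on_direction(self, direction: str) -> List[str]:
        """Handle an asynchronous message from the navigation system.

        The direction is spoken first (interrupting the current dialog),
        then the pending question, if any, is repeated (resume).
        """
        spoken = [direction]
        if self.pending_prompt is not None:
            spoken.append(self.pending_prompt)
        self.history.extend(spoken)
        return spoken
```

With a pending question "Please tell me the date for Pittsburgh.", a call to `on_direction` returns the turn instruction followed by that same question, which mirrors turn S5 of the sample dialog.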
Dialog Management Framework
[Figure: architecture diagram. Blocks: Speech Interface, Dialog Manager (DM), Navigation System (NS). Component labels: VXI, Julius, ADM, Dialog Controller, Dialog history, Dialog scenario, Data source, Navigation System Simulator, Map Display. Link labels: HTTP Request, VoiceXML, Destination, Direction, URL to get next direction.]
VXI: VoiceXML Interpreter
ADM: Asynchronous Dialog Manager
Dialog Manager





- Support multiple users and multiple dialogs
- Keep track of user sessions and dialog flows for smooth task switching
- Convert user utterances to database queries
- Mixed-initiative interaction
- Create VoiceXML with dynamic contents
  => Hard to write VoiceXML by hand
  => Provide an abstract level of dialog description
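
To make "VoiceXML with dynamic contents" concrete, here is a minimal Python sketch that builds a small VoiceXML form from a prompt and a list of field names. It is an illustration under assumptions: the slides say the real system generates VoiceXML from JSP, and the function name `build_vxml` is hypothetical. Only the element names `vxml`, `form`, `block`, `prompt`, and `field` are taken from the VoiceXML shown in this deck.

```python
import xml.etree.ElementTree as ET
from typing import List

VXML_NS = "http://www.w3.org/2001/vxml"  # namespace used in the deck's VoiceXML example


def build_vxml(prompt: str, fields: List[str]) -> str:
    """Return a VoiceXML document with one prompt and one <field> per name.

    Hypothetical helper: it only shows why hand-writing such documents for
    every dialog state is tedious and why a generator is useful.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    ET.SubElement(block, "prompt").text = prompt
    for name in fields:
        ET.SubElement(form, "field", {"name": name})
    return ET.tostring(vxml, encoding="unicode")


document = build_vxml("Please tell me the area and date",
                      ["weather_area", "weather_date"])
```

The prompt text and the field list are the only parts that change between dialog states, which is what an abstract dialog description can capture.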
Dialog Description

- ScenarioTemplate (ST)
  - Designed to facilitate new dialog creation
  - Consists of prompts and variables
    => Dialog designer does not have to know grammar
  - Supports explicit/implicit confirmation
  - Compiled into SXML
- ScenarioXML (SXML)
  - Consists of dialog states and transitions
  - Dialog developers need to know grammar, but don’t have to add other dialogs for dialog switching
  - Compiled into JSP to create VXML with dynamic contents
Example of ST and SXML
<variables>
weather_area: String
weather_date: String
weather_result: String
<states>
prompt: Please tell me the area and date.
variable: weather_area weather_date
prompt: Please tell me the area.
variable: weather_area
prompt: Please tell me the date.
variable: weather_date
confirmation: Would you like to know the weather for weather_area weather_date?
response: The weather for weather_date weather_area is weather_result.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE function SYSTEM "dtd/function.dtd">
<function name="Weather">
  <state name="ask_weather" position="start">
    <grammar src="grammars/ask_weather_jp.gad"/>
    <prompt>
      <text>Please tell me the area and date</text>
    </prompt>
    <jump>
      <nextstate next="ask_weather_area">
        <field>weather_date</field>
      </nextstate>
      <nextstate next="ask_weather_date">
        <field>weather_area</field>
      </nextstate>
      <nextstate next="confirm_weather">
        <field>weather_area</field>
        <field>weather_date</field>
      </nextstate>
      <default>Can you tell me again?</default>
    </jump>
  </state>
  <state name="ask_weather_area" position="transition">
    …
  </state>
</function>
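
The `<jump>` element of the `ask_weather` state maps the set of fields the user has filled to the next dialog state. The Python sketch below mirrors that table. It is an illustration, not the SXML compiler: the function name `next_state` is hypothetical, and the exact-match rule (a transition fires only when precisely its listed fields are filled) is an assumption about the intended semantics.

```python
from typing import Dict, List, Optional, Tuple

# Transitions copied from the <jump> element of the "ask_weather" state:
# (next state, fields that must be filled).
JUMP: List[Tuple[str, List[str]]] = [
    ("ask_weather_area", ["weather_date"]),
    ("ask_weather_date", ["weather_area"]),
    ("confirm_weather", ["weather_area", "weather_date"]),
]
DEFAULT_PROMPT = "Can you tell me again?"  # text of the <default> element


def next_state(filled: Dict[str, str]) -> Optional[str]:
    """Return the next state for the filled fields, or None for the default.

    Assumed semantics: a <nextstate> matches when the non-empty filled
    fields are exactly the fields it lists.
    """
    names = {name for name, value in filled.items() if value}
    for state, fields in JUMP:
        if names == set(fields):
            return state
    return None  # caller would speak DEFAULT_PROMPT and stay in the state
```

For example, if the user says only "Tomorrow", `weather_date` is filled and the dialog moves to `ask_weather_area`; if both fields are filled it moves straight to `confirm_weather`.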
Example of SXML and VoiceXML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE function SYSTEM "dtd/function.dtd">
<function name="Weather">
  <state name="ask_weather" position="start">
    <grammar src="grammars/ask_weather_jp.gad"/>
    <prompt>
      <text>Please tell me the area and date</text>
    </prompt>
    <jump>
      <nextstate next="ask_weather_area">
        <field>weather_date</field>
      </nextstate>
      <nextstate next="ask_weather_date">
        <field>weather_area</field>
      </nextstate>
      <nextstate next="confirm_weather">
        <field>weather_area</field>
        <field>weather_date</field>
      </nextstate>
      <default>Can you tell me again?</default>
    </jump>
  </state>
  <state name="ask_weather_area" position="transition">
    …
  </state>
</function>
<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <property name="message" value="'SessionID=99 IP=127.0.0.1 PORT=1001 COM=GRAMMAR PARAM=C:\CAMMIA\grammars/;enter_direction_jp.gad;enter_parking_jp.gad;enter_restaurant_jp.gad;cancel_jp.gad;ask_weather_jp.gad;Correction.gad'" />
    <block>
      <prompt> Please tell me the area and date</prompt>
    </block>
    <field name="parking_intention" />
    <field name="weather_intention" />
    <field name="restaurant_intention" />
    <field name="weather_area" />
    <field name="weather_date" />
    <field name="direction_intention" />
    …
    <filled namelist="weather_intention">
      <if cond="(weather_date!=undefined)">
        <submit namelist="recog_result weather_date" next="ask_weather_area.jsp" />
      </if>
      <if cond="(weather_area!=undefined)">
        <submit namelist="recog_result weather_area" next="ask_weather_date.jsp" />
      </if>
      …
    </filled>
    <filled namelist="direction_intention">
      <goto next="ask_direction.jsp"/>
      …
    </filled>
  </form>
</vxml>
ScenarioTemplate Format
<template>     ::= <variables><states>
<variables>    ::= <variable+>
<variable>     ::= <statename>_<id>:<type>
<statename>    ::= <letter+>
<id>           ::= <letter+>
<type>         ::= String|ArrayList
<letter>       ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|
                   A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z

<states>       ::= <state+>
<state>        ::= {<question+><confirmation><response>}
<question>     ::= <variable+><prompt><backprompt>
<prompt>       ::= { <letter> | <variable> | " " }
<backprompt>   ::= <prompt>
<confirmation> ::= <prompt>
<response>     ::= <prompt>
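
In the ScenarioTemplate format a prompt is a sequence of letters, spaces, and variable names, so producing the spoken text amounts to replacing each variable name with its current value. The Python sketch below shows one way to do that. It is an assumption about how the substitution could work, not the ST compiler itself; `render_prompt` is a hypothetical name.

```python
import re
from typing import Dict


def render_prompt(prompt: str, values: Dict[str, str]) -> str:
    """Replace every whole-word variable name in `prompt` with its value.

    Longer names are tried first so that one variable name that is a
    prefix of another cannot shadow it.
    """
    if not values:
        return prompt
    names = sorted(values, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda match: values[match.group(1)], prompt)


values = {
    "weather_area": "Pittsburgh",
    "weather_date": "tomorrow",
    "weather_result": "sunny",
}
text = render_prompt(
    "The weather for weather_date weather_area is weather_result.", values
)
# text == "The weather for tomorrow Pittsburgh is sunny."
```

Because the underscore counts as a word character, the plain word "weather" in the prompt is left untouched while `weather_date` and the other variables are replaced.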
ScenarioXML DTD
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT function (state*)>
<!ATTLIST function name CDATA #REQUIRED>
<!ELEMENT state (grammar?, result*, prompt?, backprompt?, filled?, jump?, nofound?)>
<!ATTLIST state name CDATA #REQUIRED
position CDATA #IMPLIED>
<!ELEMENT grammar EMPTY>
<!ATTLIST grammar src CDATA #REQUIRED>
<!ELEMENT result (field*, list*, param*)>
<!ATTLIST result class CDATA #REQUIRED
method CDATA #REQUIRED>
<!ELEMENT field (#PCDATA)>
<!ELEMENT list (#PCDATA)>
<!ATTLIST field condition CDATA #IMPLIED>
<!ELEMENT param (#PCDATA)>
<!ELEMENT prompt ANY>
<!ELEMENT text (#PCDATA)>
<!ELEMENT backprompt ANY>
<!ELEMENT filled (field*)>
<!ELEMENT jump (nextstate*, default?)>
<!ELEMENT nextstate (field*)>
<!ATTLIST nextstate next CDATA #REQUIRED>
<!ELEMENT default (#PCDATA)>
<!ATTLIST default next CDATA #IMPLIED>
<!ELEMENT nofound (#PCDATA)>
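
As a lightweight companion to the DTD, the Python sketch below checks that every element carries the attributes declared `#REQUIRED` above. It is not a DTD validator (the standard-library parser does not validate), and the function name `missing_attributes` is hypothetical; the attribute table is copied from the `ATTLIST` declarations.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# #REQUIRED attributes, copied from the ATTLIST declarations in the DTD.
REQUIRED: Dict[str, List[str]] = {
    "function": ["name"],
    "state": ["name"],
    "grammar": ["src"],
    "result": ["class", "method"],
    "nextstate": ["next"],
}


def missing_attributes(sxml: str) -> List[str]:
    """Return one message per required attribute missing from the document."""
    problems = []
    for elem in ET.fromstring(sxml).iter():  # document order
        for attr in REQUIRED.get(elem.tag, []):
            if attr not in elem.attrib:
                problems.append(f"<{elem.tag}> is missing '{attr}'")
    return problems


good = '<function name="Weather"><state name="ask_weather"/></function>'
bad = '<function name="Weather"><state><grammar/></state></function>'
```

Here `missing_attributes(good)` returns an empty list, while `missing_attributes(bad)` reports the missing `name` on `<state>` and the missing `src` on `<grammar>`.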
SXML Development Kit (SXMLDK)

Two-step compilation by SXMLDK
[Figure: ScenarioXML Development Kit pipeline. Labels: ScenarioTemplate, Dialog Names, Grammars, ST Compiler, ScenarioXML, SXML Compiler, JSP files, VoiceXML.]
SXMLDK User Interface
Pilot Systems

- Prototype I (2002)
  - Focus on building general architecture to support SXML
  - HTTP is session-less and VoiceXML needs to keep user’s utterances in each session
  - SXML provides session variables to keep user’s utterances

<session-variables>
  <persistent names="DMEnv_persistent_topic DMEnv_persistent_weather_date DMEnv_persistent_weather_area"/>
  <transient names="DMEnv_answer_weather_date DMEnv_answer_weather_area DMEnv_answer"/>
</session-variables>
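
The split between persistent and transient session variables can be illustrated with a short Python sketch. The slides do not define the two kinds precisely, so the behaviour here is an assumption: persistent values survive for the whole session, while transient values are dropped at the end of each dialog turn. The class `SessionVariables` and its methods are hypothetical names.

```python
from typing import Dict, Iterable, Optional


class SessionVariables:
    """Per-session store with persistent and transient variables (assumed semantics)."""

    def __init__(self, persistent: Iterable[str], transient: Iterable[str]) -> None:
        self._persistent = set(persistent)
        self._transient = set(transient)
        self._values: Dict[str, str] = {}

    def set(self, name: str, value: str) -> None:
        """Store a value; only declared variables are accepted."""
        if name not in self._persistent and name not in self._transient:
            raise KeyError(f"undeclared session variable: {name}")
        self._values[name] = value

    def get(self, name: str) -> Optional[str]:
        """Return the stored value, or None if the variable is unset."""
        return self._values.get(name)

    def end_turn(self) -> None:
        """Drop transient values; persistent values are kept."""
        for name in self._transient:
            self._values.pop(name, None)


session = SessionVariables(
    persistent=["DMEnv_persistent_weather_date", "DMEnv_persistent_weather_area"],
    transient=["DMEnv_answer_weather_date", "DMEnv_answer_weather_area"],
)
```

After `session.end_turn()`, a value stored under `DMEnv_answer_weather_date` is gone, while one stored under `DMEnv_persistent_weather_date` is still available to later turns.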
Pilot Systems

- Prototype II (2003)
  - Improve Prototype I
  - Develop ScenarioTemplate & SXMLDK
  - Filled values and dialog states are stored in the back-end dialog controller
  - Only the current values are stored in VoiceXML
Pilot Experiments


- Compare Prototype I & II
- Test environments
  - VXI: Intel Pentium III 800MHz, 512MB RAM
  - DM: Intel Pentium II 400MHz, 224MB RAM
  - Sample Dialog
    U1: Weather information, please
    S1: Please tell me the area and the date
    U2: Tomorrow
    S2: Please tell me the area
    U3: Pittsburgh
    S3: Are you asking the weather for Pittsburgh tomorrow?
    U4: Yes.
    S4: The weather for Pittsburgh tomorrow will be sunny.
    Sub-task labels shown beside the dialog: date, area, confirmation, response
Comparison of Bandwidth Utilization

To compare bandwidth utilization, VXML file sizes were measured (unit: bytes)

Sub Task         Prototype I   Prototype II
Ask date                6266            931
Ask area                1949            618
Confirmation            1709            607
Response                3110            283
Avg VXML Size           3258            610
Comparison of User Wait Time

To compare end-user wait time, loading times in the VXI were measured (unit: seconds)

Sub Task            Prototype I   Prototype II
Ask date                   5.25           0.88
Ask area                   3.92           1.28
Confirmation               4.47           1.50
Response                   3.44           1.14
Avg Loading Time           4.27           1.20
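
The averages in the two comparison tables can be re-derived from the per-sub-task figures, and the same numbers give the improvement factor of Prototype II. The Python sketch below does that arithmetic using only the measurements reported on the slides; the helper name `averages` is hypothetical.

```python
from typing import Dict, Tuple

# (Prototype I, Prototype II) per sub-task, as reported on the slides.
vxml_bytes: Dict[str, Tuple[float, float]] = {
    "Ask date": (6266, 931),
    "Ask area": (1949, 618),
    "Confirmation": (1709, 607),
    "Response": (3110, 283),
}
load_seconds: Dict[str, Tuple[float, float]] = {
    "Ask date": (5.25, 0.88),
    "Ask area": (3.92, 1.28),
    "Confirmation": (4.47, 1.50),
    "Response": (3.44, 1.14),
}


def averages(table: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Return the mean of each prototype's column."""
    n = len(table)
    first = sum(row[0] for row in table.values()) / n
    second = sum(row[1] for row in table.values()) / n
    return first, second


size_1, size_2 = averages(vxml_bytes)    # 3258.5 and 609.75 bytes
time_1, time_2 = averages(load_seconds)  # about 4.27 and 1.20 seconds
size_factor = size_1 / size_2            # about 5.3 times smaller
time_factor = time_1 / time_2            # about 3.6 times faster
```

So, on these measurements, Prototype II serves VXML files roughly five times smaller on average and loads them roughly three and a half times faster.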
Summary of Prototype II

- Improvement
  - Efficient VXML management: VXML includes only the current state information
    => faster & less use of network bandwidth
  - Easy to add new dialogs using ScenarioTemplate
  - Easy to support multiple languages by changing ScenarioTemplate prompts and grammars
- Issues
  - Predefined dialog states and prompts
  - Hard to support more complex dialogs
Example of Complex Dialog
U1: Tour guide, please.
S1: Tour guidance is started. Where is a destination?
U2: Carnegie Mellon University
S2: The distance to the destination is 100 miles. It takes about 2 hours.
U3: Do you know a good restaurant near here?
S3: What kind of food do you like?
U4: I want to eat Italian food.
S4: There are Olive garden, Bravo and Laromana.
U5: Well, I'd like to go to Laromana.
S5: It does not have a parking lot. Is it OK?
U6: Tell me the one which has parking lots.
S6: Olive garden and Bravo have parking lots.
U7: Which one is closer?
S7: Bravo
U8: I would like to go there.
S8: Do you want to add it as a way point?
U9: Yes
S9: I set Bravo as the waypoint.
Current Research Focus

- Flexible dialog management
  - Push model: Offer important information even if the user does not request it
    Ex) does not take credit cards, does not have a parking lot
  - Search data from the list which was already retrieved
  - Support comparison (the cheapest, closer)
  - Anaphora resolution (there, it)
  - Add way point to Navigation Map
Current Research Focus

Robust dialog management for signal loss

Task Manager
- Retrieval task management for signal loss

Dialog Manager
- Located in vehicle

Info Manager
- Retrieve data from the remote database servers
- Maintain local cache with timestamps
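
A local cache with timestamps lets the in-vehicle system keep answering when the connection to the remote servers drops. The Python sketch below shows one possible policy. It is an illustration under assumptions, not the CAMMIA design: the class name `InfoManager` is borrowed from the slide, but its constructor parameters (`fetch`, `max_age`, `clock`) and the rule "serve a stale entry on signal loss" are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InfoManager:
    """Fetch remote data, caching each answer with the time it was retrieved."""

    def __init__(self, fetch: Callable[[str], str], max_age: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch      # remote lookup; assumed to raise ConnectionError on signal loss
        self._max_age = max_age  # seconds a cached entry counts as fresh
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}  # query -> (timestamp, value)

    def get(self, query: str) -> Optional[str]:
        """Return a fresh cached value, else fetch; fall back to stale data offline."""
        now = self._clock()
        entry = self._cache.get(query)
        if entry is not None and now - entry[0] <= self._max_age:
            return entry[1]
        try:
            value = self._fetch(query)
        except ConnectionError:
            # Signal loss: a stale answer is better than none; None if never cached.
            return entry[1] if entry is not None else None
        self._cache[query] = (now, value)
        return value
```

Passing the clock in as a parameter is a design choice that makes the expiry logic easy to test without waiting for real time to pass.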
Extended Architecture
[Figure: extended architecture diagram. Regions: Vehicle, Remote Server. Component labels: User Interface, Voice, Julius, ADM, Dialog Manager, Dialog Controller, Info Manager, Task Manager, Local Cache, Data Server, DB … DB, Navigation System, Navigation System Simulator, Map Display.]
Dialog Manager

To support information-seeking dialogs, the Dialog Manager has three dialog states

Search
- Ask the user to fill the minimum search constraints
- Display the search results
Ex) There are three Italian restaurants near here: A, B and C

Search refinement
- User can narrow down the searched items with different search options such as price and distance
Ex) Which one is closer?

Selection
- Automatically check important information (Push model)
- Add way point to the Navigation map
Ex) I set it as the waypoint
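
The refinement and selection states can be sketched in Python using the restaurants from the complex dialog example. This is an illustration only: the names `Restaurant`, `refine`, `closest`, and `push_warnings` are hypothetical, and the distances are invented for the example (the slides only say that Bravo is closer than Olive garden). The parking information follows the sample dialog.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Restaurant:
    name: str
    has_parking: bool
    distance_miles: float  # hypothetical values, not from the slides


# Result list of the initial search, kept so later turns can refine it.
RESULTS = [
    Restaurant("Olive garden", True, 3.0),
    Restaurant("Bravo", True, 1.5),
    Restaurant("Laromana", False, 2.0),
]


def refine(results: List[Restaurant],
           keep: Callable[[Restaurant], bool]) -> List[Restaurant]:
    """Search refinement: filter the previous results, no new remote query."""
    return [r for r in results if keep(r)]


def closest(results: List[Restaurant]) -> Restaurant:
    """Comparison ("Which one is closer?") over the current result list."""
    return min(results, key=lambda r: r.distance_miles)


def push_warnings(choice: Restaurant) -> List[str]:
    """Push model: volunteer important facts the user did not ask about."""
    warnings = []
    if not choice.has_parking:
        warnings.append("It does not have a parking lot. Is it OK?")
    return warnings


with_parking = refine(RESULTS, lambda r: r.has_parking)  # Olive garden, Bravo
waypoint = closest(with_parking)                         # Bravo
```

Selecting Laromana triggers the parking warning, refining by parking leaves Olive garden and Bravo, and the comparison picks Bravo, which matches turns S5 to S7 of the example dialog.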
Dynamic State Transition
[Figure: state transition diagram over the three dialog states.
Search: check search constraints; ask constraints; display the results.
Search Refinement: find new lists from the previous results; display it.
Selection: check special info; notify it to the user; display detailed info; add way point if user wants; end dialog.
Transition labels: need constraints=yes, need features=no, need selection=no, new constraints, search options, selection, satisfied, not satisfied.]
Future work



- More work on Task Manager
- Anaphora and ellipsis resolution
- Multi-modal interface
  - Integration of screen with Navigation map
  - Complementary prompts (voice and screen)
- Dynamic grammar generation
- More intelligent push model based on user preference
- Missed turn