Language Technologies Institute

Flexible Dialog Management for In-vehicle Dialog Systems
Jeongwoo Ko
jko@cs.cmu.edu

Outline
- Introduction
- Approach in CAMMIA
- Dialog Management Framework
- System Architecture
- ScenarioTemplate/ScenarioXML
- ScenarioXML Development Kit
- Pilot Systems & Experiments
- Current Research Focus
- Future Work

Introduction of CAMMIA Project
(Conversational Agent for Multilingual Mobile Information Access)

Sample Dialog
S1: How may I help you?
U1: I want to go to Carnegie Mellon University.
S2: Do you want to go to Carnegie Mellon University?
U2: Yes.
S3: The distance to the destination is 100 miles. It takes about 2 hours.
U3: I would like to know the weather.
S4: Please tell me the area and the date.
U4: Pittsburgh
(Navigation System sends the next direction to the dialog manager)
S5: To go to Carnegie Mellon University, please make a left turn here. Please tell me the date for Pittsburgh.
U5: Tomorrow
S6: The weather for Pittsburgh tomorrow will be fine.

Some Related Research (Speech Interface)
Pellom et al., HLT 2001 (CU-Move)
- Route planning and navigation
- Noise suppression front-end and back-end navigation information retrieval
- Download driving instructions from the Internet after route planning
- User can ask for route information during travel (e.g., What's my next turn?)
Coletti et al., IEEE 2003
- Hotel retrieval/reservation, POI retrieval, simple route query
- Car Wide Web module: XML-based DB interface
- Local database for tour and geographic data

Some Related Research (Multimodal Interface)
Minker et al., ICSLP 2002 (SmartKom Mobile)
- Provide framework for modality control
- Prototype on Compaq iPAQ H3630 handheld computer
- Apply to pedestrian and driver environment
- Display maps for route information
- Slide shows for sight information

Approach in CAMMIA
- Asynchronous communication with navigation system
- Maintain dialog history for smooth dialog switching
- Flexible & robust Dialog Manager based on VoiceXML (Voice eXtensible Markup Language)
- Error handling
- When getting the next direction, interrupt the current dialog
- Notify the next turn direction
- Resume the pending dialog
- Correction, explicit/implicit confirmation
- Multilingual support (Japanese, English)

Dialog Management Framework
[Figure: system architecture diagram. Labels: Speech Interface, Julius, VXI, Dialog Manager (DM), Dialog Controller, ADM, Dialog history, Dialog scenario, Data source, Navigation System (NS), Navigation System Simulator, Map Display. Link labels: HTTP Request, VoiceXML, Destination, Direction, URL to get next direction.]
VXI: VoiceXML Interpreter
ADM: Asynchronous Dialog Manager

Dialog Manager
- Support multi-user/multi-dialogs
- Keep track of user session and dialog flows for smooth task switching
- Convert user utterances to database query
- Mixed-initiative interaction
- Create VoiceXML with dynamic contents
=> Hard to write VoiceXML by hand
=> Provide abstract level of dialog description

Dialog Description
ScenarioTemplate (ST)
- Designed to facilitate new dialog creation
- Consists of prompts and variables
  => Dialog designer does not have to know grammar
- Supports explicit/implicit confirmation
- Compiled into SXML
ScenarioXML (SXML)
- Consists of dialog states and transitions
- Dialog developers need to know grammar, but don't have to add other dialogs for dialog switching
- Compiled into JSP to create VXML with dynamic contents

Example of ST and SXML

<variables>
weather_area: String
weather_date: String
weather_result: String
<states>
prompt: Please tell me the area and date.
variable: weather_area weather_date
prompt: Please tell me the area.
variable: weather_area
prompt: Please tell me the date.
variable: weather_date
confirmation: Would you like to know the weather for weather_area weather_date?
response: The weather for weather_date weather_area is weather_result.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE function SYSTEM "dtd/function.dtd">
<function name="Weather">
  <state name="ask_weather" position="start">
    <grammar src="grammars/ask_weather_jp.gad"/>
    <prompt>
      <text>Please tell me the area and date</text>
    </prompt>
    <jump>
      <nextstate next="ask_weather_area">
        <field>weather_date</field>
      </nextstate>
      <nextstate next="ask_weather_date">
        <field>weather_area</field>
      </nextstate>
      <nextstate next="confirm_weather">
        <field>weather_area</field>
        <field>weather_date</field>
      </nextstate>
      <default>Can you tell me again?</default>
    </jump>
  </state>
  <state name="ask_weather_area" position="transition">
    …
  </state>
</function>

Example of SXML and VoiceXML

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE function SYSTEM "dtd/function.dtd">
<function name="Weather">
  <state name="ask_weather" position="start">
    <grammar src="grammars/ask_weather_jp.gad"/>
    <prompt>
      <text>Please tell me the area and date</text>
    </prompt>
    <jump>
      <nextstate next="ask_weather_area">
        <field>weather_date</field>
      </nextstate>
      <nextstate next="ask_weather_date">
        <field>weather_area</field>
      </nextstate>
      <nextstate next="confirm_weather">
        <field>weather_area</field>
        <field>weather_date</field>
      </nextstate>
      <default>Can you tell me again?</default>
    </jump>
  </state>
  <state name="ask_weather_area" position="transition">
    …
  </state>
</function>

<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <property name="message" value="'SessionID=99 IP=127.0.0.1 PORT=1001 COM=GRAMMAR PARAM=C:\CAMMIA\grammars/;enter_direction_jp.gad;enter_parking_jp.gad;enter_restaurant_jp.gad;cancel_jp.gad;ask_weather_jp.gad;Correction.gad'" />
    <block>
      <prompt> Please tell me the area and date</prompt>
    </block>
    <field name="parking_intention" />
    <field name="weather_intention" />
    <field name="restaurant_intention" />
    <field name="weather_area" />
    <field name="weather_date" />
    <field name="direction_intention" />
    …
    <filled namelist="weather_intention">
      <if cond="(weather_date!=undefined)">
        <submit namelist="recog_result weather_date" next="ask_weather_area.jsp" />
      </if>
      <if cond="(weather_area!=undefined)">
        <submit namelist="recog_result weather_area" next="ask_weather_date.jsp" />
      </if>
      …
    </filled>
    <filled namelist="direction_intention">
      <goto next="ask_direction.jsp"/>
      …
    </filled>
  </form>
</vxml>

ScenarioTemplate Format

<template>     ::= <variables><states>
<variables>    ::= <variable+>
<variable>     ::= <statename>_<id>:<type>
<statename>    ::= <letter+>
<id>           ::= <letter+>
<type>         ::= String|ArrayList
<letter>       ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|
                   A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
<states>       ::= <state+>
<state>        ::= {<question+><confirmation><response>}
<question>     ::= <variable+><prompt><backprompt>
<prompt>       ::= { <letter> | <variable> | " " }
<backprompt>   ::= <prompt>
<confirmation> ::= <prompt>
<response>     ::= <prompt>

ScenarioXML DTD

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT function (state*)>
<!ATTLIST function name CDATA #REQUIRED>
<!ELEMENT state (grammar?, result*, prompt?, backprompt?, filled?, jump?, nofound?)>
<!ATTLIST state name CDATA #REQUIRED position CDATA #IMPLIED>
<!ELEMENT grammar EMPTY>
<!ATTLIST grammar src CDATA #REQUIRED>
<!ELEMENT result (field*, list*, param*)>
<!ATTLIST result class CDATA #REQUIRED method CDATA #REQUIRED>
<!ELEMENT field (#PCDATA)>
<!ELEMENT list (#PCDATA)>
<!ATTLIST field condition CDATA #IMPLIED>
<!ELEMENT param (#PCDATA)>
<!ELEMENT prompt ANY>
<!ELEMENT text (#PCDATA)>
<!ELEMENT backprompt ANY>
<!ELEMENT filled (field*)>
<!ELEMENT jump (nextstate*, default?)>
<!ELEMENT nextstate (field*)>
<!ATTLIST nextstate next CDATA #REQUIRED>
<!ELEMENT default (#PCDATA)>
<!ATTLIST default next CDATA #IMPLIED>
<!ELEMENT nofound (#PCDATA)>

SXML Development Kit (SXMLDK)
Two-step compilation by SXMLDK
[Figure: ScenarioXML Development Kit pipeline. Labels: Scenario Template, Dialog Names, Grammars, ST Compiler, ScenarioXML, SXML Compiler, JSP files, VoiceXML.]

SXMLDK User Interface
[Figure: SXMLDK user interface]

Pilot Systems
Prototype I (2002)
- Focus on building general architecture to support SXML
- HTTP is session-less and VoiceXML needs to keep user's utterances in each session
- SXML provides session variables to keep user's utterances

<session-variables>
  <persistent names="DMEnv_persistent_topic DMEnv_persistent_weather_date DMEnv_persistent_weather_area"/>
  <transient names="DMEnv_answer_weather_date DMEnv_answer_weather_area DMEnv_answer"/>
</session-variables>

Pilot Systems
Prototype II (2003)
- Improve Prototype I
- Develop ScenarioTemplate & SXMLDK
- Filled values and dialog states are stored in back-end dialog controller
- Only the current values are stored in VoiceXML

Pilot Experiments
- Compare Prototype I & II
- Test environments
  VXI: Intel Pentium III 800 MHz, 512 MB RAM
  DM: Intel Pentium II 400 MHz, 224 MB RAM
- Sample Dialog
U1: Weather information, please
S1: Please tell me the area and the date
U2: Tomorrow
S2: Please tell me the area
U3: Pittsburgh
S3: Are you asking the weather for Pittsburgh tomorrow?
U4: Yes.
S4: The weather for Pittsburgh tomorrow will be sunny.
Sub tasks marked on the dialog: date, area, confirmation, response

Comparison of Bandwidth Utilization
To compare bandwidth utilization, VXML file sizes were measured (unit: bytes).

Sub Task        Prototype I   Prototype II
Ask date        6266          931
Ask area        1949          618
Confirmation    1709          607
Response        3110          283
Avg VXML Size   3258          610

Comparison of User Wait Time
To compare end-user wait time, loading times in the VXI were measured (unit: seconds).

Sub Task           Prototype I   Prototype II
Ask date           5.25          0.88
Ask area           3.92          1.28
Confirmation       4.47          1.50
Response           3.44          1.14
Avg Loading Time   4.27          1.20

Summary of Prototype II
Improvement
- Efficient VXML management: VXML includes only the current state information => faster & less use of network bandwidth
- Easy to add new dialogs using ScenarioTemplate
- Easy to support multiple languages by changing ScenarioTemplate prompts and grammars
Issues
- Predefined dialog states and prompts
- Hard to support more complex dialogs

Example of Complex Dialog
U1: Tour guide, please.
S1: Tour guidance is started. Where is the destination?
U2: Carnegie Mellon University
S2: The distance to the destination is 100 miles. It takes about 2 hours.
U3: Do you know a good restaurant near here?
S3: What kind of food do you like?
U4: I want to eat Italian food.
S4: There are Olive Garden, Bravo and Laromana.
U5: Well, I'd like to go to Laromana.
S5: It does not have a parking lot. Is it OK?
U6: Tell me the ones which have parking lots.
S6: Olive Garden and Bravo have parking lots.
U7: Which one is closer?
S7: Bravo
U8: I would like to go there.
S8: Do you want to add it as a waypoint?
U9: Yes
S9: I set Bravo as the waypoint.

Current Research Focus
Flexible dialog management
- Push model: offer important information even if the user does not request it
  Ex) does not take credit cards, does not have a parking lot
- Search data from the list which was already retrieved
- Support comparison (the cheapest, closer)
- Anaphora resolution (there, it)
- Add waypoint to Navigation Map

Current Research Focus
Robust dialog management for signal loss
Task Manager
- Retrieval task management for signal loss
Dialog Manager
- Located in vehicle
Info Manager
- Retrieve data from the remote database servers
- Maintain local cache with timestamps

Extended Architecture
[Figure: extended architecture diagram. Labels: Vehicle, Remote Server, User Interface, Voice, Julius, Dialog Manager, Dialog Controller, ADM, Task Manager, Info Manager, Local Cache, Data Server, DB … DB, Navigation System, Navigation System Simulator, Map Display.]

Dialog Manager
To support information-seeking dialogs, it has three dialog states
Search
- Ask the user to fill the minimum search constraints
- Display the search results
  Ex) There are three Italian restaurants near here: A, B and C
Search refinement
- User can narrow down the searched items with different search options such as price and distance
  Ex) Which one is closer?
Selection
- Automatically check important information (Push model)
- Add waypoint to the Navigation map
  Ex) I set it as the waypoint

Dynamic State Transition
[Figure: state-transition flowchart across the Search, Search Refinement, and Selection states. Node labels: Check search constraints; Ask constraints; Display the results; Find new lists from the previous results; Display it; Check special info; Notify it to the user; Display detailed info; Add way point if user wants; End dialog. Edge labels: need constraints=yes; need features=no; need selection=no; New constraints; Search options; Selection; Not satisfied; satisfied.]

Future work
- More work on Task Manager
- Anaphora and ellipsis resolution
- Multi-modal interface
- Integration of screen with Navigation map
- Complementary prompts (voice and screen)
- Dynamic grammar generation
- More intelligent push model based on user preference
- Missed turn
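
Sketch: Search / Search refinement / Selection
The three dialog states described earlier (Search, Search refinement, Selection) can be illustrated with a minimal Python sketch. This is not the CAMMIA implementation. The `Place` record, all function names, the `REQUIRED_CONSTRAINTS` tuple, and the distance values are assumptions introduced for illustration; only the restaurant names and parking facts follow the Example of Complex Dialog.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Place:
    name: str
    cuisine: str
    distance_miles: float  # hypothetical; the slides give no distances
    has_parking: bool


# Names and parking facts follow the sample dialog; distances are invented.
PLACES: List[Place] = [
    Place("Olive Garden", "italian", 3.0, True),
    Place("Bravo", "italian", 1.5, True),
    Place("Laromana", "italian", 2.0, False),
]

# Assumed minimum search constraints for the restaurant task.
REQUIRED_CONSTRAINTS = ("cuisine",)


def missing_constraints(constraints: Dict[str, str]) -> List[str]:
    """Search state: constraints the system still has to ask for."""
    return [c for c in REQUIRED_CONSTRAINTS if c not in constraints]


def search(constraints: Dict[str, str]) -> List[Place]:
    """Search state: query the data source once the constraints are filled."""
    if missing_constraints(constraints):
        raise ValueError("ask the user for the missing constraints first")
    return [p for p in PLACES if p.cuisine == constraints["cuisine"]]


def refine(results: List[Place], option: str) -> List[Place]:
    """Search refinement: narrow the list that was already retrieved."""
    if option == "parking":
        return [p for p in results if p.has_parking]
    if option == "closer":
        return sorted(results, key=lambda p: p.distance_miles)[:1]
    raise ValueError(f"unknown search option: {option}")


def select(results: List[Place], name: str) -> Tuple[Place, Optional[str]]:
    """Selection: pick a place and push a notice the user did not ask for."""
    place = next(p for p in results if p.name == name)
    notice = None if place.has_parking else "It does not have a parking lot. Is it OK?"
    return place, notice
```

In this sketch, selecting Laromana returns the parking notice (the push model), refining by "parking" leaves Olive Garden and Bravo, and refining that list by "closer" leaves Bravo under the invented distances. Refinement never queries the data source again, which mirrors the "search data from the list which was already retrieved" item.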