Distributed Multimodal Synchronization Protocol (DMSP)
Chris Cross, IETF 66, July 13, 2006
With contributions from Gerald McCobb and Les Wilson
© 2006 IBM Corporation

DMSP Background
• Result of a 4-year IBM and Motorola R&D effort
• OMA Architecture Document:
  – Multimodal and Multi-device Architecture
  – http://member.openmobilealliance.org/ftp/Public_documents/BAC/MAE/Permanent_documents/OMA-ADMMMD-V1_0-20060612-D.zip
• Internet-Draft submitted to the IETF on July 8, 2005 by IBM & Motorola
• Reason for contribution:
  – A standard is needed to synchronize network-based services implementing distributed modalities in multimodal applications
  – Other protocols may overlap, but none addresses all multimodal interaction requirements
  – Related IETF Internet-Drafts and RFCs:
    • Media Server Control Protocol (MSCP)
    • LRDP: The Lightweight Remote Display Protocol (Remote UI BoF)
    • Media Resource Control Protocol Version 2 (MRCPv2)
    • Widex
    • RFC 1056, Distributed Mail System for Personal Computers (also "DMSP")

Why do you need a distributed system, i.e., a thin client?
[Figure: client resources (R) plotted against grammar size and complexity (G); the line R/G = 1 separates the thick-client region from the thin-client region.]
• R = resources (memory and MIPS) on the client device; G = size and complexity of the application grammars
• When R/G = 1, resources are adequate to perform "real time" recognition and synthesis
• A thick client performs speech recognition and synthesis on the device. As the resources available on the device shrink, or the application requirements increase (larger application grammars), the performance of the system becomes unacceptable. Once that threshold is reached, it becomes economically feasible to distribute the speech processing over the network, i.e., to move to a thin client.

Multimodal Use Cases
• Opera X+V Pizza demo
• X+V
• J+V
• Future W3C multimodal languages (VoiceXML 3, etc.)

DMSP Architecture
• There are 4 DMSP building blocks:
  1. Modalities
  2. Model-View-Controller (MVC) design pattern
  3. View-independent model
  4. Event-based modality synchronization

DMSP Architecture Building Blocks
1. Modalities are Views in the MVC pattern
  • GUI, speech, pen
  • Individual browsers for each modality
  • Compound browsers for multiple modalities
  [Figure: a compound browser]

DMSP Architecture Building Blocks
2. Model-View-Controller (MVC) design pattern
  • A multimodal system can be modeled in terms of the MVC pattern
  • Each modality can be decomposed and implemented in its own MVC pattern
  • A modality can implement a view-independent model and controller locally, or use one in the network (e.g., an Interaction Manager)

DMSP Architecture Building Blocks
3. View-independent model
  • Enables a centralized model
  • Modality interaction updates both the view and the model
  • Local event filters reflect "important" events to the view-independent model
  • A modality listens to the view-independent model for only the events it cares about
  • Compound clients, centralized control through an Interaction Manager, and distributed modalities are all enabled with a single protocol
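The sketch below illustrates building blocks 2 and 3: a centralized, view-independent model that receives filtered events from one modality's view and notifies only the modalities listening for that event type. This is a minimal Java illustration, not part of DMSP; all class and method names (ViewIndependentModel, dispatch, and so on) are assumptions for this example, since DMSP defines logical messages rather than an API.

```java
import java.util.*;

// Hypothetical sketch of a view-independent model with event
// filtering and listening. Names are illustrative only.
public class ViewIndependentModelSketch {

    // A minimal interaction event: a named field changed in some view.
    record InteractionEvent(String type, String field, String value) {}

    interface ModalityListener {
        void onModelEvent(InteractionEvent event);
    }

    // The centralized, view-independent model: holds shared field state
    // and notifies only the listeners registered for an event's type.
    static class ViewIndependentModel {
        private final Map<String, String> fields = new HashMap<>();
        private final Map<String, List<ModalityListener>> listeners = new HashMap<>();

        void addEventListener(String type, ModalityListener l) {
            listeners.computeIfAbsent(type, k -> new ArrayList<>()).add(l);
        }

        // A modality's local event filter calls this only for the
        // "important" events it chooses to reflect to the model.
        void dispatch(InteractionEvent e) {
            fields.put(e.field(), e.value());
            for (ModalityListener l : listeners.getOrDefault(e.type(), List.of())) {
                l.onModelEvent(e);
            }
        }
    }

    public static void main(String[] args) {
        ViewIndependentModel model = new ViewIndependentModel();
        // The voice modality listens only for the events it cares about.
        model.addEventListener("field-updated",
            e -> System.out.println("voice view syncs " + e.field() + " = " + e.value()));
        // The GUI modality's event filter reflects a change to the model,
        // which keeps the other views synchronized.
        model.dispatch(new InteractionEvent("field-updated", "topping", "mushroom"));
    }
}
```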
DMSP Architecture Building Blocks
4. Event-based synchronization
  • Compound client: all modalities are rendered in the client
  • Interactions in one modality are reflected in the others through event-based changes to one or more models
  • The GUI DOM can serve as the view-independent model
  • Non-DOM user agents can be connected to user agents that have a DOM

DMSP Architecture Building Blocks
4. Event-based synchronization (cont'd)
  • Distributed modality: a modality is handled in the infrastructure
  • Requires DMSP for distributing the modality
  • Event-based synchronization via the view-independent model gives a modality-independent distribution mechanism
  • Enables multiple topologies:
    – Compound client with a distributed modality
    – Interaction Manager
  [Figure: a distributed modality]

DMSP Design
• There are 4 abstract interfaces:
  1. Command
  2. Response
  3. Event
  4. Signal
• Each interface defines a set of methods and related data structures exchanged between user agents
• Specified as a set of messages
• XML and binary message encodings

DMSP Message Types
1. Signals
  • One-way asynchronous messages used to negotiate internal processing states:
    – Initialization (SIG_INIT)
    – VXML start (SIG_VXML_START)
    – Close (SIG_CLOSE)

DMSP Message Types
2. Command and control messages
  • Add and remove event listener (CMD_ADD/REMOVE_EVT_LISTENER)
  • Can dispatch (CMD_CAN_DISPATCH)
  • Dispatch event (CMD_DISPATCH_EVT)
  • Load URL (CMD_LOAD_URL)
  • Load source (CMD_LOAD_SRC)
  • Get and set focus (CMD_GET/SET_FOCUS)
  • Get and set fields (CMD_GET/SET_FIELDS)
  • Cancel (CMD_CANCEL)
  • Execute form (CMD_EXEC_FORM)
  • Get and set cookies (CMD_GET/SET_COOKIES)

DMSP Message Types
3. Responses
  • Response messages to commands:
    – OK (RESP_OK)
    – Boolean (RESP_BOOL)
    – String (RESP_STRING)
    – Fields (RESP_FIELDS): contains one or more Field data structures
    – Error (RESP_ERROR)

DMSP Message Types
4. Events
  • Asynchronous notifications between user agents with a common data structure
  • Events are correlated with event listeners
  • DOM events:
    – DOMActivate, DOMFocusIn, and DOMFocusOut
  • HTML 4 events:
    – Click, mouse, key, submit, reset, etc.
    – Error and abort
  • VXML done (e.g., a VoiceXML form is complete)

DMSP Message Types
4. Events (cont'd)
  • Form data:
    – One or more Field data structures (GUI or voice)
  • Recognition results:
    – One or more Result data structures with raw utterance, score, and one or more Field data structures
  • Recognition results EX:
    – One or more Result EX data structures with raw utterance, score, grammar, and semantics
  • Start and stop playback:
    – Playback of audio or TTS prompts has started or stopped
  • Start and stop playback mark:
    – TTS encounters a mark in the play text
  • Custom (i.e., application-defined)
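To make the command/response pairing concrete, here is a minimal Java sketch of a round trip between user agents. The message-type names come from the slides above; the Java types, the request-id correlation, and the toy handler are assumptions for illustration, since the Internet-Draft specifies logical messages with XML and binary encodings rather than an API.

```java
import java.util.Map;

// Illustrative model of DMSP command/response pairing. Only the
// message-type names are taken from the deck; the rest is assumed.
public class DmspCommandSketch {

    enum MessageType { CMD_LOAD_URL, CMD_SET_FOCUS, CMD_GET_FIELDS,
                       RESP_OK, RESP_FIELDS, RESP_ERROR }

    // A command carries a request id so its response can be correlated
    // (an assumed mechanism, not quoted from the draft).
    record Message(MessageType type, int requestId, Map<String, String> body) {}

    // A toy user agent answering commands, standing in for a
    // distributed voice or GUI modality.
    static Message handle(Message cmd) {
        return switch (cmd.type()) {
            case CMD_SET_FOCUS -> new Message(MessageType.RESP_OK, cmd.requestId(), Map.of());
            case CMD_GET_FIELDS -> new Message(MessageType.RESP_FIELDS, cmd.requestId(),
                    Map.of("topping", "mushroom"));
            default -> new Message(MessageType.RESP_ERROR, cmd.requestId(),
                    Map.of("reason", "unsupported command"));
        };
    }

    public static void main(String[] args) {
        Message cmd = new Message(MessageType.CMD_GET_FIELDS, 42, Map.of());
        Message resp = handle(cmd);
        System.out.println(resp.type() + " for request " + resp.requestId()
                + ": " + resp.body());
    }
}
```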
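Similarly, a recognition-result event might carry the Result and Field data structures named on the events slides. The field layout below is an assumption sketched in Java for illustration; only the structure names and their listed contents (raw utterance, score, Fields) come from the deck.

```java
import java.util.List;

// Illustrative shape of a recognition-result event. Structure names
// follow the slides; the exact layout is assumed.
public class DmspEventSketch {

    record Field(String name, String value) {}

    // Per the slides: raw utterance, score, and one or more Fields.
    record Result(String rawUtterance, double score, List<Field> fields) {}

    // Events are correlated with the event listener that registered
    // for them, so the sketch carries a listener id.
    record RecognitionResultEvent(int listenerId, List<Result> results) {}

    public static void main(String[] args) {
        RecognitionResultEvent evt = new RecognitionResultEvent(
                7, // correlates the event with a registered listener
                List.of(new Result("large mushroom pizza", 0.92,
                        List.of(new Field("size", "large"),
                                new Field("topping", "mushroom")))));
        // A GUI modality receiving this event would update its fields,
        // keeping the views synchronized through the model.
        System.out.println(evt);
    }
}
```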
DMSP Conclusion
• A protocol dedicated to distributed multimodal interaction
• Based on the Model-View-Controller design pattern
• Enables both Interaction Manager and client-based view-independent model topologies
• Asynchronous signals and events
• Command-response messages
• Can be generalized to other modalities besides GUI and voice
• Supports application-specific result protocols (e.g., EMMA) through an extension (TBD)
• Interested in getting more participation

Draft Charter
• The convergence of wireless communications with information technology and the miniaturization of computing platforms have resulted in advanced mobile devices that offer high-resolution displays, application programs with graphical user interfaces, and access to the Internet through full-function web browsers. Mobile phones now support most of the functionality of a laptop computer.
• However, the miniaturization that has made the technology possible and commercially successful also puts constraints on the user interface. Tiny displays and keypads significantly reduce the usability of application programs.
• Multimodal user interfaces, UIs that offer multiple modes of interaction, have been developed that greatly improve the usability of mobile devices. In particular, multimodal UIs that combine speech and graphical interaction are proving themselves in the marketplace.
• However, not all mobile devices provide the computing resources to perform speech recognition and synthesis locally on the device. For these devices it is necessary to distribute the speech modality to a server in the network.

Draft Charter (cont.)
• The Distributed Multimodal Working Group will develop the protocols necessary to control, coordinate, and synchronize distributed modalities in a distributed multimodal system. Several protocols and standards are necessary to implement such a system, including DSR and AMR speech compression, session control, and media streaming; however, the DM WG will focus exclusively on the synchronization of modalities rendered across a network, in particular graphical user interfaces and voice servers.
• The DM WG will develop an RFC for a Distributed Multimodal Synchronization Protocol that defines the logical message set used to effect synchronization between modalities, with enough background on the expected multimodal system architecture (or a reference architecture defined elsewhere in the W3C or OMA) to present a clear understanding of the protocol. It will investigate existing protocols for the transport of the logical synchronization messages and develop an RFC detailing the message format for commercial alternatives, possibly including HTTP and SIP.

Draft Charter (cont.)
• While not limited to these, for simplicity of scope the protocol will assume RTP for carriage of media; SIP and SDP for session control; and DSR, AMR, QCELP, etc., for speech compression. The working group will not consider the authoring of applications, since it is assumed that this will be done with existing W3C markup standards such as XHTML and VoiceXML and commercial programming languages such as Java and C/C++.

Draft Charter (cont.)
• It is expected that we will coordinate our work in the IETF with the W3C Multimodal Interaction Working Group.
• Goals and milestones for the Working Group:
  – TBD: Submit an Internet-Draft describing DMSP (standards track)
  – TBD: Submit drafts to the IESG for publication
  – TBD: Submit the DMSP specification to the IESG