Distributed Multimodal Synchronization Protocol (DMSP)

Chris Cross
IETF 66
July 13, 2006
With contributions from Gerald McCobb and Les Wilson
© 2006 IBM Corporation
DMSP Background
• Result of a four-year IBM and Motorola R&D effort
• OMA Architecture Document:
– Multimodal and Multi-device Architecture
– http://member.openmobilealliance.org/ftp/Public_documents/BAC/MAE/Permanent_documents/OMA-ADMMMD-V1_0-20060612-D.zip
• Internet-Draft submitted to the IETF on July 8, 2005 by IBM & Motorola
• Reason for contribution
– A standard is needed to synchronize network-based services implementing distributed modalities in multimodal applications
– Other protocols may overlap but do not address all multimodal interaction requirements
– Other IETF IDs and RFCs:
• Media Server Control Protocol (MSCP)
• LRDP: The Lightweight Remote Display Protocol (Remote UI BoF)
• Media Resource Control Protocol Version 2 (MRCPv2)
• Widex
• RFC 1056, Distributed Mail System for Personal Computers (also "DMSP")
Why do you need a distributed system, i.e., a Thin Client?
(Figure: plot of Client Resources (R) versus Grammar Size and Complexity (G); the line R/G = 1 separates the Thick Client region from the Thin Client region.)
A thick client performs speech recognition and synthesis on the device. As the resources available on the device shrink, or as the application requirements increase (larger application grammars), the performance of the system becomes unacceptable. When that threshold is reached, it becomes economically feasible to distribute the speech processing over the network.
R: memory and MIPS on the client device
G: size and complexity of application grammars
R/G = 1: resources are adequate to perform "real time" recognition and synthesis
Multimodal Use Cases
• Opera X+V Pizza demo
• X+V (XHTML+Voice)
• J+V (Java + VoiceXML)
• Future W3C multimodal languages (VoiceXML 3, etc.)
DMSP Architecture
• There are four DMSP building blocks:
1. Modalities
2. Model-View-Controller (MVC) design pattern
3. View Independent Model
4. Event-based modality synchronization
DMSP Architecture Building Blocks
1. Modalities are Views in the MVC Pattern
• GUI, Speech, Pen
• Individual browsers for each modality
• Compound browsers for multiple modalities
(Figure: Compound Browser)
DMSP Architecture Building Blocks
2. Model-View-Controller (MVC) design pattern
• A multimodal system can be modeled in terms of the MVC pattern
• Each modality can be decomposed and implemented in its own MVC pattern
• A modality can implement a view independent model and controller locally or use one in the network (e.g., an Interaction Manager)
DMSP Architecture Building Blocks
3. View Independent Model
• Enables a centralized model
• Modality interaction updates the view and the model
• Local event filters reflect "important" events to the view independent model
• A modality listens to the view independent model for only the events it cares about
• Compound clients, centralized control through an Interaction Manager, and distributed modalities are all enabled with a single protocol (see the sketch below)
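As a rough sketch of this building block: the Python below models a view independent model that modality views subscribe to. All class and method names (ViewIndependentModel, add_event_listener, dispatch) are illustrative assumptions; DMSP defines messages, not this API.

```python
# Minimal sketch of a view independent model with event filtering.
# All names here are illustrative assumptions, not defined by DMSP.

class ViewIndependentModel:
    """Centralized model that the modality views synchronize through."""

    def __init__(self):
        self.fields = {}     # shared application state
        self.listeners = {}  # event type -> callbacks of interested views

    def add_event_listener(self, event_type, callback):
        # A modality registers for only the events it cares about.
        self.listeners.setdefault(event_type, []).append(callback)

    def dispatch(self, event_type, data, source):
        # An "important" event, passed through a modality's local event
        # filter, updates the model and is reflected to the listeners.
        self.fields.update(data)
        for callback in self.listeners.get(event_type, []):
            callback(data, source)


model = ViewIndependentModel()

# The voice modality listens only for GUI field changes.
model.add_event_listener(
    "field-changed",
    lambda data, source: print(f"voice view syncs {data} from {source}"))

# A GUI interaction updates its own view, and its local event filter
# reflects the change into the view independent model.
model.dispatch("field-changed", {"topping": "pepperoni"}, source="gui")
```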
DMSP Architecture Building Blocks
4. Event-based synchronization
• Compound Client: all modalities rendered in the client
• Interactions in one modality are reflected in the others through event-based changes to one or more models
• The GUI DOM can serve as the View Independent Model
• Non-DOM user agents are synchronized by connecting them to user agents that have a DOM
DMSP Architecture Building Blocks
4. Event-based synchronization (cont'd)
• Distributed Modality: a modality is handled in the infrastructure
• Requires DMSP to distribute the modality
• Event-based synchronization via the View Independent Model gives a modality-independent distribution mechanism
• Enables multiple topologies:
– Compound Client with Distributed Modality
– Interaction Manager
– Distributed Modality
DMSP Design
• There are four abstract interfaces:
1. Command
2. Response
3. Event
4. Signal
• Each interface defines a set of methods and related data structures exchanged between user agents
• Specified as a set of messages
• XML and binary message encodings (see the sketch below)
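For illustration only, a command message might be encoded in XML along these lines. The element and attribute names below are assumptions, not the encoding defined in the DMSP Internet-Draft.

```python
# Hypothetical XML encoding of a DMSP command message. The element and
# attribute names are assumptions for illustration; the actual wire
# format is specified in the Internet-Draft.
import xml.etree.ElementTree as ET

def encode_command(name, seq, **params):
    """Serialize a command and its parameters as an XML message."""
    msg = ET.Element("dmsp", version="1.0")
    cmd = ET.SubElement(msg, "command", name=name, seq=str(seq))
    for key, value in params.items():
        ET.SubElement(cmd, "param", name=key).text = value
    return ET.tostring(msg, encoding="unicode")

print(encode_command("CMD_LOAD_URL", 1, url="http://example.com/app.vxml"))
```

A binary encoding would carry the same logical message in a more compact framing for constrained devices.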
DMSP Message Types
1. Signals
• One-way asynchronous messages used to negotiate internal processing states (lifecycle sketched below)
• Initialization (SIG_INIT)
• VXML Start (SIG_VXML_START)
• Close (SIG_CLOSE)
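A minimal sketch of the session lifecycle these signals imply; send_signal is an invented stand-in for whatever transport carries DMSP messages.

```python
# Sketch of a session lifecycle using the signals named above.
# send_signal is an assumed helper, not a DMSP-defined API.

def send_signal(name, **attrs):
    print("->", name, attrs)

send_signal("SIG_INIT", version="1.0")   # negotiate initial state
send_signal("SIG_VXML_START", url="http://example.com/app.vxml")
# ... multimodal interaction proceeds ...
send_signal("SIG_CLOSE")                 # tear down the session
```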
DMSP Message Types
2. Command and control messages
• Add and remove event listener (CMD_ADD/REMOVE_EVT_LISTENER)
• Can dispatch (CMD_CAN_DISPATCH)
• Dispatch event (CMD_DISPATCH_EVT)
• Load URL (CMD_LOAD_URL)
• Load Source (CMD_LOAD_SRC)
• Get and Set Focus (CMD_GET/SET_FOCUS)
• Get and Set Fields (CMD_GET/SET_FIELDS)
• Cancel (CMD_CANCEL)
• Execute Form (CMD_EXEC_FORM)
• Get and Set Cookies (CMD_GET/SET_COOKIES)
An example command exchange is sketched below.
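A hypothetical command exchange using the message names above; the dictionary structure and field names are assumptions for illustration, not the DMSP message layout.

```python
# Hypothetical commands synchronizing focus and field values from the
# GUI user agent to the voice browser; structure is an assumption.
def send(msg):
    print("->", msg)

send({"type": "CMD_SET_FOCUS", "seq": 10, "field": "topping"})
send({"type": "CMD_SET_FIELDS", "seq": 11,
      "fields": [{"name": "topping", "value": "pepperoni"}]})
```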
DMSP Message Types
3. Responses
• Response messages to commands (see the sketch below)
• OK (RESP_OK)
• Boolean (RESP_BOOL)
• String (RESP_STRING)
• Fields (RESP_FIELDS)
– Contains one or more Field data structures
• Error (RESP_ERROR)
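Continuing the sketch above, a RESP_FIELDS message might answer a CMD_GET_FIELDS command. The structure, and the seq field used to correlate the pair, are assumptions for illustration.

```python
# Hypothetical command/response pair; the structure and the seq field
# used to correlate them are assumptions for illustration.
request = {"type": "CMD_GET_FIELDS", "seq": 12}
response = {"type": "RESP_FIELDS", "seq": 12,
            "fields": [{"name": "topping", "value": "pepperoni"},
                       {"name": "size", "value": "large"}]}
assert response["seq"] == request["seq"]  # response matched to command
```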
DMSP Message Types
4. Events
• Asynchronous notifications between user agents with a common data structure
• Events correlated with event listeners
• DOM events
– DOMActivate, DOMFocusIn, and DOMFocusOut
– HTML 4 events: Click, Mouse, Key, submit, reset, etc.
– Error and abort
• VXML Done (e.g., VoiceXML form complete)
DMSP Message Types
4. Events (cont'd)
• Form Data
– One or more Field data structures (GUI or Voice)
• Recognition Results
– One or more Result data structures with raw utterance, score, and one or more Field data structures
• Recognition Results EX
– One or more Result EX data structures with raw utterance, score, grammar, and semantics
• Start and stop play back
– Play back of audio or TTS prompts has started or stopped
• Start and stop play back mark
– TTS encounters a mark in the play text
• Custom (i.e., application-defined)
An example of listener/event correlation is sketched below.
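As a hypothetical illustration of how events are correlated with event listeners: a listener is registered with a command, and a later DOM event refers back to it. All field names are assumptions for illustration.

```python
# Hypothetical event flow: a listener is registered with a command, and
# a later DOM event is correlated back to it. Field names are
# assumptions, not the DMSP message layout.
add_listener = {"type": "CMD_ADD_EVT_LISTENER", "seq": 20,
                "event": "DOMFocusIn", "node": "topping"}

# When the user tabs into the field, the GUI user agent notifies the
# voice browser, which can then play the matching prompt.
event = {"type": "EVT_DOM", "event": "DOMFocusIn", "node": "topping",
         "listener": add_listener["seq"]}
```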
DMSP Conclusion
• A protocol dedicated to distributed multimodal interaction
• Based on the Model-View-Controller design pattern
• Enables both Interaction Manager and client-based View Independent Model topologies
• Asynchronous signals and events
• Command-response messages
• Can be generalized for other modalities besides GUI and Voice
• Supports application-specific result formats (e.g., EMMA) through an extension mechanism (TBD)
• Interested in getting more participation
Draft Charter
• The convergence of wireless communications with information technology and the miniaturization of computing platforms have resulted in advanced mobile devices that offer high-resolution displays, application programs with graphical user interfaces, and access to the Internet through full-function web browsers.
• Mobile phones now support most of the functionality of a laptop computer. However, the miniaturization that has made the technology possible and commercially successful also puts constraints on the user interface. Tiny displays and keypads significantly reduce the usability of application programs.
• Multimodal user interfaces, UIs that offer multiple modes of interaction, have been developed that greatly improve the usability of mobile devices. In particular, multimodal UIs that combine speech and graphical interaction are proving themselves in the marketplace.
• However, not all mobile devices provide the computing resources to perform speech recognition and synthesis locally on the device. For these devices it is necessary to distribute the speech modality to a server in the network.
Draft Charter (cont.)
• The Distributed Multimodal Working Group will develop the protocols necessary to control, coordinate, and synchronize distributed modalities in a distributed multimodal system. There are several protocols and standards necessary to implement such a system, including DSR and AMR speech compression, session control, and media streaming. However, the DM WG will focus exclusively on the synchronization of modalities being rendered across a network, in particular Graphical User Interfaces and Voice Servers.
• The DM WG will develop an RFC for a Distributed Multimodal Synchronization Protocol that defines the logical message set to effect synchronization between modalities, with enough background on the expected multimodal system architecture (or a reference architecture defined elsewhere in W3C or OMA) to present a clear understanding of the protocol. It will investigate existing protocols for the transport of the logical synchronization messages and develop an RFC detailing the message format for commercial alternatives, possibly including HTTP and SIP.
Draft Charter (cont.)
• While not limited to these, for simplicity of scope the protocol will assume RTP for carriage of media; SIP and SDP for session control; and DSR, AMR, QCELP, etc., for speech compression. The working group will not consider the authoring of applications, as it is assumed this will be done with existing W3C markup standards such as XHTML and VoiceXML and commercial programming languages like Java and C/C++.
Draft Charter (cont.)
• It is expected that we will coordinate our work in the IETF with the W3C Multimodal Interaction Working Group.
• The following are our goals for the Working Group:
– TBD: Submit Internet-Draft describing DMSP (standards track)
– TBD: Submit drafts to IESG for publication
– TBD: Submit DMSP specification to IESG