Signal Processing for Multimodal Web

W3C Web Technology Day
Irek Defée
Department of Signal Processing
Tampere University of Technology
Current status
• The Web was developed for traditional data and
computer I/O: text, keyboard, mouse
• This is simple and effective, but it is not a natural
way for humans to interact with the world
• Humans interact via the
perceptual system
Human Perceptual System
The human perceptual system has multiple
senses: visual, acoustic, haptic
(touch, body position, temperature),
and actuators (vocal tract, muscles,
motor system)
The perceptual system is intrinsically
MULTIMODAL: multiple senses and
actuators operate in perfect coordination
Perceptual Information Technology
• Information technology is evolving towards
natural MULTIMODAL human interaction:
- Touch gestures have
revolutionized mobile devices
- Intelligent speech input
is available
- There is more to come:
new sensors, cameras
and intelligence
Signal Processing Role
Perceptual Information Technology
requires sophisticated signal processing,
which is hard because of:
- Complex input signals
- Complex information encoding
- Complex databases of knowledge
Highly sophisticated algorithms and
huge processing power are required
Multimodal Web
• The trend towards perceptual
information is noted at the W3C:
Extending the Web to allow multiple modes of
interaction: GUI, Speech, Vision, Pen, Gestures,
Haptic interfaces, ...
• Multimodal Interaction Activity:
- Multimodal Architecture and Interfaces
- InkML
- EmotionML
Multimodal Architecture
• EMMA: Extensible MultiModal Annotation
markup language
- a format for containing and annotating the interpretation
of user input
- for example, a transcription into words of a raw signal
derived from speech or pen input
- interpretations are generated by signal
interpretation processes, such as speech and ink
recognition and semantic interpreters
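
As a minimal sketch of how an application might consume such annotations, the Python snippet below reads the interpretations from a small EMMA document. The sample XML is written by hand for illustration (the flight query, the `origin` and `destination` slots and the confidence value are assumptions, not output of any real recognizer), and `read_interpretations` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

# Hand-written sample for illustration only; a real document would be
# produced by a speech or ink recognizer.
SAMPLE = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1"
      emma:medium="acoustic" emma:mode="voice"
      emma:confidence="0.75"
      emma:tokens="flights from boston to denver">
    <origin>Boston</origin>
    <destination>Denver</destination>
  </emma:interpretation>
</emma:emma>
"""


def read_interpretations(xml_text):
    """Return one dict per emma:interpretation element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    results = []
    for interp in root.iter(f"{{{EMMA_NS}}}interpretation"):
        results.append({
            "id": interp.get("id"),
            # EMMA annotations are attributes in the EMMA namespace.
            "mode": interp.get(f"{{{EMMA_NS}}}mode"),
            "confidence": float(interp.get(f"{{{EMMA_NS}}}confidence", "nan")),
            "tokens": interp.get(f"{{{EMMA_NS}}}tokens", "").split(),
            # Child elements carry the application-specific interpretation.
            "slots": {child.tag: child.text for child in interp},
        })
    return results


if __name__ == "__main__":
    for item in read_interpretations(SAMPLE):
        print(item)
```

The annotations (mode, confidence, tokens) describe how the input was recognized, while the child elements hold what the user meant; keeping the two apart is what lets different recognizers feed the same application logic.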
Ink Markup Language
• data format for representing ink
• input and processing of handwriting,
gestures, sketches, music using
traces of pen
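
A pen trace in InkML is essentially a list of sampled points. The Python sketch below extracts the points from a tiny hand-written InkML document. It assumes the simplest case, where every sample gives explicit X and Y values separated by commas; InkML also allows other channels and difference-encoded values, which this sketch does not handle. `read_traces` is a hypothetical helper name and the coordinates are made up.

```python
import xml.etree.ElementTree as ET

INKML_NS = "http://www.w3.org/2003/InkML"

# Hand-written sample for illustration only: two pen strokes.
SAMPLE = """\
<ink xmlns="http://www.w3.org/2003/InkML">
  <trace>10 0, 9 14, 8 28, 7 42</trace>
  <trace>130 155, 144 159, 158 160</trace>
</ink>
"""


def read_traces(xml_text):
    """Return a list of strokes, each a list of (x, y) tuples."""
    root = ET.fromstring(xml_text)
    traces = []
    for trace in root.iter(f"{{{INKML_NS}}}trace"):
        points = []
        # Samples are comma-separated; values within a sample are
        # whitespace-separated (explicit X Y values assumed here).
        for sample in (trace.text or "").split(","):
            values = sample.split()
            if len(values) >= 2:
                points.append((float(values[0]), float(values[1])))
        traces.append(points)
    return traces


if __name__ == "__main__":
    for index, stroke in enumerate(read_traces(SAMPLE)):
        print(index, stroke)
```

The resulting point lists are the raw signal that a handwriting or gesture recognizer would take as input.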
Emotion Markup Language
• Annotation of material involving emotionality
• Automatic recognition of emotions from sensors
• Generation of emotion-related system responses:
speech, music, colors, gestures, synthetic faces
• Emotion vocabularies and representations:
<emotion category-set=""> <category name="surprise"
confidence="0.9"/> </emotion>
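
A recognizer that has classified an emotion could emit such an annotation programmatically. The Python sketch below is one possible way to do it, not a reference implementation. The namespace URI and the "big six" vocabulary URI are given as recalled from the W3C documents and should be checked against them; `make_emotion` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed values, to be verified against the EmotionML specification
# and the W3C emotion vocabularies note.
EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"
BIG6 = "http://www.w3.org/TR/emotion-voc/xml#big6"


def make_emotion(name, confidence, category_set=BIG6):
    """Serialize one emotion annotation with a single category."""
    # Use EmotionML as the default namespace in the serialized output.
    ET.register_namespace("", EMOTIONML_NS)
    emotion = ET.Element(
        f"{{{EMOTIONML_NS}}}emotion", {"category-set": category_set}
    )
    ET.SubElement(
        emotion,
        f"{{{EMOTIONML_NS}}}category",
        {"name": name, "confidence": f"{confidence:.2f}"},
    )
    return ET.tostring(emotion, encoding="unicode")


if __name__ == "__main__":
    print(make_emotion("surprise", 0.9))
```

The category name is only meaningful relative to the vocabulary named in `category-set`, which is why the sketch passes the vocabulary explicitly rather than hard-coding a list of emotion names.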
Department of Signal Processing
Signal processing has a key role as a
front-end for the Multimodal Web
The Department is at the forefront of research
in natural information processing:
- Multimedia information analysis, retrieval and
- Audio information analysis: speech and
- Media information handling: representation and
Thank you for your attention!