Computer-supported Interaction

Hamed Ketabdar
Shiva Sundaram
Computer-supported interaction
• Technologies that support interaction between humans, machines, and the environment
• Capturing, processing, and retrieving multimedia
Schedule: Thursday 14-16h, FR 0512C, starting 04.11.2010
Hamed Ketabdar: PhD in Electrical Engineering from the Swiss Federal Institute of Technology in Lausanne (EPFL)
Hamed.Ketabdar@telekom.de
Shiva Sundaram: PhD in Electrical Engineering from the University of Southern California (USC)
Shiva.Sundaram@telekom.de
Outline
• Multi-modal Interfaces
– Input methods: keyboard, pen, voice, gesture, tactile (touch) …
– Output modalities: audio, video, tactile
– Fusion of modalities
• Speech Processing
– Speech recognition: statistical methods, acoustic modelling, and decoding
– Metadata extraction (age, gender, language, emotion)
– Audio-visual speech recognition
– Multilingual speech recognition
• Information Retrieval
– Representation
– Clustering/segmentation/classification
– Integration with other processes
• System and Architecture
– Natural Language Processing
• Translation
Practical sessions
Possibility of small class projects:
Quickly develop multi-modal interfaces based on our context-aware SDK for the iPhone …
User Activity and Context Detection with Mobile Phones
Detect whether you are walking, sitting, in a meeting, at a concert or party, or in an emergency situation …
 Mobile phones are equipped with a microphone and tilt sensors
 The audio context is detected from the microphone output
 The physical activity signature is captured from the tilt sensor output
 Tilt and audio information are combined to detect the context and/or user activity
 Time, duration, and other prior knowledge can also be integrated
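The combination step can be illustrated with a minimal late-fusion sketch. It assumes that separate audio and tilt classifiers (not shown) already output a probability for each context; the context labels, the `audio_weight` parameter, and the optional `prior` (which could encode time of day or other prior knowledge) are hypothetical choices, not the method used in the SDK.

```python
import math
from typing import Dict, Optional

# Hypothetical context labels; the real label set is not specified here.
CONTEXTS = ("walking", "sitting", "meeting", "party")


def fuse_scores(
    audio_probs: Dict[str, float],
    tilt_probs: Dict[str, float],
    prior: Optional[Dict[str, float]] = None,
    audio_weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted log-linear (late) fusion of per-modality context probabilities.

    All probabilities must be strictly positive. The optional prior (an
    assumption of this sketch) defaults to uniform.
    """
    if prior is None:
        prior = {c: 1.0 / len(CONTEXTS) for c in CONTEXTS}
    log_scores = {
        c: audio_weight * math.log(audio_probs[c])
        + (1.0 - audio_weight) * math.log(tilt_probs[c])
        + math.log(prior[c])
        for c in CONTEXTS
    }
    # Renormalize to probabilities with a numerically stable softmax.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {c: math.exp(s - top) / total for c, s in log_scores.items()}


audio = {"walking": 0.2, "sitting": 0.1, "meeting": 0.1, "party": 0.6}
tilt = {"walking": 0.5, "sitting": 0.1, "meeting": 0.1, "party": 0.3}
fused = fuse_scores(audio, tilt)
print(max(fused, key=fused.get))  # → party
```

Here audio alone favours "party" and tilt alone favours "walking"; the fused score picks the context with the best combined support. A log-linear combination is only one simple option; a single classifier trained on concatenated audio and tilt features is a common alternative.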
Applications:
 Smart mobile phones
• Control ringing and other functions according to context
 Surveillance and organization (employees, elderly, children)
• Information about user activity can be used to organize employees better and to take care of the elderly and children
 Smart home environment
General-Purpose Audio Switches
 A switch that can be triggered by speech commands or non-speech events
 The commands or events can be learned automatically
 The switch can easily be reconfigured for a new command or application
Involves:
 Automatic language/event acquisition
 Robustness to different sources of variability
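One way automatic learning and easy reconfiguration could work is template matching. The sketch below is a hypothetical illustration, not the system described here: each command is learned as the mean of a few example feature vectors (for instance averaged spectral features, assumed to be computed elsewhere), and an input triggers the switch only if it lies within a distance threshold of a stored template. The class name, command names, and threshold are invented for the example.

```python
import math
from typing import Dict, List, Optional, Sequence


class AudioSwitch:
    """Hypothetical template-based audio switch (illustration only)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # maximum distance that still triggers
        self.templates: Dict[str, List[float]] = {}  # command -> mean feature vector

    def enroll(self, name: str, examples: Sequence[Sequence[float]]) -> None:
        """Learn, or re-learn, a command as the mean of its example feature vectors."""
        dim = len(examples[0])
        self.templates[name] = [
            sum(ex[i] for ex in examples) / len(examples) for i in range(dim)
        ]

    def trigger(self, features: Sequence[float]) -> Optional[str]:
        """Return the closest command if it lies within the threshold, else None."""
        best_name: Optional[str] = None
        best_dist = math.inf
        for name, template in self.templates.items():
            dist = math.dist(template, features)
            if dist < best_dist:
                best_name, best_dist = name, dist
        return best_name if best_dist <= self.threshold else None


switch = AudioSwitch(threshold=1.0)
switch.enroll("lights_on", [[1.0, 0.0], [1.2, 0.2]])
switch.enroll("door_knock", [[0.0, 5.0], [0.0, 5.2]])
print(switch.trigger([1.0, 0.1]))    # → lights_on
print(switch.trigger([10.0, 10.0]))  # → None
```

Reconfiguring the switch for a new command amounts to calling `enroll` with new examples. Averaging discards temporal order, so a practical system would more likely model the sequence of features, for example with HMMs or dynamic time warping.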
Applications:
 Smart environments, security, and surveillance
Dreams:
 You buy it just as you would buy a normal mechanical switch in a store
 It can be installed anywhere, in the same way as a normal switch
Call Classification
Anger, Gender, Age, Language, …
Hierarchical design and discriminative training:
 Discriminative representation of emotional states
 Efficient fusion of different acoustic features with higher-level information (e.g. duration, message content)
 Efficient feature selection mechanism, reducing the computational load of feature extraction
Figure: pitch, intensity → discriminative transformation → combination with textual data, duration, … → call classification
Digital Logging of Physical Activities and Context
Enhancing Emergency and Security/Privacy Functionalities in Mobile Phones
• Unexpected physical events experienced by a mobile phone can be signs of critical security or emergency scenarios:
– The phone is at risk of being lost or stolen, so confidential information on it could be exposed
– The phone user has had an accident
MobileHCI 2009, Ubicomp 2009
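As an illustration of how an unexpected physical event such as a drop might be flagged, the following sketch looks for a short free-fall phase (acceleration magnitude near zero) followed by an impact spike. It assumes a 3-axis accelerometer sampled at a known rate; the function name `detect_drop` and all thresholds are hypothetical and would need tuning on real data. It is not the detection method of the cited work.

```python
import math
from typing import List, Optional, Sequence, Tuple

G = 9.81  # standard gravity in m/s^2


def detect_drop(
    samples: Sequence[Tuple[float, float, float]],
    rate_hz: float,
    free_fall_g: float = 0.3,  # hypothetical: below this magnitude counts as free fall
    impact_g: float = 2.5,     # hypothetical: above this magnitude counts as an impact
    max_gap_s: float = 1.0,    # hypothetical: impact must follow free fall within this time
) -> List[int]:
    """Return indices of impact samples that closely follow a free-fall phase.

    samples: accelerometer readings (ax, ay, az) in m/s^2.
    """
    max_gap = int(max_gap_s * rate_hz)
    events: List[int] = []
    last_free_fall: Optional[int] = None  # index of the most recent free-fall sample
    for i, (ax, ay, az) in enumerate(samples):
        magnitude_g = math.sqrt(ax * ax + ay * ay + az * az) / G
        if magnitude_g < free_fall_g:
            last_free_fall = i
        elif magnitude_g > impact_g and last_free_fall is not None:
            if i - last_free_fall <= max_gap:
                events.append(i)
            last_free_fall = None  # one impact closes the free-fall episode


    return events


rest = [(0.0, 0.0, G)] * 10
fall = [(0.0, 0.0, 0.5)] * 10
impact = [(0.0, 0.0, 40.0)]
print(detect_drop(rest + fall + impact + rest, rate_hz=50))  # → [20]
```

A phone lying still reads about 1 g, so it triggers neither threshold; only the free-fall-then-impact pattern is reported.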
Digital Logging of Physical Activities and Context:
Entertainment: What Type of Music Might You Like to Hear?
 Automatic selection of music based on context:
 Current activity of the user
 Audio activity in the environment
 Habits and musical taste can also be integrated
11th International ACM Conference on Computers and Accessibility (ASSETS 2009)
Interaction with Mobile User Interfaces
• Sending commands
• Turning pages
• Zooming
• Click and double click
• Calling an application or service
Motivating the Design of Very Small Mobile Devices, Headsets, Wristwatches, and Portable Music Players
MagiSign: “3D Magnetic Signatures” for User Identification/Authentication
The user creates an arbitrary personal 3D signature by moving a suitably shaped magnet in the 3D space around the device.
• Wider choice for authentication, as the signature can be drawn flexibly in the 3D space around the device
• A hard copy of a 3D magnetic signature cannot easily be produced
• Unlike regular signatures, it is not affected by the quality of paper, pen, ink, etc.
3D magnetic signature:
• A simple 3D motion
• The user's regular signature drawn in the air!
• Any other combination of even higher complexity, actively using the whole 3D space around the device
A magnet as a physical key? A magnet personalized in terms of shape and polarity can enhance the authentication process …
Can be used to access a service or data, to open entrance doors, or simply in place of a regular signature during a purchase …
Even simple gestures may be used for authentication
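A 3D magnetic signature arrives as a time series of magnetometer readings, and two attempts at the same signature rarely have the same speed. The sketch below compares traces with dynamic time warping (DTW), a standard technique for aligning sequences of different lengths; whether MagiSign itself uses DTW is not stated here, and the length normalisation and the acceptance `threshold` are assumptions.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]  # one magnetometer reading (x, y, z)


def dtw_distance(a: Sequence[Vec3], b: Sequence[Vec3]) -> float:
    """Classic O(n*m) dynamic time warping with Euclidean local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def verify(enrolled: Sequence[Sequence[Vec3]], attempt: Sequence[Vec3], threshold: float) -> bool:
    """Accept if the length-normalised DTW distance to any enrolled trace is small.

    Using DTW for MagiSign and this normalisation are assumptions of the sketch.
    """
    best = min(dtw_distance(ref, attempt) / (len(ref) + len(attempt)) for ref in enrolled)
    return best <= threshold


ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
slow = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
        (2.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
forged = [(0.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 6.0, 0.0), (0.0, 9.0, 0.0)]
print(verify([ref], slow, threshold=0.1))    # → True
print(verify([ref], forged, threshold=0.1))  # → False
```

The slower attempt traces the same path and aligns at zero cost, while the forged trace moves along a different axis. The same nearest-template comparison could in principle classify MagiWrite characters by matching a gesture against one template per character.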
MagiWrite: Write It in the Air!
Text entry based on magnetic field interaction
Character-shaped gestures are written in the space around the device
Suitable for dialling a number, entering a PIN code, selecting a text entry, etc.
Especially useful for very small mobile devices, on which it is hard to operate or design small keypads or touch screens
MagiEntertain: Using Magnetic Interaction in Mobile Entertainment Applications (Gaming and Audio Synthesis)
Conventionally, touchpads and touch screens are used for gaming
• Screen occlusion
MagiGame: Actions of a game avatar, such as shooting, jumping, and changing the aim, can be controlled with magnet gestures
No screen occlusion, natural gesture-based interaction, more actions per minute, and the possibility of multi-player gaming on a single device
Adjusting different audio and DJ effects based on the position, orientation, and movements of the magnet
Changing the sound volume and audio tracks in a portable music player
New musical instruments …; two players can play the same instrument
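Adjusting the volume by magnet position can be sketched as a mapping from field strength to a level between 0 and 1. Because the field of a small magnet weakens rapidly with distance (roughly with the cube of the distance for a dipole), moving the magnet closer produces a louder setting. The baseline subtraction, the function name `magnet_to_volume`, and the calibration limits `min_ut` and `max_ut` are hypothetical choices for illustration.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]  # magnetometer reading in microtesla (x, y, z)


def magnet_to_volume(
    reading_ut: Vec3,
    baseline_ut: Vec3,
    min_ut: float = 5.0,    # hypothetical: weaker fields map to silence
    max_ut: float = 200.0,  # hypothetical: stronger fields map to full volume
) -> float:
    """Map a magnetometer reading to a volume level in [0, 1]."""
    # Remove the ambient (e.g. Earth's) field measured without the magnet present.
    dx, dy, dz = (r - b for r, b in zip(reading_ut, baseline_ut))
    strength = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Linear map between the calibration limits, clamped to [0, 1].
    level = (strength - min_ut) / (max_ut - min_ut)
    return min(1.0, max(0.0, level))


baseline = (20.0, 0.0, -40.0)
print(magnet_to_volume((122.5, 0.0, -40.0), baseline))  # → 0.5
```

A linear map over field strength is the simplest choice; because strength falls off steeply with distance, a logarithmic map might feel more even to the user.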
Literature
Basics:
Automatic Speech Recognition:
• Laurence Rabiner and Biing-Hwang Juang: "Fundamentals of Speech Recognition" (Prentice Hall, 1993)
• Ernst Günter Schukat-Talamazzini: "Automatische Spracherkennung - Grundlagen, statistische Modelle und effiziente Algorithmen" (Vieweg, 1995)
• Bernd Pompino-Marschall: "Einführung in die Phonetik" (de Gruyter, 1995)
• Andreas Wendemuth: "Grundlagen der stochastischen Sprachverarbeitung" (Oldenbourg, 2004)
• Richard O. Duda, Peter E. Hart, David G. Stork: "Pattern Classification" (Wiley, 2000)
• Tanja Schultz and Katrin Kirchhoff: "Multilingual Speech Processing" (Academic Press, 2006)
• Fred Jelinek: "Statistical Methods for Speech Recognition" (MIT Press, 1997)
• Keinosuke Fukunaga: "Introduction to Statistical Pattern Recognition" (Academic Press, 1990)
• Thomas H. Cormen et al.: "Introduction to Algorithms" (MIT Press, 1990)
Exam?
Webpage
• Detailed information about our projects can be found at
http://www.deutsche-telekomlaboratories.de/~ketabdar.hamed/
• Updated information, slides, etc. will soon be available at:
http://www.deutsche-telekomlaboratories.de/~ketabdar.hamed/teachingsection/index.htm