Computer-supported Interaction

Hamed Ketabdar
Shiva Sundaram

• Technologies which support interaction between human, machine and environment
• Capturing, processing, and retrieving multimedia

Schedule: Thursday 14-16h, FR 0512C, starting 04.11.2010

Hamed Ketabdar: PhD in Electrical Engineering from the Swiss Federal Institute of Technology in Lausanne (EPFL)
Hamed.Ketabdar@telekom.de
Shiva Sundaram: PhD in Electrical Engineering from the University of Southern California (USC)
Shiva.Sundaram@telekom.de

Outline
• Multi-modal Interfaces
  – Input methods: keyboard, pen, voice, gesture, tactile (touch) …
  – Output modalities: audio, video, tactile
  – Fusion of modalities
• Speech Processing
  – Speech recognition: statistical methods, acoustic modelling, and decoding
  – Metadata extraction (age, gender, language, emotion)
  – Audio-visual speech recognition
  – Multi-lingual speech recognition
• Information Retrieval
  – Representation
  – Clustering/segmentation/classification
  – Integration with other processes
• System and Architecture
  – Natural Language Processing
• Translation

Practical sessions
Possibility for small class projects: quickly develop multi-modal interfaces based on our context-aware SDK for iPhone …

User Activity and Context Detection with Mobile Phones
Detect whether you are walking, sitting, in a meeting, at a concert or party, or in an emergency situation …
• Mobile phones are equipped with a microphone and tilt sensors
• Audio context is detected using the microphone output
• A physical activity signature is captured using the tilt sensor output
• Tilt and audio information are combined to detect context and/or user activity
• Time, duration and other prior knowledge can also be integrated

Applications:
Smart mobile phones
• Control ring and other functionalities according to context
Surveillance and organization (employees, elderly, children)
• Information about user activity can be used for better organization of employees and
taking care of the elderly and children
Smart home environment

General Purpose Audio Switches
A switch which can be triggered by speech commands or non-speech events
• The commands or speech events can be learned automatically
• The switch can be easily reconfigured for a new command or application
Involves:
• Automatic language/event acquisition
• Robustness to different sources of variability
Applications: smart environments, security and surveillance
Dreams:
• You buy it as you would buy a normal mechanical switch in a store
• It can be installed anywhere, in the same way as a normal switch

Call Classification
Anger, Gender, Age, Language, …
Hierarchical design and discriminative training:
• Discriminative representation of emotional states
• Efficient fusion of different acoustic features with higher-level information (e.g. duration, message content)
• Efficient feature selection mechanism, less computational load for feature extraction
[Figure: block diagram with the labels "Pitch, Intensity", "Discriminative transformation", "Combination", "Textual data, duration, …" and "Call classification"]

Digital Logging of Physical Activities and Context
Enhancing Emergency and Security/Privacy Functionalities in Mobile Phones
• Unexpected physical events experienced by a mobile phone can be signs of critical security or emergency scenarios:
  – The phone is at risk of being lost or stolen: confidential information on the phone can be exposed
  – The phone user is experiencing an accident
MobileHCI 2009, Ubicomp 2009

Digital Logging of Physical Activities and Context: Entertainment: What Type of Music You May Like to Hear?
Automatic selection of music based on context:
• Actual activity of the user
• Audio activity in the environment
• Habits and music taste can also be integrated

11th International ACM Conference on Computers and Accessibility (ASSETS 2009)

Interaction with Mobile User Interface
• Sending commands
• Turning pages
• Zooming
• Click and double click
• Calling an application or service
Motivating Design of Very Small Mobile Devices, Headsets, Wrist Watches, and Portable Music Players

MagiSign: "3D Magnetic Signatures" for User Identification/Authentication
The user creates his own arbitrary 3D signature using a properly shaped magnet in the 3D space around the device.
• Wider choice for authentication, as the signature can be flexibly drawn in the 3D space around the device.
• No hard copy of a 3D magnetic signature can easily be generated.
• Unlike regular signatures, it is not affected by the quality of paper, pen, ink, etc.

3D Magnetic Signature:
• A simple 3D motion
• The user's regular signature drawn in the air!
• Any other combination of even higher complexity, actively using all the 3D space around the device.

A magnet as a physical key?
A magnet personalized in terms of shape and polarity can enhance the authentication process …
Can be used for accessing a service or data, for entrance doors, or simply instead of a regular signature during a purchase …
Even simple gestures may be used for authentication

MagiWrite: Write It in the Air!
Text entry based on magnetic field interaction
• Character-shaped gestures are written in the space around the device
• Suitable for dialling a number, entering a PIN code, selecting a text entry, etc.
• Especially useful for very small mobile devices, on which it is hard to design or operate small keypads or touch screens

MagiEntertain: Using Magnetic Interaction in Mobile Entertainment Applications (Gaming and Audio Synthesis)
Conventionally, touch pads and touch screens are used for gaming
• Screen occlusion
MagiGame: actions of a game avatar such as shooting, jumping, and changing the aim can be controlled
• No screen occlusion, natural gesture-based interaction, more actions per minute, possibility of multi-player gaming on one device
Audio synthesis:
• Adjusting different audio and DJ effects based on position, orientation, and movements of the magnet
• Changing sound volume and audio tracks in a portable music player
• New musical instruments …, two players can play on the same instrument

Literature
Automatic Speech Recognition:
• Lawrence Rabiner and Biing-Hwang Juang: "Fundamentals of Speech Recognition" (Prentice Hall, 1993)
• Ernst Günter Schukat-Talamazzini: "Automatische Spracherkennung - Grundlagen, statistische Modelle und effiziente Algorithmen" (Vieweg, 1995)
• Bernd Pompino-Marschall: "Einführung in die Phonetik" (de Gruyter, 1995)
• Andreas Wendemuth: "Grundlagen der stochastischen Sprachverarbeitung" (Oldenbourg, 2004)
• Tanja Schultz and Katrin Kirchhoff: "Multilingual Speech Processing" (Academic Press, 2006)
• Fred Jelinek: "Statistical Methods for Speech Recognition" (MIT, 1997)
Basics:
• Richard O. Duda, Peter E. Hart, David G. Stork: "Pattern Classification" (Wiley, 2000)
• Keinosuke Fukunaga: "Introduction to Statistical Pattern Recognition" (Academic Press, 1990)
• Thomas H. Cormen: "Introduction to Algorithms" (MIT, 1990)

Exam?

Webpage
• Detailed information about our projects can be found at http://www.deutsche-telekomlaboratories.de/~ketabdar.hamed/
• All updated information, slides, etc. will soon be available at: http://www.deutsche-telekomlaboratories.de/~ketabdar.hamed/teachingsection/index.htm