Didier Perroud, Raynald Seydoux, Frédéric Barras

Outline
◦ Abstract
◦ Objectives
◦ Modalities
◦ Project modalities
◦ CASE/CARE
◦ Implementation
◦ VICI, iPhone, voice recognition, network
◦ Demonstration
◦ Conclusion

Abstract
◦ Coordination between two persons to move a ball through a labyrinth
◦ Rotation possible on the x and y axes
◦ Gates can be opened with vocal and gestural commands

Objectives
Coordinate the following technologies:
◦ Augmented reality with tags
◦ Gesture detection (with iPhone accelerometers)
◦ Voice recognition (words)
◦ Collaborative environments
◦ Physics engine

Modalities
Inputs
◦ Hand rotation on the x and y axes (one axis per player): direct manipulation of the labyrinth board
◦ Hand pumping to open the gates
◦ Voice recognition (words) to select the gate to open and to start the game
Outputs
◦ Image on the beamer
◦ iPhone vibrations

CASE/CARE
CASE
◦ Semantic level of abstraction
CARE
◦ Gesture orientation: assignment
◦ Gesture pumping / voice selection: complementarity to open a gate (see the fusion sketch after the slides)
◦ Voice commands: assignment
◦ Decision-level fusion
◦ Fission: image, vibration

VICI
Blocks
◦ Webcam, tag detection
◦ OpenGL, physics engine
◦ Multimodality management
◦ Augmented reality application
Messages from the gateway / messages to the gateway
◦ State machine
◦ Event based

Network (see the message sketch after the slides)
◦ Voice events
◦ Gesture events (orientation X and Y, shake)
◦ Vibration events

iPhone
◦ Handle the UIAccelerometer interface
◦ Generate a motion event when shaking (see the accelerometer sketch after the slides)
Messages to the gateway
◦ Orientations (X or Y)
◦ Shake
Messages from the gateway
◦ Vibrate

Voice recognition
Windows Speech API SDK features:
◦ API definition files
◦ Runtime component
◦ Control Panel applet
◦ Text-to-speech engines in multiple languages
◦ Speech recognition engines in multiple languages
◦ Redistributable components
◦ Sample application code
◦ Sample engines
◦ Documentation

Our system
◦ A speech recognition engine
◦ A grammar:

<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en-EN" version="1.0">
  <rule id="Labyrinth" scope="public">
    <one-of>
      <item>New game</item>
      <item>Pause</item>
      <item>Exit</item>
      <item>Open gate one</item>
      <item>Open gate two</item>
      <item>Close gate one</item>
      <item>Close gate two</item>
    </one-of>
  </rule>
</grammar>

◦ Recognition comparison before training / after training

Demonstration
◦ Live
◦ Videos

Conclusion
◦ Problems with the physics engine
◦ Coordinating user moves with physics moves
◦ Voice recognition OK
◦ High-level programming
◦ Heterogeneity was not a problem
◦ Functional prototype

Thank you
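
Fusion sketch. The CASE/CARE slide describes decision-level fusion in which a gesture pump and a spoken gate selection are complementary: both are needed to open a gate. The sketch below is only an illustration of that idea; the event names, the two-second window, and the GateFusion type are assumptions and not part of the project code.

import Foundation

// Illustrative event types; the project's real gateway messages may differ.
enum InputEvent {
    case voiceGateSelection(gate: Int)   // e.g. "Open gate one"
    case gesturePump                     // hand pumping detected on the iPhone
}

/// Minimal decision-level fusion: a pump gesture and a spoken gate selection
/// arriving within a short time window are combined into one "open gate" command.
final class GateFusion {
    private let window: TimeInterval = 2.0          // assumed fusion window
    private var pendingGate: (gate: Int, at: Date)?
    private var pendingPump: Date?

    var onOpenGate: ((Int) -> Void)?

    func handle(_ event: InputEvent) {
        let now = Date()
        switch event {
        case .voiceGateSelection(let gate):
            pendingGate = (gate, now)
        case .gesturePump:
            pendingPump = now
        }
        // Complementarity: both modalities are required to trigger the command.
        if let gate = pendingGate, let pump = pendingPump,
           abs(gate.at.timeIntervalSince(pump)) <= window {
            onOpenGate?(gate.gate)
            pendingGate = nil
            pendingPump = nil
        }
    }
}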
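
Message sketch. The VICI and network slides list the events exchanged with the gateway (voice, gesture orientation X/Y, shake, vibrate) but do not describe the wire format. The encoding below is therefore purely hypothetical, shown only to make the event list concrete.

import Foundation

// Hypothetical wire format for the gateway messages named in the slides
// (orientation X/Y and shake towards the gateway, vibrate back to the phone).
enum GatewayMessage: Codable {
    case orientation(axis: String, value: Double)   // "x" or "y"
    case shake
    case vibrate
}

// Example: encode an orientation update before sending it over the network.
let message = GatewayMessage.orientation(axis: "x", value: 0.42)
let payload = try! JSONEncoder().encode(message)
print(String(data: payload, encoding: .utf8) ?? "")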
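
Accelerometer sketch. The iPhone slide mentions handling the UIAccelerometer interface and generating a motion event when shaking. UIAccelerometer has since been deprecated, so the sketch below uses CoreMotion instead; the threshold, update rate, and callback names are assumptions, not the project's actual implementation.

import CoreMotion

/// Illustrative shake and tilt detector in the spirit of the slides.
final class ShakeDetector {
    private let motion = CMMotionManager()
    private let threshold = 2.0               // assumed g-force threshold for a "shake"
    var onShake: (() -> Void)?
    var onOrientation: ((Double, Double) -> Void)?   // raw x/y tilt; the game uses one axis per player

    func start() {
        guard motion.isAccelerometerAvailable else { return }
        motion.accelerometerUpdateInterval = 1.0 / 30.0
        motion.startAccelerometerUpdates(to: .main) { [weak self] data, _ in
            guard let self = self, let a = data?.acceleration else { return }
            // Forward the tilt so the gateway can rotate the labyrinth board.
            self.onOrientation?(a.x, a.y)
            // Crude shake test: total acceleration well above 1 g.
            let magnitude = (a.x * a.x + a.y * a.y + a.z * a.z).squareRoot()
            if magnitude > self.threshold {
                self.onShake?()
            }
        }
    }

    func stop() {
        motion.stopAccelerometerUpdates()
    }
}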