MPI FOR BIOLOGICAL CYBERNETICS

EU-funded "Cognitive Vision" projects at the Max Planck Institute for Biological Cybernetics, Bülthoff group (AG Bülthoff)

Christian Wallraven, Heinrich H. Bülthoff
Martin Breidt, Douglas W. Cunningham, Cristobal Curio, Arnulf B.A. Graf, Markus Graf, Adrian Schwaninger

DAGM, August 30th

CogVis (Cognitive Vision)
WP1 (Recognition & Categorisation)
WP3 (Learning & Adaptation)

http://cogvis.nada.kth.se

• Computational Vision and Active Perception, Stockholm – Robotics
• Computer Science Department, Hamburg, Cognitive Systems Laboratory – Spatio-temporal reasoning
• MPI for Biological Cybernetics, Tübingen – Human psychophysics
• School of Computing, Leeds University – Spatio-temporal learning and reasoning
• DIST, Genova – Robotics
• ETH Zurich, Dept of Computer Science – Computer Vision
• University of Ljubljana, Computer and Information Science – Computer Vision

The objective of this project is to provide the methods and techniques that enable the construction of vision systems that can perform task-oriented categorization and recognition of objects and events in the context of an embodied agent.

Workpackage 1: Recognition and Categorisation

Objectives
• Based on cognitive research on how humans recognise and categorise objects and scenes, we will build a computational system that is capable of recognising and categorising objects and events in a natural environment (such as a living-room).
Description of work:
• Building a database of 3D objects and elementary gestures
• Cognitive basis for recognition and categorisation
• Dynamic multi-cue recognition
• Recognition of spatio-temporal structures and relations

Deliverables:
• DR.1.1 A database of solid 3D objects and gestures
• DR.1.2 Psychophysical results from experiments on recognition & categorisation
• DR.1.3 A recognition algorithm exploiting temporal continuity
• DR.1.4 A basis set of primitives and qualitative low-level structural relations suitable for object recognition
• DR.1.5 A computational recognition system grounded in cognitive research
• DR.1.6 Algorithms for robust subspace recognition
• DR.1.7 Algorithm for categorisation using subspace approach

Highlights of WP1
• MPIK: Cognitive Basis of Recognition & Categorization
• CSL, ETH, KTH, MPIK, UOL: Computer Vision systems
• ETH, KTH, MPIK, UOL: Modeling of cognitive studies

Cognitive basis of recognition & categorization

Psychophysical experiments
• How are visual categories formed?
• What are the representations used by humans for recognition and categorization?
• How are categorization and recognition connected?
• What are the temporal aspects of recognition and categorization?
• Is there a top-down influence of scene context on categorization?
Cognitive basis for recognition and categorization

Computer Vision

Structured object representations
• For categorization using local features
• For recognition with spatio-temporal information
Multi-cue recognition on a robot
Subspace learning

Computer vision for modeling psychophysics
• CogVis Morphed Objects Database
• Psychophysical experiments:
  – Picture-word matching experiment → reaction times
  – Typicality task → typicality ratings
• Computer vision experiment:
  – Subspace-based categorisation
  – Typicality ratings as temporal weights → uncertainty of categorisation, reconstruction errors

Categorisation experiment - results
[Figure: five panels, each plotted against morph transformation (1%, 25%, 50%, 75%, 100%). Psychophysical experiment: typicality rating (TR) and reaction time (RT). Computer vision experiment: weights, uncertainty, and reconstruction errors.]

Workpackage 3: Learning and Adaptation

Objectives:
• How is knowledge about objects and events acquired and maintained? A computational system able to acquire and maintain representations useful for recognition and categorisation as well as control of attention will be developed.
Description of work:
• Learning perception-action maps
• Learning event regularities
• Learning of efficient methods for categorisation of natural objects
• Statistical modelling of objects and events

Deliverables:
• DR.3.1 Set-up for experimenting with action learning
• DR.3.2 Initial implementation of sensorimotor representation for learning and shift of attention learning
• DR.3.3 Quantitative analysis of the tradeoff between precision and number of classifiers for two different tasks
• DR.3.4 Software package for learning and applying models of interactive behaviour
• DR.3.5 A system capable of robustly categorising the objects from the database in a real-world environment
• DR.3.6 Framework for the integration of statistical and logic-based models of objects and events
• DR.3.7 Algorithms for robust learning of subspace representations
• DR.3.8 Framework for robust continuous learning

Highlights I

A system that learns simple games by observation
• Uses vision components to identify simple visual events (laying down a card)
• Using a reasoning engine (Progol), tries to find a rule-based representation explaining the observed state space

Modeling categorization in humans
• Combination of machine learning and psychophysics
• Which classifier explains human behaviour best?
• Support Vector Machines seem to be the best candidate

Man or Woman?
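The question "Which classifier explains human behaviour best?" can be illustrated with a minimal sketch. It assumes that "explaining human behaviour" is scored as the fraction of stimuli on which a classifier gives the same man/woman label as human observers. That measure, the classifier names, and all labels below are hypothetical and are not taken from the project.

```python
from typing import Dict, List


def agreement(model_labels: List[str], human_labels: List[str]) -> float:
    """Fraction of stimuli on which model and human give the same label."""
    if len(model_labels) != len(human_labels) or not human_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)


def best_matching_classifier(predictions: Dict[str, List[str]],
                             human_labels: List[str]) -> str:
    """Name of the classifier whose labels agree most often with the human labels."""
    return max(predictions, key=lambda name: agreement(predictions[name], human_labels))


# Hypothetical responses of human observers to six face stimuli ("f" or "m").
# These are the humans' answers, not the ground truth: the aim is to match
# human behaviour, including human errors.
human = ["f", "m", "m", "f", "m", "f"]

# Hypothetical outputs of two candidate classifiers on the same stimuli.
predictions = {
    "classifier_a": ["f", "m", "f", "f", "m", "m"],
    "classifier_b": ["f", "m", "m", "f", "m", "m"],
}

scores = {name: agreement(labels, human) for name, labels in predictions.items()}
best = best_matching_classifier(predictions, human)
print(scores, best)
```

Label agreement is only one possible measure. A comparison could equally relate a classifier's confidence to human reaction times or error rates.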
Highlights II

Multi-modal object representations
• Access to a robotic setup with arm and cameras allows us to explore questions of multiple modalities
• Idea: store a matrix of transitions between all possible views, indexed by changes in the proprioceptive state
• Exhaustive action/perception map (predicting views given an action, and vice versa)

IST project COMIC
Conversational multi-modal interaction with computers

http://www.hcrc.ed.ac.uk/comic/

• Max Planck Institute for Psycholinguistics, Nijmegen – Fundamental Cognitive Research
• Max Planck Institute for Biological Cybernetics, Tübingen – Fundamental Cognitive Research
• University of Nijmegen – ASR and AGR
• University of Sheffield – Dialogue and Action
• University of Edinburgh – Fission and Output
• DFKI, Saarbrücken – Fusion and System Integration
• ViSoft – Graphical part of Demonstrator

Multimodal interaction will only be accepted by non-expert users if fundamental cognitive interaction capabilities of human beings are properly taken into account.

Vision and approach of COMIC

Obtain fundamental knowledge on multimodal interaction
• Use of speech, pen, and facial expressions
Develop new approaches for component technologies that are guided by human factor experiments
Obtain hands-on experience by building an integrated multimodal demonstrator for bathroom design that combines new approaches for:
• Automatic speech recognition
• Automatic pen gesture recognition
• Dialogue and Action management
• Output generation combining text and speech and facial expression
• System integration
• Cognitive knowledge

Fundamental Research on Facial Expressions

Faces do a lot in a conversation
• Lip motion for speaking
• Emotional expression (pleasure, surprise, fear)
• Dialog flow (back-channeling: confusion, comprehension, agreement)
• Co-expression (emphasis and word/topic stress)

We aim to broaden the capabilities of avatars, allowing for more
sophisticated self-expression and more subtle dialog control. To this end, we use psychophysical knowledge and procedures as a basis for synthesizing human conversational expressions.

Real, manipulated and virtual expressions

Real expressions:
• We recorded a variety of conversational expressions from several individuals.
• Psychophysical experiments on identification and believability
Manipulated expressions:
• Using computer vision techniques, we manipulated these expressions to freeze selected parts of the face.
• Psychophysical experiment on the relative importance of each of these parts for recognition.
Virtual expressions:
• We designed and constructed a conversational avatar, capable of producing realistic-looking facial expressions
• Suitable for human-computer interaction
• Perfect tool for fully controllable cognitive research on perception of facial expressions

The four faces of thought

The conversational avatar

IST project JAST
Joint-Action Science & Technology

• Nijmegen Institute for Cognition and Information – Human behaviour
• F.C. Donders Centre for Cognitive Neuroimaging – Imaging of human behaviour
• MPI for Psycholinguistics – Human dialogue behaviour
• MPI for Biological Cybernetics – Human behaviour
• Dept. of Computer Science, TU München – Robotics
• Institute of Communication and Computer Systems – Modeling
• University of Edinburgh, Human Communication Research Centre – Modeling
• Dept. of Industrial Electronics, Universidade do Minho – Robotics
• Dept.
of Mathematics for Science and Technology, Universidade do Minho – Modeling and Robotics

Objectives
• Build jointly-acting autonomous systems that communicate and work intelligently on mutual tasks
• Ensure that the functionality of future technologies includes inherent concepts of cooperative behaviour

Milestones
• The construction of two fully functional autonomous agents that in cooperative configurations of two, three or more will allow, in principle, the completion of complex real-world assembly and construction tasks.
• The development of perceptual modules for object recognition and recognition of gestures and actions of the partner (human or robot) and the implementation of biologically inspired sensory-motor control schemes for the co-ordinated action of multiple cognitive systems.
• The development of cognitive control architectures for artificial agents based on neurocognitive experimental findings and the implementation of verbal and non-verbal communication structures on the basis of findings from psycholinguistic studies focusing on the role of dialogue in joint action.
• The implementation of goal-directed learning processes and sophisticated error monitoring, recognition, and repair strategies to produce a real-world assembly robot scenario that will be capable of partially self-organizing towards stable solutions, taking into account not only its own behaviour (e.g., self-generated errors) but also the behaviour of others (e.g., errors generated by a human or robot partner).

Other EU-funded projects at AG Bülthoff
• Touch HapSys – haptic systems: next generation haptic interfaces, visuo-haptic integration
• POEMS – perceptually-oriented ego-motion simulation: how to use audio and visual cues to generate ego-motion perception (VR)
• PRA (Network) – Perception for Recognition and Action
• ECVision (Network) – European Computer Vision network
• Enactive (Network) – multi-modal HCI interfaces