Cognitive Vision

MPI FOR BIOLOGICAL CYBERNETICS
EU-funded
"Cognitive Vision" projects at the
Max Planck Institute for Biological
Cybernetics, AG Bülthoff
Christian Wallraven
Heinrich H. Bülthoff
Martin Breidt, Douglas W. Cunningham, Cristobal Curio,
Arnulf B.A. Graf, Markus Graf, Adrian Schwaninger
DAGM, August 30th
CogVis (Cognitive Vision)
WP1 (Recognition & Categorisation)
WP3 (Learning & Adaptation)
http://cogvis.nada.kth.se
Computational Vision and Active Perception, Stockholm – Robotics
Computer Science Department, Hamburg, Cognitive Systems Laboratory – Spatio-temporal reasoning
MPI for Biological Cybernetics, Tübingen – Human psychophysics
School of Computing, Leeds University – Spatio-temporal learning and reasoning
DIST, Genova – Robotics
ETH Zurich, Dept of Computer Science – Computer Vision
University of Ljubljana, Computer and Information Science – Computer Vision
The objective of this project is to provide the methods and techniques
that enable the construction of vision systems that can perform task-oriented
categorization and recognition of objects and events in the context of an
embodied agent.
Workpackage 1: Recognition and Categorisation
 Objectives
• Based on cognitive research on how humans recognise and
categorise objects and scenes, we will build a computational
system that is capable of recognising and categorising objects
and events in a natural environment (such as a living-room).
 Description of work:
• Building a database of 3D objects and elementary gestures
• Cognitive basis for recognition and categorisation
• Dynamic multi-cue recognition
• Recognition of spatio-temporal structures and relations
 Deliverables:
• DR.1.1 A database of solid 3D objects and gestures
• DR.1.2 Psychophysical results from experiments on recognition & categorisation
• DR.1.3 A recognition algorithm exploiting temporal continuity
• DR.1.4 A basis set of primitives and qualitative low-level structural relations suitable for object recognition
• DR.1.5 A computational recognition system grounded in cognitive research
• DR.1.6 Algorithms for robust subspace recognition
• DR.1.7 Algorithm for categorisation using subspace approach
Highlights of WP1
• Cognitive Basis of Recognition & Categorization: MPIK
• Computer Vision systems: CSL, ETH, KTH, MPIK, UOL
• Modeling of cognitive studies: ETH, KTH, MPIK, UOL
Cognitive basis of recognition & categorization
 Psychophysical experiments
• How are visual categories formed?
• What are the representations used by humans
for recognition and categorization?
• How are categorization and recognition
connected?
• What are the temporal aspects of recognition
and categorization?
• Is there a top-down influence of scene context
on categorization?
Cognitive basis for recognition and
categorization
Computer Vision
Structured object representations
• For categorization using local features
• For recognition with spatio-temporal
information
Multi-cue recognition on a robot
Subspace learning
Computer vision for modeling psychophysics
• CogVis Morphed Objects Database
• Psychophysical experiments:
– Picture-word matching experiment → reaction times
– Typicality task → typicality ratings
• Computer vision experiment:
– Subspace-based categorisation
– Typicality ratings as temporal weights
→ uncertainty of categorisation, reconstruction errors
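The computer vision experiment above can be illustrated with a minimal sketch. It assumes that "temporal weights" means one weight per training sample, that each category is modelled by a one-dimensional weighted principal subspace, and that a query is assigned to the category whose subspace reconstructs it with the smallest error. The function names, the toy 2D data and the weight values are all hypothetical; the slides do not specify the actual algorithm, and uncertainty of categorisation is not modelled here.

```python
import math

def weighted_mean(samples, weights):
    """Mean of the samples, each scaled by its (typicality) weight."""
    total = sum(weights)
    dim = len(samples[0])
    return [sum(w * x[d] for x, w in zip(samples, weights)) / total
            for d in range(dim)]

def weighted_covariance(samples, weights, mean):
    """Covariance matrix in which sample i contributes in proportion to weights[i]."""
    total = sum(weights)
    dim = len(mean)
    cov = [[0.0] * dim for _ in range(dim)]
    for x, w in zip(samples, weights):
        centred = [xi - mi for xi, mi in zip(x, mean)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += w * centred[i] * centred[j] / total
    return cov

def leading_direction(cov, iterations=200):
    """Power iteration for the dominant eigenvector of cov.

    Assumes the start vector is not orthogonal to that eigenvector.
    """
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(a * a for a in nxt))
        if norm == 0.0:
            break
        v = [a / norm for a in nxt]
    return v

def fit_subspace(samples, weights):
    """Return (mean, direction) of a 1D weighted principal subspace."""
    mean = weighted_mean(samples, weights)
    return mean, leading_direction(weighted_covariance(samples, weights, mean))

def reconstruction_error(x, subspace):
    """Squared distance between x and its projection onto the subspace."""
    mean, direction = subspace
    centred = [xi - mi for xi, mi in zip(x, mean)]
    coeff = sum(c * d for c, d in zip(centred, direction))
    residual = [c - coeff * d for c, d in zip(centred, direction)]
    return sum(r * r for r in residual)

def categorise(x, subspaces):
    """Pick the category whose subspace gives the smallest reconstruction error."""
    errors = {name: reconstruction_error(x, s) for name, s in subspaces.items()}
    return min(errors, key=errors.get), errors

# Toy data (hypothetical): two categories, four samples each.
horizontal = [[0.0, 0.0], [1.0, 0.1], [2.0, -0.1], [3.0, 0.0]]
vertical = [[0.0, 0.0], [0.1, 1.0], [-0.1, 2.0], [0.0, 3.0]]
typicality = [7.5, 6.0, 5.0, 4.5]  # made-up ratings used as sample weights

subspaces = {
    "horizontal": fit_subspace(horizontal, typicality),
    "vertical": fit_subspace(vertical, typicality),
}
label, errors = categorise([2.5, 0.2], subspaces)
print(label, errors)
```

In this sketch a sample with a higher typicality rating pulls the category mean and principal direction more strongly towards itself, so atypical (heavily morphed) exemplars shape the subspace less.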
Categorisation experiment - results
[Figure: plots against morph transformation (1%, 25%, 50%, 75%, 100%).
Psychophysical experiment: typicality rate (TR); reaction time (RT).
Computer vision experiment: weights; uncertainty; reconstruction errors.]
Workpackage 3: Learning and Adaptation
 Objectives:
• How is knowledge about objects and events acquired and
maintained? A computational system able to acquire and
maintain representations useful for recognition and categorisation
as well as control of attention will be developed.
 Description of work:
• Learning perception-action maps
• Learning event regularities
• Learning of efficient methods for categorisation of natural objects
• Statistical modelling of objects and events
 Deliverables:
• DR.3.1 Set-up for experiments on action learning
• DR.3.2 Initial implementation of sensorimotor representation for learning and shift of attention learning
• DR.3.3 Quantitative analysis of the tradeoff between precision and number of classifiers for two different tasks
• DR.3.4 Software package for learning and applying models of interactive behaviour
• DR.3.5 A system capable of robustly categorising the objects from the database in a real-world environment
• DR.3.6 Framework for the integration of statistical and logic-based models of objects and events
• DR.3.7 Algorithms for robust learning of subspace representations
• DR.3.8 Framework for robust continuous learning
Highlights I
 A system that learns simple
games by observation
• Uses vision components to
identify simple visual events
(laying down a card)
• Uses a reasoning engine
(Progol) to find a rule-based representation
that explains the observed state space
 Modeling categorization in
humans
• Combination of machine learning and psychophysics
• Which classifier explains
human behaviour best?
• Support Vector Machines seem
to be the best candidate
Man or
Woman?
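One simple way to ask which classifier explains human behaviour best is to compare each classifier's labels with the labels given by human observers on the same stimuli. The sketch below ranks candidate classifiers by their agreement rate with the human responses. It is an illustration only: the slides do not state the measure used in the project, and the function names, classifier names and response lists are hypothetical placeholders, not experimental data.

```python
def agreement(human, model):
    """Fraction of stimuli on which the model's label equals the human label."""
    if len(human) != len(model):
        raise ValueError("response lists must have the same length")
    matches = sum(h == m for h, m in zip(human, model))
    return matches / len(human)

def rank_classifiers(human, predictions):
    """Sort classifiers from highest to lowest agreement with human responses."""
    scores = {name: agreement(human, labels)
              for name, labels in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Placeholder responses for six face stimuli ("m" = man, "f" = woman).
human_responses = ["m", "f", "f", "m", "f", "m"]
predictions = {
    "classifier_a": ["m", "f", "f", "m", "m", "m"],
    "classifier_b": ["m", "m", "f", "f", "m", "m"],
}

ranking = rank_classifiers(human_responses, predictions)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Agreement is computed against what humans answered rather than against the true label, so a classifier scores well here when it makes the same mistakes as the observers.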
Highlights II
 Multi-modal object
representations
• Access to a robotic setup with
arm and cameras allows us to
explore questions of multiple
modalities
• Idea: store matrix of
transitions between all
possible views, indexed by
changes in the
proprioceptive state
• Exhaustive action/perception
map (predicting views given
an action, and vice versa)
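The transition-matrix idea above can be sketched as a lookup table. The sketch assumes that views are discrete labels and that a change in proprioceptive state can be summarised as a single number (here a joint rotation in degrees). The class name, method names and example values are hypothetical and not taken from the project.

```python
from collections import defaultdict

class ActionPerceptionMap:
    """Stores observed (view, action, next view) transitions in both directions."""

    def __init__(self):
        self._forward = {}                  # (view, action) -> next view
        self._inverse = defaultdict(list)   # (view, next view) -> actions

    def observe(self, view, action, next_view):
        """Record that performing `action` while seeing `view` led to `next_view`."""
        self._forward[(view, action)] = next_view
        if action not in self._inverse[(view, next_view)]:
            self._inverse[(view, next_view)].append(action)

    def predict_view(self, view, action):
        """Predict the view after `action`, or None if never observed."""
        return self._forward.get((view, action))

    def actions_between(self, view, next_view):
        """Return the known actions that lead from `view` to `next_view`."""
        return list(self._inverse.get((view, next_view), []))

# Hypothetical exploration: rotating an object held by the arm in 90-degree steps.
apm = ActionPerceptionMap()
apm.observe("front", +90, "side")
apm.observe("side", +90, "back")
apm.observe("side", -90, "front")
apm.observe("back", -90, "side")

print(apm.predict_view("front", +90))      # view expected after the action
print(apm.actions_between("side", "front"))  # actions that produce the view change
```

An exhaustive map of this kind grows with the number of views times the number of actions, which is why the slide describes it as a matrix over all possible views.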
IST project COMIC
Conversational multi-modal
interaction with computers
http://www.hcrc.ed.ac.uk/comic/
Max Planck Institute for Psycholinguistics, Nijmegen – Fundamental Cognitive Research
Max Planck Institute for Biological Cybernetics, Tübingen – Fundamental Cognitive Research
University of Nijmegen – ASR and AGR
University of Sheffield – Dialogue and Action
University of Edinburgh – Fission and Output
DFKI, Saarbrücken – Fusion and System Integration
ViSoft – Graphical part of Demonstrator
Multimodal interaction will only be accepted by non-expert users if
fundamental cognitive interaction capabilities of human beings are properly
taken into account
Vision and approach of COMIC
 Obtain fundamental knowledge on multimodal interaction
• use of speech, pen, and facial expressions
 Develop new approaches for component technologies that are
guided by human factor experiments
 Obtain hands-on experience by building an integrated
multimodal demonstrator for bathroom design that combines
new approaches for:
• Automatic speech recognition
• Automatic pen gesture recognition
• Dialogue and Action management
• Output generation combining text and speech and facial
expression
• System integration
• Cognitive knowledge
Fundamental Research on Facial Expressions
 Faces do a lot in a conversation
• Lip motion for speaking
• Emotional Expression (pleasure, surprise, fear)
• Dialog flow (back-channeling: confusion, comprehension,
agreement)
• Co-expression (emphasis and word/topic stress)
 We aim to broaden the capabilities of Avatars,
allowing for more sophisticated self expression and
more subtle dialog control.
 To this end, we use psychophysical knowledge and
procedures as a basis for synthesizing human
conversational expressions.
Real, manipulated and virtual expressions
 Real expressions:
• We recorded a variety of conversational expressions from several
individuals.
• Psychophysical experiments on identification and believability
 Manipulated expressions:
• Using computer vision techniques, we manipulated these
expressions to freeze selected parts of the face.
• Psychophysical experiment on relative importance of each of
these parts for recognition.
 Virtual expressions:
• We designed and constructed a conversational avatar, capable of
producing realistic-looking facial expressions
• Suitable for human-computer interaction
• Perfect tool for fully-controllable cognitive research on perception
of facial expressions
The four faces of thought
The conversational avatar
IST project JAST
Joint-Action Science & Technology
Nijmegen Institute for Cognition and Information – Human behaviour
F.C. Donders Centre for Cognitive Neuroimaging – Imaging of human behaviour
MPI for Psycholinguistics – Human dialogue behaviour
MPI for Biological Cybernetics – Human behaviour
Dept. of Computer Science, TU München – Robotics
Institute of Communication and Computer Systems – Modeling
University of Edinburgh, Human Communication Research Centre – Modeling
Dept. of Industrial Electronics, Universidade do Minho – Robotics
Dept. of Mathematics for Science and Technology, Universidade do Minho – Modeling and Robotics
Objectives
 build jointly-acting autonomous systems that
communicate and work intelligently on
mutual tasks
 ensure that the functionality of future
technologies includes inherent concepts of
cooperative behaviour
Milestones
 The construction of two fully functional autonomous agents that in
cooperative configurations of two, three or more will allow, in
principle, the completion of complex real-world assembly and
construction tasks.
 The development of perceptual modules for object recognition and
recognition of gestures and actions of the partner (human or robot)
and the implementation of biologically inspired sensory-motor
control schemes for the co-ordinated action of multiple cognitive
systems.
 The development of cognitive control architectures for artificial
agents based on neurocognitive experimental findings and the
implementation of verbal and non-verbal communication structures
on the basis of findings from psycholinguistic studies focusing on the
role of dialogue in joint action.
 The implementation of goal-directed learning processes and
sophisticated error monitoring, recognition, and repair strategies to
produce a real-world assembly robot scenario that will be capable of
partially self-organizing towards stable solutions, taking into
account not only its own behaviour (e.g., self-generated errors) but
also the behaviour of others (e.g., errors generated by a human or
robot partner).
Other EU-funded projects at AG Bülthoff
 Touch HapSys – haptic systems: next
generation haptic interfaces, visuo-haptic
integration
 POEMS – perceptually-oriented ego-motion
simulation: how to use audio and visual cues
to generate ego-motion perception (VR)
 PRA (Network) – Perception for Recognition
and Action
 ECVision (Network) – European Computer
Vision network
 Enactive (Network) – multi-modal HCI
interfaces