Building Cognitive Systems

MEETING ON
BUILDING COGNITIVE SYSTEMS
LUXEMBOURG, JULY 2, 2002
Contributors
HILARY BUXTON, UNIV. OF SUSSEX (UK)
JAN-OLOF EKLUNDH, KTH (S)
GÖSTA GRANLUND, LINKÖPING UNIV. (S)
BERNHARD NEBEL, ALBERT-LUDWIGS-UNIVERSITÄT FREIBURG (D)
SAJIT RAO, UNIV. OF GENOA (I)
DAVID VERNON, CAPTEC (IRL)
EC members
HORST FORSTER
GIOVANNI BATISTA VARILE
COLETTE MALONEY (RAPPORTEUR)
1. INTRODUCTION
One could claim that ubiquitous computing is already here – each year we produce
more than one processor chip for each person on the planet and the growth rate of
chip production is greater than the population growth rate. It might be harder to
make a similar claim about ambient intelligence. Central to the ambient
intelligence vision is the ability of computationally empowered devices to
interconnect with each other and with us. Using sensors to provide a window from
the world of interconnected computation into the real physical world, these
devices will sense the world around us and respond by interacting with the world or
by communicating with us. These devices will need to combine visual and auditory
capabilities to sense what is happening in the world and to understand and engage
in dialogue with humans – preferably without the need for keyboards and mice.
This is not just a matter of understanding commands, but also of understanding
context.
How do we build systems that can interact with the real world in an intelligent and
reasoned way? Until now we have built systems that can operate in restricted
domains or in carefully controlled environments, i.e., in artificially constrained
worlds, where models can be constructed with sufficient complexity to allow
algorithms to perform well. At the other extreme, we have been able to build
systems that do not rely on explicit models, but rather react to the real world.
Such systems require the programmer to anticipate all possible situations that
might be encountered.
Systems that respond purposefully to the real world, rather than just reacting,
must operate between these extremes. Such systems must combine reaction with the
ability to explicitly represent heuristics, and to learn and reason in pursuit of
goal-directed behaviour. Construction of these ‘cognitive’ systems will require:
• integration of technologies that have evolved from several disciplines in order
to provide intelligent reasoning capabilities and the versatility that is needed
to interpret and interact with the real world [1];
• integration of information from multiple sensors and multiple cues in order to
allow constraints imposed by the real world (biological limitations, physical
laws, ...), including information about context, to provide the robustness needed
for practical applications.
However, constructing and maintaining a coherent world model from the
contributions of a variety of sensors in a perceptual system is as yet a largely
unsolved problem.
The goal of developing cognitive systems, i.e., systems that can perceive and act,
reason and learn, and that are capable of interpretation and interaction in the
real-world environment, is not new. What makes this goal worth pursuing now is
that the computing power available today can support the development of systems
that can operate under real-time constraints [2].
A technology of cognitive systems will enable progress in recognition and
categorization of objects, interpretation of activity and behaviour, visual guidance
and navigation, and speech communication with systems. The goal of this workshop
was to review the current status of research, to identify challenges and
opportunities for progress, and to recommend where future research efforts could
focus. The timeframe is from now until 2012.
2. STATUS OF RESEARCH TODAY
2.1 Cognitive systems
Perception is fundamental to cognitive systems. Perception provides information
about the environment in which the cognitive system exists. Visual perception is a
particularly powerful sensing modality with many uses. However, several other
perceptual channels are of interest, including auditory and tactile perception
and chemical senses such as taste and smell.
Cognitive systems will be characterised by their ability to learn adaptively in real time from the perceptual input in order to perform specific goal-directed tasks.
This ability to acquire new knowledge and adapt existing knowledge to new
circumstances provides a means of dealing with the unrestricted real world
environment and of using generalised concepts across application domains.
Acquisition of knowledge – or learning – by autonomous interactions with the
environment will enable cognitive systems to perform tasks in ways that were not
conceived of in their design.
Many researchers advocate that cognitive systems should be physically embodied
and derive information from several perceptual modalities. Such systems must be
able to act on the world. Most importantly they should develop as complete
systems.
Cognitive systems will be adaptive and anticipatory, robust and autonomous,
interactive and dynamic. They will be diverse in form and function. They are
expected to play a key enabling role in applications ranging from image
interpretation (e.g. in medical or aerospace domains) and behavioural
interpretation (e.g. in crowd surveillance or traffic monitoring), through
human-machine interaction (speech and activity recognition), to autonomous
mobile robots working in remote/hazardous environments.
[1] computer vision, natural language processing, artificial intelligence,
mathematics, neuroscience, robotics, …
[2] cf. Meeting on Cognitive Vision Systems, June 21, 2000
Of particular interest are, on the one hand, systems that will allow humans to
interact with machines in a more natural way and, on the other, systems that will
support humans in performing tasks that are tedious, difficult or beyond their
capabilities.
2.2 Perception
The purpose of perceptual processing is to produce a response. The response may
be an action upon the environment. It may be to reconfigure the system’s internal
models of interaction according to the context (current state of the environment).
Or it may be to generate, in a subsequent step, a generalised symbolic
representation, which will allow the functional context to be communicated.
The functional context is important as we rarely use representations in an
intentional vacuum – we always have goals. Representations of context must go
beyond mapping of percepts to linguistic descriptions, as a purely descriptive basis
for understanding will not lead to the development of cognitive systems. Rather
the representation must be grounded in perception. Cognitive systems must thus
be able to act as well as perceive and they must be developed in a full
perception-action feedback cycle.
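The perception-action feedback cycle described above can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and not part of the workshop material: the one-dimensional `World`, the `run_cycle` function, and the proportional rule used to choose an action are all hypothetical. It shows only the shape of the loop (sense, decide, act, sense again), not a cognitive system.

```python
from dataclasses import dataclass


@dataclass
class World:
    """Hypothetical one-dimensional environment with a goal position."""
    target: float
    position: float = 0.0

    def sense(self) -> float:
        # Percept: signed offset from the agent's position to the target.
        return self.target - self.position

    def act(self, step: float) -> None:
        # Action: move the agent, which changes what is sensed next.
        self.position += step


def run_cycle(world: World, gain: float = 0.5,
              tolerance: float = 1e-3, max_steps: int = 100) -> int:
    """Repeat perceive -> decide -> act until the goal is met.

    Returns the number of actions taken (max_steps if the goal was not met).
    """
    for n in range(max_steps):
        error = world.sense()          # perceive
        if abs(error) < tolerance:     # goal reached, stop acting
            return n
        world.act(gain * error)        # decide and act; feedback closes the loop
    return max_steps
```

The point of the sketch is that each action alters the next percept, so the system's behaviour is shaped by the loop as a whole rather than by perception or action in isolation.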
It is expected that high-performance systems for interpretation of static imagery
would also be developed as cognitive systems in a perception-action feedback
process. Thus cognitive systems do not necessarily have to perform physical actions
in the external world at run-time, but may operate off-line. The output may not
be physical action but rather can be to communicate the intended actions to
another system.
Thus, cognitive systems can also be useful in applications which do not require
advanced mechanical manipulators. One such important application field is in
activity interpretation for human-machine interaction.
2.3 Outlook
Current successful applications of perceptual systems are primarily in well
understood worlds with limited complexity. Applications involving visual perception
today include video sequence analysis, visual surveillance, man-machine
interaction, and visual inspection. For the most part, these applications are
achieved using pattern recognition techniques with little or no cognitive ability.
Research in psychophysics and neuroscience today devotes considerable efforts to
problems in object recognition and categorization. The AI community is oriented
towards higher level symbolic processing such as reasoning and planning. In the
learning community there is limited attention to vision applications, partly due to
the limited capacity of today’s learning architectures.
Progress will require an integration of insights from these disciplines, which have
so far acted in relative isolation.
3. SIGNIFICANT ADVANCES / BREAKTHROUGHS OVER THE NEXT 5 – 10 YEARS
Context determines how to interpret sensory data. The interpretation of a percept,
whether as a set of pixels or some other sensory data, depends on context. The
representation and acquisition of context is an extremely important issue. The
extension of current recognition techniques to interpretation is not likely to work
without the use of context to limit and guide interpretation. The real
vision/perception/cognition problem is not to generate descriptions of shape or
models for this, but to robustly map percepts into action, function and behaviour.
We need to classify things according to what they can be used for or which goals
they can help us achieve. This is necessarily context- or application-specific.
A significant but necessary advance with respect to present day capabilities will be
the recognition of objects under relatively unrestricted conditions, such as variable
illumination, pose, scale and orientation, against a structured background and
under partial occlusion.
Greatly improved systems for speech recognition and synthesis will be of utmost
importance to support the development as well as the operation of applications in
cognitive vision. The interfacing of language systems to cognitive systems is an
important research challenge. In general terms this implies the removal or the
insertion of detailed, system specific context, to produce or receive symbols
sufficiently invariant to be communicable with another system.
Efficient training environments will be needed for increasingly complex systems, as
training will supplement algorithmic prescription. This will necessarily require the
construction of complete – probably hybrid symbolic-perceptual – systems to
facilitate the study of learning and emergent behaviour. One could argue that
machines with the semantics of a human must be trained as if they were human.
A key issue will be to achieve behavioural plasticity – i.e., the ability of an
embodied system to do a task which it was not explicitly programmed to do.
4. MAIN DIFFICULTIES / CHALLENGES
Many basic problems remain. From a computational perspective:
• closing the loop in realistic test cases, i.e. building complete systems that can
deal with non-trivial cases;
• developing the underlying semantics for action (grounding language in
perception);
• combining perceptual and symbolic processes for interpretation of events and
generation of new behaviour;
• speed (of processing, memory access, learning, overall system).
From the perspective of managing complexity and achieving a balance between
distributed and centralised control:
• information representations which are sufficiently adaptable (e.g. generic vs
specific) to allow effective communication to be established between system
parts;
• obtaining a coherent global behaviour from the interactions of all the system
parts (at present ultimately hand-tuned and unscalable).
5. TARGETS FOR R + D IN EUROPE
Computer vision has emerged as a well defined domain with roots in geometry,
statistics, signal processing and informatics. Cognitive science is an experimental
field and uses methods from psychophysics and neuroscience. Part of cognition
deals with the processes going from percepts to actions, which require numerical
representations. Much of the exploration of human cognition does however involve
what should be viewed as symbolic levels of representation and processing, such as
categorization, reasoning and use of language. A skillful integration of these very
different domains is essential. Targets for interdisciplinary research include:
• investigation of evolutionary computation and machine learning as ways to
implement cognition;
• work on extending learning algorithms; on-line (incremental) learning of
conceptual models to allow systems to adapt continuously in real time to task
and environment;
• studying the interplay of overt and covert attention for vision tasks – biological
vision;
• developing the mathematics of interaction invariance to model attention and
cognition;
• multimodal interaction, i.e., between symbolic speech structure and cognitive
object structure – psycholinguistics.
A major target is to develop systems that allow performance to be extendable,
systems that can do more than “canned” predefined tasks. Operation in relatively
unconstrained environments will require systems to be capable of handling new
task specifications rapidly or on the fly. This is accomplished by learning. More
efficient information representations are needed to facilitate learning:
• locality of information representations to allow faster convergence in learning;
• representation of confidence or certainty; representation and handling of
sparse and incomplete data.
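As a concrete, if very modest, illustration of what on-line learning with an explicit representation of confidence and tolerance of incomplete data can mean, consider the sketch below. It is an assumption-laden example, not a method proposed at the meeting: the class name `IncrementalEstimate` is hypothetical, the update rule is the standard Welford running mean and variance, missing observations are encoded as `None`, and the standard error of the mean stands in for "confidence".

```python
import math
from typing import Optional


class IncrementalEstimate:
    """Running estimate of one quantity, updated one observation at a time."""

    def __init__(self) -> None:
        self.count = 0      # number of observations actually seen
        self.mean = 0.0     # current estimate
        self._m2 = 0.0      # running sum of squared deviations (Welford)

    def update(self, value: Optional[float]) -> None:
        # Incomplete data: a missing observation leaves the estimate unchanged.
        if value is None:
            return
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined (reported as infinite) below two observations.
        return self._m2 / (self.count - 1) if self.count > 1 else math.inf

    @property
    def standard_error(self) -> float:
        # Smaller values mean more confidence in the current mean.
        if self.count < 2:
            return math.inf
        return math.sqrt(self.variance / self.count)
```

Feeding the stream `2.0, None, 4.0, None, 6.0` through `update` uses only the three values that are present; the estimate carries its own uncertainty, so a downstream process can decide how much to trust it.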
It is worth stressing that work should be performed in a systems perspective: there
should be a system that "perceives". Inspiration for how this can be done can be
drawn both from computer and systems science and from biology.
Two specific targets in this context:
• designing a system with well-developed memory capacities (short-term, long-term, iconic, associative, ...) that can be used in multiple applications;
• use and assessment of context and situations as well as system drives and tasks.
In summary, the key to progress lies in the ability of systems to acquire information
about the world through sensory channels and, by combining the perceptual input
with computation, to extract the knowledge needed to perform tasks. New behaviour
must be driven by knowledge acquired through interaction. This in turn requires a
developmental paradigm for managing emergent behaviour.
The only cognitive systems that we know of develop as complete systems. We need to
build systems!