Grounding Symbols: Learning 2D Shapes using Cell Assemblies that emerge from fLIF neurons
Fawad Jamshed and Christian Huyck
School of Engineering and Information Sciences, Middlesex University
The Burroughs, London NW4 4BT, UK
f.jamshed@mdx.ac.uk, c.huyck@mdx.ac.uk
Abstract
If a system can represent knowledge symbolically, and ground those symbols in an environment, then it has
access to a vast range of data from that environment. The system described in this paper acts in a simple
virtual world. It is implemented solely in fatiguing Leaky Integrate and Fire neurons; it views the environment,
processes natural language commands, plans and acts. This paper describes how visual representations are
labelled, thus gaining associations with symbols. The labelling is done in a semi-supervised manner with
simultaneous presentation of the word (label) and a corresponding item in the visual field. The paper then
shows how these grounded symbols can be useful in reference resolution. All tests performed worked perfectly.
1. Introduction
A major hurdle in the development of an artificial intelligent agent is the symbol grounding problem (SGP) [6,
20]. A symbol can be defined as an association with an object due to a social convention, and it usually has an
arbitrary shape with no resemblance to its referent. Each symbol is part of a wider and more complex system
[20, 22]. Any symbol is meaningless to its user unless, somehow, it is given some meaning. Once a symbol gets
its meaning, it is grounded. How an artificial agent can develop the meanings of symbols autonomously is the
SGP [3, 4, 7, 8].
The SGP is one of the most important open questions in the philosophy of information [23].
Manipulating meaningless symbols into other meaningless symbols is not intelligence [18]. Most artificial
agents do not understand the meanings of the symbols they are processing; mostly they are just processing
information according to predesigned algorithms. Instead of defining symbols in terms of other ungrounded
symbols, a system might ground them in such a way that they have meaning independently, without any
significant help from an external source [1, 2, 21].
2. Theoretical Background and Previous Work
The SGP has been in existence for hundreds of years. As knowledge about human cognition has advanced, more
candidate symbol grounding solutions have been proposed. Especially since the development of connectionist
systems, which are inspired by biological neurons, there have been more ideas and solutions to address the SGP.
This paper presents simulations that begin to address the SGP. The simulations are based on fatiguing Leaky
Integrate and Fire (fLIF) neurons [13]. They also make use of the cell assembly (CA) concept; a CA is a set of
neurons with high mutual synaptic strength that is the neural representation of a concept [9]. A brief description
of CAs and fLIF neurons is given below.
2.1 Fatiguing Leaky Integrate and Fire (fLIF) neurons
fLIF neurons are a relatively simple model of biological neurons [12]. The model used in this paper makes use
of discrete cycles. Each neuron has some activation, which it receives from other neurons. If a neuron has
enough activation at the end of a cycle, it fires, spreads activation to connected neurons, and loses all its
energy. Neurons are connected to other neurons by unidirectional, weighted connections. If a neuron fires, it
passes activation equal to the weight of the connection. If a neuron's activation is less than the threshold, it
does not fire, but it loses some of its activation as it leaks away. fLIF neurons also fatigue, just like biological
neurons [14]: if a neuron fires regularly, it becomes harder to fire.
As neurons fatigue, they become more difficult to fire again. This is modelled by increasing the
threshold of a neuron as described in equation 1.
T(t) = T(t-1) + Fc    (Equation 1)
where T(t) is the threshold at time t, T(t-1) is the threshold at time t-1, and Fc is the fatigue constant. If a neuron
does not fire, its threshold decreases by the fatigue recovery constant Fr, as shown in equation 2. The threshold
never goes below the base activation threshold.
T(t) = T(t-1) - Fr    (Equation 2)
If a neuron does not fire at a given time, some of its energy leaks away, but it still integrates energy from the
surrounding active neurons. This is modelled by calculating the activation as described in equation 3.
A(t) = A(t-1)/D + C    (Equation 3)
where A(t) is the activation at time t: it is the activation A(t-1) at time t-1, reduced by a decay constant D, plus
C, the sum of incoming activation from all connected neurons that fired at time t-1. The value of C is determined
by multiplying the incoming activation on all connected links by the associated weights of those links.
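For concreteness, the update rules above can be written as a short Python sketch. The class, parameter names and constant values below are illustrative assumptions for this paper's equations, not the actual CABot implementation or its parameters.

```python
# A minimal, illustrative fLIF neuron; constants are placeholders, not CABot values.
class FLIFNeuron:
    def __init__(self, base_threshold=4.0, fatigue_c=0.8,
                 fatigue_recovery=0.5, decay=1.5):
        self.base_threshold = base_threshold      # resting firing threshold
        self.threshold = base_threshold           # current (fatigued) threshold, T
        self.fatigue_c = fatigue_c                # Fc in equation 1
        self.fatigue_recovery = fatigue_recovery  # Fr in equation 2
        self.decay = decay                        # D in equation 3
        self.activation = 0.0                     # A in equation 3

    def step(self, incoming):
        """incoming: summed weights of connections from neurons that fired
        in the previous cycle (C in equation 3)."""
        # Leaky integration (equation 3); activation was zeroed if the neuron fired.
        self.activation = self.activation / self.decay + incoming
        fired = self.activation > self.threshold
        if fired:
            self.activation = 0.0                 # firing spends all the energy
            self.threshold += self.fatigue_c      # fatigue (equation 1)
        else:
            # recovery (equation 2), never dropping below the base threshold
            self.threshold = max(self.base_threshold,
                                 self.threshold - self.fatigue_recovery)
        return fired
```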
2.2 Cell Assemblies
CAs were proposed by Hebb sixty years ago [9] and still today successfully explain how the human brain learns
and stores different concepts. A single neuron does not represent a memory; rather, a large number of neurons
represent each concept in the human brain. The neurons for a particular concept have high mutual connection
strength that can support a reverberating circuit. This circuit is a CA, and it can continue to fire after the initial
stimulus ceases. A CA is learned by a Hebbian learning rule, which states that the connection strength between
two neurons is related to how frequently they fire simultaneously.
When an external input is applied to neurons, the strength of the connections between neurons is
adjusted accordingly. The repeated presentation of input increases the strength of the connection between
simultaneously active neurons while decreasing the connection strength between other neurons. The set of
neurons with increased synaptic strength forms a CA. CAs are reverberating circuits. Initial firing of some
neurons in the CA can lead to further firing of other neurons in the CA due to high connection strength. This
then can lead to a cascade of firing called CA ignition [11, 13, 15, 24].
One advantage of using CAs is that they can serve as both long and short term memories. A short term
memory persists as long as neurons are firing. Long term memories are formed by synaptic modification due to
the Hebbian learning rule. This dual dynamics (ignition and learning) makes the CA well suited to
building powerful computational devices [17]. Thus a wide range of tasks can be modelled using CAs.
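As a hedged illustration of how co-firing carves out a CA, the sketch below applies a generic Hebbian rule: connections between simultaneously firing neurons are strengthened, and connections from firing to silent neurons are weakened. The rule, learning rate and saturation bound are our own simplifications, not the exact compensatory learning rule used in the fLIF work [13].

```python
import numpy as np

def hebbian_step(weights, fired, rate=0.05, w_max=1.0):
    """weights[i, j] is the synapse from neuron i to neuron j;
    fired is a boolean vector of which neurons fired this cycle."""
    fired = fired.astype(float)
    co_fire = np.outer(fired, fired)               # 1 where pre and post both fired
    pre_only = np.outer(fired, 1.0 - fired)        # pre fired, post did not
    weights += rate * co_fire * (w_max - weights)  # strengthen, saturating at w_max
    weights -= rate * pre_only * weights           # weaken towards zero
    np.fill_diagonal(weights, 0.0)                 # no self connections
    return weights
```

After enough simultaneous presentations, the weights among the co-presented neurons are high enough that igniting a few of them recruits the rest, which is the CA igniting.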
3 Proposed Work
Labelling is a simple form of symbol grounding. A system is developed, based on an existing agent, that
contains a semantic CA and a label CA; an association between the semantic and label CAs is learned.
Next, these labels are exploited to provide a form of reference resolution. These are relatively simple
tasks that serve as a proof of concept.
Labelling depends on categories. Categories are very important as they help in identifying the class of
an object. By grouping together things which have similar features, the system learns to categorise [5]. A
category is represented by a CA. Prior work has shown that CAs can be learned from environmental stimuli
[13]. While they may be learned, it is also possible to set the topology of the system so that a particular CA
already exists.
One theory states that a concept is represented by a semantic pole and a phonological pole [18]. A CA
for the category would represent the semantic pole, and a different CA for the label would be the phonological
pole. If a system has a semantic CA, and a label CA, it can attach them to each other, which means the symbol is
now grounded. By having this iconic representation of categories, the system has attached a name to a category.
Symbol grounding can be used to address the reference resolution problem. Reference resolution is a
common problem in natural language. For example, in the sentence
We saw a doll with a black jacket on and it was quite big. (Example 1)
the pronoun it can refer either to the doll or to the jacket. If the system can decide which, it is resolving the
pronoun. In resolving the pronoun, the system could ignite both the semantic CA and the label CA associated
with the item to which the pronoun is resolved.
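Purely as an illustration of the idea, and not the CABot mechanism itself, resolution could amount to picking the candidate whose grounded semantic CA is currently the most active, and then igniting its label CA as well; the function and data below are hypothetical.

```python
def resolve_pronoun(candidates, ca_firing):
    """candidates: possible referents, e.g. ['doll', 'jacket'];
    ca_firing: number of neurons currently firing in each grounded semantic CA."""
    referent = max(candidates, key=lambda label: ca_firing.get(label, 0))
    return referent  # the system would then ignite this item's semantic and label CAs

# Example: the doll's semantic CA is the active one (e.g. a doll is in the visual field).
print(resolve_pronoun(["doll", "jacket"], {"doll": 480, "jacket": 35}))  # -> doll
```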
4 Simulations
The simulations described below are an extension of the first version of the Cell Assembly Robot (CABot1)
[16]. CABot1 does no real learning. The first simulation shows how a slight modification along with learning
allows the attachment of labels. The second simulation shows how the now labelled semantic CAs can be used
for reference resolution.
CABot
The main aim of CABot is to develop an agent in simulated neurons, which can take natural language as input
and interact with the environment without any external help. By interacting with the environment, it is hoped
that it can learn the semantics of the environment sufficiently well to improve language processing.
For CABot1, a virtual 3D environment was established based on the Crystal Space games engine. Two
agents were placed in the environment, the first controlled by a user, and the second was the CABot1 agent. All
processing in CABot1 was done by a complex network of fLIF neurons, though it emitted symbolic commands
to the Crystal Space stub.
Figure 1: Instance of a pyramid in the virtual environment
Figure 2: Instance of a stalactite in the virtual environment
A complete description of CABot1 is beyond the scope of this paper but further information can be found
elsewhere [16]. A total of 21 sub-networks are used to subdivide the tasks of vision, natural language parsing,
planning, action and system control.
The important subnets for the purposes of this paper are the vision nets and the word nets. There are
three vision subnets: a simulated retina, a primary visual cortex and a secondary visual cortex (V2). These
systems were hard coded, so there was no learning. Visual input was in the form of a bitmap representation of a
view of the game from the agent's perspective. In particular, the secondary visual cortex subnet was set to
recognise pyramids and stalactites. If one of these was present in the game, a CA in V2 ignited. There were
several position and size dependent CAs associated with both pyramid and stalactite. Figures 1 and 2 show
instances of a pyramid and a stalactite respectively.
Similarly, the parsing component had CAs for words. In the game, the user issues natural language
commands to tell the agent what to do. There was a noun subnet used during parsing and an instance subnet to
store semantic roles during parsing. Both noun and instance subnets had CAs for both pyramid and stalactite
labels.
4.1 Grounding five basic 2-D Shapes
Learning was introduced into the system with the help of six visual subnets. The five shapes used are: pyramid,
stalactite, diamond, square and right-angled triangle. Currently the vision system consists of six nets: the
Input net, the Retina net, the V1 net, the V1A net, the V2 net and the V4 net. Each of these six subnets performs
a unique function.
The Input net displays the input from the environment, whereas the Retina net is a series of OnOff and OffOn
detectors. The V1 net is position dependent and detects the first order features of a solid shape in the picture,
whereas the V2 net detects the second order features. The V1A net is a position independent model of the V1
net. The V4 net identifies the shape of an object with the help of the second order features detected in the V2
net. The detailed working of the vision system is described below.
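As a structural sketch only, the six subnets form a feed-forward chain; the function names below stand in for the corresponding fLIF subnetworks and are not real CABot code.

```python
def vision_pipeline(bitmap, retina, v1, v1a, v2, v4):
    """Each argument after bitmap is a stand-in for one fLIF subnet."""
    features = retina(bitmap)            # OnOff / OffOn detector responses
    edges = v1(features)                 # position dependent first order features
    edges_anywhere = v1a(edges)          # position independent copy of V1 activity
    second_order = v2(edges_anywhere)    # conjunctions of first order features
    return v4(second_order)              # the winning shape CA
```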
Figure 3: Diamond. Figure 4: Pyramid. Figure 5: Right-angled triangle. Figure 6: Square. Figure 7: Stalactite.
The Input net gets the input from the system in the form of bits and displays it on the screen. The input is
usually in the form of pictures, but shapes can also be hard coded in the system. The Retina net is a biologically
plausible model of the OnOff and OffOn detectors found in biological systems; it gets its input from the
Input net and feeds its output to the V1 net. Three different sizes of OnOff and OffOn detectors are used in the
Retina net: 3 by 3, 6 by 6 and 9 by 9 detectors.
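A minimal sketch of what an OnOff detector computes, assuming a square centre-surround receptive field over a binary bitmap. The kernel sizes mirror the 3 by 3, 6 by 6 and 9 by 9 detectors, but the centre width and scoring here are illustrative, not the Retina net's actual wiring.

```python
import numpy as np

def on_off_response(image, size=3):
    """Crude centre-surround (OnOff) response: bright centre, darker surround.
    image: 2D array of 0/1 pixels; size: side length of the receptive field."""
    h, w = image.shape
    out = np.zeros((h - size + 1, w - size + 1))
    inner = max(1, size // 3)                    # rough centre width
    pad = (size - inner) // 2
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + size, c:c + size]
            centre = patch[pad:pad + inner, pad:pad + inner]
            surround = patch.sum() - centre.sum()
            out[r, c] = centre.mean() - surround / (size * size - inner * inner)
    return out  # an OffOn detector is the same response with the sign reversed
```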
V1 is position dependent; it gets its input from the OnOff detectors and identifies first order features, e.g.
edges and angles of a solid shape in the picture. The V1 net responds to the different types of edges and angles
presented. The connections from the V1 net were made position independent by introducing the V1A net and
making random connections from each V1 CA to the corresponding V1A CA. The V1A net has direct
connections from the V1 net only. The neurons of the V1A net have a low decay rate of 1.01, to promote a
response to even a small amount of firing of neurons in V1.
The V1 net and the Retina net used are modified versions of the V1 net and the Retina net of CABot1. More
CAs were introduced into the V1 net: a vertical edge CA and four right angle CAs. The V2 net, the V4 net and
the VT net were introduced into the CABot system for this experiment.
The V2 net gets its input from the V1A net, which is the position independent version of the V1 net. When a three
or four edged shape is presented, each CA of the V2 net gets three inputs from three CAs of the V1A net. A CA
of the V2 net only ignites when all three of its V1A CAs are active when a shape is presented. The V2
net output is used as the input to the V4 net, where the final shape is determined.
The V4 net is the final part of the vision system, where all the shapes are discriminated. The V4 net and the V2
net are fully connected, which means each CA in the V2 net is connected with all CAs of the V4 net and vice
versa. Learning is carried out between the V2 and V4 nets by learning the appropriate connections.
The same topology was used within the V1A net, the V2 net and the V4 net: twenty percent of the neurons were
inhibitory and eighty percent were excitatory. The exception is the V4 net, where there are additional inhibitory
inter-CA connections to promote a winner-takes-all situation, so that only one CA will eventually be on. Each
inhibitory neuron in a CA of the V4 net is connected with 1154 neurons of other CAs, where each connection
has a high synaptic strength of 30.
No. of neurons firing in each CA of the V4 net:
                               Pyramid   Stalactite   Square   Diamond   Rg triangle
When a pyramid is presented       490            0        0         0             0
When a diamond is presented         0            0        0       489             0
Table 1: V4 net CAs during the testing phase
Table 1 shows the result of a successful test when a pyramid and when a diamond are presented. When the
pyramid and diamond shapes are presented during the test, the number of neurons firing in the corresponding
CAs of the V4 net shows that specific CAs are committed to pyramid and diamond respectively, whereas the
other CAs of the V4 net do not respond.
The simulation is termed successful when all five CAs of the V4 net are committed to the five distinct
shapes, whereas the quality of the success is determined by how reliably the learned CAs respond when the
different shapes are presented.
To prevent the same shape from getting committed to more than one CA of the V4 net, a winner-takes-all
strategy is used within the V4 net. To promote this strategy, many inhibitory connections are used between the
five different CAs of the V4 net, so that they compete with each other and only one of them wins.
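A minimal rate-based sketch of the effect of those inhibitory connections: each CA's activity suppresses the others, so the CA with the strongest drive from V2 eventually silences the rest. The dynamics and constants are illustrative, not the spiking fLIF behaviour.

```python
import numpy as np

def winner_take_all(v2_drive, inhibition=0.6, steps=50):
    """v2_drive: external drive to each of the five V4 CAs."""
    drive = np.asarray(v2_drive, dtype=float)
    activity = drive.copy()
    for _ in range(steps):
        # each CA is suppressed in proportion to the other CAs' total activity
        suppression = inhibition * (activity.sum() - activity)
        activity = np.maximum(0.0, drive + activity - suppression)
    return int(np.argmax(activity))

# e.g. the diamond CA (index 3) receives the strongest drive and wins
print(winner_take_all([0.20, 0.30, 0.10, 0.90, 0.25]))  # -> 3
```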
In order to prevent a CA from winning on two different shapes, another net, the VT net, is used. The VT net uses
the same topology as the V1A, V2 and V4 nets and consists of five CAs, where each CA consists of five hundred
neurons. There are strong connections between each CA of the V4 net and the corresponding CA of the VT net,
and vice versa. The connections from the V4 net to the VT net are plastic, whereas the connections from the VT
net to the V4 net are non-plastic. During learning, the connections between the V4 net CAs and the
corresponding VT net CAs are adjusted and learned.
The idea behind using the VT net is that its CAs help the corresponding CAs of the V4 net to compete with the
other CAs of the V4 net. During the learning phase, the competing CAs of the V4 net also ignite the
corresponding CAs of the VT net, due to the strong connections between these CAs. The ignited VT CAs in turn
transfer energy back to the V4 net, as they also have strong connections to the corresponding CAs of the V4 net,
thus helping them to compete.
When one of the CAs of the V4 net wins, after competing with the other CAs of the V4 net, the connections
between that CA and its corresponding VT net CA are weakened by reducing the weights on the connections
where the pre and post synaptic neurons are co-firing. The next time this CA tries to compete, there will be
less help from the corresponding VT net CA, so it will have less chance of winning.
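Read as a sketch, the mechanism amounts to each V4 CA receiving a boost proportional to its plastic link with its VT partner, with that link weakened after a win so the same CA gets less help on the next shape. The names and the weakening factor below are our own illustrative choices, not the fLIF implementation.

```python
# Illustrative sketch of VT-assisted competition.
vt_support = [1.0] * 5          # plastic help each V4 CA gets back from its VT CA

def compete_with_vt(v2_drive, weaken=0.5):
    """v2_drive: input from the V2 net to each of the five V4 CAs."""
    boosted = [d + s for d, s in zip(v2_drive, vt_support)]
    winner = max(range(len(boosted)), key=lambda i: boosted[i])
    vt_support[winner] *= weaken   # the winner's V4-to-VT link is weakened,
    return winner                  # so it gets less VT help in later competitions
```

Run over the five training shapes in turn, this pushes each shape towards a different V4 CA.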
Graph 1: A typical example of winner takes all, in this case won by the right-angled triangle CA. The graph plots the number of neurons firing in each of the five V4 CAs (pyramid, stalactite, diamond, square, right triangle) against CANT steps.
The training runs for 1250 cycles, where each of the five shapes is presented for 250 cycles. The connections
between the V2 net CAs and the corresponding V4 net CAs are adjusted and learned using Hebbian learning. The
learning is bidirectional, with weights on connections from the V2 net to the V4 net and weights from the V4
net to the V2 net being learned at the same time.
The test runs for 2500 cycles. During the testing part of the simulation, shapes were presented in a random order.
After presenting the complete set of five shapes in 1250 cycles, another set of shapes was presented, again in
random order, for another 1250 cycles.
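A sketch of that presentation schedule, assuming a hypothetical present(shape, cycles, learning) routine that clamps the shape onto the Input net and runs the network for the given number of cycles; the routine and shape names are placeholders.

```python
import random

SHAPES = ["pyramid", "stalactite", "diamond", "square", "right_triangle"]

def run_simulation(present):
    # Training: 1250 cycles, each shape presented for 250 cycles in turn.
    for shape in SHAPES:
        present(shape, cycles=250, learning=True)
    # Testing: 2500 cycles, two passes over the five shapes in random order.
    for _ in range(2):
        for shape in random.sample(SHAPES, k=len(SHAPES)):
            present(shape, cycles=250, learning=False)
```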
Results
The test is fully automated and runs without human intervention. The test was conducted 28 times. The result is
determined by whether the correct corresponding CA of the V4 net fires when a shape is presented to the
system.
The success rate for three of the shapes, diamond, square and right-angled triangle, is one hundred percent. Each
of these shapes is committed to a different CA each time they are presented, and during testing each of the
committed CAs responds correctly each time the corresponding shape is presented.
Due to their very similar features, the pyramid and stalactite shapes do sometimes get committed wrongly, and
the CA committed to whichever of the two shapes is presented first sometimes also responds to the second shape
presented afterwards. The success rate for the pyramid and stalactite shapes is 75 percent.
One problem in the system is the detection of the correct first order features in the V1 net. Due to the slight
differences in edges and angles between different instances of the same shape, edges and angles are sometimes
detected wrongly.
Another problem that contributes to the wrong detection of edges and angles is the varying size of instances
of the same shape, which affects the properties associated with a shape. For example, fewer pixels are
associated with a shape when the instance is small, and hence more weight needs to be associated with such
angles in order to make the corresponding neurons fire.
5 Conclusions and Future Work
The results obtained from the above experiments are promising. The label experiment learns the correct
association between shapes and their corresponding labels in all 28 experiments that were conducted. The
label experiment is a small but important step towards the solution of the SGP. Labelling is an essential aspect
of symbol grounding because it attaches symbols to sub-symbolic representations.
The pronoun resolution experiment creates a dynamic association between ambiguous pronouns and
shape categories. Pronoun resolution is not required to ground symbols, but the experiment shows one of the
many benefits of symbol grounding. Though the model presented is not as complex as the biological brain, it
has been shown that it can be used towards a solution of the SGP. The promising results of these experiments
show that Hebbian learning can be used effectively to ground semantic symbols, and indicate that the model and
the techniques used can be applied effectively to other aspects of symbol grounding. The main goal of this
research is to develop an agent which can ground symbols and, by using those grounded symbols, can
effectively perceive and interact with its surrounding environment.
The future model of this agent will not only be able to learn and label new shapes, but will also be
able to learn and label new symbols from what has already been learned and labelled. Other, more demanding
aspects of the SGP which need to be addressed in order to ground symbols include symbolic theft
and functional symbol grounding. Symbolic theft is evolving new categories by breaking or combining
existing categories into more elementary categories when possible; for example, by combining stripes with
horse, a new category, zebra, can be created. Functional symbol grounding grounds a symbol according to the
context in which it is used. The use of symbols is really individual as well as domain and situation specific
[23]. By using the functional approach towards the SGP, the usefulness and thus the accuracy of the system can
be enhanced.
Other useful steps include alignment and the use of environmental feedback. Alignment is used to
modify a symbol to cohere with the meaning of the symbol for an experienced agent or a human. Environmental
feedback is used to readjust an agent's already grounded symbols in response to the feedback it receives from
the environment; this includes feedback from the behaviour of other agents. None of the above mentioned
aspects of the SGP are enough to ground symbols on their own; labelling is needed to attach any symbol to its
semantics.
References
[1] C. Breazeal,"Sociable Machines: Expressive Social Exchange between Humans and Robots". Sc.D.
dissertation, Department of Electrical Engineering and Computer Science, MIT (2000).
[2] A. Cangelosi, “Evolution of Communication and Language Using Signals, Symbols and Words”, IEEE
Transactions on Evolutionary Computation, 5, pp. 93-101, (2001).
[3] A. Cangelosi, A. Greco and S. Harnad, “From Robotic Toil to Symbolic Theft: Grounding Transfer
from Entry-Level to Higher-Level Categories”, Connection Science, 12, pp. 143-162, (2000).
[4] A. Cangelosi, A. Greco and S. Harnad, “Symbol Grounding and the Symbolic Theft Hypothesis”, in
Simulating the Evolution of Language, A. Cangelosi and D. Parisi, Eds., London, Springer, pp. 191-210, (2002).
[5] P. Davidsson, “Toward a General Solution to the Symbol Grounding Problem: Combining Machine
Learning and Computer Vision”, in AAAI Fall Symposium Series, Machine Learning in Computer Vision:
What, Why and How?, pp. 157-161, (1993).
[6] S. Harnad, “The Symbol Grounding Problem”, Physica D, 42, pp. 335-346, (1990).
[7] S. Harnad, “Symbol Grounding in an Empirical Problem: Neural Nets are just a Candidate
Component”, in Proceedings of the Fifteenth Annual Meeting of the Cognitive Science Society, (1993).
[8] S. Harnad, “Grounding Symbols in the Analog World with Neural Nets – a Hybrid Model”,
Psychology, 12, pp. 12-78, (2001).
[9] D. Hebb. The Organization of Behavior. John Wiley and Sons, New York (1949).
[10] D. Hindle and M. Rooth, “Structural Ambiguity and Lexical Relations”, Meeting of the Association for
Computational Linguistics, (1993).
[11] C. Huyck. “Overlapping CA from correlators”. Neurocomputing 56:435–9 (2004).
[12] C. Huyck. Developing Artificial intelligence by Modeling the Brain (2005).
[13] C. Huyck. “Creating hierarchical categories using CA”. Connection Science (2007).
[14] C. Huyck and R. Bowles, “Spontaneous neural firing in biological and artificial neural systems”,
Journal of Cognitive Systems, 6:1, pp. 31-40, (2004).
[15] C. Huyck, and V.Orengo. “Information retrieval and categorisation using a cell assembly network”.
Neural Computing and Applications (2005).
[16] C. Huyck, “CABot1: a Videogame Agent Implemented in fLIF Neurons”, IEEE Systems, Man and
Cybernetics Society, London, pp. 115-120, (2008).
[17] I. Kenny, and C. Huyck. An embodied conversational agent for interactive videogame environments. In
Proceedings of the AISB’05 Symposium on Conversational Informatics for Supporting Social
Intelligence and Interaction, 58–63 (2005).
[18] R. Langacker. Foundations of Cognitive Grammar. Vol. 1. Stanford, CA. Stanford University Press
(1987).
[19] J. Searle, “Minds, Brains, and Programs”, Behavioral and Brain Sciences, 3, pp. 417-458, (1980).
[20] L. Steels, “The Symbol Grounding Problem has been solved. So what’s next?”, in Symbols, Embodiment
and Meaning, Oxford University Press, (2007).
[21] M. Mayo, “Symbol Grounding and its Implication for Artificial Intelligence”, in Twenty-Sixth
Australian Computer Science Conference , pp. 55-60 (2003).
[22] R. Sun, “Symbol Grounding: A New Look at an Old Idea”, Philosophical Psychology, 13, pp. 149-172,
(2000).
[23] M. Taddeo and L. Floridi, “Solving the Symbol Grounding Problem: a Critical Review of Fifteen Years
of Research”, (2003).
[24] T. Wennekers and G. Palm, “Cell Assemblies, Associative Memory and Temporal Structure in Brain
Signals”, in Time and the Brain: Conceptual Advances in Brain Research, Vol. 2, Harwood Academic
Publishers, pp. 251-274, (2000).