Grounding Symbols: Learning 2D Shapes using Cell Assemblies that Emerge from fLIF Neurons

Fawad Jamshed and Christian Huyck
School of Engineering and Information Sciences, Middlesex University, The Burroughs, London NW4 4BT, UK
f.jamshed@mdx.ac.uk, c.huyck@mdx.ac.uk

Abstract

If a system can represent knowledge symbolically, and ground those symbols in an environment, then it has access to a vast range of data from that environment. The system described in this paper acts in a simple virtual world. It is implemented solely in fatiguing Leaky Integrate and Fire neurons; it views the environment, processes natural language commands, plans and acts. This paper describes how visual representations are labelled, thus gaining associations with symbols. The labelling is done in a semi-supervised manner with simultaneous presentation of the word (label) and a corresponding item in the visual field. The paper then shows how these grounded symbols can be useful in reference resolution. All tests performed worked perfectly.

1. Introduction

A major hurdle in the development of an artificial intelligent agent is the symbol grounding problem (SGP) [6, 20]. A symbol can be defined as an association with an object due to a social convention; it usually has an arbitrary shape with no resemblance to its referent. Each symbol is part of a wider and more complex system [20, 22]. Any symbol is meaningless to its user unless, somehow, it is given some meaning. Once a symbol gets its meaning, it is grounded. How an artificial agent can develop the meanings of symbols autonomously is the SGP [3, 4, 7, 8]. The SGP is one of the most important open questions in the philosophy of information [23]. Manipulating one meaningless symbol into another meaningless symbol is not intelligence [18]. Most artificial agents do not understand the meanings of the symbols they are processing; mostly they are just processing information according to predesigned algorithms.
Instead of defining symbols in terms of other ungrounded symbols, a system might ground them in such a way that they independently have meaning, without significant help from any external source [1, 2, 21].

2. Theoretical Background and Previous Work

The SGP has existed for hundreds of years. As knowledge about human cognition has advanced, more candidate symbol grounding solutions have been proposed. Especially since the development of connectionist systems, which are inspired by biological neurons, there have been more ideas and solutions to address the SGP. This paper presents simulations that begin to address the SGP. The simulations are based on fatiguing Leaky Integrate and Fire (fLIF) neurons [13]. They also make use of the cell assembly (CA) concept; a CA is a set of neurons with high mutual synaptic strength that is the neural representation of a concept [9]. A brief description of fLIF neurons and CAs is given below.

2.1 Fatiguing Leaky Integrate and Fire (fLIF) neurons

fLIF neurons are a relatively simple model of biological neurons [12]. The model used in this paper makes use of discrete cycles. Each neuron has some activation, which it receives from other neurons. If a neuron has enough activation at the end of a cycle, it fires, spreads activation to connected neurons, and loses all its energy. Neurons are connected to other neurons by unidirectional, weighted connections; when a neuron fires, it passes activation equal to the weight of each connection. If a neuron's activation is less than the threshold, it does not fire, but some of its activation leaks away. fLIF neurons also fatigue, just like biological neurons [14]: the more regularly a neuron fires, the harder it becomes to fire again. This is modelled by increasing the threshold of the neuron, as described in equation 1.
T(t) = T(t-1) + Fc    (Equation 1)

where T(t) is the threshold at time t, T(t-1) is the threshold at time t-1, and Fc is the fatigue constant. If a neuron does not fire, its threshold decreases by the fatigue recovery constant Fr, as shown in equation 2. The threshold never goes below the base activation threshold.

T(t) = T(t-1) - Fr    (Equation 2)

If a neuron does not fire at a given time, some of its energy leaks away, but it still integrates energy from surrounding active neurons. This is modelled by calculating the activation as described in equation 3.

A(t) = A(t-1)/D + C    (Equation 3)

where A(t) is the activation at time t; the activation A(t-1) at time t-1 is reduced by the decay constant D, and C is the sum of the incoming activation from all connected neurons that fired at time t-1. The value of C is determined by multiplying the incoming activation on each connected link by the associated weight of that link.

2.2 Cell Assemblies

CAs were proposed by Hebb sixty years ago [9] and still successfully explain how the human brain learns and stores different concepts. A single neuron does not represent a memory; instead, a large number of neurons represents each concept in the human brain. The neurons for a particular concept have high mutual connection strength that can support a reverberating circuit. This circuit is a CA, and it can continue to fire after the initial stimulus ceases. A CA is learned by a Hebbian learning rule. Hebbian learning states that the connection strength between two neurons is related to how frequently they fire simultaneously. When an external input is applied to neurons, the strength of the connections between neurons is adjusted accordingly. The repeated presentation of input increases the strength of the connections between simultaneously active neurons while decreasing the connection strength between other neurons.
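The fLIF dynamics of equations 1-3 can be read as a small update routine. The following is a minimal sketch: the class name and the specific parameter values are illustrative assumptions, not those of the actual simulator.

```python
class FLIFNeuron:
    """Minimal fatiguing Leaky Integrate and Fire neuron, following
    equations 1-3. The parameter values here are illustrative."""

    def __init__(self, base_threshold=4.0, fatigue=0.5, recovery=0.4, decay=1.2):
        self.base_threshold = base_threshold  # base activation threshold
        self.threshold = base_threshold       # T: current (fatigued) threshold
        self.fc = fatigue                     # Fc: fatigue constant (eq. 1)
        self.fr = recovery                    # Fr: fatigue recovery constant (eq. 2)
        self.d = decay                        # D: decay constant (eq. 3)
        self.activation = 0.0                 # A: current activation

    def step(self, incoming):
        """One discrete cycle. `incoming` is C, the weighted activation from
        neurons that fired on the previous cycle. Returns True on a spike."""
        self.activation = self.activation / self.d + incoming  # eq. 3
        if self.activation >= self.threshold:
            self.activation = 0.0          # firing loses all energy
            self.threshold += self.fc      # eq. 1: fatigue raises the threshold
            return True
        # eq. 2: recovery, never dropping below the base threshold
        self.threshold = max(self.base_threshold, self.threshold - self.fr)
        return False
```

With these values, a strong input makes the neuron fire and raises its threshold by Fc; on a quiet cycle the threshold recovers by Fr towards the base value.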
The set of neurons with increased synaptic strength forms a CA. CAs are reverberating circuits: initial firing of some neurons in the CA can lead to further firing of other neurons in the CA due to the high connection strength. This can lead to a cascade of firing called CA ignition [11, 13, 15, 24]. One advantage of using CAs is that they can act as both long and short term memories. A short term memory persists as long as neurons are firing; long term memories are formed by synaptic modification due to the Hebbian learning rule. This dual dynamics (ignition and learning) of a CA makes it suitable for developing powerful computational devices [17]. Thus a wide range of tasks can be modelled using CAs.

3 Proposed Work

Labelling is a simple form of symbol grounding. A system, based on an existing agent that contains an existing semantic CA and an existing label CA, is developed. An association between the semantic and label CAs is learned. Next, these labels are exploited to provide a form of reference resolution. These are relatively simple tasks that serve as a proof of concept. Labelling depends on categories. Categories are very important as they help in identifying the class of an object. By grouping together things which have similar features, the system learns to categorise [5]. A category is represented by a CA. Prior work has shown that CAs can be learned from environmental stimuli [13]. While they may be learned, it is also possible to set the topology of the system so that a particular CA already exists. One theory states that a concept is represented by a semantic pole and a phonological pole [18]. A CA for the category would represent the semantic pole, and a different CA for the label would be the phonological pole. If a system has a semantic CA and a label CA, it can attach them to each other, which means the symbol is now grounded. By having this iconic representation of categories, the system has attached a name to a category.
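Attaching a label CA to a semantic CA by simultaneous presentation can be sketched with a simple Hebbian outer-product update. This is a toy sketch, not CABot's actual network: the population sizes, learning rate and clipping are illustrative assumptions.

```python
import numpy as np

# Neurons 0-4 stand for the semantic (shape) CA, neurons 5-9 for the
# label (word) CA. Sizes and the rate below are illustrative assumptions.
N = 10
SEMANTIC = np.arange(0, 5)
LABEL = np.arange(5, 10)

def present(w, active, rate=0.25):
    """One Hebbian step: strengthen connections between co-firing neurons."""
    fired = np.zeros(N)
    fired[active] = 1.0
    w += rate * np.outer(fired, fired)   # co-firing pairs strengthened
    np.fill_diagonal(w, 0.0)             # no self-connections
    np.clip(w, 0.0, 1.0, out=w)

w = np.zeros((N, N))
# Simultaneous presentation of the shape and its word (the label):
for _ in range(4):
    present(w, np.concatenate([SEMANTIC, LABEL]))

# After learning, the semantic CA has strong connections to the label CA,
# so igniting one can drive the other:
drive_to_label = w[np.ix_(SEMANTIC, LABEL)].sum()
```

After repeated co-presentation the cross connections saturate, which is the sense in which the label is "attached" to the category.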
Symbol grounding can be used to address the reference resolution problem, a common problem in natural language. For example, consider the sentence:

We saw a doll with a black jacket on and it was quite big. (Example 1)

The pronoun it can refer either to the doll or to the jacket. If the system can decide which, it is resolving the pronoun. In resolving the pronoun, the system could ignite both the semantic CA and the label CA associated with the item to which the pronoun is resolved.

4 Simulations

The simulations described below are an extension of the first version of the Cell Assembly Robot (CABot1) [16]. CABot1 does no real learning. The first simulation shows how a slight modification, along with learning, allows the attachment of labels. The second simulation shows how the now labelled semantic CAs can be used for reference resolution.

CABot

The main aim of CABot is to develop an agent in simulated neurons which can take natural language as input and interact with the environment without any external help. By interacting with the environment, it is hoped that the agent can learn the semantics of the environment sufficiently well to improve language processing. For CABot1, a virtual 3D environment was established based on the Crystal Space games engine. Two agents were placed in the environment: the first controlled by a user, and the second the CABot1 agent. All processing in CABot1 was done by a complex network of fLIF neurons, though it emitted symbolic commands to the Crystal Space stub.

Figure 1: Instance of a pyramid in the virtual environment. Figure 2: Instance of a stalactite in the virtual environment.

A complete description of CABot1 is beyond the scope of this paper, but further information can be found elsewhere [16]. A total of 21 sub-networks are used to subdivide the tasks of vision, natural language parsing, planning, action and system control. The important subnets for the purposes of this paper are the vision nets and the word nets.
There are three vision subnets: a simulated retina, a primary visual cortex and a secondary visual cortex (V2). These were hard coded, so there was no learning. Visual input was in the form of a bitmap representation of a view of the game from the agent's perspective. In particular, the secondary visual cortex subnet was set to recognise pyramids and stalactites. If one of these was present in the game, a CA in V2 ignited. There were several position and size dependent CAs associated with both pyramid and stalactite. Figures 1 and 2 show instances of a pyramid and a stalactite respectively. Similarly, the parsing component had CAs for words. In the game, the user issues natural language commands to tell the agent what to do. There was a noun subnet used during parsing and an instance subnet to store semantic roles during parsing. Both the noun and instance subnets had CAs for both pyramid and stalactite labels.

4.1 Grounding Five Basic 2-D Shapes

Learning was introduced into the system with the help of six visual subnets. The five shapes used are: pyramid, stalactite, diamond, square and right angled triangle. The vision system currently consists of six nets: the Input net, Retina net, V1 net, V1A net, V2 net and V4 net. Each of these six subnets performs a unique function. The Input net displays the input from the environment, whereas the Retina net is a series of OnOff and OffOn detectors. The V1 net is position dependent and detects the first order features of a solid shape in the picture, whereas the V2 net detects the second order features. The V1A net is a position independent model of the V1 net. The V4 net identifies the shape of an object with the help of the second order features detected in the V2 net. The detailed working of the vision system is described below.
Figure 3: Diamond. Figure 4: Pyramid. Figure 5: Right angled triangle. Figure 6: Square. Figure 7: Stalactite.

The Input net gets the input from the system in the form of bits and displays it on the screen. The input is usually in the form of pictures, but shapes can be hard coded in the system. The Retina net is a biologically plausible model of the OnOff and OffOn detectors that are found in biological systems; it gets its input from the Input net and feeds its output to the V1 net. Three sizes of OnOff and OffOn detectors are used in the Retina net: 3 by 3, 6 by 6 and 9 by 9 detectors. V1 is position dependent; it gets its input from the OnOff detectors and identifies first order features, e.g. the edges and angles of a solid shape in the picture. The V1 net responds to the different types of edges and angles presented. The connections from the V1 net were made position independent by introducing the V1A net and making random connections from each V1 CA to the corresponding V1A CA. The V1A net has direct connections from the V1 net only. The neurons of the V1A net have a low decay rate of 1.01 to promote even a small amount of firing of neurons in V1. The V1 net and the Retina net used are modified versions of the V1 net and Retina net of CABot1. More CAs were introduced into the V1 net: a vertical edge CA and four right angle CAs. The V2 net, the V4 net and the VT net were introduced into the CABot system for this experiment. The V2 net gets its input from the V1A net, the position independent version of the V1 net. When a three or four edged shape is presented, each CA of the V2 net gets three inputs from three CAs of the V1A net. A CA of the V2 net only ignites when all three of its V1A input CAs are active when a shape is presented. The V2 net output is used as input to the V4 net, where the final shape is determined. The V4 net is the final part of the vision system, where all the shapes are discriminated.
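An OnOff detector of the kind used in the Retina net can be sketched as a centre-surround difference operator over the input bitmap. The paper does not give the detector weights, so the uniform surround average below is an assumption.

```python
import numpy as np

def on_off_response(bitmap, r, c, size=3):
    """Response of a size x size OnOff (centre-on, surround-off) detector
    whose top-left corner is at (r, c) of a binary bitmap. A sketch:
    centre and surround are simply differenced, which is one common way
    to model such detectors, not necessarily the Retina net's weighting."""
    patch = bitmap[r:r + size, c:c + size].astype(float)
    centre = patch[size // 2, size // 2]
    surround = (patch.sum() - centre) / (size * size - 1)
    # Positive where a bright centre sits on a dark surround;
    # zero on a uniform region (no edge, no response).
    return centre - surround
```

An OffOn detector would be the same operator with the sign flipped, responding to a dark centre on a bright surround; the 6 by 6 and 9 by 9 detectors differ only in the `size` argument.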
The V4 net and the V2 net are fully connected, which means each CA in the V2 net is connected with all CAs of the V4 net and vice versa. Learning is carried out between the V2 and V4 nets by learning the appropriate connections. The same vision topology was used within the V1A, V2 and V4 nets. In this topology, twenty percent of the neurons are inhibitory and eighty percent are excitatory, except in the V4 net, where there are inhibitory connections between CAs to promote a winner-takes-all situation, so that only one CA will eventually be on. Each inhibitory neuron in a CA of the V4 net is connected with 1154 neurons of other CAs, where each connection has a high synaptic strength of 30.

Table 1: Number of neurons firing in each V4 net CA during the testing phase

                            Pyramid  Stalactite  Square  Diamond  Right triangle
When pyramid is presented       490           0       0        0               0
When diamond is presented         0           0       0      489               0

Table 1 shows the result of a successful test when a pyramid and when a diamond is presented. When the pyramid and diamond shapes are presented during the test, the number of neurons firing in the corresponding CAs of the V4 net shows that specific CAs are committed to pyramid and diamond respectively, whereas the other CAs of the V4 net do not respond. The simulation is termed successful when all five CAs of the V4 net are committed to the five distinctive shapes, and the quality of the success is determined by how well the learned CAs respond when the different shapes are presented. To prevent the same shape from being committed to more than one CA of the V4 net, a winner-takes-all strategy is used within the V4 net. To promote this strategy, many inhibitory connections are used between the five different CAs of the V4 net, so that they compete with each other and only one of them wins.
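The winner-takes-all competition between the five V4 CAs can be sketched with a simple rate-style model. The real net uses spiking inhibitory neurons, so the inhibition strength, the small self-excitation term and the update rule below are illustrative assumptions.

```python
import numpy as np

SHAPES = ["pyramid", "stalactite", "diamond", "square", "right triangle"]

def winner_takes_all(activation, inhibition=0.4, steps=20):
    """Iterated mutual inhibition between the five V4 CAs (a sketch).
    Each CA is suppressed in proportion to the total activity of its
    rivals, while mildly exciting itself, until only one survives."""
    a = np.array(activation, dtype=float)
    for _ in range(steps):
        others = a.sum() - a                              # inhibition from rival CAs
        a = np.clip(a + 0.1 * a - inhibition * others, 0.0, 1.0)
    return SHAPES[int(np.argmax(a))]
```

Starting from unequal initial responses, the strongest CA drives the others to zero and keeps reverberating on its own, which is the behaviour the inhibitory inter-CA connections are there to produce.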
To prevent a CA from winning on two different shapes, another net, the VT net, is used. The VT net uses the same topology as the V1A, V2 and V4 nets and consists of five CAs, each of five hundred neurons. There are strong connections between each CA of the V4 net and the VT net and vice versa. The connections from the V4 net to the VT net are plastic, whereas the connections from the VT net to the V4 net are not. During learning, the connections between the V4 net CAs and the corresponding VT net CAs are adjusted and learned. The idea behind using the VT net is that its CAs help the corresponding CAs of the V4 net to compete with the other CAs of the V4 net. During the learning phase, the competing CAs of the V4 net also ignite the corresponding CAs of the VT net due to the strong connections between these CAs. These ignited VT CAs in return transfer energy back to the V4 net, as they also have strong connections to the corresponding CAs of the V4 net, thus helping them to compete. When one of the CAs of the V4 net wins, after competing with the other CAs of the V4 net, the connections between that CA and its corresponding VT net CA are weakened by reducing the weights on the connections where the pre and post synaptic neurons are co-firing. The next time this CA tries to compete, there will be less help from the corresponding VT net CA, so it will have less chance of winning.

Graph 1: Number of neurons firing in each V4 CA over simulation steps; a typical example of winner-takes-all, in this case won by the right angled triangle.

The training runs for 1250 cycles, where each of the five shapes is presented for 250 cycles. The connections between the V2 net CAs and the corresponding V4 net CAs are adjusted and learned using Hebbian learning. The learning is bi-directional, with weights on connections from the V2 net to the V4 net and from the V4 net to the V2 net being learned at the same time. The test runs for 2500 steps.
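The effect of the VT mechanism can be sketched as a greedy assignment: each V4 CA receives a top-down support term from its VT partner, and that support is reduced every time the CA wins, so a CA that has already committed to one shape is unlikely to capture a second. This abstracts away the spiking dynamics; the support values and decay factor are illustrative assumptions.

```python
def assign_shapes_to_cas(responses, support_decay=0.5):
    """Greedy sketch of the V4/VT handicap mechanism. `responses[s][c]`
    is how strongly shape s drives V4 CA c. Each CA starts with full VT
    support (1.0); whenever a CA wins a shape, its support is reduced,
    making a second win less likely. Returns the winning CA per shape."""
    n_cas = len(responses[0])
    support = [1.0] * n_cas
    assignment = []
    for resp in responses:
        scores = [r * s for r, s in zip(resp, support)]
        winner = scores.index(max(scores))
        support[winner] *= support_decay   # weaken VT help after a win
        assignment.append(winner)
    return assignment
```

For two similar shapes that both drive CA 0 most strongly (as with pyramid and stalactite), the first shape claims CA 0, after which the reduced support lets the second shape commit to a different CA.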
During the testing part of the simulation, shapes were presented in a random order. After presenting the complete set of five shapes in 1250 cycles, another set of shapes was presented, again in random order, for another 1250 cycles.

Results

The test is fully automated and runs without human intervention. The test was conducted 28 times. The result is calculated from the correct number of corresponding V4 net CAs firing when a shape is presented to the system. The success rate for the three shapes diamond, square and right angled triangle is 100 percent. Each of these shapes is committed to a different CA each time it is presented, and during testing each of the committed CAs responds correctly to the different shapes presented each time. Due to their very similar features, the pyramid and stalactite shapes do sometimes get committed wrongly, and the CA committed to whichever of these shapes is presented first sometimes also responds to the second shape presented afterwards. The success rate for the pyramid and stalactite shapes is 75 percent. One problem in the system is the detection of the correct first order features in the V1 net. Due to the slight differences in edges and angles between different instances of the same shape, edges and angles are sometimes detected wrongly. Another problem contributing to the wrong detection of edges and angles is the varying size of instances of the same shape, which affects the properties associated with a shape. For example, fewer pixels are associated with a shape when the instance is small, so more weight needs to be associated with such angles in order to make the corresponding neurons fire.

5 Conclusions and Future Work

The results obtained from the above experiments are promising. The label experiment learned the correct association between shapes and their corresponding labels in all 28 experiments that were conducted.
The label experiment is a small but important step towards the solution of the SGP. Labelling is an essential aspect of symbol grounding because it attaches symbols to sub-symbolic representations. The pronoun resolution experiment creates a dynamic association between ambiguous pronouns and shape categories. Pronoun resolution is not required to ground symbols, but the experiment shows one of the many benefits of symbol grounding. Though the model presented is not as complex as the biological brain, it has been shown that it can be used towards a solution of the SGP. The promising results of these experiments show that Hebbian learning can be used effectively to ground semantic symbols, and indicate that the model and the technique used can be applied to other aspects of symbol grounding. The main goal of this research is to develop an agent which can ground symbols and, by using those grounded symbols, can effectively perceive and interact with its surrounding environment. A future version of this agent will not only be able to learn and label new shapes, but will also be able to learn and label new symbols from what has already been learned and labelled. Other more demanding and difficult aspects of the SGP which need to be addressed in order to ground symbols include symbolic theft and functional symbol grounding. Symbolic theft is the evolution of new categories by breaking existing categories into more elementary categories, or by combining them, where possible. For example, by combining stripes with horse, a new category, zebra, can be created. Functional symbol grounding grounds a symbol according to the context in which it is used. The use of symbols is individual as well as domain and situation specific [23]. By using the functional approach towards the SGP, the usefulness and thus the accuracy of the system can be enhanced. Other useful steps include alignment and the use of environmental feedback.
Alignment is used to modify a symbol to cohere with the meaning of the symbol held by an experienced agent or a human. Environmental feedback is used to readjust an agent's already grounded symbols in response to the environmental feedback it receives; this includes feedback from the behaviour of other agents. None of the above mentioned aspects of the SGP is enough to ground symbols on its own: labelling is needed to attach any symbol to its semantics.

References

[1] C. Breazeal, "Sociable Machines: Expressive Social Exchange between Humans and Robots", Sc.D. dissertation, Department of Electrical Engineering and Computer Science, MIT (2000).
[2] A. Cangelosi, "Evolution of Communication and Language Using Signals, Symbols and Words", IEEE Transactions on Evolutionary Computation, 5, pp. 93-101, (2001).
[3] A. Cangelosi, A. Greco and S. Harnad, "From Robotic Toil to Symbolic Theft: Grounding Transfer from Entry-Level to Higher-Level Categories", Connection Science, 12, pp. 143-162, (2000).
[4] A. Cangelosi, A. Greco and S. Harnad, "Symbol Grounding and the Symbolic Theft Hypothesis", in Simulating the Evolution of Language, A. Cangelosi and D. Parisi, Eds., London, Springer, pp. 191-210, (2002).
[5] P. Davidsson, "Toward a General Solution to the SGP: Combining Machine Learning and Computer Vision", in AAAI Fall Symposium Series, Machine Learning in Computer Vision: What, Why and How?, pp. 157-161 (1993).
[6] S. Harnad, "The Symbol Grounding Problem", Physica D, pp. 335-346, (1990).
[7] S. Harnad, "Symbol Grounding is an Empirical Problem: Neural Nets are just a Candidate Component", in Proceedings of the Fifteenth Annual Meeting of the Cognitive Science Society, (1993).
[8] S. Harnad, "Grounding Symbols in the Analog World with Neural Nets: a Hybrid Model", Psychology, 12, pp. 12-78, (2001).
[9] D. Hebb, The Organization of Behavior, John Wiley and Sons, New York (1949).
[10] D. Hindle and M. Rooth,
"Structural Ambiguity and Lexical Relations", Meeting of the Association for Computational Linguistics (1993).
[11] C. Huyck, "Overlapping CAs from correlators", Neurocomputing, 56, pp. 435-439 (2004).
[12] C. Huyck, Developing Artificial Intelligence by Modelling the Brain (2005).
[13] C. Huyck, "Creating hierarchical categories using CAs", Connection Science (2007).
[14] C. Huyck and R. Bowles, "Spontaneous neural firing in biological and artificial neural systems", Journal of Cognitive Systems, 6:1, pp. 31-40 (2004).
[15] C. Huyck and V. Orengo, "Information retrieval and categorisation using a cell assembly network", Neural Computing and Applications (2005).
[16] C. Huyck, "CABot1: a Videogame Agent Implemented in fLIF Neurons", IEEE Systems, Man and Cybernetics Society, London, pp. 115-120 (2008).
[17] I. Kenny and C. Huyck, "An embodied conversational agent for interactive videogame environments", in Proceedings of the AISB'05 Symposium on Conversational Informatics for Supporting Social Intelligence and Interaction, pp. 58-63 (2005).
[18] R. Langacker, Foundations of Cognitive Grammar, Vol. 1, Stanford, CA, Stanford University Press (1987).
[19] J. Searle, "Minds, Brains, and Programs", Behavioral and Brain Sciences, 3, pp. 417-458, (1980).
[20] L. Steels, "The Symbol Grounding Problem has been solved. So what's next?", in Symbols, Embodiment and Meaning, Oxford University Press, (2007).
[21] M. Mayo, "Symbol Grounding and its Implications for Artificial Intelligence", in Twenty-Sixth Australasian Computer Science Conference, pp. 55-60 (2003).
[22] R. Sun, "Symbol Grounding: A New Look at an Old Idea", Philosophical Psychology, 13, pp. 149-172, (2000).
[23] M. Taddeo and L. Floridi, "Solving the Symbol Grounding Problem: a Critical Review of Fifteen Years of Research" (2003).
[24] T. Wennekers and G. Palm, "Cell Assemblies, Associative Memory and Temporal Structure in Brain Signals", in Time and the Brain: Conceptual Advances in Brain Research (Vol. 2), Harwood Academic Publishers, pp. 251-274, (2000).