Using Genetic Algorithms for Sparse Distributed Memory Initialization

Ashraf Anwar (1,2), Dipankar Dasgupta (3), Stan Franklin (1,2)
Institute for Intelligent Systems, The University of Memphis, Memphis, TN 38152
aanwar1@memphis.edu, dasgupta@msci.memphis.edu, stan.franklin@memphis.edu

(1) Supported in part by ONR grant N00014-98-1-0332
(2) Member of the Conscious Software Research Group
(3) Supported by a research grant from The University of Memphis, 1998

Abstract- In this paper, we describe the use of Genetic Algorithms (GA) to initialize the set of hard locations that makes up the storage space of Sparse Distributed Memory (SDM). SDM is an associative memory technique that uses binary spaces and relies on close memory items tending to cluster together, with some level of abstraction and with details blurring. An important factor in the physical implementation of SDM is how many hard locations are used, which greatly affects the memory capacity; it also depends on the dimension of the binary space used. For the SDM system to function appropriately, the hard locations should be uniformly distributed over the binary space. We represented a set of hard locations of SDM as population members, and employed a GA to search for the best (fittest) distribution of hard locations over the vast binary space. Fitness is based on how far each hard location is from all other hard locations, which measures the uniformity of the distribution. The preliminary results are very promising, with the GA significantly outperforming the random initialization used in most existing SDM implementations. This use of GA, which is similar to the Michigan approach, differs from the standard approach in that the object of the search is the entire population.

1 Introduction

Genetic Algorithms (GA) are finding an increasing number of applications in a variety of search, optimization, machine learning, and other problems (Dasgupta and McGregor 1992; Dasgupta and Michalewicz 1997; Davis 1991; Fan and Wang 1997). They are typically used when there is a large search space and no efficient algorithm for trimming or pruning it. Sparse Distributed Memory (SDM) is an associative memory model invented by Kanerva (1988a), which we consider a good model of human memory. SDM has a vast binary space of possible addresses in the semantic space, whereas for any practical application only a very small portion of this space can actually be realized. This makes SDM a perfect match for GA search. Having so many possibilities for the assignment of actual (hard) locations to addresses suggests the use of a GA. The GA proved far superior to the random initialization method that is widely used.

The only SDM implementation that used non-random initialization of hard locations is the GSDM work (Fan and Wang 1997), in which the locations were initialized based upon the data set to be stored. This can reduce the memory requirement of the implementation, while limiting generalization and learning capabilities as the memory becomes full. They used the mutation and crossover genetic operators to allow for change or noise in the assigned addresses, based upon the selected existing addresses, if any, until all available memory locations were assigned to the available input data set. Our approach is entirely different from, and independent of, theirs.
The idea here is to keep the general-purpose, huge-memory SDM as is, but to invoke a GA for a better distribution of the hard locations over the huge semantic space. This approach has two main advantages. It maintains the original functionality of SDM and, at the same time, allows for an unbiased distribution of the hard locations over the semantic space, which allows for a domain-independent, general-purpose associative memory and learning. There are also neural network implementations of SDM (Hely et al 1997; Joglekar 1989; Scott et al 1993), in which the weights of the hidden layer (the hard locations) are initialized randomly. Such random initialization of the hidden layer weights corresponds initially to the random initialization of hard locations in traditional SDM, but allows for dynamic address change as learning or training takes place, so that the hard locations come to correspond to the addresses of the training patterns.

SDM has proved successful in modeling associative memory (Anwar 1997; Hely 1994; Keeler 1987; Scott et al 1993). Associative memory is typically needed by any smart autonomous agent for various reasons (Glenberg 1997; Kosslyn 1992). Cognitive and conscious agents typically need such a memory (Franklin 1997; Franklin and Graesser 1997). Cmattie, a clerical software agent, uses associative memory to keep track of departmental seminar announcements (Bogner 1999; McCauley 1998; Ramamurthy et al 1998; Zhang et al 1998a;b). IDA, an Intelligent Distribution Agent, also needs such a memory to learn and keep associations between various pieces of information pertaining to the task of personnel distribution (Franklin et al 1998).

2 Sparse Distributed Memory

2.1 A Brief Tour of SDM

Sparse Distributed Memory, SDM, is the work of Pentti Kanerva (1988a;b; 1992). It gets its name from the sparse allocation of storage locations in the vast binary address space, and from the distributed nature of information storage and retrieval, as we explain later. For more details and discussion, see (Franklin 1995) for a brief overview, (Willshaw 1990) for a useful commentary, or (Anwar 1997, Chapter 2) for a more detailed summary of its mechanism. Many evaluations, extensions, and enhancements have been suggested for SDM (Evans and Surkan 1991; Karlsson 1995; Kristoferson 1995a;b; Rogers 1988b; Ryan and Andreae 1995).

The auto-associative version of SDM is truly an associative memory technique in which the contents and the addresses belong to the same space and are used interchangeably. The dimension of the space determines how rich each word is. Another important factor is how many real memory locations are in the space; we call these hard locations. Features are represented as one or more bits. Groups of features are concatenated to form a word, which becomes a candidate for writing into SDM. When writing, a copy of this binary string is placed in all close-enough hard locations. When reading, a cue reaches all close-enough hard locations and gets some sort of aggregate or averaged return from them. Reading is not always successful. Depending on the cue and the previously written information, among other factors, convergence or divergence may occur during a reading operation. If convergence occurs, the pooled word will be the closest match, possibly with some sort of abstraction, to the input reading cue. On the other hand, when divergence occurs, there is, in general, no relation between the input cue and what is retrieved from SDM.
2.2 SDM from the Inside Out

SDM is a content addressable memory that, in many ways, is ideal for use as a long-term associative memory. Content addressable means that items in memory can be retrieved by using part of their contents as a cue, rather than having to know their addresses in memory.

Boolean geometry is the geometry of Boolean spaces. A Boolean space is the set of all Boolean vectors (that is, vectors composed of zeros and ones) of some fixed length, n, called the dimension of the space. Points in Boolean space are Boolean vectors. The Boolean space of dimension n contains 2^n Boolean vectors, each of length n. The number of points increases exponentially as the dimension increases. Boolean geometry uses a metric called the Hamming distance, where the distance between two points is the number of coordinates at which they differ. Thus d((1,0,0,1,0), (1,0,1,1,1)) = 2. The distance between two points measures the similarity between two memory items, closer points being more similar. We may think of these Boolean vectors as feature vectors, where each feature can be only on, 1, or off, 0. Two such feature vectors are closer together if more of their features are the same.

For n = 1000, most of the space lies between distance 422 and distance 578 from a given vector. In other words, almost all of the space is far away from any given vector. A Boolean space implementation is typically sparsely populated, an important property for the construction of the model, and the source of part of its name. A sphere is the set of all points within some fixed distance, the radius, from its center. Spheres in Boolean space are quite different in one respect from the Euclidean spheres we are used to. Points of a Euclidean sphere are distributed uniformly throughout. For r = n/2, most of the points in a sphere in a Boolean space lie close to its boundary.

The SDM address space is enormous. It contains 2^n locations, where n is the dimension of the binary space. For n = 1000, one cannot hope for such a vast memory. On the other hand, thinking of feature vectors, a thousand features wouldn't deal with human visual input until a high level of abstraction had been reached. A dimension of 1000 may not be all that much; it may, for some purposes, be unrealistically small. Kanerva proposes to deal with this vast address space by choosing a uniform random sample of locations, of size 2^20, that is, about a million of them. These he calls hard locations. With 2^20 hard locations out of 2^1000 possible locations, the ratio is 2^-980, very sparse indeed. In addition, the distance from a random location in the entire address space to the nearest hard location will fall between 411 and 430 ninety-eight percent of the time, with the median distance being 424. The hard locations are certainly sparse.

SDM is distributed in that many hard locations participate in storing and retrieving each datum, and one hard location can be involved in the storage and retrieval of many data. This is a very different beast from the store-one-datum-in-one-location type of memory to which we are accustomed. Each hard location, itself a bit vector of length 1000, stores data in 1000 counters, each with range -40 to 40. We now have about a million hard locations, each with a thousand counters, totaling a billion counters in all. Numbers in the range -40 to 40 take most of a byte to store. Thus we are talking about a billion bytes, a gigabyte, of memory. Quite a lot, but not out of the question.
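Before turning to the counters, a small illustration may help make the Boolean-geometry claims above concrete. The following is a minimal Python sketch (the sample size and the use of pseudo-random vectors are our own illustrative choices, not part of Kanerva's model); it checks the worked Hamming-distance example and the observation that, for n = 1000, almost all of the space lies roughly between distance 422 and 578 from any given vector.

import random

def hamming(u, v):
    # Hamming distance: number of coordinates at which two Boolean vectors differ.
    return sum(a != b for a, b in zip(u, v))

# The worked example from the text: d((1,0,0,1,0), (1,0,1,1,1)) = 2.
print(hamming((1, 0, 0, 1, 0), (1, 0, 1, 1, 1)))   # -> 2

# For n = 1000, distances from a fixed random vector to other random vectors
# concentrate near n/2 = 500; almost all of the space is "far away".
n = 1000
fixed = [random.randint(0, 1) for _ in range(n)]
distances = [hamming(fixed, [random.randint(0, 1) for _ in range(n)])
             for _ in range(200)]
print(min(distances), max(distances))   # typically within roughly 422..578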
How do these counters work? Writing a 1 to a counter increments it; writing a 0 decrements it. A datum, ξ, to be written is a bit vector of length 1000. (Following Kanerva's convention, we use lower case Greek letters, here ξ (xi), η (eta), and ζ (zeta), for locations and for data, and lower case Roman letters for hard locations.) To write ξ at a given hard location x, write each coordinate of ξ to the corresponding counter in x, either incrementing it or decrementing it.

Call the sphere of radius 451 centered at a location ζ the access sphere of that location. An access sphere typically contains about a thousand hard locations, with the closest to ζ usually some 424 bits away and the median distance from ζ to the hard locations in its access sphere about 448. Any hard location in the access sphere of ζ is accessible from ζ. With this machinery in hand, we can now write distributively to any location, hard or not. To write a datum ξ to a location ζ, simply write ξ to each of the roughly one thousand hard locations accessible from ζ. This is distributed storage.

With our datum distributively stored, the next question is how to retrieve it. With this in mind, let's ask first how one reads from a single hard location, x. Compute η, the bit vector read at x, by assigning its ith bit the value 1 or 0 according as x's ith counter is positive or negative. Thus, each bit of η results from a majority rule decision over all the data that have been written to x. The read datum, η, is an archetype of the data that have been written to x, but may not be any one of them. From another point of view, η is the datum with smallest mean distance from all data that have been written to x.

Knowing how to read from a hard location allows us to read from any of the 2^1000 arbitrary locations. Suppose ζ is any location. The bit vector, η, to be read at ζ is formed by pooling the data read from each hard location accessible from ζ. Each bit of η results from a majority rule decision over the pooled data. Specifically, to get the ith bit of η, add together the ith bits of the data read from the hard locations accessible from ζ, and use half the number of such hard locations as a threshold. At or over threshold, assign a 1; below threshold, assign a 0. Put another way, pool the bit vectors read from the hard locations accessible from ζ, and let each of them vote on the ith bit of η.

We now know how to write items into memory, and how to read them out. But what is the relation between the datum in and the datum out? Are these two bit vectors the same, as we would hope? Let's first look at the special case where the datum ξ is written at the location ξ. This makes sense, since both are bit vectors of length one thousand. Kanerva offers a mathematical proof that reading from ξ recovers ξ (Kanerva 1988a). Here is the idea of the proof. Reading from ξ recovers archetypes from each of some thousand hard locations and takes a vote. The voting is influenced by the roughly one thousand stored copies of ξ and, typically, by about 10,000 other stored data items. Since the intersection of two access spheres is typically quite small, these other data items influence a given coordinate only in small groups of ones or twos or threes, i.e. white noise. The thousand copies drown out this slight noise, and ξ is successfully reconstructed. Not all of the stored items are needed to recover it. Iterated reading allows recovery when reading from a noisy version of what has been stored. Again, Kanerva offers conditions (involving how much of the stored item is available for the read) under which this is true, and a mathematical proof.

Reading with a cue that has never been written to SDM before gives, if convergent, the closest match stored in SDM to the input cue, with some sort of abstraction. Since a convergent sequence of iterates converges very rapidly, while a divergent sequence of iterates bounces about seemingly at random, comparison of adjacent items in the sequence quickly tells whether or not a sequence converges. Thus, this memory is content addressable, provided we write each datum with itself as address. The above discussion, based on the identity of datum and address, produces a content addressable memory with many pleasing properties. It works well for reconstructing individual memories.
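To make the reading and writing machinery concrete, here is a minimal Python sketch of the counter-based write and majority-rule read just described. It is only an illustration under scaled-down assumptions of our own choosing: a toy dimension of 256, 500 hard locations, and an access-sphere radius picked so that a reasonable fraction of those hard locations are accessible (Kanerva's n = 1000, radius 451, million-location memory is far sparser).

import random

n = 256          # toy dimension (Kanerva's example uses n = 1000)
num_hard = 500   # toy number of hard locations (Kanerva uses about a million)
radius = 120     # toy access-sphere radius (Kanerva uses 451 for n = 1000)

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def rand_vec():
    return [random.randint(0, 1) for _ in range(n)]

# Each hard location is a randomly chosen address plus n counters in [-40, 40].
hard_addresses = [rand_vec() for _ in range(num_hard)]
counters = [[0] * n for _ in range(num_hard)]

def accessible(location):
    # Indices of the hard locations inside the access sphere of `location`.
    return [i for i, a in enumerate(hard_addresses) if hamming(a, location) <= radius]

def write(location, datum):
    # Writing a 1 increments a counter, writing a 0 decrements it,
    # at every hard location accessible from `location`.
    for i in accessible(location):
        for j, bit in enumerate(datum):
            counters[i][j] = max(-40, min(40, counters[i][j] + (1 if bit else -1)))

def read(location):
    # Read each accessible hard location by the sign of its counters,
    # then pool those bit vectors with a majority vote per coordinate.
    idx = accessible(location)
    threshold = len(idx) / 2.0
    return [1 if sum(1 for i in idx if counters[i][j] > 0) >= threshold else 0
            for j in range(n)]

# Auto-associative use: write a datum with itself as the address,
# then read with a noisy cue; iterated reading cleans up the result further.
xi = rand_vec()
write(xi, xi)
cue = list(xi)
for j in random.sample(range(n), 10):   # flip a few bits of the cue as noise
    cue[j] ^= 1
print(hamming(read(cue), xi))           # small, usually 0, when reading converges
print(hamming(read(read(cue)), xi))     # iterated read from the first result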
3 The GA Design and Implementation

3.1 Population Formation

Binary encoding was used, since we are dealing with binary spaces. For various values of the chromosome length (the binary space dimension), the population size (the number of hard locations) was determined. Note that the number of hard locations employed in a given SDM greatly affects the capacity of the resulting associative memory; it is, after all, a design decision. In our experiments, we took the size of our storage memory (the number of hard locations) to be the same as the dimension (the chromosome length), resulting in a non-linear occupation ratio. This makes the storage size equal to the binary log of the vast semantic address space. For n = 100, we have 2^100 possible addresses over the semantic space, out of which we assign only 100 hard locations. This differs from the occupation ratios employed in other SDM implementations, but again, it is after all a design decision based mainly on the memory capacity required.

For n = 10, 1110011000 is an example of an address or a hard location to be assigned. If we were to have only two hard locations in such a case, then the best two assigned addresses would be at a Hamming distance of 10 from each other, e.g. 1111111111 and 0000000000. The best population will have individuals that are, on average, as far apart from each other as possible, with a small enough standard deviation. It also helps to have each pair of individuals at least a certain threshold apart.

The size of the search space is clearly 2^n. It is immediately apparent how such a space becomes too huge for any traditional search technique as n becomes large. Even a traditional heuristic search technique will find it too hard to arrive at a good coverage of the space in an efficient manner. We have to consider the enormous size of the space before we can even dream of spanning the entire space by any means. A GA, however, can play the role of a good heuristic, picking representatives out of vast regions of the search space. This is similar in some sense to the basic functionality of SDM when reading or writing, since we consider only the actual sparse hard locations existing in access spheres (see Section 2.2 above).

3.2 The Fitness Measure

The fitness function for an individual m is given by:

fitness(m) = Σ_{n ≠ m} HammingDistance(m, n)

In other words, it is the sum of the Hamming distances between that individual and all other individuals in the population, subject to two constraints:
i. The individual is unique in the population (no repetition). This is achieved by checking each new child to be added to a generation against previous members of the generation. If the child is not unique, it is discarded, and the genetic operators are applied again to produce another, unique child.

ii. The individual is at least k bits apart from all other individuals (a threshold on the Hamming distance, with k > 0). Similar precautions are taken to preserve this condition, just as in (i). Note that the threshold is typically not high, in order to avoid unnecessary reiteration.

So, the target is to maximize the mean fitness per generation subject to the above two constraints, while preserving a reasonably low standard deviation. Note that our goal is a whole population, rather than a particularly fit individual, as is usually the case. Actually, there is a whole spectrum between the two extremes of group fitness, the Michigan approach, and individual fitness, the Pittsburgh approach (Grefenstette 1987; 1988; 1994; Schaffer and Grefenstette 1985). Our group fitness measure is similar to the Michigan approach.

3.3 The Genetic Operators

Many parameter tunings were considered and compared; the following set of parameters tended to produce the best results among all sets considered, and more testing and experimenting with different GA parameter values is being done. Roulette selection (Davis 1991; Goldenberg 1989) was used, where individuals are selected in proportion to their fitness. Ranked replacement was used, in which the worse parent is always the one replaced, and only if the child is fitter than that parent. Elitism was employed to keep the fittest individual. Invert mutation with a 2% mutation rate was used, along with one-point crossover at a rate of 1.0. The above operators can never span the entire search space for large values of n (the space dimension), due to the exponential nature of the space, which becomes too large for any computational algorithm to cover as n increases. However, it is reasonable to assume that they will reach enough representatives of the various regions of the space.

3.4 Implementation

The GA software package Sugal version 2.1 was used under Solaris UNIX on a Sparc 5 platform. We had to modify the fitness function and replacement strategies in the source code to suit our complex group fitness function and the constraints imposed on the GA search. The fitness function is now a function of the entire generation, not of a single individual. The replacement strategy was also modified to observe the constraints mentioned in Section 3.2.
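As a companion to the design described in Sections 3.1-3.3, the following Python sketch shows one way the group-fitness bookkeeping and the constrained replacement can fit together. It is a simplified illustration under our own assumptions (a small n, a threshold k = 1, a fixed number of steps, and a hand-rolled loop rather than the modified Sugal 2.1 code actually used), not a reproduction of our experimental setup.

import random

# Toy GA for distributing hard locations, along the lines of Sections 3.1-3.3.
n = 20            # space dimension; also the number of hard locations (Section 3.1)
POP = n           # population size = number of hard locations
K = 1             # minimum pairwise Hamming distance threshold, k > 0 (constraint ii)
MUT_RATE = 0.02   # invert (bit-flip) mutation rate per bit
STEPS = 5000      # number of child productions attempted

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def fitness(ind, pop):
    # Group-style fitness: sum of Hamming distances to all other population members.
    return sum(hamming(ind, other) for other in pop if other is not ind)

def valid(child, pop):
    # Constraints (i) and (ii): unique, and at least K bits from every other member.
    return all(child != p and hamming(child, p) >= K for p in pop)

def roulette(fits):
    # Select an index with probability proportional to fitness.
    r = random.uniform(0, sum(fits))
    acc = 0.0
    for i, f in enumerate(fits):
        acc += f
        if acc >= r:
            return i
    return len(fits) - 1

# Constrained random initial population.
population = []
while len(population) < POP:
    cand = [random.randint(0, 1) for _ in range(n)]
    if valid(cand, population):
        population.append(cand)

for _ in range(STEPS):
    fits = [fitness(ind, population) for ind in population]
    i, j = roulette(fits), roulette(fits)
    cut = random.randint(1, n - 1)                       # one-point crossover, rate 1.0
    child = population[i][:cut] + population[j][cut:]
    child = [b ^ 1 if random.random() < MUT_RATE else b  # invert mutation at 2%
             for b in child]
    if not valid(child, population):
        continue                                         # discard, try again (constraint i)
    worse = i if fits[i] <= fits[j] else j               # replace the worse parent only...
    if fits[worse] == max(fits):
        continue                                         # ...never the elite individual
    child_fit = sum(hamming(child, p)
                    for idx, p in enumerate(population) if idx != worse)
    if child_fit > fits[worse]:                          # ...and only if the child is fitter
        population[worse] = child

fits = [fitness(ind, population) for ind in population]
mean = sum(fits) / POP
sd = (sum((f - mean) ** 2 for f in fits) / POP) ** 0.5
print(mean, sd)   # target mean fitness is about n*n/2, with a small standard deviation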
4 Results

When n is both the space dimension and the number of hard locations, it can be shown that the optimal mean fitness is approximately n^2/2. (For n = 2, 00 and 11 are optimal, with a mean fitness of 2. For n = 3, 000, 011, and 101 are optimal, with a mean fitness of 4. For n = 4, 0000, 1111, 1010, and 0101 are optimal, with a mean fitness of 8.) Apparently, as long as that condition holds, and for small values of n, we may be able to use a deterministic algorithm. But as n becomes large, and the number of hard locations increases significantly over n, the need for a heuristic like the GA arises.

Table 1 shows the improvement the GA has over random initialization in terms of the average run time, taken over 20 sample runs, required to reach the target mean fitness with a specific standard deviation (SD). The times shown in the table are in the format "h:mm:ss.dd". Each set of two rows gives the results of GA vs. random initialization for 6%, 4%, and 2% standard deviation from the target mean fitness, for a certain number of hard locations. The second row in each set explicitly gives the improvement gained by the GA as a percentage.

The target is to reach a certain group fitness with a limit on the standard deviation of the individuals. The lower the standard deviation, the better the resulting population, and the harder the task. Note that for a non-significant SD, both methods reach the target almost instantly; in this case the random approach may perform as well as the GA, or on rare occasions even slightly better. When tighter constraints are in place, the GA significantly outperforms random initialization. The improvement can be as high as about 41,000 times faster, or 4,114,200%. The non-monotonic improvement of the GA may be due in part to the huge size of the search space, which, for large values of n, is never entirely spanned by any means.

No. of Hard Locations | Random 6% SD | GA 6% SD   | Random 4% SD | GA 4% SD   | Random 2% SD | GA 2% SD
10                    | 0:00:00.09   | 0:00:00.01 | 0:00:07.00   | 0:00:00.14 | 0:03:16.80   | 0:00:01.83
   GA Gain            |              | 800%       |              | 4,900%     |              | 10,654%
20                    | 0:00:00.01   | 0:00:00.01 | 0:00:04.60   | 0:00:00.03 | 6:34:11.10   | 0:00:04.60
   GA Gain            |              | 0%         |              | 15,233%    |              | 514,054%
30                    | 0:00:00.10   | 0:00:00.07 | 0:00:00.80   | 0:00:00.04 | 6:57:08.65   | 0:00:04.30
   GA Gain            |              | 43%        |              | 2,000%     |              | 581,962%
40                    | 0:00:00.09   | 0:00:00.10 | 0:00:02.50   | 0:00:00.10 | 8:00:00.10   | 0:00:00.70
   GA Gain            |              | -10%       |              | 2,400%     |              | 4,114,200%
50                    | 0:00:00.40   | 0:00:00.20 | 0:00:02.10   | 0:00:00.40 | 8:41:03.76   | 0:00:01.60
   GA Gain            |              | 100%       |              | 425%       |              | 1,953,885%

Table 1: Average Runtimes for GA Initialization compared to Random Initialization.

Note the increased time requirements of both the GA and the random approach as the problem size (dimension and number of hard locations) increases, which was to be expected. Apparently, for large dimensions, random initialization can match the GA only if the space is very sparsely occupied. This explains why existing SDM applications have been able to manage with random initialization so far. By using our approach to remove this limitation, we can deploy SDM with small dimensions and a relatively large number of evenly distributed hard locations.
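To make the Gain rows concrete, our reading of how each percentage is computed (the formula is not stated explicitly in the text, but the table's entries are consistent with it) is:

Gain = (T_Random - T_GA) / T_GA x 100%

For example, for 40 hard locations at 2% SD, T_Random = 8:00:00.10 = 28,800.10 seconds and T_GA = 0:00:00.70 = 0.70 seconds, so Gain = (28,800.10 - 0.70) / 0.70 x 100% ≈ 4,114,200%, i.e. the GA is roughly 41,000 times faster, as noted above.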
5 Discussion

The advantage of our approach is immediately obvious. To our knowledge, this is the first use of a GA for such a search problem. Random initialization was the approach used in most SDM implementations (Anwar 1997; Hely et al 1997; Hong and Chen 1991; Rao and Fuentes 1996; 1998; Surkan 1992). It was also the original method suggested by Kanerva (Kanerva 1988a) and discussed elsewhere (Denning 1989; Franklin 1995; Willshaw 1990). A few neural network implementations of SDM have also been used (Hely et al 1997; Joglekar 1989; Scott et al 1993), with link weights randomly initialized to small values, followed by incremental correction using standard backpropagation (Haykin 1994).

Practical implementations of SDM are not rare. They vary from traditional implementations (Anwar 1997; Manevitz and Zemach 1997; Rogers 1989b), to a parallel tree-structured realization (Hamalainen et al 1997), to Connection Machine implementations (Rogers 1988a; 1989a; Turk 1995; Turk and Gorz 1994), to neural network simulations (Scott et al 1993). Keeler made a comparison between SDM and Hopfield-type neural networks (Keeler 1988). Jaeckel also proposed design variations for SDM (Jaeckel 1989a;b).

6 Future Research

The use of Parallel Genetic Algorithms (PGA) or Structured Genetic Algorithms (sGA) to divide the search space more efficiently, with migration between subspaces, will be considered and evaluated. The use of a multi-modal niching GA to evolve independent subpopulations, out of each of which a candidate is picked for hard-location assignment, is currently under investigation. Also, more tuning of the current GA parameters may well help obtain even better results. Note that the straightforward mathematical algorithm for allocating an arbitrary number of hard locations in a binary space of large dimension, while observing the maximum Hamming distance constraint, has exponential time complexity. An SDM augmented with a GA for the assignment of the addresses, as illustrated here, or with a GA that reduces the memory requirement by assigning addresses based on the available inputs (Fan and Wang 1997), may well be a key to a better understanding and simulation of the human mind.

7 Conclusions

The GA significantly outperforms the random initialization of Sparse Distributed Memory in almost all cases. The advantage is more apparent when all the hard locations are required to have a minimum fitness that is close enough to the target mean fitness. We emphasize that we are dealing with group fitness, not individual fitness: the entire generation is considered as a whole both when measuring fitness and when applying the stopping criterion. We would like to point out that this approach of group fitness, which is similar to the Michigan approach, is indeed different from the common practice in GA, or the Pittsburgh approach. The use of a GA for the allocation of hard locations in SDM proves to be very efficient and far superior to currently used techniques.

Bibliography

Anwar, Ashraf (1997) TLCA: Traffic Light Control Agent. Master's Thesis, The University of Memphis, TN, USA, Dec 1997.
Bogner, Myles, Ramamurthy, Uma, and Franklin, Stan (In Press) Consciousness and Conceptual Learning in a Socially Situated Agent.
Brown, R. (1987) Two Demonstrations and a Simulator for a Sparse Distributed Memory. RIACS-TR-87.17, NASA Ames Research Center, Moffett Field, CA, USA.
Chou, P. (1989) The Capacity of the Kanerva Associative Memory is Exponential. Stanford University, CA, USA.
Danforth, D. (1990) An Empirical Investigation of Sparse Distributed Memory using Discrete Speech Recognition. Proceedings of the International Neural Networks Conference, Paris 1990, v 1. Kluwer Academic Press, Norwell, USA, p 183-6.
Dasgupta, D. and McGregor, D. R. (1992) Designing Application-Specific Neural Networks using the Structured Genetic Algorithm. In Proceedings of the International Workshop on Combination of Genetic Algorithms and Neural Networks (COGANN-92), p 87-96, IEEE Computer Society Press.
Dasgupta, D. and Michalewicz, Z. (1997) Evolutionary Algorithms in Engineering Applications. Springer.
Davis, L. (1991) Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York, first edition.
Denning, Peter (1989) Sparse Distributed Memory. American Scientist, Jul/Aug 1989, v 77, p 333.
Evans, Richard, and Surkan, Alvin (1991) Relating Number of Processing Elements in a Sparse Distributed Memory Model to Learning Rate and Generalization. APL Quote Quad, Aug 1991, v 21 n 4, p 166.
Fan, K., and Wang, Y. (1992) Sparse distributed memory for use in recognizing handwritten character. Proceedings of the International Conference on Computer Processing of Chinese and Oriental Languages, 1992, p 412-6.
Fan, K., and Wang, Y. (1997) A genetic sparse distributed memory approach to the application of handwritten character recognition. Pattern Recognition, 1997, v 30 n 12, p 2015.
Flynn, M., Kanerva, Pentti, Ahanin, B., et al (1987) Sparse Distributed Memory Prototype: Principles of Operation. CSL-TR-87.338, Computer Systems Lab, Stanford University, CA, USA.
Franklin, Stan (1995) Artificial Minds. MIT Press.
Franklin, Stan (1997) Autonomous Agents as Embodied AI. Cybernetics and Systems, special issue on Epistemological Issues in Embedded AI.
Franklin, Stan, and Graesser, Art (1997) Is it an Agent, or just a Program? A Taxonomy for Autonomous Agents. Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages, published as Intelligent Agents III, Springer-Verlag, p 21-35.
Franklin, Stan, Kelemen, Arpad, and McCauley, Lee (1998) IDA: A Cognitive Agent Architecture. IEEE Transactions on Systems, Man, and Cybernetics, 1998.
Glenberg, Arthur M. (1997) What Memory is For? Behavioral and Brain Sciences.
Goldenberg, David E. (1989) Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Inc.
Grefenstette, John J. (1987) Multilevel Credit Assignment in a Genetic Learning System. Proceedings of the 2nd International Conference on Genetic Algorithms and their Applications (ICGA87), Cambridge, MA, USA, Jul 1987. Lawrence Erlbaum Associates, 1987, p 202-9.
Grefenstette, John J. (1988) Credit Assignment in Genetic Learning Systems. Proceedings of the 7th National Conference on Artificial Intelligence (AAAI88), p 596-600.
Grefenstette, John J. (1994) Predictive Models Using Fitness Distributions of Genetic Operators. Proceedings of the 3rd Workshop on Foundations of Genetic Algorithms (FOGA94), Estes Park, CO, USA, 1994, p 139-62.
Hamalainen, T., Klapuri, H., and Kaski, K. (1997) Parallel realizations of Kanerva's Sparse Distributed Memory in a tree-shaped computer. Concurrency: Practice and Experience, Sep 1997, v 9 n 9, p 877.
Hely, T. (1994) The Sparse Distributed Memory: A Neurobiologically Plausible Memory Model? Master's Thesis, Dept. of Artificial Intelligence, Edinburgh University.
Hely, T., Willshaw, D. J., and Hayes, G. M. (1997) A New Approach to Kanerva's Sparse Distributed Memory. IEEE Transactions on Neural Networks, May 1997, v 8 n 3, p 791.
Hong, H., and Chen, S. (1991) Character Recognition in a Sparse Distributed Memory. IEEE Transactions on Systems, Man, and Cybernetics, May 1991, v 21 n 3, p 674.
Jaeckel, L. (1989a) A Class of Designs for a Sparse Distributed Memory. RIACS-TR-89.30, NASA Ames Research Center, Moffett Field, CA, USA.
Jaeckel, L. (1989b) An Alternative Design for a Sparse Distributed Memory. RIACS-TR-89.28, NASA Ames Research Center, Moffett Field, CA, USA.
Joglekar, U. (1989) Learning to read aloud: a neural network approach using Sparse Distributed Memory. RIACS-TR-89.27, NASA Ames Research Center, Moffett Field, CA, USA.
Kanerva, Pentti (1988a) Sparse Distributed Memory. MIT Press.
Kanerva, Pentti (1988b) The Organization of an Autonomous Learning System. RIACS-TR-88, NASA Ames Research Center, Moffett Field, CA, USA.
Kanerva, Pentti (1989) A Cerebellar-Model Associative Memory as a Generalized Random Access Memory. RIACS-TR-89.11, NASA Ames Research Center, Moffett Field, CA, USA.
Kanerva, Pentti (1992) Sparse Distributed Memory and Related Models. RIACS-TR-92.10, NASA Ames Research Center, Moffett Field, CA, USA.
Kanerva, Pentti (1993) Sparse Distributed Memory and Related Models. In Associative Neural Memories: Theory and Implementation (Ed. Hassoun, M.), Oxford University Press, Oxford, UK, p 51-75.
Karlsson, Roland (1995) Evaluation of a Fast Activation Mechanism for the Kanerva SDM. RWCP Neuro SICS Laboratory.
Keeler, J. (1987) Capacity for patterns and sequences in Kanerva's SDM as compared to other associative memory models. RIACS-TR-87.29, NASA Ames Research Center, Moffett Field, CA, USA.
Keeler, J. (1988) Comparison between Kanerva's SDM and Hopfield-type neural network. Cognitive Science 12, 1988, p 299-329.
Kosslyn, Stephen M., and Koenig, Olivier (1992) Wet Mind. Macmillan Inc.
Kristoferson, Jan (1995a) Best Probability of Activation and Performance Comparisons for Several Designs of SDM. RWCP Neuro SICS Laboratory.
Kristoferson, Jan (1995b) Some Comments on the Information Stored in SDM. RWCP Neuro SICS Laboratory.
Lindell, M., Saarinen, Jukka, Kaski, K., and Kanerva, Pentti (1991) Highly Parallel Realization of Sparse Distributed Memory System. Proceedings of the 6th Distributed Memory Computing Conference, Apr 1991, Portland, OR, USA, p 686-93.
Manevitz, L. M., and Zemach, Y. (1997) Assigning meaning to data: using sparse distributed memory for multilevel cognitive tasks. Neurocomputing, Jan 1997, v 14 n 1, p 15.
McCauley, Thomas L., and Franklin, Stan (1998) An Architecture for Emotion. AAAI Fall Symposium "Emotional and Intelligent: The Tangled Knot of Cognition".
Pohja, S., and Kaski, K. (1992) Kanerva's Sparse Distributed Memory with Multiple Hamming Thresholds. RIACS-TR-92.6, NASA Ames Research Center, Moffett Field, CA, USA.
Prager, R. W., and Fallside, F. (1989) The Modified Kanerva Model for Automatic Speech Recognition. Computer Speech and Language, v 3, p 61-81.
Ramamurthy, Uma, Bogner, Myles, and Franklin, Stan (1998) Conscious Learning in an Adaptive Software Agent. Proceedings of the 5th International Conference on Simulation of Adaptive Behavior, From Animals to Animats V.
Rao, Rajesh P. N., and Fuentes, Olac (1996) Learning Navigational Behaviors using a Predictive Sparse Distributed Memory. Proceedings of the 4th International Conference on Simulation of Adaptive Behavior, From Animals to Animats IV, p 382.
Rao, Rajesh P. N., and Fuentes, Olac (1998) Hierarchical Learning of Navigation Behaviors in an Autonomous Robot using a Predictive Sparse Distributed Memory. Machine Learning, Apr 1998, v 31 n 1/3, p 87.
Rogers, D. (1988a) Kanerva's Sparse Distributed Memory: An Associative Memory Algorithm Well-Suited to the Connection Machine. RIACS-TR-88.32, NASA Ames Research Center, Moffett Field, CA, USA.
Rogers, D. (1988b) Using data tagging to improve the performance of Kanerva's Sparse Distributed Memory. RIACS-TR-88.1, NASA Ames Research Center, Moffett Field, CA, USA.
Rogers, D. (1989a) Kanerva's Sparse Distributed Memory: An Associative Memory Algorithm Well-Suited to the Connection Machine. International Journal of High Speed Computing, p 349.
Rogers, D. (1989b) Statistical Prediction with Kanerva's Sparse Distributed Memory. RIACS-TR-89.02, NASA Ames Research Center, Moffett Field, CA, USA.
Ryan, S., and Andreae, J. (1995) Improving the Performance of Kanerva's Associative Memory. IEEE Transactions on Neural Networks 6-1, p 125.
Saarinen, Jukka, Kotilainen, Petri, and Kaski, Kimmo (1991) VLSI Architecture of Sparse Distributed Memory. Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS91), Jun 1991, v 5, p 3074-7.
Saarinen, Jukka, Pohja, S., and Kaski, K. (1991) Self-Organization with Kanerva's Sparse Distributed Memory. Proceedings of the International Conference on Artificial Neural Networks (ICANN91), Helsinki, Finland, v 1, p 285-90.
Schaffer, J. David, and Grefenstette, John J. (1985) Multi-Objective Learning via Genetic Algorithms. Proceedings of the 9th International Joint Conference on Artificial Intelligence (IJCAI85), p 593-5.
Scott, E., Fuller, C., and O'Brien, W. (1993) Sparse Distributed Associative Memory for the Identification of Aerospace Acoustic Sources. AIAA Journal, Sep 1993, v 31 n 9, p 1583.
Surkan, Alvin (1992) WSDM: Weighted Sparse Distributed Memory Prototype Expressed in APL. APL Quote Quad, Jul 1992, v 23 n 1, p 235.
Turk, Andreas, and Gorz, Gunther (1994) Kanerva's Sparse Distributed Memory: An Object-Oriented Implementation on the Connection Machine. Report on a research project at the University of Erlangen-Nurnberg, Germany.
Turk, Andreas (1995) CM-SDM Users Manual. IMMD 8, University of Erlangen-Nurnberg, Germany.
Willshaw, David (1990) Coded Memories. A Commentary on 'Sparse Distributed Memory' by Pentti Kanerva. Cognitive Neuropsychology, v 7 n 3, p 245.
Zhang, Zhaoua, Franklin, Stan, and Dasgupta, Dipankar (1998a) Metacognition in Software Agents using Classifier Systems. Proc. AAAI 1998.
Zhang, Zhaoua, Franklin, Stan, Olde, Brent, Wan, Yun, and Graesser, Art (1998b) Natural Language Sensing for Autonomous Agents. Proc. IEEE Joint Symposia on Intelligence and Systems, Rockville, Maryland, p 374-81.