Using Genetic Algorithms for Sparse Distributed Memory Initialization
Ashraf Anwar (1,2), aanwar1@memphis.edu
Dipankar Dasgupta (3), dasgupta@msci.memphis.edu
Stan Franklin (1,2), stan.franklin@memphis.edu
Institute for Intelligent Systems, The University of Memphis, Memphis, TN 38152
(1) Supported in part by ONR grant N00014-98-1-0332
(2) Member of the Conscious Software Research Group
(3) A research grant supported by The University of Memphis, 1998
Abstract- In this paper, we describe the use of Genetic
Algorithms to initialize a set of hard locations that
makes up the storage space of Sparse Distributed
Memory (SDM). SDM is an associative memory
technique that uses binary spaces, and relies on close
memory items tending to be clustered together, with
some level of abstraction and with details blurring. An
important factor in the physical implementation of SDM
is how many hard locations are used, which greatly
affects the memory capacity. It also depends to some
extent on the dimension of the binary space used. For
the SDM system to function appropriately, the hard
locations should be uniformly distributed over the
binary space. We represented a set of hard locations of
SDM as population members, and employed GA to
search for the best (fittest) distribution of hard locations
over the vast binary space. Fitness is based on how far
each hard location is from all other hard locations,
which measures the uniformity of the distribution. The
preliminary results are very promising, with the GA
significantly outperforming random initialization used
in most existing SDM implementations. This use of GA,
which is similar to the Michigan approach, differs from
the standard approach in that the object of the search is
the entire population.
1 Introduction
Genetic Algorithms (GA) are finding an increasing number of
applications in a variety of search, optimization, machine
learning and other problems (Dasgupta and McGregor 1992;
Dasgupta and Michalewicz 1997; Davis 1991; Fan and
Wang 1997). They are usually used whenever there is a
large search space and no efficient algorithm for trimming
or pruning it. Sparse Distributed Memory (SDM) is an
associative memory model invented by Kanerva (1988a),
which we think is a good model for the human memory.
SDM has a vast binary space of possible addresses in the
semantic space, whereas for any practical application only a
very small portion of this space can actually exist. This
makes SDM a perfect match for GA search. Having so
many possibilities for assignment of actual (hard) locations
to addresses suggests the use of GA. GA proved to be far
superior to the random initialization method that is
widely used.
The only SDM implementation that used non-random
initialization of hard locations is the GSDM work (Fan and
Wang 1997), in which the locations were initialized based
upon the data set to be stored. This can serve to reduce the
memory requirement of the implementation, while limiting
generalization and learning capabilities as the memory
becomes full. They used the mutation and crossover genetic
operators to allow for change or noise in the assigned
addresses, based upon the selected existing addresses, if
any, until all available memory locations are assigned to
the available input data set.
Our approach is totally different, and independent of
theirs. The idea here is to keep the general purpose huge
memory SDM as is, but to invoke GA for better distribution
of the hard locations over the huge semantic space. This
approach has two main advantages. It maintains the original
functionality of SDM and, at the same time, allows for an
unbiased distribution of the hard locations over the semantic
space, which yields a domain-independent, general
associative memory and learning.
There are also neural network implementations of SDM
(Hely et al 1997; Joglekar 1989; Scott et al 1993), in which
weights for the hidden layer (the hard locations) are
initialized randomly. Such random initialization of the
hidden layer weights corresponds initially to the random
initialization of hard locations in traditional SDM, but
allows for dynamic address change as learning or training
takes place so that the hard locations correspond to the
addresses of the training patterns.
SDM has proved successful in modeling associative
memory (Anwar 1997; Hely 1994; Keeler 1987; Scott
1993). Associative memory is typically needed for any
smart autonomous agent for various reasons (Glenberg
1997; Kosslyn 1992). Cognitive and Conscious agents
typically need such a memory (Franklin 1997; Franklin and
Graesser 1997). Cmattie, a clerical software agent, uses
associative memory to keep track of departmental seminar
announcements (Bogner 1999; McCauley 1998;
Ramamurthy et al 1998; Zhang et al 1998a;b). IDA,
Intelligent Distribution Agent, also needs such a memory to
learn and keep associations between various information
pieces pertaining to the task of personnel distribution
(Franklin et al 1998).
2 Sparse Distributed Memory
2.1 A Brief Tour with SDM
Sparse Distributed Memory, SDM, is the work of Pentti
Kanerva (1988a;b; 1992) which gets its name from the
sparse allocation of storage locations in the vast binary
address space, and the distributed nature of information
storage and retrieval as we will explain later. For more
details and discussions, see (Franklin 1995) for a brief
overview, (Willshaw 1990) for a useful commentary, or
(Anwar 1997, Chapter 2) for a more detailed summary of its
mechanism.
Many evaluations, extensions, and enhancements have been
suggested for SDM (Evans and Surkan 1991; Karlsson 1995;
Kristoferson 1995a;b; Rogers 1988b; Ryan and Andreae 1995).
The auto-associative version of SDM is truly an
associative memory technique where the contents and the
addresses belong to the same space and are used
interchangeably. The dimension of the space determines how
rich each word is. Another important factor is how many
real memory locations are in the space. We call these hard
locations. Features are represented as one or more bits.
Groups of features are concatenated to form a word, which
becomes a candidate for writing into SDM. When writing, a
copy of this binary string is placed in all close-enough hard
locations. When reading, a cue would reach all close enough
hard locations and get some sort of aggregate or average
return from them. Reading is not always successful.
Depending on the cue and the previously written
information, among other factors, convergence or
divergence during a reading operation may occur. If
convergence occurs, the pooled word will be the closest
match, possibly with some sort of abstraction, of the input
reading cue. On the other hand, when divergence occurs,
there is no relation, in general, between the input cue and
what is retrieved from SDM.
2.2 SDM from the Inside Out
SDM is a content addressable memory that, in many ways,
is ideal for use as a long-term associative memory. Content
addressable means that items in memory can be retrieved by
using part of their contents as a cue, rather than having to
know their addresses in memory.
Boolean geometry is the geometry of Boolean spaces. A
Boolean space is the set of all Boolean vectors (that is,
vectors composed of zeros and ones) of some fixed length,
n, called the dimension of the space. Points in Boolean
space are Boolean vectors. The Boolean space of dimension
n contains 2^n Boolean vectors, each of length n. The
number of points increases exponentially as the dimension
increases. Boolean geometry uses a metric called the
Hamming distance, where the distance between two points
is the number of coordinates at which they differ. Thus
d((1,0,0,1,0), (1,0,1,1,1)) = 2. The distance between two
points will measure the similarity between two memory
items, closer points being more similar. We may think of
these Boolean vectors as feature vectors, where each feature
can be only on, 1, or off, 0. Two such feature vectors are
closer together if more of their features are the same. For n
= 1000, most of the space lies between distance 422 and
distance 578 from a given vector. In other words, almost all
the space is far away from any given vector. A Boolean
space implementation is typically sparsely populated, an
important property for the construction of the model, and
the source of part of its name.
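As a concrete, minimal illustration (not part of the original work; the helper name hamming and the sample sizes are illustrative choices), the Hamming distance and the concentration of distances around n/2 can be checked directly in Python:

    import random

    def hamming(u, v):
        # Hamming distance: the number of coordinates at which two bit vectors differ.
        return sum(a != b for a, b in zip(u, v))

    # The example from the text: d((1,0,0,1,0), (1,0,1,1,1)) = 2
    print(hamming((1, 0, 0, 1, 0), (1, 0, 1, 1, 1)))

    # For n = 1000, almost all randomly chosen vectors lie roughly between
    # distance 422 and 578 from any fixed reference vector.
    n = 1000
    ref = [random.getrandbits(1) for _ in range(n)]
    dists = [hamming(ref, [random.getrandbits(1) for _ in range(n)]) for _ in range(2000)]
    print(sum(422 <= d <= 578 for d in dists) / len(dists))   # close to 1.0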
A sphere is the set of all points within some fixed
distance, the radius, from its center. Spheres in Boolean
space are quite different in one respect from the Euclidean
spheres we are used to. Points of a Euclidean sphere are
uniformly distributed throughout. For r = n/2, most of the
points in a sphere in a Boolean space lie close to its
boundary.
SDM address space is enormous. It contains 2^n
locations, where n is the dimension of the binary space. For
n=1000, one cannot hope for such a vast memory. On the
other hand, thinking of feature vectors, a thousand features
wouldn't deal with human visual input until a high level of
abstraction had been reached. A dimension of 1000 may not
be all that much; it may, for some purposes, be
unrealistically small. Kanerva proposes to deal with this vast
address space by choosing a uniform random sample, size
2^20, of locations, that is, about a million of them. These he
calls hard locations. With 2^20 hard locations out of 2^1000
possible locations, the ratio is 2^(-980), very sparse indeed. In
addition, the distance from a random location in the entire
address space to the nearest hard location will fall between
411 and 430 ninety-eight percent of the time, with the
median distance being 424. The hard locations are certainly
sparse.
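A minimal sketch of this random hard-location initialization, the baseline against which the GA is compared (the function name and the scaled-down sizes below are illustrative; Kanerva's design uses 2^20 hard locations in a 1000-dimensional space):

    import random

    def random_hard_locations(num_locations, n, seed=None):
        # Baseline: draw hard-location addresses uniformly at random from the
        # n-dimensional Boolean space.
        rng = random.Random(seed)
        return [[rng.getrandbits(1) for _ in range(n)] for _ in range(num_locations)]

    # A scaled-down memory for experimentation: 256 hard locations, 1000-bit addresses.
    hard_addresses = random_hard_locations(256, 1000, seed=0)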
SDM is distributed in that many hard locations
participate in storing and retrieving each datum, and one
hard location can be involved in the storage and retrieval of
many data. This is a very different beast than the store-one-datum-in-one-location type of memory to which we're
accustomed. Each hard location, itself a bit vector of length
1000, stores data in 1000 counters, each with range -40 to
40. We now have about a million hard locations, each with a
thousand counters, totaling a billion counters in all.
Numbers in the range -40 to 40 will take most of a byte to
store. Thus we are talking about a billion bytes, a gigabyte,
of memory. Quite a lot, but not out of the question.
How do these counters work? Writing a 1 to the counter
increments it; writing a 0 decrements it. A datum η to be
written is a bit vector of length 1000 (see footnote 4). To
write η at a given hard location x, write each coordinate of η
to the corresponding counter in x, either incrementing it or
decrementing it.
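The counter update can be sketched as follows (an illustrative sketch, not the implementation used in our experiments; the helper names are ours, and the clipping to the range -40 to 40 follows the counter range quoted above):

    def write_to_hard_location(counters, datum, lo=-40, hi=40):
        # Write one datum into a hard location's counters: a 1 bit increments the
        # corresponding counter, a 0 bit decrements it, clipped to [lo, hi].
        for i, bit in enumerate(datum):
            counters[i] = min(hi, max(lo, counters[i] + (1 if bit else -1)))

    n = 1000
    hard_counters = [[0] * n for _ in range(256)]   # one counter vector per hard location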
Call the sphere of radius 451 centered at location ξ the
access sphere of that location. An access sphere typically
contains about a thousand hard locations, with the closest to
ξ usually some 424 bits away and the median distance from
ξ to the hard locations in its access sphere about 448. Any
hard location in the access sphere of ξ is accessible from ξ.
With this machinery in hand, we can now write
distributively to any location, hard or not. To write a datum
η to a location ξ, simply write η to each of the roughly one
thousand hard locations accessible from ξ. Distributed
storage.
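Distributed writing then amounts to writing the datum into every hard location whose address falls inside the access sphere of the target address. A sketch, reusing the hamming, write_to_hard_location, hard_addresses, and hard_counters names from the earlier illustrative snippets (the radius 451 is the one given above for n = 1000):

    def write_distributed(hard_addresses, hard_counters, address, datum, radius=451):
        # Copy the datum into every hard location accessible from `address`,
        # i.e. within `radius` bits of it (its access sphere).
        for addr, counters in zip(hard_addresses, hard_counters):
            if hamming(addr, address) <= radius:
                write_to_hard_location(counters, datum)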
With our datum distributively stored, the next question is
how to retrieve it. With this in mind, let's ask first how one
reads from a single hard location, x. Compute ζ, the bit
vector read at x, by assigning its ith bit the value 1 or 0
according as x's ith counter is positive or negative. Thus,
each bit of ζ results from a majority rule decision of all the
data that have been written on x. The read datum, ζ, is an
archetype of the data that have been written to x, but may
not be any one of them. From another point of view, ζ is the
datum with smallest mean distance from all data that have
been written to x.
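Reading a single hard location is then just the sign of each counter. A minimal sketch (how to break the tie at a counter value of exactly zero is our own choice here, since the text only speaks of positive and negative counters):

    def read_hard_location(counters):
        # Majority-rule read: bit i is 1 if counter i is positive, 0 otherwise.
        return [1 if c > 0 else 0 for c in counters]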
Knowing how to read from a hard location allows us to
read from any of the 2^1000 arbitrary locations. Suppose ξ is
any location. The bit vector, ζ, to be read at ξ, is formed by
pooling the data read from each hard location accessible
from ξ. Each bit of ζ results from a majority rule decision
over the pooled data. Specifically, to get the ith bit of ζ, add
together the ith bits of the data read from hard locations
accessible from ξ, and use half the number of such hard
locations as a threshold. At or over threshold, assign a 1.
Below threshold, assign a 0. Put another way, pool the bit
vectors read from hard locations accessible from ξ, and let
each of their ith bits vote on the ith bit of ζ.
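The pooled read at an arbitrary location can be sketched the same way, with half the number of accessible hard locations as the voting threshold (again reusing the illustrative helpers above; returning None when no hard location is accessible is our own convention, not part of the original design):

    def read_distributed(hard_addresses, hard_counters, address, radius=451):
        # Pool the words read from all accessible hard locations and take a
        # coordinate-wise majority vote, using half their number as threshold.
        words = [read_hard_location(c)
                 for a, c in zip(hard_addresses, hard_counters)
                 if hamming(a, address) <= radius]
        if not words:
            return None
        threshold = len(words) / 2
        return [1 if sum(col) >= threshold else 0 for col in zip(*words)]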
We now know how to write items into memory, and how
to read them out. But what's the relation between the datum
in and the datum out? Are these two bit vectors the same, as
we'd hope? Let's first look at the special case where the
datum ξ is written at the location ξ. This makes sense since
both are bit vectors of length one thousand. Kanerva offers a
mathematical proof that reading from ξ recovers ξ (Kanerva
1988a). Here's the idea of the proof. Reading from ξ
recovers archetypes from each of some thousand hard
locations and takes a vote. The voting is influenced by the
~1000 stored copies of ξ and, typically, by about 10,000
other stored data items. Since the intersection of two access
spheres is typically quite small, these other data items
influence a given coordinate only in small groups of ones or
twos or threes, i.e. white noise. The thousand copies drown
out this slight noise and ξ is successfully reconstructed.

Footnote 4: We will try to adhere to Kanerva's convention of using lower
case Greek letters for locations and for data, and lower case Roman
letters for hard locations. The Greek letters will include ξ (xi), η
(eta), and ζ (zeta).
Not all of a stored item is needed to recover it.
Iterated reading allows recovery when reading from a noisy
version of what's been stored. Again, Kanerva offers
conditions (involving how much of the stored item is
available for the read) under which this is true, and a
mathematical proof. Reading with a cue that has never been
written to SDM before gives, if convergent, the closest
match stored in SDM to the input cue, with some sort of
abstraction. Since a convergent sequence of iterates
converges very rapidly, while a divergent sequence of
iterates bounces about seemingly at random, comparison of
adjacent items in the sequence quickly tells whether or not a
sequence converges. Thus, this memory is content
addressable, provided we write each datum with itself as
address.
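Iterated reading with this convergence test might be sketched as follows (an illustrative sketch only; the iteration budget and the requirement that two successive reads agree exactly are our own choices):

    def iterated_read(hard_addresses, hard_counters, cue, max_iters=10, tol=0):
        # Read at the cue, then read again at the result, until two successive
        # reads agree (convergence) or the budget runs out (treated as divergence).
        current = cue
        for _ in range(max_iters):
            nxt = read_distributed(hard_addresses, hard_counters, current)
            if nxt is None:
                return None                    # nothing accessible from here
            if hamming(nxt, current) <= tol:
                return nxt                     # converged: closest stored match
            current = nxt
        return None                            # bouncing around: divergence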
The above discussion, based on the identity of datum and
address, produced a content addressable memory with many
pleasing properties. It works well for reconstructing
individual memories.
3 The GA Design and Implementation
3.1 Population Formation
Binary encoding was used since we are dealing with binary
spaces. For various values of chromosome length (binary
space dimension), the population size (number of hard
locations) was determined.
To be noted here is that the number of hard locations
employed in a certain SDM greatly affects the capacity of
the resulting associative memory. It is a design decision
after all. In our experiments, we took the size of our storage
memory (number of hard locations), to be the same as the
dimension (chromosome length), resulting in a non-linear
occupation ratio. This makes the storage size equal to the
binary log of the vast semantic address space. For n=100,
we have 2^100 possible addresses over the semantic space, out
of which we only assign 100 hard locations. This is different
from other occupation ratios employed in other SDM
implementations, but again it is after all a design decision
based mainly on the memory capacity required.
For n=10, 1110011000 is an example of an address or a
hard location to be assigned. If we were to have only two
hard locations in such a case, then the best two assigned
addresses would be at a Hamming distance of 10 from each
other, e.g. 1111111111 and 0000000000. The best population will
have individuals that are as far apart as possible from each
other on the average, with a small enough standard
deviation. It helps also to have each pair of individuals at
least a certain threshold apart.
The size of the search space is clearly 2^n. It is
immediately apparent how such a space becomes too huge
for any traditional search technique as n becomes large.
Even a traditional heuristic search technique will find it too
hard to arrive at a good coverage of the space in an efficient
manner.
3.2 The Fitness Measure
We have to consider the very huge size of the space before
we can even dream about spanning the entire space by any
means. GA, however, can play the role of a good heuristic,
picking representatives out of vast regions in the search
space. This is similar in some sense to the basic
functionality of SDM when reading or writing, since we
consider only the actual sparse hard locations existing in
access spheres (see Section 2.2 above for explanation).
The fitness function for an individual m is given by:

    Fitness(m) = Σ HammingDistance(m, n),  for all n ≠ m

In other words, it is the sum of the Hamming distances
between that individual and all others, subject to two
constraints:
i. The individual is unique in the population (no repetition).
This is achieved by checking each new child to be added to
a generation against previous members of the generation. If
the child is not unique, it is discarded, and the genetic
operators are reused to produce another unique child.
ii. The individual is at least k bits apart from all other
individuals (a threshold for the Hamming distance, where
k > 0). Similar precautions are followed to preserve this
condition, just as in i. To be noted here is that the threshold
is typically not high, in order to avoid unnecessary
reiteration.
So, the target is to maximize the mean fitness per
generation subject to the above two constraints, while
preserving a reasonably low standard deviation.
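A sketch of this group fitness measure (illustrative only and not the Sugal-based implementation used in our experiments; in the actual GA the uniqueness and k-bit constraints are enforced when a child is generated, whereas this sketch simply evaluates a finished population, reusing the hamming helper from the earlier snippet):

    import statistics

    def group_fitness(population, k=1):
        # Fitness of individual m = sum of Hamming distances from m to all others.
        # Returns (mean fitness, standard deviation), or None if any pair of
        # individuals is identical or closer than k bits.
        fitnesses = []
        for i, m in enumerate(population):
            dists = [hamming(m, other) for j, other in enumerate(population) if j != i]
            if any(d < k for d in dists):      # covers repetition (d = 0) and the threshold
                return None
            fitnesses.append(sum(dists))
        return statistics.mean(fitnesses), statistics.pstdev(fitnesses)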
Note that our goal is a whole population rather than a
particularly fit individual as is usually the case. Actually,
there is a whole spectrum between the two extremes of
group fitness, the Michigan approach, and individual fitness,
the Pittsburgh approach (Grefenstette 1987; 1988; 1994;
Schaffer and Grefenstette 1985). Our group fitness measure
is similar to the Michigan approach.
3.3 The Genetic Operators
Many parameter tunings were considered and compared.
The following set of parameters tends to produce the best
results among all sets considered. More testing and
experimenting with different values of GA parameters is
being done.
Roulette Selection (Davis 1991; Goldberg 1989) was
used, where individuals are selected in proportion to their
fitness. Ranked Replacement was considered, in which the
worst parent is always the one replaced, and only if the child
is better than that parent. Elitism was employed to keep the
fittest individual. Invert Mutation with a 2% mutation
rate was used along with One-point Crossover with a rate of 1.0.
The above operators can never span the entire search
space for large values of n (space dimension) due to the
exponential nature of the space, which becomes too large
for any computational algorithm to cover as n increases.
However, it is reasonable to assume that they will span
enough representatives of the various regions in the space.
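As a rough, simplified illustration of the operators named in this section (a sketch only; the actual runs used the Sugal package, and details such as the replacement bookkeeping and elitism are omitted here):

    import random

    def roulette_select(population, fitnesses, rng):
        # Fitness-proportionate (roulette wheel) selection.
        pick = rng.uniform(0, sum(fitnesses))
        acc = 0.0
        for individual, f in zip(population, fitnesses):
            acc += f
            if acc >= pick:
                return individual
        return population[-1]

    def one_point_crossover(parent1, parent2, rng):
        # One-point crossover (applied with rate 1.0 in our runs).
        cut = rng.randrange(1, len(parent1))
        return parent1[:cut] + parent2[cut:]

    def invert_mutation(child, rng, rate=0.02):
        # Invert (bit-flip) mutation with a 2% per-bit rate.
        return [1 - b if rng.random() < rate else b for b in child]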
3.4 Implementation
The GA software package Sugal, version 2.1, was used under
Solaris UNIX on a Sparc 5 platform. We had to modify the
fitness function and replacement strategies in the source
code to suit our complex group fitness function and the
constraints imposed on the GA search. The fitness function
is now a function of the entire generation, not of a single
individual. The replacement strategy was also modified to
observe the constraints mentioned in Section 3.2.
4 Results
When n is both the space dimension and the number
of hard locations, it can be shown that the optimal mean
fitness is approximately n^2/2 (for n=2, 00 and 11 are optimal,
with a mean fitness of 2; for n=3, 000, 011, and 101 are optimal,
with a mean fitness of 4; for n=4, 0000, 1111, 1010, and 0101
are optimal, with a mean fitness of 8). Apparently, as long as that
condition holds, and for small values of n, we may be able
to use a deterministic algorithm. But as n becomes large,
and the number of hard locations increases significantly
over n, the need for a heuristic like GA arises.
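For very small n, the optimal mean group fitness quoted above can be checked by brute force (an illustrative sketch reusing the hamming helper from the earlier snippet; feasible only for tiny n because the number of candidate populations explodes):

    from itertools import combinations, product

    def mean_group_fitness(pop):
        # Mean over individuals of (sum of Hamming distances to all others)
        # = 2 * (sum over unordered pairs) / population size.
        return 2 * sum(hamming(a, b) for a, b in combinations(pop, 2)) / len(pop)

    def best_mean_fitness(n):
        # Exhaustively try every choice of n distinct addresses from the
        # n-dimensional Boolean space and return the best mean fitness.
        points = list(product((0, 1), repeat=n))
        return max(mean_group_fitness(pop) for pop in combinations(points, n))

    for n in (2, 3, 4):
        print(n, best_mean_fitness(n))   # prints 2.0, 4.0, and 8.0, as in the text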
Table 1 shows the improvement GA has over random
initialization in terms of the average run time, taken over 20
sample runs, required to reach the target mean fitness with a
specific standard deviation (SD). The times shown in the
table are in the format “h:mm:ss.dd” (hours, minutes, seconds,
and hundredths of a second). Each set of two rows
gives the results of GA vs. random initialization for 6%,
4%, and 2% standard deviation from the target mean fitness,
for a certain number of hard locations. The second row in
each set explicitly gives the improvement gained by GA in
percentage.
The target is to reach certain group fitness with a limit
on the standard deviation of the individuals. The lower the
standard deviation, the better is the resulting population, and
the harder is the task. Note that for non-significant SD, both
methods reach target almost instantly. In this case, the
random approach may then perform as well as the GA, or
even slightly better on rare occasions. When tighter
constraints are in place, the GA significantly outperforms
random initialization. The improvement can be as high as
about 41,000 times faster, or 4,114,200%. The non-monotonic
improvement of the GA can be due in part to the
huge size of the search space, which is never entirely
spanned by any means for large values of n.
No. of Hard   Random       GA           Random       GA           Random       GA
Locations     6% SD        6% SD        4% SD        4% SD        2% SD        2% SD
10            0:00:00.09   0:00:00.01   0:00:07.00   0:00:00.14   0:03:16.80   0:00:01.83
  GA Gain                  800%                      4,900%                    10,654%
20            0:00:00.01   0:00:00.01   0:00:04.60   0:00:00.03   6:34:11.10   0:00:04.60
  GA Gain                  0%                        15,233%                   514,054%
30            0:00:00.10   0:00:00.07   0:00:00.80   0:00:00.04   6:57:08.65   0:00:04.30
  GA Gain                  43%                       2,000%                    581,962%
40            0:00:00.09   0:00:00.10   0:00:02.50   0:00:00.10   8:00:00.10   0:00:00.70
  GA Gain                  -10%                      2,400%                    4,114,200%
50            0:00:00.40   0:00:00.20   0:00:02.10   0:00:00.40   8:41:03.76   0:00:01.60
  GA Gain                  100%                      425%                      1,953,885%
Table 1: Average Runtimes for GA Initialization compared to Random Initialization.
Note the increased time requirements of both the GA and
the random approach as the problem size (dimension and
number of hard locations) increases, which was certainly
expected. Apparently, for large dimensions, random
initialization can only match the GA if the space is very
sparse. This explains why existing SDM applications
have been able to manage with random initialization so far.
By using our approach to remove this limitation, we
can deploy SDM with small dimensions and a relatively
large number of evenly distributed hard locations.
5 Discussion
The advantage of our approach is immediately obvious.
To our knowledge, this is the first use of GA for such a
search problem. The random initialization approach was
the one used in most SDM implementations (Anwar 1997;
Hely et al 1997; Hong and Chen 1991; Rao and Fuentes
1996;1998; Surkan 1992). It was also the original method
suggested by Kanerva (Kanerva 1988a), and is discussed
elsewhere (Denning 1989; Franklin 1995; Willshaw
1990). A few neural network implementations of SDM
were also used (Hely et al 1997; Joglekar 1989; Scott et al
1993), with random initialization of link weights with
small values, followed by incremental correction using
standard Backpropagation (Haykin 1994). Practical
implementations of SDM are not rare. They vary from
traditional implementations (Anwar 1997; Manevitz and
Zemach 1997; Rogers 1989b), to parallel tree-structure
realization (Hamalainen et al 1997), to connection
machine implementation (Rogers 1988a; 1989a; Turk
1995; Turk and Gorz 1994), or to neural network
simulation (Scott et al 1993). Keeler made a comparison
between SDM and the Hopfield neural network (Keeler
1988). Jaeckel also proposed a design variation for SDM
(Jaeckel 1989a;b).
6 Future Research
The use of Parallel Genetic Algorithms (PGA) or
Structured Genetic Algorithms (sGA) to divide the search
space more efficiently, with migration between sub-spaces,
will be considered and evaluated. The use of a
multi-modal niching GA to evolve independent sub-populations,
out of each of which a candidate is picked for hard-location
assignment, is currently under investigation.
Also, more tuning of the current GA parameters may well
help obtain even better results. To be noted here is that
the straightforward mathematical algorithm for allocating an
arbitrary number of hard locations in a binary space of
large dimension, while observing the maximum Hamming
distance constraint, has exponential time complexity. An
SDM augmented with a GA for the assignment of the
addresses as illustrated here, or to reduce the memory
requirement and assign addresses based on the
available inputs (Fan and Wang 1997), may well be a key
to a better understanding and simulation of the human
mind.
7 Conclusions
The GA significantly outperforms random initialization of
Sparse Distributed Memory in almost all cases. The
advantage is more apparent when all the hard locations are
required to have a minimum fitness that is close enough to
the target mean fitness.
We emphasize here that we are dealing with group
fitness, not individual fitness. In other words, the entire
generation is considered as a whole when measuring the
fitness, and the stopping criterion is likewise evaluated over
the entire generation. We would like to point out that this
approach of group fitness, which is similar to the Michigan
approach, is indeed different from the common practice in
GA, the Pittsburgh approach. The use of GA for the allocation
of hard locations in SDM proves to be very efficient and far
superior to currently used techniques.
Bibliography
Anwar, Ashraf (1997) TLCA: Traffic Light Control Agent.
Master's Thesis, The University of Memphis, TN, USA,
Dec 1997.
Bogner, Myles, Ramamurthy, Uma, and Franklin, Stan (In
Press) Consciousness and Conceptual Learning in a
Socially Situated Agent.
Brown, R. (1987) Two Demonstrations and a Simulator
for a Sparse Distributed Memory. RIACS-TR-87.17,
NASA Ames Research Center, Moffett Field, CA, USA.
Chou, P. (1989) The Capacity of the Kanerva Associative
Memory is Exponential. Stanford University, CA, USA.
Danforth, D. (1990) An Empirical Investigation of Sparse
Distributed Memory using Discrete Speech Recognition.
Proceedings of the International Neural Networks
Conference, Paris 1990, v 1. Kluwer Academic Press,
Norwell, USA, p 183-6.
Dasgupta, D. and McGregor, D. R. (1992) Designing
Application-Specific Neural Networks using the
Structured Genetic Algorithm. In Proceedings of the
International workshop on Combination of Genetic
Algorithms and Neural Networks (COGANN-92), p 87-96,
IEEE Computer Society Press.
Dasgupta, D. and Michalewicz, Z. (1997) Evolutionary
Algorithms in engineering Applications. Springer Press.
Davis, L. (1991) Handbook of Genetic Algorithms. Von
Nostrand Reinhold, New York, first edition.
Denning, Peter. (1989) Sparse Distributed Memory.
American Scientist, Jul/Aug 1989 v 77, p 333.
Evans, Richard, and Surkan, Alvin. (1991) Relating
Number of Processing Elements in a Sparse Distributed
Memory Model to Learning Rate and Generalization.
APL Quote Quad, Aug 1991 v 21 n 4, p 166.
Fan, K., and Wang, Y. (1992) Sparse distributed memory
for use in recognizing handwritten character. Proceedings
of the International Conference on Computer Processing
of Chinese and Oriental Languages, 1992, p 412-6.
Fan, K., and Wang, Y. (1997) A genetic sparse distributed
memory approach to the application of handwritten
character recognition. Pattern Recognition, 1997 v 30 n
12, p 2015.
Flynn, M., Kanerva, Pentti, Ahanin, B., et al (1987)
Sparse Distributed Memory Prototype: Principles of
Operation. CSL-TR-87.338, Computer Systems Lab,
Stanford University, CA, USA.
Franklin, Stan. (1995) Artificial Minds. MIT Press.
Franklin, Stan. (1997) Autonomous Agents as Embodied
AI. Cybernetics and Systems, special issue on
Epistemological Issues in Embedded AI.
Franklin, Stan, and Graesser, Art. (1997) Is it an Agent, or
just a Program? A Taxonomy for Autonomous Agents.
Proceedings of the Third International Workshop on
Agent Theories, Architectures, and Languages, published
as Intelligent Agents III, Springer-Verlag, p21-35.
Franklin, Stan, Kelemen, Arpad, and McCauley, Lee.
(1998) IDA: A Cognitive Agent Architecture. IEEE
transactions on Systems, Man, and Cybernetics, 1998.
Glenberg, Arthur M. (1997) What Memory is for?
Behavioral and Brain Sciences.
Goldberg, David E. (1989) Genetic Algorithms in
Search, Optimization, and Machine Learning. Addison-Wesley Inc.
Grefenstette, John J. (1987) Multilevel Credit Assignment
in a Genetic Learning System. Proceedings of the 2nd
International Conference on Genetic Algorithms and their
Applications (ICGA87). Cambridge, MA, USA, Jul 1987.
Lawrence Erlbaum Associates, 1987, p 202-9.
Grefenstette, John J. (1988) Credit Assignment in Genetic
Learning Systems. Proceedings of the 7th National
Conference on Artificial Intelligence (AAAI88), p 596-600.
Grefenstette, John J. (1994) Predictive Models Using
Fitness Distributions of Genetic Operators. Proceedings
of the 3rd Workshop on Foundations of Genetic
Algorithms (FOGA94). Estes Park, CO, USA, 1994, p
139-62.
Hamalainen, T., Klapuri, H., and Kaski, K. (1997)
Parallel realizations of Kanerva's Sparse Distributed
Memory in a tree-shaped computer. Concurrency practice
and experience, Sep 1997 v9 n9, p 877.
Hely, T. (1994) The Sparse Distributed Memory: A
Neurobiologically Plausible Memory Model? Master's
Thesis, Dept. of Artificial Intelligence, Edinburgh
University.
Hely, T., Willshaw, D. J., and Hayes, G. M. (1997) A New
Approach to Kanerva's Sparse Distributed Memory. IEEE
transactions on Neural Networks, May 1997 v8 n 3, p
791.
Hong, H., and Chen, S. (1991) Character Recognition in
a Sparse Distributed Memory. IEEE transactions on
Systems, man, and Cybernetics, May 1991 v 21 n 3, p
674.
Jaeckel, L. (1989a) A Class of Designs for a Sparse
Distributed Memory. RIACS-TR-89.30, NASA Ames
Research Center, Moffett Field, CA, USA.
Jaeckel, L. (1989b) An Alternative Design for a Sparse
Distributed Memory. RIACS-TR-89.28, NASA Ames
Research Center, Moffett Field, CA, USA.
Joglekar, U. (1989) Learning to read aloud: a neural
network approach using Sparse Distributed Memory.
RIACS-TR-89.27, NASA Ames Research Center, Moffett
Field, CA, USA.
Kanerva, Pentti. (1988a) Sparse Distributed Memory.
MIT Press.
Kanerva, Pentti. (1988b) The Organization of an
Autonomous Learning System. RIACS-TR-88, NASA
Ames Research Center, Moffett Field, CA, USA.
Kanerva, Pentti. (1989) A Cerebellar-Model Associative
Memory as a Generalized Random Access Memory.
RIACS-TR-89.11, NASA Ames Research Center, Moffett
Field, CA, USA.
Kanerva, Pentti. (1992) Sparse Distributed Memory and
Related Models. RIACS-TR-92.10, NASA Ames
Research Center, Moffett Field, CA, USA.
Kanerva, Pentti (1993) Sparse Distributed Memory and
Related Models. Associative Neural Memories Theory
and Implementation (Ed. Hassoun, M.), Oxford
University Press, Oxford, UK, p 51-75.
Kristoferson, Jan. (1995a) Best Probability of Activation
and Performance Comparisons for Several Designs of
SDM. RWCP Neuro SICS Laboratory.
Kristoferson, Jan. (1995b) Some Comments on the
Information Stored in SDM. RWCP Neuro SICS
Laboratory.
Lindell, M., Saarinen, Jukka., Kaski, K., Kanerva, Pentti
(1991) Highly Parallel Realization of Sparse Distributed
Memory System. Proceedings of the 6th Distributed
Memory Computing Conference, Apr 1991, Portland, OR,
USA, p 686-93.
Manevitz, L. M., and Zemach, Y. (1997) Assigning
meaning to data: using sparse distributed memory for
multilevel cognitive tasks. Neurocomputing, Jan 1997 v
14 n 1, p15.
McCauley, Thomas L., and Franklin, Stan. (1998) An
Architecture for Emotion. AAAI Fall Symposium
“Emotional and Intelligent: The Tangled Knot of
Cognition”.
Pohja, S., Kaski, K. (1992) Kanerva’s Sparse Distributed
Memory with Multiple Hamming Thresholds. RIACS-TR-92.6, NASA Ames Research Center, Moffett Field, CA,
USA.
Prager, R. W., Fallside, F. (1989) The Modified Kanerva
Model for Automatic Speech Recognition. Computer
Speech and Language, v 3, p 61-81.
Ramamurthy, Uma, Bogner, Myles, and Franklin, Stan.
(1998) Conscious Learning in an Adaptive Software
Agent. Proceedings of the 5th international Conference for
Simulation of Adaptive Behavior, From Animals to
Animats V.
Karlsson, Roland. (1995) Evaluation of a Fast Activation
Mechanism for the Kanerva SDM. RWCP Neuro SICS
Laboratory.
Rao, Rajesh P. N., and Fuentes, Olac. (1996) Learning
Navigational Behaviors using a Predictive Sparse
Distributed Memory. Proceedings of the 4th international
Conference on Simulation of Adaptive Behavior, From
Animals to Animats IV, p 382 .
Keeler, J. (1987) Capacity for patterns and sequences in
Kanerva's SDM as compared to other associative memory
models. RIACS-TR-87.29, NASA Ames Research Center,
Moffett Field, CA, USA.
Rao, Rajesh P. N., and Fuentes, Olac. (1998) Hierarchical
Learning of Navigation Behaviors in an Autonomous
Robot using a Predictive Sparse Distributed Memory.
Machine Learning, Apr 1998 v31 n 1/3, p 87.
Keeler, J. (1988) Comparison between Kanerva's SDM
and Hopfield-type neural network. Cognitive Science 12,
1988, p 299-329.
Rogers, D. (1988a) Kanerva's Sparse Distributed
Memory: An Associative Memory Algorithm Well-Suited
to the Connection Machine. RIACS-TR-88.32, NASA
Ames Research Center, Moffett Field, CA, USA.
Kosslyn, Stephen M., and Koenig, Olivier. (1992) Wet Mind. Macmillan Inc.
Rogers, D. (1988b) Using data tagging to improve the
performance of Kanerva's Sparse Distributed Memory.
RIACS-TR-88.1, NASA Ames Research Center, Moffett
Field, CA, USA.
Rogers, D. (1989a) Kanerva's Sparse Distributed
Memory: An Associative Memory Algorithm Well-Suited
to the Connection Machine. International Journal of High
Speed Computing, p 349.
Rogers, D. (1989b) Statistical Prediction with Kanerva's
Sparse Distributed Memory. RIACS-TR-89.02, NASA
Ames Research Center, Moffett Field, CA, USA.
Ryan, S., and Andreae, J. (1995) Improving the
Performance of Kanerva's Associative Memory. IEEE
transactions on Neural Networks 6-1, p 125.
Saarinen, Jukka, Kotilainen, Petri, Kaski, Kimmo (1991)
VLSI Architecture of Sparse Distributed Memory.
Proceedings of IEEE International Symposium on
Circuits and Systems (ISCAS91), Jun 1991, v 5, p 3074-7.
Saarinen, Jukka, Pohja, S., Kaski, K. (1991) Self
Organization with Kanerva’s Sparse Distributed Memory.
Proceedings of the International Conference on Artificial
Neural Networks (ICANN91), Helsinki, Finland, v 1, p
285-90.
Schaffer, J. David, Grefenstette, John J. (1985) Multi-Objective Learning via Genetic Algorithms. Proceedings
of the 9th International Joint Conference on Artificial
Intelligence (IJCAI85), p 593-5.
Scott, E., Fuller, C., and O'Brien, W. (1993) Sparse
Distributed Associative Memory for the Identification of
Aerospace Acoustic Sources. AIAA journal, Sep 1993 v
31 n 9, p 1583.
Surkan, Alvin. (1992) WSDM: Weighted Sparse
Distributed Memory Prototype Expressed in APL. APL
quote quad, Jul 1992 v 23 n 1, p 235.
Turk, Andreas, and Gorz, Gunther. (1994) Kanerva's
Sparse Distributed Memory: An Object-Oriented
Implementation on the Connection Machine. Report on a
research project at the University of Erlangen-Nurnberg,
Germany.
Turk, Andreas. (1995) CM-SDM Users Manual. IMMD 8,
University of Erlangen-Nurnberg, Germany.
Willshaw, David. (1990) Coded Memories. A
Commentary on 'Sparse Distributed Memory', by Pentti
Kanerva. Cognitive Neuropsychology v 7 n 3, p 245.
Zhang, Zhaoua, Franklin, Stan, and Dasgupta, Dipankar
(1998a) Metacognition in Software Agents using
Classifier Systems, Proc AAAI 1998.
Zhang, Zhaoua, Franklin, Stan, Olde, Brent, Wan, Yun,
and Graesser, Art (1998b) Natural Language Sensing for
Autonomous Agents, Proc. IEEE Joint Symposia on
Intelligence and Systems, Rockville, Maryland, p374-81.