Pergamon
Neural Networks, Vol. 10, No. 6, pp. 1017–1036, 1997
© 1997 Elsevier Science Ltd. All rights reserved
Printed in Great Britain
0893-6080/97 $17.00 + .00
PII: S0893-6080(97)00016-6

CONTRIBUTED ARTICLE
Convergence-Zone Episodic Memory: Analysis and
Simulations
MARK MOLL¹ AND RISTO MIIKKULAINEN²
¹School of Computer Science, Carnegie Mellon University and ²Department of Computer Sciences, The University of Texas at Austin
(Received 3 March 1995; accepted 11 November 1996)
Abstract—Human episodic memory provides a seemingly unlimited storage for everyday experiences, and a retrieval system that allows us to access the experiences with partial activation of their components. The system is believed to consist of a fast, temporary storage in the hippocampus, and a slow, long-term storage within the neocortex. This paper presents a neural network model of the hippocampal episodic memory inspired by Damasio's idea of Convergence Zones. The model consists of a layer of perceptual feature maps and a binding layer. A perceptual feature pattern is coarse coded in the binding layer, and stored on the weights between layers. A partial activation of the stored features activates the binding pattern, which in turn reactivates the entire stored pattern. For many configurations of the model, a theoretical lower bound for the memory capacity can be derived, and it can be an order of magnitude higher than the number of all units in the model, and several orders of magnitude higher than the number of binding-layer units. Computational simulations further indicate that the average capacity is an order of magnitude larger than the theoretical lower bound, and making the connectivity between layers sparser causes an even further increase in capacity. Simulations also show that if more descriptive binding patterns are used, the errors tend to be more plausible (patterns are confused with other similar patterns), with a slight cost in capacity. The convergence-zone episodic memory therefore accounts for the immediate storage and associative retrieval capability and large capacity of the hippocampal memory, and shows why the memory encoding areas can be much smaller than the perceptual maps, consist of rather coarse computational units, and are only sparsely connected to the perceptual maps. © 1997 Elsevier Science Ltd.

Keywords—Convergence zones, Episodic memory, Associative memory, Long-term memory, Content-addressable memory, Memory capacity, Retrieval errors, Hippocampus.
1. INTRODUCTION

Human memory can be divided into semantic memory of facts, rules, and general knowledge, and episodic memory that records the individual's day-to-day experiences (Tulving, 1972, 1983). Episodic memory is characterized by extreme efficiency and high capacity. New memories are formed every few seconds, and many of those persist for years, even decades (Squire, 1987). Another significant characteristic of human memory is content-addressability. Most of the memories can be retrieved simply by activating a partial representation of the experience, such as a sound, a smell, or a visual image.

Despite a vast amount of research, no clear understanding has yet emerged on exactly where and how the episodic memory traces are represented in the brain. Several recent results, however, suggest that the system consists of two components: the hippocampus serves as a fast, temporary storage where the traces are created immediately as the experiences come in, and the neocortex has the task of organizing and storing the experiences for the lifetime of the individual (Alvarez & Squire, 1994; Halgren, 1984; Marr, 1971; McClelland et al., 1995; Milner, 1989; Squire, 1992). It seems that the traces are transferred from the hippocampus to the neocortex in a slow and tedious process, which may take several days, weeks, or even years. After that, the hippocampus is no longer necessary for maintaining these traces, and the resources can be reused for encoding new experiences.
Acknowledgements: We thank Greg Plaxton for pointing us to martingale analysis on this problem, and two anonymous reviewers for references on hippocampal modeling, and for suggesting experiments with sparse connectivity. This research was supported in part by NSF grant no. IRI-9309273 and Texas Higher Education Coordinating Board Grant no. ARP-444 to the second author. The simulations were run on the Cray Y-MP 8/864 at the University of Texas Center for High-Performance Computing.
Requests for reprints should be sent to: Risto Miikkulainen, Department of Computer Sciences, The University of Texas at Austin, Austin, TX 78712, USA; Tel: +1 512 471 9571; Fax: +1 512 471 8885; e-mail: risto@cs.utexas.edu.
Although several artificial neural network models of associative memory have been proposed (Ackley et al., 1985; Amari, 1977, 1988; Anderson, 1972; Anderson et al., 1977; Cooper, 1973; Grossberg, 1983; Gardner-Medwin, 1976; Hinton & Anderson, 1981; Hopfield, 1982, 1984; Kairiss & Miranker, 1997; Kanerva, 1988; Knapp & Anderson, 1984; Kohonen, 1971, 1972, 1977, 1989; Kohonen & Mäkisara, 1986; Kortge, 1990; Little & Shaw, 1975; McClelland & Rumelhart, 1986b; Miikkulainen, 1992; Steinbuch, 1961; Willshaw et al., 1969), the fast encoding, reliable associative retrieval, and large capacity of even the hippocampal component of human memory has been difficult to account for. For example, in the Hopfield model of N units, N/(4 ln N) patterns can be stored with a 99% probability of correct retrieval when N is large (Amit, 1989; Hertz et al., 1991; Keeler, 1988; McEliece et al., 1986). This means that storing and retrieving, for example, 10^6 memories would require in the order of 10^8 nodes and 10^16 connections, which is unrealistic, given that the hippocampal formation in higher animals such as the rat is estimated to have about 10^6 primary excitatory neurons with 10^10 connections (Amaral, Ishizuka, & Claiborne, 1990), and the entire human brain is estimated to have about 10^11 neurons and 10^15 synapses (Jessell, 1991).
These earlier models had a uniform, abstract structure and were not specifically motivated by any particular part of the human memory system. In this paper, a new model for associative episodic memory is proposed that makes use of three ideas about how the hippocampal memory might be put together. The model abstracts most of the low-level biological circuitry, focusing on showing that with a biologically motivated overall architecture, an episodic memory model exhibits capacity and behavior very similar to that of the hippocampal memory system. The three central ideas are: (1) value-unit encoding in the input feature maps, (2) sparse, random encoding of traces in the hippocampus, and (3) a convergence-zone structure between them.
Since the input to the memory consists of sensory experience, in the model it should have a representation similar to the perceptual representations in the brain. The low-level sensory representations are organized into maps, that is, similar sensory inputs are represented by nearby locations on the cortical surface (Knudsen et al., 1987). It is possible that higher-level representations also have a map-like structure. This is hard to verify, but at least there is plenty of support for value-unit encoding, that is, that neurons respond selectively to only certain types of inputs, such as particular faces, facial expressions, or particular words (Hasselmo et al., 1989; Heit et al., 1989; Rolls, 1984).
The structure of the hippocampus is quite well known, and recently its dynamics in memory processing have also been observed. Wilson & McNaughton (1993) found that rats encode locations in the maze through ensembles of seemingly random, sparse activation patterns in the hippocampal area CA1. When the rat explores new locations, new activation patterns appear, and when it returns to the earlier locations, the same pattern is activated as during the first visit. O'Reilly & McClelland (1994) showed that the hippocampal circuitry is well-designed to form such sparse, diverse encodings, and that it can also perform pattern completion during recall.
Damasio (1989a, b) proposed a general framework for episodic representations, based on observations of typical patterns of injury-related deficits. The idea is that there is no multi-modal cortical area that would build an integrated and independent representation of an experience from its low-level sensory representations. Instead, the representation takes place only in the low-level cortices, with the different parts bound together by a hierarchy of convergence zones. An episodic representation can be recreated by activating its corresponding binding pattern in the convergence zone.
The convergence-zone episodic memory model is loosely based on the above three ideas. It consists of a layer of perceptual maps and a binding layer. An episodic experience appears as a pattern of local activations across the perceptual maps, and is encoded as a sparse, random pattern in the binding layer. The connections between the maps and the binding layer store the encoding in a single presentation, and the complete perceptual pattern can later be regenerated from partial activation of the input layer.
Many details of the low-level neural circuitry are abstracted in the model. The units in the model correspond to functional columns rather than neurons, and their activation levels are represented by integers. Multi-stage connections from the perceptual maps to the hippocampus are modeled by direct binary connections that are bidirectional, and the connections within the hippocampus are not taken into account. At this level of abstraction, the behavior of the model can be analyzed both theoretically and experimentally, and general results can be derived about its properties.
A theoretical analysis shows that: (1) with realistic-size maps and binding layer, the capacity of the convergence-zone memory can be very high, higher than the number of units in the model, and can be several orders of magnitude higher than the number of binding-layer units; (2) the majority of the neural hardware is required in the perceptual processing; the binding layer needs to be only a fraction of the size of the perceptual maps; and (3) the computational units could be very coarse in the hippocampus and in the perceptual maps; the required capacity is achieved with a very small number of such units. Computational simulations of the model further suggest that: (1) the average storage capacity may be an order of magnitude higher than the theoretical lower bound; (2) the capacity can be further increased by reducing the connectivity between feature maps and the binding layer, with best results when the connectivity matches the sparseness of the binding representations; and (3) if the binding patterns for similar inputs are made more similar, the errors that the model makes become more plausible: the retrieved patterns are similar to the correct patterns. These results suggest how one-shot storage, content-addressability, high capacity, and robustness could all be achieved within the resources of the hippocampal memory system.

FIGURE 1. Storage. The weights on the connections between the appropriate feature units and the binding representation of the pattern are set to 1. [Figure: a binding layer above Feature Maps 1–4.]
2. OUTLINE OF THE MODEL

The convergence-zone memory model consists of two layers of real-valued units (the feature map layer and the binding layer), and bidirectional binary connections between the layers (Figure 1). Perceptual experiences are represented as vectors of feature values, such as color = red, shape = round, size = small. The values are encoded as units on the feature maps. There is a separate map for each feature domain, and each unit on the map represents a particular value for that feature. For instance, on the map for the color feature, the value red could be specified by turning on the unit in the lower-right quarter (Figure 1). The feature map units are connected to the binding layer with bidirectional binary connections (i.e. the weight is either 0 or 1). An activation of units in the feature map layer causes a number of units to become active in the binding layer, and vice versa. In effect, the binding layer activation is a compressed, distributed encoding of the perceptual value-unit representation.

Initially, all connections are inactive at 0. A perceptual experience is stored in the memory through the feature map layer in three steps. First, those units that represent the appropriate feature values are activated at 1. Second, a subset of m binding units are randomly selected in the binding layer as the compressed encoding for the pattern, and activated at 1. Third, the weights of all the connections between the active units in the feature maps and the active units in the binding layer are set to 1 (Figure 1). Note that only one presentation is necessary to store a pattern this way.
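To make the procedure concrete, here is a minimal sketch of one-shot storage in Python. The data layout, names, and the small parameter values are our illustration, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

f, t = 1000, 4      # units per feature map, number of feature maps
n, m = 3000, 20     # binding-layer size, binding-pattern size

# One binary weight matrix per feature map (f x n), initially all 0.
weights = [np.zeros((f, n), dtype=np.uint8) for _ in range(t)]

def store(pattern):
    """One-shot storage: pattern[j] is the active unit on feature map j."""
    # Step 2: randomly select m binding units as the compressed encoding.
    binding = rng.choice(n, size=m, replace=False)
    # Step 3: turn on all connections between the active feature units
    # and the selected binding units.
    for j, unit in enumerate(pattern):
        weights[j][unit, binding] = 1
    return binding
```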
To retrieve a pattern, first all binding units are set to 0. The pattern to be retrieved is partially specified in the feature maps by activating a subset of its feature units. For example, in Figure 2a the memory is cued with the two leftmost features. The activation propagates to the binding layer through all connections that have been turned on so far. The set of binding units that a particular feature unit turns on is called the binding constellation of that unit. All binding units in the binding encoding of the pattern to be retrieved are active at 2, because they belong to the binding constellations of both retrieval cue units. A number of other units are also activated at 1, because each cue unit takes part in representing multiple patterns, and therefore has several other active connections as well. Only those units active at 2 are retained; units with less activation are turned off (Figure 2b).

The activation of the remaining binding units is then propagated back to the feature maps (Figure 2c). A number of units are activated at various levels in each feature map, depending on how well their binding constellation matches the current pattern in the binding layer. Chances are that the unit that belongs to the same pattern as the cues has the largest overlap and becomes most highly activated. Only the most active unit in each feature map is retained, and as a result, a complete, unambiguous perceptual pattern is retrieved from the system (Figure 2d).
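Retrieval can be sketched the same way, continuing the code above; the two thresholding operations are exactly the intersection and winner-take-all steps just described:

```python
def retrieve(cues):
    """Simple recall: cues is {map index: active unit index}."""
    c = len(cues)
    # Forward pass: a binding unit survives only if it is in the
    # binding constellation of every retrieval cue (activation == c).
    binding_act = sum(weights[j][unit] for j, unit in cues.items())
    active = binding_act == c
    # Backward pass: on each non-cue map, keep the unit whose binding
    # constellation overlaps the surviving binding units the most.
    pattern = dict(cues)
    for j in range(t):
        if j not in cues:
            overlap = weights[j][:, active].sum(axis=1)
            pattern[j] = int(np.argmax(overlap))
    return pattern

# Usage: store a random pattern, then cue with its first three features.
p = [int(u) for u in rng.integers(f, size=t)]
store(p)
print(retrieve({0: p[0], 1: p[1], 2: p[2]}))   # recovers p[3] as well
```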
If there are n units in the binding layer and m units are chosen as a representation for a pattern, the number of possible different binding representations is equal to

\binom{n}{m}

If n is sufficiently large and m is relatively small compared to n, this number is extremely large, suggesting that the convergence-zone memory could have a very large capacity.
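To get a feel for the magnitude, the count can be evaluated in log space; here is a quick check of ours (not a figure from the paper), using the coarse-grained parameter values of Section 5.1:

```python
from math import lgamma, log

def log10_binom(n, m):
    """log10 of C(n, m), via log-gamma to avoid enormous integers."""
    return (lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)) / log(10)

print(log10_binom(11500, 150))   # ~346, i.e. about 10^346 encodings
```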
However, due to the probabilistic nature of the storage and retrieval processes, there is always a chance that the retrieval will fail. The binding constellations of the retrieval cue units may overlap significantly, and several spurious units may be turned on at the binding layer. When the activation is propagated back to the feature maps, some random unit in a feature map may have a binding constellation that matches the spurious units very well (Figure 3). This rogue unit may receive more activation than the correct unit, and a wrong feature value may be retrieved. As more patterns are stored, the binding constellations of the feature units become larger, and erroneous retrieval becomes more likely.

To determine the capacity of the convergence-zone memory, the chance of retrieval error must be computed. Below, a probabilistic formulation of the model is first given, and a lower bound for the retrieval error is derived.
FIGURE 2. Retrieval. A stored pattern is retrieved by presenting a partial representation as a cue. The size of the square indicates the activation level of the unit. (a) Retrieval cues activate a binding pattern. (b) Less active binding units are turned off. (c) Binding pattern activates feature units. (d) Less active feature units are turned off.
3. PROBABILISTIC FORMULATION

Let Z_i be the size of the binding constellation of a feature unit after i patterns have been stored on it, and let Y_i be its increase after storing the ith pattern on it. Obviously, Y_1 = m. To obtain the distribution of Y_i when i > 1, note that the new active connections belong to the intersection of a randomly chosen subset of m connections among all n connections of the unit, and its remaining inactive connections (a set with n − z_{i−1} elements, where z_{i−1} is the binding constellation at the previous step). Therefore, Y_i, i > 1, is hypergeometrically distributed (Appendix A.1) with parameters m, n − z_{i−1}, and n:

P(Y_i = y \mid Z_{i-1} = z_{i-1}) = \binom{n - z_{i-1}}{y}\binom{z_{i-1}}{m - y} \Big/ \binom{n}{m} \qquad (1)

The constellation size Z_i is then given by

Z_i = \sum_{k=1}^{i} Y_k \qquad (2)

Let I be the number of patterns stored on a particular feature unit after p random feature patterns have been stored in the entire memory. I is binomially distributed (Appendix A.1) with parameters p and 1/f, where f is the number of units in a feature map:

P(I = i) = \binom{p}{i}\left(\frac{1}{f}\right)^i\left(1 - \frac{1}{f}\right)^{p-i} \qquad (3)
FIGURE 3. Erroneous retrieval. A rogue feature unit is retrieved, instead of the correct one, when its binding constellation has more units in common with the intersection of the retrieval cue constellations than the binding constellation of the correct unit.
Let Z be the binding constellation of a particular feature unit after p patterns have been stored in the memory. It can be shown (Appendix A.2) that

E(Z) = n\left(1 - \left(1 - \frac{m}{fn}\right)^p\right) \qquad (4)

and

Var(Z) = n\left(1 - \frac{m}{fn}\right)^p\left(1 - \left(1 - \frac{m}{fn}\right)^p\right) + n(n-1)\left[\left(1 - \frac{m(2n - m - 1)}{fn(n - 1)}\right)^p - \left(1 - \frac{m}{fn}\right)^{2p}\right] \qquad (5)

Initially, when no patterns are stored, the binding constellation is zero, and it will converge to n as more patterns are stored (since 0 < 1 − m/(fn) < 1). Because the bases of the exponentials in the variance of Z are smaller than 1, the variance will go to zero when p goes to infinity. Therefore, in the limit the binding constellation will cover the entire binding layer with probability 1.

The binding constellation of a feature unit, given that at least one pattern has been stored on it, is denoted by Z̃. This variable represents the binding constellation of a retrieval cue, which necessarily must have at least one pattern stored on it (assuming that the retrieval cues are valid). The expected value and variance of Z̃ follow from (4) and (5):

E(Z̃) = m + (n - m)\left(1 - \left(1 - \frac{m}{fn}\right)^{p-1}\right) \qquad (6)

and

Var(Z̃) = (n - m)\left(1 - \frac{m}{fn}\right)^{p-1}\left(1 - \left(1 - \frac{m}{fn}\right)^{p-1}\right) + (n - m)(n - m - 1)\left[\left(1 - \frac{m(2n - m - 1)}{fn(n - 1)}\right)^{p-1} - \left(1 - \frac{m}{fn}\right)^{2(p-1)}\right] \qquad (7)

Note that the expected value of Z̃ is always larger than that of Z. Initially the difference is exactly m, and it goes to zero as p goes to infinity (because Z̃ also converges to n).

Let Z̃_j be the binding constellation of the jth retrieval cue and let X_j be the number of units in the intersection of the first j retrieval cues. Clearly, X_1 = Z̃_1. To get X_j for j > 1, we remove from consideration the m units all retrieval cues necessarily have in common (because they belong to the same stored pattern), and randomly select z̃_j − m units from the remaining set of n − m units and see how many of them belong to the current intersection of x_{j−1} − m units. This is a hypergeometric distribution with parameters z̃_j − m, x_{j−1} − m, and n − m:

P(X_j = x_j \mid \tilde{Z}_j = \tilde{z}_j, X_{j-1} = x_{j-1}) = \binom{x_{j-1} - m}{x_j - m}\binom{n - x_{j-1}}{\tilde{z}_j - x_j} \Big/ \binom{n - m}{\tilde{z}_j - m} \qquad (8)

The size of the total binding constellation activated during retrieval is obtained by taking this intersection over the binding constellations of all c retrieval cues.

The number of units in common between a potential rogue unit and the c retrieval cues is denoted by R_{c+1}, and is also hypergeometrically distributed, however with parameters z, x_c, and n, because we cannot assume that the rogue unit has at least m units in common with the cues:

P(R_{c+1} = r \mid Z = z, X_c = x_c) = \binom{x_c}{r}\binom{n - x_c}{z - r} \Big/ \binom{n}{z} \qquad (9)

The correct unit in a retrieval map (i.e. in a feature map where a retrieval cue was not presented and where a feature value needs to be retrieved) will receive an activation X_{c+1}, because it also has at least m units in common with the retrieval cues. The correct unit will be retrieved if X_{c+1} > R_{c+1}. Now, X_{c+1} and R_{c+1} differ only in the last intersection step, where X_{c+1} depends on Z̃ and X_c, and R_{c+1} depends on Z and X_c. Since E(Z̃) > E(Z) ((4) and (6)), E(X_{c+1}) > E(R_{c+1}), and the correct unit will be retrieved most of the time, although this advantage gradually decreases as more patterns are stored in the memory. In each feature map there are (f − 1) potential rogue units, so the conditional probability of successful retrieval is (1 − P(R_{c+1} > X_{c+1} | X_{c+1}, Z, X_c))^{f−1}, not addressing tie-breaking.

Unfortunately, it is very difficult to compute p_success, the unconditional probability of successful retrieval, because the distribution functions of Z, X_c, X_{c+1} and R_{c+1} are not known. However, it is possible to derive bounds for p_success, and show that with reasonable values for n, m, f, and p, the memory is reliable.
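The formulas above are easy to check empirically. The sketch below compares the mean binding-constellation size of the units on one map against eq. (4), reusing the storage code from Section 2 with the same assumed toy parameters (our check, not the paper's):

```python
# Re-initialize the memory, store p random patterns, and compare the
# measured mean constellation size on map 0 with eq. (4).
weights = [np.zeros((f, n), dtype=np.uint8) for _ in range(t)]
p = 2000
for _ in range(p):
    store([int(u) for u in rng.integers(f, size=t)])
measured = weights[0].sum(axis=1).mean()       # mean over map 0's units
expected = n * (1 - (1 - m / (f * n)) ** p)    # eq. (4); ~39.7 here
print(measured, expected)
```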
4. LOWER BOUND FOR MEMORY CAPACITY

Memory capacity can be defined as the maximum number of patterns that can be stored in the memory so that the probability of correct retrieval with a given number of retrieval cues is greater than α (a constant close to 1). In this section, a lower bound for the chance of successful retrieval will be derived. The analysis consists of three steps: (1) bounds for the number of patterns stored on a feature unit; (2) bounds for the binding constellation size; and (3) bounds for the intersections of binding constellations. Given particular values for the system parameters, and ignoring dependencies among constellations, it is then possible to derive a lower bound for the capacity of the model.
4.1. Number of Patterns Stored on a Feature Unit

Since I is binomially distributed (with parameters p and 1/f), Chernoff bounds (Appendix A.1) can be applied:

P\left(I < (1 - \delta_1)\frac{p}{f}\right) < e^{-\delta_1^2 p/(2f)} \qquad (10)

P\left(I > (1 + \delta_2)\frac{p}{f}\right) < e^{-\delta_2^2 p/(3f)} \qquad (11)

These equations give the probability that I is more than δ_1 or δ_2 off its mean. The parameters δ_1 and δ_2 determine the tradeoff between the tightness of the bounds and the probability of satisfying them. If bounds are desired with a given probability β, the right-hand sides of (10) and (11) are made equal to β and solved for δ_1 and δ_2. The lower and upper bounds for the number of patterns stored on a feature unit then are

i_l = (1 - \delta_1)\frac{p}{f} \text{ if a solution for } \delta_1 \text{ exists, and } i_l = 0 \text{ otherwise} \qquad (12)

i_u = (1 + \delta_2)\frac{p}{f} \qquad (13)

Given that at least one pattern has been stored on a feature unit, the bounds become

\tilde{i}_l = 1 + (1 - \delta_1)\frac{p-1}{f} \text{ if a solution for } \delta_1 \text{ exists, and } \tilde{i}_l = 1 \text{ otherwise} \qquad (14)

\tilde{i}_u = 1 + (1 + \delta_2)\frac{p-1}{f} \qquad (15)

4.2. Size of the Binding Constellation

Instead of choosing exactly m different binding units for the binding constellation of a feature map unit, consider the process of randomly selecting k not-necessarily-distinct units in such a way that the expected number of different units is m. This will make the analysis easier at the cost of larger variance, but the bounds derived will also be valid for the actual process. To determine k, note that the number of units that do not belong to the binding representation is equal to n − m on average:

n\left(1 - \frac{1}{n}\right)^k = n - m \qquad (16)

Solving for k, we get

k = \frac{\ln n - \ln(n - m)}{\ln n - \ln(n - 1)} \qquad (17)

Note that k is almost equal to m for large n.

Let us assume i patterns are stored on the feature map unit, which is equivalent to selecting ki units from the binding layer at random. Let Z^E_v be the expected size of the final binding constellation, estimated after v binding units have been selected. Then

Z^E_v = Z'_v + (n - Z'_v)\left(1 - \left(1 - \frac{1}{n}\right)^{ki - v}\right) \qquad (18)

where Z'_v is the size of the binding constellation formed by the first v selected units. Obviously Z'_v is equal to Z'_{v−1} or exactly one larger, and the expected increase of Z'_v is 1 − Z'_{v−1}/n. Since Z^E_v depends stochastically only on Z'_v, the expected value of Z^E_v, given Z^E_{v−1}, is

E(Z^E_v \mid Z'_{v-1} = z'_{v-1}) = n - (n - z'_{v-1})\left(1 - \frac{1}{n}\right)^{ki - v + 1} = z^E_{v-1} \qquad (19)

Therefore, E(Z^E_v | Z^E_{v−1}) = Z^E_{v−1}, and the sequence of variables Z^E_0, ..., Z^E_{ki} is a martingale (see Appendix A.3). Moreover, it can be shown (Appendix A.4) that |Z^E_v − Z^E_{v−1}| ≤ 1, so that bounds for the final binding constellation Z can be obtained from Azuma's inequalities. For the lower bound, the martingale Z^E_0, ..., Z^E_{k i_l} (with length k i_l) is used, and for the upper bound, Z^E_0, ..., Z^E_{k i_u} (with length k i_u). Using (18) and noting that Z = Z^E_{k i_l} for the lower bound and Z = Z^E_{k i_u} for the upper bound, Azuma's inequalities can be written as

P\left(Z \le n\left(1 - \left(1 - \frac{1}{n}\right)^{k i_l}\right) - \lambda\sqrt{k i_l}\right) \le e^{-\lambda^2/2}, \quad \lambda > 0 \qquad (20)

P\left(Z \ge n\left(1 - \left(1 - \frac{1}{n}\right)^{k i_u}\right) + \lambda\sqrt{k i_u}\right) \le e^{-\lambda^2/2}, \quad \lambda > 0 \qquad (21)

After deriving a value for λ based on the desired confidence level β, the following lower and upper bounds for Z are obtained:

z_l = n\left(1 - \left(1 - \frac{1}{n}\right)^{k i_l}\right) - \lambda\sqrt{k i_l} \qquad (22)

z_u = n\left(1 - \left(1 - \frac{1}{n}\right)^{k i_u}\right) + \lambda\sqrt{k i_u} \qquad (23)

The corresponding bounds for Z̃ are

\tilde{z}_l = n\left(1 - \left(1 - \frac{1}{n}\right)^{k \tilde{i}_l}\right) - \lambda\sqrt{k \tilde{i}_l} \qquad (24)

\tilde{z}_u = n\left(1 - \left(1 - \frac{1}{n}\right)^{k \tilde{i}_u}\right) + \lambda\sqrt{k \tilde{i}_u} \qquad (25)

4.3. Intersection of Binding Constellations

The process of forming the intersection of c binding constellations incrementally, one cue at a time, can also be formulated as a martingale process. To see how, consider the process of forming an intersection of two subsets of a common superset incrementally, by checking (one at a time) whether each element of the first set occurs in the second set. Assume that v elements have been checked this way. Let X'_v denote the number of elements found to be in the intersection so far, and X^E_v the currently expected number of elements in the final intersection. Then

X^E_v = X'_v + \frac{(n_1 - v)(n_2 - X'_v)}{n_s - v} \qquad (26)

where n_1, n_2 and n_s are the sizes of the first, the second, and the superset. As shown in Appendix A.5, the sequence X^E_0, ..., X^E_{n_1} is a martingale. In addition, if n_1 + n_2 − 1 < n_s, |X^E_v − X^E_{v−1}| ≤ 1, and Azuma's inequalities can be applied.

The above process applies to forming the intersection of binding constellations of retrieval cues when the intersection in the previous step is chosen as the first set, the binding constellation of the jth cue as the second set, the binding layer as the common superset, and the m units all retrieval cues have in common are excluded from the intersection. In this case

n_1 = x_{j-1,u} - m \qquad (27)

n_2 = \tilde{z}_u - m \qquad (28)

n_s = n - m \qquad (29)

where x_{j−1,u} is an upper bound for X_{j−1}. Azuma's inequality can be applied if x_{j−1,u} + z̃_u − 1 < n (which needs to be checked). Using (27)–(29) in (26) and noting that X'_{n_1} = X_j − m, Azuma's inequality becomes

P\left(X_j \ge m + \frac{(x_{j-1,u} - m)(\tilde{z}_u - m)}{n - m} + \lambda\sqrt{x_{j-1,u} - m}\right) \le e^{-\lambda^2/2}, \quad \lambda > 0 \qquad (30)

After deriving a value for λ based on the desired confidence, the following upper bound for X_j is obtained:

x_{j,u} = m + \frac{(x_{j-1,u} - m)(\tilde{z}_u - m)}{n - m} + \lambda\sqrt{x_{j-1,u} - m} \qquad (31)

This bound is computed recursively, starting from the first cue's constellation (x_{1,u} = z̃_u). Comparing with the probabilistic formulation of Section 3, note that (x_{j−1,u} − m)(z̃_u − m)/(n − m) is the expected value of the hypergeometric distribution derived for X_j (8) when Z̃_j and X_{j−1} are at their upper bounds.

As the last step in the analysis of binding constellations, the bounds for X_{c+1} and R_{c+1} must be computed. When X_c is at its upper bound, the intersection is the largest, and a potential rogue unit has the largest chance of taking over. In this case, a lower bound for X_{c+1} is obtained by carrying the intersection process one step further, and applying Azuma's inequality:

P\left(X_{c+1} \le m + \frac{(x_{c,u} - m)(\tilde{z}_l - m)}{n - m} - \lambda\sqrt{x_{c,u} - m}\right) \le e^{-\lambda^2/2}, \quad \lambda > 0 \qquad (32)

which results in

x_{c+1,l} = m + \frac{(x_{c,u} - m)(\tilde{z}_l - m)}{n - m} - \lambda\sqrt{x_{c,u} - m} \qquad (33)

If the resulting lower bound is smaller than m, m can be used instead. Similarly, to get the upper bound for R_{c+1}, one more intersection step needs to be carried out, but this time the m units are not excluded:

P\left(R_{c+1} \ge \frac{x_{c,u} z_u}{n} + \lambda\sqrt{x_{c,u}}\right) \le e^{-\lambda^2/2}, \quad \lambda > 0 \qquad (34)

and the upper bound becomes

r_{c+1,u} = \frac{x_{c,u} z_u}{n} + \lambda\sqrt{x_{c,u}} \qquad (35)

4.4. Dependencies Between Binding Constellations
Strictly speaking, the above analysis is valid only when the binding constellations of each cue are independent. If the same partial pattern is stored many times, the constellations will overlap beyond the m units that they necessarily have in common. Such overlap tends to increase the size of the final intersection.

In most cases of realistic size, however, the increase is negligible. The number of features V in common between two random patterns of c features each is given by the binomial distribution:

P(V = v) = \binom{c}{v}\left(\frac{1}{f}\right)^v\left(1 - \frac{1}{f}\right)^{c-v} \qquad (36)

The chance that two random patterns of c features have more than one feature in common is

P(V > 1) = 1 - P(V = 0) - P(V = 1) = 1 - \left(1 - \frac{1}{f}\right)^c - c\left(\frac{1}{f}\right)\left(1 - \frac{1}{f}\right)^{c-1} \qquad (37)

which can be rewritten as

P(V > 1) = 1 - \left(1 + \frac{c - 1}{f}\right)\left(1 - \frac{1}{f}\right)^{c-1} \qquad (38)

This chance is negligible for sufficiently large values of f. For example, already when f = 5000 and c = 3, the chance is 1.2 × 10^−7, and can be safely ignored when computing a lower bound for the capacity.

4.5. Obtaining the Lower Bound

It is now possible to use (10)–(15), (17) and (20)–(25) and (31)–(35) to derive a lower bound for the probability of successful retrieval with given system parameters n, m, f, t, c, and p, where t is the total number of feature maps. The retrieval is successful if r_{c+1,u}, the upper bound for R_{c+1}, is lower than x_{c+1,l}, the lower bound for X_{c+1}. Under this constraint, the probability that none of the variables in the analysis exceeds its bounds is a lower bound for successful retrieval.

Obtaining the upper bound for X_c involves bounding 3c − 1 variables: I and Z̃ for the c cues, and X_j for the c − 1 intersections. Computing x_{c+1,l} and r_{c+1,u} each involves bounding 3 variables (I, Z̃, and X_{c+1}; I, Z, and R_{c+1}). There are t − c maps, each with one x_{c+1,l} bound and f − 1 different r_{c+1,u} bounds (one for each rogue unit). The total number of bounds is therefore 3c − 1 + 3f(t − c). Setting the right-hand sides of the inequalities (10), (11), (20), (21), (30), (32) and (34) equal to a small constant β, a lower bound for successful retrieval is obtained:

p_{success} > (1 - \beta)^{3c - 1 + 3f(t - c)} \qquad (39)

which, for small β, can be approximated by

p_{success} > 1 - (3c - 1 + 3f(t - c))\beta \qquad (40)

On the other hand, if it is necessary to determine a lower bound for the capacity of a model with given n, m, f, t, and c at a given confidence level p_success, β is first obtained from (40), and the number of patterns p is then increased until one of the bounds (10), (11), (20), (21), (30), (32) or (34) is exceeded, or r_{c+1,u} becomes greater than x_{c+1,l}.
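The recipe can be collected into a small calculator. The sketch below is our reading of Sections 4.1–4.5, not the authors' program: the Chernoff exponents, the Azuma constant λ = √(2 ln(1/β)), and the starting point x_{1,u} = z̃_u are the assumptions stated above.

```python
from math import log, sqrt

def capacity_check(n, m, f, t, c, p, beta):
    """True if retrieval is provably reliable for p patterns at
    per-bound failure probability beta (sketch of Sections 4.1-4.5)."""
    lam = sqrt(2 * log(1 / beta))             # from e^(-lam^2/2) = beta
    # 4.1: Chernoff bounds on patterns stored on a unit (eqs 12-15).
    mu = (p - 1) / f
    d1 = sqrt(2 * log(1 / beta) / mu)
    d2 = sqrt(3 * log(1 / beta) / mu)
    i_l = 1 + (1 - d1) * mu if d1 < 1 else 1  # cue units, eq (14)
    i_u = 1 + (1 + d2) * mu                   # cue units, eq (15)
    iu_any = (1 + d2) * p / f                 # any (rogue) unit, eq (13)
    # 4.2: selections per pattern (17) and constellation bounds (23-25).
    k = (log(n) - log(n - m)) / (log(n) - log(n - 1))
    zt_l = n * (1 - (1 - 1/n) ** (k * i_l)) - lam * sqrt(k * i_l)
    zt_u = n * (1 - (1 - 1/n) ** (k * i_u)) + lam * sqrt(k * i_u)
    z_u = n * (1 - (1 - 1/n) ** (k * iu_any)) + lam * sqrt(k * iu_any)
    # 4.3: recursive intersection bound (31), then (33) and (35).
    x_u = zt_u
    for _ in range(c - 1):
        x_u = m + (x_u - m) * (zt_u - m) / (n - m) + lam * sqrt(x_u - m)
    x_l = max(m, m + (x_u - m) * (zt_l - m) / (n - m) - lam * sqrt(x_u - m))
    r_u = x_u * z_u / n + lam * sqrt(x_u)
    return r_u < x_l

# The coarse-grained example of Section 5.1 comes out reliable:
print(capacity_check(11500, 150, 17000, 4, 3, 15000, 1.96e-7))  # True
```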
5. EXAMPLE: MODELING THE HIPPOCAMPAL MEMORY SYSTEM

As an example, let us apply the above analysis to the hippocampal memory system. It is difficult to estimate how coarse the representations are in such a system, and how many effective computational units and connections there should be. The numbers of neurons and connections in the rat hippocampal formation have been used as a guideline below. Although the human hippocampus is certainly larger than that of the rat, the hippocampus, being phylogenetically one of the oldest areas of the brain, is fairly similar across higher mammals and should give an indication of the orders of magnitude involved. More importantly, the convergence-zone model can be shown to apply to a wide range of these parameters. Two cases at opposite ends of the spectrum are analyzed below: one where the number of computational units and connections is assumed to be limited, and another that is based on a large number of effective units and connections.

5.1. A Coarse-Grained Model
First note that each unit in the model is meant to correspond to a vertical column in the cortex. It is reasonable to assume feature maps with 10^6 of such columns (Sejnowski & Churchland, 1989). Each input activates a local area on the map, including perhaps 10^2 columns above the threshold. Therefore, the feature maps could be approximated with 10^4 computational units. There would be a minimum of perhaps 4 such maps, of which 3 could be used to cue the memory.

There are some 10^6 primary excitatory cells in the rat hippocampal formation (Amaral et al., 1990; Boss et al., 1985, 1987; Squire et al., 1989). If we assume that functional units contain 10^2 of them, then the model should have 10^4 binding units. Only about 0.5–2.5% of the hippocampal neurons are simultaneously highly active (O'Reilly & McClelland, 1994), so a binding pattern of 10^2 units would be appropriate. Assuming that all computational units in the feature maps are connected to all units in the hippocampus, there are a total of 10^8 afferent connections to the hippocampus, and the number of such connections per vertical column in the feature maps and per excitatory neuron in the hippocampus is 10^2, both of which are small but possible numbers (Amaral et al., 1990).

If we select f = 17000, n = 11500, m = 150, and store 1.5 × 10^4 patterns in the memory, z̃_u and x_{j−1,u} are less than n/2, the chance of partial overlap of more than 1 feature is less than 1.04 × 10^−8, and the analysis above is valid. Setting β = 1.96 × 10^−7 yields bounds r_{c+1,u} < x_{c+1,l} with p_success > 99%. In other words, 1.5 × 10^4 traces can be stored in the memory with 99% probability of successful retrieval. Such a capacity is approximately equivalent to storing one new memory every 15 s for 4 days, 16 h a day, which is similar to what is required from the human hippocampal system.
5.2. A Fine-Grained Model

It is possible that a lot more neurons and connections are involved in the hippocampal memory system than assumed above. For example, let us assume that each of the vertical columns in the feature maps is computationally distinctive, that is, there are 10^6 units in the feature maps. Let us further assume that the system has 15 feature maps, 10 of which are used to cue the memory, and the binding layer consists of 10^5 units, with 150 used for each binding pattern. Assuming full connectivity between the feature units and the binding units, there are 1.5 × 10^12 connections in the system, which might be possible if it is assumed that a large number of collaterals exist on the inputs to the hippocampus.

Applying the above analysis to this memory configuration, 0.85 × 10^8 patterns can be stored with 99% probability of successful retrieval. In this case, z̃_u and x_{j−1,u} are less than n/2, the chance of partial overlap of more than 1 feature is less than 0.45 × 10^−10, and setting β = 0.5 × 10^−7 yields bounds r_{c+1,u} < x_{c+1,l} with p_success > 99%. In other words, a new trace could be stored every 15 s for 62 years, 16 h a day, without much memory interference.

This kind of capacity is probably enough for the entire human lifetime, and exceeds the requirements for the hippocampal formation. With such a capacity, there would be no need to transfer representations to the neocortical memory system. One conclusion from this analysis is that the hippocampal formation is likely to have a more coarse-grained than fine-grained structure. Another conclusion is that it is possible that the neocortical memory component may also be based on convergence zones. The result is interesting also from the theoretical point of view, because the lower bound is an order of magnitude higher than the number of units in the system, and three orders of magnitude higher than the number of binding units. To our knowledge, this lower bound is already higher than what is practically possible with other neural network models of associative memory to date.
6. EXPERIMENTAL AVERAGE CAPACITY

The analysis above gives us a lower bound for the capacity of the convergence-zone memory; the average-case capacity may be much higher. Although it is difficult to derive the average capacity theoretically, an estimate can be obtained through computer simulation. Not all configurations of the model can be simulated, though. The model has to be small enough to fit in the available memory, while at the same time fulfilling the assumptions of the analysis so that lower bounds can be obtained for the same model.

To find such configurations, first the feature map parameters f, t, and c, and the confidence level β are fixed to values such that p_success = 99% (40). Second, a value for n is chosen so that the model will fit in the available memory. The connections take up most of the memory space (even if each connection is represented by one bit) and the amount of memory allocated for the feature map and the binding layer activations, the array of patterns, and the simulation program itself is negligible. Finally, the size of the binding pattern m and the maximum number of patterns p is found such that the theoretical bounds yield r_{c+1,u} < x_{c+1,l} and the partial overlap is negligible. In the models studied so far, the highest capacity has been obtained when m is only a few percent of the size of the binding layer, as in the hippocampus.
The simulation program is straightforward. The activations in each map are represented as arrays of integers. The connections between a feature map and the binding layer are encoded as a two-dimensional array of bits, one bit for each connection. Before the simulation, a set of p_max random patterns are generated as a two-dimensional array of p_max × t integers. A simulation consists of storing the patterns one at a time and periodically testing how many of a randomly-selected subset of them can be correctly retrieved with a partial cue.

The 'fine-grained' example of Section 5.2 is unfortunately too large to simulate. With 15 × 10^6 × 10^5 = 1.5 × 10^12 connections it would require 187.5 GB of memory, which is not possible with the current computers. However, the 'coarse-grained' model has 4 × 17000 × 11500 = 7.82 × 10^8 one-bit connections, which amounts to approximately 100 MB, and easily fits in available memory.
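A sketch of this experiment, continuing the toy model from Section 2 (the pattern counts are scaled down to the toy parameters; the paper's runs used the coarse-grained model on a Cray):

```python
# Store random patterns one at a time; periodically cue a random sample
# with the first t-1 features and check the retrieved last feature.
weights = [np.zeros((f, n), dtype=np.uint8) for _ in range(t)]
patterns = [[int(u) for u in rng.integers(f, size=t)] for _ in range(50000)]
for count, pat in enumerate(patterns, start=1):
    store(pat)
    if count % 10000 == 0:
        sample = rng.choice(count, size=min(500, count), replace=False)
        ok = sum(
            retrieve({j: patterns[s][j] for j in range(t - 1)})[t - 1]
            == patterns[s][t - 1]
            for s in sample
        )
        print(count, ok / len(sample))   # fraction correctly retrieved
```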
Several configurations were simulated, and they all gave qualitatively similar results (Figure 4). In the coarse-grained model, practically no retrieval errors were produced until 370000 patterns had been stored. With 375000 patterns, 99% were correctly retrieved, and after that the performance degraded quickly to 94% with 400000 patterns, 71% with 460000, and 23% with 550000 (Figure 4). Each run took about two hours of CPU time on a Cray Y-MP 8/864. From these simulations, and those with other configurations shown in Figure 4, it can be concluded that the average capacity of the convergence-zone episodic model may be as much as one order of magnitude larger than the theoretical lower bound.

FIGURE 4. Experimental average capacity. The horizontal axis shows the number of patterns stored, in a logarithmic scale of hundreds of thousands. The vertical axis indicates the percentage of correctly retrieved patterns out of a randomly-selected subset of 500 stored patterns. Each model consisted of four feature maps, and during retrieval the fourth feature was retrieved using the first three as cues. The models differed in the sizes of the feature maps f, binding layer size n, and binding pattern size m. Each line indicates averages over three simulations. The theoretical capacity C of the first model with p_success = 99% was 35000, that of the second 15000, and that of the third 70000.
7. LESS IS MORE: THE EFFECT OF SPARSE CONNECTIVITY

In the convergence-zone model so far, the feature maps have been fully connected to the binding layer. Such a uniform configuration makes analysis and simulation of the model easier, but it is not very realistic. Both from the point of view of modeling the hippocampal memory and building artificial memory systems, it is important to know how the capacity is affected when the two layers are only sparsely connected.

A series of simulations were run to determine how the model would perform with decreasing connectivity. A given percentage of connections were chosen randomly and disabled, and the 99% capacity point of the resulting model was found. The binding patterns were chosen slightly differently to account for missing connections:
the m binding units were chosen among those binding layer units that were connected to all features in the feature pattern. If there were fewer than m such units, the binding pattern consisted of all available units.
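In code, the modified selection might look as follows (a sketch on top of the earlier toy model; the `connected` masks and the connectivity rate r are hypothetical illustrations, not the paper's implementation):

```python
r = 0.5   # fraction of connections kept (for illustration)
# connected[j][u] is the mask of binding units reachable from unit u.
connected = [(rng.random((f, n)) < r) for _ in range(t)]

def choose_binding_sparse(pattern):
    """Pick up to m binding units connected to ALL active feature units."""
    mask = np.ones(n, dtype=bool)
    for j, unit in enumerate(pattern):
        mask &= connected[j][unit]
    available = np.flatnonzero(mask)
    if len(available) <= m:              # fewer than m: take them all
        return available
    return rng.choice(available, size=m, replace=False)
```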
FIGURE 5. Capacity with sparse connectivity. The horizontal axis shows the percentage of connections that were available to form the binding patterns, and the vertical axis indicates the capacity at the 99% correct-retrieval level. As connectivity decreases, the binding constellations become more focused and retrieval becomes more reliable, until at about 30% connectivity there are no longer enough connections to form binding patterns of m units. The model had f = 1000, n = 3000, m = 20, t = 4, c = 3, and the plot is an average of five simulations.

Due to the high computational cost of the simulations, a small convergence-zone memory with f = 1000, n = 3000, m = 20, t = 4, and c = 3 was used in these experiments. At each level of connectivity from 100% down to 20%, five different simulations with different random connectivity were run and the results were averaged (Figure 5). Since the main effect of sparse connectivity is to limit the number of binding units that are available for the binding pattern, one would expect that the capacity would go down. However, just the opposite
turns out to be true: the fewer connections the model had available (down to 30% connectivity), the more patterns could be stored with 99% probability of correct retrieval.

The intuition turns out to be incorrect for an interesting reason: sparse connectivity causes the binding constellations to become more focused, removing spurious overlap that is the main cause of retrieval errors. To see this, consider how Z̃ (the size of the binding constellation of a feature unit after at least one pattern has been stored on it) grows when a new pattern is stored on the unit. A set of binding units is chosen to represent the pattern. When only a fraction r of the connections to the binding layer are available, there are only rn binding units to choose from, compared to n in the fully connected model. It is therefore more likely that a chosen binding unit is already part of the binding constellation of the feature unit, and this constellation therefore grows at a slower pace than in the fully connected case. The expected size of the constellation after p patterns have been stored in the memory becomes (from (6))
E(Z̃) = m + (rn - m)\left(1 - \left(1 - \frac{m}{frn}\right)^{p-1}\right) \qquad (41)

which decreases with connectivity r as long as there are at least m binding units available (i.e. rn > m). When the binding constellations are small, their intersections beyond the m common units will be small as well. During retrieval it is then less likely that a rogue unit will become more active than the correct unit. The activity patterns in the sparsely connected system are thus better focused, and retrieval more reliable.

When there are fewer than m binding units available, the capacity decreases very quickly. In the model of Figure 5 (with m = 20), the average number of binding units available is 45 for 35% connectivity, 24 for 30%, 12 for 25%, and 5 for 20%. In other words, the convergence-zone episodic memory performs best when it is connected just enough to support activation of the sparse binding patterns. This is an interesting and surprising result, indicating that sparse connectivity is not just a necessity due to limited resources, but also gives a computational advantage. In the context of the hippocampal memory system it makes sense, since evolution would be likely to produce a memory architecture that makes the best use of the available resources.

8. ERROR BEHAVIOR

When the binding patterns are selected at random as in the model outlined, when errors occur during retrieval, the resulting feature values are also random. Human memory, however, rarely produces such random output. Human memory performance is often approximate, but robust and plausible. That is, when a feature cannot be retrieved exactly, a value is generated that is close to the correct value.

To test whether the convergence-zone architecture can model such "human-like" memory behavior, the storage mechanism must be changed so that it takes into account similarities between stored patterns. So far the spatial arrangement of the units has been irrelevant in the model; let us now treat the feature maps and the binding layer as one-dimensional maps (Figure 6). The binding layer is divided into sections, one for each feature map. The binding units are selected stochastically from a distribution function that consists of one component for each of the sections. Each component is a flat rectangle with a radius of σ units around the unit whose location corresponds to the location of the active feature map unit. The distribution function is scaled so that on average m units will be chosen for the entire binding pattern.

The radius σ of the distribution components determines the variance of the resulting binding patterns. By setting σ = (n/t − 1)/2, a uniform distribution is obtained, which gives us the basic convergence-zone model. By making σ smaller, the binding representations for similar input patterns become more similar.
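A sketch of this sectioned selection (our reading of the mechanism and of Figure 6; the scaling that yields m units on average is approximated here by drawing m/t units per section):

```python
sigma = 5
section = n // t                 # the binding layer has one section per map

def choose_binding_descriptive(pattern):
    """Similarity-preserving choice: units near the location that mirrors
    the active feature unit, within a flat rectangle of radius sigma."""
    chosen = set()
    for j, unit in enumerate(pattern):
        center = unit * section // f          # mirror the map location
        for _ in range(m // t):               # ~m units overall
            offset = int(rng.integers(-sigma, sigma + 1))
            # wrap around the section boundary, as in Figure 6
            chosen.add(j * section + (center + offset) % section)
    return np.array(sorted(chosen))
```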
FIGURE 6. Selecting a binding representation for an input pattern. The binding layer is divided into sections. On average, m binding units will be selected from a distribution function that consists of one rectangular component for each section, centered around the unit whose location corresponds to the location of the input feature unit. If the center is close to the boundary of the section, the distribution component wraps around the section (as for Feature Map 1). The parameter σ determines the radius of the components, and thereby the variance in the binding pattern. This particular function had σ = 2, that is, the width of the individual components was 2 × 2 + 1 = 5. The model parameters were f = 10, n = 40, m = 6.

Several simulations with varying values of σ were run to see how the error behavior and the capacity of the model would be affected (Figure 7). A configuration of 4 feature maps (3 cues) with 20000 binding units, 400 feature map units, and a 20-unit binding pattern was used. The results show that when the binding patterns are made more descriptive, the errors become more plausible: when an incorrect feature is retrieved, it tends to be close to the correct one. When the binding units are selected completely at random (σ = (n/t − 1)/2), the average distance is 5000; when σ = 20, it drops to 2500; and when σ = 5, to 700. Such "human-like" behavior is obtained with a slight cost in memory capacity. When the patterns become less random, there is more overlap in the encodings, and the capacity tends to decrease. This effect, however, appears to be rather minor, at least in the small models simulated in our experiments.
FIGURE 7. Error behavior with descriptive binding patterns. The lower curve (with scale at left) shows the average distance of incorrectly-retrieved features as a function of the distribution radius σ. The upper curve (with scale at right) indicates the corresponding capacity at the 99% confidence level. When the binding pattern is less random, the erroneous units tend to be closer to the correct ones. The model had n = 20,000, f = 400, m = 20, t = 4, c = 3. Retrieval of all stored patterns was checked. The plots indicate averages over six simulations.
9. DISCUSSION
The convergence-zone episodic model as analyzed and simulated above assumes that the feature patterns do not overlap much, and that the pattern is retrieved in a single iteration. Possible relaxations of these assumptions and the effects of such modifications are discussed below.
9.1. Pattern Overlap

The theoretical lower-bound calculations assumed that the chance of overlap of more than one feature is very small, and this was also true in the models that were analyzed and simulated. However, this restriction does not limit the applicability of the model as much as it might first seem, for two reasons.

First, it might appear that certain feature values occur more often than others in the real world, causing more overlap than there currently is in the model. However, note that the input to the model is represented on feature maps. One of the basic properties of both computational and biological maps is that they adapt to the input distribution by magnifying the dense areas of the input space. In other words, if some perceptual experience is more frequent, more units will be allocated for representing it so that each unit gets to respond equally often to inputs (Kohonen, 1989; Merzenich et al., 1984; Ritter, 1991). Therefore, overlap in the feature map representations is significantly more rare than it may be in the absolute experience: the minor differences are magnified, and the representations become more distinguishable and more memorable.

Second, as discussed in Section 4.4, the chance of overlap of more than one feature is clearly small if the feature values are independent. For example, in the coarse-grained model, at the 99% capacity point, on average there were 88 other patterns that shared exactly one common feature with a given pattern, whereas there were only 0.0078 other patterns that shared more than one feature. To be sure, in the real world the feature values across maps are correlated, which would make overlap of more than one feature more likely than it currently is in the model. While it is hard to estimate how common such correlations would be, they could grow quite a bit before they become significant. In other words, the conclusions drawn from the current model are valid for at least small amounts of such correlations.
9.2. Progressive Recall

The retrieval process adopted in the convergence-zone model is a version of simple recall (Gardner-Medwin, 1976), where the pattern is retrieved based on only direct associations from the retrieval cues. In contrast, progressive recall is an iterative process that uses the retrieved pattern at each step as the new retrieval cue. Progressive recall could be implemented in the convergence-zone model. Suppose features need to be retrieved in several maps. After the first retrieval attempt, the right feature unit will be clearly identified in most maps. For the second retrieval iteration, all these units can be used as cues, and it is likely that a pattern will be retrieved that is closer to the correct pattern than the one obtained with just simple recall. This way, progressive recall would cause an increase in the capacity of the model. Also, such a retrieval would probably be more robust against invalid retrieval cues (i.e. cues that are not part of the pattern to be retrieved). The dynamics of the progressive recall process are difficult to analyze (see Gibson & Robinson (1992) for a possible approach) and expensive to simulate, and simple recall was thus used in this first implementation of the convergence-zone model.

Above, a theoretical lower bound for the capacity of simple recall within a given error tolerance was derived, and the average capacity was estimated experimentally. Two other types of capacity can also be defined for an associative memory model (Amari, 1988). The absolute capacity refers to the maximum number of patterns that the network can represent as equilibrium states, and the relative capacity is the maximum number of patterns that can be retrieved by progressive recall. The lower bound for the simple recall derived in this paper is also a lower bound for the absolute capacity, and thus also a lower bound for the relative capacity, which may be rather difficult to derive directly.
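For illustration, progressive recall can be grafted onto the earlier sketch by feeding each completed pattern back as the next cue set. This is our extension, not the paper's: in particular, the intersection threshold is kept at the naive "all cues" level, which the paper does not specify.

```python
def progressive_recall(cues, max_iters=5):
    """Iterate simple recall, re-deriving every feature on each pass."""
    pattern = dict(cues)
    for _ in range(max_iters):
        active = sum(weights[j][u] for j, u in pattern.items()) == len(pattern)
        new_pattern = {
            j: int(np.argmax(weights[j][:, active].sum(axis=1)))
            for j in range(t)
        }
        if new_pattern == pattern:           # fixed point reached
            break
        pattern = new_pattern
    return pattern
```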
10. RELATED WORK

Associative memory is one of the earliest and still most active areas of neural network research, and the convergence-zone model needs to be evaluated from this perspective. Although the architecture is mostly motivated by the neuroscience theory of perceptual maps, hippocampal encoding, and convergence zones, it is mathematically most closely related to statistical associative memories and the sparse distributed memory model. Contrasting the architecture with the Hopfield network and modified backpropagation is appropriate because these are the best-known associative memory models to date. Eventually the convergence-zone memory might serve as a model of human episodic memory together with the trace feature map model described below. Although it is an abstract model of the hippocampal system, it is consistent with the more low-level models of the hippocampal circuitry, and complements them well.
10.1. The Hopfield Model

The Hopfield network (Hopfield, 1982) was originally developed to model the computational properties of neurobiological systems from the perspective of statistical mechanics (Amit et al., 1985a, b; Kirkpatrick & Sherrington, 1988; Peretto & Niez, 1986). The Hopfield network is characterized by full connectivity, except from a unit to itself. Patterns can be stored one at a time, but the storage mechanism is rather involved. To store an additional pattern in a network of, say, N units, the weights of all the N × (N − 1) connections have to be changed. In contrast, the convergence-zone memory is more sparse in that only t × m < n ≪ fn weights have to be modified.

A pattern is retrieved from the Hopfield network through progressive recall. The cues provide initial activation to the network, and the unit activations are updated asynchronously until they stabilize. The final stable activation pattern is then taken as the output of the network. In the convergence-zone model, on the other hand, retrieval is a four-step version of simple recall: first the activation is propagated from the input maps to the binding layer, thresholded, and then propagated back to all feature maps, where it is thresholded again. This algorithm can be seen as a computational abstraction of an underlying asynchronous process. In a more low-level implementation, thresholding would be achieved through inhibitory lateral connections. The neurons would update their activation one at a time in random order, and eventually stabilize to a state that represents retrieval of a pattern.

The capacity of the Hopfield network has been shown theoretically to be N/(4 ln N) (Amit, 1989; Hertz et al., 1991; Keeler, 1988; McEliece et al., 1986) and experimentally about 0.15N (Hopfield, 1982). For the convergence-zone model such a simple closed-form formula is difficult to derive, because the model has many more parameters and correlations that complicate the analysis. However, as was shown above, a lower bound can be derived for a given set of parameters. Such bounds and also experimental simulations show that the capacity of the model can be orders of magnitude higher than the number of units, which is rather unique for an associative memory neural network.

However, it should be noted that in the convergence-zone model, each pattern is much smaller than the network. In a Hopfield network of size N each pattern contains N bits of information, while in the convergence-zone model each pattern consists of only t features. Each feature can be seen as a number between 1 and f, corresponding to its location in the feature map. To represent such a number, log₂ f bits are needed, and a feature pattern thus contains t log₂ f bits of information. (In the coarse-grained model of Section 5.1, for instance, each pattern carries only 4 log₂ 17000 ≈ 56 bits.) Compared to the Hopfield model and other similar associative memory models, the information content of each pattern has been traded off for the capacity to store more of them in a network of equal size.
10.2. Backpropagation and Related Models

Several models of associative memory have been proposed that are based on backpropagation or similar incremental learning rules (Ackley et al., 1985; Anderson et al., 1977; Knapp & Anderson, 1984; McClelland & Rumelhart, 1986a, b). However, these models suffer from catastrophic interference, which makes it difficult to apply them to modeling human associative memory (Grossberg, 1987; McCloskey & Cohen, 1989; Ratcliff, 1990). If the patterns are to be learned incrementally, without repeating the earlier patterns, the later patterns in the sequence must erase the earlier associations from memory.

Several techniques have been proposed to alleviate forgetting, including using weights with different learning rates (Hinton & Plaut, 1987), gradually including new examples and phasing out earlier ones (Hetherington & Seidenberg, 1989), forcing semi-distributed hidden-layer representations (French, 1991), concentrating changes on novel parts of the inputs (Kortge, 1990), using units with localized receptive fields (Kruschke, 1992), and adding new units and weights to encode new information (Fahlman, 1991; Fahlman & Lebiere, 1990). In these models, one-shot storage is still not possible, although the number of required iterations is reduced, and old information can be relearned very fast. At this point it is also unclear whether these architectures would scale up to the number of patterns appropriate for human memory.
10.3.S~tistical AssociativeMemoryModels
The convergence-zonemodel is perhaps most closely
related to the correlation matrix memory (Kohonen,
1971, 1P72;see also, Anderson, 1972;Cooper, 1973).
In this model there are a number of receptors (correspondin~ to feature maps in the convergence-zone
model) i?hatare comected to a set of associators(the
bindingilayer).The receptorsare dividedinto key fields
wheret.lperetrievalcue is specified,and datafieldswhere
the re~eved pattern appears. Each key and data field
correqxlmdsto a feature map in the convergence-zone
model.Insteadof one value for each field,a wholefeaturemaprepresentsthe field,modelingvalue-unitencoding in I biological perceptual systems. There is no
distinctjbnbetween key and data fields either; every
feature map can functionas a key in the convergencezone model.
Other related statistical models include the learning matrix (Steinbuch, 1961) and the associative net (Willshaw et al., 1969), which are precursors of the correlation matrix model. These had a uniform matrix structure connecting inputs to outputs in a single step. Such models are simple and easy to implement in hardware, although they do not have a very high capacity (Faris & Maier, 1988; Palm, 1980, 1981). Other associative matrix models have relied on progressive recall, and therefore are similar in spirit to the Hopfield network. The network of stochastic neurons of Little & Shaw (1975) and the models of the hippocampus of Gardner-Medwin (1976) and Marr (1971) fall into this category. Progressive recall gives them a potentially higher capacity, which with high connectivity exceeds that of the Hopfield network (Gibson & Robinson, 1992). They are also more plausible in that they do not require N² internal connections (where N is the number of units in the network), although the capacity decreases rapidly with decreasing connectivity.
10.4. Sparse Distributed Memory
The Sparse Distributed Memory model (SDM; Kanerva, 1988) was originally developed as a mathematical abstraction of an associative memory machine. Keeler (1988) developed a neural-network implementation of the idea and showed that the SDM compares favorably with the Hopfield model; the capacity is larger and the patterns do not have to include all units in the network.
It is possible to give the convergence-zone memory model an interpretation as a special case of the SDM model: a fully-connected two-layer network consisting of a combined input/output layer and a hidden layer. In alternating steps, activity is propagated from the input/output layer to the hidden layer and back. Seen this way, every unit in the input/output layer corresponds to one feature map in the convergence-zone model, and the hidden layer corresponds to the binding layer.
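A sketch of this two-layer reading is given below: retrieval propagates a partial cue to the hidden layer, thresholds the result, and propagates back. This is a schematic illustration only; the winner-take-all threshold rule and the layer sizes are assumptions, not the configuration analyzed in this paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_io, n_hidden, m = 50, 200, 10       # assumed layer sizes; m hidden units per pattern

W = np.zeros((n_io, n_hidden))        # bidirectional binary weights

def store(io_pattern, rng):
    """One-shot storage: bind the active input/output units to a random
    m-unit hidden (binding) pattern."""
    hidden = rng.choice(n_hidden, m, replace=False)
    for unit in np.flatnonzero(io_pattern):
        W[unit, hidden] = 1.0

def retrieve(cue):
    """Alternating propagation: cue -> hidden layer -> input/output layer."""
    hidden_act = cue @ W                         # forward step
    hidden = hidden_act >= hidden_act.max()      # keep the most active hidden units
    io_act = W @ hidden.astype(float)            # backward step
    return io_act >= io_act.max()                # winners form the retrieved pattern

pattern = np.zeros(n_io); pattern[[3, 17, 29, 42]] = 1.0
store(pattern, rng)
cue = np.zeros(n_io); cue[[3, 17]] = 1.0         # partial cue: two of four features
print(np.flatnonzero(retrieve(cue)))             # likely recovers {3, 17, 29, 42}
```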
It can be shown that the capacity of the SDM is independent of the size of the input/output layer. Moreover, if the size of the input/output layer is fixed, the capacity increases linearly with the size of the hidden layer. These results suggest that similar properties apply also to the convergence-zone episodic memory model.
10.5. Trace Feature Maps

The Trace Feature Map model of Miikkulainen (1992, 1993) consists of a self-organizing feature map where the space of all possible experiences is first laid out. The map is laterally fully connected with weights that are initially inhibitory. Traces of experiences are encoded as attractors using these lateral connections. When a partial pattern is presented to the map as a cue, the lateral connections move the activation pattern toward the nearest attractor.
The Trace Feature Map was designed as an episodic memory component of a story understanding system. The main emphasis was not on capacity, but on psychologically valid behavior. The basins of attraction for the traces interact, generating many interesting phenomena. For example, the later traces have larger attractor basins and are easier to retrieve, and unique traces are preserved even in an otherwise overloaded memory. On the other hand, because each basin is encoded through the lateral connections of several units, the capacity of the model is several times smaller than the number of units. Also, there is no mechanism for encoding truly novel experiences; only vectors that are already represented in the map can be stored. In this sense, the Trace Feature Map model can be seen as the cortical component of the human long-term memory system. It is responsible for many of the effects, but incorporating novel experiences into its existing structure is a lengthy process, as it appears to be in the human memory system (Halgren, 1984; McClelland et al., 1995).
10.6. Models of the Hippocampus
A large number of models of the hippocampal formation and its role in memory processing have been proposed (Alvarez & Squire, 1994; Gluck & Myers, 1993; McNaughton & Morris, 1987; Marr, 1971; Murre, 1995; O'Reilly & McClelland, 1994; Read et al., 1994; Schmajuk & DiCarlo, 1992; Sutherland & Rudy, 1989; Teyler & Discenna, 1986; Treves & Rolls, 1991, 1994; Wickelgren, 1979). They include a more detailed description of the circuitry inside the hippocampus, and aim at showing how memory traces could be created in such a circuitry. The convergence-zone model operates at a higher level of abstraction than these models, and in this sense is complementary to them.
Marr (1971) presented a detailed theory of the hippocampal formation, including numerical constraints, capacity analysis, and interpretation at the level of neural circuitry. The input is based on local input fibers from the neocortex, processed by an input and output layer of hippocampal neurons with collateral connections. Recall is based on recurrent completion of a pattern. The convergence-zone memory can be seen as a high-level abstraction of Marr's theory, with the emphasis on the convergence-zone structure that allows for higher capacity than Marr predicted.
Several authors have proposed a role for the hippocampus similar to the convergence-zone idea (Alvarez & Squire, 1994; McClelland et al., 1995; Murre, 1995; Teyler & Discenna, 1986; Treves & Rolls, 1994; Wickelgren, 1979). In these models, the hippocampus itself does not store a complete representation of the episode, but acts as an indexing area, or compressed representation, that binds together parts of the actual representation in the neocortical areas. Treves & Rolls (1994) also propose backprojection circuitry for accomplishing such recall. Convergence-zone memory is consistent with such interpretations, focusing on analyzing the capacity of such structures.
One of the assumptions of the convergence-zone memory, motivated by recent results by Wilson & McNaughton (1993), is that the binding encoding is sparse and random. The model by O'Reilly & McClelland (1994) shows how the hippocampal circuitry could form such sparse, diverse encodings. They explored the tradeoff between pattern separation and completion and showed that the circuitry could be set up to perform both of these tasks simultaneously. The entorhinal–dentate–CA3 pathway could be responsible for forming random encodings of traces in CA3, and the separation between storage and recall could be due to overall difference in the activity level in the system.
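To illustrate what sparse, random binding encodings buy, the snippet below samples m-of-n binding patterns and measures how little two independent patterns overlap on average. This is a toy check with arbitrary sizes; the expected overlap m²/n follows from the hypergeometric distribution used in the Appendix.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, trials = 2000, 20, 10000        # binding layer size and pattern size (arbitrary)

overlaps = [
    np.intersect1d(rng.choice(n, m, replace=False),
                   rng.choice(n, m, replace=False)).size
    for _ in range(trials)
]
print(np.mean(overlaps), m * m / n)   # empirical vs expected overlap (0.2 units)
```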
11. FUTURE WORK

Future work on the convergence-zone episodic memory model will focus on three areas. First, the model can be extended in several ways towards a more accurate model of the actual neural processes. For instance, lateral inhibitory connections between units within a feature map could be added to select the unit with the highest activity. A similar extension could be applied to the binding layer; only instead of a single unit, multiple units should stay active in the end. Lateral connections in the binding layer could also be used to partially complete the binding pattern even before propagation to the feature maps. As the next step, the binding layer could be expanded to take into account finer structure in the hippocampus, including the encoding and retrieval circuitry proposed by O'Reilly & McClelland (1994). A variation of the Hebbian learning mechanism (Hebb, 1949; Miller & MacKay, 1992) could then be used to implement the storage and recall mechanisms. In addition to providing insight into the hippocampal memory system, such research could lead to a practical implementation of the convergence-zone memory, and perhaps even to a hardware implementation.
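One simple way to realize the proposed winner-take-all selection within a feature map is iterative lateral inhibition, sketched below. The update rule and its parameters are hypothetical, for illustration only; the paper does not commit to a specific dynamics.

```python
import numpy as np

def lateral_winner_take_all(activity, inhibition=0.05, steps=200):
    """Each unit is suppressed in proportion to the summed activity of the
    other units; after enough steps only the most active unit stays above zero."""
    a = activity.astype(float).copy()
    for _ in range(steps):
        total = a.sum()
        a = np.maximum(0.0, a - inhibition * (total - a))  # inhibition from the others
    return a

act = np.array([0.2, 0.9, 0.5, 0.7])
print(lateral_winner_take_all(act))   # only index 1 remains above zero
```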
Second, a number of potential extensions to the model could be studied in more detail. It might be possible to take sparse connectivity into account in the analysis, and obtain tighter lower bounds in this more realistic case. Recurrence could be introduced between feature maps and the binding layer, and capacity could be measured under progressive recall. Possibilities for extending the model to tolerate more overlap between patterns should also be studied.
Third, the model could be used as a stepping stone towards a more comprehensive model of human episodic memory, including modules for the hippocampus and the neocortical component. As discussed above, the convergence-zone model seems to fit the capabilities of the hippocampal component well, whereas something like trace feature maps could be used to model the cortical component. It would be necessary to observe and characterize the memory interference effects of the components and compare them with experimental results on human memory. However, the main challenge of this research would be on the interaction of the components, that is, how the hippocampal memory could transfer its contents to the cortical component. At this point it is not clear how this process could take place, although several proposals exist (Alvarez & Squire, 1994; Halgren, 1984; McClelland et al., 1995; Milner, 1989; Murre, 1995; Treves & Rolls, 1994). Computational investigations could prove instrumental in understanding the foundations of this remarkable system.
12. CONCLUSION

Mathematical analysis and experimental simulations show that a large number of episodes can be stored in the convergence-zone memory with reliable content-addressable retrieval. For the hippocampus, a sufficient capacity can be achieved with a fairly small number of units and connections. Moreover, the convergence zone itself requires only a fraction of the hardware required for the perceptual representation. These results provide a possible explanation for why human memory is so efficient with such a high capacity, and why memory areas appear small compared to the areas devoted to low-level perceptual processing. The analysis also suggests that the computational units of the hippocampus and the perceptual maps can be quite coarse, and gives a computational reason why the maps and the hippocampus should be sparsely connected.
The model makes use of the combinatorics and the clean-up properties of coarse coding in a neurally-inspired architecture. The practical storage capacity of the model appears to be at least two orders of magnitude higher than that of the Hopfield model with the same number of units, while using two orders of magnitude fewer connections. On the other hand, the patterns in the convergence-zone model are smaller than in the Hopfield network. Simulations also show that psychologically valid error behavior can be achieved if the binding patterns are made more descriptive: the erroneous patterns are close to the correct ones. The convergence-zone episodic memory is a step towards a psychologically and neurophysiologically accurate model of human episodic memory, the foundations of which are only now beginning to be understood.
REFERENCES
Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147–169.
Alon, N., & Spencer, J. H. (1992). The Probabilistic Method. New York: Wiley.
Alvarez, P., & Squire, L. R. (1994). Memory consolidation and the medial temporal lobe: A simple network model. Proceedings of the National Academy of Sciences of the USA, 91, 7041–7045.
Amaral, D. G., Ishizuka, N., & Claiborne, B. (1990). Neurons, numbers and the hippocampal network. In J. Storm-Mathisen, J. Zimmer & O. P. Ottersen (Eds), Understanding the Brain Through the Hippocampus, vol. 83 of Progress in Brain Research (pp. 1–11). New York: Elsevier.
Amari, S.-I. (1977). Neural theory of association and concept formation. Biological Cybernetics, 26, 175–185.
Amari, S.-I. (1988). Associative memory and its statistical neurodynamical analysis. In H. Haken (Ed.), Neural and Synergetic Computers (pp. 85–99). Berlin: Springer Verlag.
Amit, D. J. (1989). Modeling Brain Function: The World of Attractor Neural Networks. Cambridge: Cambridge University Press.
Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985). Spin-glass models of neural networks. Physical Review A, 32, 1007–1018.
Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985). Storing infinite numbers of patterns in a spin-glass model of neural networks. Physical Review Letters, 55, 1530–1533.
Anderson, J. A. (1972). A simple neural network generating an interactive memory. Mathematical Biosciences, 14, 197–220.
Anderson, J. A., Silverstein, J. W., Ritz, S. A., & Jones, R. S. (1977). Distinctive features, categorical perception and probability learning: Some applications of a neural model. Psychological Review, 84, 413–451.
Bain, L. J., & Engelhardt, M. (1987). Introduction to Probability and Mathematical Statistics. Boston, MA: PWS Publishers.
Boss, B., Peterson, G., & Cowan, W. (1985). On the numbers of neurons in the dentate gyrus of the rat. Brain Research, 338, 144–150.
Boss, B., Turlejski, K., Stanfield, B., & Cowan, W. (1987). On the numbers of neurons in fields CA1 and CA3 of the hippocampus of Sprague-Dawley and Wistar rats. Brain Research, 406, 280–287.
Cooper, L. N. (1973). A possible organization of animal memory and learning. In B. Lundquist & S. Lundquist (Eds), Proceedings of the Nobel Symposium on Collective Properties of Physical Systems (pp. 252–264). New York: Academic Press.
Damasio, A. R. (1989). The brain binds entities and events by multiregional activation from convergence zones. Neural Computation, 1, 123–132.
Damasio, A. R. (1989). Time-locked multiregional retroactivation: A systems-level proposal for the neural substrates of recall and recognition. Cognition, 33, 25–62.
Fahlman, S. E. (1991). The recurrent cascade-correlation architecture. In R. P. Lippmann, J. E. Moody & D. S. Touretzky (Eds), Advances in Neural Information Processing Systems, 3 (pp. 190–205). San Mateo, CA: Morgan Kaufmann.
Fahlman, S. E., & Lebiere, C. (1990). The cascade-correlation learning architecture. In D. S. Touretzky (Ed.), Advances in Neural Information Processing Systems, 2 (pp. 524–532). San Mateo, CA: Morgan Kaufmann.
Faris, W. G., & Maier, R. S. (1988). Probabilistic analysis of a learning matrix. Advances in Applied Probability, 20, 695–705.
French, R. M. (1991). Using semi-distributed representations to overcome catastrophic forgetting in connectionist networks. In Proceedings of the 13th Annual Conference of the Cognitive Science Society (pp. 173–178). Hillsdale, NJ: Erlbaum.
Gardner-Medwin, A. R. (1976). The recall of events through the learning of associations between their parts. Proceedings of the Royal Society of London B, 194, 375–402.
Gibson, W. G., & Robinson, J. (1992). Statistical analysis of the dynamics of a sparse associative memory. Neural Networks, 5, 645–661.
Gluck, M. A., & Myers, C. E. (1993). Hippocampal mediation of stimulus representation: A computational theory. Hippocampus, 3, 491–516.
Grossberg, S. (1983). Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition and Motor Control. Boston, Dordrecht, Holland: Reidel.
Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11, 23–63.
Halgren, E. (1984). Human hippocampal and amygdala recording and stimulation: Evidence for a neural model of recent memory. In L. Squire & N. Butters (Eds), The Neuropsychology of Memory (pp. 165–182). New York: Guilford.
Hasselmo, M. E., Rolls, E. T., & Baylis, G. C. (1989). The role of expression and identity in the face-selective responses of neurons in the temporal visual cortex of the monkey. Behavioral Brain Research, 32, 203–218.
Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York: Wiley.
Heit, G., Smith, M. E., & Halgren, E. (1989). Neural encoding of individual words and faces by the human hippocampus and amygdala. Nature, 333, 773–775.
Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the Theory of Neural Computation. Reading, MA: Addison-Wesley.
Hetherington, P. A., & Seidenberg, M. S. (1989). Is there "catastrophic interference" in connectionist networks? In Proceedings of the 11th Annual Conference of the Cognitive Science Society (pp. 26–33). Hillsdale, NJ: Erlbaum.
Hinton, G. E., & Anderson, J. A. (Eds) (1981). Parallel Models of Associative Memory. Hillsdale, NJ: Erlbaum.
Hinton, G. E., & Plaut, D. C. (1987). Using fast weights to deblur old memories. In Proceedings of the Ninth Annual Conference of the Cognitive Science Society (pp. 177–186). Hillsdale, NJ: Erlbaum.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences USA, 79, 2554–2558.
Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences USA, 81, 3088–3092.
Kairiss, E. W., & Miranker, W. L. (1997). Cortical memory dynamics. Biological Cybernetics, in press.
Kandel, E. R. (1991). Nerve cells and behavior. In E. R. Kandel, J. H. Schwartz & T. M. Jessell (Eds), Principles of Neural Science (pp. 18–32). New York: Elsevier.
Kanerva, P. (1988). Sparse Distributed Memory. Cambridge, MA: MIT Press.
Keeler, J. D. (1988). Comparison between Kanerva's SDM and Hopfield-type neural networks. Cognitive Science, 12, 299–329.
Kirkpatrick, S., & Sherrington, D. (1988). Infinite range models of spin-glasses. Physical Review B, 17, 4384–4403.
Knapp, A. G., & Anderson, J. A. (1984). Theory of categorization based on distributed memory storage. Journal of Experimental Psychology: Learning, Memory and Cognition, 10, 616–637.
Knudsen, E. I., du Lac, S., & Esterly, S. D. (1987). Computational maps in the brain. In W. M. Cowan, E. M. Shooter, C. F. Stevens & R. F. Thompson (Eds), Annual Review of Neuroscience (pp. 41–65). Palo Alto, CA: Annual Reviews.
Kohonen, T. (1971). A class of randomly organized associative memories. Acta Polytechnica Scandinavica, EL 25.
Kohonen, T. (1972). Correlation matrix memories. IEEE Transactions on Computers, C-21, 353–359.
Kohonen, T. (1977). Associative Memory: A System-Theoretical Approach. Berlin, Heidelberg, New York: Springer.
Kohonen, T. (1989). Self-Organization and Associative Memory (3rd edn). Berlin, Heidelberg, New York: Springer.
Kohonen, T., & Mäkisara, K. (1986). Representation of sensory information in self-organizing feature maps. In J. S. Denker (Ed.), Neural Networks for Computing (pp. 271–276). New York: American Institute of Physics.
Kortge, C. A. (1990). Episodic memory in connectionist networks. In Proceedings of the 12th Annual Conference of the Cognitive Science Society (pp. 764–771). Hillsdale, NJ: Erlbaum.
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22–44.
Little, W. A., & Shaw, G. L. (1975). A statistical theory of short and long term memory. Behavioral Biology, 14, 115–133.
Marr, D. (1971). Simple memory: A theory for archicortex. Philosophical Transactions of the Royal Society of London B, 262, 23–81.
McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102, 419–457.
McClelland, J. L., & Rumelhart, D. E. (1986a). Amnesia and distributed memory. In D. E. Rumelhart & J. L. McClelland (Eds), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, Psychological and Biological Models (pp. 503–527). Cambridge, MA: MIT Press.
McClelland, J. L., & Rumelhart, D. E. (1986b). A distributed model of human learning and memory. In J. L. McClelland & D. E. Rumelhart (Eds), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, Psychological and Biological Models (pp. 170–215). Cambridge, MA: MIT Press.
McCloskey, M., & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. The Psychology of Learning and Motivation, 24, 104–169.
McEliece, R. J., Posner, E. C., Rodemich, E. R., & Venkatesh, S. S. (1986). The capacity of the Hopfield associative memory. IEEE Transactions on Information Theory, 33, 461–482.
McNaughton, B. L., & Morris, R. G. M. (1987). Hippocampal synaptic enhancement and information storage within a distributed memory system. Trends in Neuroscience, 10, 408–415.
Merzenich, M. M., Nelson, R. J., Stryker, M. P., Cynader, M. S., Schoppmann, A., & Zook, J. M. (1984). Somatosensory cortical map changes following digit amputation in adult monkeys. Journal of Comparative Neurology, 224, 591–605.
Miikkulainen, R. (1992). Trace feature map: A model of episodic associative memory. Biological Cybernetics, 67, 273–282.
Miikkulainen, R. (1993). Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory. Cambridge, MA: MIT Press.
Miller, K. D., & MacKay, D. J. C. (1992). The role of constraints in Hebbian learning. Neural Computation, 6, 100–126.
Milner, P. (1989). A cell assembly theory of hippocampal amnesia. Neuropsychologia, 27, 23–30.
Murre, J. M. J. (1995). A model of amnesia. Manuscript.
O'Reilly, R. C., & McClelland, J. L. (1994). Hippocampal conjunctive encoding, storage, and recall: Avoiding a trade-off. Hippocampus, 4, 661–682.
Palm, G. (1980). On associative memory. Biological Cybernetics, 36, 19–31.
Palm, G. (1981). On the storage capacity of an associative memory with randomly distributed storage elements. Biological Cybernetics, 39, 125–127.
Peretto, P., & Niez, J. (1986). Collective properties of neural networks. In E. Bienenstock, F. Fogelman Soulié & G. Weisbuch (Eds), Disordered Systems and Biological Organization (pp. 171–185). Berlin, Heidelberg, New York: Springer.
Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97, 285–308.
Read, W., Nenov, V. I., & Halgren, E. (1994). Role of inhibition in memory retrieval by hippocampal area CA3. Neuroscience and Biobehavioral Reviews, 18, 55–68.
Ritter, H. J. (1991). Asymptotic level density for a class of vector quantization processes. IEEE Transactions on Neural Networks, 2, 173–175.
Rolls, E. T. (1984). Neurons in the cortex of the temporal lobe and in the amygdala of the monkey with responses selective for faces. Human Neurobiology, 3, 209–222.
Schmajuk, N. A., & DiCarlo, J. J. (1992). Stimulus configuration, classical conditioning, and hippocampal function. Psychological Review, 99, 268–305.
Sejnowski, T. J., & Churchland, P. S. (1989). Brain and cognition. In M. I. Posner (Ed.), Foundations of Cognitive Science (Chapter 8, pp. 315–356). Cambridge, MA: MIT Press.
Squire, L. R. (1987). Memory and Brain. Oxford, UK; New York: Oxford University Press.
Squire, L. R. (1992). Memory and the hippocampus: A synthesis from findings with rats, monkeys, and humans. Psychological Review, 99, 195–231.
Squire, L. R., Shimamura, A. P., & Amaral, D. G. (1989). Memory and the hippocampus. In J. H. Byrne & W. O. Berry (Eds), Neural Models of Plasticity (pp. 208–239). New York: Academic Press.
Steinbuch, K. (1961). Die Lernmatrix. Kybernetik, 1, 36–45.
Sutherland, R. W., & Rudy, J. W. (1989). Configural association theory: The role of the hippocampal formation in learning, memory and amnesia. Psychobiology, 17, 129–144.
Teyler, T. J., & Discenna, P. (1986). The hippocampal memory indexing theory. Behavioral Neuroscience, 100, 147–154.
Treves, A., & Rolls, E. T. (1991). What determines the capacity of autoassociative memories in the brain? Network, 2, 371–398.
Treves, A., & Rolls, E. T. (1994). Computational analysis of the role of the hippocampus in memory. Hippocampus, 4, 374–391.
Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds), Organization of Memory (pp. 381–403). New York: Academic Press.
Tulving, E. (1983). Elements of Episodic Memory. Oxford, UK; New York: Oxford University Press.
Wickelgren, W. A. (1979). Chunking and consolidation: A theoretical synthesis of semantic networks, configuring, S-R versus cognitive learning, normal forgetting, the amnesic syndrome, and the hippocampal arousal system. Psychological Review, 86, 44–60.
Willshaw, D. J., Buneman, O. P., & Longuet-Higgins, H. C. (1969). Non-holographic associative memory. Nature, 222, 960–962.
Wilson, M. A., & McNaughton, B. L. (1993). Dynamics of the hippocampal ensemble code for space. Science, 261, 1055–1058.
APPENDIX A: PROBABILITY THEORY BACKGROUND AND PROOFS

In this appendix, concepts from probability theory that are necessary for understanding the main text are briefly reviewed, and details of the probabilistic formulation and martingale analysis are presented (for more background on probability theory and statistics, see e.g. Alon & Spencer, 1992 or Bain & Engelhardt, 1987).

A.1. Distributions and Bounds

In the analysis of Sections 3 and 4, two probability density functions are used. The first one is the binomial distribution with parameters n and p, denoted as B(n, p); it gives the outcome of n independent trials where the two possible alternatives have probability p and 1 − p. This distribution has the expected value np and variance np(1 − p). The other one is the hypergeometric distribution with parameters n, m and N, denoted as HYP(n, m, N), and representing the number of elements in common between two independently-chosen subsets of n and m elements of a common superset of N elements. The distribution has the expected value $\frac{nm}{N}$ and variance $\frac{nm}{N}\left(1-\frac{m}{N}\right)\frac{N-n}{N-1}$.

The Chernoff bounds can be used to estimate how likely a binomially distributed variable, X, is to have a value within a given distance δ from its mean:

$$P\big(X \le (1-\delta)np\big) \le \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{np}, \quad 0 < \delta \le 1, \tag{A.1}$$

$$P\big(X \ge (1+\delta)np\big) \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{np}, \quad \delta > 0. \tag{A.2}$$

Even if the trials are not independent (and therefore X is not binomially distributed), in some cases the sequence of variables X_0, ..., X_n can be analyzed as a martingale, and bounds similar to Chernoff bounds derived using Azuma's inequalities (see Appendix A.3).

A.2. Details of the Probabilistic Formulation

As in Section 3, let Z_i be the size of the binding constellation of a feature unit after i patterns have been stored on it, and let Y_i be its increase after storing the ith pattern on it. Then

$$Z_i = \sum_{k=1}^{i} Y_k. \tag{A.3}$$

Let Y_i, i > 1 be hypergeometrically distributed with parameters m, n − Z_{i−1}, and n:

$$P(Y_i = y \mid Z_{i-1} = z_{i-1}) = \frac{\binom{n - z_{i-1}}{y}\binom{z_{i-1}}{m - y}}{\binom{n}{m}}, \tag{A.4}$$

and let Z_1 = Y_1 = m with probability 1. Then

$$E(Y_i) = E\!\left(\frac{m(n - Z_{i-1})}{n}\right) = m - \frac{m}{n}E(Z_{i-1}) = \left(1 - \frac{m}{n}\right)E(Y_{i-1}) = m\left(1 - \frac{m}{n}\right)^{i-1}. \tag{A.5}$$

Using (A.5), an expression for E(Z_i) can be derived:

$$E(Z_i) = \sum_{k=1}^{i} E(Y_k) = \sum_{k=1}^{i} m\left(1 - \frac{m}{n}\right)^{k-1} = n\left(1 - \left(1 - \frac{m}{n}\right)^{i}\right). \tag{A.6}$$

This equation indicates that initially, when no patterns are stored, Z_i = 0, and that Z_i converges to n as i goes to infinity.

The variance of Z_i can be computed in a similar way. First a recurrent expression for the variance of Y_i is derived:

$$\mathrm{Var}(Y_i) = E_{Z_{i-1}}\big[\mathrm{Var}(Y_i \mid Z_{i-1})\big] + \mathrm{Var}_{Z_{i-1}}\big[E(Y_i \mid Z_{i-1})\big] = \frac{m(n-m)}{n^2(n-1)}E\big(Z_{i-1}(n - Z_{i-1})\big) + \frac{m^2}{n^2}\mathrm{Var}(Z_{i-1}). \tag{A.7}$$

To obtain the variance of Z_i, the covariance of Z_{i−1} and Y_i needs to be computed:

$$\begin{aligned}
\mathrm{Cov}(Z_{i-1}, Y_i) &= E(Z_{i-1}Y_i) - E(Z_{i-1})E(Y_i) \\
&= E_{Z_{i-1}}\big[E(Z_{i-1}Y_i \mid Z_{i-1})\big] - E(Z_{i-1})E(Y_i) \\
&= E\!\left(Z_{i-1}\frac{m(n - Z_{i-1})}{n}\right) - E(Z_{i-1})E(Y_i) \\
&= mE(Z_{i-1}) - \frac{m}{n}\Big[\big(E(Z_{i-1})\big)^2 + \mathrm{Var}(Z_{i-1})\Big] - E(Z_{i-1})E(Y_i) \\
&= -\frac{m}{n}\mathrm{Var}(Z_{i-1}). \tag{A.8}
\end{aligned}$$

The variance of Z_i can now be easily derived:

$$\mathrm{Var}(Z_i) = \mathrm{Var}(Z_{i-1} + Y_i) = \mathrm{Var}(Z_{i-1}) + \mathrm{Var}(Y_i) + 2\,\mathrm{Cov}(Z_{i-1}, Y_i) = \sum_{k=1}^{i}\mathrm{Var}(Y_k) - \frac{2m}{n}\sum_{k=1}^{i}\mathrm{Var}(Z_{k-1}),$$

which solves to the closed form

$$\mathrm{Var}(Z_i) = n\left(1 - \frac{m}{n}\right)^{i}\left(1 - n\left(1 - \frac{m}{n}\right)^{i}\right) + n(n-1)\left(1 - \frac{m(2n - m - 1)}{n(n-1)}\right)^{i}.$$

The base of every exponential in this expression is smaller than 1, so the variance goes to 0 as i goes to infinity. This makes sense in the model: if many patterns are stored, it becomes very likely that Z_i has a value close to n, and hence its variance should go to zero.

So far it has been assumed that i is an ordinary variable. If it is replaced by a stochastic variable I that is binomially distributed with parameters p and 1/f (where 1/f is the probability that a particular pattern is stored on the given feature unit), the previously computed expected values and variances must be made conditional on I. To compute the unconditional expected values and variances, the following lemma is needed:

LEMMA A.1. If X ~ B(n, p), then E((1 − a)^X) = (1 − ap)^n.

Proof.

$$E\big((1-a)^X\big) = \sum_{x=0}^{n}(1-a)^x\binom{n}{x}p^x(1-p)^{n-x} = \sum_{x=0}^{n}\binom{n}{x}(p - ap)^x(1-p)^{n-x} = \big((p - ap) + (1 - p)\big)^n = (1 - ap)^n. \tag{A.9}$$

Let Z̄ denote the unconditional value of Z_I. The desired results follow immediately from Lemma A.1 and the linearity of expected values:

$$E(\bar{Z}) = n\left(1 - \left(1 - \frac{m}{nf}\right)^{p}\right), \tag{A.10}$$

$$\begin{aligned}
\mathrm{Var}(\bar{Z}) &= \mathrm{Var}_I\big[E(Z_I \mid I)\big] + E_I\big[\mathrm{Var}(Z_I \mid I)\big] \\
&= n\left(1 - \frac{m}{nf}\right)^{p}\left(1 - n\left(1 - \frac{m}{nf}\right)^{p}\right) + n(n-1)\left(1 - \frac{m(2n - m - 1)}{n(n-1)f}\right)^{p}. \tag{A.11}
\end{aligned}$$

Let Ẑ be defined as Z̄ | I ≥ 1. Then Ẑ will always be at least m. Expressions similar to (A.10) and (A.11) can be derived for Ẑ:

$$E(\hat{Z}) = m + (n-m)\left(1 - \left(1 - \frac{m}{nf}\right)^{p-1}\right), \tag{A.12}$$

$$\mathrm{Var}(\hat{Z}) = (n-m)\left(1 - \frac{m}{nf}\right)^{p-1}\left(1 - (n-m)\left(1 - \frac{m}{nf}\right)^{p-1}\right) + (n-m)(n-m-1)\left(1 - \frac{m(2n - m - 1)}{n(n-1)f}\right)^{p-1}. \tag{A.13}$$

From (A.10) and (A.12) it follows that on the average, Ẑ is larger than Z̄. Initially the difference is exactly m, and the difference goes to zero as p goes to infinity. This means that initially, when few patterns are stored, the right units are very likely to be retrieved, since they have the advantage of receiving activation from at least m binding units. However, when more patterns are stored, almost every unit in a feature map receives the same amount of activation. Units that have been used most often are most likely to be retrieved.

It is difficult to derive an exact expression for the expected value and variance of X_c (the intersection of c retrieval cues) because the binding constellations are correlated. Since the same partial patterns might have been stored several times, certain combinations of retrieval cues can give rise to spurious activity in the binding layer, which is difficult to take into account in the analysis. For this reason, the analysis is carried out under the assumption that the chance of repeated partial patterns is negligible, which is reasonable in most cases and easy to check.

A.3. Martingales

Martingales give us a way to analyze the outcome of n trials with limited dependencies. A martingale is a sequence of random variables X_0, ..., X_n such that

$$E(X_i \mid X_0, \ldots, X_{i-1}) = X_{i-1}, \quad 1 \le i \le n. \tag{A.14}$$

If a martingale satisfies the Lipschitz condition, that is,

$$|X_i - X_{i-1}| \le 1, \quad 1 \le i \le n, \tag{A.15}$$

then the following inequalities hold:

$$P\big(X_n \le X_0 - \lambda\sqrt{n}\big) \le e^{-\lambda^2/2}, \tag{A.16}$$

$$P\big(X_n \ge X_0 + \lambda\sqrt{n}\big) \le e^{-\lambda^2/2}. \tag{A.17}$$

These equations are called Azuma's inequalities and they can be used to bound the final value of the sequence within a chosen confidence level.

A.4. The Binding Constellation Martingale

Let Z^E_v be defined as in Section 4.2:

$$Z^E_v = n - (n - Z'_v)\left(1 - \frac{1}{n}\right)^{ki - v}. \tag{A.18}$$

In order to apply Azuma's inequality, the Lipschitz condition |Z^E_v − Z^E_{v−1}| ≤ 1 must be satisfied. The binding units are chosen one at a time, and they may either already be part of the constellation or add one more unit to it. Therefore, the difference between Z'_v and Z'_{v−1} is always either 0 or 1.

Case 1: Z'_v − Z'_{v−1} = 0.

$$|Z^E_v - Z^E_{v-1}| = \frac{n - Z'_v}{n}\left(1 - \frac{1}{n}\right)^{ki - v} \le 1. \tag{A.19}$$

Case 2: Z'_v = Z'_{v−1} + 1.

$$|Z^E_v - Z^E_{v-1}| = \left|(n - Z'_v + 1)\left(1 - \frac{1}{n}\right)^{ki - v + 1} - (n - Z'_v)\left(1 - \frac{1}{n}\right)^{ki - v}\right| = \frac{Z'_v - 1}{n}\left(1 - \frac{1}{n}\right)^{ki - v} \le 1. \tag{A.20}$$

In both cases we have |Z^E_v − Z^E_{v−1}| ≤ 1 and hence Azuma's inequality can be applied to obtain a bound for Z.
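As a quick sanity check on (A.5)–(A.6) and the closed-form variance above, the following simulation grows binding constellations by drawing m-unit patterns and compares the empirical mean and variance of Z_i with the formulas. This is an illustrative check with arbitrary parameters, not the simulation setup of the main text.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, i, trials = 500, 20, 30, 5000   # arbitrary test parameters

def constellation_size(rng):
    """Union of i random m-subsets of the n binding units (the process behind Z_i)."""
    covered = np.zeros(n, dtype=bool)
    for _ in range(i):
        covered[rng.choice(n, m, replace=False)] = True
    return covered.sum()

z = np.array([constellation_size(rng) for _ in range(trials)])

q = (1 - m / n) ** i                                   # P(a given unit stays uncovered)
r = (1 - m * (2 * n - m - 1) / (n * (n - 1))) ** i     # P(a given pair stays uncovered)
print(z.mean(), n * (1 - q))                           # empirical vs E(Z_i), eq. (A.6)
print(z.var(), n * q * (1 - n * q) + n * (n - 1) * r)  # empirical vs Var(Z_i)
```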
A.5. The Intersection Martingale

Let X^E_v be defined as in Section 4.3:

$$X^E_v = X'_v + \frac{(n_1 - v)(n_2 - X'_v)}{n_s - v}, \qquad X^E_0 = \frac{n_1 n_2}{n_s}. \tag{A.21}$$

First, X^E_0, ..., X^E_{n_1} is shown to be a martingale. The expected increase of X'_v is (n_2 − X'_{v−1})/(n_s − v + 1). The conditional expected value of X^E_v given X^E_{v−1} = x̂_{v−1} is therefore

$$\begin{aligned}
E(X^E_v \mid X^E_{v-1} = \hat{x}_{v-1}) &= X'_{v-1} + \frac{n_2 - X'_{v-1}}{n_s - v + 1} + \frac{n_1 - v}{n_s - v}\left((n_2 - X'_{v-1}) - \frac{n_2 - X'_{v-1}}{n_s - v + 1}\right) \\
&= X'_{v-1} + (n_1 - v + 1)\,\frac{n_2 - X'_{v-1}}{n_s - v + 1} = \hat{x}_{v-1}. \tag{A.22}
\end{aligned}$$

Since E(X^E_v | X^E_{v−1}) = X^E_{v−1}, the sequence X^E_0, ..., X^E_{n_1} is a martingale.

To apply Azuma's inequality, it is necessary to show that |X^E_v − X^E_{v−1}| ≤ 1 for v = 1, ..., n_1. Unlike for the martingale of Appendix A.4, this is not generally true, but true only when certain constraints on n_1, n_2 and n_s are satisfied. If these parameters are fixed, the difference can be seen as a function of v, X'_v and X'_{v−1}. The function is monotonic in these variables, and to find its maximum, it is sufficient to look only at the extreme values of v, X'_v and X'_{v−1}. The maximum in turn determines the constraints on n_1, n_2 and n_s.

As in Appendix A.4, the proof consists of two main cases: (1) X'_v = X'_{v−1} and (2) X'_v = X'_{v−1} + 1. Each of these is divided into several subcases. The following lists show the cases, the absolute difference in each case between X^E_v and X^E_{v−1}, and the constraint that follows from the difference.

Case 1: X'_v = X'_{v−1}. In this case, using (A.21), |X^E_v − X^E_{v−1}| can be rewritten as

$$\frac{(n_2 - X'_v)(n_s - n_1)}{(n_s - v)(n_s - v + 1)}.$$

There are three subcases:

Subcase v = 1, X'_v = 0:
  difference n_2(n_s − n_1)/(n_s(n_s − 1)); resulting constraint n_2 ≤ n_s.
Subcase v = n_1, X'_v = max(0, n_1 + n_2 − n_s):
  difference n_2/(n_s − n_1 + 1) if X'_v = 0, giving the constraint n_1 + n_2 − 1 ≤ n_s;
  difference (n_s − n_1)/(n_s − n_1 + 1) if X'_v = n_1 + n_2 − n_s, giving no constraint.
Subcase v = n_1, X'_v = min(n_1, n_2):
  difference (n_2 − n_1)/(n_s − n_1 + 1) if X'_v = n_1, and 0 if X'_v = n_2; no constraint in either case.

Case 2: X'_v = X'_{v−1} + 1. Now |X^E_v − X^E_{v−1}| can be rewritten as

$$\frac{(n_s - n_1)(n_s - v - n_2 + X'_v)}{(n_s - v)(n_s - v + 1)}.$$

Again there are three subcases:

Subcase v = 1, X'_v = 1:
  difference (n_s − n_1)(n_s − n_2)/(n_s(n_s − 1)); no constraint.
Subcase v = n_1, X'_v = max(1, n_1 + n_2 − n_s):
  difference (n_s − n_1 − n_2 + 1)/(n_s − n_1 + 1) if X'_v = 1, and 0 if X'_v = n_1 + n_2 − n_s; no constraint in either case.
Subcase v = n_1, X'_v = min(n_1, n_2):
  difference (n_s − n_2)/(n_s − n_1 + 1) if X'_v = n_1, and (n_s − n_1)/(n_s − n_1 + 1) if X'_v = n_2; no constraint in either case.

To conclude, if n_1 + n_2 − 1 ≤ n_s, then |X^E_v − X^E_{v−1}| ≤ 1 and Azuma's inequality can be applied.
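To see how Azuma's inequalities (A.16)–(A.17) translate into concrete confidence statements, the helper below bounds the probability that a Lipschitz martingale of length n drifts more than t from its starting value. The parameter values are hypothetical, for illustration only.

```python
import math

def azuma_tail(n_steps, t):
    """Two-sided Azuma bound: P(|X_n - X_0| >= t) <= 2*exp(-t**2 / (2*n_steps))
    for a martingale with |X_i - X_{i-1}| <= 1, combining (A.16) and (A.17)."""
    return 2.0 * math.exp(-t * t / (2.0 * n_steps))

# e.g. a binding-constellation martingale over 400 one-at-a-time unit choices:
n_steps = 400
for t in (20, 40, 60):
    print(t, azuma_tail(n_steps, t))
# t = 60 (lambda = 3) already gives a tail bound of 2*exp(-4.5), about 2.2%
```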