6!!9 NeuralNetworks,Vol. 10,No.6, pp. 1017-1036,1997 01997 ElsevierScienceLtd.MI rightsreserved Printedin GreatBritain 0893-6080/97$17.00+.00 Pergamon CONTRIBUTED PII: S0893-6080(97)00016-6 ARTICLE Convergence-Zone Episodic Memory: Analysis and Simulations MARK MOLL1 AND RISTOMIIKKULAINEN2 ‘School of Computer Science, Carnegie Mellon University and ‘Department of Computer Sciences, The University of Texas at Austin (Received3 March1995;accepted11 November1996) Abstract—Hunrun episodic memory provides a seemingly unlimited storage for everyday experiences, and a retrieval system that allows us to access the experiences with partial activation of their components. The system is believed to consist of afast, temporary storage in the hippocampus,and a slow, long-term storage within the neocortex. Thispaper presents a neural network model of the hippocampal episodic memory inspired by Darnusio’s idea of Convergence Zones. The model consists of a layer of perceptual feature maps and a binding layer. A perceptual feature pattern is coarse coded in the binding layer, and stored on the weights between layers. A partial activation of the stored features activates the bindingpattern, which in turn reactivates the entire storedpattem. For many configurationsof the model, a theoretical lower boundfor the memory capacity can be derived, and it can bean order of magnitude or higher than the number of all units in the model, and several orders of magnitude higher than the number of binding-layer units. Computational simulationsfirther indicate that the average capacity is an order of magnitude larger than the theoretical lower bound, and making the connectivity between layers sparser causes an even further increase in capacity. Simulations also show that ifntore descriptive bindingpatterns are used, the errors tend to be rnoreplausible (patterns are confused with other similar patters), with a slight cost in capacity. The convergence-zone episodic memory therefore accounts for the immediate storage and associative retrieval capability and large capacity of the hippocampal memory, and shows why the memory encoding areas can be much smaller than the perceptual maps, consist of rather coarse computational units, and are only sparsely connected to the perceptual maps. 01997 Elsevier Science Ltd. Keywords-Convergencezones,Episodicmemory,Associative memory,Long-termmemory,Content-addressable memory,Memorycapacity,Retrievalerrors,Hippocarnpus. content-addressability.Most of the memories can be retrieved simply by activatinga partial representation of the experience,such as sound, a smell, or a visual image. Despite a vast amount of research, no clear understandinghas yet emerged on exactly where and how the episodic memory traces are represented in the brain. Severalrecent results,however, suggestthat the system consists of two components:the hippocampus servesas a fast, temporarystoragewhere the traces are createdimmediatelyas the experiencescomein, and the neocortex has the task of organizing and storing the experiencesfor the lifetime of the individual(Alvarez & Squire,1994;Halgren,1984;Marr, 1971;McClelland et al., 1995;Milner,1989;Squire,1992).It seemsthatthe traces are transferredfrom the hippocampusto the neocortex in a slow and tedious process, which may take several days, or weeks, or even years. After that, the hippocampusis no longer necessary for maintaining thesetraces,andtheresourcescanbe reusedforencoding new experiences. 1. INTRODUCTION Human memory system can be divided into semantic memoryof facts,rules,and generalknowledge,and episodic memory that records the individual’sday-to-day experiences(Tulving, 1972, 1983).Episodic memory is characterizedby extremeefficiencyandhighcapacity. New memoriesare formedeveryfew seconds,andmany of those persistfor years, even decades(Squire, 1987). Another significantcharacteristicof human memory is Acknowledgements: We thank Greg Plaxton for pointing us to martingale analysis on this problem, and two anonymous reviewers for references on hippocampal modeling, and for suggesting experiments with sparse comectivity. This research was supported in part by NSF grant no. IRI-9309273 and Texas Higher Education Coordinating Board Grant no. ARP-444 to the second author. The simulations were mn on the Cray Y-MP 8/864 at the University of Texas Center for High-Performance Computing. Requests of reprints should be sent to: Risto Miildmlainen, Department of Computer Sciences, The University of Texas at Austin, Austin, TX 78712, USA; Tel: +1 512471 9571; Fsx: +1 512471 8885; e-mail: risto@cs.utexaa.edu. 1017 1018 I Althcmgh several artificialneural network modelsof associative memoryhave been proposed(Ackleyet al., 1985;/!m ari, 1977,1988;Anderson,1972;Andersonet al., 191‘7; Cooper, 1973; Grossberg, 1983; GardnerMedwir1, 1976; Hinton & Anderson, 1981; Hopfield, 1982, 1984; Kairiss& Miranker, 1997;Kanerva, 1988; Knapp & Anderson,1984;Kohonen,1971,1972,1977, 1989;K:ohonen& Miikisara,1986;Kortge, 1990;Little & Shaw, 1975; McClelland & Rumelhart, 1986b; Miikla.i.ainen, 1992; Steinbuch, 1961;Willshawet al., 1969), the fast encoding,reliable associativeretrieval, capacityof even the hippocampalcomponent of hum memoryhas been difficultto accountfor. For n the Hopfieldmodelof N units,N141nN patterns c:m be stored with a 99% probabilityof correct retrieva1 when N is large (Amit, 1989; Hertz et al., 1991;R:eeler, 1988;McElieceet al., 1986).This means that storing and retrieving,for example, 106memories wouldrequirein the orderof 108nodesand 1016connections, vIhich is unrealistic,given that the hippocamprd formatim in higheranimalssuch as the rat is estimated to have about 106primaryexcitatoryneuronswith 1010 connections (Amaral,Ishizuka,& Claibome,1990),and the entire human brain is estimatedto have about 1011 neurons~and 1015synapses(Jessell,1991). ‘l%es~ earliermodelshad a uniform,abstractstructure andwe]“enot specificallymotivatedby anyparticularpart of thehumanmemorysystem.In thispaper,anew model for asscwiativeepisodicmemoryis proposedthat makes use of 1.hreeideas “ about how the hippocampalmemory mighttK put together.The model abstractsmost of the low-level biologicalcircuitry,focusingon showingthat with a biologicallymotivated overall architecture,an episodi; memorymodel exhibitscapacityand behavior very silnilar to that of the hippocampalmemorysystem. The thr>ecentralideasare:(1)value-unitencodingin the inputfeaturemaps,(2)sparse,randomencodingof traces in the 1uppocampus,and (3) a convergence-zonestructure bel,weenthem. Sine: the input to the memory consists of sensory experierice,in the modelit shouldhave a representation similarto the perceptualrepresentationsin thebrain.The low-level sensory representationsare organized into maps, that is, similar sensoryinputs are representedby nearbylocationson the corticalsurface(Knudsenet rd., 1987).:1 is possiblethatalsohigher-levelrepresentations have a map-likestructure.This is hard to verify,but at least tkere is plenty of supportfor value-unitencoding, that is, that the neuronsrespondselectivelyto only certain tyIEs of inputs, such as particularfaces, or facial expressions, or particularwords (Hassehnoet al., 1989; Heit et al., 1989;Rolls, 1984). The structureof the hippocampusis quitewellknown, and ref:ently its dynamicsin memory processinghave alSO b xm observed. Wilson & McNaughton (1993) found %at rats encode locations in the maze through ensemtdes of seemingly random, sparse activation ..— M. h4011and R. J4iikkukainen patterns in the hippocampalarea CA1. When the rat exploresnew locations,new activationpatternsappear, and when it returns to the earlier locations,the same pattern is activatedas during the first visit. O’Reilly& McClelland(1994)showedthat the hippocampalcircuitry is well-designedto form such sparse,diverseencodings, and that it can also perform pattern completion duringrecall. Damasio(1989a,b)proposeda generalframeworkfor episodicrepresentations,based on observationsof typical patterns of injury-relateddeficits.The idea is that there is no multi-modalcortical area that would build an integrated and independent representation of an experiencefrom its low-level sensory representations. Instead,the representationtakes place only in the lowlevelcortices,with the differentpartsboundtogetherby a hierarchyof convergencezones.An episodicrepresentation can be recreated by activatingits corresponding bindingpatternin the convergencezone. The convergence-zoneepisodic memory model is looselybased on the above three ideas. It consistsof a layerof perceptualmapsand a bindinglayer.An episodic experienceappearsas a patternof localactivationsacross the perceptualmaps,and is encodedas a sparse,random patternin the bindinglayer.The connectionsbetweenthe mapsand the bindinglayerstorethe encodingin a single presentation,andthecompleteperceptualpatterncan later be regeneratedfrompartialactivationof theinputlayer. Many details of the low-level neural circuitry are abstractedin the model. The units in the model correspond to functionalcolumns rather than neurons and their activation levels are represented by integers. Multi-stageconnectionsfrom the perceptual maps to the hippocampusare modeledby direct binary connections that are bidirectional,and the connectionswithin the hippocampusare not takenintoaccount.At this level of abstraction,thebehaviorof the modelcan be analyzed both theoretically and experimentally, and general resultscan be derivedaboutits properties. A theoreticalanalysisshows that: (1) with realisticsize mapsand bindinglayer,the capacityof the convergence-zonememory can be very high, higher than the numberof units in the model,and can be severalorders of magnitudehigher than the number of binding-layer units;(2) the majorityof the neuralhardwareis required in the perceptualprocessing;the bindinglayer needs to be onlya fractionof the sizeof the perceptualmaps;and (3) the computationalunits could be very coarse in the hippocampusand in the perceptualmaps; the required capacityis achievedwith a very small number of such units. Computationalsimulationsof the model further suggest that: (1) the average storage capacity may be an order of magnitudehigherthan the theoreticallower bound; (2) the capacity can be further increased by reducingthe connectivitybetweenfeaturemaps and the binding layer, with best results when the connectivity matches the sparsenessof the binding representations; Convergence-Zone Episdoci Memory: Analysis and Simulations BindinsImer ———— Fe@reMsp 1 FeatureMSp2 FeetureMsp 3 FestureMsp4 FIGURE1. Storage.The weightson the connectionsbetweenthe appropriatefeature units and tha binding represantstionof the pattern sre set to 1. and (3) if thebindingpatternsfor similarinputsare made more similar, the errors that the model makes become more plausible:the retrievedpatternsare similarto the correct patterns. These results suggest how one-shot storage, content-addressability,high capacity, and robustnesscould all be achieved within the resources of the hippocampalmemorysystem. 2. OUTLINE OF THE MODEL The convergence-zonememory model consists of two layers of real-valuedunits (the feature map layer and the bindinglayer), and bidirectionalbinary connections betweenthe layers(Figure1).Perceptualexperiencesare representedas vectorsof featurevalues,suchas color= red,shape = round,size = small. Thevrduesare encodedas unitson the featuremaps.Thereis a separate map for each featuredomain,and each unit on the map represents a particular value for that feature. For instance, on the map for the color feature, the value red could be specifiedby turning on the unit in the lower-rightquarter (Figure 1). The feature map units are connected to the binding layer with bidirectional binary connections(i.e. the weightis either Oor 1). An activation of units in the feature map layer causes a number of units to become active in the bindinglayer, and viceversa.In effect,the bindinglayeractivationis a compressed, distributed encoding of the perceptual value-unitrepresentation. Initially,allconnectionsare inactiveat O.A perceptual experienceis stored in the memorythroughthe feature map layer in three steps.First, thoseunits that represent the appropriatefeaturevaluesare activatedat 1. Second, a subsetof m bindingunits are randomlyselectedin the bindinglayeras the compressedencodingfor thepattern, and activatedat 1. Third, the weightsof all the correctionsbetweenthe activeunitsin the featuremapsandthe activeunits in the bindinglayer are set to 1 (Figure 1). Note that only one presentationis necessaryto store a patternthis way. 1019 To retrievea pattern,firstall bindingunitsare set to O. The pattern to be retrieved is partially specifiedin the feature maps by activatinga subsetof its feature units. For example,in Figure 2a the memoryis cued with the two leftmostfeatures.The activationpropagatesto the binding layer through all connectionsthat have been turnedon so far. The setof bindingunitsthat a particular featureunitturnson is calledthebindingconstellationof thatunit.Allbindingunitsin the bindingencodingof the patternto be retrievedare activeat 2 becausetheybelong to the binding constellation of bothretrievalcue units.A number of other units are also activated at 1, because eachcueunittakespartin representingmultiplepatterns, and therefore has several other active connectionsas well. Only those units active at 2 are retained; units with less activationare turnedoff (Figure2b). The activationof the remainingbindingunits is then propagatedback to the featuremaps(Figure2c).A number of unitsare activatedat variouslevelsin each feature map, dependingon how well their bindingconstellation matchesthe currentpatternin thebindinglayer.Chances are that the unit that belongsto the same pattern as the cues has the largest overlap and becomes most highly activated. Only the most active unit in each feature mapis retained,andas a result,a complete,unambiguous perceptual pattern is retrieved from the system (Figure2d). If thereare n unitsin the bindinglayerand m unitsare chosen as a representationfor a pattern,the numberof possibledifferentbindingrepresentationsis equalto () n m If n is sufficientlylarge and m is relativelysmall compared to n, this number is extremelylarge, suggesting that the convergence-zonememory could have a very large capacity. However,dueto theprobabilisticnatureof the storage and retrievalprocesses,thereis alwaysa chancethat the retrieval will fail. The binding constellationsof the retrievalcue unitsmay overlapsignificantly,and several spuriousunits may be turned on at the binding layer. When the activationis propagatedback to the feature maps, some randomunit in a feature map may have a bindingconstellationthatmatchesthe spuriousunitsvery well (Figure3). This rogue unit may receivemore activation than the correct unit, and a wrong feature value may be retrieved.As more patternsare stored,the binding constellationsof the featureunitsbecomelarger,and erroneousretrievalbecomesmore likely. To determinethe capacity of the convergence-zone memory,the chanceof retrievalerrormustbe computed. Below, a probabilisticformulationof the model is first given, and a lower bound for the retrieval error is derived. 1020 ill hloll and R. Miikkulainen (a) (c) EI [ Fea( Featwe Map4 BindiMLas’er -. ———— FeatureMap 1 Featm Map2 Featurehiap3 BindingLayer (d) (b) FeatureMap 4 l_=_- ■ ■ I_._-.k r_’_’lr_l ■ r”_ lr————— X-.L-L--I LK--_-lu [ Feaf eMSP1 F--2 FeatureMap3 F-tare MSP4 Featurehlap1 FeetnreMap2 FeatureMap3 FeatureMSp4 FIGURI L Ratrlaval. A stored pattern la ratriavad by presenting a partial rapraaantationaa a cue. The eke of the square indicates activat I level of the unit. (a) Ratrlavalcuee activatea bhtdlngpattam. (b) Laaa active blndlng unite are tumad M. (c) Blndlng pattern Sctivat feature units. (d) Lass active faatum unlta are turned off. . Let PROBABILISTICFORMULATION Zi >thesizeof thebindingconstellationof a feature unitd increa m.; ;e ne randol r i patternshavebeen storedon it and let Yibe its after storingthe ith patternon it. Obviously,Y1 obtainthe distributionof Yiwhen i >1, notethat activeconnectionsbelongto the intersectionof a y chosen subset of m comections among all n comections of the unit, and its remaininginactiveconnections(a set with n – zi–l elements,wherezi-l is the binding constellationat the previous step). Therefore, Yi$i >1 is hypergeometricallydistributed(Appendix Al) with parametersm, n – zi-l, and n: P(Yi =ylZi -1 = Zi- 1)= (n-H(::Y)/() (1) BhdtngLayer The constellationsizeZi is then givenby z (2) k= 1 k!lm Let Zbe the numberof patternsstoredon a particular feature unit after p random feature patterns have been storedin the entire memory.Zis binomiallydistributed (AppendixAl) withparametersp and 1~ where~is the numberof units in a featuremap: [ Feat ! MeII1 FeatureMap2 FeatureMaP3 FeahII’chhP4 FIGURI L Erroneousratrleval.A rogua feature unit Is retrieved, # the correct one, whan its blndlng conatallatlon hae moreu b in common with the lnteraactlon oi the ratrlaval cue tlonsthan the binding constallatlonof the correct unit. Let Z be the bindingconstellationof a particularfeatureunitafterp patternshavebeen storedin the memory. It can be shown(AppendixA.2) that Convergence-Zone Episdoci Memory: Analysis and Simulations 1021 The size of the totalbindingconstellationactivatedduring retrievalis obtainedby takingthis intersectionover the bindingconstellationsof all c retrievalcues. The numberof units in commonbetween a potential rogueunitandthe c retrievalcuesis denotedby RC+l and is also hypergeometricallydistributed, however with parametersz, XC,and n, becausewe cannotassumethat the rogue unit has at least m units in commonwith the cues: “z)=n(’-(’-ap)ad ‘4) ‘Ur(z)=n(%)p(+ap) ( 1– +n(lz–1) m(2n – m – 1) p (5) 1)~ ) “ Initially,when no patternsare stored,the bindingconstellationis zero and it will convergeto n as more patternsare stored(sinceO< 1– # < 1).Becausethebases of the exponential in the varianceof Z are smallerthan 1, the variancewill go to zero when p goes to infinity. Therefore, in the limit the binding constellationwill coverthe entirebindinglayer with probability1. The bindingconstellationof a featureunit, giventhat at leastonepatternhasbeen storedon it, is denotedby ~. This variable representsthe binding constellationof a retrieval cue, which necessarilymust have at least one pattern storedon it (assumingthat the retrievalcues are valid).Theexpectedvalueand varianceof~ followfrom (4) and (5): n(n – P(RC+ ~= rlZ = Z,XC = xc) = (’3(:3/(3 (9) The correctunit in a retrievalmap (i.e. in a featuremap where a retrieval cue was not presented and where a featurevalueneedsto be retrieved)will receivean activationX.+l, becauseit also has at least m units in common with the retrieval cues. The correct unit will be retrieved if XC+1> Rc+ 1. Now, XC+l and RC+l differ only in the last intersectionstep, where XC+]depends on ~ and XC,and Rc+l depends on Z and XC.Since l?(~) > E(Z)((4) and (6)), E(XC+1)> E(R~+1),~d the correctunit will be retieved most of the time, although this advantage gradually decreases as more patterns are stored in the memory. In each feature map there are (~ – 1) potential rogue units, so the conditional probability of successful retrieval is (1 – P(RC+l> xc+~Ixc+ 1, Z,xc))(f - 1)>not addressing tie-breaking. Unfortunately,it is very difficult to compute p~wce~~, the unconditionalprobability of successful retrieval, because the distributionfunctions of Z, XC,XC+land RC+l are not known. However,it is possibleto derive bounds for p,ua$, and show that with reasonable valuesfor n, m, f, and p, the memoryis reliable. ‘(z)=m+(n-m)(+a’)md “) Var(2) ‘@-mH’-l(l-@-m(l-ap-’) +(n–m)(n–m–1) ( 1– m(2n – m – n(n – 1) ‘–1 1)~ ) “ (7) Note that the expectedvalue of ~ is alwayslarger than that of Z. Initiallythe differenceis exactlym, and it goes to zero asp goes to infinity(because2 also convergesto n). Leti?! be the bindingconstellationof thejth retrieval cueandletXjbe thenumberof unitsin theintersectionof the tirstj retrievalcues.Clearly,Xl =21 = m. To getXj forj >1, we removefrom considerationthem units all retrievalcuesnecessarilyhavein common(becausethey belongto the same storedpattern),and randomlyselect 2j – m units from the remainingset of n – m units and see how manyof thembelongto the currentintersection of xj–l – m units. This is a hypergeometricdistribution with parametersi] —m, xj–1– m, and n — m: ~(xj=Xjlzj=~j,Xj–1 ‘Xj - 1) 4. LOWER BOUNDFOR MEMORYCAPACITY Memorycapacitycan be definedas the maximumnumber of patternsthat can be storedin the memoryso that the probabilityof correctretrievalwith a given number of retrievalcues is greaterthan a (a constantclose to 1). In thissection,a lowerboundforthechanceof successful retrievalwill be detived.The analysisconsistsof three steps:(1) boundsfor the numberof patternsstoredon a featureunit;(2)boundsforthebindingconstellationsize; and (3)boundsfor the intersectionsof bindingconstellations.Givenparticularvaluesfor the systemparameters, and ignoring dependenciesamong constellations,it is then possibleto derive a lower bound for the capacity of the model. 4.1. Numberof PatternsStoredon a FeatureUnit SinceZis binomiallydistributed(withparametersp and 1022 M. h4011and R. iUiikkulainen ~moffbounds(AppendixAl) can be applied: I/f), ( PI ( unit, which is equivalentto selecting ki units from the bindinglayer at random.Let ~E be the expectedsize of the finalbindingconstellation,estimatedafter v binding unitshave been selected.Then @=z’,+@-z’+(l-Y~ (lo) =n PI ( uationsgive the probabilitythat Zis more than i32foff its mean. The parameters 61 and 62 deterr e the tradeoff between the tightness of the bound and the probability of satisfying them. If bound re desiredwith a given probabilityf?,the right hand ss of (10) and (11) are made equal to /3 and Solve( m i$l and 62. The lower and upper bound for the nu m of patternsstoredon a featureunit then are These dl~ a (1– ~1)~ if a solutionfor 61exists il (12) Ootherwise { (1 + ~z)~. Given ture U I“f a solutionfor til exists 1+(1 -61)P; j,= 1 otherwise (14) i = 1+(1+ 62)P4. f (15) , (18) = J!@lz’,- ~= Z’v- ~) ‘n-(n-z’,-d+) )(+’-’ n –(n–Z’v–1) (13) it at least one patternhas been storedon a feathe boundsbecome () ki-v (n –Z’v) 1– ; whereZ’Vis the size of the bindingconstellationformed by the first v selected units. ObviouslyZ’v is equal to Z’,-l or exactl one larger, and the expected increase of Z’v is 1– J~. Since ~E–1 depends stochastically only on Z’v–1,the expectedvalueof ~E, given~E–1,is = iu = – () ~ ki-v+ 1 1– ; (19) =.$_,. Therefore,E(AEIZ$E.~)=zE- ~and the sequenceof variables GE,..., Z: is a martingale (see Appendix A.3). Moreover, it can be shown (Appendix A.4) that !~E–~B-ll s 1, so that bounds for the final binding constellationZ can be obtainedfrom Azuma’sinequalities. For the lower bound, the martingale2$E,...,Z~, (with length kil) is used, and for the upper bound, (18) and notingthat Z9 ...7Z~U(with length kiu). Using Z=Z~, for the lower bound and Z=Z~ti for the upper bound,Azuma’sinequalitiescan be writtenas 4.2. Si of the BindingConstellation * Instead of choosing exactly m different binding units for the indingconstellationof a featuremap unit, considerth processof randomlyselectingknot-necessarilydistinctunits in such a way that the expectednumberof differe t unitsis m. This will makethe analysiseasierat the cos of larger variance,but the boundsderivedwill alsobe validfor the actualprocess.To determinek, note that the numberof unitsthatdo not belongto thebinding represe tation is equalton – m on average ‘P(z+(+)’’)-’fi) (20) t n d () ~k l—– n = n —m. Solvin fork, we get ~= inn – ln(n– m) inn –ln(n–1)” $ (16) (17) Note th t k is almostequaltom for large n. Let assumei patternsare storedon the featuremap Se -h’n,A>0. (21) After derivinga value for A based on the desired confidencelevel& thefollowinglowerandupperboundsfor Z are obtained: “=n(+r)- ’22) Convergence-Zone Episdoci Memory: Analysis and Simulations 1023 After derivinga value for X based on the desired confidence,the followingupperboundfor Xj is obtained: ‘u=n(l-(1-3ki”)+A@ ’23) The correspondingboundsfor ~ are ~u=n(l-fl-:~k’”l+~lfi. [ [ n) Xj,~= m + (25) v ‘“ ) The process of forming the intersectionof c binding constellationsincrementallyone cue at a time can also be formulatedas a martingaleprocess.To see how,considerthe processof formingan intersectionof two subsets of a common supersetincrementally,by checking (one at a time) whether each element of the first set occurs in the second set. Assumethat v elementshave been checked this way. Let X’, denote the number of elementsfound to be in the intersectionso far, and X: the currentlyexpectednumber of elementsin the final intersection.Then + v (nl–v)(n2–X’V) v n$—v (26) ‘ m)(~u– m) (n – m) +A/Xj-~,~ –m. (31) This bound is computedrecursively,with xl,. = m. Comparingwith the probabilisticformulationof Section 3, note that (xi- ~.U– m)(.Zu– m) is the expectedvalueof the hypergeometricdistribution derived for Xj (8) when ~j and Xj-l are at their upper bounds. As the last step in the analysisof bindingconstellations, the boundsfor XC+land R.+l must be computed. When X. is at its upper bound, the intersectionis the largest,and a potentialrogueunit has the largestchance of taking over. In this case, a lower bound for XC+lis obtainedby carrying the intersectionprocess one step further,and applyingAzuma’sinequality: p( x c+ ~ ~ m+ (xc,u–m)(?j –m) – AJ~rn (n – m) 1,U — m ) (32) whichresultsin x +1~=m+ wherenl, nz and n, are the sizesof the first,second,and the superset.As shown in AppendixA.5, the sequence x;, ..., X:, is a martingale.In addition,if nl + nz – 1< n,, IX:–X~-ll = 1, and Azuma’s inequalitiescan be applied. The aboveprocessappliesto formingthe intersection of bindingconstellationsof retrievalcueswhenthe intersectionin the previousstep is chosenas the firstset, the bindingconstellationof thejth cue as the secondset,the bindinglayeras the commonsuperset,andthem unitsall retrieval cues have in common are excluded from the intersection.In this case nl ‘Xj– I,. – (n – m) 4.3. Intersection of Binding Constellations XE=X, (Xj- c, (xc,u–m)(.Z1–m) – (n – m) A&=7il. (33) If the resultinglowerboundis smallerthan m, m can be used instead.Similarly,to get the upperboundfor R.+l, one more intersectionstep needs to be carried out, but this time the m units are not excluded: p( R.+ 1 z ‘-+ n A& Js and the upperboundbecomes xc,u Zu ‘C+1,U= ~ —+ e- AK. A2’2, A > (), (34) (35) (27) 4.4. DependenciesBetweenBindingConstellations nz = Zu—m (28) n~= n —m, (29) wherexj–l,Uis an upperboundforXj–l.Azuma’sinequality can be appliedifxj–l, U+ 2U– 1< n (whichneedsto be checked).Using(27)–(29)in (26)and notingthatX~, = Xj – m, Azuma’sinequalitybecomes P(x:’ 2X: – A@)= ( P Xj km+ — < e–A2n, _ (Xj 1,. – m)(%–‘) (n – m) A > (). + ~~xj-, ~ – m ) (30) Strictlyspeaking,the aboveanalysisis valid only when thebindingconstellationsof each cue are independent.If the same partial pattern is stored many times, the constellationswill overlap beyond the m units that they necessarily have in common. Such overlap tends to increasethe size of the finalintersection. In mostcasesof realisticsize,however,the increaseis negligible.Thenumberof featuresVin commonbetween two randompatternsof c featureseach is given by the binomialdistribution: M. A4011and R. Miikkulainen 1024 $ ~DÊ‹ ‘l-(l-;)C-C(;)(l-;)C-’> ’37) The ch ce that two randompatternsof c featureshave more an one featurein commonis P(v > )=1 –P(V=O)– ZJ(V=1) F which~an be rewrittenas ll(v>l)=l– F ( a(+)’ 1+ ’38) Thisch ceisnegligible forsufficientlylargevaluesoff For ex pie, already when ~ = 5000 and c = 3, the chance is 1.2 X 10–7,and can be safely ignoredwhen compu$nga lower boundfor the capacity. 5.1. A Coarse-GrainedModel 4.5.0 taining the LowerBound It is n possibleto use (10)-(15), (17) and (20)-(25) and (3 )–(35)to derivea lowerboundfor the probability of suc essful retrievalwith given systemparametersn, ~ m, f, t, c, and p, where t is the total numberof feature maps. e retrieval is successful if rC+l,U,the upper bound for R.+l, is lower than x~+l,~,the lower bound for XC. ~l. Under this constraint, the probability that none o:: the variablesin the analysisexceedsits bounds is a lovverboundfor successfulretrieval. Obtaining the upper bound for Xc involvesbounding ‘c – 1 variables:Zand ~ for the c cues and Xc for the c—l intersections. Computing XC+l,land r=+l,~each involvet bounding3 variables(Z,Z, and Xc+l;Z,~, and R.+1).‘~ ere are t – c maps, each with one xC+l,lbound and f -- 1 different r.+l,. bounds (one for each rogue unit). :%e total number of bounds is therefore 3C– 1 + ‘f( t – c). Setting the righthand sides of the inequa..ities (10), (11), (20), (21), (30), (32) and (34) equaltto a smallconstant/3,a lowerboundfor successful retriewJ is obtained: Psuccess > U – 6)3’-1‘3f(’-c)7 which,lforsmall13,can be approximatedby I p,ucce,,>1 – (3’–1+‘f(t–‘))/3. thereshouldbe. Thenumbersof neuronsandconnections in the rat hippocampalformationhave been used as a guidelinebelow. Althoughthe human hippocampusis certainly larger than that of the rat, the hippocampus, being phylogeneticallyone of the oldest areas of the brain,is fairlysimilaracrosshighermammalsand should give an indicationof the ordersof magnitudeinvolved. More importantly,the convergence-zonemodel can be shownto applyto a widerangeof theseparameters.Two cases at opposite ends of the spectrum are analyzed below: one where the number of computationalunits and connectionsis assumedto be limited, and another that is based on a large number of effectiveunits and connections. (39) (40) On b e other hand, if it is necessaryto determinea lower und for the capacity of a model with given n, /3is first m,f, t, and c at a givenconfidencelevelp,-, obtain d from (40), and the numberof patternsp is then until one of the bounds(10), (11), (20), (21), incre (30), ( 2) or (34) is exceeded,or rc+l,ubecomesgreater than 1x +1,1. 5. EX+MPLE:MODELINGTHE HIPPOCAMPAL MEMORYSYSTEM As an Iexample,let us apply the above analysisto the hippodampalmemory system.It is difficultto estimate how camrsethe representationsare in sucha system,and howmanyeffectivecomputationalunitsand connections First note that each unit in the modelis meantto correspondto a verticalcolumnin the cortex.It is reasonable to assume feature maps with 1Os of such columns (Sejnowski& Churchland,1989).Each input activates a local area on the map, includingperhaps 102columns abovethethreshold.Therefore,thefeaturemapscouldbe approximatedwith 104computationalunits.Therewould be a minimumof perhaps4 suchmaps,of which3 could be used to cue the memory. There are some 1Osprimaryexcitatorycellsin the rat hippocampalformation(Arnaralet al., 1990;Bosset al., 1985,1987;Squireet al., 1989).If we assumethat functionalunits contain 102of them, then the model should have 104bindingunits.Only about0.5–2.5%of the hippocampal neurons are simultaneouslyhighly active (0’Reilly & McClelland,1994),so a bindingpatternof 102units wouldbe appropriate.Assurning that all computationalunits in the featuremaps are connectedto all unitsin thehippocampus,thereare a totalof 108afferent connectionsto the hippocampus,and the numberof such connectionsper verticalcolumnin the featuremapsand per excitatoryneuronin the hippocampusis 102,both of which are small but possible numbers (Arnarrdet al., 1990). If we selectf= 17000,n = 11500,m = 150,and store 1.5 x 104patternsin the memory,2*and Xj-l,u are less thm (.@, the chance of partial overlapof more than 1 featureis lessthan 1.04X 10-8,andthe analysisaboveis valid. Settingd = 1.96 X 10-7 yieldsbounds rc+l,u< Xc+l,lWithp~-~~ > 99%.In other words, 1.5 X 104 tracescan be storedin the memorywith 99%probability of successfulretrieval.Sucha capacityis approximately equivalentof storing one new memory every 15s for 4 days, 16h a day, which is similarto what is required from the humanhippocampalsystem. 5.2. A Fine-GrainedModel It is possiblethat a lot moreneuronsand connectionsare involved in the hippocampal memory system than Convergence-Zone Episdoci Memory: Analysis and Simulations assumedabove.For example,let us assumethat each of the verticalcolumnsin the featuremapsis computationally distinctive,that is, there are 106units in the feature maps.Let us furtherassumethat the systemhas 15 feature maps, 10 of whichare used to cue the memory,and the bindinglayerconsistsof 105units,with 150used for each binding pattern. Assuming full connectivity between the feature units and the binding units, there are 1.5 X 1012connectionsin the system,which might be possibleif it is assumedthat a large numberof collateralsexist on the inputsto the hippocampus. Applyingthe above analysisto this memory configuration, 0.85 X 108patterns can be stored with 9970 probabilityof successfulretrieval. In this case, :U and xj–l,vare less than ($n, the chanceof partialoverlapof morethan 1featureis lessthan0.45 X 10–10,and setting 8 = 0.5X 10-’ yieldsboundsrc+l,u<x.+,,, withp,uu.,, > 99%.In otherwords,a newtracecouldbe storedevery 15s for 62 years, 16h a day, without much memory interference. Thiskindof capacityis probablyenoughfor the entire human lifetime, and exceeds the requirementsfor the hippocampal formation. With such a capacity, there wouldbe no need to transferrepresentationsto the neocorticalmemory system.One conclusionfrom this analysisis thatthe hippocampalformationis likelyto havea morecoarse-grainedthan fine-grainedstructure.Another conclusionis that it is possiblethatthe neocorticalmemory componentmay alsobe basedon convergencezones. Theresultis interestingalsofromthe theoreticalpointof view,becausethe lowerboundis an orderof magnitude higherthan the numberof units in the system,and three orders of magnitudehigher than the numberof binding units. To our knowledge,this lower bound is already higherthan whatis practicallypossiblewithotherneural networkmodelsof associativememoryto date. 6. EXPERIMENTALAVERAGECAPACITY The analysisabovegivesus a lowerboundfor the capacity of the convergence-zonememory;the average-case capacitymay be much higher.Althoughit is difficultto derivethe averagecapacitytheoretically,an estimatecan be obtainedthroughcomputersimulation.Not all configurations of the model can be simulated,though. The model has to be small enough to fit in the available memory,while at the same time fulfillingthe assumptionsof the analysisso thatlowerboundscan be obtained for the same model. To findsuchconfigurations,firstthe featuremap parameters~,t, and c, and the confidencelevelP are fixedto valuessuchthat,....,, = 99% (40).Second,a valuefor n is chosen so that the model will fit in the available memory.The connectionstake up most of the memory space(evenif each connectionis representedby onebit) and the amountof memoryallocatedfor the featuremap and the bindinglayer activations,the array of patterns, 1025 and the simulationprogramitself is negligible.Finally, the sizeof the bindingpatternm and the maximumnumber of pattemsp is foundsuchthatthe theoreticalbounds yieldrC+l,U< XC+l,I and the partialoverlapis negligible. In the models studied so far, the highest capacity has been obtainedwhen m is only a few percentof the size of the bindinglayer, as in the hippocampus. The simulationprogramis straightforward.The activationsin each map are representedas arraysof integers. The connectionsbetweena featuremap and the binding layerare encodedas a two-dimensionalarrayof bits,one bit for each connection.Before the simulation,a set of p- randompatternsare generatedas a two-dimensional array of pw X t integers.A simulationconsistsof storingthepatternsoneat a timeandperiodicallytestinghow many of a randomly-selectedsubsetof them can be correctly retrievedwith a partialcue. The ‘fine-grained’exampleof Section5.2 is unfortunatelytoo largeto simulate.With 15 X 105X 106= 1.5 X 1012 connections h would require 187.5 GB of memory,which is not possiblewith the currentcomputers. However, the ‘coarse-grained’model has 4 X 17000 X 11500 = 7.82 X 108 one-bit connections, which amounts to approximately 100MB, and easily fitsin availablememory. Several configurationswere simulated,and they all gave qualitatively similar results (Figure 4). In the coarse-grained model, practically no retrieval errors were produceduntil 370000 patternshad been stored. With 375000 patterns, 99% were correctly retrieved, and after that the performance degraded quickly to 94% with 400000 patterns, 71% with 460000, and 23% with 550000 (Figure4). Each run took about two hoursof CPU time on a Cray Y-MP 8/864.From these simulations,and those with other configurationsshown in Figure4, it can be concludedthatthe averagecapacity of the convergence-zoneepisodicmodelmaybe as much as one order of magnitudelarger than the theoretical lowerbound. 7. LESSIS MORE:THE EFFECTOF SPARSE CONNECTIVITY In the convergence-zonemodel so far, the featuremaps have been fully connected to the binding layer. Such uniformconfigurationmakes analysisand simulationof themodeleasier,butit is notveryrerdistic.Bothfromthe pointof viewof modelingthe hippocampalmemoryand building artificialmemory systems, it is important to know how the capacityis affectedwhen the two layers are only sparselyconnected. A seriesof simulationswererun to determinehow the model would perform with decreasingconnectivity.A given percentageof connectionswere chosenrandomly and disabled,and the 99$10 capacitypointof the resulting model was found. The binding patterns were chosen slightlydifferentlyto accountfor missingconnections: 1026 M. Moll and R. Miikkulainen 1.0 I I I a O.e i! 0.6 -in I % 0.4 f ~ *1 7000 n-l 1500 m==150 C.15000 0.2 0.0 m-l 25 c.7omo f.4ooo n+OOOO m=l50 C-.35OO a r -mi=l I 2 1 I 3 stored pilttCITMl (% 4 56 1000OO) 7 8 9. 0 L FtGURE . Experimentalavaragacapacity.The horizontalaxis ehowstha numberof pattamaetorad ins logarithmicacaleof hundredsof thouaan e. The vertical axle indicates the pamanWa of correctly retrieved pattame out of a randomly-selectedeubaat of 500 stored -s (ad-t ~~e SOH mh ~fns). ~ch m*l COIWi*~ of four~m maps, endduringretrieval,thafourth featurewaa ratrfav uaingthafiratthraa aacuaa. The modsisdifferedin theelzaa ofthafeaturarnepa f, bindingIsyereiza n,andblndlngpatternsize = 99%wes35000, that of ntlllap indicatesvaragaeoverthraa simulations.Thathaoratlcal capacity Cofthafirat modalwith p— the sac d 15000, and that of the third 70000. the m inding units were chosen among those binding layer “ts that were connected to all features in the featurepattern.If there were less than m such units,the bindin patternconsistedof all availableunits. Due o high computationalcost of the simulations,a small onvergence-zonememory with ~ = 1000, n = 3000,? = 20, t = 4, and c = 3 was used in these experiments.At each level of comectivity from 100’ZO down to 20%, five different simulationswith different randomcomectivitywere run and the resultswere averaged (Figure5). Sincethe main effectof sparseconnectivity is to limit the number of binding units that are availablefor the bindingpattern,one would expectthat the capacitywouldgo down.However,just the opposite 1000O L__—_LJ 100 L 60 40 60 connectivity rate (%) 20 0 FIGURE 5. Capacity with sparse connectivity. The horizontal axis showe the percentageof connection that were available to form theretrfevai pettams s,andthavarticai axielndicatea thacapeoityattha W%co*level.Aaconmctlvftydecraaaaa, Mnding becomemoreratiabia untii about 30% connactivlty,where there are no longer enough conbacomdrnora fooUaed, and the mtriavai n = 3000,m = 20,t= 4, c = 3, andthe plotie an averageof five nation* to form binding pattarneof m unite. The model had f= 1000, Simufationa. 1027 Convergence-Zone Episdoci Memory: Analysis and Simulations turnsout to be true:the fewerconnectionsthe modelhad available(downto 30% connectivity),the morepatterns couldbe storedwith 99~oprobabilityof correctretrieval. Theintuitionturnsoutto be incorrectfor an interesting reason:sparseconnectivitycausesthe bindingconstellations to becomemore focused,removingspuriousoverlap that is the main causeof retrievalerrors.To see this, considerhow~ (thesizeof the bindingconstellationof a featureunit after at least one patternhas been storedon it) growswhen a new patternis storedon the unit. A set of bindingunitsis chosento representthe pattern.When only a fractionr of the connectionsto the bindinglayer are available,there are only r~nbindingunits to choose from, comparedto n in the fully connectedmodel.It is therefore more likely that a chosen binding unit is already part of the binding constellationof the feature unit, and this constellationthereforegrows at a slower pace than in the fully connectedcase. The expectedsize of the constellationafter p patternshave been storedin the memorybecomes(from (6)) To testwhetherthe convergence-zonearchitecturecan modelsuch “human-like”memorybehavior,the storage mechanismmustbe changedso thatit takesinto account similaritiesbetween stored patterns. So far the spatial arrangement of the units has been irrelevant in the model;let us nowtreat the featuremapsand the binding layer as one-dimensionalmaps (Figure6). The binding layer is dividedinto sections,one for each featuremap. The bindingunits are selectedstochasticallyfrom a distribution function that consists of one componentfor each of the sections.Each componentis a flatrectangle with a radius of u units aroundthe unit whose location correspondsto thelocationof the activefeaturemapunit. The distributionfunctionis scaled so that on averagem units will be chosenfor the entirebindingpattern. The radius u of the distributioncomponentsdeterminesthe varianceof the resultingbindingpatterns.By settingu a (n/t – 1)/2a uniformdistributionis obtained, which gives us the basic convergence-zonemodel. By makingu smaller,the bindingrepresentationsfor similar inputpatternsbecomemore similar. Severalsimulationswithvaryingdegreesof a wererun to see how the error behavior and the capacity of the modelwouldbe affected(Figure7). A configurationof 4 feature maps (3 cues) with 20000 bindingunits, 400 feature map units, and a 20-rmitbinding pattern was used. The results show that when the binding patterns are mademoredescriptive,the errorsbecomemoreplausible:whenan incorrectfeatureis retrieved,it tendsto be close to the correct one. When the binding units are selectedcompletelyat random (u = (rzh – 1)/2), the averagedistanceis 5000;when u = 20, it dropsto 2500; and whena = 5, to 700.Such ‘‘human-like”behavioris obtainedwitha slightcostin memorycapacity.Whenthe patternsbecomelessrandom,thereis moreoverlapin the encodings, and the capacity tends to decrease. This effect, however,appearsto be rather minor, at least in the smallmodelssimulatedin our experiments. ‘(z)=m+(r-+%)p) “’) which decreaseswith comectivity r as long as there are at leastmbindingunitsavailable(i.e. rk > m). Whenthe binding constellations are small, their intersections beyondthe m commonunitswillbe smallas well.During retrieval it is then less likely that a rogue unit will become more active than the correct unit. The activity patternsin the sparselyconnectedsystemare thus better focused,and retrievalmore reliable. When there are fewer than m bindingunits available, the capacity decreases very quickly. In the model of Figure5 (with m = 20), the averagenumberof binding unitsavailableis 45 for 3590connectivity,24 for3090,12 for 25%,and5 for 20%.In otherwords,theconvergenzezone episodic memory performs best when it is connected just enough to support activationof the sparse binding patterns. This is an interestingand surprising result, indicatingthat sparse connectivityis not just a necessitydue to limitedresources,but also givesa computationaladvantage.In the contextof the hippocampal memorysystemit makessensesinceevolutionwouldbe likely to producea memoryarchitecturethat makesthe best use of the availableresources. Binding Layer and DistributionFunction 1 I [1- When the bindingpatternsare selectedat randomas in the model outlined,when errors occur duringretrieval, the resulting feature values are also random. Human memory,however,rarely producessuchrandomoutput. Human memoryperformanceis often approximate,but robust and plausible.That is, when a feature cannotbe retrievedexactly,a valueis generatedthatis closeto the correctvalue. m I I I I I I I 1 ul— I-III-111 uuLmnAu Feature Map 1 8. ERRORBEHAVIOR r J Feature Map 2 Feature Map 3 Feature Map 4 FIGURE6. Selecting a binding rsprsssntstionfor an input psttern. The binding layer Is dlvidsd into eeetlone. On average, m blndlng unite wIII be selected from a dlatrlbution function that conaista of one rectangularcomponent for each section, csntersdaround the unitwhose locationeorrsapondatoths location of the input feature unit. If the center is close to the boundaryof the esetlon, the distribution componentwraps around the ssctlon (aa for Feature Map 1). Ths parameter u determines the radiua of the components,and therebythe variance in the binding pattern.Thie particularfunction had u = 2, that la, the width of the individual components was 2 x 2 + 1 = 5. Ths model parameterswere f= 10, n = 40, m = 6. 1028 M. Moll and R. Miikkulainen 6000 70000 5000 60000 50000 4000 2000 20000 1000 n 50 1000O 40 30 20 10 0 0 a FIGURE~. Errorbehaviorwith deecriptlvebhflng patterns.lhe Iowarcuwe(wfth ecefeat left)ehowsthesveraga distanceof incorrectfyretrievaqfeaturasss sfunction of the distributionradiusu. The uppercurve(with scale st right) indlceteethe correepondlngcapacityst the 99% $onfidencelevel.When the bindingpatternis iaea random,the erroneousunitetend to be cloaarto the correctonce. The model hsd n= q0,000,f= 400,m= 20, t= 4, c= 3. Retrievalof all storedpatternswas checked.The plote indlceteaveregaaoversix simulations. 9. DISCUSSION The co~vergence-zoneepisodicmodel as analyzedand simula d aboveassumesthat the featurepatternsdo not overlapmuch,and thatthe patternis retrievedin a single iteratio . Possiblerelaxation of these assumptionsand the effe 1 ts of such modificationsare discussedbelow. 9.1. PafternOverlap The th~oreticallower-boundcalculationsassumed that the ch ce of overlap of more than one feature is very small,$ d this was also true in the models that were analyzed and simulated.However,this restrictiondoes not limit the applicabilityof the model as much as it mightfirstseem,for two reasons: First,it mightappearthat certainfeaturevaluesoccur more often than others in the real world, causingmore overlap~than there currentlyis in the model. However, notethdtthe inputto the modelis representedon feature maps.One of the basicpropertiesof both computational and biologicalmaps is that they adapt to the input distribution by magnifyingthe dense areas of the input space. other words, if some perceptualexperienceis morefrt uent,moreunitswillbe allocatedforrepresenting it sp that each unit gets to respondequallyof?ento inputs Kohonen, 1989;Merzenichet al., 1984;Ritter, 1991). erefore,overlapin the featuremap representations is, i significantlymore rare than it may be in the absolu~ experience:the minordifferencesare magnified andtheImpresentations beeomemoredistinguishableand more ~emorable. Second, as discussed in Section 4.4, the chance of overlapof more than one feature is clearly small if the feature values are independent. For example in the coarse-grainedmodel, at the 9990 capacity point, on averagethere were 88 otherpatternsthat sharedexactly one commonfeaturewith a givenpattern,whereasthere were only 0.0078 other patternsthat shared more than one feature. To be sure, in the real world the feature values across maps are correlated,which would make overlap of more than one featmremore likely than it currentlyis in the model. While it is hard to estimate how common such correlationswould be, they could grow quite a bit before they become significant.In other words, the conclusionsdrawn from the current model are valid for at least small amounts of such correlations. 9.2. ProgressiveRecall The retrievalprocess adoptedin the convergence-zone model is a version of simple recall (Gardner-Medwin, 1976), where the pattern is retrieved based on only direct associationsfrom the retrieval cues. In contrast, progressive recall is an iterative process that uses the retrievedpattern at each step as the new retrieval cue. Progressiverecall could be implementedin the convergence-zonemodel.Supposefeaturesneedto be retrieved in severalmaps.Afterthe firstretrievalattempt,the right featureunit will be clearlyidentifiedin most maps. For the secondretrievaliteration,all theseunits can be used as cues,andit is likelythata patternwillbe retrievedthat is closerto the correctpatternthanthe one obtainedwith just simple recall. This way, progressiverecall would cause an increase in the capacity of the model. Also, 1029 Convergence-Zbne Episdoci Memory: Analysis and Simulations such a retrievalwould probablybe more robustagainst invalid retrieval cues (i.e. cues that are not part of the patternto be retrieved).The dynamicsof the progressive recall process are difficult to analyze (see Gibson & Robinson(1992)for a possibleapproach)and expensive to simulate,and simplerecall was thus used in this first implementationof the convergence-zonemodel. Above, a theoreticallower bound for the capacityof simplerecall withina givenerrortolerancewas derived, and the averagecapacitywas estimatedexperimentally. Two other types of capacitycan also be definedfor an associativememorymodel(Amari, 1988).The absolute capaci~ refers to the maximumnumberof patternsthat the networkcan representas equilibriumstates,and the relative capacity is themaximumnumberof patternsthat can be retrievedby progressiverecall.The lowerbound for the simplerecallderivedin this paperis also a lower bound for the absolutecapacity,and thus also a lower bound for the relative capacity, which may be rather difficultto derivedirectly. 10. RELATEDWORK Associativememoryis one of the earliestand still most activeareas of neuralnetworkresearch,and the convergence-zonemodel needs to be evaluatedfrom this perspective.Althoughthe architectureis mostlymotivated by the neurosciencetheory of perceptualmaps, hippocampalencoding,and convergencezones,it is mathematically most closely related to statistical associative memories and the sparse distributed memory model. Contrastingthe architecturewith the Hopfieldnetwork and modified backpropagationis appropriatebecause these are the best-knownassociativememorymodelsto date. Eventuallyconvergence-zonememorymightserve as a modelof humanepisodicmemorytogetherwiththe trace featuremap modeldescribedbelow.Althoughit is an abstractmodelof the hippocampalsystem,it is consistentwith the morelow-levelmodelsof the hippocampal circuitry,and complementsthem well. activation to the network, and the unit activations are updated asynchronouslyuntil they stabilize. The finalstableactivationpatternis then taken as the output of the network.In the convergence-zonemodel, on the other hand, retrieval is a four-step version of simple recall: first the activationis propagatedfrom the input maps to the bindinglayer, thresholded,and then propagated back to all feature maps, where it is thresholded again. This algorithmcan be seen as a computational abstractionof an underlyingasynchronousprocess.In a more low-levelimplementation,thresholdingwould be achieved through inhibitory lateral connections. The neuronswould update their activationone at a time in random order, and eventually stabilize to a state that representsretrievalof a pattern. Thecapacityfor the Hopfieldnetworkhasbeen shown theoreticallyto be N/41nN(Amit, 1989; Hertz et al., 1991; Keeler, 1988; McEliece et al., 1986) and experimentallyabout 0.15N (Hopfield, 1982). For the convergence-zonemodelsucha simpleclosed-formformula is difficultto derive,becausethe modelhas many more parameters and correlationsthat complicatethe analysis.However,as was shownabove,a lowerbound canbe derivedfora givensetofparameters.Suchbounds and alsoexperimentalsimulationsshowthatthe capacity for the modelcan be ordersof magnitudehigherthanthe numberof units,whichis ratheruniquefor an associative memoryneuralnetwork. However,it shouldbe noted that in the convergencezone model, each pattern is much smaller than the network.In a Hopfieldnetwork of size N each pattern containsNbits of information,whilein the convergencezonemodeleach patternconsistsof onlyt features.Each featurecan be seen as a numberbetween 1 andj correspondingto its locationin the featuremap. To represent such a number,210g~bits are needed,and a featurepattern thus containst210g~bits of information.Compared to the Hopfield model and other similar associative memorymodels,the informationcontentof each pattern hasbeentradedoff forthe capacityto storemoreof them in a networkof equal size. 10.L The HopfieldModel The Hopfieldnetwork (Hopfield,1982)was originally developedto modelthe computationalpropertiesof neurobiologicalsystemsfrom the perspectiveof statistical mechanics(Amit et al., 1985a,b; Kirkpatrick& Shernngton, 1988;Peretto & Niez, 1986).The Hopfieldnetworkis characterizedby full connectivity,exceptfroma unit to itself.Patternscan be storedone at a time,but the storagemechanismis rather involved.To storean additionalpatternin a networkof, say,Nunits, theweightsof all the N X (N – 1) connectionshave to be changed.In contrast,the convergence-zonememoryis moresparsein that only t X m < n <<~weightshave to be modified. A pattern is retrieved from the Hopfield network through progressive recall. The cues provide initial 10.2.Backpropagationand RelatedModels Several models of associativememory have been proposedthatarebasedon backpropagationor similarincremental learning rules (Ackley et al., 1985; Anderson et al., 1977;Knapp & Anderson, 1984;McClelland& Rumelhart, 1986a, b). However, these models suffer from catastrophicinterference,which makes it difficult to apply them to modelinghuman associativememory (Grossberg,1987;McCloskey& Cohen, 1989;Ratcliff, 1990).If the patterns are to be learned incrementally, withoutrepeatingthe earlier patterns,the later patterns in the sequencemust erase the earlier associationsfrom memory. Several techniqueshave been proposed to alleviate 1030 forgetti~g,includingusing weightswith differentlearning ratqs (Hinton & Plaut, 1987), gradually including new eixamples and phasing out earlier ones (Hether@gton& Seidenberg,1989),forcingsemidistributed hiqlden-layerrepresentations(French, 1991),concentra~g changeson novelparts of the inputs(Kortge, 1990), iusing units with localized receptive fields (KmschMce, 1992),and addingnew units and weightsto encode ~newinformation(Fahlman, 1991; Fahlman & Lebiere~1990). In these models, one-shot storage is still notlpossible,althoughthe numberof requirediterations is !reduced,and old informationcan be relearned very fast. At this point it is also unclear whetherthese architechueswould scale up to the numberof patterns appropriatefor humanmemory. 10.3.S~tistical AssociativeMemoryModels The convergence-zonemodel is perhaps most closely related to the correlation matrix memory (Kohonen, 1971, 1P72;see also, Anderson, 1972;Cooper, 1973). In this model there are a number of receptors (correspondin~ to feature maps in the convergence-zone model) i?hatare comected to a set of associators(the bindingilayer).The receptorsare dividedinto key fields wheret.lperetrievalcue is specified,and datafieldswhere the re~eved pattern appears. Each key and data field correqxlmdsto a feature map in the convergence-zone model.Insteadof one value for each field,a wholefeaturemaprepresentsthe field,modelingvalue-unitencoding in I biological perceptual systems. There is no distinctjbnbetween key and data fields either; every feature map can functionas a key in the convergencezone model. othe~ related statisticalmodels include the learning matrix @teinbuch,1961)and the associativenet (Willshawet Id., 1969),whichareprecursorsof thecorrelation matrixmodel.Thesehad a uniformmatrixstructureconnectinginputsto outputsin a singlestep.Suchmodelsare simpleandeasyto implementin hardware,althoughthey do not have a very high capacity(Faris& Maier, 1988; Palm, 1980,1981).Otherassociativematrixmodelshave relied on progressiverecall, and thereforeare similarin spirittoltheHopfieldnetwork.The networkof stochastic neumnslof Little & Shaw (1975)and the modelsof the hippocampus of Gardner-Medwin (1976) and Marr (1971) fall into this category.Progressiverecall gives them a potentially higher capacity, which with high connectivity exceeds that of the Hopfield network (Gibsod & Robinson, 1992). They are also more plausibk in that they do not requireN*internalconnections (where N is the number of units in the network), althoughthe capacitydecreasesrapidlywith decreasing connectivity. M. Moll and R. Miikkulainen 10.4.SparseDistributedMemory * TheSparseDistributedMemorymodel(SDM;Kanerva, 1988) was originally developed as a mathematical abstractionof an associativememory machine. Keeler (1988) developeda neural-networkimplementationof the idea and showedthat the SDM comparesfavorably with the Hopfield model; the capacity is larger and the patterns do not have to include all units in the network. It is possibleto give the convergence-zonememory model an interpretationas a special case of the SDM model:A fully-connectedtwo-layernetworkconsisting of a combinedinput/outputlayer and a hiddenlayer. In alternatingsteps, activityis propagatedfrom the input/ outputlayerto the hiddenlayerand back.Seenthis way, every unit in the input/outputlayer correspondsto one feature map in the convergence-zonemodel, and the hiddenlayer correspondsto the bindinglayer. It can be shownthat the capacityof the SDM is independentof the sizeof theinputioutputlayer.Moreover,if the size of the input/outputlayer is fixed, the capacity increaseslinearlywiththe sizeof the hiddenlayer.These results suggestthat similarpropertiesapply also to the convergence-zoneepisodicmemorymodel. 10.5.TraceFeatureMaps The Trace Feature Map model of Miikkulainen(1992, 1993)consistsof a self-organizingfeaturemap wherethe spaceof allpossibleexperiencesis firstlaidout.Themap is laterallyfullyconnectedwith weightsthat are initially inhibitory.Traces of experiencesare encodedas attractors usingthese lateralconnections.When a partialpattern is presented to the map as a cue, the lateral connections move the activation pattern around the nearestattractor. The Trace FeatureMap was designedas an episodic memorycomponentof a storyunderstandingsystem.The main emphasiswas not on capacity,but on psychologically valid behavior. The basins of attraction for the tracesinteract,generatingmany interestingphenomena. For example,the later traceshave largerattractorbasins andare easierto retrieve,anduniquetracesare preserved even in an otherwiseoverloadedmemory.On the other hand,becauseeach basin is encodedthroughthe lateral comectionsof severalunits,the capacityof the modelis several times smaller than the number of units. Also, there is no mechanismfor encodingtruly novel experiences; only vectors that are already representedin the map can be stored. In this sense, the Trace Feature Map model can be seen as the cortical componentof the human long-termmemory system.It is responsible for many of the effects,but incorporatingnovel experiencesintoits existingstructureis a lengthyprocess,as it appearsto be in humanmemorysystem(Halgren,1984; McClellandet al., 1995). ———–.—.—- -. 1031 Convergence-Zone Episdoci Memory: Analysis and Simulations 10.6.Modelsof the Hippocampus A largenumberof modelsof the hippocampalformation and its role in memoryprocessinghave been proposed (Alvarez & Squire, 1994; Gluck & Myers, 1993; McNaughton & Morris, 1987; Marr, 1971; Murre, 1995;O’Reilly& McClelland,1994;Read et al., 1994; Schmajuk& DiCarlo, 1992;Sutherland& Rudy, 1989; Teyler & Discenna,1986;Treves & Rolls, 1991,1994; Wickelgren, 1979). They include a more detailed description of the circuitry inside hippocampus,and aim at showinghow memorytraces could be createdin such a circuitry.The convergence-zonemodel operates at a higherlevelof abstractionthan thesemodels,and in this senseis complementaryto them. Man (1971)presenteda detailedtheoryof the hippocampalformation,includingnumericalconstraints,capacity analysis, and interpretationat the level of neural circuitry.The input is based on local input fibersfrom the neocortex,processedby an input and outputlayer of hippocampalneuronswith collateralconnections.Recall is based on recurrentcompletionof a pattern.The convergence-zonememory can be seen as a high-level abstractionof Marr’s theory, with the emphasison the convergence-zonestructurethat allowsfor highercapacity than Marr predicted. Several authors have proposeda role for the hippocampus similar to the convergence-zoneidea (Alvarez & Squire, 1994;McClellandet al., 1995;Murre, 1995; Teyler & Discenna,1986;Treves & Rolls, 1994;Wickelgren, 1979).In these models,hippocampusitself does not store a completerepresentationof the episode,but acts as an indexingarea, or compressedrepresentation, that binds togetherparts of the actual representationin the neocorticalareas.Treves& Rolls(1994)alsopropose backprojectioncircuitry for accomplishingsuch recall. Convergence-zonememoryis consistentwith suchinterpretations,focusing on analyzingthe capacity of such structures. One of the assumptions of the convergence-zone memory, motivated by recent results by Wilson & McNaughton (1993), is that the binding encoding is sparse and random.The model by O’Reilly& McClelland (1994)showshow the hippocampalcircuitrycould form such sparse,diverseencodings.They exploredthe tradeoffbetweenpattern separationand completionand showedthat the circuitrycouldbe setup to performboth of these tasks simultaneously.The entorhinal-dentate– CA3 pathwaycould be responsiblefor formingrandom encodingsof traces in CA3, and the separationbetween storageand recall could be due to overalldifferencein the activitylevel in the system. 11. FUTUREWORK Futurework on the convergence-zoneepisodicmemory modelwill focuson three areas.First,the modelcan be extendedin severalwaystowardsa moreaccuratemodel of the actualneuralprocesses.For instance,lateralinhibitory connectionsbetween units within a feature map couldbe addedto selecttheunitwiththe highestactivity. A similarextensioncouldbe appliedto thebindinglayer; only insteadof a singleunit, multipleunits shouldstay activein theend.Lateralconnectionsin thebindinglayer couldalso be usedto partiallycompletethe bindingpattern even beforepropagationto the featuremaps.As the next step, the bindinglayer could be expandedto take into accountfinerstructurein the hippocampus,including the encoding and retrieval circuitry proposed by O’Reilly & McClelland (1994). A variation of the Hebbian learning mechanism (Hebb, 1949; Miller & MacKay, 1992) could then be used to implementthe storageand recall mechanisms.In additionto providing insight into the hippocampal memory system, such researchcould lead to a practicalimplementationof the convergence-zonememory,and perhapseven to a hardware implementation. Second,a numberof potentialextensionsto the model could be studiedin more detail. It might be possibleto take sparseconnectivityinto accountin the analysis,and obtain tighter lower boundsin this more realisticcase. Recurrencecould be introducedbetween feature maps and the bindinglayer, and capacitycould be measured under progressiverecall. Possibilitiesfor extendingthe modelto toleratemore overlapbetweenpatternsshould also be studied. Third, the model could be used as a steppingstone towardsa morecomprehensivemodelof humanepisodic memory,includingmodulesforthe hippocampusandthe neocorticalcomponent.As discussedabove,the convergence-zonemodel seems to fit the capabilitiesof the hippocampalcomponentwell, whereas somethinglike trace feature maps could be used to model the cortical component.It wouldbe necessaryto observeand characterizethe memoryinterferenceeffects of the components and compare them with experimentalresults on human memory. However, the main challengeof this researchwouldbe on the interactionof the components, that is, how the hippocarnpalmemorycould transferits contentsto the corticalcomponent.At thispointit is not clearhowthisprocesscouldtakeplace,althoughseveral proposalsexist(Alvarez& Squire,1994;Halgren,1984; McClellandet al., 1995; Milner, 1989; Murre, 1995; Treves & Rolls, 1994). Computationalinvestigations could prove instrumentalin understandingthe foundationsof this remarkablesystem. 12. CONCLUSION Mathematical analysis and experimental simulations show that a large number of episodescan be stored in the convergence-zonememory with reliable contentaddressableretrieval.For the hippocampus,a sufficient capacitycan be achievedwith a fairly smallnumberof 1032 M. Moll and R. Miikkulainen units and connections.Moreover,the convergencezone itselfrequiresonlya fractionof thehardwarerequiredfor perceptualrepresentation.Theseresultsprovidea possible explanationfor why human memory is so efficient with su~ha highcapacity,and whymemoryareasappear small aompared to the areas devoted to low-level perceptualprocessing.It also suggeststhat the computationaluhitsof the hippocampusand the perceptualmaps can be quite coarse, and gives a computationalreason why the maps and the hippocampusshouldbe sparsely connected. The model makes use of the combinatoricsand the clean-up properties of coarse coding in a neurallyinspireclarchitecture.The practical storage capacity of the modelappearsto be at leasttwo ordersof magnitude higher than that of the Hopfieldmodel with the same number of units, while using two orders of magnitude fewerconnections.On the otherhand,the patternsin the convergence-zonemodelare smallerthanin theHopfield network. Simulations also show that psychologically valid error behaviorcan be achievedif the bindingpatterns are made more descriptive:the erroneouspatterns are closeto the correctones.The convergence-zoneepisodic memory is a step towards a psychologicallyand neurophysiologicallyaccuratemodelof humanepisodic memory,the foundationsof which are only now beginning to be understood. REFERENCES Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning rdgoritbmfor Boltzmann machines. Cognitive Science, 9, 147–169. Alon, N., & Spencer, J. H. (1992). The Probabilistic Method. New York: Wiley. Alvarez, P., & Squire, L. R. (1994). Memory consolidation and the medial temporrd lobe: A simple network model. Proceedings of the National Academy of Sciences of the USA, 91; (pp. 7041-7045). Amaral, D. G., Ishizuh N. & Claibome, B., (1990). Neurons, numbers and the hippocarnpzdnetwork. In J. Storm-Mathisen, J. Zirmner & O. P. Ottcraen (Eda), Understanding the Brain Through the Hippocamptis, vol. 83 of Progress in Brain Research (pp. 1–1 1). New York: Elsevier. Amari, S.-I. (1977). Neural theory of association and concept fonnafion. Biological Cybernetics, 26, 175-185. Arnari, S.-I. (1988). Associative memory and its statistical neurodynamical anatysis. In H. Haken (Ed.), Neural and Synergetic Computers (pp. 85–99). Berlin: Springer Verlag. Amit, D. J. (1989). Modeling Brain Function: The World of Attractor Neural Networks. Cambridge: Cambridge University press. Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985). Spin-glass models of neural networka. Physical Review A, 32, 1007-1018. Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985). Storing infinite numbers of patterns in a spin-glass model of neural networks. Physical Review Letters, 55, 1530-1533. Anderson, J. A. (1972). A simple neural network generating an interactive memory. Mathematical Biosciences, 14, 197–220. Andersoq, J. A., Silverstein, J. W., Ritz, S. A., & Jones, R.S. (1977). Distinctive features, categorical perception and probability learning: some applications of a neurrd model. Psychological Review, 84,413-451. Bain, L. J., & EngelhardG M. (1987). Introduction to Probability and Mathematical Statistics. Boston, MA: PWS publishers. Boss, B., Peterson, G., & Cowan, W. (1985). On the nmnbersof neurons in the dentate gmys of the rat. Brain Research, 338, 144-150. Boss, B., Turlejski, K., Stanfield, B., & Cowan, W. (1987). On the numbers of neurons in fields CA1 and CA3 of the hippocrnnpus of Sprague-Dawley and Wistar rats. Brain Research, 406, 280–287. Cooper, L. N. (1973). A possible organization of animal memory and learning. In B. Lundquist and S. Lundquist (Ed.), Proceedings of the Nobel Symposium on Collective Properties of Physical Systems (pp. 252-264). New York: Academic Press. Darnaaio, A. R. (1989). The brain binds entities and events by multiregionrd activation from convergence zones. Neural Computation, 1, 123-132. Darnasio, A. R. (1989). Time-locked mukiregional retroactivation: A systems-level proposal for the neural substrates of recall and recognition. Cognition, 33, 25–62. Fahlrnan, S. E. (1991). The recurrent cascade-correlation architecture. In R. P. Lippmann, J. E. Moody & D. S. Touretzky (Eds), Advances in Neural Information Processing Systems, 3 (pp. 190–205). San Mateo, CA: Morgan Kaufmann. Fahlrnan, S. E., & Lebierc, C. (1990). The cascade-correlation learning architecture. In D. S. Touretzky (Ed.), Advances in Neural Information Processing Systems, 2 (pp. 524-532). San MatCo,CA: Morgan Kaufmarm. Faris, W. G., & Maier, R. S. (1988). Probabilistic analysis of a learning matrix. Advances in Applied Probability, 20, 695–705. French, R. M. (1991). Using semi-distributed representations to overcome catastrophic forgetting in connectionist networks. In Proceedings of the 13th Annual Conference of the Cognitive Science Society (pp. 173-178). Hillsdale, NJ: Erlbamn. Gardner-Medwin, A. R. (1976). The recall of events through the learningof associations between their parts. Proceedings of the Royal Society of Lnndon B, 194 375–402. Gibson, W. G., & Robinson, J. (1992). Statistical analysis of the dynamics of a sparae associative memory. Neural Networks, 5, 645-661. Gluck, M. A., & Myers, C. E. (1993). Hippocampaf mediation of stimulus representation: A computational theory. Hippocampus, 3, 491–516. Grossberg, S. (1983). Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition and Motor Control. Boston, Dordrecht, Holland: Reidel. Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11, 23–63. Halgren, E. (1984). Human hippocarnpal and amygdrda recording and stimulation: Evidence for a neural model of recent memory. In L. Squire & N. Butters (Eds), The Neuropsychology of Memory (pp. 165-182). New York: Guilford. Hasselmo, M. E., Rolls, E. T., & Baylis, G. C. (1989). The role of expression and identity in the face-selective responses of neurons in the temporal visual cortex of the monkey. Behavioral Brain Research, 32, 203-218. Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York: Wiley. Heit, G., Smith, M. E., & Halgren, E. (1989). Neural encoding of individual words and faces by the human hippocarnpus and arnygdrda.Nature, 333, 773-775. Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the Theory of Neural Computation. Reading, MA: Addison-Wesley. Hetherington, P. A., & Seidenberg, M. S. (1989). Is there “catastrophic interference” incomrectionist networks? In Proceedings of the Ilth Annual Conference of the Cognitive Science Society (pp. 26-33). Hillsdrde, NJ: Erlbamn. Hinton, G. E., & Anderson, J. A. (Eds) (1981). Parallel Models of Associative Memory. Hillsdale, NJ: Erlbaum. Hinton, G. E., & Plaut, D. C. (1987). Using fast weights to deblur old memories. In Proceedings of the Ninth Annual Conference of the Cognitive Science Society (pp. 177–186). Hillsdale, NJ: Erlbamn. Convergence-Zone Episdoci Memory: Analysis and Simulations Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences USA, 79, 2554–2558. Hopfield, J.J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences USA, 81, 3088–3092. Kairiss, E. W., & Miranker, W. L. (1997). Cortical memory dynamics. Biological Cybernetics, in press. Kandel, E. R. (1991). Nerve cells and behavior. In E. R. Karrdel,J. H. Schwartz & T. M. Jessell (Eds), Principles of Neural Science (pp. 18-32). New York: Elsevier. Kanerva, P. (1988). Sparse distributed nsemo~. Cambridge, MA: MIT Press. Keeler, J. D. (1988). Comparison between Kanerva’s SDM and Hopfield-type neural networks. Cognitive Science, 12, 299-329. Kirkpatrick, S., & Sherrington, D. (1988). Infinite range models of spinglasses. Physical Review B, 17, 4384-4403. Knapp, A. G., & Anderson, J. A. (1984). Theory of categorization based on distributed memory storage. Journal of Experimental Psychology: Learrring. Memoq and Cognition, 10, 616–637. Knudsen, E. I., du Lac, S., & Esterly, S. D. (1987). Computational maps in the brain. In W. M. Cowan, E. M. Shooter, C. F. Stevens &R. F. Thompson (lids),AnnualReview ofNeuroscience (pp. 41–65). Palo Alto, CA: Annual Reviews. Kohonen, T. (1971). A class of randomly organized associative memories. Acta Polytechnic Scarrdinavica,EL25. Kohonen, T. (1972). Correlation matrix memories. IEEE Transactions on Computers, C-21, 353–359. Kohonen, T. (1977). Associative Memory: A System-Theoretical Approach. Berlin, Heidelberg, New York: Springer. Kohonen, T. (1989). SeZf-OrganizationmsdAssociative Memory, Third Edn. Berlin, Heidelberg, New York: Springer. Kohonen, T., & Mikisara, K. (1986). Representation of sensory information in self-organizing feature maps. In J. S. Denker (Ed.) Neural Networks for Computing op. 271–76). New York: American Institute of Physics. Kortge, C. A. (1990). Episodic memory in connectionist networks. In Proceedings of the 12th Annual Conference of the Cognitive Science Society (pp. 7~–771). HillsdaJe, NJ: Erlbaum. Kmscbke, J. K. (1992). ALCOVE: An exemplar-based conneetionist model of category learning. Psychological Review, 99, 22–44. Little, W. A., & Shaw, G. L. (1975). A statistical theory of short and long term memory. Behavioral Biology, 14, 115–133. Marr, D. (1971). Simple memory: a theory for archicortex. Philosophical Transactions of the Royal Society of London B, 262, 23– 81. McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of comrectiorrist models of learning and memory. Psychological Review, 102, 419– 457. McClelland, J. L., & Rumelhart, D. E. (1986a). Anmesiaand distributed memory. In D. E. Rumelhart & J. L. McClelland (Eds), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, Psychological and Biological Models (pp. 503–527). Cambridge, MA: MIT Press. McClelland, J. L., & Rumelhart, D. E. (1986b). A distributed model of human learning and memory. In J. L. McClelland &D. E. Rumelhart (Eds), Parallel Distn’buted Processing: Explorations in the Microstructure of Cognition, vol. 2, Psychological and Biological Models (pp. 170–215). Cambridge, MA: MIT Press. McCloskey, M., & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. The Psychology of Learning and Motivation, 24, 104-169. McEliece, R. J., Posner, E. C., Rodemich, E. R., & Verrkatesh, S. S. (1986). The capacity of the Hopfield associative memory. IEEE Transactions on Information Theory, 33, 461–482. 1033 McNaughton, B. L., & Morris, R. G. M. (1987). Hippocampal synaptic enhancement and information storage within a distributed memory system. Trends in Neuroscience, 10, 408–415. Merzenich, M. M., Nelson, R. J., Stryker, M. P., Cynader, M.S., Schoppmarm, A., & Zook, J. M. (1984). Somatosensory corticaJ map changes following digit amputation in adult monkeys. Journal of Comparative Neurology, 224, 591-605. Miikkulainen, R. (1992). Trace feature map: A model of episodic associative memory. Biological Cybernetics, 67, 273–282. Miikkulairren, R. (1993). Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory. Cambridge, MA: MIT Press. Miller, K. D., & MacKay, D. J. C. (1992). The role of constrains in Hebbiarr learning. Neural Computation, 6, 100-126. Milner, P. (1989). A cell assembly theory of bippocarnpal amnesia. Neuropshychologia, 27, 23–30. Mrrrre, J. M. J. (1995). A model of amnesia. Manuscript. O’Reilly, R. C., & McClelland, J. L. (1994). HippocampaJ conjunctive encoding, storage, and recall: Avoiding a trade-off. Hippocampus, 4, 661–682. Palm, G. (1980). On associative memory. Biological Cybernetics, 36, 19–31. Palm, G. (1981). On the storage capacity of an associative memory with randomly distributed storage elements. Biological Cybernetics, 39, 125-127. Peretto, P., & Niez, J. (1986). Collective properties of neural networks. In E. Bienenstock, F. Fogelman Souli6 & G. Weisbuch (E&), Disordered systems and biological organization (pp. 171-185). Berlin, Heidelberg, New York: Springer. RatcJiff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97, 285-308. Read, W., Nenov, V. I., & Halgren, E. (1994). Role of inhibition in memory retrieval by hippocampal area CA3. Neuroscience and Biobehavioral Reviews, 18, 55–68. Ritter, H. J. (1991). Asymptotic level density for a class of vector quantization processes. IEEE Transactions on Neural Networks, 2, 173-175. Rolls, E. T. (1984). Nerrronsinthe cortex of the temporal lobe and in the amygdala of the monkey with responses selective for faces. Human Neurobiology, 3, 209-222. Schmajuk, N. A., & DiCarlo, J. J. (1992). Stimulus configuration, classical conditioning, and Irippocampal function. Psychological Review, 99, 268-305. Sejnowski, T. J., & Churchland, P. S. (1989). Brain and cognition. InM. I. Posner (Ed.), Foundations of Cognitive Science (Chapter 8, PP. 31$3s6). Cambridge, MA: MIT Press. Squire, L. R. (1987). Memory and Brain. Oxford, UK; New York: Oxford University Press. Squire, L. R. (1992). Memory and the hippocarnpus: A synthesis from findingswith rats, monkeys, andhurnans. Psychological Review, 99, 195–231. Squire, L. R., SIrimamura,A. P., & Amarrd, D. G. (1989). Memory and the hippocampus. In J. H. Byrne & W. O. Berry (E&), Neural Models of PlasticiQ (pp. 208–239). New York: Academic Press. Steinbuch, K. (1961). Die Lemmatrix. Kybemetic, 1, 36-45. Sutherland, R. W., & Rudy, J. W. (1989). Configural association theory: The role of the hippoctunpal formation in learning, memory and amnesia. Psychobiology, 17, 129–144. Teyler, T. J., & Discenna, P. (1986). The Irippocarnpalmemory indexing theory. Behavioral Neuroscience, 100, 147–154. Treves, A., & Rolls, E. T. (1991). What determines the capacity of autoassociative memories in the brain? Network, 2, 371–398. Treves, A., & Rolls, E. T. (1994). Computational analysis of the role of the hippocarnpus in memory. Hippocarnpus,4, 374-391. Tulving, E. (1972). Episodic and semantic memory. InE. Tulving & W. Donaldson (Eds), organization of memory (pp. 381–403). New York: Academic Press. 1034 M. Moll and R. Miikkulainen Tulving, E. (1983). E2ements of Episodic Memory. Oxford, UK; New York: Oxford University Press. Wickelgren, W. A. (1979). Chunking and consolidation: A theoretical synthesis of semantic networks, configuring, S-R versus cognitive learning, normal forgetting, the amnesic syndrome, and the hippocampal arousal system. Psychological &view, 86, 44-60. Willshaw, D. J., Buneman, O. P., & Longuet-Higgins, H. C. (1969). Non.holographic associative memory. Natire, 222, 960-%2. Wilson, M. A., & McNaughton, B. L. (1993). Dynamics of the hippocampal ensemble code for space. Science, 261, 1055–1058. Using (A.5), an expression for E(Zi) can be derived: i E(Z,)=k~,E(Y~)=,~lm(l -~)’ -’=n(l - (1- ~)i). (A.6) This equation indicates that initially, when no patterns are stored, Zi = O,and that Zi converges ton as i goes to infinity. The variance of Zi can be computed in a similar way. First a recurrent expression for the variance of Yiis derived: Var(Yt)=Ez-l IVarfYilZi - ~)]+ Varzi-, IE(YilZi_ 1)1 APPENDIXA: PROBABILITYTHEORY BACKGROUNDAND PROOFS In this appendix, concepts from probability theory that are necessary for understanding the main text are briefly reviewed, and details of the probabilistic formulation and mmtingale analysis are presented (for more background on probability theory and statistics, see e.g. Alon & Spencer, 1992 or Bain & Engelhardt, 1987). ‘(x =‘1‘*)np) e-’ w (1 –/$-6 = () TOobti the variance of Zi, the covariance of Zi-l ~d Yineeds to k computed: =Ez,-, [E(Zi- ~Yil~- ~)]–E(Zi- ~)E(~) In the analysis of Sections 3 and4, two probability density functions are used. Tbe first one is the binomial distribution with parameters n andp, denoted as B(n#), and it gives the outcome of n independent trials where ~e two possible alternatives have probability p and 1 – p. This distribution has the expected value np and variance np(l – p). The other one is the hypergeometric distribution with parameters n, m and N, denoted as HYP(n,mM, and representing the number of elements in common between two independently-chosen subsets of n and m elements of a common superset of N elements. The distribution has the expected value ~ and variance Y(1– ~) H. The Chemoff bounds can be used to estimate how likely a binomially distributed variable, X, is to have a value within a given distance 6 from its mean: () (A.7) COV(Zi - ~,Yi)=E(Zi - 1~) – E(Zi - l)E(yi) Al. Distributionsand Bounds P(xs(1– a)np)s =%+(+:) ’-’)(:):+HV’JHV-J , 0<6<1, e’ ‘,6>0 (1 +6)1+6 ( m(n – Zi- I ) =E Zi- ~ ~ – E(Zi - ~)E(Yi) ) =mE(Zi-, ) – ~[(E(Zi - ~))z+ Var(Zi- ~)]– E(~- ,)E(Yi) =— ~Var(Zi-l). (A.8) The variance of Zi cm now be easily derived: Var(Zi) = Var(Z’- ~+ Yi) = Var(Zi- ~)+ Var(Yi)+ 2C0V(Zi_ 1,‘i) = (Al) >Ivar(yk)+z$,c.v(zk-,,y =,$1 Var(Yk)- ~j~ “ Var(Z~) (A.2) = Var(Yi)+ Even if the trials are not independent (and therefore Xis not binomially distributed), in some cases the sequence of variables XO,..., X“ can be analyzed as a martingale, and bounds similar to Chemoff bounds derived using Azrrma’s inequalities (see Appendix A.3). 1– ~ () =n(l-:)i(l-~(l- Var(Zi-1) :)i) +n(?l – 1) 1 – m(:(-–ml; ( A.2. Detailsof the ProbabilisticFormulation Asin,%etion 3, let Zibethe size of the binding constellation of a featore unit after i patterns have been stored on it, and let Yi be its increase after storing the itb pattern on it. Then i Zi= ~~,Yk. (A.3) Let Yi, i > 1 be hypergeometically distributed with parameters m, n — zi-l), ad n: 1) )’ ‘rhebase of every exponential in this expression is smaller tharr 1, so the variance goes to Oas i goes to infinity. This makes sense in the model. If many patterns are stored, it becomes very likely that Zi has a value close to n, and hence its variance should go to zero. So far it has been assumed that i is an ordinary variable. If it is replaced by a stochastic variable I that is bioomially distributed with parameters p and 1~, the previously computed expected values and variances must be made conditional on 1.To compute the mrconditiomd expected values and variances, the following lemma is needed: LEMMAAl. ~X - B(n,p), thenE((l–a~)=(1–ap)n. ‘(y=’’zi-l=zi-’)=( H(::Y)Y(‘A4) Proof. :):) E((l –aF)= ,.. (l –a~ and let ZI = YI = m with probability 1. Tben n i—1 ‘(y’)=E(m(n-:-l)) =rn - ~E(z._l)=tn- — — (1– ~) E(Yi-,) =m(l – ~)’-]. = ~,~lE(Yk) q n .r=o x n #(l-p)”-x () x ) (p - ap~(l –p)n-x = ((p – ap) + (1 -p))” (A.5) (A.9) =(1 – ap)” Convergence-Zone Episdoci Memory: Analysis and Simulations Let Zdenote the unconditional value of Zi. The desired results follow immediately from Lemma A.1 and the linearity of expected values: “2’=”(1-( %)’) (A.10) Var(Z) = Varl[E(ZIII)] +E1[Var(Zrll)] =n2Var ‘z-z’-”= +-z’x’-+)ki-v+:) 1– ~ 1 +.(. – In order to apply Azmna’s inequality, the Lipscbitz condition l~E–Z~-l I = 1 must to be satisfied. The binding units are chosen one at a time. and thev . mav. either alreadv be Dart of the constellation or add one more tit to it. Therefore, tb~ diff~rence between Z’y and Z’v-l is rdways either Oor 1: Case 1: Z’v –Z’, - ~=0 [( )] +E “(1 - ~)r-nz(l [ 1035 - ~)” — — )1 – m – 1) 1) 1– m(2n n(n – 1) ( fi-v –( ) .—Z’v n — ‘ 1–: n =s1. (A.19) Case 2: Z’V=Z’, _ 1+ 1 ‘n(l-;)p(l-n(l-%)p) ( +n(n – 1) 1 – m(2n – m – 1) p n(n – 1)$ ) (All) ‘~-z~-l’=l-(n-z’+:)ki-”( ’-(’-i)) () + . (A.12) + (n – m)(n –m– (1-@-ml-; 1) 1 – ( — — )’-1) m(2n – m – 1) ‘–1 n(n – l)f ) I-%+) ki-v+w”+’l 1( n–1 n n–Z’v l–~ n )( n) —— ki-v — b–v – Z’v–l — l–~ n) –(n <1. (A.13) From (A.1O)and (A.12) it follows that on the average, ~is larger tbarrZ. Initially the difference is exactly m, and the difference goes to zero asp goes to infinity. This means that initially, when few patterns are stored, the right units are very likely to be retrieved, since they have the advantage of receiving activation from at least m binding units. However, when more patterns are stored, rdmost every unit in a feature map receives the same amount of activation. Units that have been used most often are most likely to be retrieved. It is difficult to derive an exact expression for the expected value and variance of XC(the intersection of c retrieval cues) because the binding constellations are correlated. Since the same partial patterns might have been stored several times, certain combinations of retrieval cues can give rise to spurious activity in the binding layer, which is difficult to take into account in the analysis. For this reason, the analysis is carried out under the assumption that the chance of partial patterns is negligible, which is reasonable in most cases and easy to check. 1–! n “2)=m+(n-m)(’-( 1-% Y)> ‘ar(2)=@-m’(1-%)p-’ 1 ki–v+ Let ~ be defined as Z-ZIIZ >1. Then, ~ will always be at least m. Expressions sirniku to (All) and (A.12) can be derived for Z: (A.20) In both cases we have [~E– ~E-, I <1 and hence Azuma’s inequality can be applied to obtain a bound for Z. A.5. The IntersectionMartingale Let X; be defined as in Appendix 4.3: XE=X, + ““ (., –v)(rrz –X’v) ., —v . HXf n. —v j (.l – v)nz n, —v ‘ (A.21) First, X~,;.uxPE XE is shown to be a martingrde. The expected increase of X’v is ; -,;;. The conditional expected value of X,B given X:-l is A.3. Martingales Martingales give us a way to analyze the outcome of n trials with limited dependencies. A martingale is a sequence of random variables XO,..., Xn such that E(X’IXO, ..., Xi-, ) =Xi - ~, 1S i < n. (A.14) If a martingale satisfies the Lipschitz condition, that is IXi–Xi- ~I = 1, 1S i = n, E(X; IX;- ~=$- (A.15) rr.2—X’j-, (.l – v)rr~ = ~(x’j-l + n, –v+ ~)+ — n, – v s n. —nl (n, –nl)nz + (.l – v)nz .—~ n, – v + 1x j- 1+ (n, – v)(rr,– v+ 1) ., —v (A.16) .— ., —n, n, – v + then the following inequalities hold: F’(Xis XO– Xti)s e-A’n ~)=E(X; IX’V- ~=x’j- ,) r (n, –nl)nz 1x‘- 1+ (n, – v)(rr,– v + 1) + (., – v)(rrl– v)nz +(.l –v)nz (., –v)(n, –v+ These equations are called Azmna’s inequalities and they can beusedto bound the final value of the sequence within a chosen confidence level. rr~—nl = —X’j– n, – v + 1 =2-,. A.4. The BindingConstellationMartingale Let ~E be defined as in Section 4.2: n – (n – Z’.) () ~ 1– ; ki-. . (A.18) (A.22) Since E(X~lX~-~)=X~- ~, the sequence X:, ..., X~l is a martingale. To a ply Azuma’s inequality, it is necessary to show that IX! –X< ~I = 1 forv = 1,...,nl. Unlike for the martingale of Appendix A.4, this is not generally true, but true only when certain constraints on nl, .2 md n, are satisfied. If these parameters are fixed, the difference can be seen as a function of v, X’. and X’,-1. The function is monotonic in these variables, and to find its maximum, it is sufficient to look only ~=z’.+(n-z’+(+ki-v) = 1) (n, – v + l)rr* ~+ rr$– v + 1 1036 M. Moll and R. Miiklzulainen at the extreme values of v, X’, snd X .–l. The maximum in turn determines the consh-aintson nl, n2 and n.. As in Appendix A.4, the proof consists of two main cases: (1) X’, = X’y-l and (2) X’v = X’v-l + 1. Each of these is divided into several srrbcases.The following tables show the cases, the absolute difference in each case between X.E and X~-l, and the constraint that follows from thedifference. case1: X’v = x’v-~. In this case, using (A.21), IX! –X;- ~I can be rewritten as ere are three subcases listed in the table below. (.2–X1,)(., -n,) ~ (n, –,)(.” –v+ 1). Subcaae v= 1,x’”=o v = n,, X’v = max(O,nl + nz – n,) Difference n2(% w) – rt=(n8– n, >0 1) n2 if X’u = O n, – nl + 1 n$ —nl if X’ti = nl + n2 – n. n. – nl + 1 n2 — v= n,, X’v = min(nl,nz) Resultingconstraint nl n$ – nl + 1 v n2 < ns nl + nz — 1s n~ no constraint if X’v = nl no constraint Oif X’. = nz case 2: x’, = xd_l + 1. Now IX: –Xv-, I can be rewritten as (“s~~’~~~j,~j~~)x’v).Again there are three subcases listed in the table below. To conclude, if nl + nz – 1 = n,, then [X?– X:+, I = 1 and Azuma’s inequality can be applied. Subcase Difference v = 1,X’v =1 (n, – nl)(n, –nz) n,(n. – 1) v = nl, X’v = max(l ,nl + nz – n$) n.– nl – nz+ 1 IfX’v=1 n, – nl + 1 Resultingconstraint no constraint Oif X’. = nl + nz – n, v= n,, X’v = min(nl,nz) n8— n2 if X’v = nl n. – n~ + 1 72$—nl if X’v = nz n. – nl + 1 no constraint .–—.-