From: AAAI Technical Report SS-96-03. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved.

Viewpoint Dependence in Human Spatial Memory

Timothy P. McNamara
Vaibhav A. Diwadkar
Department of Psychology, 301 Wilson Hall
Vanderbilt University, Nashville, TN 37240

Abstract

We summarize two lines of research that investigated whether human spatial memories are viewpoint dependent (e.g., viewer-centered reference frames) or viewpoint independent (e.g., scene-centered reference frames). In one series of experiments, participants made judgments of relative direction after viewing a spatial layout from one or two perspectives. The findings indicated that multiple views of a spatial layout produced multiple viewpoint-dependent representations in memory. These findings were corroborated by the results of experiments on scene recognition. These experiments showed, again, that multiple views of a scene produced multiple viewpoint-dependent representations in memory, and that a novel view of a familiar scene was recognized by normalizing it to the most similar view in memory. A preliminary computational model of scene recognition formalizes several of these concepts.

When people learn a new spatial environment, they must encode the locations of objects in memory with respect to one or more frames of reference. Reference frames may be determined by the location of the viewer with respect to the space. An example of such a viewpoint-dependent reference frame is one that specifies location in terms of retinal coordinates, which change from view to view. Alternatively, reference frames may be determined independently of the viewer's perspective. Fixed axes centered on the scene itself constitute one example of a viewpoint-independent reference frame.

The goal of the research described below is to determine whether spatial memories are viewpoint dependent or viewpoint independent. We have taken two approaches to this problem. In one line of research, we have examined people's abilities to retrieve, and to transform in imagination, their knowledge of a spatial layout. In the other line of research, we have examined people's abilities to recognize scenes that they have recently learned. Each of these projects will be described in turn. In addition, we will describe a preliminary computational model of human scene recognition.

Experimental Investigations

There is evidence that memories of small spaces, such as maps, are viewpoint dependent, but that memories of larger, navigable spaces, such as a room or a college campus, are viewpoint independent (e.g., Evans & Pezdek, 1980). Most of these studies have confounded size of space with learning experiences (but see Presson, DeLange, & Hazelrigg, 1989). More recent experiments have shown that memories of large and small spatial layouts are encoded in a viewpoint-dependent manner when people are limited to a single view of the space (e.g., Rieser, 1989; Roskos-Ewoldsen, McNamara, & Carr, 1995; Shelton & McNamara, 1996). For example, Roskos-Ewoldsen et al. (1995) had subjects view four-point paths, such as the one in Figure 1, from a single perspective. The paths were either very small (6" by 9") or room-sized (8' by 12'). After seeing a path briefly, subjects made judgments of relative direction from memory (e.g., pointing to another location on the path as if standing at 1 and facing 2). The results showed that subjects were faster and more accurate when their imagined facing directions corresponded to their perspective when they studied the path (e.g., for the reader, at 1 facing 2) than when their imagined facing directions did not correspond to their perspective at the time of learning (e.g., at 2 facing 1).

[Figure 1. A four-point path, with points numbered 1-4.]
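To make the judgment-of-relative-direction task concrete, the following sketch (ours, not the authors' code) scores a single trial: given hypothetical 2-D coordinates for the four path points, it computes the direction of a target point relative to an imagined standing point and facing direction. Only the geometry follows from the task description; the coordinates, function names, and the clockwise sign convention are illustrative assumptions.

import math

def bearing(at, facing, target):
    """Direction of `target` relative to an observer standing at `at`
    and facing toward `facing`, in degrees (0 = straight ahead,
    positive = to the observer's right)."""
    heading = math.atan2(facing[0] - at[0], facing[1] - at[1])
    to_target = math.atan2(target[0] - at[0], target[1] - at[1])
    rel = math.degrees(to_target - heading)
    return (rel + 180.0) % 360.0 - 180.0   # wrap to (-180, 180]

# Hypothetical coordinates (x, y) for a four-point path, numbered as in Figure 1.
path = {1: (0.0, 0.0), 2: (0.0, 3.0), 3: (2.0, 3.0), 4: (2.0, 1.0)}

# "Standing at 1, facing 2, point to 3."  The imagined heading (1 -> 2)
# matches the study perspective, so this would count as an aligned trial.
print(bearing(path[1], path[2], path[3]))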
This "alignment" effect was the same size for small and for large spatial layouts. An alignment effect would not be expected if locations were encoded with respect to viewpoint-independent reference frames.

Data collected in our laboratory indicate that two views of a spatial layout produce two viewpoint-dependent representations of that space, not a single viewpoint-independent representation (Shelton & McNamara, 1996). Subjects saw a collection of objects laid out on the floor of a large room from two perspectives, which differed by 90° (see Figure 2). The observers studied the layout from each perspective until they could point to and name all of the objects with their eyes closed. Subjects then made judgments of relative direction based on their memories of the layout (e.g., "Imagine you are at the book facing the shoe. Point to the jar.").

[Figure 2. The layout of objects (lamp, wood, clock, shoe, pan, jar, and book) and the two viewing positions.]

Imagined headings varied from 0° to 315° in 45° steps; by convention, 0° and 90° corresponded to the two views that subjects actually saw. Pointing accuracy and response latency are plotted in Figures 3 and 4 as a function of the imagined heading (these graphs collapse across pointing direction; error bars are the standard error of the mean as estimated from the ANOVA). It is clear that the headings of 0° and 90° are privileged: Angular error is lower and response time is faster for these headings than for other headings. Preliminary results indicate that spatial memories are viewpoint dependent even when subjects are allowed to move through the space when it is learned.

[Figure 3. Pointing error as a function of imagined heading (0°-315°).]

[Figure 4. Response latency as a function of imagined heading (0°-315°).]

In our second line of research, we have explored viewpoint dependence in the context of scene recognition. In one series of experiments, subjects were first shown configurations of five dots displayed in depth (using linear perspective) on a computer monitor. Two views of the configuration were displayed, differing by 75°. These views were displayed in alternation; on most trials, subjects reported a strong sense of apparent motion (i.e., the configuration appeared to be rotating back and forth in space). Subjects were then shown a test scene and had to decide whether it was the configuration of dots they had just viewed ("old") or a different configuration of dots ("new"). The independent variable was the angular distance of rotation between the test view and the "closest" study view (defined, of course, only for old test scenes). Accuracy was high and unaffected by angular distance. The relation between response latency and angular distance is presented in Figure 5. Results are plotted separately for test views on the minor arc "in between" the two study views (interpolations) and test views on the major arc "beyond" the two study views (extrapolations). For example, if 0° and 75° correspond to study views, then 15°, 30°, 45°, and 60° are interpolations, whereas 90°-345° are extrapolations. This distinction is important in some models of visual object recognition. One important result was that responses to test scenes corresponding to the two study views (0° and 75°; distance = 0 in Figure 5) were as fast as, or faster than, responses to all other views. This result indicates that both study views were represented in memory. A second important finding was that response latency increased linearly for extrapolations but was flat for interpolations (also see Tarr, 1995). The linear relation for extrapolations indicates that a new view of an old scene must be normalized to the best possible representation in memory, and that normalization becomes increasingly difficult as the new view departs in angular distance from the representations in memory. We attribute the flat function for interpolations to the apparent motion perceived between the two study views, which may have allowed subjects to infer the spatial structure of the intermediate views.

[Figure 5. Response latency as a function of angular distance from the nearest study view, plotted separately for interpolations and extrapolations.]
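The interpolation/extrapolation distinction and the angular-distance measure can be stated compactly. The sketch below is our reconstruction of those two quantities for the dot-array experiment (study views at 0° and 75°); it is not the original analysis code, and the function names are ours.

# Study views follow the dot-array experiment described above.
STUDY_VIEWS = (0, 75)

def ang_dist(a, b):
    """Smallest angular separation between two directions, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def classify(test_view, study=STUDY_VIEWS):
    """Angular distance to the nearest study view, plus the test view's
    status: on the minor arc between the study views (interpolation),
    at a study view, or on the major arc beyond them (extrapolation)."""
    dist = min(ang_dist(test_view, s) for s in study)
    lo, hi = sorted(study)
    if test_view in study:
        label = "study view"
    elif lo < test_view < hi:          # minor arc, valid for views 0-75 here
        label = "interpolation"
    else:
        label = "extrapolation"
    return dist, label

for v in (0, 30, 75, 90, 180, 345):
    print(v, classify(v))
# e.g. 30 -> (30, 'interpolation'); 90 -> (15, 'extrapolation');
#      345 -> (15, 'extrapolation')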
Another series of experiments has examined the recognition of real scenes. The participants studied, from a single perspective, a collection of six familiar objects resting on a circular desktop. The subjects then learned to recognize the scene from this study view and three additional "training" views. In two training blocks, subjects had to discriminate pictures of new, unfamiliar scenes from pictures of the familiar scene taken from any of four perspectives (0°, 45°, 90°, and 270°; 0° was the view subjects actually saw). Following the training blocks, subjects received a "surprise" test block in which all 24 views of the scene (0°-345° in 15° steps) were used as targets in a recognition test. Recognition accuracy in the final block was over 98%. Figure 6 contains mean response latencies for each of the 24 test views. The general pattern is that response latency increases with angular distance from a training view. Test views near 180° violate this linear pattern (see also Murray, 1995). The linearity of these data is revealed dramatically in Figure 7, which replots the data in Figure 6 as a function of the angular distance between a test view and the nearest of the four training views (this graph excludes the nonlinear section between 135° and 225°). The slope of the function in Figure 7 was the same for interpolations (e.g., 0°-45°, 45°-90°, and 270°-0°) and for extrapolations (e.g., 90°-270°).

[Figure 6. Mean response latency for each of the 24 test views (0°-345°).]

[Figure 7. Response latency as a function of angular distance from the nearest training view.]

This finding contrasts with the previously discussed results using dot arrays as stimuli. Although there are many differences between these experiments, an important one is that subjects in the dot-array experiments, but not in the real-scene experiments, could infer spatial structure from apparent motion during the study or training phase of the experiment. We are testing this hypothesis in ongoing experiments.

Model Development

We are developing a computational model of scene recognition. At present, the model is in its earliest stages of development and has been applied only to the experiments using dot arrays as stimuli. Although the overall structure of the model is not likely to change, many important components (e.g., the representations of the scenes) will evolve substantially over the next few months. The model is based on the regularization networks described by Poggio and Girosi (1990). These networks use generalized radial basis functions (GRBFs) to recognize objects based on their similarity to stored exemplars.

In our model, a view of a scene is represented as a nonmetric and a metric vector. The use of both nonmetric and metric representations is consistent with findings in the spatial memory literature (e.g., Huttenlocher, Hedges, & Duncan, 1991; McNamara, 1986; Tversky, 1981). The coordinates of the nonmetric vector are 1 or 0 depending on whether or not the corresponding pair of locations are in the same "cluster". This assessment is based on the representation of the configuration in the image plane (and hence corresponds loosely to retinal coordinates). The diameter of a cluster is a free parameter but is currently set to approximately 1/4 of the maximum distance in a scene. The coordinates of the metric vector are the Euclidean distances between locations in the image plane. These representations are, of course, viewpoint dependent for oblique but not orthogonal views. We are also considering representations that are view dependent even from orthogonal views (i.e., from straight above).
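As a rough illustration of the scene representation just described, the sketch below builds the nonmetric and metric vectors for a five-dot configuration. The pairwise-distance criterion used here to decide whether two dots fall in the same cluster is our simplifying assumption; the text fixes the cluster diameter at roughly 1/4 of the maximum distance in the scene but does not spell out the clustering procedure, and the coordinates are hypothetical.

import itertools, math

def euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def scene_vectors(dots, diameter_fraction=0.25):
    """dots: list of five (x, y) image-plane coordinates.
    Returns (nonmetric, metric), each with C(5, 2) = 10 entries,
    one per pair of locations, in a fixed pair order."""
    pairs = list(itertools.combinations(range(len(dots)), 2))
    dists = [euclid(dots[i], dots[j]) for i, j in pairs]
    diameter = diameter_fraction * max(dists)       # cluster diameter
    # Assumed clustering rule: a pair counts as clustered when its
    # image-plane separation is no more than the cluster diameter.
    nonmetric = [1.0 if d <= diameter else 0.0 for d in dists]
    metric = dists                                   # pairwise Euclidean distances
    return nonmetric, metric

# Hypothetical five-dot configuration in image-plane coordinates.
dots = [(0.10, 0.20), (0.15, 0.25), (0.80, 0.10), (0.50, 0.90), (0.40, 0.45)]
nm, m = scene_vectors(dots)
print(nm)   # ten 0/1 cluster indicators
print(m)    # ten pairwise image-plane distances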
The network takes as input the two vectors of a scene and then computes the overall similarity between this scene and all study scenes stored in memory. Each study scene is represented by two GRBF basis units in the "hidden" layer, corresponding to the nonmetric and the metric representations of that study scene. Each basis unit computes the distance from its center (which corresponds to its learned or preferred view) to the input vector. The activity of the basis unit is an exponential function of this distance and is maximal when the input vector is the same as the learned view. The overall output is a weighted sum of the outputs of all basis units in memory. This output is thresholded to yield a "yes-no" recognition decision. In the present application, there are 20 input units (10 each for the nonmetric and the metric vectors), four basis units (a nonmetric and a metric unit for each of the study scenes in memory), and one output unit. The network does an excellent job of discriminating views of studied scenes (both familiar and novel views) from views of new scenes. Overall, the model is correct on 92% of the trials; the false alarm and miss rates are 6% and 2%, respectively. In the human data, the corresponding percentages were 87%, 6%, and 7%. At present, we are examining alternative representations of scenes and extending the model to account for response time.
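The forward pass of such a network can be sketched as follows. This is our reading of the architecture described above, not the authors' implementation: the Gaussian form of the exponential, the basis-unit widths, the (equal) weights, and the decision threshold are unspecified in the text and are placeholders here.

import math

def grbf_unit(x, center, width=1.0):
    """Activity of one basis unit: an exponential function of the distance
    between the input vector and the unit's learned (preferred) view,
    maximal when the input matches the learned view."""
    d = math.dist(x, center)
    return math.exp(-(d / width) ** 2)

def recognize(nonmetric, metric, memory, weights=None, threshold=0.5):
    """memory: list of (nonmetric_center, metric_center) pairs, one pair
    of basis units per stored study view.  Returns the weighted sum of
    basis-unit activities and the thresholded 'old'/'new' decision."""
    acts = []
    for nm_center, m_center in memory:
        acts.append(grbf_unit(nonmetric, nm_center))
        acts.append(grbf_unit(metric, m_center))
    if weights is None:
        weights = [1.0 / len(acts)] * len(acts)   # equal weighting assumed
    output = sum(w * a for w, a in zip(weights, acts))
    return output, ("old" if output >= threshold else "new")

Combined with the scene_vectors() sketch above, memory would hold one (nonmetric, metric) pair per study view, yielding the four basis units and the single thresholded output unit described in the text.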
Summary & Conclusions

When people learn the locations of objects in a spatial layout, they seem to encode location with respect to reference frames that depend on point of view. Multiple views of a spatial layout produce multiple viewpoint-dependent representations in memory, not a single viewpoint-independent representation. These results are consistent with recent findings in the domain of visual object recognition (e.g., Edelman & Bulthoff, 1992; Tarr, 1995). Apparently, intraobject spatial relations and interobject spatial relations are represented and processed in similar ways by the human brain.

Acknowledgments

The research reported in this paper was supported in part by National Science Foundation Grant SBR-9222002.

References

Edelman, S., & Bulthoff, H. (1992). Orientation dependence in the recognition of familiar and novel views of three-dimensional objects. Vision Research, 32, 2385-2400.

Evans, G. W., & Pezdek, K. (1980). Cognitive mapping: Knowledge of real-world distance and location information. Journal of Experimental Psychology: Human Learning and Memory, 6, 13-24.

Huttenlocher, J., Hedges, L. V., & Duncan, S. (1991). Categories and particulars: Prototype effects in estimating spatial location. Psychological Review, 98, 352-376.

McNamara, T. P. (1986). Mental representations of spatial relations. Cognitive Psychology, 18, 87-121.

Murray, J. E. (1995). Imagining and naming rotated natural objects. Psychonomic Bulletin & Review, 2, 239-243.

Poggio, T., & Girosi, F. (1990). Regularization algorithms for learning that are equivalent to multilayer networks. Science, 247, 978-982.

Presson, C. C., DeLange, N., & Hazelrigg, M. D. (1989). Orientation specificity in spatial memory: What makes a path different from a map of the path? Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 887-897.

Rieser, J. J. (1989). Access to knowledge of spatial structure at novel points of observation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 1157-1165.

Roskos-Ewoldsen, B., McNamara, T. P., & Carr, W. S. (1995). Mental representations of large and small spatial layouts are viewpoint dependent. Manuscript submitted for publication.

Shelton, A., & McNamara, T. P. (1996). Multiple views of spatial memory. Manuscript.

Tarr, M. J. (1995). Rotating objects to recognize them: A case study on the role of viewpoint dependency in the recognition of three-dimensional objects. Psychonomic Bulletin & Review, 2, 55-82.

Tversky, B. (1981). Distortions in memory for maps. Cognitive Psychology, 13, 407-433.