Viewpoint Dependence in Human Spatial Memory

From: AAAI Technical Report SS-96-03. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved.
Timothy P. McNamara
Vaibhav A. Diwadkar
Department of Psychology, 301 Wilson Hall
Vanderbilt University, Nashville, TN 37240
Abstract
We summarize two lines of research that investigated whether human spatial memories are viewpoint dependent (e.g., viewer-centered reference frames) or viewpoint independent (e.g., scene-centered reference frames). In one series of experiments, participants made judgments of relative direction after viewing a spatial layout from one or two perspectives. The findings indicated that multiple views of a spatial layout produced multiple viewpoint dependent representations in memory. These findings were corroborated by the results of experiments on scene recognition. These experiments showed, again, that multiple views of a scene produced multiple viewpoint dependent representations in memory, and that a novel view of a familiar scene was recognized by normalizing it to the most similar view in memory. A preliminary computational model of scene recognition formalizes several of these concepts.
When people learn a new spatial environment, they must encode the locations of objects in memory with respect to one or more frames of reference. Reference frames may be determined by the location of the viewer with respect to the space. An example of such a viewpoint dependent reference frame is one that specifies location in terms of retinal coordinates, which change from view to view. Alternatively, reference frames may be determined independently of the viewer's perspective. Fixed axes centered on the scene itself constitute one example of a viewpoint independent reference frame.
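To make the distinction concrete, the sketch below (our own illustration, not part of the original report; the coordinates and function name are invented) expresses the same object location in a scene-centered frame and in two viewer-centered frames. The scene-centered coordinates are fixed, whereas the viewer-centered coordinates change whenever the viewpoint changes.

```python
import math

def to_viewer_frame(point, viewer_pos, heading_deg):
    """Express a scene-centered point in a viewer-centered frame in which +y is the
    viewer's facing direction and +x is the viewer's right. `heading_deg` is the
    viewer's heading, measured clockwise from the scene's +y axis."""
    dx, dy = point[0] - viewer_pos[0], point[1] - viewer_pos[1]
    h = math.radians(heading_deg)
    return (dx * math.cos(h) - dy * math.sin(h),   # rightward component
            dx * math.sin(h) + dy * math.cos(h))   # forward component

lamp = (1.0, 2.0)                              # fixed scene-centered coordinates (invented)
print(to_viewer_frame(lamp, (0.0, 0.0), 0))    # viewer facing "north": (1.0, 2.0)
print(to_viewer_frame(lamp, (0.0, 0.0), 90))   # same viewer facing "east": about (-2.0, 1.0)
```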
The goal of the research described below is to determine whether spatial memories are viewpoint dependent or viewpoint independent. We have taken two approaches to this problem. In one line of research, we have examined people's abilities to retrieve and to transform in imagination their knowledge of a spatial layout. In the other line of research, we have examined people's abilities to recognize scenes that they have recently learned. Each of these projects will be described in turn. In addition, we will describe a preliminary computational model of human scene recognition.
Experimental Investigations
There is evidence that memories of small spaces, such as maps, are viewpoint dependent, but memories of larger, navigable spaces, such as a room or a college campus, are viewpoint independent (e.g., Evans & Pezdek, 1980). Most of these studies have confounded size of space with learning experiences (but see Presson, DeLange, & Hazelrigg, 1989). More recent experiments have shown that memories of large and small spatial layouts are encoded in a viewpoint dependent manner when people are limited to a single view of the space (e.g., Rieser, 1989; Roskos-Ewoldsen, McNamara, & Carr, 1995; Shelton & McNamara, 1996).
For example, Roskos-Ewoldsen et al. (1995) had subjects view four-point paths, such as the one in Figure 1, from a single perspective. The paths were either very small (6" by 9") or room-sized (8' by 12'). After seeing a path briefly, subjects made judgments of relative direction from memory, pointing to one path location as if standing at another and facing a third (e.g., as if standing at 1 facing 2). The results showed that subjects were faster and more accurate when their imagined facing directions corresponded to their perspective when they studied the path (e.g., for the reader, at 1 facing 2) than when their imagined facing directions did not correspond to their perspective at the time of learning (e.g., at 2 facing 1). This "alignment" effect was the same size for small and for large spatial layouts. An alignment effect would not be expected if locations were encoded with respect to viewpoint independent reference frames.

Figure 1. An example four-point path.
Data collected in our laboratory indicate that two views of a spatial layout produce two viewpoint dependent representations of that space, not a single viewpoint independent representation (Shelton & McNamara, 1996). Subjects saw a collection of objects laid out on the floor of a large room from two perspectives, which differed by 90° (see Figure 2). The observers studied the layout from each perspective until they could point to and name all of the objects with their eyes closed. Subjects then made judgments of relative direction based on their memories of the layout (e.g., "Imagine you are at the book facing the shoe. Point to the jar."). Imagined headings varied from 0° to 315° in 45° steps; by convention, 0° and 90° corresponded to the two views that subjects actually saw. Pointing accuracy and response latency are plotted in Figures 3 and 4 as a function of the imagined heading (these graphs collapse across pointing direction; error bars are the standard error of the mean as estimated from the ANOVA). It is clear that the headings of 0° and 90° are privileged: Angular error is lower and response time is faster for these headings than for other headings. Preliminary results indicate that spatial memories are viewpoint dependent even when subjects are allowed to move through the space when it is learned.

Figure 2. The layout of objects (lamp, wood, clock, shoe, pan, jar, book) and the two viewing positions.

Figure 3. Pointing accuracy as a function of imagined heading (deg).

Figure 4. Response latency as a function of imagined heading (deg).
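As an illustration of the dependent measure, the sketch below (ours, not from the report) computes the correct pointing direction for a judgment of relative direction such as "Imagine you are at the book facing the shoe. Point to the jar." The 2-D coordinates and function name are invented for illustration; angular pointing error would be the difference between a subject's response and this value.

```python
import math

# Hypothetical floor coordinates (meters), invented for illustration.
layout = {"book": (0.0, 0.0), "shoe": (0.0, 2.0), "jar": (1.5, 1.0)}

def relative_direction(standing_at, facing, target, objects=layout):
    """Correct pointing direction to `target` (degrees clockwise from straight ahead)
    for an observer imagined at `standing_at` and facing `facing`."""
    sx, sy = objects[standing_at]
    fx, fy = objects[facing]
    tx, ty = objects[target]
    heading = math.atan2(fx - sx, fy - sy)   # direction of the facing object
    bearing = math.atan2(tx - sx, ty - sy)   # direction of the target
    return math.degrees((bearing - heading + math.pi) % (2 * math.pi) - math.pi)

# "Imagine you are at the book facing the shoe. Point to the jar."
print(relative_direction("book", "shoe", "jar"))   # about 56 deg to the right
```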
In our second line of research, we have explored viewpoint dependence in the context of scene recognition. In one series of experiments, subjects were first shown configurations of five dots displayed in depth (using linear perspective) on a computer monitor. Two views of the configuration were displayed, differing by 75°. These views were displayed in alternation; on most trials, subjects reported a strong sense of apparent motion (i.e., the configuration appeared to be rotating back and forth in space). Subjects were then shown a test scene and had to decide whether it was the configuration of dots they had just viewed ("old") or a different configuration of dots ("new"). The independent variable was the angular distance between the test view and the "closest" study view (defined, of course, only for old test scenes). Accuracy was high and unaffected by angular distance. The relation between response latency and angular distance is presented in Figure 5. Results are plotted separately for test views on the minor arc "in between" the two study views (interpolations) and test views on the major arc "beyond" the two study views (extrapolations). For example, if 0° and 75° correspond to study views, then 15°, 30°, 45°, and 60° are interpolations, whereas 90°-345° are extrapolations. This distinction is important in some models of visual object recognition. One important result was that responses to test scenes corresponding to the two study views (0° and 75°; distance = 0 in Figure 5) were as fast as, or faster than, responses to all other views. This result indicates that both study views were represented in memory. A second important finding was that response latency increased linearly for extrapolations but was flat for interpolations (also see Tarr, 1995).
The linear relation for extrapolations indicates that a new view of an old scene must be normalized to the best possible representation in memory, and that normalization is increasingly more difficult as the new view departs in angular distance from the representations in memory. We attribute the flat function for interpolations to the apparent motion perceived during study, which may have allowed subjects to infer the spatial structure of views between the two study views.
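The following sketch (our own, with invented function names) shows the angular-distance measure and the interpolation/extrapolation classification described above for study views at 0° and 75°.

```python
STUDY_VIEWS = (0, 75)   # the two study views (deg) in the dot-array experiments

def angular_distance(a, b):
    """Smallest rotation (0-180 deg) between two views on the viewing circle."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def classify_test_view(test, study_views=STUDY_VIEWS):
    """Distance to the closest study view, plus the interpolation/extrapolation label."""
    dist = min(angular_distance(test, v) for v in study_views)
    lo, hi = sorted(study_views)
    if test in study_views:
        kind = "study view"        # distance = 0
    elif lo < test < hi:           # on the 75-deg minor arc between the study views
        kind = "interpolation"     # (this simple check suffices for views at 0 and 75 deg)
    else:
        kind = "extrapolation"     # on the major arc beyond the study views
    return dist, kind

print(classify_test_view(45))    # (30, 'interpolation')
print(classify_test_view(120))   # (45, 'extrapolation')
```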
~" 1400 -
mo
onmOo,u
,O,’
,=o
,.,,.
_,.,4
,~ 1.15-4
.
~[
°.95K
"r.,9
0.0-1
Y"
o. 1
mlt
]~,
~
mO" 1250-
i -
-
,,,o0 45
.,..,.,..,..,.,.,.
90135180225270015360
’ II
Test
View (deg)
Fi0ure
8.
II "e- Interp°lati°n
’"--:-- IIII
0"8-~
0.75
~
o ~ o ~ ~ R ~ =o o ~
1400.
1380
-:]
"6"1360.,:1
~ 1340t.~
AngularDistance(d’-eg; "-
"
T//
Another series of experiments has examined the recognition of real scenes. The participants studied, from a single perspective, a collection of six familiar objects resting on a circular desktop. The subjects then learned to recognize the scene from this study view and three additional "training" views. In two training blocks, subjects had to discriminate pictures of new unfamiliar scenes from pictures of the familiar scene taken from any of four perspectives (0°, 45°, 90°, and 270°; 0° was the view subjects actually saw). Following the training blocks, subjects received a "surprise" test block in which all 24 views of the scene (0°-345° in 15° steps) were used as targets in a recognition test.
Recognition accuracy in the final block was over 98%. Figure 6 contains mean response latencies for each of the 24 test views. The general pattern is that response latency increases with angular distance from a training view. Test views near 180° violate this linear pattern (see also Murray, 1995). The linearity of these data is revealed dramatically in Figure 7, which replots the data in Figure 6 as a function of the angular distance between a test view and the nearest of the four training views (this graph excludes the nonlinear section between 135° and 225°). The slope of the function in Figure 7 was the same for interpolations (e.g., 0°-45°, 45°-90°, and 270°-0°) and for extrapolations (e.g., 90°-270°).
Figure 6. Mean response latency for each of the 24 test views (deg).

Figure 7. Response latency as a function of the angular distance (deg) between a test view and the nearest training view.
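A minimal sketch (ours, not the authors' analysis code) of the replotting used for Figure 7: each test view is assigned the angular distance to the nearest of the four training views, with the nonlinear 135°-225° section excluded.

```python
TRAINING_VIEWS = (0, 45, 90, 270)   # training views (deg) in the real-scene experiment

def angular_distance(a, b):
    """Smallest rotation (0-180 deg) between two views."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def nearest_training_distance(test):
    return min(angular_distance(test, v) for v in TRAINING_VIEWS)

# The 24 test views (0-345 deg in 15-deg steps), excluding the 135-225 deg section.
for view in (v for v in range(0, 360, 15) if not 135 <= v <= 225):
    print(view, nearest_training_distance(view))   # distances fall between 0 and 45 deg
```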
This finding contrasts with the outcome of the previously discussed experiments using dot arrays as stimuli. Although there are many differences between these experiments, an important one is that subjects in the former but not the latter experiments could infer spatial structure from apparent motion in the study or training phase of the experiment. We are testing this hypothesis in ongoing experiments.
Model Development
We are developing a computational model of scene recognition. At present, the model development is in its earliest stages and has been applied only to the experiments using dot arrays as stimuli. Although the overall structure of the model is not likely to change, many important components (e.g., the representations of the scenes) will evolve substantially over the next few months.
The model is based on the regularization networks described by Poggio and Girosi (1990). These networks use generalized radial basis functions (GRBF) to recognize objects based on their similarity to stored exemplars. In our model, a view of a scene is represented as a nonmetric and a metric vector. The use of both nonmetric and metric representations is consistent with findings in the spatial memory literature (e.g., Huttenlocher, Hedges, & Duncan, 1991; McNamara, 1986; Tversky, 1981). The coordinates of the nonmetric vector are 1 or 0 depending on whether or not the corresponding pair of locations are in the same "cluster". This assessment is based on the representation of the configuration in the image plane (and hence corresponds loosely to retinal coordinates). The diameter of a cluster is a free parameter but is currently set to approximately 1/4 the maximum distance in a scene. The coordinates of the metric vector are the Euclidean distances between locations in the image plane. These representations are, of course, viewpoint dependent for oblique but not orthogonal views. We are also considering representations that are viewpoint dependent even for orthogonal views (i.e., views from straight above).
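The sketch below shows how the two input vectors might be constructed for a five-dot configuration. It is our own simplification: the report does not spell out its clustering procedure, so here a pair of locations is simply treated as "same cluster" when its image-plane distance is at most the cluster diameter, set to 1/4 of the maximum distance in the scene; the function names and coordinates are invented.

```python
import math
from itertools import combinations

def point_pairs(points):
    """All unordered pairs of location indices, in a fixed order (10 pairs for 5 dots)."""
    return list(combinations(range(len(points)), 2))

def metric_vector(points):
    """Euclidean distances between all pairs of image-plane locations."""
    return [math.dist(points[i], points[j]) for i, j in point_pairs(points)]

def nonmetric_vector(points, diameter_fraction=0.25):
    """1 or 0 for each pair, according to whether the two locations fall in the same
    'cluster'; the cluster diameter is 1/4 of the maximum distance in the scene."""
    dists = metric_vector(points)
    diameter = diameter_fraction * max(dists)
    return [1 if d <= diameter else 0 for d in dists]

# An illustrative five-dot configuration in image-plane coordinates (invented values).
dots = [(0.0, 0.0), (0.5, 0.2), (3.0, 0.0), (3.2, 2.8), (0.2, 3.0)]
print(metric_vector(dots))      # 10 metric coordinates
print(nonmetric_vector(dots))   # 10 nonmetric (same-cluster) coordinates
```

With five dots this yields 10 nonmetric and 10 metric coordinates, matching the 20 input units described below.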
The network takes as input the two vectors of a scene, and then computes the overall similarity between this scene and all study scenes stored in memory. Each study scene is represented by two GRBF basis units in the "hidden" layer, corresponding to the nonmetric and the metric representations of that study scene. Each basis unit computes the distance from its center (which corresponds to its learned or preferred view) to the input vector. The activity of the basis unit is an exponential function of this distance, and is maximal when the input vector is the same as the learned view. The overall output is a weighted sum of the outputs of all basis units in memory. This output is thresholded to yield a "yes-no" recognition decision. In the present application, there are 20 input units (10 each for the nonmetric and the metric vectors), four basis units (a nonmetric and a metric unit for each of the two study scenes in memory), and one output unit.
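The sketch below is a rough rendering of this network under stated assumptions: basis activity is a Gaussian (exponential) function of the Euclidean distance between the input vector and the unit's stored view, and the old/new decision is a thresholded weighted sum. The width, weights, and threshold are placeholder values, not parameters from the report.

```python
import math

def basis_activity(input_vec, stored_vec, width=1.0):
    """GRBF basis unit: an exponential function of the distance between the input
    vector and the unit's stored (learned/preferred) view; maximal when they match."""
    d_squared = sum((x - s) ** 2 for x, s in zip(input_vec, stored_vec))
    return math.exp(-d_squared / (2.0 * width ** 2))

def recognize(nonmetric_in, metric_in, study_scenes, weights, threshold=0.5):
    """Thresholded weighted sum of all basis-unit activities. `study_scenes` is a list
    of (nonmetric_vector, metric_vector) pairs; with two study views this gives the
    four basis units and single output unit described in the text."""
    output = 0.0
    for (nm_stored, m_stored), (w_nm, w_m) in zip(study_scenes, weights):
        output += w_nm * basis_activity(nonmetric_in, nm_stored)
        output += w_m * basis_activity(metric_in, m_stored)
    return output >= threshold   # True -> respond "old", False -> "new"
```

A test scene would be classified by passing its nonmetric and metric vectors (for example, those produced by the earlier sketch) to recognize together with the stored study-scene vectors; in a full implementation the weights would presumably be estimated by the regularization procedure of Poggio and Girosi (1990) rather than chosen by hand.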
The network does an excellent job of discriminating views of studied scenes (both familiar and novel views) from views of new scenes. Overall, the model is correct on 92% of the trials; the false alarm and miss rates are 6% and 2%, respectively. In the human data, the corresponding percentages were 87%, 6%, and 7%. At present, we are examining alternative representations of scenes and extending the model to account for response time.
Summary & Conclusions
When people learn the locations of objects in a spatial layout, they seem to encode location with respect to reference frames that depend on point of view. Multiple views of a spatial layout produce multiple viewpoint dependent representations in memory, not a single viewpoint independent representation. These results are consistent with recent findings in the domain of visual object recognition (e.g., Edelman & Bulthoff, 1992; Tarr, 1995). Apparently, intraobject spatial relations and interobject spatial relations are represented and processed in similar ways by the human brain.
Acknowledgments
The research reported in this paper was supported in part
by National Science Foundation Grant SBR-9222002.
References
Edelman, S., &Bulthoff, H. (1992). Orientation
dependencein the recognition of familiar and
novel views of three-dimensional objects. Vision
Research, 32, 2385-2400.
Evans, G. W., & Pezdek, K. (1980). Cognitive mapping:
Knowledge
of real-world distance and location
information. Journal of Experimental
Psychology: HumanLearning and Memory, 6,
13-24.
Huttenlocher, J., Hedges, L. V., & Duncan,S. (1991).
Categoriesand particulars: Prototypeeffects in
estimating spatial location. Psychological
Review, 98, 352-376.
McNamara,
T. P. (1986). Mental representations
spatial relations. Cognitive Psychology, 18, 87121.
Murray, J. E. (1995). Imagining and namingrotated
natural objects. PsychologicalBulletin and
Review, 2, 239-243.
Poggio, T., &Girosi, F. (1990). Regularization
algorithmsfor learning that are equivalent to
multilayer networks. Science, 247, 978-982.
Presson,C. C., DeLange,
N., & I-Iazelrigg, M.D.
(1989).Orientationspecificity in spatial
memory:
Whatmakesa path different from a
mapof the path? Journalof Experimental
Psychology:Learning, Memory,and Cognition,
15, 887-897.
Rieser, J. J. (1989).Accessto knowledge
of spatial
structure at novelpoints of observation.Journal
of ExperimentalPsychology:Learning, Memory,
and Cognition, 15, 1157-1165.
Roskos-Ewoldsen,
B., McNamara,
T. P., & Can’, W.S.
(1995). Mentalrepresentationsof large andsmall
spatial layoutsare viewpointdependent.
Manuscriptsubmittedfor publication.
Shelton,A., &McNamara,
T. P. (1996). Multiple views
of spatial memory.Manuscript.
Tart, M. J. (1995). Rotatingobjects to recognizethem:
Acase study on the role of viewpointdependency
in the recognitionof three-dimensional
objects.
PsychonomicBulletin & Review, 2, 55-82.
Tversky,B. (1981). Distortions in memory
for maps.
Cognitive Psychology,13, 407-433.