Addressee- and Object-centered Frames of Reference

From: AAAI Technical Report SS-96-03. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved.
Addressee- and Object-centered Frames of Reference
in Spatial Descriptions
Michael F. Schober
New School for Social Research
Department of Psychology, AL-340
Graduate Faculty
65 Fifth Avenue
New York, NY 10003
schober@newschool.edu
Abstract
An important question for some spatial language systems is how to select the frames of reference that underlie the production and interpretation of spatial expressions. The psychological experiments described here examine whether the usual classification scheme for reference frames should be modified to include a reference frame for the addressee of a spatial expression, or whether considering the addressee to be like another object in the environment is appropriate. The experiments compare how quickly people can describe the same location from the perspective of another person and from the perspective of an object (a chair). Results suggest that taking an object’s perspective can be easier than taking another person’s. This has implications for (and raises questions about) how systems should represent conversational partners and how often they should instantiate reference frames during processing of spatial expressions.
Introduction
Many spatial descriptions cannot be produced or interpreted correctly without being assigned a reference frame. For example, "on the left" might refer either to a speaker’s left, her addressee’s left, or the left of an object in the environment like a car or a chair. The exact spatial relation cannot be inferred without knowing which frame of reference is being used. This is well known to AI theorists and system designers, who regularly include the frame of reference as a necessary component for computations involving spatial language (see, e.g., Gapp 1995; Herskovits 1986; Lang, Carstensen & Simmons 1991, among many others).
But computational theories tend to focus on details of the production or interpretation of spatial terms once a reference frame has already been selected. Less attention has been paid to how frames of reference should be selected in the first place.
For many current applications, this is the appropriate strategy, because reference frame selection isn’t a problem for them. For example, if the system and user are guaranteed to have the same vantage point on a scene, assuming a viewer-centered frame of reference is safe. The user is likely to assume that the system shares her frame of reference, and to produce and interpret spatial descriptions accordingly.
But if systems applications are to reflect the full repertoire of human spatial language use, reference frame selection will become a much more complicated issue. This will be especially true in any case where the originator of a spatial description (system or user) does not share the vantage point of the addressee (system or user). This could happen, for instance, in a multi-user spatial application where different communicating users have different points of view. Or it could happen when a mobile user communicates with an off-site system about a real-world scene (see, for example, Davis’ 1989 Back Seat Driver system, in which the reference frame of the user, a car driver, must be recomputed as the driver’s vantage point and distance from other objects change). Multiple coordinate systems will need to be represented in such systems, and there will be reference frame selection issues in both production and interpretation of spatial language.
For such applications it will be important to understand more about how people choose reference frames in conversations with other people, and what they deem their choices to be. This is not because systems must be modeled on human processing strategies; they needn’t be, although sometimes this can be desirable. Rather, the usability of intelligent interfaces depends critically on how well the interface designers have been able to estimate human cognitive and linguistic behaviors.
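A minimal sketch of the bookkeeping that such multi-viewpoint applications imply: one coordinate system per participant, rebuilt only when that participant’s vantage point changes (as for the driver in the Back Seat Driver case). The names `Pose` and `FrameRegistry` and the 2-D pose representation are assumptions made for illustration; they are not taken from any system cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Vantage point of a participant: position plus facing direction (assumed 2-D)."""
    x: float
    y: float
    heading_deg: float


class FrameRegistry:
    """Keeps one reference frame per participant (user or system)."""

    def __init__(self):
        self._poses = {}          # participant name -> pose the frame was built from
        self.recomputations = 0   # how many times a frame had to be (re)built

    def update(self, participant, pose):
        """Record a participant's current pose; rebuild the frame only if it changed."""
        if self._poses.get(participant) != pose:
            self._poses[participant] = pose
            self.recomputations += 1
        return self._poses[participant]

    def shared_vantage(self, a, b):
        """True if two registered participants currently have the same pose."""
        return self._poses[a] == self._poses[b]


registry = FrameRegistry()
registry.update("driver", Pose(0.0, 0.0, 0.0))
registry.update("driver", Pose(0.0, 0.0, 0.0))    # unchanged: nothing is rebuilt
registry.update("driver", Pose(0.0, 10.0, 90.0))  # the driver moved: frame rebuilt
registry.update("system", Pose(5.0, 5.0, 180.0))  # an off-site system's viewpoint
```

When `shared_vantage` is true, the viewer-centered assumption described above is safe; when it is false, frame selection becomes a real decision.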
In earlier work on human conversations about locations (Schober 1993, 1995) I have proposed that one major factor in reference frame selection is effort. Different levels of effort are likely to be associated with taking different perspectives. I have also proposed that people have more options in choosing reference frames than theorists usually suppose.
Here I briefly review these proposals, and then describe some further studies (Schober & Bloom 1995) that test one aspect of them. Then I discuss unresolved issues raised by the results and potential computational implications.
Classification of reference frames
The standard way to classify reference frames is as deictic, intrinsic, and extrinsic (see, for example, Carlson-Radvansky & Irwin 1993; Herskovits 1986; Levelt 1989; Retz-Schmidt 1988, among many others). The same spatial description can express different locations depending on which of these frames it reflects. Deictic expressions reflect the speaker’s or viewer’s perspective; they have also been called egocentric and speaker-centered. Intrinsic expressions reflect the intrinsic tops, bottoms, lefts, rights, fronts and backs of objects in an environment that have these features (lamps have tops and bottoms, but no lefts or rights; chairs have all these features). Intrinsic expressions have also been called object-centered. Extrinsic expressions reflect unchanging environmental attributes like gravity and compass points. So "up" means "away from the pull of gravity" regardless of the frames of reference of people or objects in the environment. Extrinsic expressions have also been called absolute and environment-centered.
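The dependence of a single term on the frame it reflects can be sketched in a few lines. The sketch assumes a 2-D floor plan, a hypothetical `Frame` type (position plus facing direction), and a four-way split into front, back, left, and right; none of these details comes from the studies described here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Origin and facing direction of a speaker, addressee, or object."""
    x: float
    y: float
    heading_deg: float  # assumed convention: 0 = facing +y, increasing clockwise


def describe(frame, tx, ty):
    """Return 'front', 'back', 'left', or 'right' for a target at (tx, ty)."""
    h = math.radians(frame.heading_deg)
    forward = (math.sin(h), math.cos(h))    # unit vector the entity faces
    rightward = (math.cos(h), -math.sin(h)) # unit vector toward the entity's right
    dx, dy = tx - frame.x, ty - frame.y
    along = dx * forward[0] + dy * forward[1]
    across = dx * rightward[0] + dy * rightward[1]
    if abs(along) >= abs(across):
        return "front" if along >= 0 else "back"
    return "right" if across >= 0 else "left"


speaker = Frame(0.0, -5.0, 0.0)      # deictic (speaker-centered) frame
chair = Frame(0.0, 0.0, 90.0)        # intrinsic (object-centered) frame
addressee = Frame(0.0, 5.0, 180.0)   # addressee facing the speaker

ball = (-3.0, -4.0)
print(describe(speaker, *ball))    # left
print(describe(chair, *ball))      # right
print(describe(addressee, *ball))  # front
```

The same ball is "left", "right", or "front" depending on whose coordinate system is used, which is why a description cannot be interpreted until a frame has been assigned.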
There are problems with this scheme. One problem has to do with cross-linguistic validity: As Levinson (1996) points out, some languages do not include all these options. Another problem is that some spatial descriptions can be produced and interpreted without reference to any of these coordinate systems. These include expressions like "between," "near," and "in the middle," which Levelt (1989) calls local references without a coordinate system and I have called neutral with respect to frames of reference (Schober 1995). Another problem is that some expressions are ambiguous with respect to coordinate systems. That is, when a speaker’s left and right coincide with the intrinsic left and right of an object in a scene, an expression that uses "left" or "right" can’t be clearly marked as indicating the speaker’s or the object’s frame of reference.
The problem I will be concerned with here has to do with the classification of addressees--that is, conversational partners that speakers are currently addressing. (Addressees contrast with other conversational participants who are not currently being addressed and with listeners like overhearers who are not part of the current conversation; see Schober & Clark 1989.)
My simple observation is that in human spatial language use there is always a speaker (or writer) who originates the utterance. There is always an addressee or audience of some sort, whether it is a single person in the immediate environment who can provide feedback of understanding or a vaguer unspecified future audience. And there is always a reason for the spatial communication, like giving directions or localizing an object for some purpose. Spatial expressions do not arise in a vacuum, and a whole complex of contextual factors figures into their production and interpretation.
In the simplest two-person face-to-face communication, then, there are always two human coordinate systems involved from the start, no matter what frame of reference the speakers are using in their spatial descriptions. And these two coordinate systems can never coincide perfectly, although they can be close enough to be treated as identical for current purposes (see Herskovits 1986). For convenience I will call the first person the speaker and the second person the addressee. The standard scheme classifies the addressee’s frame of reference as either deictic or intrinsic, and researchers have done both.
Those who classify the addressee’s perspective as deictic (e.g., Herskovits 1986; Retz-Schmidt 1988) rely on the fact that speakers and addressees are both human observers of scenes. Those who classify the addressee’s vantage point as intrinsic classify all non-egocentric descriptions as intrinsic (e.g., Fillmore 1982; Levelt 1989), and this includes addressees. Addressees are just like other objects in a speaker’s environment that have their own fronts, backs, lefts, rights, tops and bottoms.
In a sense, both characterizations are right, and in a sense they are both wrong. Addressees do share the critical property with other objects in the environment of having their own coordinate systems that speakers can refer to. And in taking an addressee’s or an object’s perspective, a speaker is being non-egocentric by taking the point of view of an entity other than herself.
But addressees are more like speakers than other objects in the environment. Speakers must coordinate frames of reference with their conversational partners, but they don’t have to take the needs of inanimate objects into account. And spatial descriptions are part of purposeful communication with addressees, not objects.
Another way addressees are more like speakers is that there are some kinds of spatial descriptions unique to human coordinate systems: the kind that project onto the environment in the visual field. For example, a description like "on the left" can be uttered and interpreted with respect to the speaker’s own intrinsic left, as when the target object whose location is being described is on the speaker’s immediate left. Or it can be uttered and interpreted as a projection of the speaker’s intrinsic left onto the environment, as when the target object is in the left half of the speaker’s visual field. Speakers can utter these "projected" kinds of expressions from the addressee’s point of view, describing what is on, for instance, the left half of the addressee’s visual field. But these kinds of descriptions seem less appropriate for objects that don’t have visual fields, like chairs. It seems odd to describe something in front and to the left of a chair as merely being on the chair’s left.
My proposal (Schober 1993, 1995) is that addressee-centered descriptions reflect a full-fledged perspective in their own right. This perspective shares features with both speaker- and object-centered perspectives, but it is qualitatively different.
This proposal requires empirical validation. One important kind of empirical validation would be a demonstration that speakers treat addressees differently from other objects in the environment during the production and interpretation of spatial descriptions. In the studies described here (Schober & Bloom 1995), we use a standard cognitive psychological method to examine this in the case of production. If people take different amounts of time to produce object- and addressee-centered descriptions, this suggests that the mental processes involved in their production are different. It also lends plausibility to the proposal that addressee-centered descriptions should be classified differently than object-centered descriptions, and that addressee-centered descriptions reflect a frame of reference in their own right.
Effort and reference frame selection
Effort seems to be one of the factors involved in reference frame selection (Schober 1993, 1995). Different frames of reference require different amounts of effort to produce and comprehend. The evidence from earlier studies of speakers’ spatial descriptions (e.g., Herrmann et al. 1987) shows that speakers find it easier to take their own perspectives than the perspectives of their conversational partners, as measured by latency to production of locative expressions. Carlson-Radvansky & Irwin (1994) have shown that verifying a locative expression like "above" is easier when it is true from multiple frames of reference, rather than only one.
But the role of effort in reference frame selection is complex. People don’t produce only descriptions that are easiest to produce (that is, egocentric descriptions). And they don’t automatically interpret the spatial expressions they hear as reflecting the frame of reference that would be easiest for them to understand (their own frame of reference). I propose that rather than minimizing their own effort, people make much more complex judgements. When they hear a spatial expression, they take into account the effort their conversational partner would have to make in producing that expression from that viewpoint. When they produce a spatial expression, they take into account not only how hard they themselves will find it to produce the expression, but also how hard their addressee will find the expression to interpret. That is, while speaker-centered utterances are easiest for speakers to produce, they are harder for addressees to understand than addressee-centered utterances.
In two laboratory studies (Schober 1993, 1995) I have investigated the perspective choices conversational participants make as they describe spatial locations to each other. In the first, speakers described a simple display, where their descriptions could be either speaker-centered, addressee-centered, or ambiguous (in cases where the speaker’s and addressee’s viewpoints coincided, the expressions themselves could have reflected either perspective). In this study, speakers averaged far more addressee-centered than speaker-centered utterances. So on average speakers minimized effort for their partners by taking the addressee’s perspective. There was, however, substantial variability in the strategies pairs chose, and in some pairs both parties spoke completely egocentrically. (In my laboratory we are currently following up on this by investigating how individual differences in perspective-taking strategies relate to underlying mental rotation abilities.)
In the second study, speakers describing a far more complex display avoided speaker-centered utterances almost entirely. For this display, descriptions could either be speaker-centered, addressee-centered, ambiguous, object-centered, or neutral with respect to reference frames. Again, speakers on average preferred to take their partner’s perspective over their own. But even more popular were neutral descriptions, and these neutral descriptions increased in frequency over the course of the conversations. Neutral descriptions don’t require either party to shift perspectives away from their own, because they don’t reflect a coordinate system at all. So in a sense, what speakers in this study were doing was minimizing effort not only for their partners but, to the extent they could, for themselves. Their reference frame selection minimized effort for both parties.
In this second study, object-centered perspectives were avoided almost completely. But because the objects in the display didn’t have very compelling intrinsic parts, we shouldn’t make too much of this avoidance. Miller & Johnson-Laird (1976) claim that taking an object’s perspective is less difficult than taking another person’s. Thus object-centered descriptions should be preferred, because they require an intermediate level of effort for both parties. That is, taking the object’s perspective is harder for the speaker than speaking egocentrically, but easier than taking the addressee’s perspective. Understanding a description from the object’s perspective is harder for the addressee than understanding an addressee-centered description, but easier than understanding a speaker-centered perspective.
In the studies I describe here, we examine Miller & Johnson-Laird’s claim using a display that contains an object with compelling intrinsic parts. If Miller & Johnson-Laird are right, then describing a location from the object’s perspective should be harder than speaking egocentrically, but easier than taking the addressee’s perspective. This is by no means necessarily true. It could be just as difficult to take the perspective of an object as to take the perspective of the addressee, since both require shifting points of view away from one’s own. Or it could be easier to take the addressee’s point of view than an object’s, because spatial expressions are designed for particular addressees and not objects.
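The effort ordering discussed in this section can be written down as a small selection rule. This is only a sketch: the rank values below encode the orderings stated in the text (for the speaker, egocentric is easiest and addressee-centered hardest; for the addressee, the reverse; object-centered is intermediate for both; neutral descriptions require no shift from either party), but the numbers themselves, the linear weighting, and the names `joint_effort` and `select_frame` are assumptions.

```python
# Ordinal effort ranks (0 = least effort). Only the orderings come from the text;
# the numeric values are illustrative assumptions.
PRODUCTION_EFFORT = {"neutral": 0, "speaker": 0, "object": 1, "addressee": 2}
COMPREHENSION_EFFORT = {"neutral": 0, "addressee": 0, "object": 1, "speaker": 2}


def joint_effort(frame, w_speaker=1.0, w_addressee=1.0):
    """Weighted sum of the speaker's production effort and the addressee's
    comprehension effort for a description in the given frame."""
    return (w_speaker * PRODUCTION_EFFORT[frame]
            + w_addressee * COMPREHENSION_EFFORT[frame])


def select_frame(available, w_speaker=1.0, w_addressee=1.0):
    """Pick the available frame with the lowest joint effort."""
    return min(available, key=lambda f: joint_effort(f, w_speaker, w_addressee))


frames = ["speaker", "object", "addressee"]
print(select_frame(frames, w_speaker=1.0, w_addressee=2.0))  # addressee
print(select_frame(frames, w_speaker=2.0, w_addressee=1.0))  # speaker
print(select_frame(frames + ["neutral"]))                    # neutral
```

Weighting the partner’s effort more heavily yields addressee-centered descriptions, and a neutral description wins whenever one is available, which mirrors the pattern reported for the two earlier studies. With equal weights the three perspective-based frames tie under these ranks, so any real use would need empirically grounded values.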
The studies
In both studies, we adapted Herrmann et al.’s (1987) method, in which speakers describe locations on simple computer graphics displays from their own or from other perspectives, and latencies to production are measured. Speakers viewed color displays of 3-D rooms on a CRT screen, and we measured latencies to production of locative expressions as they described the location of a ball relative to a person (who represents the addressee) or a chair (an object that has a front, back, left and right). Speakers viewed exactly the same scenes on different occasions, but they were required to take different perspectives on the different viewings. Thus we could collect within-subjects comparisons of latencies to production.
Experiment 1
In the first study, 19 participants viewed a set of color displays of a 3-D room (see Figure 1 for sample displays). Each display contained a chair, a human figure, and a ball (the target object). For each display, participants were asked to describe the location of the target object using only the words front, back, left, or right.
The participants did this three times for the entire set of displays: once from their own perspective, once from the chair’s perspective, and once from the person’s perspective. The ordering of displays was randomized and the ordering of perspectives counterbalanced.
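One common way to realize such a design is to cycle participants through all possible perspective orders and to shuffle the displays independently within each block. The report does not say which scheme was actually used, so the sketch below is an assumption, as are the function names and the seeding.

```python
import random
from itertools import permutations

PERSPECTIVES = ("own", "chair", "person")


def perspective_order(participant_index):
    """Assign one of the six possible perspective orders by cycling through them."""
    orders = list(permutations(PERSPECTIVES))
    return list(orders[participant_index % len(orders)])


def display_order(n_displays, seed):
    """Return a reproducible random ordering of display indices."""
    rng = random.Random(seed)
    order = list(range(n_displays))
    rng.shuffle(order)
    return order


def session_plan(participant_index, n_displays=32):
    """One (perspective, display order) pair per block for a participant."""
    return [
        (perspective, display_order(n_displays, seed=participant_index * 10 + block))
        for block, perspective in enumerate(perspective_order(participant_index))
    ]


plan = session_plan(participant_index=0)
```

With 19 participants and six orders, cycling cannot balance perfectly: one order is used four times and the other five are used three times each.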
On each of the 32 displays, the positions and orientations of the chair, person and ball varied as follows: The chair, always in the center of the room, faced either to the viewer’s left or right (90° or 270°), or toward the viewer (180°). The person, always against a wall, faced into the room either nearly at the same vantage point as the viewer (0°), on the left or right wall (90° or 270°), or on the far wall (180°). The ball was placed so that the same spatial description (left, right, front, or back) could not be correctly used to describe its location from more than one perspective. For example, if the ball was to the viewer’s left, it would not simultaneously be to the left of either the chair or the human figure in the display. This was (1) so we could be sure that participants were using the requested perspectives, and (2) so we could be sure that only one frame of reference was being activated at a time. Spatial descriptions may be easier to verify when they are true from more than one perspective (Carlson-Radvansky & Irwin, 1994, 1995), and they may also be easier to produce.
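The placement constraint can be checked mechanically. The sketch below assumes a 2-D floor plan, headings measured clockwise from the viewer’s line of sight (so 0° matches the viewer and 180° faces the viewer), and a four-way quantization of directions; the coordinates in the usage lines are hypothetical, not the actual stimuli.

```python
import math

TERMS = ("front", "right", "back", "left")  # clockwise from the facing direction


def term(origin, heading_deg, target):
    """Quadrant term for target as seen from origin while facing heading_deg."""
    bearing = math.degrees(math.atan2(target[0] - origin[0], target[1] - origin[1]))
    relative = (bearing - heading_deg) % 360
    return TERMS[int(((relative + 45) % 360) // 90)]


def is_unambiguous(viewer, chair, person, ball):
    """True if no two perspectives yield the same term for the ball.

    Each perspective is a ((x, y), heading_deg) pair.
    """
    terms = [term(position, heading, ball) for position, heading in (viewer, chair, person)]
    return len(set(terms)) == len(terms)


viewer = ((0.0, -5.0), 0.0)
chair = ((0.0, 0.0), 90.0)     # center of the room, facing sideways
person = ((0.0, 5.0), 180.0)   # far wall, facing the viewer

print(is_unambiguous(viewer, chair, person, (-3.0, -4.0)))  # True

# A person nearly sharing the viewer's vantage point makes "left" true twice.
person_near = ((0.0, -4.0), 0.0)
chair_facing_viewer = ((0.0, 0.0), 180.0)
print(is_unambiguous(viewer, chair_facing_viewer, person_near, (-4.0, -3.0)))  # False
```

A stimulus generator could draw candidate ball positions and keep only those for which `is_unambiguous` holds.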
For each trial, participants would press a key and an image of the room, chair, and person would appear. After they pressed a key again, there was an audible beep, and the ball appeared somewhere in the room. The participant would then describe the ball’s location using the requested perspective.
Reaction time results showed that speakers reliably found it easiest to take their own point of view. This replicates Herrmann et al.’s (1987) results. Speakers found it harder to take the chair’s perspective, and hardest to take the other person’s perspective. People found it easier to take the addressee’s and object’s perspective at 90° and 270° offsets than at the 180° offset.
People were very slow to take the perspective of the addressee when it shared their vantage point (speakers looked over the shoulder of the computer-graphic person in the display). We don’t believe this is a substantive finding, since our speakers seemed to find these displays confusing. As one would expect, when participants took the addressee’s perspective, the chair’s offset had no effect. When they took the chair’s perspective, the addressee’s offset had no effect.
Under certain circumstances, then, the processes involved in taking the perspective of another person seem to differ from those involved in taking the perspective of some other object in the environment. This provides initial support for my proposal that addressee-centered descriptions reflect a frame of reference in their own right. It also neatly supports Miller & Johnson-Laird’s (1976) claim that object-centered descriptions are of intermediate difficulty for speakers--harder than speaking egocentrically, but easier than taking the addressee’s perspective.
But there were potential concerns with Experiment 1. First, the addressee was always against the wall and the chair was always in the center of the room. Since chair orientation and addressee position varied from trial to trial, the addressee’s position changed significantly and quickly, while the chair’s did not. Speakers may have found it harder to take the addressee’s perspective in Experiment 1 merely because of this.
Second, for the particular stimuli used in the study, the task of taking the addressee’s perspective had a somewhat different quality than the task of taking the chair’s perspective. When the ball was on the chair’s left, it was on the chair’s immediate left, while when the ball was to the left of the human figure, it was in the left half of the human figure’s field of vision, but also out in front. This could be why the speakers in Experiment 1 were slower to take the addressee’s perspective.
Third, we weren’t convinced that speakers in Experiment 1 really treated the human figure as an addressee. Perhaps they merely conceived of it as a more complicated object, like a statue of a person. If this was the case, then our results wouldn’t show that addressee-centered descriptions reflect a distinct frame of reference. They would merely show that our speakers treated our CRT representation of the addressee as a more complicated object than the chair.
Figure 1: Sample displays, Experiment 1
Experiment 2
To deal with these concerns, we ran Experiment 2. 16 participants followed exactly the same procedure as for Experiment 1, but with certain differences. Each of the 336 displays contained either the chair or the person, but not both. Both the chair and the person appeared in the center of the room and against the wall equally often, so that we could find out whether our results from Experiment 1 merely reflected the fact that the chair and addressee had appeared in different positions in the room. Also to address this concern, on 12 of the 14 blocks of trials, the chair or the person appeared in the same predictable spot from display to display. On 2 of the 14 blocks, they moved around as in Experiment 1.
To ensure that speakers really treated the representation of the addressee as an addressee, we told participants that another student who would see the displays from the human figure’s perspective would be listening to their descriptions. That student would then be required to locate the ball on the basis of those descriptions.
We used only displays of situations where left or right would be appropriate, and we used only situations where the target object was on the chair’s or person’s immediate left or right. Thus we could ensure a fair comparison between addressee- and object-centered descriptions, since no descriptions would be the sort "projected" into the environment. We decided it was reasonable to omit the front and back situations from Experiment 1 to avoid presenting the participants with an unreasonable number of stimuli. This was justified because in Experiment 1 participants had exactly the same pattern of results for producing left and right as for front and back. We also omitted 0-degree offset displays, because Experiment 1 participants had found these confusing. (See Figure 2 for sample displays.)
The results replicated those of Experiment 1. Once again, addressee-centered descriptions took reliably longer than object-centered descriptions. This shows that the findings in Experiment 1 didn’t result only from the problems with its stimuli.
People found it much easier to take either perspective when the addressee or object remained in a predictable spot. The difference between taking the addressee’s and object’s perspectives was marginally greater when the addressee or object moved from trial to trial. The differences were less extreme when the addressee or object remained in a predictable spot. But addressee-centered descriptions were nonetheless more difficult, whether the reference object (the chair or the person) moved from trial to trial or not.
Figure 2: Sample displays, Experiment 2
Together, these experiments show that taking the spatial perspectives of another person is different than taking the perspective of a non-human object (a chair). The results support Miller & Johnson-Laird’s (1976) proposal that taking an object’s perspective is of intermediate difficulty, more difficult than speaking egocentrically, and less difficult than taking a conversational partner’s perspective. The results also argue against classifying addressee-centered descriptions as either deictic or intrinsic. They seem to have special properties all their own.
Computational Implications
We begin to see that spatial perspective-taking in conversation is a complicated affair. The details of human perspective-taking behavior have implications for models of spatial language, and here are some of them. Just how serious these implications are depends, of course, on the application.
1. If the addressee-centered frame of reference really is a reference frame in its own right, then systems where addressees (system or user) have different vantage points than speakers (user or system) will have to include an addressee-centered reference frame in their repertoire. Users speaking to systems should be understood whether they speak egocentrically or take an addressee-centered perspective. Systems involving multiple users with different vantage points will have to represent the points of view of each user and interpret utterances in light of those reference frames.
2. If an effort-based account of perspective choice is right, the data described here suggest that taking an object’s perspective can be easier than taking another person’s perspective. These data do not show that people prefer taking an object-centered perspective to an addressee-centered one, since in our experiments the participants were required to speak from the chair’s or the person’s perspective repeatedly for a block of displays. But they do suggest that in situations where speakers choose to (or must) minimize effort, object-centered descriptions will be more difficult for them to produce than egocentric ones, but less difficult than addressee-centered ones. This may ultimately affect how they choose reference frames.
3. Exactly how are conversational partners mentally represented? Are they like other objects but more complex, or are they special? The results here present very preliminary evidence that addressees are mentally represented in a special way. This is consistent with Clark & Marshall’s (1981) proposal that each particular addressee is distinctly represented, and that information about particular addressees is active and salient during language processing with those addressees. But if this is so, it raises a puzzle. If addressees are really specially available during utterance production, then why should speakers take longer to produce addressee-centered descriptions than object-centered descriptions? It seems reasonable that there should be extra processes involved in addressee-centered description, and these should take extra time (Herrmann 1989). But shouldn’t the addressee be available during object-centered descriptions as well, since the utterances are presumably designed for the addressee? And so shouldn’t object-centered descriptions take even longer than addressee-centered descriptions, because both the object’s perspective and the addressee’s perspective must become active?
Our data don’t provide an answer to these questions. One possibility is that in taking an object’s perspective the addressee’s point of view doesn’t have to be considered, and thus it doesn’t have to be fully activated.
4. If effort is a significant component of reference frame selection, then an important issue will be how to represent it. Wherever in a system reference frame selection happens, there will need to be a component that weights particular reference frames for how difficult they are to produce and interpret in the current situation. This weighting will have to take into account contextual factors like particular arrangements of objects in scenes and particular displacements of speaker and addressee vantage points. A further complication to this picture is that different people may have different weightings for effort. A study I am currently engaged in analyzing is suggesting that people’s mental rotation abilities affect their strategies in selecting perspectives, in combination with their assessments of their conversational partners’ abilities. If this is the case, then user models may need to represent a component that weights reference frame likelihood for particular individuals.
5. One provocative piece of data here is that the difference between taking the addressee’s and object’s perspectives was less extreme when the addressee or object remained in a predictable spot. The appropriateness of generalizing this result beyond the studies described here depends on whether spatial descriptions in the situations to be modeled are more like the experimental condition in which the reference object moved from trial to trial or the stationary case. Both situations can happen in the real world, although our experimental situation was artificial in the regularity and predictability with
which experimental participants had to produce their descriptions.
These studies also raise questions for the future that have significant implications for systems that try to model changing environments. When do speakers instantiate a reference frame or coordinate system? Certainly they must do it before they produce their spatial expressions (Herskovits, 1986). But how often do they update them? Once a reference frame is instantiated is it kept until speakers get evidence that circumstances have changed? Or do speakers reconfirm reference frames regularly, especially if intervening spoken material hasn’t kept a reference frame active? One speculation consistent with this result is that speakers instantiate a frame of reference for a first spatial description, and it can be more difficult for them to do this for an addressee than for an object (at least the one we used). But once the frame of reference has been instantiated, it needn’t be reinstantiated for subsequent descriptions, as long as the reference object or addressee doesn’t change vantage points.
Finally, I should note that effort is not the only factor involved in reference frame selection. Speakers are affected by the reference frames they have already chosen and the reference frames their conversational partners have chosen (see Garrod & Anderson 1987). Speakers are also affected by what reference frames their conversational partners give them evidence of having understood, and they will reframe their descriptions from a different perspective when they are misunderstood (Schober 1993). This should remind us that human conversation repair systems act as a safety net for misunderstanding, and it is a reasonable strategy in modeling spatial language to invest significant computational resources in the diagnosis and repair of misunderstanding (Brennan & Hulteen 1995; Schober 1996).
References
Brennan, S.E., and Hulteen, E.A. 1995. Interaction and Feedback in a Spoken Language System: A Theoretical Framework. Knowledge-Based Systems 8:143-151.
Carlson-Radvansky, L.A. 1995. Cooperation Among Spatial Reference Frames. Poster session presented at the 36th Annual Meeting of the Psychonomic Society, Los Angeles, CA.
Carlson-Radvansky, L.A., and Irwin, D.E. 1993. Frames of Reference in Vision and Language: Where is Above? Cognition 46:223-244.
Carlson-Radvansky, L.A., and Irwin, D.E. 1994. Reference Frame Activation During Spatial Term Assignment. Journal of Memory and Language 33:646-671.
Clark, H.H., and Marshall, C.R. 1981. Definite Reference and Mutual Knowledge. In A.H. Joshi, B. Webber, and I.A. Sag, eds., Elements of Discourse Understanding (pp. 10-63). Cambridge, UK: Cambridge University Press.
Davis, J.R. 1989. Back Seat Driver: Voice Assisted Automobile Navigation. M.Sc. Thesis, Dept. of Architecture, MIT.
Fillmore, C.J. 1982. Towards a Descriptive Framework for Spatial Deixis. In R.J. Jarvella & W. Klein, eds., Speech, Place, and Action (pp. 31-59). Chichester: Wiley.
Gapp, K.-P. 1995. Object Localization: Selection of Optimal Reference Objects. In Proceedings of the 2nd International Conference on Spatial Information Theory, COSIT 95.
Garrod, S., and Anderson, A. 1987. Saying What You Mean in Dialogue: A Study in Conceptual and Semantic Coordination. Cognition 27:181-218.
Herrmann, T., Bürkle, B., and Nirmaier, H. 1987. Zur hörerbezogenen Raumreferenz: Hörerposition und Lokalisationsaufwand [On Listener-Oriented Spatial Reference: Listener Position and Localization Effort]. Sprache & Kognition 3:126-137.
Herrmann, T. 1989. Partnerbezogenen Objektlocalisation--ein Neues Sprachpsychologisches Forschungsthema [Partner-oriented Localization of Objects--a New Psycholinguistic Research Topic]. Bericht Nr. 25, Forschergruppe "Sprechen und Sprachverstehen im sozialen Kontext," University of Mannheim.
Herskovits, A. 1986. Language and Spatial Cognition: An Interdisciplinary Study of the Prepositions in English. Cambridge, UK: Cambridge University Press.
Lang, E., Carstensen, K.-U., and Simmons, G. 1991. Modeling Spatial Knowledge on a Linguistic Basis (Lecture Notes in Computer Science 481). Berlin: Springer-Verlag.
Levelt, W.J.M. 1989. Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.
Levinson, S. 1996. Frames of Reference and Molyneux’s Question: Cross-Linguistic Evidence. In P. Bloom, M.A. Peterson, L. Nadel, and M. Garrett, eds., Space and Language. Cambridge: MIT Press. Forthcoming.
Miller, G.A., and Johnson-Laird, P.N. 1976. Language and Perception. Cambridge, MA: Harvard University Press.
Retz-Schmidt, G. 1988. Various Views on Spatial Prepositions. AI Magazine 9(2):95-105.
Schober, M.F. 1993. Spatial Perspective-Taking in Conversation. Cognition 47:1-24.
Schober, M.F. 1995. Speakers, Addressees, and Frames of Reference: Whose Effort is Minimized in Conversations About Locations? Discourse Processes 20:219-247.
Schober, M.F. 1996. How Addressees Affect Spatial Perspective Choice in Dialogue. In P.L. Olivier and K.-P. Gapp, eds., Representation and Processing of Spatial Expressions. Hillsdale, NJ: Lawrence Erlbaum. Forthcoming.
Schober, M.F., and Bloom, J.E. 1995. The Relative Ease of Producing Egocentric, Addressee-Centered, and Object-Centered Spatial Descriptions. Poster session presented at the 36th Annual Meeting of the Psychonomic Society, Los Angeles, CA.
Schober, M.F., and Clark, H.H. 1989. Understanding by Addressees and Overhearers. Cognitive Psychology 21:211-232.