From: AAAI Technical Report SS-96-03. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved. Addressee- and Object-centered Frames of Reference in Spatial Descriptions Michael F. Schober NewSchool for Social Research Department of Psychology, AL-340 Graduate Faculty 65 Fifth Avenue New York, NY10003 schober@newschool.edu Abstract Animportantquestionfor somespatial languagesystemsis howto select the framesof reference that underlie the productionand interpretationof spatial expressions.Thepsychological experiments describedhere examinewhetherthe usual classification schemefor referenceframesshouldbe modifiedto includea referenceframefor the addresseeof a spatial expression,or whetherconsideringthe addresseeto be like anotherobjectin the environment is appropriate. Theexperimentscomparehowquicklypeoplecan describethe same location from the perspective of another parson and from the perspectiveof an object (a chair). Resultssuggestthat taking object’sperspectivecanbe easier than takinganotherparson’s.This has implicationsfor (andraises questionsabout)howsystemsshould represent conversational partners and howoften they should instsntiatereferenceframesduringprocessing of spatial expressions. system shares her frame of reference, and to produce and interpret spatial descriptions accordingly. Introduction But computational theories tend to focus on details of the productionor interpretation of spatial terms once a reference framehas already beenselected. Less attention has beenpaid to howframesof referenceshouldbe selected in the first place. But if systemsapplications are to reflect the full repertoire of humanspatial language use, reference frame selection will become a much more complicated issue. This will be especially true in any case wherethe originator of a spatial description (systemor user) does not share the vantagepoint the addressee (system or user). This could happen, for instance, in a multi-user spatial application wheredifferent communicating users have different points of view. Or it could happen when a mobile user communicateswith an off-site system about a real-world scene (see, for example, Davis’ 1989BackSeat Driver system, in whichthe reference frame of the user, a car driver, must be recomputedas the driver’s vantage point and distance from other objects change). Multiple coordinatesystemswill needto be representedin such systems, and there will be reference frameselection issues in both productionand interpretation of spatial language. For manycurrent applications, this is the appropriatestrategy, because reference frame selection isn’t a problemfor them. For example,if the systemand user are guaranteedto havethe same vantage point on a scene, assuminga viewer-centered frameof referenceis safe. Theuser is likely to assumethat the For such applications it will be important to understandmore about howpeople choose reference frames in conversations with other people, and what they deemtheir choices to be. This is not because systems must be modeled on human processingstrategies; they needn’t be, althoughsometimesthis can be desirable. Rather,the usability of intelligent interfaces Manyspatial descriptions cannot be producedor interpreted correctly without being assigned a reference frame. For example,"on the left’ mightrefer either to a speaker’sleft, her addressee’sletL or the left of an object in the environment like a car or a chair. Theexact spatial relation cannot be inferred without knowingwhichframe of reference is being used. This is well knownto AI theorists and system designers, who regularly include the frame of reference as a necessary componentfor computationsinvolving spatial language (see, e.g., Gapp 1995; Herskovits 1986, Lang, Carstensen & Simmons1991, amongmanyothers). 92 delmndscritically on howwell the interface designers have beenable to (~firnate humancognitiveand linguistic behaviors. In earlier work on humanconversations about locations (Sehober1993, 1995) I haveproposedthat one major factor reftaelaceframeselectionis effort. Differentlevels of effort are likely to be associated with taking different perspectives. I have also proposedthat people have moreoptions in choosing reference flames than theorists usually suppose. Here I briefly review these proposals, and then describe some further studies (Sohober&Bloom1995) that test one aspect them. ThenI discuss unresolvedissues raised by the results and potential computationalimplications. Classification of reference frames The standard wayto classify reference framesis as deictic, intrinsic, and ex~nsic (see, for example,Carlson-Radvansky & Irwin 1993; Herskovits 1986; Lever 1989; Retz-Schmidt 1988, amongmanyothers). The same spatial description can expressdiffea-eaat locations dependingon whichof these frames it reflects. Deiotic expressions reflect the speaker’s or viewer’s perspective; they have also been called egocentric and speaker-centered.Intrinsic expressionsreflect the intrinsic tops, bottoms,leRs, rights, fronts and backsof objects in an environmentthat have these features (lamps have tops and bottoms,but no lefts or rights; chairs haveall these features). Intrinsic expressions havealso been called object-centered. Extrinsic expressions reflect unchanging environmental attributes like gravity and compasspoints. So "up" means "awayfrom the pull of gravity" regardless of the frames of reference of people or objects in the environment.Extrinsic expressions have also been called absolute and environmentcentered. There are problems with this scheme. Oneproblemhas to do with cross-lingnistic validity: As Levinson(1996) points out, somelanguages do not include all these options. Another problemis )hat somespatial descriptions can be producedand interpreted without reference to any of these coordinate systems. These include expressions like "between," "near," and "in the middle,"whichLever(1989) calls local references without a coordinate system and I have called neutral with respect to frames of reference (Schober 1995). Another problemis that someexpressions are ambiguouswith respect to coordinatesystems. That is, whena speaker’sleft and right coincidewiththe intrinsic left andright of an object in a scene, an expressionthat uses "left" or "right" can’t be clearly marked as indicating the speaker’sor the objeet’s frameof reference. The problemI will be concernedwith here has to do with the classification of addressees--thatis, conversationalpartners that speakers are currently addressing. (Addresseescontrast with other conversational participants whoare not currently being addressedand with listeners like overhearerswhoare not part of the current conversation, see Schober&Clark 1989). Mysimple observation is that in humanspatial languageuse there is always a speaker (or writer) whooriginates the utterance. There is always an addressee or audience of some sort, whetherit is a single personin the immediate environment 93 who can provide feedback of understanding or a vaguer unspecified future audience. Andthere is alwaysa reason for the spatial communication, like giving directions or localizing an object for somepurpose.Spatial expressionsdo not arise in a vacuum,and a wholecomplexof contextual factors figures into their productionand interpretation. In the simplest two-personface-to-face communication,then, there are always two humancoordinate systems involved from the start, no matter what frame of reference the speakers are using in their spatial descriptions. Andthese two coordinate systems can never coincide perfectly, although they can be close enoughto be treated as identical for current purposes(see Herskovits1986). For convenienceI will call the first person the speaker and the secondperson the addressee. Thestandard schemeclassifies the addressee’s frameof reference as either deictic or intrinsic, and researchers havedoneboth. Thosewhoclassify the addressee’sperspectiveas deictic (e.g., Herskovits 1986; Retz-Schmidt1988) rely on the fact that speakers and addressees are both humanobservers of scenes. Thosewhoclassify the addressee’svantagepoint as intrinsic. classify all non-egocentricdescriptions as intrinsic (e.g., Fillmore 1982; Lever 1989), and this includes addressees. Addressees are just like other objects in a speaker’s environment that havetheir ownfronts, backs, lefts, rights, tops and bottoms. In a sense, both characterizationsare right, and in a sense they are beth wrong.Addresseesdo share the critical propertywith other objects in the environment of havingtheir owncoordinate systems that speakers can refer to. And in taking an a_ __ddressee’s or an object’s perspective,a speakeris beingnonegocentric by taking the point of viewof an entity other than herself. But ~ are more like speakers than other objects in the environment. Speakers must coordinate frames of reference with their conversationalpartners, but they don’t haveto take the needs of inanimate objects into account. Andspatial descriptions are part of purposeful communicationwith addressees,not objects. Anotherway addressees are morelike speakers is that there are some kinds of spatial descriptions unique to human coordinatesystems: the kind that project onto the environment in the visual field. For example,a descriptionlike "onthe left" can be uttered and interpreted with respect to the speaker’s ownintrinsic left, as whenthe target object whoselocation is being described is on the speaker’s immediateleft. Or it can be uttered and interpreted as a projection of the speaker’s intrinsic left onto the environment, as whenthe target object is in the left half of the speaker’svisual field. Speakerscanutter these "projected" kinds of expressions from the addressee’s point of view, describingwhatis on, for instance, the left half of the addressee’svisual field. But these kinds of descriptions seemless appropriatefor objects that don’t havevisual fields, like chairs. It seemsoddto describe somethingin front and to the leR era chair as merelybeing on the chair’s left. Myproposal (Schober 1993, 1995) is that addressee-centered descriptions reflect a full-fledged perspective in their own fight Thisperspectiveshares featureswithboth speaker-and object-centered perspectives,but it is qualitativelydifferent. This proposalrequires empiricalvalidation. Oneimportant kind of empiricalvalidation wouldbe a demonstration that speakers treat addressees differently from other objects inthe environment duringthe production andinterpretationof spatial descriptions.In the studies describedhere (Schober&Bloom 1995),weuse a standardcognitivepsychologicalmethodto examine this in the caseof production.If peopletake different amountsof time to produceobject- and addressee-centered descriptions,this suggeststhat the mentalprocesses"involved in their production are different.It alsolendsplausibilityto the proposal that addressee-centereddescriptions should be classifieddifferentlythanobject-centered descriptions, andthat eddressee-centered descriptionsreflect a frameof referencein their ownright. Effort and reference frameselection (in mylaboratoryweare currently followingup on this by investigatinghowindividualdifferencesin perspective-taking strategiesrelate to underlying mentalrotationabilities). In the secondstudy, speakersdescribinga far morecomplex displayavoidedspeaker-centered utterancesalmostentirely. Forthis display,descriptionscouldeither be speaker-centered, addressee-centered,ambiguous, object-centered, or neutral with respect to referenceframes.Again,speakerson average preferredto take their partner’sperspectiveover their own. Butevenmorepopularwereneutral descriptions, andthese neutral descriptionsincreasedin frequencyoverthe courseof the conversations.Neutraldescriptionsdon’t require either party to shift perspectivesawayfromtheir own,becausethey don’treflect a coordinatesystemat all. So in a sense, what speakersin this studyweredoingwasminimizing effort not only for their parmersbut, to the extent they could, for themselves.Theirreferenceframeselectionmininuz’ed effort for bothparties. Effort seemsto be oneof the factors involvedin reference In lifts secondstudy,object-centered perspectiveswereavoided frameselection (Schober1993,1995). Different frames almost completely.But becausethe objects in the display referencerequire different amountsof effort to produceand didn’t havevery compelling intrinsic parts, weshouldn’tmake comprehend. Theevidencefromearlier studies of speakers’ too muchof this avoidance.Miller &Johnson-Laird(1976) spatial descriptions(e.g., Herrmann et al. 1987)showsthat speakers find iteasier totake their ownperspectives thanthe claimthat takingan object’sperspectiveis less difficult than taking anotherperson’s. Thusobject-centereddescriptions perspectives oftheir conversational partners, asmeasured by latency toproduction oflocative expressions. Carlson- shouldbe preferred, becausethey requires an intermediate Redvansky & Irwin (1994) have shown that verifying a locativelevel of effort for bothparties. Thatis, takingthe object’s expression like "above" iseasier when itistrue from multipleperspective is harder for the speaker than speaking egocentrically, but easier than taking the addressee’s frames ofreference, rather than only one. perspective. Understanding a description fromthe object’s an Butthe role of effort in referenceframeselectionis complex. perspectiveis harderfor the addresseethan understanding addressee-centered description,but easier than understanding Peopledon’t produceonly descriptionsthat are easiest to a speaker-centered perspective. produce(that is, egocentricdescriptions). Andthey don’t automaticallyinterpret the spatial expressionsthey hear as reflectingthe frameof referencethat wouldbe easiest for them In the studies I describehere, weexamineMiller &JohnsonLaird’s claim usinga display that contains an object with to understand(their ownframeof reference). I proposethat rather than minimizingtheir owneffort, people makemuch compelling intrinsic parts. If Miller&Johnson-Laird are right, thendescribinga locationfromthe object’sperspectiveshould more complex judgements. Whenthey hear a spatial be harderthan speakingegocentrically,but easier thantaking expression, they take into account the effort their the addressee’sperspective.This is by no meansnecessarily conversationalpartner wouldhaveto makein producingthat true. Rcouldbe just as difficult to take the perspectiveof an expressionfromthat viewpoint.When they producea spatial objectas to take the perspectiveof the addressee,since both expression, they take into accountnot only howhard they themselves will find it to producethe expression,but also how require shifting points of viewawayfromone’s own.Or it hardtheir addressee will find the expression to interpret. That couldbe easier to take the addressee’spoint of viewthan an is, whilespeaker-centered utterancesare easiest for speakers object’s, because spatial expressions are designed for particularaddresseesandnot objects. to produce,theyare harderfor addresseesto understandthan addressee-centered utterances. In two laboratory studies (Schober, 1993, 1995) I have investigatedthe perspectivechoicesconversational participants makeas theydescribespatial locationsto eachother. In the first, speakers described a simple display, wheretheir descriptions could be either speaker-centered,addresseecentered, or ambiguous(in cases wherethe speaker’s and _ ~h’essee’sviewpointscoincided,the expressionsthemselves couldhavereflectedeither perspective). In this study,speakers averagedfar moreaddressee-centeredthan speaker-centered utterances.So on averagespeakersminimized effort for their partners by taking the addressee’sperspectiveTherewas, however, substantialvariability in the strategies pairs chose, andin somepairs bothparties spokecompletely egocentrically. 94 The studies In both studies, weadaptedHemnann et al. ’s (1987)method, in whichspeakers describe locations on simple computer graphicsdisplaysfromtheir ownor fromother perspectives, and latencies to productionare measured.Speakersviewed color displays of 3-D rooms on a CRTscreen, and we measured latencies to productionof locative expressionsas theydescribedthe locationof a ball relative to a person(who representsthe addressee) or a chair(an objectthat has a front, back,letI andright). Speakers viewedexactlythe samescenes on differentoccasions,but theywererequiredto take different perspectiveson the different viewings.Thuswecouldcollect within-subjectscomparisons of latencies to production. Experiment I In the first study,19participantsviewed a set of colordisplays ofa 3-Droom(see FigureI for sampledisplays). Eachdisplay containeda chair, a humanfigure, and a ball (the target object). Foreachdisplay, participantswereaskedto describe the locationof the target object usingonly the wordsfront, back,left, or right. Theparticipants did this three times for the entire set of displays: once fromtheir ownperspective, once fromthe chair’s perspective,andoncefromthe person’sperspective. Theorderingof displays wasrandomized andthe orderingof perspectivescounterbalanced. Oneachof the 32displays,the positionsandorientationsof the chair, personandball variedas follows:Thechair,alwaysin the centerof the room,facedeither to the viewer’sleft or right (90° or 270°), or towardthe viewer(180°). Theperson, alwaysagainsta wall, facedinto the roomeither nearlyat the samevantagepoint as the viewer(0°), onthe left or right wall (90° or 270°), or on the far wall (180°).Thehall wasplaced so that the samespatialdescription (left, right, front, or back) couldnot be correctlyusedto describeits locationfrommore than one perspective. For example,if the ball wasto the viewer’s left, it would notsimultaneously be to the left of either the chair or the human figure in the display. This was(1) wecould be sure that participants wereusing the requested perspectives,and(2) so wecouldbe sure that only oneframe of referencewasbeingactivatedat a time.Spatialdescriptions maybe easier to verify whentheyare true frommorethanone perspective (Carlson-Radvansky & Irwin, 1994,1995), and they mayalso be easier to produce. For eachtrial, participantswouldpress a keyandan imageof the room,chair, andpersonwouldappear.Afterpressinga key again, there wasan audible beep, and the ball appeared somewhere in the room.Theparticipant wouldthen describe the bali’s locationusingthe requestedperspective. Reactiontime results showed that speakersreliably foundit easiest to take their ownpoint of view. This replicates Herrmann et al.’s (1987)results. Speakersfoundit harder take the chair’s perspective,and hardestto take the other person’s perspective. Peoplefoundit easier to take the addressee’sandobject’sperspectiveat 90° and270° offsets thanat the 180° offset. Peoplewereveryslowto take the perspectiveof the addressee whenit sharedtheir vantagepoint (speakerslookedover the shoulderof the compnter-graphic personin the display). We don’tbelievethis is a substantivefinding,since our speakers seemed to find these displaysconfusing.Asonewouldexpect, when participantstookthe addressee’s perspective,the chair’s offset hadnoeffect. When they tookthe chair’s perspective, the addressee’s offset hadnoeffect. Undercertain circumstances,then, the processesinvolvedin takingthe perspectiveof anotherpersonseemto differ from 95 thoseinvolvedin takingthe perspectiveof someother objectin the environment. Thisprovidesinitial supportfor myproposal that addressee-centereddescriptions reflect a flame of referencein their ownright. It also neatlysupportsMiller& Johason-Laird’s (1976)claimthat object-centered descriptions are of intermediate difficulty for speakers--harderthan speakingegocentrically,but easier thantakingthe addressee’s perspective. But there werepotential concernswith Experiment 1. First, the addresseewasalwaysagainst the wall andthe chair was alwaysin the centerof the room.Sincechair orientation and addresseepositionvariedfromtrial to trial, the addressee’s positionchanged significantlyandquickly,whilethe chair’sdid not. Speakersmayhavefoundit harderto take the addressee’s perspectivein Experiment 1 merelybecauseof this. Second, for the particularstimuliusedin the study,the task of taking the addressee’sperspectivehad a somewhat different quality thanthe task of takingthe chair’s perspective.When the ball wasonthe chair’sleft, it wasonthe chair’simmediate left, whilewhenthe ball wasto the left of the human figure, it wasin the left half of the human figure’s field of vision, but also out in front. This could be whythe speakers In Experiment 1 wereslowerto take the addressee’sperspective. Third, weweren’tconvincedthat speakersin Experiment1 really treated the human figure as an addressee.Perhapsthey merelyconceivedof it as a morecomplicatedobject, like a statue of a person. If this wasthe case, then our results wouldn’tshowthat addressee-centered descriptionsreflect a distinct frameof reference. Theywouldmerelyshowthat our speakerstreated our CRTrepresentationof the addresseeas a morecomplicated object than the chair. Experiment 2 To deal with these concerns, we ran Experiment2. 16 participants followedexactly the sameprocedureas for Experiment I, but with certain differences. Eachof the 336 displayscontained either the chair or the person,but not both. Boththe chair andthe personappearedin the center of the roomandagainst the wall equallyoften, so that wecouldbe find out whether our results from Experiment1 merely reflectedthe fact that the chair andaddresseehadappearedin different positionsin the room.Alsoto addressthis concem, on 12 of the 14 blocks of trials, the chair or the person appearedin the samepredictablespot fromdisplayto display. On2 of the 14trials, they moved aroundas in Experiment 1. Toensurethat speakersreally treatedthe representation of the addresseeas an addressee,wetold participants that another student whowouldsee the displays fromthe humanfigure’s perspectivewouldbe listening to their descriptions. That studentwouldthenbe requiredto locatethe ball on the basis of thosedescriptions. Weusedonly displaysof situations whereleft or right would be appropriate,andweusedonly situations wherethe target objectwason the chair’s or person’simmediate left or right. Thuswecould ensurea fair comparisonbetweenaddresseeandobject-centereddescriptions,since no descriptionswould Figure 1: Sampledisplays, ExperimentI be the sort "projected"into the environment. Wedecidedit wasreasonableto omit the front and backsituations from Experiment 1 to avoid presentingthe participants with .an unreasonable number of stimuli. This wasjustified becausem Experiment1 participants had exactly the samepattern of results for producing left andright as for front andback.We also omitted0-degreeoffset displays, becauseExperiment 1 participants had foundthese confusing.(See Figure2 for sampledisplays). Theresults replicated those of Experiment1. Onceagain, addressee-centereddescriptions took reliably longer than object-centereddescriptions.Thisshowsthat the findingsin Experiment1 didn’t result only fromthe problemswith its stimuli. Peoplefoundit mucheasier to take either perspectivewhenthe addresseeor object remainedin a predictable spot. The difference betweentaking the addressee’s and object’s perspectives wasmarginallygreater whenthe addresseeor object moved fromtrial to trial. Thedifferenceswereless extremewhenthe addresseeor objectremained in a predictable spot. But addressee-centered descriptions werenonetheless moredifficuR,whetherthe referenceobject (the chair or the person)moved fromtrial to trial or not. Together,these experimentsshowthat taking the spatial perspectivesof anotherpersonis different than taking the perspective of a non-human object (a chair). The results supportMiller&Johnson-Laird’s (1976)proposalthat taking an object’s perspectiveis of intermediatedifficulty, more difficult thanspeakingegocentrically,andless difficult than takinga conversational partner’sperspective.Theresults also argueagainst classifyingaddressee-centered descriptionsas Otherdeictic or intrinsic. Theyseemto havespecialproperties all their own. experiments the participants wererequiredto speakfromthe chair’s or the person’sperspectiverepeatedlyfor a blockof displays.Buttheydosuggestthat in situationswherespeakers choose to (or must) minimizeeffort, object-centered descriptionswill be moredifficult for themto producethan egocentric ones,but less difficult thanaddressee-centered ones. This mayultimatelyaffect howtheychoosereferenceframes. 3. Exactly howare conversational partners mentally represented?Arethey like other objects but morecomplex,or are they special? Theresults here presentvery preliminary evidencethat addresseesare mentallyrepresentedin a special way. This is consistent with Clark & Marshall’s (1981) proposal that each particular addressee is distinctly represented,andthat informationaboutparticular addressees is active and salient duringlanguageprocessingwith those addressees.Butif this is so, it raises a puzzle.If addressees are really speciallyavailableduringutteranceproduction,then whyshould speakers take longer to produce addresseecentereddescriptionsthan object-centereddescriptions?It seemsreasonablethat there shouldbe extra processesinvolved in addressee-centered description,andthese shouldtake extra time (Henmann1989). But shouldn’t the addressee availableduringobject-centered descriptionsas well, sincethe utterancesare presumably designedfor the addressee?Andso shouldn’tobject-centereddescriptionstake evenlongerthan addressee-centereddescriptions, becauseboth the object’s perspective and the addressee’s perspective must become active? Ourdata don’t provide an answerto these questions. One possibility is that in taking an object’s perspectivethe addressee’spoint of viewdoesn’thaveto be considered,and thusit doesn’thaveto be fully activated. 4. If effort is a significant component of referenceframe selection,thenan importantissue will be howto representit. Wherever in a systemreferenceframeselectionhappens,there will needto be a component that weightsparticular reference Computational Implications framesfor howdifficult theyare to produce andinterpretin the will haveto takeinto account Webeginto see that spatial perspective-taking in conversation currentsituation. Thisweighting contextualfactors like particular arrangements of objectsin is a complicatedaffair. Thedetails of human perspectivescenesandparticular displacements of speakerandaddressee taking behavior haveimplications for modelsof spatial vantagepoints. Afurther complication to this picture is that language,andhere are someof them.Just howserious these different peoplemayhavedifferent weight’.mgs for effort. A implicationsare depends,of course,onthe application. studyI amcurrentlyengaged in analyzingis suggestingthat people’smentalrotation abilities affect their strategies in 1. If the addressee-centered flameof referencereally is a with their assessments referenceframein its ownfight, thensystemswhereaddressees selectingperspectives,in combination of their conversational partners’ abilities. If this is the case, (systemor user) havedifferent vantagepointsthan speakers (user or system)will haveto includean addressee-centered then user modelsmayneed to represent a componentthat referenceframein its repertoire. Usersspeakingto systems weightsreferenceframelikelihoodfor particularindividuals. shouldbe understood whethertheyspeakegocentricallyor take an ~,t,tressee-ceuteredperspective.Systems involvingmultiple 5. Oneprovocativepiece of data here is that the difference betweentaking the addressee’sandobject’s perspectiveswas users withdifferentvantagepointswill haveto representthe less extremewhenthe addresseeor object remainedin a pointsof viewof eachuser andinterpretutterancesin light of predictablespot. Theappropriatenessof generalizingthis thosereferenceframes. result beyondthe studies describedhere dependson whether spatial descriptionsin the situations to be modeled are more 2. [fan effort-basedaccountof perspectivechoiceis right, the conditionin whichthe referenceobject data describedheresuggestthat takingan object’sperspective like the experimental fromtrial to trial or the stationarycase.Bothsituations canbe easier thantakinganotherperson’sperspective.These moved can happen in the real world, although our experimental data do not showthat peopleprefer takingan object-centered situationwasartificial in the regularityandpredictabilitywith perspective to an addressee-centeredone, since in our 97 Figure 2: Sampledisplays, Experiment2 98 which experimental participants descriptions. Davis, J.R. 1989. BackSeat Driver: Voice Assisted AutomobileNavigation. M.Sc. Thesis, Dept. Of Architecture, MIT. had to produce their Thesestudies also raise questions for the future that have significant implications for systemsthat try to modelchanging environments.Whendo speakers instantiate a reference frame or coordinate system? Certainly they must do it before they producetheir spatial expressions (Herskovits, 1986). But how often do they update them? Once a reference frame is instantiated is it kept until speakers get evidence that circumstances have changed? Or do speakers reconfLrrn reference framesregularly, especially if intervening spoken material hasn’t kept a reference frameactive? Onespeculation censistentwiththis result is that speakersinstantiate a frameof reference for a fn’st spatial description, and it can be more difficult for themto do this for an addresseethan for an object (at least the one we used). But oncethe frameof reference has beeninstantiated, it needn’tbe reinstantiated for subsequent descriptions, as long as the reference object or addressee doesn’t changevantage points. Finally, I shouldnote that effort is not the onlyfactor involved in reference frame selection. Speakers are affected by the reference frames they have already chosen and the reference frames their conversational partners have chosen(see Garrod & Anderson 1987). Speakers are also affected by what reference frames their conversational partners give them evidence of having understood, and they will reframe their descriptions from a different perspective when they are misunderstood(Schober 1993). This should remindus that humanconversation repair systems act as a safety net for misunderstanding,and it is a reasonablestrategy in modeling spatial languageto invest significant computationalresources in the diagnosis and repair of misunderstanding(Brennan Hulteen 1995; Schober 1996). Fillmore, C.J. 1982. Towardsa Descriptive Frameworkfor Spatial Deixis. In R.J. Jarvella &W.Klein, eds., Speech, Place, andAction(pp. 31-59). Chichester: Wiley. Gapp, K.-P. 1995. ObjectLocalization: Selection of OptimalReference Objects. In Proceedingsof the 2nd International Conferenceon Spatial InformationTheory, COSIT95. Garrod, S., and Anderson, A. 1987. Saying What You Meanin Dialogue: A Study in Conceptual and Semantic Coordination. Cognition 27:181-218. Herrmann,T., Bttrkle, B., and Nirmaier, H. 1987. Zur hOrerbezogenenRaumreferenz: HOrerposition und Lokalisationsanfwand[On Listener-Oriented Spatial Reference:Listener Position and LocalizationEffort]. Sprache & Kognition 3:126-137. Herrmann,T. 1989. PartnerbezogenenObjektlocalisation-ein Neues Sprachpsychologisehes Forschungsthema [Partner-oriented Localization of objects--a New Psycholinguistic ResearchTopic], Bericht Nr. 25, Forsehergruppe "Spreehen und Spraehverstehen im sozialen Kontext," University of Mannheim. Herskovits, A. 1986. Languageand Spatial Cognition: An Interdisciplinary Study of the Prepositionsin English. Cambridge,UK:CambridgeUniversity Press. Lang, E., Carstensen, K.-U., and Simmons,G. 1991. ModelingSpatial Knowledgeon a Linguistic Basis (Lecture Notes in ComputerScience 481). Berlin: Springer-Verlag. LeveR, W.J.M.1989. Speaking: FromIntention to Articulation. Cambridge,MA:MITPress. References Brennan,S.E., and Hulteen, E.A. 1995. Interaction and Feedbackin a SpokenLanguageSystem: A Theoretical Framework. Knowledge-BasedSystems 8:143-151. Levinson, S. 1996. Frames of Reference and Molyneux’s Question: Cross-Linguistic Evidence. In P. Bloom,M.A. Peterson, L. Nadel, and M. Garrett, eds., Spaceand Language. Cambridge: MITPress. Forthcoming. Carlson-Radvansky,L.A., and Irwin, D.E. 1993. Framesof Reference in Vision and Language:Whereis Above? Cognition 46:223-244. Miller, G.A., and Johnson-Laird, P.N. 1976. Languageand Perception. Cambridge,MA:HarvardUniversity Press. Carlson-Radvansky,L.A., and Irwin, D.E. 1994. Reference FrameActivation During Spatial TermAssignment. Journal of Memoryand Language, 33,646-671. Retz-Schmidt, G. 1988. Various Views on Spatial Prepositions. AIMagazine9(2):95-105. Schober, M.F. 1993. Spatial Perspective-Taking in Conversation. Cognition 47:1-24. Carlson-Radvansky, L.A. 1995. Cooperation AmongSpatial ReferenceFrames.Poster session presented at the 36th AnnualMeetingof the PsychonomieSociety, Los Angeles, CA. Schober, M.F. 1995. Speakers, Addressees, and Framesof Reference: WhoseEffort is Minimizedin Conversations About Locations? Discourse Processes 20:219-247. Clark, H.H., and Marshall, C.R. 1981. Definite Reference and MutualKnowledge.In A.H. Joshi, B. Webber,and I.A. Sag, eds., Elementsof Discourse Understanding(pp. 1063). Cambridge,UK:CambridgeUniversity Press. Schober, M.F. 1996. HowAddressees Affect Spatial Perspective Choicein Dialogue. In P.L. Olivier and K.-P. Gap, eds., Representation and Processing of Spatial 99 Expressions. Hillsdale, N J: LawrenceErlbaum. Forthcoming. Schober, M.F., and Bloom,J.E. 1995. The Relative Ease of Producing Egocentric, Addressee-Centered,and ObjeetCenteredSpatial Descriptions. Poster session presented at the 36th AnnualMeetingof the PsychonomicSociety, Los Angeles, CA. Schober, M.F., and Clark, H.H. 1989. Understandingby Addressees and Overhearers. Cognitive Psychology 21:211-232. 100