PERSPECTIVE AND ACTION UNDERSTANDING

Perspective and Action Understanding 1
Running head: PERSPECTIVE AND ACTION UNDERSTANDING
Perspective-taking Promotes Action Understanding and Learning
Sandra C. Lozano
Bridgette Martin Hard
Barbara Tversky
Stanford University
Abstract
People often learn actions by watching others. In this paper, we propose and test the hypothesis
that perspective-taking promotes encoding a hierarchical representation of an actor’s goals and
subgoals—a key process for observational learning. Observers segmented videos of an object
assembly task into coarse and fine action units. They described what happened in each unit
from either the actor’s, their own, or another observer’s perspective and later performed the
assembly task themselves. Participants who described the task from the actor’s perspective
encoded actions more hierarchically during observation and learned the task better.
KEYWORDS: perspective-taking, observational learning, action understanding, intentional
inference, hierarchical encoding
Perspective-taking Promotes Action Understanding and Learning
One way to learn how to do things is to watch others do them. A first step in learning by
watching is to infer the goals of the actions being performed. Even small children can infer goals,
and use those goals as the basis for imitating others’ behavior (e.g., Meltzoff, 1995). Inferring
goals becomes more complex for real world tasks that consist of long, interrelated sequences of
actions, such as making a cake or assembling a piece of furniture. To learn such tasks, observers
need to infer how actions are organized into goal-subgoal hierarchies (Hard, Lozano, &
Tversky, in press; Whiten, 2002; Zacks, Tversky, & Iyer, 2001).
Here, we explore whether the ability to infer goal-subgoal organization in action is
influenced by perspective-taking. We take our notion of perspective-taking from Galinsky, Ku,
and Wang (2005, p. 110), who define it as “the process of imagining the world from another’s
vantage point or imagining oneself in another’s shoes.” Put differently, taking another person’s
perspective implies establishing overlap between one’s own mental representations and the
mental representations of the other person (e.g., Davis, Conklin, Smith, & Luce, 1996; Galinsky
& Moskowitz, 2000; Vorauer & Cameron, 2002).1 We report a series of studies that test whether
perspective-taking promotes understanding of goal-subgoal organization in observed behavior,
thereby promoting observational learning. But first, we assemble the pieces of evidence
underlying our reasoning.
Action is Planned and Encoded as a Hierarchy of Goals and Subgoals
People plan actions hierarchically according to an overarching goal that is decomposed
into subgoals that are, in turn, decomposed into even smaller subgoals (Newell & Simon, 1972).
These plans are instantiated into hierarchically organized behaviors, like making a bed, or even
playing a violin (e.g., Lashley, 1951). Imitative behavior also shows evidence of hierarchical
organization: when people and even other primates imitate others’ behavior, they do so in a way
that suggests they’ve encoded that behavior as a hierarchy of goals and subgoals (e.g., Byrne &
Russon, 1998; Travis, 1997; Whiten, 2002).
In fact, people encode hierarchical organization when observing behavior in real time,
even when they are not intending to learn that behavior. Evidence for this comes from several
sources, notably using a segmentation task, in which people observe a video of goal-oriented
behavior, pressing a key to indicate when, in their judgment, one action is completed and the
next begins (Newtson, 1973). The action boundaries that people identify are referred to as
breakpoints. For a wide range of goal-directed behaviors, people reliably segment units of action
corresponding to the completion of goals and subgoals by the actor (Baldwin, Baird, Saylor, &
Clark, 2001; Hard, Zacks, & Tversky, 2006; Newtson, 1973; Zacks, Tversky, & Iyer, 2001).
Zacks, Tversky, and Iyer (2001) found that when asked to segment action sequences into coarse
and fine units on separate viewings, observers select units that are hierarchically nested: the
boundaries of coarse units coincide with the boundaries of fine units well above chance. When
observers are asked to report what happens in each coarse or fine unit as they segment, in many
cases they even give descriptions of how sets of fine units can be summarized into a coarse unit
(Hard, Lozano, & Tversky, in press).
The nature of these descriptions, combined with the consistency of hierarchical
organization within and across observers, has been taken to reflect observers’ attempts to
understand observed behavior by encoding its hierarchical organization. Other paradigms (e.g.,
Hard, Tversky, & Lang, in press; Martin, 2006; Zacks, Tversky, & Iyer, 2001), including studies
of brain activation during passive viewing (Zacks, Braver, Sheridan, et al., 2001), corroborate
these claims. One possible benefit of hierarchical encoding in terms of goals and subgoals is an
action representation with the structure of an action plan, which in turn, facilitates performance
of the action sequence by observers. Supporting this possibility, the degree of hierarchical
organization in segmentation predicts accuracy of observational learning (Hard, Lozano, &
Tversky, in press).2
People Describe Action from the Actor’s Perspective
Although people naturally and frequently describe the world from their own point of
view (e.g., Hart & Moore, 1973; Levelt, 1989; Piaget & Inhelder, 1956; Shelton & McNamara,
1997), there are notable exceptions. One of these is in describing observed actions (Lozano,
Hard, & Tversky, in press). For example, in a recent study, observers gave play-by-play reports
of an action sequence performed by an actor who faced them (Hard, Lozano, & Tversky, in
press). Close examination of these reports revealed that when participants included specific
spatial information, such as the locations of objects or which of the actor’s hands performed an
action, it was using the actor’s spatial reference frame rather than their own. For example,
participants were more likely to say “she puts the block on her left” than “she puts the block on
my right.” This suggests that when observers describe actions, they put themselves in the
actor’s shoes.
There is evidence that observers of action put themselves in the actor’s shoes, not only in
their descriptions of actions, but also at the neural level: observing action activates many of the
same brain mechanisms involved in planning and executing action (e.g., Grafton, Arbib, Fadiga,
& Rizzolatti, 1996; Iacoboni, 2005; Iacoboni, Woods, Brass, Bekkering, Mazziotta, & Rizzolatti,
1999). As some have put it, observing others’ actions produces motor simulation, in which an
individual internally copies those actions (Fadiga, Craighero, & Olivier, 2005; cf. Rizzolatti,
Fadiga, Fogassi, & Gallese, 1999). These findings have been taken to mean that a component of
understanding others’ actions is mapping them to actions of the self.
Outside the domain of action understanding, increased self-other overlap during
perspective-taking3 can have useful social consequences, such as increasing feelings of liking,
rapport, empathy, and sympathy toward others (e.g., Batson, 1991; Chartrand & Bargh, 1999;
Cheng & Chartrand, 2003). The self-other overlap that takes place when people observe actions
has been proposed to promote action understanding, perhaps by facilitating inferences about the
goals and intentions of others (e.g., Arbib & Rizzolatti, 1996; Rizzolatti & Arbib, 1998), or by
generating predictions that guide the perception of ongoing behavior (Wilson & Knoblich, 2005).
As yet, however, there is little direct evidence supporting these proposals.
There is some recent evidence that perspective-taking is related to action understanding,
specifically to hierarchical encoding of goal-subgoal structure. In a study described earlier,
where participants provided play-by-play descriptions as they segmented a video of an object
assembly task, participants who spontaneously described actions from the actor’s perspective
both showed more hierarchical organization in their segmentation and assembled the object
better (Hard, Lozano, & Tversky, in press). These findings were unexpected and only showed a
correlation between perspective-taking and hierarchical encoding. Thus, it remains an open
question as to whether perspective-taking leads to better action understanding or whether the
relationship works in the opposite direction.
Testing the Role of Perspective-taking in Action Understanding and Learning
Taking the actor’s perspective might allow better encoding of the hierarchical
organization of the actor’s intentions, that is, the goals and subgoals of the task. By promoting
hierarchical encoding, perspective-taking should also promote observational learning. Together,
these predictions form the hypothesis investigated here. In a series of studies, observers
segmented a video of an object assembly task at coarse and fine levels, describing what
happened in each segment as they segmented. In the first study, half the observers were
instructed to describe the action from their own perspective, using their own body as a reference
frame, and half from the actor’s perspective, using the actor’s body as a reference frame. As
predicted, those who described actions from the actor’s perspective encoded actions more
hierarchically and learned them better than those who described actions from their own
perspective. A follow-up study confirmed that observers naturally describe action from an
actor’s perspective, leading them to hierarchically encode and learn actions better than observers
instructed to describe from a self-perspective. But surprisingly, instructions to describe from the
actor’s perspective enhance hierarchical encoding and learning above and beyond what people
do naturally. The third study rejected the possibility that taking any perspective other than one’s
own can facilitate action understanding and performance.
Study 1a: Describing from a Self versus Actor’s Perspective
The first study tested whether describing actions from an actor’s perspective improves
hierarchical encoding and learning. Participants watched a video of a person assembling an
object, a TV cart, pressing a key to indicate when they thought one action segment ended and
another began. They did this twice, once for the largest units that made sense and once for the
smallest units that made sense. As they segmented, they described what happened, either from
their own perspective (e.g., “she puts the board on my right”) or from the perspective of the actor
(e.g., “she puts the board on her left”). After describing and segmenting the video twice,
participants were asked to assemble the TV cart. Half the participants had been told of the
assembly task, but half had not, so that effects of task awareness on performance could also be
evaluated.
Method
Participants and Design
Forty Stanford University undergraduates participated in exchange for course credit. A 2
x 2 x 2 x 2 mixed factorial design was used. Segmentation level (fine, coarse) was varied within
participants; and assigned perspective (actor, self), segmentation order (coarse-fine, fine-coarse),
and awareness of the later assembly task (aware, unaware) were varied between participants.
Stimuli and Materials
All participants viewed two videos, one for practice and one for test. The practice video,
used for practicing the segmentation procedure, was created by Zacks, Tversky, and Iyer (2001)
and showed a woman assembling a saxophone. The test video involved a woman assembling a
TV cart made by Talon Systems Inc®. The TV cart measured 17” x 25” x 21” in size, and
consisted of two sideboards, a lower shelf, an upper shelf, a support board, pegs for attaching the
support board, screws, screwdriver, and wheels (see top half of Figure 1). The actor faced the
camera during filming, and performed approximately equal numbers of actions with her left and
right hand, following the script described in Appendix A.
All videos were presented on a 21-inch, flat screen computer monitor. Response times
were recorded with a keyboard attached to a Macintosh G4 computer, using a program written in
PsyScope 1.2.5 (Cohen, MacWhinney, Flatt, & Provost, 1993). Verbal descriptions were
recorded with a hand-held tape recorder, and TV cart assembly performances were recorded with
a digital video camera.
Procedure
At the beginning of the study, participants received a brief introduction to segmentation.
They were told that to make sense of their experiences, people break them down into events of
varying sizes, some small, some large. Participants were never explicitly informed that large and
small events could be hierarchically related, so that we could observe how and when hierarchical
encoding occurred spontaneously and as a function of the experimental manipulations.
Participants did receive everyday examples of both large and small events; but these events were
in no way related to each other. The exact instructions that participants received are in Appendix
B.
Following this introduction, participants were told that they would see two videos of a
person assembling an object. Their job was to divide these videos into separate units by pressing
the spacebar every time one meaningful action ended and another began. Each time they pressed
the spacebar, participants were instructed to briefly describe, from the perspective they had been
randomly assigned (actor or self), what happened in the segment they had just observed. To
demonstrate how to do this, participants were shown a still frame from the practice video that
depicted a woman placing a saxophone on a table. They were given examples of how this picture
could be described either from an actor- or self-perspective and were then reminded of which of
the two perspectives they should describe.
To practice segmenting and describing actions, participants viewed a practice video of
saxophone assembly and were instructed to mark whatever units felt natural and meaningful to
them. Participants then viewed and segmented the test video, which was 6 minutes 35 seconds
long and showed an actor assembling the TV cart in the same room where participants were
being tested. Half of the participants were instructed to indicate the smallest units that seemed
natural and meaningful to them (fine-coarse); the other half were instructed to indicate the
largest units that seemed natural and meaningful (coarse-fine). Participants then segmented the
test video a second time according to the opposite unit-size instructions. Viewing the videos
twice was necessary to create a measure of hierarchical encoding (Hard, Lozano, & Tversky, in
press; Hard, Tversky, & Lang, in press; Zacks, Tversky, & Iyer, 2001). For both viewings,
participants were reminded to describe all action units they indicated in terms of their assigned
perspective. Participants in the aware condition were also told that after segmenting and
describing the test video twice, they would have to assemble a TV cart themselves. The
experimenter was not present during the segmentation task.
After performing the segmentation task, participants were presented with all assembly
materials needed to build the TV cart. The assembly materials were placed in a central, neutral
position on the same table used by the actor in the test video. Participants were instructed that
they could perform the task however they wanted; their task was simply to assemble the TV cart
as quickly and accurately as possible. Participants received no further instructions and any
suggestion of which perspective to use during assembly was explicitly avoided. Their
performance was videotaped from the same visual angle as the test video, and the experimenter
was not present during assembly.
Results
Overview of Analyses
For all results in the present and subsequent studies, dependent measures were submitted
to a factorial Analysis of Variance (ANOVA), with all independent variables (e.g., assigned
perspective, awareness of the later assembly task, segmentation order) as factors, unless
otherwise indicated. For all analyses, an alpha level of .05 was used as the criterion for
significance. Nonsignificant effects are noted with a marking of ns. As estimates of effect size, a
partial eta squared value (ηp²) is reported for significant ANOVA effects, and a Cohen’s d is
reported for significant t-test effects.
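As a rough illustration of the two effect-size measures named above, the following Python sketch applies standard textbook conversions from test statistics. The function names and input values are hypothetical, the d formula shown (t divided by the square root of the number of pairs) is only one of several conventions for paired designs, and nothing here is claimed to be the exact computation used in the present studies.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    weighted_f = f_value * df_effect
    return weighted_f / (weighted_f + df_error)


def cohens_d_paired(t_value, n_pairs):
    """One common paired-samples effect size: d = t / sqrt(n).

    Assumption: other conventions exist (e.g., ones that correct for the
    correlation between the paired measures); this is only the simplest.
    """
    return t_value / math.sqrt(n_pairs)


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
eta = partial_eta_squared(f_value=4.0, df_effect=1, df_error=36)
d = cohens_d_paired(t_value=3.0, n_pairs=36)
print(f"partial eta squared = {eta:.3f}, d = {d:.3f}")
```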
Does Perspective Affect Hierarchical Encoding?
Hierarchical encoding was evaluated in two ways. The first measure, enclosure, assessed
the hierarchical organization of participants’ segmentation pattern, using a technique developed
by Hard, Lozano, and Tversky (in press). Enclosure is defined as the proportion of coarse
breakpoints that fall after their closest fine breakpoint in time. When this proportion is high, then
most of the coarse breakpoints take account of or enclose all the relevant fine units. If a coarse
breakpoint precedes the breakpoint of the final fine segment, it violates strict hierarchical
encoding. In the present study, most but not all the coarse breakpoints in fact fell after the closest
fine breakpoint (M = 5.33, SEM = 0.97), instead of before it (M = 3.07, SEM = 0.76),
paired-t(39) = 4.07, d = 0.66, replicating previous findings by Hard, Lozano, and Tversky (in press).
Further rationale behind enclosure scores is provided in Appendix C, along with a more detailed
explanation of how they are calculated.
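The definition of enclosure given above can be sketched in a few lines of Python. This is a minimal illustration based only on the one-sentence definition in the text, not the procedure detailed in Appendix C. The function name, the example breakpoint times, and the handling of ties (a coarse breakpoint that coincides with its closest fine breakpoint is counted as falling after it, and an equidistant pair resolves to the earlier fine breakpoint) are all assumptions made for the example.

```python
from bisect import bisect_left


def enclosure_score(coarse, fine):
    """Proportion of coarse breakpoints at or after their closest fine breakpoint.

    `coarse` and `fine` are breakpoint times in seconds from the two viewings.
    Tie handling is an assumption of this sketch (see the text above).
    """
    if not coarse or not fine:
        raise ValueError("need at least one coarse and one fine breakpoint")
    fine = sorted(fine)
    enclosed = 0
    for c in coarse:
        i = bisect_left(fine, c)
        # The closest fine breakpoint is either the last one before c
        # (fine[i - 1]) or the first one at or after c (fine[i]).
        candidates = fine[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda f: abs(f - c))
        if c >= nearest:
            enclosed += 1
    return enclosed / len(coarse)


# Hypothetical breakpoint times: two of the three coarse breakpoints
# fall at or after their closest fine breakpoint.
fine_times = [5, 10, 15, 20, 25, 30]
coarse_times = [10.5, 20.0, 29.0]
print(enclosure_score(coarse_times, fine_times))
```

A score near 1 means that coarse boundaries tend to close off a complete run of fine units, which is the pattern the text treats as evidence of hierarchical encoding.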
The second measure of hierarchical encoding assessed participants’ descriptions for
hierarchical structure. This second measure provided a more transparent assessment of
hierarchical encoding, and it was also used to validate the enclosure measure. During fine
segmentation, although observers were instructed to identify fine-level actions, some of them
offered summaries that grouped the preceding set of fine-level actions (e.g., “She inserted the
first screw,” “She inserted the second screw,” etc.) into a coarse-level action (e.g., “She attached
the top shelf”). The number of verbal summaries each participant gave during fine segmentation
was used as a verbal measure of hierarchical encoding. Replicating findings of Hard, Lozano,
and Tversky (in press), the number of verbal summaries correlated positively with enclosure
scores, r(38) = .41, validating enclosure scores as a measure of hierarchical encoding.
By both measures of hierarchical encoding, describing from an actor-perspective was
superior to describing from a self-perspective. As the upper part of Figure 2 shows, participants
who described from an actor-perspective had higher enclosure scores than self-perspective
participants, F(1, 32) = 11.53, MSE = 0.28, ηp² = .40; and they used more verbal summaries (M =
2.45 vs. 1.00, SEM = 0.38 vs. 0.31), F(1, 32) = 10.92, MSE = 0.28, ηp² = .33.
Awareness of the later assembly task had no effect on hierarchical encoding and no
interactions with other variables. Segmentation order did affect hierarchical encoding, however.
Replicating findings from Hard, Lozano, and Tversky (in press), participants who segmented in
coarse-fine order had higher enclosure scores (M = .71, SEM = .05) than did participants who
segmented in fine-coarse order (M = .59, SEM = .03), F(1, 32) = 4.77, MSE = 0.28, ηp² = .15.
The influence of segmentation order on verbal summaries was less clear. There was a significant
interaction between segmentation order and perspective instructions, F(1, 32) = 10.92, ηp² = .25.
Participants using an actor-perspective summarized more often if they segmented in coarse-fine
order (M = 3.30, SEM = 0.30) than in fine-coarse order (M = 1.60, SEM = 0.31), but participants
using a self-perspective summarized more often if they segmented in fine-coarse order (M =
1.60, SEM = 0.20) rather than in coarse-fine order (M = 0.40, SEM = 0.10).
Does Perspective Affect Learning?
Videotapes of assembly performance were coded for errors and assembly time. Errors
were counted even if the participant later corrected them. Errors could take three forms:
attaching pieces in the wrong order (e.g., building the entire TV cart before trying to insert the
middle support board), attaching pieces that should not be connected to each other (e.g.,
attaching wheels to the top shelf), or attaching a piece in the wrong orientation (e.g., attaching
the top shelf upside down). On average, participants made 2.20 errors (SEM = 0.22) and
completed the TV cart assembly in 10.00 minutes (SEM = 35.33 s). Assembly errors positively
correlated with assembly time, r(38) = .58, suggesting that participants did not sacrifice accuracy
for speed.
If hierarchical encoding facilitates learning, then measures of hierarchical encoding
(enclosure scores and verbal summaries) should predict learning. Confirming this, enclosure
scores and verbal summaries predicted fewer assembly errors, r(38) = -.50 and -.72, respectively.
If hierarchical encoding facilitates learning, then factors that improve hierarchical encoding
should also improve learning. Confirming this prediction, participants who described the
assembly video from an actor-perspective not only had better hierarchical encoding, but also
made half as many assembly errors, as the lower portion of Figure 2 shows, F(1, 32) = 16.65,
MSE = 0.34, ηp² = .43.
Similarly, segmenting in coarse-fine order improved hierarchical encoding and led to
fewer assembly errors than segmenting in fine-coarse order (M = 1.95, SEM = 0.38 vs. M = 2.50,
SEM = 0.38), F(1, 32) = 4.00, MSE = 0.34, ηp² = .19. However, segmentation order interacted
with perspective, F(1, 32) = 7.15, ηp² = .22. When participants used an actor-perspective,
assembly performance did not depend on segmentation order, coarse-fine or fine-coarse (M =
1.70, SEM = 0.30 vs. M = 1.30, SEM = 0.21). When participants used a self-perspective,
assembly performance was better for those who segmented in coarse-fine rather than fine-coarse
order (M = 2.20, SEM = 0.35 vs. M = 3.70, SEM = 0.44).
Does Hierarchical Encoding Mediate Effects of Perspective on Learning?
Assigned perspective affected measures of hierarchical encoding and also later assembly
performance. Did perspective affect these two variables independently or did hierarchical
encoding mediate the effect of perspective on learning? To address this question, a mediation
analysis was performed using the mediation techniques of Baron and Kenny (1986), shown in
Figure 3.
According to Baron and Kenny (1986) and Kenny and Judd (1981), several preconditions
must be met to establish mediation. First, the initial variable (assigned perspective) must predict
both the potential mediator (hierarchical encoding, as measured by enclosure scores) and the
outcome variable (assembly errors). As described in the top half of Figure 4, a linear regression
analysis confirmed that assigned perspective4 predicted hierarchical encoding, t(1) = 15.17 and
assembly errors, t(1) = -5.39. Second, the potential mediator (enclosure) must predict the
outcome variable (assembly errors), even when controlling for the initial variable (assigned
perspective). Indeed, hierarchical encoding predicted assembly errors when controlling for
perspective, t(1) = -2.52. Based on these correlations, a Sobel test (Sobel, 1982) showed that
mediation was significant, z = -2.48. Finally, to determine whether hierarchical encoding
completely mediated the effect of perspective on assembly errors, it must be shown that the
initial variable (assigned perspective) no longer predicted the outcome variable (assembly errors)
when controlling for the mediator (enclosure). Controlling for hierarchical encoding, the effect of
assigned perspective on assembly errors was no longer significant, t(1) = -0.24, ns. Thus,
hierarchical encoding fully mediated the effects of assigned perspective on assembly errors.
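The Sobel statistic used in the last step can be written out explicitly. The sketch below follows the standard first-order formula, in which a is the regression coefficient from the initial variable to the mediator, b is the coefficient from the mediator to the outcome controlling for the initial variable, and se_a and se_b are their standard errors. The function names and the input values are hypothetical; the coefficients from the present study are not reported in the text above, so the example does not attempt to reproduce the reported z.

```python
import math


def sobel_z(a, se_a, b, se_b):
    """First-order Sobel test statistic for an indirect effect a * b."""
    return (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)


def two_tailed_p(z):
    """Two-tailed p value for z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical path coefficients and standard errors (not the study's values).
z = sobel_z(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"z = {z:.2f}, p = {two_tailed_p(z):.4f}")
```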
Does Perspective at Encoding Affect Perspective during Assembly?
All participants performed the assembly task in the same room and at the same table
where the actor was filmed. Thus, videotapes of assembly performances could be coded for the
spatial perspective participants adopted during assembly (see Figure 5 for an example of this).
Most participants adopted the same perspective during assembly that they had described during
segmentation: 95% of participants who had described from an actor-perspective assembled the
TV cart by taking the actor’s perspective, meaning that they built the TV cart in the same
orientation and stood on the same side of the table as the actor. Of the participants who had
described the video from a self-perspective, 75% assembled the TV cart from a self-perspective,
meaning that they oriented the TV cart pieces in the opposite direction as the actor and stood on
the opposite (observer) side of the table, χ²(1) = 23.41. This result confirms a link between
assembly perspective and the way that participants perceived and encoded observed actions.
Neither segmentation order nor awareness of the later assembly task had reliable effects on
participants’ later assembly perspective.
A fourth of self-perspective participants performed the assembly task from an actor-perspective.
This raises the possibility that self-perspective participants performed the assembly
task poorly not because they encoded a self-perspective, per se, but because some of
them lacked the insight to perform the assembly task from the same perspective they encoded.
As a first way of addressing this concern, we compared the assembly performances of self-perspective
participants who chose to assemble from an actor-perspective with those who chose
to assemble from a self-perspective. If incompatibility between action encoding and later action
execution accounted for the poor assembly performance in the self-perspective condition, then
those who assembled from an incompatible perspective should have made more assembly errors.
Surprisingly, participants who described from a self-perspective but performed assembly from an
actor-perspective outperformed those who maintained a self-perspective, making slightly but not
reliably fewer assembly errors (M = 1.67, SEM = 1.20 vs. M = 2.95, SEM = 0.33), t(19) = 1.60,
ns.
As a second way of addressing this concern, we re-analyzed the assembly error data and
excluded participants in the self-perspective condition who chose an actor-perspective for
assembly, as well as the one actor-perspective participant who performed assembly from a self-perspective.
Even with these participants excluded, participants who described from an actor-perspective
made fewer assembly errors (M = 1.44, SEM = 0.20) than those who described from
a self-perspective (M = 3.33, SEM = 0.46), F(1, 25) = 24.50, MSE = 27.52, ηp² = .50. Combined,
these analyses indicate that differences in assembly performance were not due to participants in
the self-perspective condition choosing an incompatible perspective at assembly.
Why is Describing from an Actor-Perspective Beneficial?
The above analyses show that describing actions from the actor’s perspective instead of
one’s own improves hierarchical encoding and imitative learning. How might encoding actions
from the actor’s perspective serve action understanding? Are there other differences between
actor- and self-perspective describers that provide clues? In this section, we explore some
possibilities.
Did perspective affect attention to action in general? Perhaps adopting the actor’s
perspective facilitates hierarchical encoding and learning because it simply focuses attention on
action more generally. To address this possibility, verbal descriptions of actor- and self-perspective
describers were compared for differences in the overall type of information encoded.
This analysis showed that descriptions in the two conditions were remarkably similar.
Each participant’s descriptions were broken down into separate clauses and each clause
was classified into three mutually exclusive categories: action statements, depiction statements,
or comments. Action statements contained action verbs and described movements of the actor or
objects (e.g., “She moved to the left side of the table” or “She inserted the screw”). Depiction
statements contained verbs of possession or state-of-being verbs and conveyed physical or
structural characteristics of objects (e.g., “The cart has four wheels” or “The screws are silver”).
Comments were any statements that were unrelated to the video itself (e.g., “Oops, I forgot to hit
the spacebar for that last action I described”). For each participant, the percentage of statements
of each type was calculated, and participants in the two perspective conditions were compared.
On average, 84% of participants’ descriptions were action statements, 9% were depiction
statements, and 7% were comments. Actor-perspective and self-perspective participants did not
differ in the mean percentages of action, depiction, or comment statements that they made (across
all three categories, highest t(38) = 1.10, ns). Thus, the content of descriptions did not differ as a
function of perspective.
Did perspective affect number of segmented units? To determine whether perspective
instructions influenced how coarsely or finely participants segmented the action sequence, the
numbers of fine and coarse units that they identified were examined. During segmentation,
participants identified approximately three times as many fine units (M = 29.90, SEM = 2.84) as
coarse units (M = 8.90, SEM = 1.88), paired-t(39) = 10.54, d = 1.76. This ratio of coarse to fine
units is consistent with previous findings using everyday activities (Zacks, Tversky, & Iyer,
2001), as well as abstract action sequences (Hard, Tversky, & Lang, in press).
Perspective did not affect how many coarse units participants identified, but did influence
the number of fine units. Participants who described from an actor-perspective identified more
fine units (M = 35.10, SEM = 2.67) than participants who described from a self-perspective (M =
24.70, SEM = 2.67), F(1, 32) = 4.30, MSE = 0.17, ηp² = .12. Participants who segmented in fine-
coarse order identified more fine units (M = 36.30, SEM = 3.55) than participants who
segmented in coarse-fine order (M = 23.50, SEM = 3.55), F(1, 32) = 6.52, ηp² = .17. No other
effects or interactions on the number of action units segmented were significant.
Can these differences in number of segmented fine units explain why describing from an
actor-perspective led to better hierarchical encoding and learning? Perhaps describing from an
actor-perspective is a more stimulating task than describing from a self-perspective, leading to
more attention and thus to encoding a finer level of detail. The results do not support this
hypothesis, because numbers of segmented fine units did not predict enclosure scores or
assembly errors. Numbers of segmented coarse units and the ratio of coarse to fine units were
equally non-predictive.
Did perspective affect encoding of spatial information? Did perspective instructions
affect how often participants described spatial information or the kind of spatial relations they
attended to? To answer this question, participants’ descriptions were broken down into separate
clauses. Each clause was categorized as containing perspective-relevant information from one of
four mutually exclusive categories: no perspective (e.g., “She inserted the screw”); neutral
perspective (e.g., “She inserted the screw from the side”); actor-perspective (e.g., “She inserted
the screw on her left”); or self-perspective (e.g., “She inserted the screw on my right”). The total
number of descriptions in each category was determined for each participant.5 These analyses
yielded several key insights.
First, participants followed instructions: self-perspective descriptions were more common
in the self-perspective condition (M = 9.85 descriptions, SEM = 1.35) than in the actor-perspective condition (M = 0.20, SEM = 0.20), t(38) = 7.07, d = 2.26. In contrast, actor-perspective descriptions were more common in the actor-perspective condition (M = 14.45, SEM
= 1.82) than in the self-perspective condition (M = 1.70, SEM = 0.45), t(38) = 6.79, d = 2.15. The
top of Figure 6 illustrates these differences in proportions of descriptions. Second, regardless of
the perspective participants were assigned to describe, they were equally likely to describe
spatial information. This was determined by comparing the two groups on the mean number of
statements describing any spatial perspective, be it neutral, actor, or self (for self-perspective
describers, M = 17.65, SEM = 1.94; for actor-perspective describers, M = 23.00, SEM = 2.86),
t(38) = -1.55, ns. This result suggests that any differences in the two groups were due to the
specific perspective that participants encoded, not to how much attention participants were
paying to spatial information in general.
Third, as Figure 6 shows, participants in the self-perspective condition were more likely
to describe from the wrong (i.e., unassigned) perspective than participants in the actor-perspective condition. Although it was common for self-perspective participants to incorrectly
use the actor’s perspective at least once (M = 1.44, SEM = 0.41), actor-perspective participants
almost never used a self-perspective (M = 0.20, SEM = 0.13), t(38) = 2.26, d = 3.67. Self-perspective describers were also more likely to qualify their perspective (M = 2.11, SEM = 0.72)
than actor-perspective describers (M = 0.10, SEM = 0.10), t(38) = 2.89, d = 5.00. That is, they
were more likely to state the perspective they were using (e.g., “She inserted the screw on the
right, but that’s only if it’s my right that we’re talking about”). Combined, these results suggest
that describing action from a self-perspective might actually be more difficult than describing
action from the actor’s perspective.
Could the difficulty of describing action from a self-perspective explain why self-perspective describers had difficulty hierarchically encoding and learning actions? Perhaps
describing from a self-perspective was such a demanding task that it interfered with action
processing. Additionally, self-perspective describers gave less consistent descriptions from their
assigned perspective. Perhaps this led to “mixed-up” representation of the spatial relations in the
task. The data argue against both of these possibilities. Mistakes in description perspective
were uncorrelated with hierarchical encoding and with later assembly performance, highest r(38)
= .10, ns.
Self- and actor-perspective describers also differed in the kind of spatial information they were
likely to describe. Participants were far more likely to describe which of the actor’s hands, left or
right, was used to perform a given action if they described from the actor’s perspective (M =
1.35, SEM = 0.21) instead of their own (M = 0.25, SEM = 0.20), t(38) = -3.77, d = 1.21. Actor- and self-perspective describers did not differ in their likelihood of describing locations in space,
as in left or right sides of the table (M = 13.30, SEM = 1.92 vs. M = 11.30, SEM = 1.54), t(38) = .81, ns.
Importantly, the number of descriptions concerning which of the actor’s hands performed
an action predicted better hierarchical encoding, as measured by enclosure, r(38) = .35, and by
verbal summaries, r(38) = .42. References to the actor’s hands also predicted fewer later
assembly errors, r(38) = -.47. The number of descriptions concerning locations on the table
predicted neither measure of hierarchical encoding, r(38) = .01, ns, r(38) = -.02, ns, nor assembly
errors, r(38) = -.15, ns. Given that actor- and self-perspective describers differed both in their
tendency to describe hands and in hierarchical encoding, and that descriptions of hands
correlated with hierarchical encoding, did differences in the tendency to describe hands mediate
effects of perspective on hierarchical encoding? In fact, references to hands fully mediated
effects of perspective on enclosure scores (see the upper half of Figure 7). First, a linear
regression analysis confirmed that assigned perspective predicted both enclosure scores, t(1) =
4.64, and hand references, t(1) = 3.77. Second, hand references predicted hierarchical encoding
when controlling for assigned perspective, t(1) = 5.51. According to a Sobel test, mediation was
significant, z = 3.12. Finally, controlling for hand references, the effect of assigned perspective
on hierarchical encoding was no longer significant, t(1) = 0.81, ns.
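To make the Sobel statistic concrete, the following minimal Python sketch computes it from two path coefficients and their standard errors, and equivalently from the two t ratios. The function names are hypothetical, and feeding in the rounded t ratios reported above is our assumption about how the published z could be approximated; it is not the authors' analysis code.

```python
import math

def sobel_z(a, se_a, b, se_b):
    """Sobel z for the indirect effect a * b.

    a: effect of the predictor on the mediator (with standard error se_a)
    b: effect of the mediator on the outcome, controlling for the
       predictor (with standard error se_b)
    """
    return (a * b) / math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)

def sobel_z_from_t(t_a, t_b):
    """Same statistic expressed through t ratios (t = coefficient / SE)."""
    return (t_a * t_b) / math.sqrt(t_a**2 + t_b**2)

# Rounded t ratios from the text: perspective -> hand references (3.77)
# and hand references -> enclosure, controlling for perspective (5.51).
z = sobel_z_from_t(3.77, 5.51)
print(round(z, 2))  # 3.11
```

With the rounded inputs the sketch gives about 3.11, close to the reported z = 3.12; the small gap is what one would expect if the original computation used unrounded coefficients.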
In sum, these results indicate two potentially revealing differences between actor- and
self-perspective describers. First, describing actions from an actor-perspective appears to be
easier than describing actions from a self-perspective. Second, describing from an actor-perspective focuses observers’ attention on the actor’s body. This increased attention on the
actor’s body appears to be beneficial, as it accounts for the fact that actor-perspective describers
had higher hierarchical encoding than self-perspective describers.
Discussion
The results from the present study support the hypothesis that adopting the actor’s
perspective facilitates both action understanding and action learning. In the present study,
participants who described actions from the actor’s perspective, rather than from their own
perspective, encoded the action sequence more hierarchically and later performed the action
sequence faster and with fewer errors. This effect held whether participants were learning the
actions intentionally or incidentally. Furthermore, adopting the actor’s perspective was beneficial
to action learning precisely because it encouraged observers to encode actions hierarchically:
hierarchical encoding mediated effects of description perspective on action learning.
Notably, describing from the actor’s perspective elicited detailed descriptions of the
actor’s body, and, in particular, of which hand the actor was using to perform the actions. The
number of descriptions of the actor’s hands predicted hierarchical encoding and, according to a
mediation analysis, accounted for why actor- and self-perspective describers differed in
hierarchical encoding. This suggests that focusing on the hands that accomplish the task is
associated with better encoding of the task’s goal-subgoal structure. In sum, actively describing
action from the actor’s perspective provides an effective link from action perception to action
execution, far more effective than describing from one’s own perspective.
Remarkably, adopting the actor’s perspective appeared to be the more natural way to
describe action. Self-perspective describers were more likely than actor-perspective describers to
describe the wrong perspective by mistake and to qualify the perspective they were describing.
Although these description mistakes and qualifications were uncorrelated with hierarchical
encoding and action learning, it remains possible that describing from a self-perspective is
unnatural and therefore interferes with action understanding. Alternatively, describing from an
actor-perspective might enhance it. These possibilities are not mutually exclusive—both might
be true. Study 1b addressed these possibilities by comparing participants in the present study to
participants who were not instructed to adopt a perspective. If describing from a self-perspective
interferes with action understanding, then explicitly describing from a self-perspective should
lead to worse hierarchical encoding and learning than describing freely. If describing from an
actor-perspective enhances action understanding, then describing from an actor-perspective
should lead to better hierarchical encoding and learning than describing freely.
Study 1b: Describing Freely versus Describing from a Self or Actor’s Perspective
Method
Ten Stanford undergraduates from the same population of introductory psychology
students used in Study 1a participated in exchange for course credit. The stimuli, materials, and
procedure were identical to those used in Study 1a, except that participants were not given
instructions to describe from any spatial perspective, nor were they given examples of the
different perspectives one might use to describe actions. Because awareness of the later assembly
task had no effect on performance in Study 1a, all participants in the present group were
unaware that they would be performing the assembly task themselves. Study 1b participants
were run within two weeks after completion of Study 1a. The ten participants run in the present
condition were then compared to the 20 unaware participants from Study 1a, yielding a 3 x 2
factorial design, with assigned perspective (actor, self, free) and segmentation order (coarse-fine,
fine-coarse) as between-subjects factors.
Results
Segmentation order did not affect any of the dependent measures reported below, nor did
it interact with any other factors. Thus, all data were collapsed across segmentation order. When
an effect of assigned perspective was reliable, post-hoc analyses using Dunnett’s (two-sided) t-tests compared the actor- and self-perspective conditions to the free-describe condition. A
summary of the findings, including means and standard errors, is reported in Table 1.
Differences in Number of Segmented Units
During segmentation, participants identified approximately four times as many fine units
(M = 29.53, SEM = 2.60) as coarse units (M = 7.67, SEM = 0.67), paired-t(29) = 11.55, d = 1.86.
Description perspective did not affect how many coarse units or fine units participants identified,
for both F(2, 27) < 1, ns.
Differences in Hierarchical Encoding
How do self- and actor-perspective describers compare to free describers in hierarchical
encoding? An ANOVA with assigned perspective as a between-subjects factor revealed a
reliable difference among the three conditions in hierarchical encoding, as indexed by both
enclosure scores, F(2, 27) = 21.47, MSE = 0.02, ηp² = .61, and verbal summaries, F(2, 27) =
29.24, MSE = 0.77, ηp² = .68. Self-perspective participants showed impaired hierarchical
encoding compared to free-describe participants, with reliably lower enclosure scores, and fewer
verbal summaries. Actor-perspective participants showed enhanced hierarchical encoding
compared to free-describe participants, with reliably higher enclosure scores, and more verbal
summaries.
Differences in Learning
Assigned perspective reliably affected the number of assembly errors participants made,
F(2, 27) = 24.54, MSE = 1.04, ηp² = .65. For action learning, as for hierarchical encoding, self-perspective describers showed impaired performance compared to free-describers, and actor-perspective describers showed enhanced performance. Self-perspective participants made
reliably more errors than free-describe participants, and actor-perspective participants made
reliably fewer errors than free-describe participants. Across the three conditions, participants
made an average of 2.47 errors (SEM = 0.30) and completed the TV cart assembly in 10 minutes
and 15 seconds (SEM = 4.20 s). Assembly errors positively correlated with assembly time, r(28)
= .72, suggesting that participants did not sacrifice accuracy for speed.
Differences in Description and Assembly Perspectives
Whose perspective did free-describers take when describing the assembly task and when
performing the assembly task themselves? As in Study 1a, participants’ descriptions were
divided into four mutually exclusive categories: no perspective, neutral perspective, actor-perspective, or self-perspective. Within the free-describe condition, participants described more
often from an actor-perspective than a self-perspective, (M = 2.40, SEM = 0.40 for number of
actor-perspective statements vs. M = 0.40, SEM = 0.16 for number of self-perspective
statements), paired-t(9) = -6.00, d = 1.67, replicating findings by Hard, Lozano, and Tversky (in
press). Also, the more actor-perspective statements that free-describe participants gave, the more
hierarchically they encoded observed actions, as indexed both by enclosure scores, r(8) = .71,
and numbers of verbal summaries provided, r(8) = .54. Adopting the actor’s perspective was not
only more frequent in descriptions of actions, it was more frequent in assembly performance:
free describers were more likely to perform the assembly task from the actor’s perspective (80%)
rather than their own (20%).
How often did free-describers describe from actor- and self-perspectives compared to
participants assigned to those perspectives? Assigned perspective reliably affected the number of
actor-perspective descriptions participants gave, F(2, 27) = 23.92, MSE = 0.13, ηp² = .67, (see
Table 1). Although free-describers tended to adopt the actor’s perspective, they did so reliably
less often than actor-perspective describers, and as often as self-perspective describers.
Free-describers were, however, less likely to describe from a self-perspective than self-perspective describers, t(18) = 4.62, d = 2.52. Actor-perspective describers never adopted a self-perspective.6
Analysis of the kind of spatial information participants described showed no differences
across the three groups in how often they described locations in space, as in left or right sides of
the table, F(2, 27) = 1.02, ns. The three groups did differ in how often they described which of
the actor’s hands, left or right, was used to perform a given action, F(2, 27) = 13.90, MSE =
1.05, ηp² = .30 (see Table 1). Actor-perspective describers made more hand references than did
free describers, but free describers did not differ from self-perspective participants in how many
hand references they made.
Discussion
The present study confirms that observers naturally describe actions from the actor’s
perspective, and that their tendency to do so predicts hierarchical encoding and learning.
Furthermore, instructing participants to describe from their own perspective led to poorer
hierarchical encoding and learning than instructing participants to describe freely. Describing
from a self-perspective might interfere with hierarchical encoding and learning because it is
incompatible with the way action is naturally described: forcing observers to describe in an
unnatural way might be difficult and thus compete with other processes, such as hierarchical
encoding. It is also possible that describing from a self-perspective is incompatible with the way
action is naturally understood. We explore this possibility further in the General Discussion.
Although observers spontaneously adopted the actor’s spatial perspective to describe
actions, they showed enhanced hierarchical encoding and learning when they were explicitly
instructed to adopt the actor’s perspective. This result supports the hypothesis that adopting the
actor’s perspective serves action understanding, specifically inferences about goal-subgoal
structure. This result also has practical implications: calling observers’ attention to the actor’s
perspective is a useful means of improving action understanding and learning.
This leads to a larger question: why is adopting the actor’s perspective beneficial? One
possibility, which we return to in the General Discussion, is that adopting the actor’s perspective
encourages observers to simulate observed action— this increased simulation helps them infer
how actions are organized. But it is also possible that differences in attention or motivation could
explain the superior performance of actor-perspective describers relative to free- and self-perspective describers. Describing from the actor’s perspective might be more engaging than
freely describing or than describing from one’s own perspective, leading to fewer description
errors and to richer encoding of the observed task. In other words, it may be that describing from
any perspective other than one’s own, not necessarily from the actor’s perspective, would
improve hierarchical encoding and learning.
Study 2 examined whether action understanding and learning are improved by adopting
any perspective other than one’s own, or by adopting the actor’s perspective specifically. Study 2
addressed this question by showing participants a video portraying an actor and an observer, both
of whom were rotated 90 degrees from the participant viewing the video. Participants described
actions from the perspective of either the actor or the observer in the video and later executed the
action sequence themselves. If adopting any perspective other than one’s own is sufficient to
improve hierarchical encoding and learning, then the two groups should perform equivalently. A
second aim of Study 2 was to generalize several findings from Studies 1a and 1b to a different
assembly task.
Study 2: Describing From the Actor’s versus an Observer in the Scene’s Perspective
Method
Participants and Design
Sixteen Stanford University undergraduates participated in exchange for course credit. A
2 x 2 x 2 x 2 mixed factorial design was used. Segmentation level (fine, coarse) was varied
within participants; and assigned perspective (actor, observer), segmentation order (coarse-fine,
fine-coarse), and actor position (left or right) were varied between participants.
Stimuli and Materials
As in Studies 1a and 1b, participants viewed one practice video and one test video. The
practice video showed a female observer watching a female actor make coffee. The test video
contained the same observer and actor, but showed the actor assembling two horses and a heart
using red, yellow, green, and blue Duplo blocks made by Lego® (see Appendix A for a detailed
script of assembly). The test video was 3 minutes 28 seconds long. In both videos, the observer
and actor were 180 degrees opposite each other and at a 90-degree angle from the camera (see
the bottom of Figure 1). Two versions of the test video were created, each shown to half the
participants: the actor was on the left side of the table in one video and on the right side of the
table in the other.
Procedure
Prior to testing, participants completed the Vandenberg Mental Rotation Test (MRT), a
measure of spatial ability (Vandenberg & Kuse, 1978). Aside from this, the procedure was
identical to that of Study 1a, except that when describing each video, participants were instructed
to adopt a perspective that was offset by 90 degrees from their viewing perspective (see
Appendix B for the exact instructions that participants received). Participants were randomly
assigned to a perspective, actor or observer, and were instructed to describe all units that they
segmented from that perspective. After performing the segmentation task, participants received
the same instructions for the assembly task used in Studies 1a and 1b.
Results and Discussion
Does Perspective Affect Hierarchical Encoding?
As in Studies 1a and 1b, hierarchical encoding was evaluated by both segmentation
patterns (enclosure scores) and descriptions (verbal summaries). As before, these measures were
correlated, r(14) = .57. As the upper half of Figure 8 shows, participants describing from an
actor-perspective encoded actions more hierarchically than participants describing from an
observer-perspective, according to enclosure scores, F(1, 8) = 8.31, MSE = 0.17, ηp² = .38, and to
verbal summaries (M = 1.63, SEM = 0.32 vs. M = 0.13, SEM = 0.13), F(1, 8) = 19.64, MSE =
0.17, ηp² = .44. No other effects or interactions were reliable. Thus, it is adopting the actor’s
perspective specifically, and not any other perspective, that enhances hierarchical encoding.
Does Perspective Affect Learning?
Videotapes of assembly performance were coded for errors (e.g., attaching a block of the
wrong size or color to another block) and assembly time. On average, participants made 7.80
errors (SEM = 1.78) and completed the assembly task in 8.80 minutes (SEM = 105.00 s).
Assembly errors positively correlated with assembly time, r(14) = .60, suggesting that there was
no speed-accuracy tradeoff in performance.
Consistent with findings from Study 1a, participants who described the actions from an
actor-perspective performed the task better than those who described from an observer-perspective. As the lower half of Figure 8 shows, participants who described from an observer-perspective made about four times as many assembly errors as those who described from an
actor-perspective, F(1, 8) = 13.96, MSE = 0.22, ηp² = .48. Participants who segmented in fine-coarse order made over twice as many errors (M = 11.13, SEM = 2.83) as participants who
segmented in coarse-fine order (M = 4.50, SEM = 1.58), F(1, 8) = 7.78, ηp² = .27. No other
effects or interactions were reliable.
Spatial ability, as measured by MRT scores, did not predict assembly time or errors. As
in Studies 1a and 1b, enclosure scores predicted fewer errors on the later assembly task, r(14) = -.50. Similarly, using more verbal summaries was associated with fewer assembly errors, r(14) = -.60.
Does Hierarchical Encoding Mediate Effects of Perspective on Learning?
Can the effects of perspective on assembly errors be explained by changes in hierarchical
encoding? To answer this question, a mediation analysis was again conducted using the
techniques of Baron and Kenny (1986). As described in the bottom half of Figure 4, linear
regression confirmed that assigned perspective7 reliably predicted assembly errors, t(1) = -3.13
and hierarchical encoding, as measured by enclosure scores, t(1) = 3.44. Hierarchical encoding
also predicted assembly errors when controlling for assigned perspective, t(1) = -4.29. A Sobel
test confirmed that significant mediation had occurred, z = -2.39. Controlling for hierarchical
encoding, the effect of assigned perspective on assembly errors was no longer significant, t(1) = 0.59. Thus, hierarchical encoding fully mediated the effects of assigned perspective on assembly
errors.
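The Baron and Kenny (1986) steps summarized above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, and the eight data points are invented so that the outcome depends on the predictor only through the mediator. They are not the study's data, and significance tests are omitted.

```python
def _centered_sums(u, v):
    """Sum of cross-products of u and v after centering each on its mean."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

def slope(y, x):
    """Slope from a simple regression of y on x."""
    return _centered_sums(x, y) / _centered_sums(x, x)

def slopes_two_predictors(y, x, m):
    """Slopes for x and m from regressing y on both (2 x 2 normal equations)."""
    sxx, smm, sxm = _centered_sums(x, x), _centered_sums(m, m), _centered_sums(x, m)
    sxy, smy = _centered_sums(x, y), _centered_sums(m, y)
    det = sxx * smm - sxm**2
    return (sxy * smm - sxm * smy) / det, (sxx * smy - sxm * sxy) / det

# Hypothetical data: x = assigned perspective (0 = observer, 1 = actor),
# m = a hierarchical-encoding score, y = assembly errors.
x = [0, 0, 0, 0, 1, 1, 1, 1]
m = [1, 2, 3, 2, 4, 5, 6, 5]
y = [12 - 2 * mi for mi in m]  # errors depend on encoding only

c = slope(y, x)                             # step 1: total effect of x on y
a = slope(m, x)                             # step 2: x predicts the mediator
c_prime, b = slopes_two_predictors(y, x, m) # step 3: mediator effect and direct effect
print(c, a, b, c_prime)
```

In this constructed case the direct effect c_prime drops to zero once the mediator is controlled, and the total effect equals the product a * b, which is the pattern described as full mediation.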
Does Perspective at Encoding Affect Perspective during Assembly?
As in Study 1a, assembly perspective was consistent with encoding perspective. Of the
participants who had described from an actor-perspective, 100% performed the Lego® assembly
task by taking that same (actor’s) perspective, meaning that they oriented the blocks as they had
appeared to the actor in the video and stood on the same side of the table as the actor. Of
participants who described the video from an observer-perspective, 63% assembled from the
observer’s perspective, meaning that they oriented the blocks as they had appeared to the
observer in the video and stood on the observer’s side of the table, χ²(1) = 9.29.
Notably, the actor’s perspective was the “preferred” perspective overall and was
associated with better assembly performance overall: participants in the observer-perspective
condition who performed assembly from the actor’s perspective made slightly fewer errors (M =
8.33, SEM = 1.15) than those who maintained an observer-perspective during assembly (M =
12.25, SEM = 2.45), ns. Furthermore, when we re-analyzed the assembly error data and excluded
those who described from an observer-perspective but chose an actor-perspective for assembly,
participants who described from an actor-perspective still made fewer assembly errors (M =
3.38, SEM = 1.41) than those who described from an observer-perspective (M = 8.08, SEM = 3.61),
F(1, 11) = 11.43, MSE = 33.92, ηp² = .51. Collectively, these analyses indicate that differences in
assembly performance were not attributable to observer-perspective describers choosing an
incompatible perspective at assembly.
Does Perspective Affect Attention to Action in General?
As in Study 1a, all descriptions were categorized as action, depiction, or comment
statements. On average, 85% of participants’ descriptions were action statements, 12% were
depictions, and 3% were comments. Consistent with findings from Study 1a, there were no
differences between actor-perspective and observer-perspective participants in the mean
percentages of action, depiction, or comment statements that they used (for all three statement
types, highest t(14) = 1.18, ns).
Does Perspective Affect Number of Segmented Units?
Participants identified approximately four times as many fine units (M = 36.94, SEM =
4.92) as coarse units (M = 8.69, SEM = 1.06), paired-t(15) = 8.20, d = 5.92. This ratio of fine to
coarse units was slightly larger than that found in Study 1a but is equivalent to that found in
previous research using this same assembly task (Hard, Lozano, & Tversky, in press). In contrast
to Study 1a, actor- and observer-perspective participants did not differ in the number of fine
units segmented (M = 36.25, SEM = 4.86 vs. M = 37.63, SEM = 5.10). There were no effects of
segmentation order on the number of segmented units (M = 36.94, SEM = 4.94 for fine-coarse
vs. M = 34.35, SEM = 4.85 for coarse-fine). This difference from Study 1a is likely attributable
to the task differences associated with assembling a TV cart versus assembling Lego® creations.
Neither the total number of coarse units segmented, the total number of fine units segmented, nor
the ratio of coarse to fine units segmented reliably predicted assembly errors.
Did Perspective Affect Encoding of Spatial Information?
As in Study 1a, descriptions were also categorized as no perspective, neutral perspective,
actor-perspective, or observer-perspective. The results of this coding can be seen in the lower
half of Figure 6. As in Study 1a, participants followed instructions: the only observer-perspective
descriptions were given by participants in the observer-perspective condition (M = 9.00, SEM =
2.98). The only actor-perspective descriptions were given by participants in the actor-perspective
condition (M = 8.38, SEM = 2.47). Furthermore, the mean number of spatial descriptions—
descriptions that coded any perspective—was equal for actor- (M = 22.25, SEM = 6.13) and
observer-perspective participants (M = 21.13, SEM = 4.39), t(14) = 0.15, ns. Once again, later
differences in assembly performance were due to the spatial perspective that participants
encoded actions from, not to differences in attention to space more generally.
Similar to Study 1a, observer-perspective participants had more difficulty maintaining
their assigned perspective: six of the eight observer-perspective participants made description
errors (M = 1.50, SEM = 0.87), whereas none of the eight actor-perspective participants did
(Yates’ corrected χ²(1) = 6.67). Furthermore, five of the eight observer-perspective participants
qualified their perspective (M = 0.63, SEM = 0.26), whereas none of the eight actor-perspective
participants did (Yates’ corrected χ²(1) = 4.65).8
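The two Yates-corrected statistics above can be checked from the reported counts alone. The Python sketch below is a minimal illustration under the assumption that each test compares an 8 x 8 split (observer-perspective vs. actor-perspective participants) on a yes/no outcome; the function name is hypothetical.

```python
def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square (1 df) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    return n * corrected**2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: observer-perspective, actor-perspective participants.
# Columns: showed the behavior, did not show it.
description_errors = yates_chi_square(6, 2, 0, 8)
qualifications = yates_chi_square(5, 3, 0, 8)
print(round(description_errors, 2), round(qualifications, 2))  # 6.67 4.65
```

Both values agree with the statistics reported in the text.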
Also similar to Study 1a, actor-perspective participants focused more on the actor’s body
than observer-perspective participants, giving more descriptions that indicated which hand, left
or right, had performed an action (M = 3.00, SEM = 0.82 vs. M = 0.75, SEM = 0.37), t(14) =
2.50, d = 1.25. Actor- and observer-perspective participants did not differ in the number of times
they described left or right locations on the table (M = 5.38, SEM = 2.29 vs. M = 8.25, SEM =
2.74), t(14) = -0.81, ns. As in Study 1a, the number of references to the actor’s left or right hand
predicted better hierarchical encoding, as measured by enclosure, r(14) = .91, and by verbal
summaries, r(14) = .64. The number of descriptions of the actor’s hands also negatively
correlated with later assembly errors, r(14) = -.69. Descriptions concerning left and right
locations on the table did not predict hierarchical encoding and did not correlate with assembly
performance.
Finally, just as in Study 1a, references to hands fully mediated effects of perspective on
enclosure scores (see the lower half of Figure 7). A linear regression analysis confirmed that
assigned perspective predicted both enclosure scores, t(1) = 2.73 and hand references, t(1) =
2.50. Hand references predicted enclosure scores when controlling for assigned perspective, t(1)
= 6.18, and according to a Sobel test, mediation was significant, z = 2.50. Finally, controlling for
hand references, the effect of assigned perspective on enclosure scores was no longer significant,
t(1) = .91, ns. Thus, references to hands fully mediated the effects of assigned perspective on
hierarchical encoding.
General Discussion
Learning new skills through observation is not automatic, or everyone would be expert
skiers, dancers, and tennis players. Nevertheless, people do acquire a wide range of complex
skills through observation. This suggests that people are adept at translating tasks they see into
tasks they can do. An important component of this translation appears to be segmenting and
organizing an observed task into a hierarchical representation of goals and subgoals—a
representation that can be implemented as an action plan (Hard, Lozano, & Tversky, in press;
Zacks, Tversky, & Iyer, 2001). Here, we have proposed that taking the perspective of the actor
while observing action facilitates hierarchical encoding of action and thus promotes action
learning.
The studies reported here support that hypothesis. In one study, participants observed and
segmented an object assembly task while giving a verbal play-by-play of the actions from the
actor’s or their own perspective. Describing actions from the actor’s perspective instead of their
own led to better encoding of the hierarchical, goal-subgoal organization of those actions and
better subsequent performance of those actions. A follow-up to this study showed that explicitly
describing from an actor’s perspective was superior for action understanding and learning
relative to freely describing, and both were superior to explicitly describing from a self-perspective. A final study showed that describing actions from any perspective other than one’s
own is not beneficial: it is adopting the actor’s perspective specifically that promotes hierarchical
encoding and learning.
What are observers doing in the present studies when they are “taking the actor’s
perspective”? There are a number of possibilities that are not mutually exclusive. It may be that
observers are simply engaging in visuospatial perspective-taking—imagining where objects are
located in space, relative to the actor. It may be that observers are engaging in mentalistic
perspective-taking—imagining what their own goals and subgoals would be if they were
executing the observed task themselves. Finally, it may be that observers are engaging in motoric
perspective-taking—mapping observed actions onto a representation of their own body.
Although all of these possibilities might be true, the data do seem to strongly support the idea
that observers are engaging in motoric perspective-taking, or simulation: when participants
described from the actor’s perspective, they spontaneously described which hand was performing
certain actions. This tendency to describe which hand performed an action was associated with
better hierarchical encoding, and in fact, seemed to account for the fact that actor-perspective
describers encoded action more hierarchically than self- or observer-perspective describers. In
contrast, descriptions of the location of an object in space from the actor’s perspective were not
associated with hierarchical encoding.
The fact that observers spontaneously described which of the actor’s hands performed an
action when describing from the actor’s perspective is consistent with findings that motor
simulation occurs as if observers are mapping observed actions anatomically to their own bodies.
In one demonstration of this, observers viewed simple actions, such as moving toward a red dot,
performed by another person’s left or right hand. When observing left hand actions, motor
evoked potentials (MEPs) were larger in observers’ left hands, whereas when observing right
hand actions, MEPs were larger in observers’ right hands (Aziz-Zadeh, Maeda, Zaidel,
Mazziotta, & Iacoboni, 2002). Similar evidence for an anatomical mapping has been found when
people observe actions performed by the feet (Cheng, Tzeng, Hung, Decety, & Hsieh, 2005).
How might motor simulation explain findings from the present studies? When people try
to understand actions, some form of motor simulation might be automatic, such that observers
implicitly relate the actor’s body to their own. This view predicts that describing actions from the
actor’s perspective, especially which hand is performing those actions, should be natural and
easy. In contrast, describing actions from one’s own or another observer’s perspective should be
difficult, and might impair hierarchical encoding by directly competing with it.
Describing actions from a self-perspective impairs action understanding, but describing
actions from the actor’s perspective also enhances it. This could mean that motor simulation can
be enhanced by encouraging observers to put themselves in the actor’s shoes. Consistent with
this view, instructing participants to explicitly adopt the actor’s perspective led to more
descriptions about the actor’s hands than instructing participants to describe freely. Alternatively,
encouraging observers to take the actor’s perspective might change the way they use their motor
simulations for understanding observed actions and their organization (cf. Barsalou, 1999,
2003; Wilson & Knoblich, 2005). Although motor simulation might account for the present
findings, it remains to be seen whether describing actions from the actor’s perspective actually
elicits neural structures involved in planning and executing actions. Future studies using TMS or
fMRI methods could provide valuable insight into the nature of the perspective-taking processes
observed in the present research.
The powerful links shown here between perspective-taking, action understanding, and
action learning thus raise many questions. For example, do benefits of adopting the actor’s
perspective depend on verbalizing that perspective, or are there non-verbal means of perspective-taking that are equally beneficial? Can taking an actor’s perspective enhance understanding and
performance of other actions, in particular, the actions that are at the core of effective social
behavior? Also, what really happens when observers begin to think about space, and actions
performed in that space, from an actor’s point of view? The present data open the intriguing
possibility that spatial perspective-taking provides a window into the actor’s mind, giving
observers insight into an actor’s goals, intentions, and future behaviors.
References
Arbib, M. A., & Rizzolatti, G. (1996). Neural expectations: A possible evolutionary path from
manual skills to language. Communication and Cognition, 29, 393-424.
Aziz-Zadeh, L., Maeda, F., Zaidel, E., Mazziotta, J., & Iacoboni, M. (2002). Lateralization in
motor facilitation during action observation: A TMS study. Experimental Brain
Research, 144, 127-131.
Baldwin, D. A., Baird, J. A., Saylor, M. M., & Clark, M. A. (2001). Infants parse dynamic
action. Child Development, 72, 708-717.
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social
psychological research: conceptual, strategic, and statistical considerations. Journal of
Personality and Social Psychology, 51, 1173-1182.
Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22, 577-609.
Barsalou, L. W. (2003). Situated simulation in the human conceptual system. Language and
Cognitive Processes, 18, 513-562.
Batson, C. D. (1991). The altruism question: Toward a social-psychological answer. Hillsdale,
NJ: Lawrence Erlbaum Associates.
Byrne, R. W., & Russon, A. E. (1998). Learning by imitation: A hierarchical approach.
Behavioral and Brain Sciences, 21, 667-709.
Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: The perception–behavior link and
social interaction. Journal of Personality and Social Psychology, 76, 893–910.
Cheng, C. M., & Chartrand, T. L. (2003). Self-monitoring without awareness: Using mimicry as
a nonconscious affiliation strategy. Journal of Personality and Social Psychology, 85,
1170-1179.
Cheng, Y. W., Tzeng, O. J. L., Hung, D., Decety, J., & Hsieh, J. C. (2005). Modulation of spinal
excitability during observation of bipedal locomotion. Neuroreport, 16, 1711-1714.
Cohen, J. D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: An interactive graphic
system for designing and controlling experiments in the psychology laboratory using
Macintosh computers. Behavior Research Methods, Instruments, & Computers, 25, 257-271.
Davis, M. H., Conklin, L., Smith, A., & Luce, C. (1996). Effect of perspective taking on the
cognitive representation of persons: A merging of self and other. Journal of Personality
and Social Psychology, 70, 713–726.
Fadiga, L., Craighero, L., & Olivier, E. (2005). Human motor cortex excitability during the
perception of others’ action. Current Opinion in Neurobiology, 15, 213-218.
Galinsky, A. D., Ku, G., & Wang, C. S. (2005). Perspective-taking and self-other overlap:
Fostering social bonds and facilitating social coordination. Group Processes and
Intergroup Relations, 8, 109-124.
Galinsky, A. D., & Moskowitz, G. B. (2000). Perspective-taking: Decreasing stereotype
expression, stereotype accessibility, and in-group favoritism. Journal of Personality and
Social Psychology, 78, 708–724.
Grafton, S. T., Arbib, M. A., Fadiga, L., & Rizzolatti, G. (1996). Localization of grasp
representations in humans by positron emission tomography, 2: Observation compared
with imagination. Experimental Brain Research, 112, 103-111.
Hard, B. M., Lozano, S. C., & Tversky, B. (in press). Hierarchical encoding of behavior:
Translating perception into action. Journal of Experimental Psychology: General.
Hard, B. M., Tversky, B., & Lang, D. (in press). Segmenting abstract events: Building event
schemas. Memory and Cognition.
Hard, B. M., Zacks, J. M., & Tversky, B. (2006). Inferring structure in behavior: the role of
goals and language. Unpublished manuscript, Stanford University and Washington
University in St. Louis.
Hart, R. A., & Moore, G. T. (1973). The development of spatial cognition. In R. M. Downs & D.
Stea (Eds.), Image and environment (pp. 246-288). Chicago: Aldine.
Iacoboni, M. (2005). Understanding others: Imitation, language, empathy. In S. Hurley & N.
Chater (Eds.), Perspectives on imitation: From neuroscience to social science (Vol. 1,
pp. 77-100). Cambridge, MA: MIT Press.
Iacoboni, M., Woods, R. P., Brass, M., Bekkering, H., Mazziotta, J. C., & Rizzolatti, G. (1999).
Cortical mechanisms of human imitation. Science, 286, 2526–2528.
Kenny, D. A., & Judd, C. M. (1986). Consequences of violating the independence assumption in
analysis of variance. Psychological Bulletin, 99, 422-431.
Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral
mechanisms in behavior: The Hixon Symposium (pp. 112-146). Oxford, England: Wiley.
Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Lozano, S. C., Hard, B. M., & Tversky, B. (in press). Putting action in perspective. Cognition.
Martin, B. A. (2006). Reading the language of action: Hierarchical encoding of behavior.
Unpublished doctoral dissertation. Stanford University.
Meltzoff, A. N. (1995). Understanding of the intentions of others: Re-enactment of intended acts
by 18-month-old children. Developmental Psychology, 31, 838-850.
Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.
Newtson, D. (1973). Attribution and the unit of perception of ongoing behavior. Journal of
Personality and Social Psychology, 28, 28-38.
Piaget, J., & Inhelder, B. (1956). The child’s conception of space. London: Routledge and Kegan
Paul.
Rizzolatti, G., & Arbib, M. A. (1998). Language within our grasp. Trends in Neurosciences, 21,
188-194.
Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the recognition
of motor actions. Cognitive Brain Research, 3, 131-141.
Rizzolatti, G., Fadiga, L., Fogassi, L., & Gallese, V. (1999). Resonance behaviors and mirror
neurons. Archives Italiennes de Biologie, 137, 85-100.
Shelton, A. L., & McNamara, T. P. (1997). Multiple views of spatial memory. Psychonomic
Bulletin and Review, 4, 102-106.
Sobel, M. E. (1982). Asymptotic intervals for indirect effects in structural equations models. In
S. Leinhardt (Ed.), Sociological methodology (pp. 290-312). San Francisco: Jossey-Bass.
Travis, L. L. (1997). Goal-based organization of event memory in toddlers. In P. W. van den
Broek, P. J. Bauer, & T. Bourg (Eds.), Developmental spans in event comprehension and
representation: Bridging fictional and actual events (pp. 111-138). Mahwah, NJ:
Lawrence Erlbaum Associates.
Vandenberg, S. G., & Kuse, A. R. (1978). Mental rotations, a group test of three-dimensional
spatial visualization. Perceptual and Motor Skills, 47, 599-604.
Vorauer, J. D., & Cameron, J. J. (2002). So close, and yet so far: Does collectivism foster
transparency overestimation? Journal of Personality and Social Psychology, 83, 1344–
1352.
Whiten, A. (2002). The imitator’s representation of the imitated: Ape and child. In A. N.
Meltzoff & W. Prinz (Eds.), The imitative mind: Development, evolution, and brain bases
(pp. 98-121). Cambridge, UK: Cambridge University Press.
Wilson, M., & Knoblich, G. (2005). The case for motor involvement in perceiving conspecifics.
Psychological Bulletin, 131, 460-473.
Zacks, J. M., Braver, T. S., Sheridan, M. A., Donaldson, D. I., Snyder, A. Z., Ollinger, J. M.,
Buckner, R. L., & Raichle, M. E. (2001). Human brain activity time-locked to perceptual
event boundaries. Nature Neuroscience, 4, 651-655.
Zacks, J. M., Tversky, B., & Iyer, G. (2001). Perceiving, remembering, and communicating
structure in events. Journal of Experimental Psychology: General, 130, 29-58.
Appendix A: Test Video Scripts
The following script describes the steps followed by the actor in the TV cart assembly test video.
All spatial locations mentioned in the script are described relative to the actor. Steps listed in
bold italic font correspond to higher-level actions.
1. The actor places four pegs in a line at the upper left corner of the table, using both hands.
2. The actor places screws in a line below the pegs, using both hands.
3. The actor places four wheels in a line below the screws, using both hands.
4. The actor places a screwdriver below the wheels, using both hands.
5. The actor places one sideboard on the lower right corner of the table and then stacks
the other sideboard on top of the first one, using both hands.
6. The actor places the support board above the stacked sideboards, using both hands.
7. The actor places the bottom shelf above the support board, using both hands.
8. The actor stacks the top shelf on top of the bottom shelf, using both hands.
9. The actor picks up the top shelf, flips it upside down, and positions it in the center of the
table, using both hands.
10. The actor has now finished organizing the parts on the table.
11. The actor picks up the first sideboard and positions it upright and perpendicular to the top
shelf on the left side, using both hands.
12. The actor picks up a screw and inserts it to the upper left corner of the first sideboard and
top shelf, using her left hand.
13. The actor picks up another screw and inserts it to the lower left corner of the first
sideboard and top shelf, using her left hand.
14. The actor picks up the screwdriver and screws in the screws she just inserted to the first
sideboard and top shelf, starting with the upper left screw and then moving to the lower
left screw, using her left hand.
15. The actor has now finished attaching the first sideboard.
16. The actor picks up two pegs and inserts them to the inside of the first sideboard, using her
left hand.
17. The actor picks up the support board and attaches it to the pegs on the inside of the first
sideboard, using both hands.
18. The actor picks up two pegs and attaches them to the right end of the support board, using
her right hand.
19. The actor picks up the second sideboard and attaches it to the pegs in the right side of the
support board, using both hands.
20. The actor has now finished attaching the support board.
21. The actor picks up a screw and inserts it to the upper right corner of the second sideboard
and top shelf, using her right hand.
22. The actor picks up another screw and inserts it to the lower right corner of the second
sideboard and top shelf, using her right hand.
23. The actor picks up the screwdriver and screws in the screws she just inserted to the
second sideboard and top shelf, starting with the upper right screw and then moving to
the lower right screw, using her right hand.
24. The actor has now finished attaching the second sideboard.
25. The actor picks up the bottom shelf and positions it in between the two sideboards, using
both hands.
26. The actor picks up a screw and inserts it to the upper left corner of the first sideboard and
bottom shelf, using her left hand.
27. The actor picks up another screw and inserts it to the lower left corner of the first
sideboard and bottom shelf, using her left hand.
28. The actor picks up the screwdriver and screws in the screws she just inserted to the first
sideboard and bottom shelf, starting with the upper left screw and then moving to the
lower left screw, using her left hand.
29. The actor picks up a screw and inserts it to the upper right corner of the second sideboard
and bottom shelf, using her right hand.
30. The actor picks up another screw and inserts it to the lower right corner of the second
sideboard and bottom shelf, using her right hand.
31. The actor picks up the screwdriver and screws in the screws she just inserted to the
second sideboard and bottom shelf, starting with the upper right screw and then moving
to the lower right screw, using her right hand.
32. The actor has now finished attaching the bottom shelf.
33. The actor picks up a wheel and inserts it to the upper left corner of the first sideboard,
using her left hand.
34. The actor picks up a wheel and inserts it to the upper right corner of the second
sideboard, using her right hand.
35. The actor picks up a wheel and inserts it to the lower right corner of the second
sideboard, using her right hand.
36. The actor picks up a wheel and inserts it to the lower left corner of the first sideboard,
using her left hand.
37. The actor has now finished attaching the wheels to the cart.
38. The actor flips the now completed television cart over, so that it is now in an upright
position, using both hands.
39. The TV cart is now complete.
The following script describes the steps followed by the actor in the Lego® assembly test video.
All spatial locations mentioned in the script are described relative to the actor. Steps listed in
bold italic font correspond to higher-level actions.
1. The actor places nine yellow blocks in a vertical line on the left side of the table, using
her left hand.
2. The actor places twelve red blocks in a vertical line to the right of the yellow blocks,
using her left hand.
3. The actor places nine green blocks in a vertical line to the right of the red blocks, using
her right hand.
4. The actor places thirteen blue blocks in a vertical line to the right of the green blocks,
using her right hand.
5. The actor has now finished organizing the blocks on the table.
6. The actor stacks three small blue blocks on top of a small red block to form the first leg
of a horse, using her right hand.
7. The actor stacks three small blue blocks on top of a small red block to form the second
leg of a horse, using her right hand.
8. The actor connects the two legs with a large yellow block, using her right hand.
9. The actor stacks three small blue blocks on top of the large yellow block to form the
horse’s neck, using her right hand.
10. The actor places a medium blue block on top of the horse’s neck to form its nose, using
her right hand.
11. The actor places a small yellow block, with a picture of an eyeball on it, on top of the
horse’s nose, using her right hand.
12. The actor places a red block, shaped like a saddle, on top of the large yellow block that
forms the horse’s back, using her right hand.
13. The actor places the completed horse on the right side of the table, using her right hand.
14. The blue horse is now complete.
15. The actor stacks three small green blocks on top of a small red block to form the first leg
of a horse, using her left hand.
16. The actor stacks three small green blocks on top of a small red block to form the second
leg of a horse, using her left hand.
17. The actor connects the two legs with a large yellow block, using her left hand.
18. The actor stacks three small green blocks on top of the large yellow block to form the
horse’s neck, using her left hand.
19. The actor places a medium green block on top of the horse’s neck to form its nose, using
her left hand.
20. The actor places a yellow block, with a picture of an eyeball on it, on top of the horse’s
nose, using her left hand.
21. The actor places a red block, shaped like a saddle, on top of the large yellow block that
forms the horse’s back, using her left hand.
22. The actor places the completed horse on the left side of the table, using her left hand.
23. The green horse is now complete.
24. The actor connects four small blue blocks together to form a plus shape, using both
hands.
25. The actor connects five small red blocks together to form a staircase shape, using both
hands.
26. The actor connects five small yellow blocks together to form a staircase shape, using both
hands.
27. The actor connects the yellow staircase on top of the red staircase, using her right hand.
28. The actor connects the blue plus on top of the yellow staircase, so as to form a heart,
using her left hand.
29. The actor places the completed heart in between the two horses, using both hands.
30. The heart is now complete.
Appendix B: Video Segmentation Instructions
The following is the introduction to segmentation that all participants received:
“Human experience is very complex. As we go about our day-to-day lives, we encounter
a lot of information that we need to make sense of. One way that we do this is to break down our
experiences into events. For example, when you think about your day, you think about it in terms
of the events that happened, such as eating lunch or going to class. These are examples of events
that you were directly involved in. You can think about all of these events on a variety of scales.
For example, you can think about the day in terms of very small events, like reaching for the
alarm clock, picking up a box of cereal, or dropping your keys on the floor. You can also think of
the day in terms of larger events, such as eating lunch, riding to class, or attending a party. Thus,
we can think about events as being as big or as small as we want.”
The following is the introduction to action description that all participants in Studies 1a received.
Instruction differences corresponding to different assigned perspectives appear in bold font:
“In this experiment we’re interested in how people understand events when thinking
about them from someone else’s (their own) perspective. You will watch two videos involving a
person assembling objects. We will ask you to divide this video into separate events. You will do
this by using the SPACEBAR to mark off where you believe one event has ended and another
event has begun. Every time you press the spacebar, please briefly state, in terms of the actor’s
(your own) perspective what happened in the segment you just observed.”
The following is the introduction to action description that all participants in Study 2 received.
Instruction differences corresponding to different assigned perspectives appear in bold font:
“In this experiment we’re interested in how people understand events when thinking
about them from an actor’s (an observer’s) perspective. You will watch two videos involving a
person assembling objects. We will ask you to divide each video into separate events. You will
do this by using the SPACEBAR to mark off where you believe one event has ended and another
event has begun. Every time you press the spacebar, please briefly state, in terms of the actor’s
(the observer’s) perspective what happened in the segment you just observed.”
Appendix C: Calculation of Enclosure Scores
Enclosure is a measure of hierarchical encoding that takes into account the conceptual
relation between the boundaries of coarse units—coarse breakpoints—and the boundaries of fine
units—fine breakpoints. If action is perceived hierarchically, then fine units should represent
substeps of a corresponding coarse unit. For example, the fine units “she built one leg,” “she
built a second leg,” “she attached the two legs to a body,” “she built the neck and head,” are
substeps of the coarse unit “she built the blue horse.” Thus, if we pair a given coarse breakpoint
(e.g., “she built the blue horse”) with the closest fine breakpoint in time, that fine breakpoint
should represent the final substep of that coarse unit (e.g., “she built the neck and head”). When
this relationship between a coarse breakpoint and its closest fine breakpoint holds true, the fine
breakpoint tends to be enclosed by, that is, fall before, the corresponding coarse breakpoint. In
previous studies (Hard, Lozano, & Tversky, in press), and in the current ones, when fine
breakpoints are not enclosed by the corresponding coarse breakpoints, over 75% of the time they
are not hierarchically related to the coarse unit. In previous studies, and in the current ones, the
enclosure pattern is the dominant one: within participants, coarse breakpoints more frequently
follow their closest fine breakpoint than precede it (Study 1a: paired-t(39) = 4.07, d = 0.66;
Study 2: paired-t(15) = 6.06, d = 0.54).
The following is an example of how the enclosure score for each participant is calculated:
Step 1: Below on the left is a chronological list of the points in time (starting from the beginning
of the video in ms) that a participant marked coarse and fine breakpoints. We begin by lining up
each coarse breakpoint with the fine unit it is temporally closest to. The results of this are shown
in the first two columns of the table below.
Step 2: Once a coarse breakpoint is lined up with a fine breakpoint, we determine whether that
coarse breakpoint fell temporally before or after the fine breakpoint. The results of this
determination are shown in the final column of the table.
Step 3: We now calculate the numerator of the enclosure score. We do this by first checking for
cases in which multiple coarse breakpoints share (i.e., are closest to) the same fine breakpoint.
For each such case, we determine which of the coarse breakpoints the fine breakpoint is in fact
closest to. Only this pairing will be used in determining the participant’s enclosure score; the
other pairing is excluded. In the example below, we have highlighted shared cases and the
breakpoints within that count toward the final enclosure score. The numerator of the enclosure
score is then equal to the total number of cases in which a coarse breakpoint fell after its nearest
fine breakpoint. Thus, our example participant has an enclosure score numerator equal to 9.
Step 4: Enclosure is calculated by taking the numerator calculated in Step 3 and dividing it by
the total number of coarse units. In our example, the participant has a total of 16 coarse units, so
we calculate the enclosure score to be 9/16 = .56.
Coarse Breakpoints (ms): 15828, 33428, 44606, 57843, 66292, 71449, 75351, 83828, 86347, 90515,
93472, 144188, 155832, 190967, 192802, 200147

Fine Breakpoints (ms): 1630, 12839, 28443, 37702, 50078, 57303, 66713, 70159, 75655, 82995,
86238, 91303, 98414, 102119, 108956, 111892, 118374, 124911, 138553, 142889, 153310, 160646,
173990, 186743, 192239, 197879

Coarse Breakpoints    Fine Breakpoints    Is coarse breakpoint before or after fine breakpoint?
                      1630
15828                 12839               After
                      28443
33428                 37702               Before
44606                 50078               Before
57843                 57303               After
66292                 66713               Before
71449                 70159               After
75351                 75655               Before
83828                 82995               After
86347                 86238               After
90515, 93472          91303               Before, After
                      98414
                      102119
                      108956
                      111892
                      118374
                      124911
                      138553
144188                142889              After
155832                153310              After
                      160646
                      173990
                      186743
190967, 192802        192239              Before, After
200147                197879              After
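The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis script: the function name `enclosure_score` is hypothetical, and the handling of ties (equally distant fine breakpoints, or a coarse breakpoint that coincides exactly with a fine one) is an assumption, since the appendix does not specify it. The data are the example participant's breakpoints listed above.

```python
def enclosure_score(coarse_bps, fine_bps):
    """Return (numerator, score) following Steps 1-4 of Appendix C."""
    # Steps 1 and 3: pair each coarse breakpoint with its temporally closest
    # fine breakpoint. If several coarse breakpoints share one fine
    # breakpoint, keep only the coarse breakpoint nearest to it.
    # Assumption: on an exact tie, the earlier candidate is kept.
    kept = {}  # fine breakpoint -> retained coarse breakpoint
    for coarse in coarse_bps:
        fine = min(fine_bps, key=lambda f: abs(f - coarse))
        if fine not in kept or abs(coarse - fine) < abs(kept[fine] - fine):
            kept[fine] = coarse

    # Step 2 and numerator: count retained pairs in which the coarse
    # breakpoint falls after its fine breakpoint.
    numerator = sum(1 for fine, coarse in kept.items() if coarse > fine)

    # Step 4: divide by the total number of coarse units.
    return numerator, numerator / len(coarse_bps)


# Example participant from the appendix (times in ms).
COARSE = [15828, 33428, 44606, 57843, 66292, 71449, 75351, 83828, 86347,
          90515, 93472, 144188, 155832, 190967, 192802, 200147]
FINE = [1630, 12839, 28443, 37702, 50078, 57303, 66713, 70159, 75655,
        82995, 86238, 91303, 98414, 102119, 108956, 111892, 118374,
        124911, 138553, 142889, 153310, 160646, 173990, 186743, 192239,
        197879]

numerator, score = enclosure_score(COARSE, FINE)
print(numerator, score)  # numerator 9 and score 9/16, as in the worked example
```

In this example the two shared cases resolve as follows: fine breakpoint 91303 is closer to coarse breakpoint 90515 than to 93472, and fine breakpoint 192239 is closer to 192802 than to 190967, so only those two pairings count toward the score.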
Author Note
Sandra C. Lozano, Bridgette Martin Hard, and Barbara Tversky, Department of
Psychology, Stanford University.
We gratefully acknowledge Jane Solovyeva, Herb Clark, Jonathan Winawer, and Angela
Kessell for their helpful comments, and the following grants: Office of Naval Research Grant
Numbers NOOO14-PP-1-O649, N000140110717, and N000140210534 to Stanford University.
We also thank Cecilia Heyes and an anonymous reviewer for their helpful comments and
suggestions.
Please address correspondence concerning this article to Sandra C. Lozano, at the
Department of Psychology, Stanford University, Building 01-420, Jordan Hall, Stanford, CA
94305. Email: scl@psych.stanford.edu.
Footnotes
1. Perspective-taking can take many forms, none of which are mutually exclusive. That is,
perspective-taking can involve self-other overlap in the form of emotional, social, mentalistic,
behavioral, or motor representations, or all of the above simultaneously.
2. Hierarchical encoding was operationalized as the proportion of coarse unit boundaries
that fell after their closest fine unit boundary, a measure that correlated highly with hierarchical
descriptions of the action sequence.
3. It remains an open question whether perspective-taking is driven more by seeing the self
in the other or by seeing the other in the self. Regardless of the directionality of self-other
overlap, the downstream consequences of perspective-taking (e.g., liking, rapport, empathy,
sympathy, etc.) seem to be the same (for a discussion of this, see Galinsky et al., 2005).
4. In Study 1a, assigned perspective was dummy coded: self-perspective = 0, actor-perspective = 1.
5. When participants switched perspectives within a description (e.g., “She put a block on
the left, I mean the right”) only the final utterance (e.g., “the right”) was used to determine the
perspective coding for that description. This rule was applied so that participants who made
perspective errors would not appear to have an inflated number of perspective descriptions.
Descriptions of this type were coded as a perspective error, however.
6. The mean number of self-perspective descriptions for actor-perspective describers was
not submitted to an ANOVA because it had no variance and therefore violated the normality
assumption.
7. In Study 2, assigned perspective was dummy coded: observer-perspective = 0, actor-perspective = 1.
8. Yates-corrected Chi-square tests were adopted here instead of t-tests because actor-perspective
participants made no perspective errors and never qualified their perspective,
resulting in a violation of the normality assumption.
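For readers unfamiliar with the correction named in Footnote 8, the following is a minimal sketch of a Yates-corrected chi-square test on a 2 x 2 table of counts. It is not the authors' analysis code: the function name is hypothetical, the cell counts in the usage lines are invented purely for illustration (they are not data from the studies), and the sketch assumes that no row or column total is zero.

```python
import math


def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square for the 2 x 2 table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability for df = 1.
    Assumes every row and column total is greater than zero.
    """
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n / 2, but not below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: rows are assigned perspective (actor, observer),
# columns are participants who did / did not make a perspective error.
chi2, p = yates_chi_square(0, 10, 6, 4)
print(round(chi2, 3), round(p, 4))
```

A count-based test of this kind remains usable when one group shows no variance at all, which is the situation the footnote describes.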
Table 1: Effects of Description Perspective on Dependent Measures in Study 1b

Measure                         Self-Perspective       Free-Describe          Actor-Perspective
Enclosure                       .41 (.04)              .59 (.05)              .85 (.05)
Summary Statements              0.40 (0.22)            1.70 (0.26)            3.40 (0.34)
Assembly Errors                 4.10 (0.46)            2.40 (0.16)            0.90 (0.27)
Assembly Perspective            20% actor, 80% self    80% actor, 20% self    100% actor, 0% self
Total Fine Units                22.40 (5.96)           32.30 (6.26)           33.90 (8.01)
Total Coarse Units              6.50 (1.15)            7.00 (1.03)            7.70 (1.35)
Actor-Perspective Statements    2.00 (0.65)            2.40 (0.40)            15.00 (2.40)
Self-Perspective Statements     8.60 (0.64)            0.40 (0.16)            0.00 (0.00)
Left/Right Side References      13.20 (1.20)           12.00 (1.72)           11.50 (1.65)
Left/Right Hand References      0.25 (0.10)            0.20 (0.10)            1.48 (0.22)
Figure Captions
Figure 1. Still frames from the object assembly videos used in Studies 1a and 1b (top) and Study
2 (bottom). The top still frame shows the actor with the fully assembled TV cart. The bottom still
frame shows the observer (right) and actor (left) with the fully assembled horses and heart.
Figure 2. Mean enclosure scores (top) and number of assembly errors (bottom) in Study 1a, as a
function of assigned perspective.
Figure 3. The figure illustrates Baron and Kenny’s (1986) mediation technique. Standardized
path coefficients are represented by a, b, c, and c’, where a represents the association between
IV and mediator; b represents the association between the mediator and the DV (when IV is also
a predictor of DV); c represents the association between IV and DV; and c’ represents the
association between IV and DV when controlling for the mediator.
Figure 4. The figure illustrates the mediation analyses testing whether hierarchical encoding
mediated the relationship between assigned perspective and assembly errors. Values have been
substituted for the corresponding variables described in Figure 3. The top half of the figure
corresponds to the mediation analysis for Study 1a, while the bottom half of the figure
corresponds to the mediation analysis for Study 2.
Figure 5. Illustration of the actor from the video (top), participants assembling from the actor’s
perspective (middle) and participants assembling from a self-perspective (bottom).
Figure 6. Mean proportion of description references made from each of the four spatial
perspective categories, as a function of assigned perspective, for Study 1a (top), and Study 2
(bottom).
Figure 7. Mediation analysis showing that references to the actor’s hands mediated effects of
assigned perspective on hierarchical encoding, as measured by enclosure scores. Values have
been substituted for the corresponding variables described in Figure 3. The top half of the figure
corresponds to the mediation analysis for Study 1a, while the bottom half of the figure
corresponds to the mediation analysis for Study 2.
Figure 8. Mean enclosure scores (top) and number of assembly errors (bottom) in Study 2, as a
function of assigned perspective.
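The path coefficients named in the Figure 3 caption correspond to three regressions, which can be written out as below. This is a standard way of expressing Baron and Kenny's (1986) approach rather than a reproduction of the authors' analysis; the symbols X (the IV, assigned perspective), M (the mediator), Y (the DV), the intercepts i, and the error terms e are notation assumed here for illustration.

```latex
\begin{align*}
Y &= i_1 + c\,X + e_1          && \text{(total association between IV and DV)} \\
M &= i_2 + a\,X + e_2          && \text{(association between IV and mediator)} \\
Y &= i_3 + c'\,X + b\,M + e_3  && \text{(IV and mediator as joint predictors of DV)}
\end{align*}
```

When all three equations are fit by ordinary least squares on the same sample, the coefficients satisfy c = c' + ab, so mediation shows up as c' being smaller in magnitude than c.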
[Figure 2 plots: Enclosure Score (top) and Assembly Errors (bottom) by Assigned Perspective (Actor, Self).]
[Figure 6 plots: Proportion of References by Assigned Perspective, with legend categories None, Neutral, Self, and Actor (top; Actor vs. Self) and None, Neutral, Observer, and Actor (bottom; Actor vs. Observer).]
[Figure 8 plots: Enclosure Score (top) and Assembly Errors (bottom) by Assigned Perspective (Actor, Observer).]