The Meaning of Gestalt for Human Wayfinding

Kai Hamburger & Florian Röser
The Meaning of Gestalt for Human Wayfinding – How Much Does it Cost to Switch Modalities?
1. Introduction
About 120 years ago the Austrian philosopher Christian von Ehrenfels introduced the principle of Übersummativität (superadditivity) as one characteristic of a Gestalt (von Ehrenfels 1890). The basic premise is that the entirety of an
object has properties that are different from those of its parts, thus an object
which is structurally composed of elementary features cannot be defined only
by some isolated features (this is especially true for landmarks). Over the intervening years this perceptual property has been demonstrated repeatedly (e.g.,
Wertheimer 1912, phi motion; Koffka 1935, simultaneous color contrast; Steinman, Pizlo, et al. 2000; Sharps & Wertheimer 2000). The question of how our
perceptual system creates Übersummativität has become increasingly relevant today since “[vision science is…] witnessing a physiology that aims to understand
these principles [Gestalt factors] in terms of interactions within neural networks”
(Spillmann & Ehrenstein 2004, 1573). Such a Gestalt-Neuroscience will have to
address fundamental questions concerning the nature of the stimulus information that can be used by Gestalt organizing principles (Tse 2004). All this is true
for a perceptual level, where vision scientists have repeatedly shown how we are
tricked by visual illusions (e.g., Robinson 1972; Eagleman 2001; Hamburger
2007). But, what about Gestalt and its meaning on higher, more cognitive levels? How does our perception contribute to cognitive processes, namely spatial
memory and spatial orientation? In order to provide insights into these questions,
we are here concerned with Übersummativität and spatial information, namely
landmarks. In our work we want to focus not only on the meaning of landmarks (objects) for human wayfinding but also on the different modalities in which this information may be processed.
We are all quite familiar with the situation of travelling through a new, unknown
or at least unfamiliar environment. Which kind of information prevents us from
getting lost? It is generally accepted that we use so-called landmark information,
objects that pop out from their environment and therefore become salient. Classical definitions of landmarks and landmark salience assume that an object must
have a high visual contrast to the immediate surround in order to be easily distinguishable and therefore to possess a high salience (e.g., Lynch 1960; Presson &
GESTALT THEORY
© 2011(ISSN 0170-057 X)
Vol. 33, No.3/4, 363-382
Montello 1988). But, is it really (or solely) visual information that we rely on for successful wayfinding? Furthermore, are single features/characteristics of an object responsible for making it a useful landmark, or is it rather the whole Gestalt? Then,
in terms of figure-ground segregation we have to deal with the question of how
a landmark pops out from its immediate surround and how different modalities
are interconnected to provide such an extraction of landmarks. From research
with visually impaired or blind people it is known that humans can also make
use of acoustic or haptic information for wayfinding (e.g., Loomis, Golledge, et
al. 1998; Habel, Kerzel, et al. 2010). Studies with unimpaired participants are
rare and often focus on a single domain (mainly vision). In a previous study we were able to show that spatial orientation with landmarks in modalities other than vision is possible and sometimes even more efficient (Röser, Hamburger, et
al. 2011). But, how much does it then cost (time, error) to switch between modalities if needed? For example, you are provided with a verbal description of a
route containing a few landmarks. When you travel the route you will transform
the verbal information into visual information to be able to recognize a certain
building as a relevant (or irrelevant) part of the route description. In this case
you have to switch from verbal to visual information (further details are provided in the methods section). An extensive body of literature on task-switching
is available (different tasks within a single modality; e.g., Arbuthnott & Woodward 2002), but modality switching is rather new in the field of landmark research. Here, we provide first insights into possible costs of modality switching
for landmark information and furthermore challenge some of the classical findings on landmark information processing. In summary, we expect that landmark
information can successfully be processed in different modalities (visual, verbal, acoustic), with an advantage for visual information. Therefore, a landmark
as a whole is present in more than a single perceptual modality (but maybe at
different degrees of abstraction). Other modalities and cognitive processes (e.g.,
memory) cause a landmark to become a useful Gestalt for wayfinding. Furthermore, switching between modalities should be accompanied by decreased recognition/wayfinding performance and increased decision times compared to trials
in which the modality remains the same, since the Gestalt needs to be established
and made available in other modalities.
In wayfinding research different forms of landmark salience have been differentiated and established in computational models. The different saliences are visual
salience, semantic salience, and structural salience (for an overview see Caduff &
Timpf 2008; Hamburger & Knauff in press). David Caduff and Sabine Timpf
(2008) recently introduced the term cognitive salience in order to emphasize that
the information processing system (the human brain) also contributes to the
meaning and importance of a landmark and not just the object features alone.
In previous studies we were able to show that the visual salience is overemphasized, since landmarks may also be successfully processed in other modalities
(e.g., acoustic) (Hamburger, Röser, et al. in preparation; Röser et al. 2011). Therefore, we redefined the visual salience as a perceptually based (contrast) salience.
This should be kept in mind for the remainder of this manuscript.
2. Experiments
To address the above research questions and hypotheses, we tested various conditions: congruent trials with the same modality (visual/visual; verbal/verbal; acoustic/acoustic) and incongruent ones requiring different modalities (visual/verbal; verbal/visual; acoustic/visual; visual/acoustic). A comparison of verbal and acoustic was omitted here; we wanted to focus on modality switching with visual information, since visual information is almost always present in everyday life (when we need to switch between modalities). We performed recognition as well as wayfinding experiments. As a side note, the recognition experiments also included distractors (objects that were not presented during the learning phase); these are not discussed in detail here, since they served only to make errors possible. In other
experiments we could show that landmarks may be processed in other modalities, and additionally, even better than in the visual modality (Hamburger et al.
in preparation). Therefore, our focus here is on the switching costs for making
Gestalt characteristics or full Gestalten available in a different modality at retrieval than at learning.
2.1 Methods
The following experiments used a one-factorial within-subject design. The dependent variables for each experiment were the performance (correct recognition; correct route decisions) and the decision times required for recognition and for route decisions.
Since the material for the three experiments described in this article is the same, apart from variations in the landmark objects and modalities, it is described here once; only the variations are reported in the appropriate sections. The samples are likewise described within the appropriate experimental sections.
2.1.1 Material – Virtual Environment
The experiments were conducted in our 3D virtual maze (Squareland), which
was programmed with the freeware graphics software Google SketchUp 6.4 ©
by Google ©. The maze consisted of a 10x10 block design with an edge length of
5.5x5.5m and a height for each block of 2.95m. The paths between the blocks were
2.75m wide. Since all "streets" between the blocks run orthogonally to each other, the layout is reminiscent of many major North American cities. This simple structure provides the possibility of high experimental control,
since, without additional objects or other types of information, all intersections
and path sections look identical. Thus, only the landmarks/objects provided in the
maze offer valuable information for successful wayfinding. Figure 1 gives an impression of the virtual environment used in this experimental series. For a more
detailed description of the SQUARELAND virtual environment the reader is referred to the work by Kai Hamburger and Markus Knauff (in press).
For the current experiments the environment was designed as an indoor environment, with typical indoor structures such as white walls, a wooden floor and a ceiling. For the experiment a video clip was created with the freeware Fraps 3.1.1 (by
Beepa Ltd.). The simulated eye-height of the participant was set to 1.70m, which is close to the average human eye-height. The simulated walking speed was
2.3m/s. The different routes that had to be travelled were identical across the different conditions and participants were randomly assigned to the conditions. The full
path length was 176m; start and goal were highlighted by wooden doors.
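The maze dimensions and route timing described above can be checked with a little arithmetic. The following sketch assumes a 10x10 grid of blocks with a corridor on every side, i.e. 11 corridors per dimension; that reading of the layout is our assumption, as the text does not state the overall footprint explicitly:

```python
# Rough geometry check for the SQUARELAND maze described above.
# Assumption: 10 blocks per side, with 11 corridors of 2.75 m
# between and around them (not stated explicitly in the text).

BLOCK_EDGE = 5.5   # m, edge length of each block
PATH_WIDTH = 2.75  # m, width of the corridors
N_BLOCKS = 10      # blocks per side

# Overall side length: 10 blocks plus 11 corridors
side = N_BLOCKS * BLOCK_EDGE + (N_BLOCKS + 1) * PATH_WIDTH
print(f"maze footprint: {side} x {side} m")  # 85.25 x 85.25 m

# Time to travel the full 176 m route at the simulated 2.3 m/s
route_length = 176.0  # m
speed = 2.3           # m/s
print(f"traversal time: {route_length / speed:.1f} s")  # ~76.5 s
```

Under these assumptions the maze covers roughly 85 x 85 m and the full route takes a bit over a minute of simulated walking.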
To prevent participants from seeing through the maze and perceiving more than one intersection (or landmark) at a time, a haze was implemented in front of the participants at a distance of 10m. Thus, participants could only see one landmark (object) at a time; after a landmark disappeared from view, the following landmark and decision point were not yet visible. This was necessary to provide clear links between certain landmarks
and their corresponding decision points.
Fig. 1: Screenshot of the virtual environment (SQUARELAND) used in this experimental series. Shown is an example of the conditions using animal pictures (here: seagull). The coloured version of this figure can be found at http://gestalttheory.net/gth/meaning.html
The projection screen subtended 60 deg in height and 67 deg of visual angle in
width (170 x 230cm at a distance of 100cm).
2.1.2 Material – Landmark Objects
For the first two experiments we used pictures, names (words), and sounds of animals as landmarks. In a pre-experiment a total of 20 participants evaluated the
stimulus material (37 animals) to verify that the sounds, the corresponding images, and names were attributed to the same animals, so that the experimental
conditions were comparable (equally difficult). For this purpose, participants had
to assign animal words to animal pictures, pictures to words, sounds to words,
words to sounds, sounds to pictures, and pictures to sounds. We measured the
correct assignments as well as the time required for each assignment. With this
method we obtained the basis for reliably choosing the best 24 objects (out of 37)
for Experiments 1 and 2.
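The pre-experiment selection can be summarized as a ranking over the six assignment directions. The scoring rule below (accuracy first, ties broken by mean assignment time) is our assumption; the text states only that correct assignments and times were measured:

```python
# Hypothetical sketch of the pre-experiment stimulus selection:
# rank candidate animals by cross-modal assignment accuracy, break
# ties by mean assignment time, keep the best n of the candidates.
# The exact scoring rule is our assumption, not the authors' method.

def select_stimuli(candidates, n_keep=24):
    """candidates: list of (name, accuracy, mean_time_ms) tuples."""
    ranked = sorted(candidates, key=lambda c: (-c[1], c[2]))
    return [name for name, _, _ in ranked[:n_keep]]

# Toy example with three candidates and n_keep=2:
demo = [("dog", 0.95, 800), ("seagull", 0.99, 900), ("mole", 0.60, 1200)]
print(select_stimuli(demo, n_keep=2))  # ['seagull', 'dog']
```

With the full candidate set (37 animals) and n_keep=24 this yields the stimulus pool used in Experiments 1 and 2.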
For the third experiment we used pictures and names of famous buildings. A pre-experiment (N=22) determined which of 98 given buildings from all over the world were judged as most famous by the participants, who also had to make assignments as described above. The 24 buildings with the best values (highest famousness and the fewest incorrect assignments) were used in the experiment.
During the recognition phase the images were surrounded by a black frame and were presented centrally on the screen without the perspective distortion that had been present on the wall within the maze. We should mention that these distortions did not affect participants' performance (objects were perceptually recognized equally well). The sounds were visually indicated by a pictogram of a loudspeaker on the screen. Thus, objects had to be recognized from a different perspective, which should not make recognition more difficult for such material, since all animals were rather easy to identify.
2.1.3 Procedure
Participants were passively led through the virtual environment and had to learn
a certain route including the visual or verbal information on the left wall. There were two reasons for presenting the material on the left. First, when a word is presented on the left it can be read in the 'natural' direction, namely in the direction of the path; this is not the case when it is shown on the right, as participants then have to read the word 'backwards' (against the direction of walking). Second, keeping the position constant prevented participants from encoding only the structural path information (e.g., the object is always in the direction of the turn or opposite to it). In the case of sounds, the sound was played at the point where the visual and verbal information came into sight (presentation time was therefore equal for all stimuli, with a duration of 4s). In the subsequent test phases the
participants had to pass a recognition task and afterwards a wayfinding task. In
the recognition task they had to decide whether they were presented with a certain landmark in the learning phase or not (independent of modality). They had
to indicate their decision via a key press on a response box RB-530 (Cedrus Corporation ©). The keys were labeled with the required answers: yes ("I have seen this stimulus in the learning phase") and no ("I did not see this stimulus in
the learning phase”). The stimuli in the recognition task (pictures) on the screen
had an average size of 70 cm (35 deg) in width and 50 cm (27 deg) in height. Letters of the words were 10 cm in height for upper case letters and 7 cm for lower
case letters. Words were easily readable and the full size of words differed due to
word length. Sounds had a loudness of 35 decibel (db). During the wayfinding
task participants had to make the appropriate decision at the intersection to follow the correct route. Stimulus size varied here due to the movement within the
maze.
The experiments consisted of modality congruent and modality incongruent trials. In modality congruent trials the required sensory modality for processing
the information in the learning phase remained the same in the test phase (e.g.,
visual/visual). In the incongruent trials there was a switch from one sensory modality in the learning phase to a different sensory modality in the test phase (e.g.,
visual/acoustic). We used a within-subject design, so that both the modality-switch and the no-switch conditions were tested for each participant.
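The seven learning-phase/test-phase pairings tested across the experiments (section 2) can be written out explicitly; a minimal sketch classifying them as congruent or incongruent:

```python
# The seven learning/test modality pairings used across the
# experiments, classified as modality congruent (same modality at
# learning and test) or incongruent (a switch is required).

PAIRINGS = [
    ("visual", "visual"), ("verbal", "verbal"), ("acoustic", "acoustic"),
    ("visual", "verbal"), ("verbal", "visual"),
    ("acoustic", "visual"), ("visual", "acoustic"),
]

def is_congruent(learn, test):
    return learn == test

congruent = [p for p in PAIRINGS if is_congruent(*p)]
incongruent = [p for p in PAIRINGS if not is_congruent(*p)]
print(len(congruent), len(incongruent))  # 3 4
```

Note that the two missing incongruent pairings (verbal/acoustic and acoustic/verbal) correspond to the verbal-acoustic comparison that was deliberately omitted.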
3. Experiment 1: Visual and Acoustic
In the first experiment we compared visual (animal pictures) and acoustic (animal sounds) landmarks. Therefore, we could compare four different categories
(types of processing; sequence of required modalities): visual and visual, visual
and acoustic, acoustic and visual, and acoustic and acoustic. The material and
procedure for this experiment are described above.
3.1. Method
3.1.1 Participants
The sample of this experiment consisted of 20 students from the University of
Giessen (19 females, 1 male). The mean age was 25.7 years (SD=7.1). All of them
were naïve with respect to the research questions and had no prior experience with wayfinding experiments in virtual environments. All participants had normal or corrected-to-normal visual acuity. None of them suffered
from epileptic seizures or motion sickness, which were exclusion criteria for this
study. They received course credits for participation in the experiment. Participation was limited to one experiment, so that they were not included in any other
wayfinding experiment of this series. All participants provided informed written
consent.
3.2. Results
All data reported in the following experiments represent correct responses in the
form of correct assignments whether a stimulus was present in the learning phase
or not (given in %) and decision times (given in milliseconds, ms). For all of the following comparisons between the four categories we calculated a one-factorial F-test for repeated measures, followed by post-hoc t-tests whenever the F-test revealed significant effects. For all other data we calculated a t-test for repeated measures.
For all results the values are Greenhouse-Geisser corrected for the F-values and
we used the Bonferroni-Holm correction for the post-hoc t-values.
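The Bonferroni-Holm correction applied to the post-hoc t-tests is a standard step-down procedure; a stdlib-only sketch (our own implementation, not the authors' analysis code):

```python
# Bonferroni-Holm step-down correction for m p-values: sort them
# ascending, compare the i-th smallest (0-based rank) against
# alpha / (m - i), and stop rejecting at the first failure.

def holm_reject(p_values, alpha=0.05):
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: once one test fails, all larger p fail
    return reject

# Approximately the four significant post-hoc p-values reported for
# Experiment 1 (section 3.2.1): <.001, .001, .001, and .004.
ps = [0.001, 0.001, 0.001, 0.004]
print(holm_reject(ps))  # all four survive correction at alpha = .05
```

The step-down thresholds (.0125, .0167, .025, .05) are all above the reported p-values, which is why all four comparisons remain significant after correction.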
3.2.1. Recognition
The mean performance for this experiment was 64.17%. For the landmarks
the F-test for the performance showed differences between the four categories
(F(3)=10.827, p<.001) (see table 1). Post-hoc tests revealed the following significant differences: picture to picture differed from sound to picture (t(19)=-4.414,
p<.001); picture to picture differed from sound to sound (t(19)=-4.156, p=.001);
picture to sound differed from sound to picture (t(19)=-3.757, p=.001); and picture to sound differed from sound to sound (t(19)=-3.249, p=.004).
For the decision time we obtained a mean value of 2053ms. For the landmarks the F-test revealed a significant result (F(3)=4.376, p=.008) (see table 1). Post-hoc tests
showed a significant difference between picture to sound and sound to picture
(t(19)=-3.245, p=.004). The other pairwise comparisons were not statistically significant.
Table 1: Mean performance and decision times for the landmarks (animal pictures and sounds)
in the recognition task across all four modality conditions.
3.2.2 Modality Switch
For the analysis of switching costs we merged the trials involving a change from one modality to the other (picture to sound and sound to picture) and compared them with the same-modality trials (picture to picture and sound to sound) (see figure 2). Here, we obtained no significant differences between the two groups, neither for performance nor for decision time (both t-values smaller than 1).
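The merge described above amounts to pooling two cells per condition for each participant and then running a paired t-test. A stdlib-only sketch with made-up numbers (these values are illustrative only, not the study's data):

```python
# Sketch of the switch vs. no-switch comparison: per participant,
# pool the two switch cells (picture->sound, sound->picture) and the
# two same-modality cells (picture->picture, sound->sound), then run
# a paired t-test. The data below are illustrative, NOT the study's.
import math
import statistics

def paired_t(xs, ys):
    """Return the paired t statistic and degrees of freedom."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean / (sd / math.sqrt(n))
    return t, n - 1

same   = [70, 65, 60, 75, 68]  # % correct, same-modality trials
switch = [68, 66, 59, 74, 70]  # % correct, modality-switch trials
t, df = paired_t(same, switch)
print(f"t({df}) = {t:.3f}")  # prints t(4) = 0.272
```

With differences this small the statistic stays well below 1, mirroring the pattern reported above for the merged conditions.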
Fig. 2: Mean performance and decision times in the recognition task (animals) for the same modality (picture to picture and sound to sound) and modality switch (picture to sound and sound
to picture). Error bars denote the standard error.
3.3 Discussion
Thus far we can see that a modality switch between visual and acoustic information is possible (as expected), but that it comes at no additional switching cost (which is unexpected given the perception literature). The question
therefore arises whether the information is automatically processed in different
modalities at learning or is transformed into the appropriate modality at retrieval. Since we could not find any evidence for switching costs, we may assume that
the different Gestalt information is initially processed within different modalities so that the information is represented in a modality independent format in
the brain. This is in line with the dual coding theory of spatial cognition by Tobias Meilinger and Markus Knauff (2008). One further point we would like to
address is the difference between pictures and sounds. When pictures had to be
learned initially, performance in the recognition task was at chance level independent of the modality (48%). When sounds were learned, performance was
much better (80%). Such differences in favor of the encoding of sounds have been found before (Röser et al. 2011). This means that landmarks presented as sounds are easier for participants to remember than pictures. This,
however, does not affect the results for the modality switch, because we had two
combinations of stimuli for both conditions (sounds/sounds and pictures/pictures; pictures/sounds and sounds/pictures). Furthermore, it will be shown in the
wayfinding task that participants were able to make the correct route decisions in
combination with the stimulus material that had to be learned (e.g., wayfinding
performance was above chance level in all conditions).
4. Experiment 2: Visual and Words (Semantic)
In the second experiment we investigated visual (animal pictures) and verbal
(words; animal names) landmarks. The material and procedure are described
above.
4.1. Method
4.1.1 Participants
The sample consisted of 20 students from the University of Giessen (19 females, 1
male). The mean age was 22.05 years (SD=2.3). They all met the same criteria as
the participants of the previous experiment and received the same compensation.
4.2. Results
4.2.1 Performance
Participants showed a mean performance of 67.5% correct responses. For the
landmarks the F-test revealed no significant differences between the four categories (F(3)=1.441, n.s.) (see table 2).
For the decision time we obtained a mean of 1201ms. For the landmarks the F-test showed a significant result (F(3)=3.029, p=.037) (see table 2). Yet, post-hoc
tests revealed no significant differences for any of the pairwise comparisons.
Table 2: Mean performance and decision times for the landmarks (animal pictures and words)
in the recognition task across all four modality conditions.
4.2.2 Modality Switch
The results for the comparison between modality switch and equal modalities are presented in figure 3. The t-test for performance showed no significant difference between the two conditions (t(19)<1), but we found a significant effect for the decision time in the recognition task: the modality switch led to slightly slower decisions (t(19)=-2.610, p=.017).
Fig. 3: Mean performance and decision times in the recognition task (animals) for the same modality (picture to picture and word to word) and modality switch (picture to word and word to
picture). Error bars denote the standard error.
4.3 Discussion
In contrast to the first experiment, the categories with a picture in the learning phase yielded a higher performance here, significantly above chance level (t(19)=2.491, p=.022). This may be because pictures and words are more similar than pictures and sounds (both are initially processed in the visual system), possibly leading to an overlap effect. Regardless of whether the conditions differ from chance level or not, we want to concentrate on the modality switch. A modality switch between visual and verbal
information is also possible in either direction. This time there were at least some
switching costs in the decision times present. This supports the notion that the
required information is translated into the correct modality at retrieval. However, even though significant, these costs are within a range of 238ms. Thus, processing and retrieval of these stimuli is again very effective. Therefore, we again
assume that the necessary information is processed independently of the modality in which it is initially presented. In other words, an image can be present in
the form of a word, while a word may elicit a mental image of the stimulus. This
assumption also favors a more global and holistic processing strategy, since processing single features first and then (sequentially) putting them together to build
a whole (object) would probably require more time. We would also like to stress
the notion of the dual coding theory of spatial cognition (Meilinger & Knauff
2008; Meilinger, Knauff, et al. 2008) here. So far, this was true for animate content (animals). Since animate and inanimate objects are represented differently in the brain (e.g., Vidal, Ossandón, et al. 2010), it is also of interest to investigate a modality switch with inanimate objects (here: buildings).
5. Experiment 3: Visual and Words (Buildings)
In the third experiment we compared visual and verbal (words) landmarks. The material and procedure are described above. In contrast to the previous experiments, the landmarks (24) here were famous buildings from all over the world. They were presented either in pictorial or in textual form.
5.1. Method
5.1.1 Participants
The sample of this experiment consisted of 10 participants (half of them students
of the University of Giessen; 8 females, 2 males). The mean age was 28.2 years
(SD=9.2). They all met the same criteria as the participants of the previous experiments and received the same compensation.
5.2. Results
5.2.1. Performance
The mean correct recognition for the landmarks was 66.67%, and there was a significant difference between the four categories (F(3)=4.593, p=.010) (see table 3). Post-hoc t-tests showed a significant difference between pictures to words and words to words (t(19)=-3.545, p=.006).
The mean decision time for recognition was 1698ms and the F-test showed no significant differences between the four categories (F(3)=1.139, p=.351) (see table 3).
Table 3: Mean performance and decision times for the landmarks (pictures and names of famous
buildings) in the recognition task across all four modality conditions.
5.2.2 Modality Switch
The results for the comparison between modality switch and equal modalities are presented in figure 4. Here, no significant differences were found, either for the performance in the recognition task (t(19)<1) or for the decision time (t(19)<1).
Fig. 4: Mean performance and decision times in the recognition task (buildings) for the same
modality (picture to picture and word to word) and modality switch (picture to word and word
to picture). Error bars denote the standard error.
5.3 Discussion
Again, a modality switch between learning and retrieval could be performed. A
lower performance when switching from the visual to the verbal modality could
have been expected, but the performance difference within the word modality
comes as a surprise. There, recognition should have been the same, since pure
recognition without a modality switch is required. More interesting is that again
there are no additional costs for a modality switch. This indicates once more that the full Gestalt (with all the accompanying information, e.g., meaning) is processed right away at the moment of its first occurrence. If this were not the case, switching costs should have been present. These findings, this time in the semantic (verbal) domain, support the results of the previous experiments and the theory introduced above.
So far, we have presented results of rather simple recognition tasks for landmarks after learning the landmarks and route information (directions). It is important to note that we are capable of storing and retrieving such information in different modalities and that storage seems to be modality independent. However, what happens
if we learned the information in one modality, but have to navigate through an
unfamiliar environment and need this previously learned information? In other
words, I learned landmarks and route information from a verbal description but
have to recognize the buildings and the corresponding turns while navigating
through the environment. Does that require additional cognitive resources and is
this then accompanied by switching costs? Possible answers may be provided from
our wayfinding (navigation) task which will be reported next.
As a side note, we again obtained performance around chance level for the picture condition (pictures in the learning phase). A possible explanation here differs from that of the previous experiments: an overlap effect due to the different sensory modalities could not occur. In contrast to the second experiment, more complex stimulus material was used (buildings). It seems as if
it is more difficult for participants to remember pictures of buildings than the
corresponding names. This is supported by the conditions including a modality
switch, where a higher performance for the word to picture condition than for
the picture to word condition was obtained.
Additionally, we need to discuss the decision time differences between the three experiments. Possible explanations are the following. In the sound condition (Experiment 1) participants normally waited until the sound had ended before they responded. This might have led them to adopt a less speeded response style, which might in turn explain the longer decision times in the visual condition as well. Previous experiments testing only isolated modalities showed an increased decision time only in the sound conditions. Experiment 2 overall revealed the shortest decision times. A possible explanation is that simple, over-learned, and comparable (pictures and words) stimulus material (animals) was used; there, the processing of the whole Gestalt occurs rather fast and more or less automatically (without conscious processing).
In Experiment 3, however, we used more complex stimulus material (names and
pictures of famous buildings) that requires conscious processing (since it is not
over-learned), resulting in longer decision times.
6. Wayfinding Task
In the following section we describe the results for the wayfinding task, which
was part of all previously reported experiments.
6.1 Method
Participants and material were already described above for each experiment. In each experiment, following the recognition task, participants were again presented with the video sequence of the learning phase. The wayfinding tasks were similarly designed in all experiments: participants read a short instruction with the task of finding the correct path they had previously learned. Then they saw the video sequence again from the beginning. This time, the movie stopped at each intersection and they had to indicate via key press the direction in which the movie sequence/path would proceed (only left and right turns). Dependent variables here were
the number of correct route decisions (at the intersections) and the decision time.
6.2 Results
The numbers of correct responses and the decision times for the four different categories in the three experiments are presented in detail in table 4.
Table 4: Mean correct decisions and decision times for the wayfinding task of the three experiments.
For the comparison between animal sounds and pictures (Experiment 1) the
mean for correct direction decisions was 73.75% (SEM=6%). The mean decision time was 1145ms (SEM=189ms). The F-tests showed no significant difference between the four categories, neither for the direction decisions (F(3)<1) nor for the decision time (F(3)<1). The same holds for the comparison of modality switch and equal modality, for both the direction decisions (t(19)<1) and the decision time (t(19)<1).
The mean wayfinding performance for Experiment 2 (the comparison between
pictures and words of animals) was 71.25% (SEM=6%) for the correct direction
376
Hamburger & Röser, The Meaning of Gestalt for Human Wayfinding
decisions and 1658ms for the decision time (SEM=512ms). Here again, no significant differences between the four categories occurred, neither for the correct direction decisions (F(3)<1) nor for the decision time (F(3)<1). Just as in the previous experiments, the comparison between switching modalities and equal modalities was not significant, neither for the correct direction decisions (t(19)<1) nor for the decision time (t(19)<1).
In Experiment 3 (comparison of pictures and names of famous buildings) the
mean performance for the wayfinding task was 71.67% (SEM=9%), with a mean
decision time of 1658 ms (SEM=173 ms). Again, there were no significant differences between the four categories, neither for the performance (F(3)<1) nor for the
decision time (F(3)<1). The comparison between modality switch
and equal modality also did not reveal any significant differences, neither for the correct direction decisions (t(19)<1) nor for the decision time (t(19)<1).
6.3 Discussion
What we obtained for the recognition tasks is also true for the wayfinding
task. Here as well, a modality switch did not incur any additional
processing costs for the landmark or direction information. When we need to orient ourselves in, or find our way through, an environment, the spatial information is automatically (unconsciously) processed in different modalities.
7. General Discussion and Conclusion
First, we note that our participants were able to process (and make use of)
landmark information in different modalities in a recognition and a wayfinding
task. Thus, healthy participants are capable of using Gestalt (landmark) information from modalities other than vision. This challenges the classical view
of the importance of visual information in landmarks. We may therefore consider reducing the amount of visual information in route descriptions/route information, since the visual modality is already occupied by other
information (e.g., paying attention to traffic and traffic lights, the environment,
and/or the general wayfinding and walking task). The processing of a landmark
Gestalt also seems to be easier in other modalities (Hamburger et al. in preparation; Röser et al. 2011).
In general, we also find that participants are able to switch successfully between
modalities at learning and retrieval. But this is not the whole story: they
also switch modalities at no additional cost, meaning that performance
is stable regardless of whether the modality needs to be switched or not. This is
an interesting finding, since the opposite would be expected from the literature.
Such challenging results and the accompanying discussion should be integrated
into future research. Above we stated that the visual domain should be relieved.
The modality-switching results also provide evidence for this assumption,
since participants performed better when acoustic or verbal information was presented during the learning phase and they were later confronted with visual information. When they were presented with visual information during learning, performance was worse at test.
To summarize, we did not find evidence for systematic costs of modality switching in the recognition of landmarks or in wayfinding with landmark
information. At first glance the absence of any statistical differences (null effects) might appear disappointing, but this impression is misleading, as we try
to point out in the following.
If we had found systematic evidence for modality-switching costs, this would
mean that our brain has to deal with an increased workload in order to process
the available information to prepare and then execute a relevant action (here: to
make a turn or stay on the path). However, this is not what we found with our
recognition and wayfinding paradigms. Participants performed the tasks
equally well regardless of whether the learning conditions (modalities) were congruent with the retrieval situation or not. Thus, our brain is excellently capable
of dealing with Gestalt information in different modalities (transformation or adjustment) without additional costs. Some supporting evidence may be found in
the theory of Meilinger and Knauff (2008), who assume that much of the relevant information, no matter in which form it was initially presented, is automatically made available in different forms, e.g., propositional and visual. It is therefore possible that at the moment of information retrieval (from working memory
or long-term memory), the necessary Gestalt information is already available in
the different modalities without any further processing being necessary.
As a side note, one might argue that verbal information is also a kind of visual information. This is correct, but other processing steps are involved, since, for
example, a word needs to be mentally transformed in order to be comparable to
the real visual information. The words are thus just an abstract visual representation of visual content. The word “frog” may lead to different interpretations among participants of what a frog looks like, and it does not necessarily have to be
green, as was the case in our visual condition. Furthermore, in the first instance a
word belongs to the semantic domain, whereas a picture does not.
These assumptions on modality switching and landmark information are thus
far quite speculative, since they are based on just a few empirical findings from
human landmark and wayfinding experiments. However, they provide first valuable insights and should encourage us to investigate this field further. It may also
prove worthwhile to use navigationally relevant information in modalities other
than vision, not only for visually impaired or blind people (e.g., Loomis et al.
1998) but also for sighted people, in order to reduce the amount of perceptual and
cognitive load in a single modality.
The assumptions in the discussion sections of the individual experiments are, for example,
in line with functional imaging studies reporting that acoustic stimulation
(animal sounds) also leads to activation increases in the visual cortex (Tranel,
Damasio, et al. 2003). It seems as if we automatically perform (conscious or unconscious) mental imagery. Thus, the Gestalt/landmark information is indeed
processed in different modalities. This finding does not explain, however, why visual
stimulation (animal pictures) does not result in additional activations in auditory
areas of the brain (Finney, Clementz, et al. 2003). This latter finding would again
suggest switching costs, which could not be found in our experiments. However,
it again highlights the importance of, and the reliance on, the visual system, even
though other modalities are capable of taking over the job. Here we provided
first behavioral evidence for the absence of switching costs in human wayfinding. We also provided some evidence for holistic Gestalt processing in this research field. Future research may then address the questions of whether landmarks
are processed in the same brain areas as simple objects without any context information, or whether additional activations occur due to the linkage of the landmark/Gestalt with
its surround (context). It might also be that other brain areas turn out to be of
importance that we do not yet have on our “list of expected candidates”.
In order to find conclusive answers, more brain imaging studies on the neural
correlates of landmarks are required, which is one of the very next points on our
research agenda.
Finally, we pointed out that our brain automatically processes Gestalt information in different modalities, even if it was presented exclusively in a single
modality. Thus, our knowledge and our experiences – retrieved from long-term
memory – are highly involved in the processing of a Gestalt in spatial cognition.
In previous experiments we were able to demonstrate that single (visual) features
do not represent a full landmark (Röser et al. 2011). Instead, meaning arises from
the combination of visual features (or other modalities), our semantic knowledge
about these objects, our experiences with them, etc. Initially we cited Christian
von Ehrenfels with the principle that “the whole is more than just the sum of its
parts”, and now we may complete the circle: landmarks and direction information are linked together as a Gestalt, e.g., at the church I have to turn left. Each single piece of information/feature on its own does not contain much valuable information for wayfinding, but together they provide us with all the information that we need to successfully navigate or differentiate environments. We
showed that the concept of a Gestalt is also an important one in the research domain of spatial cognition, and we hope that this contribution motivates further work
in this direction.
Summary
How much does it cost to switch between different modalities when we have to process
Gestalt information in the form of landmarks in wayfinding (navigation)? In a series of experiments we show that in recognition and wayfinding tasks with landmarks there is no evidence for modality-switching costs (lower performance, increased decision times). For
example, learning visual information and retrieving visual information is as effective as
learning visual information but retrieving it in the acoustic domain, and
vice versa. This has been tested for visual, verbal, and acoustic material (images, words/
names, and sounds of various animals, and images and words/names of famous buildings).
Our results challenge the notion of switching costs in the domain of human wayfinding.
We assume that the human brain already integrates the relevant Gestalt information in
different modalities, so that no additional costs occur at the time of information retrieval.
Furthermore, the connections between the path and the landmarks seem to be of great
relevance, since wayfinding performance was as good as performance in the recognition
task. Our conclusion is that a modality switch is possible at no additional cost, so that
landmarks may also be useful in more (different) modalities than just the visual one. Furthermore, we have evidence that our cognitive system processes more information during
the learning of landmarks and spatial information than is physically present, so that in this cognitive domain (spatial cognition), too, the whole is much more than just the
sum of its parts.
Keywords: Landmarks, wayfinding, modality switching, modality-independent processing.
Zusammenfassung
How much does it cost to switch between different processing modalities when we have
to process Gestalt information in the form of landmarks during wayfinding (navigation)? In a series of experiments we show that there is no evidence for so-called switching costs
(lower performance, higher decision times) in recognizing landmarks and finding one's way with them. For example, learning and retrieving visual information proved just as effective as learning visual information that then had to be retrieved in the acoustic domain. This
was tested for visual, verbal, and acoustic material (images, words/names,
and sounds of animals as well as images and words/names of famous buildings).
Our findings contradict the notion of switching costs in wayfinding. We
assume that the human brain already integrates all relevant Gestalt information into the different modalities, so that no additional processing steps, and hence no switching costs, occur at retrieval. Furthermore, the connections between path information and landmark seem to be of great
relevance, since wayfinding performance was at least as good as pure
recognition. Our conclusion is that a modality switch is possible
and takes place without additional processing costs, so that landmarks can also
be helpful in modalities other than the visual one. In addition, there is
evidence for the processing of more landmark and route information
than is actually given during presentation, so that in this
cognitive domain (spatial cognition), too, the whole is more than the sum of its parts.
Schlüsselwörter: landmarks, wayfinding, modality switching, modality-unspecific processing.
Acknowledgement
This study was supported by the German Research Foundation (DFG HA5954/1-1). We
thank Karl Kopiske and Malina Traub for their help with the data collection and Nadine Jung for proofreading the manuscript.
References
Arbuthnott, K. D. & Woodward, T. S. (2002): The influence of cue-task association on switch costs and alternating-switch costs. Canadian Journal of Experimental Psychology 56, 18 – 29.
Caduff, D. & Timpf, S. (2008): On the assessment of landmark salience for human navigation. Cognitive
Processing 9, 249 – 257.
Eagleman, D. M. (2001): Visual illusions and neurobiology. Nature Reviews Neuroscience 2, 920 – 926.
Finney, E. M., Clementz, B. A., Hickok, G. & Dobkins, K. R. (2003): Visual stimuli activate auditory cortex in deaf subjects: evidence from MEG. NeuroReport 14, 1425 – 1427.
Habel, C., Kerzel, M. & Lohmann, K. (2010): Verbal assistance in tactile-map explorations: A case for visual representations and reasoning, in McGreggor, K. & Kunda, M. (eds., cochairs) (2010): Papers from
the 2010 AAAI Workshop Visual Representation and Reasoning. Technical Report WS-10-07. Menlo Park,
California: The AAAI Press.
Hamburger, K. (2007): Visual illusions – Perception of luminance, color, and motion in humans. Saarbrücken:
VDM Verlag Dr. Müller.
Hamburger, K. & Knauff, M. (in press): Squareland: A virtual environment for investigating cognitive
processes in human wayfinding. PsychNology.
Hamburger, K., Röser, F. & Knauff, M. (in preparation): Can we hear famous landmarks? – A comparison
of visual, verbal, and acoustic landmarks and the influence of familiarity.
Koffka, K. (1935): Principles of Gestalt Psychology. New York: Harcourt, Brace.
Loomis, J. M., Golledge, R. G. & Klatzky, R. L. (1998): Navigation system for the blind: Auditory display
modes and guidance. Presence 7, 193 – 203.
Lynch, K. (1960): The Image of the City. Cambridge, MA: MIT Press.
Meilinger, T. & Knauff, M. (2008): Ask for directions or use a map: A field experiment on spatial orientation
and wayfinding in an urban environment. Journal of Spatial Science 53, 13 – 24.
Meilinger, T., Knauff, M. & Bülthoff, H. H. (2008): Working memory in wayfinding – A dual task experiment in a virtual city. Cognitive Science 32, 755 – 770.
Presson, C. C. & Montello, D. R. (1988): Points of reference in spatial cognition: Stalking the elusive landmark. British Journal of Developmental Psychology 6, 378 – 381.
Robinson, J. O. (1972): The Psychology of Visual Illusions. London: Constable and Company.
Röser, F., Hamburger, K. & Knauff, M. (2011): The Giessen virtual environment laboratory: human wayfinding and landmark salience. Cognitive Processing 12, 209 – 214.
Sharps, M. J. & Wertheimer, M. (2000): Gestalt perspectives on cognitive science and on experimental psychology. Review of General Psychology 4, 315 – 336.
Spillmann, L. & Ehrenstein, W. H. (1996): From neuron to Gestalt: mechanisms of visual perception, in
Greger, R. & Windhorst, U. (eds.) (1996): Comprehensive Human Physiology, Vol. 1., 861 – 893. Heidelberg: Springer.
Steinman, R. M., Pizlo, Z. & Pizlo, F. J. (2000): Phi is not beta, and why Wertheimer’s discovery launched
the Gestalt revolution. Vision Research 40, 2257 – 2264.
Tranel, D., Damasio, H., Eichhorn, G. R., Grabowski, T., Ponto, L. L. B. & Hichwa, R. D. (2003): Neural
correlates of naming animals from their characteristic sounds. Neuropsychologia 41, 847 – 854.
Tse, P. U. (2004): Unser Ziel muss eine Gestalt-Neurowissenschaft sein. Gestalt Theory 26, 287 – 292.
Vidal, J. R., Ossandón, T., Jerbi, K., Dalal, S. S., Minotti, L., Ryvlin, P., Kahane, P. & Lachaux, J.-P. (2010):
Category-specific visual responses: An intracranial study comparing gamma, beta, alpha, and ERP response selectivity. Frontiers in Human Neuroscience 4, 195.
von Ehrenfels, C. (1890): Über Gestaltqualitäten. Vierteljahrsschrift für wissenschaftliche Philosophie 14, 249 – 292. Translation in Smith, B. (ed.) (1988): Foundations of Gestalt Theory, 82 – 120. Munich: Philosophia-Verlag.
Wertheimer, M. (1912): Experimentelle Studien über das Sehen von Bewegung. Zeitschrift für Psychologie
61, 161 – 265.
Kai Hamburger, born in 1977, received his diploma in Psychology from the University of Frankfurt in 2004
(former Max Wertheimer chair). Since 2003 he has been collaborating with Prof. Dr. Lothar Spillmann (Freiburg).
From 2005 to 2007 he held a scholarship in the graduate program “Neural Representation and Action Control – NeuroAct”
(DFG 885/1) and was a PhD student in the research group Experimental Psychology at the University of Giessen (Prof.
Karl R. Gegenfurtner, PhD); he graduated in 2007 (Dr. rer. nat.). Currently he is assistant professor in the research group Experimental Psychology and Cognitive Science at the University of Giessen (Prof. Dr. Markus
Knauff). His main research topics are spatial cognition (human wayfinding) and visual illusions.
Address: Experimental Psychology and Cognitive Science, Justus Liebig University Giessen,
Otto-Behaghel-Str. 10F, 35394 Giessen, Germany.
E-Mail: kai.hamburger@psychol.uni-giessen.de
Florian Röser, born in 1982, received his diploma in Psychology from the University of Trier in 2009. Since
2010 he has been a PhD student in the research group Experimental Psychology and Cognitive Science at the University of Giessen (Prof. Dr. Markus Knauff). His research topics include landmarks, spatial orientation, and
wayfinding (in a project of Dr. Kai Hamburger and Prof. Dr. Markus Knauff).
E-Mail: Florian.Roeser@psychol.uni-giessen.de