Twedt Thesis

Comparing Thompson’s Thatcher effect 1
Comparing Thompson’s Thatcher effect with faces and non-face objects
Elyssa L. Twedt
Thesis completed in partial fulfillment
of the requirements of the
Honors Program in Psychological Sciences
Under the Direction of Prof. Isabel Gauthier
Vanderbilt University
April 6, 2007
Approved
________________________
Date
_______________________
Acknowledgements
I would first like to thank my advisor, Isabel Gauthier, for her commitment to this project. Her
advice and guidance throughout every stage have been invaluable, and her knowledge and
willingness to teach provided a wonderful learning experience.
I would also like to thank the entire Object Perception Lab, as well as Tom Palmeri’s lab, for all
of their helpful advice along the way. A special thank you is owed to the graduate students who
offered many hours of their time in order to assist, teach, and reassure me throughout the
process.
I also wish to thank David Sheinberg, at Brown University, for his ideas and valuable
contributions to this project.
Abstract
The classical Thatcher effect (TE) is experienced when global inversion of a face makes
it difficult to notice the local inversion of its parts (Thompson, 1980). The TE can be quantified
by comparing the ease with which observers compare a normal and locally transformed image,
when both images are shown upright versus inverted. Here we compared the classical TE for
images of adult faces to a wide variety of other categories, including grimacing faces, baby faces,
animal faces, buildings, scenes, and various types of letter-strings. If the TE reflects a special
form of configural processing for faces, faces should show a much larger TE than all other
categories. Error rates revealed larger TEs for letter-strings over all other categories. Within the
letter-string categories, words showed a larger TE than non-words and low frequency words
revealed a larger TE than high frequency words. Within the face categories, adult, grimacing,
and baby faces showed comparable TEs whereas animal faces showed the largest TE. For
objects, we observe TEs for all categories, but at smaller magnitudes. Our results suggest the TE
is not exclusive to faces and does not appear to depend uniquely on factors such as expertise or
the grotesque appearance of the transformation.
Introduction
The classical Thatcher illusion, created by Peter Thompson (1980), has long been used to
demonstrate the importance of configural processing for upright faces, a strategy that does not
seem to be available for inverted faces. The Thatcher effect is experienced when local inversion
of the eyes and mouth is more difficult to detect when the face is globally inverted compared to
when the face is upright (see Figure 1). Thompson chose to locally invert the eyes and mouth
because these features are known to convey the most about a person’s expression (Thompson,
1980). He predicted that with global inversion, facial expression would be better preserved in the
Thatcherized face than the normal face. However, he found that local inversion of the eyes and
mouth did not make a significant difference in expression when compared to the normal face.
When both images were presented in the upright position, the Thatcherized face appeared
grotesque. Thus, two effects are observed with the Thatcher effect: First, global inversion makes
local changes difficult to detect. Second, the upright Thatcherized face appears grotesque.
This effect has been explained in a number of ways, most suggesting that the effect is
face-specific. The most prominent explanation is the use of different processing strategies when
viewing upright and inverted faces. Processing of upright faces is based on configural encoding
of individual features (Boutsen & Humphreys, 2003). Configural encoding means that a person
looks at the entire face while processing the spatial relationship between individual features to
create a holistic percept. When a face is Thatcherized, configural information is disrupted,
making the face appear grotesque (Bartlett & Searcy, 1993; Boutsen & Humphreys, 2003).
In contrast, processing of inverted faces is based on componential information, which
refers to encoding individual facial features (Boutsen & Humphreys, 2003). Because
Thatcherization disrupts configural but not componential processing, inverted face processing is
not impaired. When a face is globally inverted, it is more difficult to recognize because
configural processing is disrupted, making the relationship between internal and external features
less salient and local changes more difficult to detect (Rock, 1988). This is also known as the
face inversion effect (Bartlett & Searcy, 1993; Boutsen & Humphreys, 2003; Yin, 1969).
The Thatcher effect has also been explained in terms of encoding facial expression
(Bartlett & Searcy, 1993). In an upright face, Thatcherization creates a grotesque expression that
is easily noticed as we focus on the image holistically. During inversion, it is more difficult to
recognize a face and encode facial expression, so global inversion of a Thatcherized face reduces
the perception of a grotesque expression (Bartlett & Searcy, 1993; Muskat, 1997).
Others suggest that a frame of reference is an important aspect to experiencing the
Thatcher effect (Parks, 1983; Rock, 1988; Valentine & Bruce, 1985). More specifically, an
object-centered frame of reference incorporates information about the spatial relationship of
internal parts of an object. A person gathers information from that spatial organization to assign
direction, or “topness” to individual features (Parks, 1983). When internal parts are locally
inverted, such as eyes and mouth, the orientation of those parts conflicts with the orientation of
the rest of the object. This orientation mismatch is only noticed when the object is viewed in its
usual orientation. This is because, when inverted, a frame of reference becomes less powerful,
and assignment of “topness” is less clear, making local changes less salient. Valentine & Bruce
(1985) proposed that the facial frame of reference includes the relative position of the eyes and
mouth, although Parks (1983) argued that external features are also included. In addition to the
object-centered frame, Rock (1988) proposes a retinal factor which assigns direction to an object
relative to the environment, based on a person’s own perception of “up” and “down”. When a
Thatcherized face is upright, both frames of reference are in agreement so inversion of the mouth
and eyes is easily noticed. When inverted, the object-centered frame of reference is opposite
from the environmental frame so internal featural changes are less salient. Certain images are
more often viewed from one orientation (i.e., mono-oriented) such as faces and words. Inversion
may disrupt frames of reference for mono-oriented stimuli more severely than objects with
which we have more experience viewing from multiple angles, because the latter can more easily
be corrected with mental rotation (Rock, 1988). The more familiar we are with an object at a
specific orientation, the more inversion will disrupt object processing.
Based on the previous literature, the classical Thatcher effect has been attributed to
disruption of encoding processes when faces are locally and holistically inverted, familiarity with
faces and with their canonical orientation, a powerful frame of reference, and encoding of facial
expression. These explanations may help explain why we experience the Thatcher effect for
faces, but there is little empirical research to assess whether the Thatcher effect can be
experienced with non-face objects. We did encounter a demonstration of the Thatcher effect
using words. Parks (1983) locally inverted several letters of a word and then inverted the entire
word to obtain the same general effect observed with faces (see Figure 2). Parks speculated that
the Thatcher effect may not be specific to faces based on processing mechanisms, familiarity
and/or expression, but rather, the effect may depend on a powerful frame of reference, which
becomes less effective when inverted. Parks’ theory was based on a demonstration and not
experimental data, but his observation compelled us to further explore the Thatcher effect in
other objects, including various types of letter-strings.
The main goal of the present study was to determine whether the Thatcher effect can be
experienced in non-face objects. If so, we wanted to measure whether the Thatcher effect is
larger for faces than non-face categories. In order to approach these questions, we chose to
locally manipulate images from three broad categories: faces, words, and objects/scenes. Within
these three groups, we created subgroups such as animal faces, baby faces, cars, buildings, non-words, and low-frequency words. Because, to our knowledge, a study of this scope had not been
done, we chose categories that covered a wide range of individual object differences that we
could compare to each other, in hopes of discovering certain trends, biases, and explanations for
experiencing a Thatcher effect. We measured the degree to which “Thatcherization” (i.e., local
inversion of internal features) affected various object categories. By quantifying the Thatcher
effect, we were able to determine where faces lie relative to other object categories and test
hypotheses about what factors most influence the Thatcher effect. This helped us determine
whether faces are really in their “own category” (i.e., Thatcherization has a significantly larger
effect on faces than non-face object categories) or if non-face categories elicit similar or even
larger Thatcher effects.
In addition to our general question of whether the Thatcher effect is face-specific, we
hoped to better understand what factors influence the Thatcher effect. By reviewing the literature
on inversion effects in general, we were able to hypothesize possible reasons for experiencing a
Thatcher effect in some categories, but not others. One possibility is that familiarity with an
object category is important for detecting local changes. If the Thatcher effect depends on
familiarity with an object, we would expect that objects with which we are least experienced in
discriminating would elicit smaller Thatcher effects. For example, most people have less
experience discriminating animal and baby faces than adult faces so we would expect a larger
Thatcher effect for adult faces than animal faces. A second factor may be perception of
grotesqueness or bizarreness in an upright Thatcherized face. We would expect the size of a
Thatcher effect to increase as Thatcherized images appear more bizarre. (Note: Because
grotesqueness is a form of facial expression that can only be applied to faces, we will instead use
the term ‘bizarre’ to describe an image’s unusual appearance). That is, differences between a
very bizarre Thatcherized image and a normal image will be more easily detected, leading to a
more robust Thatcher effect. A third factor is that we may have more experience seeing non-face
objects (e.g., shoes, cups) in rotated orientations that deviate from their canonical orientation, so
that the frames of reference for non-face objects are less powerful and less impaired than those for faces
(Yin, 1969; Rock, 1973). If an object’s frame of reference is a good indicator for the Thatcher
effect, then faces and words, which are thought to have more stable frames of reference, will
elicit larger Thatcher effects than scenes or buildings. That is, we may have more experience
seeing local parts of scenes (e.g., a picture) from different angles, so local inversion may not
disrupt the overall appearance of the upright scene. If the Thatcher effect is face-specific, then
saliency of facial expression (i.e., grotesque appearance), a special form of configural
processing, and familiarity with faces may be important criteria for experiencing the Thatcher
effect. However, if non-face categories show significant Thatcher effects, we may need to
rethink why the Thatcher effect is experienced and reevaluate the claim that faces are special
with respect to this effect.
Methods
Apparatus and Stimuli
All experiments were run on a Power Mac G3 using Matlab under Mac OS 9. Stimuli consisted of
images from twelve different object categories: adult faces, grimacing faces, baby faces, animal
faces, buildings, cars, close-up scenes, large scenes, high-frequency (HF) words, low-frequency
(LF) words, HF non-words, and LF non-words. Figure 1 shows examples of normal and
Thatcherized images for each object category, except letter-strings. There were 10 images in
each category and images were collected from various Internet and image bank sources. Eight-letter
words were chosen from the MRC Psycholinguistic Database, which generates a list of words
based on a set of criteria (i.e., word length and Kucera-Francis written frequency). LF words had
a frequency of 1 and HF words had a frequency between 204 and 392 (see Coltheart, 1981 for
user guidelines). For non-words, we transposed internal adjacent letters while keeping the first
and last letters in the same position. That is, we switched the position of the second and third
letters, the fourth and fifth letters, etc. Figure 2 shows letter-string examples and includes a
complete list of the words used in all letter-string categories to illustrate how non-words were
derived.
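
The transposition rule described above can be sketched as follows. This is a minimal illustration, not the procedure actually used to build the stimuli; the function name `make_nonword` and the example word are hypothetical and are not taken from the stimulus list.

```python
def make_nonword(word: str) -> str:
    """Swap internal adjacent letter pairs (2nd/3rd, 4th/5th, ...),
    leaving the first and last letters in place."""
    letters = list(word)
    # Step through internal positions two at a time, stopping before the last letter.
    for i in range(1, len(letters) - 2, 2):
        letters[i], letters[i + 1] = letters[i + 1], letters[i]
    return "".join(letters)


# "children" is a hypothetical 8-letter example, not necessarily a study item.
print(make_nonword("children"))  # cihdlern
```

For an 8-letter string, the rule swaps three internal pairs and leaves the outer two letters untouched, so the non-word keeps the length and letter content of its source word.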
Each object was centered within a white area of 250 pixels wide x 250 pixels high.
Stimuli were presented at a screen resolution of 1280 x 950 pixels. Objects were manipulated
using Adobe Photoshop 7.0 to produce “Thatcherized” images by locally inverting parts of each
object (e.g., inverting two letters of a word). We created 2 levels of transformation for each image.
The first level, Thatcherized 1, was created by only inverting one part of each object. The second
level, Thatcherized 2, was created by inverting two parts of each object. We created two levels of
transformation in case we obtained a ceiling on the TE for accuracy with which changes were
detected in inverted pairs. We could then match conditions in terms of performance with inverted
pairs. Thus, each image had three versions: normal, Thatcherized 1, and Thatcherized 2. For each
image category, we tried to locally manipulate the same parts for each object. For example, we
always inverted the eyes for ‘Thatcherized 1’ faces and we inverted the eyes and mouth for
‘Thatcherized 2’ faces. Buildings, close-up scenes, and large scenes contained a lot of variability
between images thus making it difficult to make uniform changes across the entire category.
Although we tried to manipulate similar features in each image (e.g., building windows and doors;
cups), the features may have varied in location.
Experiment 1: Pilot Study
This pilot experiment was designed to measure the average size of a Thatcher effect for
each category and compare those effects across categories. Image pairs were presented
simultaneously and participants made a same/different judgment. We measured the size of the
Thatcher effect by finding the difference in accuracy and reaction time for determining that an
image pair was different when upright compared to when the pair was inverted.
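
As an illustration of this measure, the sketch below computes the upright-minus-inverted difference in accuracy and in mean reaction time (correct trials only) for the different trials of one category. The trial record layout (keys `orientation`, `correct`, `rt`) and the function name are assumptions made for the example, not the original analysis code.

```python
from statistics import mean


def thatcher_effect(trials):
    """Return (delta_accuracy, delta_rt), each computed as upright minus inverted.

    `trials` holds the 'different' trials of one category as dicts with the
    hypothetical keys 'orientation' ('upright' or 'inverted'), 'correct'
    (bool), and 'rt' (seconds).
    """
    def summarize(orientation):
        subset = [t for t in trials if t["orientation"] == orientation]
        accuracy = mean(1.0 if t["correct"] else 0.0 for t in subset)
        # Reaction time is averaged over correct trials only.
        rt = mean(t["rt"] for t in subset if t["correct"])
        return accuracy, rt

    acc_up, rt_up = summarize("upright")
    acc_inv, rt_inv = summarize("inverted")
    return acc_up - acc_inv, rt_up - rt_inv
```

Under this convention a Thatcher effect appears as a positive accuracy difference and a negative reaction time difference (faster when upright).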
Participants
Twenty undergraduate students from Vanderbilt University (7 women and 13 men)
volunteered to participate in the experiment in exchange for course credit. All participants
reported normal or corrected-to-normal vision.
Design
Four factors were manipulated within-subjects: orientation (upright/inverted), level of
transformation (normal, Thatcherized 1, Thatcherized 2), category, and trial type
(same/different). Both images in each pair had the same identity. That is, both images
represented the same person, word or object and were presented in the same global orientation.
On same trials, either both images were normal or both images were Thatcherized at the same
level of transformation. On different trials, a normal image was paired with either a Thatcherized
1 image or Thatcherized 2 image. For different trials, the position of each image was
counterbalanced, so that on half of the trials the normal image appeared on the left and on half
the trials the normal image appeared on the right. All image pairs were presented in both the
upright and inverted orientations.
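
One way these factors could combine is sketched below. The assumption that each image contributed exactly five pair types per orientation (three same pairings and two different pairings) is ours; it is not stated explicitly in the text, although it is consistent with the 12 categories of 10 images and the 1200 trials reported for this experiment.

```python
from itertools import product

N_CATEGORIES = 12          # object categories listed under Apparatus and Stimuli
IMAGES_PER_CATEGORY = 10
ORIENTATIONS = ("upright", "inverted")
# Assumed pair types: three 'same' pairings and two 'different' pairings.
PAIR_TYPES = (
    ("normal", "normal"),
    ("thatcherized1", "thatcherized1"),
    ("thatcherized2", "thatcherized2"),
    ("normal", "thatcherized1"),
    ("normal", "thatcherized2"),
)

trials = [
    {
        "category": category,
        "image": image,
        "orientation": orientation,
        "pair": pair,
        "trial_type": "same" if pair[0] == pair[1] else "different",
    }
    for category, image, orientation, pair in product(
        range(N_CATEGORIES), range(IMAGES_PER_CATEGORY), ORIENTATIONS, PAIR_TYPES
    )
]

print(len(trials))  # 1200
```

Left/right counterbalancing of the normal image on different trials would then be applied across these trials rather than adding new ones.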
Procedure
Participants judged whether two images in a pair were the same (‘1’) or different (‘2’).
Participants were told that two images of the same identity would appear side-by-side in the
same orientation. Image pairs remained on the screen until a response was made and no feedback
was given. We stressed the importance of responding as quickly and accurately as possible.
There were a total of 1200 trials broken into 6 blocks of 200 trials each. Participants were offered
a short break in between each block. Reaction time and accuracy were measured. Each session
took about 90 minutes to complete.
Results and Discussion
Results are shown in Figure 3. We computed mean delta (upright-inverted) accuracy and
mean delta reaction time for correct trials across all object categories and used these measures to
operationally define the size of a Thatcher effect. We focused our analysis on different trials.
Both analyses revealed significant Thatcher effects in all non-face categories, for at least one
level of transformation, suggesting that the Thatcher effect is not unique to faces. For accuracy,
adult faces did not have the most robust Thatcher effect, but rather, LF words and LF non-words
at the Thatcherized 1 level had the largest Thatcher effect. In this task, we found a lot of
variability within both reaction time and accuracy results across object categories. For example,
LF words (level 1) and LF non-words (level 1) showed comparable Thatcher effects for
accuracy, but LF words showed a much larger Thatcher effect than LF non-words for reaction
time. It was difficult to equate all categories using two different dependent measures, especially
since we found a lot of variability among categories. Because error rates were very high, it did
not make sense to focus analysis on reaction time. Thus, we decided to revise and improve this
task in order to make our results easier to interpret and reduce any speed-accuracy trade-offs.
Experiment 2: Same-Different Task 2
In this experiment, we focused our analysis on accuracy and altered the design of
Experiment 1 to reduce reaction time variability. We also changed images in the letter-string
categories so that all letters were lowercase, in contrast to Experiment 1, where each letter-string
was capitalized. The ultimate goals of this experiment were to determine whether the Thatcher
effect could be experienced in non-face objects, determine which categories yielded the most
robust Thatcher effects, and compare Thatcher effects between categories to gain insight on what
influences the Thatcher effect.
Participants
Twenty-one undergraduate students from Vanderbilt University (15 women and 6 men)
volunteered to participate in the experiment in exchange for course credit or $18 cash payment.
All participants had normal or corrected-to-normal vision and had not participated in Experiment
1.
Design
The design was similar to Experiment 1 except images in a pair were presented
sequentially. For different trials, the presentation order of images was counterbalanced, so that
on half of the trials the normal image was presented first, and on half the trials the normal image
was presented second. All image pairs were presented in both the upright and inverted position.
Procedure
Participants judged whether two images in a pair were the same (‘1’) or different (‘2’).
We stressed the importance of responding as quickly and accurately as possible. Each trial
consisted of a fixation cross, presentation of Image 1 (750 ms), a mask (300 ms), presentation of
Image 2 (750 ms), and a white screen (2250 ms). Participants were told that they could make a
response from the onset of Image 2 until three seconds later. Participants heard a tone if they
responded incorrectly or if they did not respond within the three-second time constraint. Trials
were broken into 12 blocks of 100 trials each and participants were offered a short break in
between each block. Each session took about 75 minutes to complete.
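
The trial timeline can be written out as a small schedule to check that the stated durations agree with the three-second response window. This is only a sketch: the event names are ours, and the fixation duration is left unset because it is not reported.

```python
# Durations in milliseconds as stated in the Procedure.
# The fixation duration is not reported, so it is left as None.
TRIAL_EVENTS = [
    ("fixation", None),
    ("image_1", 750),
    ("mask", 300),
    ("image_2", 750),
    ("blank", 2250),
]


def response_window_ms(events):
    """Time from the onset of Image 2 to the end of the trial."""
    names = [name for name, _ in events]
    start = names.index("image_2")
    return sum(duration for _, duration in events[start:])


print(response_window_ms(TRIAL_EVENTS))  # 3000
```

The 750 ms presentation of Image 2 plus the 2250 ms white screen gives the 3000 ms window described to participants.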
Results and Discussion
After initial data analysis, we noticed a bias to respond ‘same’ across object categories.
Thus, we calculated d’ and took a difference of d’ scores in order to correct for this bias in our
results. Delta d’ (upright trials minus inverted trials) served as our measure for size of Thatcher
effect. We focused our analysis on sensitivity results and did not include an analysis of reaction
time due to high error rates, which would make interpretation difficult. Figure 4 shows a
difference in sensitivity for deciding that two images in a pair are different during upright trials
versus inverted trials.
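
A minimal sketch of this sensitivity measure is given below. It assumes the standard yes/no formula d' = z(hit rate) - z(false-alarm rate), where a hit is a "different" response on a different trial and a false alarm is a "different" response on a same trial. The text does not say which d' model or which correction for extreme rates was used, so both the formula choice and the log-linear correction here are assumptions.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Yes/no d' = z(hit rate) - z(false-alarm rate).

    Adding 0.5 to each count (a log-linear correction) is an assumption
    made here to keep the rates away from exactly 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def delta_d_prime(upright_counts, inverted_counts):
    """Size of the Thatcher effect: d' for upright trials minus d' for inverted trials.

    Each argument is a (hits, misses, false_alarms, correct_rejections) tuple.
    """
    return d_prime(*upright_counts) - d_prime(*inverted_counts)
```

Because d' separates sensitivity from response criterion, a general tendency to answer 'same' lowers hits and false alarms together and has little influence on the difference score.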
Our results indicated significant (i.e., relative to zero) Thatcher effects for all object
categories for at least one level of transformation. An ANOVA comparing all object categories
revealed a significant main effect for category, F(11, 220) = 13.201, p = 0.0001, and a significant
interaction between category and level of transformation, F(11, 220) = 3.2128, p = 0.0004. LF
words had the largest Thatcher effect (M = 1.68) over all other categories, followed by HF words
(M = 1.376), and animal faces (M = 0.947), respectively. Because it was difficult to directly
compare all of our categories, we decided not to look at this interaction further, but instead look
at differences within each of the three subgroups (i.e., face, objects/scenes, letter-strings). This
would allow us to better understand why the Thatcher effect is experienced in general, and in
particular, why the Thatcher effect is experienced to a larger extent in some objects than others.
Face categories. Because prior literature focused on the Thatcher effect in faces, we
manipulated different categories of faces that varied in age, species, familiarity and facial
expression (i.e., pleasant or grimacing). If the Thatcher effect depends on familiarity, we would
expect adult faces to have the largest Thatcher effect. Categories for which we are least
experienced in discriminating, such as animal and baby faces, would have the smallest Thatcher
effect. If perception of a bizarre expression is important, then grimacing faces, which are fairly
bizarre without Thatcherization, should show a smaller Thatcher effect because the Thatcherized
and normal versions are more similar, making local changes less salient.
Thatcherized 2 animal faces showed the largest Thatcher effect over all face categories.
For level 1, faces and grimacing faces revealed comparable Thatcher effects and were larger than
baby and animal faces. For level 2, animal faces had the largest Thatcher effect over all other
face categories, whereas adult faces elicited the smallest Thatcher effect. An ANOVA revealed a
significant interaction between level of transformation and category, F(3, 60) = 7.5493, p =
0.0002. Post hoc tests showed that the Thatcher effect was larger for animal and baby faces at
level 2 transformation, whereas faces and grimacing faces had a larger Thatcher effect at level 1
transformation. Main effects for level and category were not significant. These results suggest
that familiarity is not a primary predictor of the Thatcher effect. We do see that transformation is
more salient in faces and grimacing faces since only the eyes need to be locally inverted to yield
the same size Thatcher effect as baby faces when two parts are inverted. Also, the Thatcher
effect for animal faces is much smaller when only the eyes are inverted, but very large when both
the eyes and mouth are inverted. It is interesting that the Thatcher effect for grimacing faces was
comparable to adult and baby faces, which casts doubt on the idea that a bizarre expression in a
Thatcherized face is important to detecting changes from an unaltered face.
Word categories. In analyzing the word categories, we were interested in whether word
frequency and/or lexical status of a word influenced the Thatcher effect. If familiarity is
important, we would expect larger Thatcher effects for high frequency over low
frequency letter-strings and for words over non-words. An ANOVA revealed a main effect for
word type, F(1, 20) = 30.72, p = 0.0001, and frequency, F(1, 20) = 9.464, p = 0.006. Words
showed a significantly larger Thatcher effect than non-words and LF words showed a
significantly larger Thatcher effect than HF words. The interaction between level of
transformation and frequency was not significant, F(1, 20) = 2.9525, p = 0.1012, but did show a
trend for larger Thatcher effects in Thatcherized 1 words over Thatcherized 2 words. The main
effect for words suggests that familiarity may be important, but the main effect for frequency
contradicts this idea. To further explore the letter-string results, it is beneficial to look at d’
results separately for words and non-words at both the upright and inverted orientations. Figure 5
shows d’ results for all letter-string categories and delta d’ results for size of Thatcher effect.
Words and non-words are different when upright but are almost identical when inverted. This
suggests that certain cues or processing strategies are being used when viewing inverted words,
which makes them more similar to non-words or upright words. We also observe a frequency
effect for inverted letter-strings but not for upright letter-strings. Exploring this issue, we found a
ceiling effect for accuracy on upright word trials.
Object/Scene categories. By choosing non-face, non-word objects, we hoped to better
understand if the Thatcher effect is restricted to certain categories. That is, if the Thatcher effect
depends on a powerful frame of reference or familiarity, objects and scenes may not show as
large of a Thatcher effect as faces and words. We may have more experience in seeing these
objects from different angles and rotations so that Thatcherization does not impair object
processing. An ANOVA revealed significant Thatcher effects (for at least one level of
transformation) for all categories but at lower magnitudes than faces and letter-string categories.
Buildings, cars and close-up scenes did not show significant Thatcher effects at level 1, but did at
level 2. The main effect for level of transformation was significant, F(1, 20) = 4.3885, p =
0.0491, but the main effect for category did not reach significance. Object categories showed
larger Thatcher effects for Thatcherized 2 images than Thatcherized 1 images, except in scenes.
These results suggest that the Thatcher effect can be experienced with objects/scenes, although
the effect is not as robust as with faces or letter-strings. This could either be due to greater
difficulty with detecting changes in upright objects/scenes than in faces or a smaller inversion
effect for objects/scenes than faces. To assess these possibilities, it is useful to look at d’ values
separately for upright and inverted objects/scenes (see Appendix A). For upright cars, d’ is
similar to upright animal faces, but the difference between upright and inverted for animal faces
is much greater than for cars. We find a similar comparison between close-up scenes and animal
faces. This suggests that changes are not more difficult to detect in objects/scenes, but there is a
smaller inversion effect, which leads to a smaller Thatcher effect.
Image Ratings
One aspect of the Thatcher effect is that a Thatcherized face appears grotesque when
upright but not inverted. This has been attributed to disruption of configural processing
due to local inversion, the grotesque appearance of an upright face, and failure to encode facial
expression in an inverted face. One of our goals was to explain why the Thatcher effect is more
powerful for certain image categories. One hypothesis was that perceived bizarreness facilitates
the size of the Thatcher effect so that the more bizarre an image appears when upright, the larger
the Thatcher effect. That is, two upright images will be much easier to discriminate during a
same-different judgment if one is very bizarre so fewer errors will occur for upright judgments.
When inverted, this discrimination will be more difficult, and we will see a larger difference
between upright and inverted images for that category, compared to images that are not rated
highly bizarre.
To address this idea, we asked 26 Vanderbilt undergraduate students (20 women and 6
men) to rate how bizarre each upright image appeared on a scale of 1 to 7 (1 = normal; 7 = very
bizarre). Participants had not completed the previous experiments. We obtained separate ratings
for image sets used in Experiment 1 and Experiment 2 because some images in Experiment 2
were altered. Because we focused our analysis on the results from Experiment 2, we will report
the ratings that correspond to that task.
Design
Participants were in one of three possible conditions where each condition rated a
different set of images. That is, all participants rated each image but at only one level of
transformation. There were 9 subjects in each of Conditions 1 and 2 and 8 subjects in
Condition 3. Image version was randomized and counterbalanced so that each condition rated
approximately the same number of images at each level of transformation. Images were
randomly presented in isolation in only the upright position for a total of 120 images in each
condition.
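
One simple way to achieve this kind of counterbalancing is a Latin-square rotation of image versions across conditions, sketched below. This is an illustrative assumption: the text says only that versions were randomized and approximately balanced, so the actual assignment may have differed. The names `VERSIONS` and `assign_versions` are hypothetical.

```python
VERSIONS = ("normal", "thatcherized1", "thatcherized2")


def assign_versions(n_images=120, n_conditions=3):
    """Rotate versions so that each condition sees every image once and
    each image appears at a different version in each condition."""
    return {
        condition: [
            VERSIONS[(image + condition) % len(VERSIONS)]
            for image in range(n_images)
        ]
        for condition in range(n_conditions)
    }


assignment = assign_versions()
print({c: assignment[c].count("normal") for c in assignment})
```

With 120 images and three versions, this rotation gives each condition 40 images at each level of transformation; presentation order would still be randomized separately.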
Procedure
Participants were shown a series of images that varied in bizarreness. Participants were
asked to rate, on a scale of 1-7, how bizarre each image appeared, relative to the object’s natural
appearance (1 = normal, 7 = high level of bizarreness). Images remained on the screen until a
response was made. Each session took about 10 minutes to complete.
Results and Discussion
Figure 6 shows average bizarreness ratings for each object category at both Thatcherized
1 and Thatcherized 2 levels of transformation. As expected, Thatcherized 2 images were rated as
more bizarre for all categories. Adult faces, grimacing faces, and baby faces were rated as most
bizarre (M = 5.87, M = 5.83, and M = 5.68, respectively) for level 2 transformation. Words were
also rated as highly bizarre, especially HF and LF non-words (M = 4.81, M = 5.00, respectively),
which was expected.
To estimate the influence of perceived bizarreness on the experience of the Thatcher effect,
we correlated mean bizarreness ratings with the Thatcher effect for each category. Bizarreness
rating and Thatcher effect magnitude were positively correlated (r = 0.81; see Figure 7) when
letter-string categories were excluded from the analysis. We excluded letter-strings because
non-words were expected always to be rated as more bizarre than words and other objects. At
first glance, the correlation seems very strong: as images are rated as more bizarre, we obtain a
larger Thatcher effect. However, when face categories are analyzed separately from object/scene
categories, the two analyses reveal correlations in opposite directions. For face categories, the
Thatcher effect increases as bizarreness ratings decrease. For object/scene categories, the
Thatcher effect increases as bizarreness ratings increase, although these categories do not follow
a clear pattern. It is therefore difficult to determine from this correlation the extent to which
bizarreness influences the Thatcher effect. Recall that grimacing faces showed a Thatcher effect
comparable to that of adult faces. This was contrary to our prediction that grimacing faces should
show a smaller Thatcher effect than adult faces because the Thatcherized and normal versions of
a grimacing face are more similar, making local changes difficult to detect. This result suggests
that bizarreness is not a necessary predictor of the Thatcher effect.
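The direction of the relationship within the face categories can be checked by hand from values reported in this document. The sketch below computes Pearson’s r for the three face categories only, pairing the level 2 bizarreness means reported above with level 2 Thatcher effects taken as upright d’ minus inverted d’ from Appendix C. The function name `pearson_r` is illustrative and not part of the original analysis, which pooled both transformation levels; with only three points, the coefficient indicates direction rather than a reliable estimate.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Level 2 mean bizarreness ratings: adult, grimacing, and baby faces.
bizarreness = [5.87, 5.83, 5.68]
# Level 2 Thatcher effect: upright d' minus inverted d' (Appendix C).
thatcher_effect = [2.88 - 2.19, 2.81 - 1.97, 2.84 - 1.84]

r_faces = pearson_r(bizarreness, thatcher_effect)
print(f"r (face categories only, level 2) = {r_faces:.2f}")
```

A negative coefficient here is consistent with the statement that, among face categories, lower bizarreness ratings accompany larger Thatcher effects.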
Orientation Judgment Task
A second proposed explanation for the Thatcher effect is greater experience or
familiarity with an object at a given orientation. To test this hypothesis, we measured familiarity
with a given orientation for each category as the speed with which an observer can
determine an object’s orientation. If familiarity is important, we would expect faster response
times for orientation judgments to correlate with larger Thatcher effects for that category.
Thirty-one undergraduate students from Vanderbilt University (17 women and 14 men) judged
whether each image presented was upright (‘1’) or inverted (‘2’). All images, at each level of
transformation (normal, Thatcherized 1, Thatcherized 2), were presented in both the upright and
inverted orientations. Participants had not completed the previous experiments.
Results and Discussion
We focused our analysis on normal trials and computed the average reaction time to make an
orientation judgment for each category. Figure 8 shows the correlation between orientation
judgment RT and size of Thatcher effect for each category. There was no significant correlation
between orientation reaction time and Thatcher effects (r = -0.113). Our measure of familiarity
may need improvement and could be explored in future research. However, recall that our results
from Experiment 2 showed that animals, with which we are assumed to be less familiar than with
faces, have a larger Thatcher effect than faces. In addition, LF letter-strings had a larger Thatcher
effect than HF letter-strings, contrary to our predictions based on familiarity. Therefore,
familiarity with an object category is not a necessary predictor of the Thatcher effect.
General Discussion
Our results from Experiment 1 suggest that the Thatcher effect is not exclusive to faces. It
does not appear to uniquely depend on factors such as expertise or the grotesque appearance of
local transformations. We obtained significant Thatcher effects for all object categories and
found that faces did not show the largest Thatcher effect. Therefore, we must reassess reasons for
experiencing the Thatcher effect, as it cannot solely be based on configural processing, grotesque
expression, or expertise. Our results from bizarreness ratings indicate that while images that were
rated as highly bizarre led to larger Thatcher effects (e.g., faces), a bizarre appearance was not
necessary to obtain a significant Thatcher effect, as evidenced by cars, buildings, and scenes.
Perception of bizarreness may help explain why face categories have larger Thatcher effects than
object/scene categories in general, but we can obtain a Thatcher effect without the Thatcherized
image looking bizarre. Therefore, bizarreness may just strengthen the Thatcher effect, rather than
explain the effect.
Our orientation judgment results were not significant and we may need to find a better
measure of object familiarity. Perhaps we could test experts for certain categories (e.g., cars),
assuming they are very familiar with that category, and see whether the Thatcher effect for that
category increases with expertise. In a general sense, face and letter-string categories had larger
Thatcher effects than objects/scenes. This may be due to larger variability within the
objects/scenes manipulations, but also could be accounted for by the fact that faces and words
are encountered more often and are usually viewed in the upright orientation, meaning that their
frame of reference has a greater influence on experience of the Thatcher effect.
Our results for the letter-string categories deserve extended attention. An ANOVA
revealed a main effect for both word type and word frequency, yet each finding seems to
contradict the other. On one hand, since words yield a larger Thatcher effect than non-words, we
could claim that familiarity is playing a role. However, the Thatcher effect for LF letter-strings is
larger than for HF letter-strings, which argues against this idea. Referring back to Figure 5 can
help clarify matters as we speculate on these findings. First, non-words and words were very
similar when inverted for both HF and LF strings, but were different when upright. This result
led us to question whether inverted letter-strings are processed more like words or non-words.
An experiment by Navon & Raveh (2004) on inverted word processing sheds some light on this
issue. They suggested that there is one identification process for upright and inverted words that
uses a set of cues such as letter direction and spatial relationship between the reflected and
adjacent letters. These cues may be disrupted during inversion and letter reflection. They
proposed that inverted words are processed using a rectification strategy, or mental rotation,
which normalizes the inverted word so it can be processed more like an upright word. They do
not conclude whether this is a global or letter-by-letter process, but we can speculate on this idea.
If inverted words use a letter-by-letter rectification process, and non-words are processed
letter-by-letter when upright and inverted, then this could lead to similar error rates as evidenced by
similar d’ values. When upright, non-words are still processed letter-by-letter but words are
processed in a more global manner, leading to differences in d’.
The frequency effect occurs for both words and non-words and this may be explained
based on how our non-words were created. Recall that we created the non-words by transposing
adjacent interior letters in words. Therefore, our non-words were quite similar to their base
words because all letters were the same, just rearranged, so orthographic information was
preserved. Participants could be accessing information from the base word to make non-word
judgments, which is also why we do not observe a frequency effect for non-words when upright.
Perea and Rosa (2005) suggest that HF non-words generate more activation for lexical decisions
during early word processing than LF non-words, making it easier to maintain grapheme
information. Since there is less activation of orthographic information in LF non-words, more
mistakes are made.
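The transposition used to build the non-words can be illustrated with a short sketch. The rule coded here (keep the first and last letters, swap each successive pair of interior letters) is an assumption inferred from the stimulus list in Figure 2, not a procedure stated in the text, and the function name `transpose_interior` is hypothetical.

```python
def transpose_interior(word):
    """Swap each adjacent pair of interior letters, keeping the first and
    last letters in place (assumed rule, inferred from Figure 2).
    Intended for words of at least two letters."""
    interior = list(word[1:-1])
    for i in range(0, len(interior) - 1, 2):
        interior[i], interior[i + 1] = interior[i + 1], interior[i]
    return word[0] + "".join(interior) + word[-1]

# Base words and their non-words as listed in Figure 2.
pairs = {
    "evidence": "eivedcne",
    "military": "mlitiray",
    "primates": "piramets",
    "wanderer": "wnaederr",
}
for base, nonword in pairs.items():
    generated = transpose_interior(base)
    print(base, "->", generated, generated == nonword)
```

Because every letter of the base word is retained and only neighboring positions change, each non-word stays orthographically close to its base word, which is the property the argument above relies on.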
In summary, the present study went beyond previous demonstrations of the classical
Thatcher illusion by quantifying the Thatcher effect and making comparisons between a wide
variety of object categories. Our results led us to conclude that the Thatcher effect is not
face-specific and it cannot uniquely be explained by familiarity or bizarreness. We may speculate that
the frame of reference hypothesis helps to explain why faces and letter-strings show a larger
Thatcher effect than objects and scenes. While the Thatcher effect does not seem to depend on
expertise with a specific category, as evidenced by a larger Thatcher effect for animals than adult
faces, it may be that the frame of reference which is learned for faces generalizes to similar
stimuli such as baby and animal faces. The same can be said for words in that the learned frame
of reference for words may generalize to non-words. Future studies could explore this strategy in
particular.
References
Bartlett, J. C., & Searcy, J. (1993). Inversion and configuration of faces. Cognitive
Psychology, 25, 281-316.
Boutsen, L., & Humphreys, G. W. (2003). The effect of inversion on the encoding of
normal and “Thatcherized” faces. Quarterly Journal of Experimental Psychology Section
A – Human Experimental Psychology, 56, 955-975.
Coltheart, M. (1981). The MRC Psycholinguistic Database. The Quarterly Journal of
Experimental Psychology Section A - Human Experimental Psychology, 33, 497-505.
Muskat, J. A., & Sjoberg, W. G. (1997). Inversion and the Thatcher illusion in
recognition of emotional expression. Perceptual and Motor Skills, 85, 1262.
Navon, D., & Raveh, O. (2004). On the processing of recognizing inverted words: Does it
rely only on orientation-invariant cues? Memory & Cognition, 32, 1103-1117.
Parks, T. E. (1983). Letter to the Editor. Perception, 12, 88.
Perea, M., et al. (2005). The frequency effect for pseudowords in the lexical decision task.
Perception & Psychophysics, 67, 301-314.
Rock, I. (1988). On Thompson’s inverted-face phenomenon. Perception, 17, 815-817.
Thompson, P. (1980). Margaret Thatcher: A new illusion. Perception, 9, 483-484.
Valentine, T., & Bruce, V. (1985). What’s up? The Margaret Thatcher illusion revisited.
Perception, 14, 515-516.
Yin, R. K. (1969). Looking at upside-down faces. Journal of Experimental Psychology, 81,
141-145.
Figure Captions
Figure 1. Examples of Thatcherization for each object category (except words). In all image
groups, the left-hand image is normal, the middle image has been Thatcherized by locally
inverting one part (level 1), and the right-hand image has been Thatcherized by locally inverting
two parts (level 2). The first group represents the classical Thatcher illusion, where the eyes and
mouth have been locally inverted. These changes are more difficult to detect when globally
inverted, but rotating the page 180° will make these changes more obvious.
Figure 2. Example of Thatcherization for LF word (top) and HF non-word (bottom). Full list of
word and non-word stimuli. List includes 40 letter-strings, with an equal number in each
letter-string category. Letters in bold were inverted to create Thatcherized images. The letter that was
inverted for Thatcherized level 1 is underlined.
Figure 3. Accuracy and reaction time results for Experiment 1. The top graph shows mean delta
(upright minus inverted) accuracy for all object categories. The bottom graph shows mean delta
(upright minus inverted) correct reaction times for all object categories. Results are for different
trials only.
Figure 4. Delta d’ results for Experiment 2. Measure of size of Thatcher effect across all object
categories at both level 1 and level 2 transformation. Delta d’ was calculated by subtracting d’
for inverted trials from d’ for upright trials. ‘x’ represents a non-significant TE.
Figure 5. d’ results for letter-string categories. Compares d’ values separately for inverted and
upright letter-strings and also includes delta d’ for HF and LF letter-strings.
Figure 6. Mean bizarreness ratings for each object category at level 1 and level 2
transformations. Images were rated on a scale of 1 (normal) to 7 (very bizarre) and ratings were
averaged across objects.
Figure 7. Correlation between level of perceived bizarreness and Thatcher effect averaged across
individual images. Correlation results exclude letter-strings. r = 0.81.
Figure 8. Correlation between mean RT for orientation judgment and Thatcher effect at both
Thatcherized 1 and 2 levels. r = -0.113.
Figure 1.
[Image panels: Face, Grimacing, Baby, Animal, Building, Car, Close-up Scene, Scene]
Figure 2.
Words High Frequency    Non-words High Frequency
evidence                eivedcne
military                mlitiray
question                qeutsoin
together                tgoteehr
anything                aynhtnig
although                atlohguh
interest                itnreset
children                cihdlern
national                ntaoianl
position                psotioin

Words Low Frequency     Non-words Low Frequency
primates                piramets
blatancy                balatcny
overfeed                oevfreed
marathon                mrataohn
fracture                fartcrue
engraver                egnarevr
beautify                baetufiy
artifice                atrficie
navigate                nvagitae
wanderer                wnaederr
Figure 3.
Experiment 1 Results
[Two bar graphs. Top y-axis: Mean Delta Accuracy (Up-Inv), 0 to 0.4. Bottom y-axis: Mean Delta Correct RTs (ms), 0 to -1200. X-axis categories: Face, Grimacing, Baby, Animal, Building, Car, Close-up scene, Scene, Word_HF, Word_LF, Nonword_HF, Nonword_LF. Series: Thatcherized Level 1, Thatcherized Level 2.]
Figure 4.
Size of Thatcher Effect
[Bar graph. Y-axis: Thatcher Effect (Delta d'), 0 to 2. X-axis categories: Face, Grimacing, Baby, Animal, Building, Car, Close-up scene, Scene, Word_HF, Word_LF, Nonword_HF, Nonword_LF. Series: Thatcherized Level 1, Thatcherized Level 2. Three bars are marked 'x'.]
Figure 5.
Letter-Strings
[Bar graph. Y-axis: d' (or delta d' for TE), 0 to 3.5. X-axis groups: HF Upright, HF Inverted, HF Thatcher Effect, LF Upright, LF Inverted, LF Thatcher Effect. Series: Non-Words, Words.]
Figure 6.
Mean Bizarreness Ratings
[Bar graph. Y-axis: Mean Bizarreness Ratings, 0 to 7. X-axis categories: Face, Grimacing, Baby, Animal, Building, Car, Close-up scene, Scene, Word_HF, Word_LF, Nonword_HF, Nonword_LF. Series: Thatcherized 1, Thatcherized 2.]
Figure 7.
Correlation between bizarreness rating and Thatcher effect magnitude
r = 0.813
[Scatter plot. Y-axis: Mean Bizarreness Rating, 0 to 6. X-axis: Thatcher Effect Magnitude, 0 to 1.4. Labeled points: Face1, Face2, Grimacing1, Grimacing2, Baby1, Baby2, Animal1, Animal2, Building1, Building2, Car1, Car2, Close-Up Scene1, Close-Up Scene2, Scene1, Scene2.]
Figure 8.
Correlation between RT for Orientation Judgment and Thatcher effect magnitude
r = -0.113
[Scatter plot. Y-axis: Mean RT (ms), 700 to 900. X-axis: Thatcher Effect Magnitude, 0 to 2.5.]
Appendix A:
d’ for upright and inverted (Thatcherized 2 level)
[Bar graph. Y-axis: d', 0 to 3.5. X-axis categories: Face, Grimacing, Baby, Animal, Building, Car, Close-Up scene, Scene, Word_HF, Word_LF, Nonword_HF, Nonword_LF. Series: Dprime Upright, Dprime Inverted.]
Appendix B:
d’ values for upright and inverted (Thatcherized 1 Level)

Category                    Count   HITS   SE     FA     SE     Dprime   SE
Face, Inverted              21      0.86   0.02   0.27   0.04   1.81     0.13
Face, Upright               21      0.93   0.01   0.11   0.02   2.83     0.12
Grimacing, Inverted         21      0.90   0.01   0.48   0.04   1.38     0.13
Grimacing, Upright          21      0.92   0.01   0.19   0.03   2.44     0.11
Baby, Inverted              21      0.86   0.02   0.30   0.05   1.80     0.19
Baby, Upright               21      0.92   0.01   0.15   0.02   2.67     0.14
Animal, Inverted            21      0.90   0.01   0.64   0.04   0.96     0.15
Animal, Upright             21      0.90   0.01   0.44   0.05   1.56     0.18
Building, Inverted          21      0.88   0.02   0.58   0.04   1.01     0.11
Building, Upright           21      0.87   0.01   0.46   0.03   1.24     0.09
Car, Inverted               21      0.87   0.02   0.55   0.05   1.08     0.20
Car, Upright                21      0.87   0.02   0.50   0.06   1.21     0.21
Close-Up Scene, Inverted    21      0.91   0.01   0.38   0.04   1.76     0.14
Close-Up Scene, Upright     21      0.94   0.01   0.42   0.04   1.89     0.14
Scene, Inverted             21      0.85   0.02   0.76   0.03   0.30     0.11
Scene, Upright              21      0.86   0.02   0.71   0.03   0.62     0.10
Word_HF, Inverted           21      0.87   0.02   0.59   0.04   0.98     0.11
Word_HF, Upright            21      0.91   0.01   0.17   0.02   2.45     0.12
Word_LF, Inverted           21      0.87   0.02   0.71   0.03   0.70     0.12
Word_LF, Upright            21      0.90   0.01   0.18   0.02   2.30     0.12
Nonword_HF, Inverted        21      0.90   0.02   0.61   0.03   1.06     0.11
Nonword_HF, Upright         21      0.83   0.02   0.25   0.03   1.76     0.11
Nonword_LF, Inverted        21      0.87   0.02   0.70   0.03   0.61     0.14
Nonword_LF, Upright         21      0.85   0.02   0.31   0.03   1.65     0.15

SE = Standard Error
FA = False Alarm
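
For readers unfamiliar with the Dprime column, the sketch below shows the standard signal-detection formula, d’ = z(hit rate) - z(false-alarm rate). It is an assumption that this formula was used here; the function name `d_prime` is illustrative. If d’ was computed for each participant and then averaged, applying the formula to the group-mean rates in the table will give a value close to, but not identical to, the tabulated Dprime.

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Standard sensitivity index: z(hit rate) minus z(false-alarm rate).
    Rates must lie strictly between 0 and 1."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Group-mean rates for upright faces at Thatcherized level 1 (Appendix B).
estimate = d_prime(0.93, 0.11)
print(f"d' from group-mean rates: {estimate:.2f} (tabulated Dprime: 2.83)")
```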
Appendix C:
d’ values for upright and inverted (Thatcherized 2 Level)

Category                    Count   HITS   SE     FA     SE     Dprime   SE
Face, Inverted              21      0.86   0.02   0.18   0.03   2.19     0.13
Face, Upright               21      0.93   0.01   0.09   0.01   2.88     0.08
Grimacing, Inverted         21      0.90   0.01   0.28   0.03   1.97     0.12
Grimacing, Upright          21      0.92   0.01   0.09   0.01   2.81     0.09
Baby, Inverted              21      0.86   0.02   0.28   0.04   1.84     0.16
Baby, Upright               21      0.92   0.01   0.11   0.02   2.84     0.16
Animal, Inverted            21      0.90   0.01   0.59   0.05   1.06     0.15
Animal, Upright             21      0.90   0.01   0.21   0.04   2.35     0.17
Building, Inverted          21      0.88   0.02   0.53   0.04   1.15     0.12
Building, Upright           21      0.87   0.01   0.34   0.04   1.62     0.12
Car, Inverted               21      0.87   0.02   0.30   0.03   1.85     0.16
Car, Upright                21      0.87   0.02   0.16   0.02   2.30     0.15
Close-Up Scene, Inverted    21      0.91   0.01   0.29   0.03   2.02     0.11
Close-Up Scene, Upright     21      0.94   0.01   0.28   0.03   2.26     0.10
Scene, Inverted             21      0.85   0.02   0.63   0.04   0.69     0.10
Scene, Upright              21      0.86   0.02   0.58   0.04   0.98     0.13
Word_HF, Inverted           21      0.87   0.02   0.45   0.04   1.34     0.11
Word_HF, Upright            21      0.91   0.01   0.13   0.02   2.63     0.14
Word_LF, Inverted           21      0.87   0.02   0.59   0.04   1.02     0.11
Word_LF, Upright            21      0.90   0.01   0.08   0.01   2.79     0.13
Nonword_HF, Inverted        21      0.90   0.02   0.45   0.04   1.52     0.13
Nonword_HF, Upright         21      0.83   0.02   0.20   0.03   1.97     0.15
Nonword_LF, Inverted        21      0.87   0.02   0.56   0.04   1.03     0.14
Nonword_LF, Upright         21      0.85   0.02   0.20   0.03   2.03     0.17

SE = Standard Error
FA = False Alarm
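
The Thatcher effect reported in the text is delta d’, that is, d’ for upright trials minus d’ for inverted trials. As a minimal sketch, the code below recomputes it from the Dprime values in Appendix C; the dictionary layout and the name `delta_d_prime` are illustrative choices, not part of the original analysis.

```python
# d' at Thatcherized level 2, copied from Appendix C:
# category -> (upright d', inverted d')
d_prime_level2 = {
    "Face": (2.88, 2.19),
    "Grimacing": (2.81, 1.97),
    "Baby": (2.84, 1.84),
    "Animal": (2.35, 1.06),
    "Building": (1.62, 1.15),
    "Car": (2.30, 1.85),
    "Close-Up Scene": (2.26, 2.02),
    "Scene": (0.98, 0.69),
    "Word_HF": (2.63, 1.34),
    "Word_LF": (2.79, 1.02),
    "Nonword_HF": (1.97, 1.52),
    "Nonword_LF": (2.03, 1.03),
}

def delta_d_prime(upright, inverted):
    """Size of the Thatcher effect: upright d' minus inverted d'."""
    return upright - inverted

thatcher_effect = {
    category: round(delta_d_prime(up, inv), 2)
    for category, (up, inv) in d_prime_level2.items()
}

# List categories from largest to smallest Thatcher effect.
for category, te in sorted(thatcher_effect.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:15s} {te:.2f}")
```

Computed this way, the level 2 effect is larger for animals than for adult faces and largest for LF words, in line with the pattern described in the General Discussion.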