big five article and questions

We took the world’s most scientific personality test—
and discovered unexpectedly sexist results
By Olivia Goldhill February 11, 2018
Personality tests are both incredibly popular and largely bogus. BuzzFeed made its name in part
by publishing quizzes telling readers which ’90s kid they are, which Friends character they
are, which Disney princess they are, and…well…which Disney princess they are, really. None of
these have any scientific basis. Then there’s the somewhat more reputable Myers-Briggs test,
inspired by Jungian theories about personality types. Some 2.5 million people take it every year,
and 88% of Fortune 500 companies use it. Despite its reputation, however, the Myers-Briggs
has poor scientific validity.
There is one personality test that is far and away more scientifically valid than any of the others:
the “Big Five.”
The Big Five evaluates personality by measuring—as the name suggests—five personality traits:
openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism, each
on a continuous scale. Studies have shown that it effectively predicts behavior, and the test is
often used in academic psychological personality research. People who score higher in
conscientiousness tend to work harder, for example, while more neurotic personalities are more
prone to anxiety and depression.
Despite its scientific validity, and even with the contemporary fascination with personality tests,
the Big Five is relatively unpopular outside of academia. A recent FiveThirtyEight article on the
subject suggested that personality scientists haven’t effectively marketed the one credible
personality test.
But there are serious concerns not just with the marketing of the test, but with how it’s
presented to a public audience. Despite the scientific rigor around the Big Five in academia,
many online versions of the test are designed to give sexist results.
The origins of the Big Five
In the 1970s, two separate research teams found ways of evaluating personality according to the
Big Five traits, and created their own tests. Paul Costa and Robert McCrae at the National
Institutes of Health created the NEO Personality Inventory, while Lewis Goldberg at the Oregon
Research Institute created the IPIP-NEO inventory. (Both have been refined and updated in the
years since.) In 1998, Oliver John from Berkeley Personality Lab and Verónica Benet-Martinez,
psychology professor at University of California at Davis, created the 44-item “Big Five
Inventory” (BFI). These three scales are all scientifically validated and widely used in academic
research into personality. The Big Five Inventory and the longer, more nuanced IPIP-NEO are
both also freely available online.
On the sites linked above, test-takers are asked to enter their gender before getting the results—
and the response significantly impacts the interpretation of the test results. Depending on
whether you say you are “male” or “female,” the exact same answers produce very different
personality assessments. Crucially, women are told they’re significantly more disagreeable than
men who answer questions identically.
I took the online IPIP-NEO assessment twice, without varying my answers—and was rated 55 on
agreeableness as a woman, and 66 as a man. I also saw significant differences in results when I
took the NEO Five-Factor Inventory (a shorter version of the NEO Personality Inventory, also
created by Costa and McCrae), which isn’t freely available online. PAR, a publisher of
psychological assessment products, sent me tests for both a man and a woman and, once I’d
completed them, sent me the results. As a man, I was told that I’m “compassionate,
good-natured, and eager to cooperate and avoid conflict.” As a woman: “Generally warm, trusting, and
agreeable, but you can sometimes be stubborn and competitive.”
It’s a scientifically reinforced version of the sexism that pervades society: Women who are
straightforward and opinionated are told they’re difficult and argumentative, while men with the
exact same character traits are seen as charismatic leaders.
The sexist feedback isn’t due to an inherent flaw in the personality test, nor malicious intent. It’s
because of how the psychologists behind the inventories present the results. Rather than giving
an absolute score in each of the Big Five categories, they tell you your percentile in comparison
to others within your gender.
“Your results are based on comparing you to all the other humans who have taken the test,”
wrote Maggie Koerth-Baker in the FiveThirtyEight article. That seems to be true for the site on
which Koerth-Baker took her test, which uses the BFI and is run by Christopher Soto, a
psychology professor at Colby College. (I took the test as a man and a woman on this site, and
got the same results each time.)
But the other tests taken by Quartz compared women to other women, and men to other men.
As women tend to be more agreeable than men, this means that a slightly disagreeable woman
will be relatively more disagreeable when compared only to other women. Women’s agreeable
nature is not inherently biological: Social conditions pressure women to behave in this way. And
this sexist trope is neatly reinforced in how the results from these personality tests are
presented.
Crucially, those who take the tests aren’t told that this is how their results are calculated. The Big
Five Inventory site acknowledges that “percentile scores are relative to our particular sample of
people. Thus, your percentile scores may differ if you were compared to another sample (e.g.,
elderly British people).” The site does not mention the gender-based comparison groups. The
IPIP-NEO site says that if you mark yourself as a woman, your score is calculated as compared
to other women, but does not explain how this influences results.
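The effect of the comparison group can be illustrated with a small numerical sketch. The norm-group averages and spreads below are invented purely for illustration and are not taken from any real inventory; the `NORMS` table and the `percentile` helper are hypothetical names, and actual tests may compute percentiles differently. The sketch only assumes, as described above, that women's average agreeableness score is somewhat higher than men's and that each test-taker is ranked within their own gender.

```python
from statistics import NormalDist

# Hypothetical norm groups on a 1-5 agreeableness scale.
# These means and standard deviations are made up for illustration only.
NORMS = {
    "female": NormalDist(mu=3.9, sigma=0.6),
    "male": NormalDist(mu=3.6, sigma=0.6),
}

def percentile(raw_score: float, group: str) -> float:
    """Share (in percent) of the chosen norm group scoring below raw_score."""
    return 100 * NORMS[group].cdf(raw_score)

# Identical answers produce an identical raw score...
raw = 3.8

# ...but the reported percentile depends on who the score is compared with.
as_woman = percentile(raw, "female")
as_man = percentile(raw, "male")

print(f"compared with women: percentile {as_woman:.0f}")
print(f"compared with men:   percentile {as_man:.0f}")
```

Under these assumed numbers, the same raw score falls below the midpoint of the women's group and above the midpoint of the men's group, which is the pattern the identical-answers experiment produced.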
The psychologists I spoke to seemed unconcerned about the implications. Laura Naumann, a
psychology professor at Nevada State College, notes that women inherently evaluate their
personalities as compared to other women: “The Big Five says, ‘Do you see yourself as someone
who is considerate and kind to others?’ In that is an inherent comparison. As a woman, you’re
thinking, ‘I think I’m considerate and kind.’ There’s research in social psychology to say women
are implicitly comparing themselves to their own gender.” Naumann pointed me to the 1950s
work of Leon Festinger, who found that people tend to evaluate their own performance relative
to others they consider comparable, and the 2000s work of Monica Biernat, a psychology
professor who found that gender stereotypes shape how people judge one another. However,
Quartz was not able to find research-based evidence that women evaluate their personalities
compared only to other women.
John Johnson, the Penn State University psychologist who created the IPIP-NEO website, says
the psychology of personality reflects everyday perceptions. “[P]sychologists borrow the concept
of personality traits from ordinary language, which reflects the way ordinary people think about
each other’s thoughts, feelings, and behaviors,” he writes in an email. “It is all rather subjective,
although people who grow up together in the same culture will tend to agree in their
assessments of who is more or less agreeable because of their shared social standards.”
These standards, he says, are affected by age and gender. Just as a six-year-old who displays
lackadaisical behavior might still be perceived as “conscientious” while a middle-aged person
with the same behaviors will not, men and women are judged differently: “Because women are,
on average, more agreeable than men, people often have a different standard for assessing
whether a woman is ‘average’ in agreeableness or ‘highly agreeable,’” Johnson says.
There’s another flaw when it comes to taking Big Five personality tests online. The online
versions of the Big Five traits inform people of negative character traits, without explaining that
the positivity or negativity of all characteristics is shaped by context.
Costa believes that the Big Five’s willingness to point out negative traits makes the test more
accurate: Myers-Briggs avoids “anything that could be negative. And that’s a great big marketing
thing,” he says. But each potentially negative Big Five character trait is informed by the
situation. “They’re only negative in certain contexts,” he says.
For example, Costa explains, agreeable people are great for a blind date, but tend to be overly
dependent. Disagreeable people, meanwhile, aren’t good at smoothing over arguments. But
they’re also less likely to obediently follow immoral orders—such as those demonstrated by the
Milgram experiment, wherein participants are asked to administer increasingly intense electric
shocks to a victim. (The longer IPIP-NEO test briefly acknowledged the importance of context,
noting, “agreeableness is not useful in situations that require tough or absolute objective
decisions,” but the Big Five Inventory website offered no such explanations.)
Personality is messier than star signs or Myers-Briggs tests or any clear-cut personality
diagnosis likes to pretend. Contrary to the popular idea that we have some inherent true self, our
personality is best scientifically evaluated simply according to how we—and those around us—
see ourselves.
In an academic setting, psychologists often also ask family, coworkers, and friends to take a
Big Five questionnaire to evaluate someone else’s personality. There’s no right answer—colleagues
may well view someone as highly conscientious, while neighbors still waiting for a pot to be
returned will consider them less so. Both these views are meaningful, and collectively contribute
to a picture of someone’s personality.
It’s inherently sexist, though, to view straightforward women as hostile or rude while approving
of men who behave the same way. In an academic setting, the Big Five personality tests
acknowledge the nuances of personality, often considering multiple personality inventories of
the same person, taken by both themselves and those around them. If only the online tests were
used to reflect such subtleties.
1) Write down your own score from taking the Big Five Personality Test
2) Do you agree with the article’s author that this test is promoting sexist stereotypes? Is it unfair
to women?
3) Should, in your opinion, men’s and women’s answers all be scored against each other? Why or
why not?
4) Do you agree with the author that the inclusion of negative personality traits in the Big Five
makes it a more accurate assessment than the Myers-Briggs? Why or why not?