Natural Language Processing Meets Computational Social Science
Sofya Nikolaeva
Higher School of Economics
Faculty of Computer Science
Moscow, Russia
spnikolaeva@edu.hse.ru
Abstract—Analyzing gender bias through tropes, i.e., recurring narrative patterns, is a popular field of study in social science. In this paper, we review several articles on this topic and outline our planned research on analyzing how gendered tropes change over time.
Index Terms — NLP, gender bias, tropes, popular media
I. INTRODUCTION
Popular media has a powerful influence on all of us: it can manipulate its audience and shape people's outlook on life. Unfortunately, popular media also reflects old-fashioned societal biases and reifies gender stereotypes in society. In particular, these biases exist in the form of tropes, which are frequently occurring narrative patterns within popular media.
The present paper focuses on the analysis of gender biases within narrative tropes in films and TV shows. In this work, we consider only cisgender men and women because of the lack of reliable lexicons, which limits our ability to investigate bias across other gender identities. For an in-depth study of gender biases within narrative tropes, we will use modern Natural Language Processing (NLP) methods. NLP has become an essential tool for the social sciences, which work extensively with text data about people. It has shown strong results in text analysis tasks and is changing the way humans and machines interact.
Recent articles on gender biases within tropes in popular media have analyzed the relationship between gendered tropes and movie ratings and genres, as well as the relationship between the gender of authors and the diversity of female-leaning tropes in their books. In this research, we will continue the work done by Dhruvil Gala et al. [1]: we will evaluate how gendered tropes have changed over time and use tropes to predict film awards and box office results. Moreover, we will analyze the diversity of gendered tropes in films and TV shows depending on the gender of their creators, such as directors, screenwriters, and producers. As data for analysis, we use the large-scale dataset collected by Gala et al. [1], which contains 1.9M examples of 30K tropes in popular media from TVtropes.org. Information about movies and TV series (year of release, box office, rating, etc.) will be collected from IMDb.com and kinopoisk.ru. In this research, we hope to find out whether the difference in diversity between male and female gendered tropes is decreasing over time, which would be a sign that our society has become more tolerant.
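One possible way to test this hypothesis is sketched below in Python. The paper does not yet specify how diversity is measured, so this operationalization is purely our assumption: for each release year, count the distinct male-leaning tropes (genderedness score below −1) and female-leaning tropes (score above 1) that appear in that year's titles, take the difference, and fit an ordinary least-squares slope over the years. The function names and input format are hypothetical.

```python
from collections import defaultdict


def diversity_gap_by_year(occurrences, scores, threshold=1.0):
    """Hypothetical diversity gap per release year (our assumption).

    occurrences: iterable of (release_year, trope_name) pairs.
    scores: mapping trope_name -> genderedness score (female-leaning > 0).
    Returns {year: #distinct male-leaning tropes - #distinct female-leaning tropes}.
    """
    male = defaultdict(set)
    female = defaultdict(set)
    years = set()
    for year, trope in occurrences:
        years.add(year)
        score = scores.get(trope)
        if score is None:
            continue  # trope without a score: ignore it
        if score < -threshold:
            male[year].add(trope)
        elif score > threshold:
            female[year].add(trope)
    return {y: len(male[y]) - len(female[y]) for y in sorted(years)}


def linear_trend(series):
    """Ordinary least-squares slope of the values in `series` against its keys (years)."""
    xs = list(series.keys())
    ys = list(series.values())
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den  # needs at least two distinct years
```

While male-leaning tropes still outnumber female-leaning ones (a positive gap), a negative slope from `linear_trend` would suggest that the gap is narrowing; other diversity measures (e.g., topic diversity) could be plugged in the same way.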
In the following sections, we discuss the reviewed literature, the methodology, and the expected results in more detail, and finally draw conclusions in the last section.
II. LITERATURE REVIEW
As stated before, there are various articles on analyzing gender bias through tropes in popular media. In this work, we are mostly inspired by the article by Dhruvil Gala, Mohammad Omar Khursheed, Hannah Lerner, Brendan O'Connor, and Mohit Iyyer [1]. The authors analyzed gender bias within a large collection of tropes. For their study, the researchers collected a large-scale dataset from TVtropes.org, an online user-created repository. They introduced a new metric, the genderedness score, which is computed for each trope individually by counting gendered pronouns and other gendered terms. Using this metric, they evaluated the genderedness of tropes in various settings. For example, their analysis of the effect of genre on genderedness showed that sports, science fiction, and war films and TV shows leaned toward male tropes, while romance, musicals, and horror relied on female tropes. Another interesting example was predicting the ratings of movies and TV shows from their tropes. It demonstrated that low-rated movies had a much higher genderedness score than high-rated films, which contrasts with the conclusions Karen Boyle [2] drew from her analysis of IMDb reviews. In her research, Boyle concluded that women continued to be anomalous within film comedy outside of narrowly defined roles in romantic comedies. In our work, we will build on the research of Gala et al. and focus more on predictions based on gendered tropes. Another compelling experiment of theirs showed that male-dominated tropes covered more diverse topics (such as religion, war, sport, and money) than female-leaning tropes, which often focused only on sexuality and maternalism. The gap in diversity between male and female tropes is large: e.g., the evil genius trope includes 108 male and only 15 female examples in films and TV shows. The paper by Hansen [4] showed that the titular princess of the video game The Legend of Zelda is representative of the Damsel in Distress trope.
The importance of studying gender bias in movies and TV shows was also discussed by Robin Redmon Wright [3]. In her paper, Wright gave an example of television's influence: her research showed that some young adult women who watched The Avengers (the British espionage television programme) from 1962 to 1964 found an opportunity to free themselves from the person they had been socially constructed to be. This happened because of a strong female character, played by Honor Blackman, who ended up replacing a lead male actor in the crime drama.
III. METHODOLOGY
This section describes the methods we will use in our work, but first we need to say more about the genderedness score introduced by Dhruvil Gala et al. [1]. To examine how tropes correlate with male and female gender, the authors matched tokens against gender lexicons, following methods created by Tolga Bolukbasi et al. [5]. The lexicons contained gendered terms such as gendered pronouns and gendered occupation titles (e.g., actress, actor). The authors also validated how well the lexicon captures gender identity: they took 150 random examples and classified each of them as male or female using the lexicon. For each trope, Gala et al. concatenated the trope's description with all of the trope's examples, then tokenized, preprocessed, and sorted the resulting document $X_i$ using the Natural Language Toolkit (NLTK) [6].
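The lexicon-matching steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code of Gala et al. [1]: the two lexicons are hypothetical toy lists (the real ones, built following Bolukbasi et al. [5], are much larger), a simple regular-expression tokenizer stands in for NLTK, and the majority-vote rule in the hypothetical `classify_example` is our assumption about how a single example is labeled male or female.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical toy lexicons (assumption); the lexicons of Gala et al. [1]
# follow Bolukbasi et al. [5] and are much larger.
MALE_LEXICON = {"he", "him", "his", "himself", "man", "men", "actor", "king"}
FEMALE_LEXICON = {"she", "her", "hers", "herself", "woman", "women", "actress", "queen"}

# Lowercase word tokens, keeping simple apostrophe forms such as "villain's".
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text: str) -> list[str]:
    """Lowercase and split text into word tokens (a stand-in for NLTK)."""
    return TOKEN_RE.findall(text.lower())


def gender_counts(tokens: list[str]) -> tuple[int, int]:
    """Return (m, f): how many tokens match the male and female lexicons."""
    counts = Counter(tokens)
    m = sum(counts[w] for w in MALE_LEXICON)
    f = sum(counts[w] for w in FEMALE_LEXICON)
    return m, f


def classify_example(text: str) -> str:
    """Label one trope example by majority of lexicon matches (assumed rule)."""
    m, f = gender_counts(tokenize(text))
    if m > f:
        return "male"
    if f > m:
        return "female"
    return "unknown"  # tie or no gendered tokens


def build_trope_document(description: str, examples: list[str]) -> list[str]:
    """Concatenate a trope's description with all its examples and tokenize (X_i)."""
    return tokenize(" ".join([description, *examples]))
```

For example, `build_trope_document("A villain.", ["He plots.", "She plots."])` yields `["a", "villain", "he", "plots", "she", "plots"]`, for which `gender_counts` returns `(1, 1)`. With NLTK installed, `nltk.tokenize.wordpunct_tokenize` applied to lowercased text could replace the regular expression.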
NLTK is a suite of libraries and programs for Natural Language Processing written in the Python programming language. It provides a wide range of tools, such as tokenizers, stemmers, and part-of-speech taggers, that are used to build and train text-processing methods.
After obtaining these documents, Gala et al. introduced the raw genderedness score of trope $i$:

$$
d_i = \underbrace{\frac{f(X_i)}{f(X_i) + m(X_i)}}_{r_i} - \underbrace{\frac{f(\mathrm{TVTROPES})}{f(\mathrm{TVTROPES}) + m(\mathrm{TVTROPES})}}_{r_{\mathrm{TVTROPES}}},
$$
where $m(X_i)$ is the number of tokens in the resulting document that match the male lexicon and $f(X_i)$ is the number that match the female lexicon, while $f(\mathrm{TVTROPES})$ and $m(\mathrm{TVTROPES})$ are the numbers of female and male matches, respectively, across all trope documents. Finally, they defined the genderedness score as the z-score of $d_i$ normalized over all tropes. The authors treated tropes with genderedness scores outside $[-1, 1]$ as highly gendered; the lowest score, $-1.84$, corresponds to the most male-dominated trope, and the highest score, $4.02$, to the most female-dominated one.
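A minimal Python sketch of this score computation follows, taking per-trope counts $(m(X_i), f(X_i))$ such as those produced by lexicon matching. Two details are our assumptions rather than documented choices of Gala et al.: tropes with no gendered matches are skipped because $r_i$ is undefined for them, and the z-score uses the population standard deviation. The trope names in the example are illustrative.

```python
from __future__ import annotations

import statistics


def raw_genderedness(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Raw score d_i = r_i - r_TVTROPES, with r = f / (f + m).

    `counts` maps a trope name to (m(X_i), f(X_i)).
    """
    total_m = sum(m for m, _ in counts.values())
    total_f = sum(f for _, f in counts.values())
    r_all = total_f / (total_f + total_m)  # r_TVTROPES over all trope documents
    d = {}
    for trope, (m, f) in counts.items():
        if m + f == 0:
            continue  # assumption: r_i is undefined without gendered tokens, so skip
        d[trope] = f / (f + m) - r_all
    return d


def genderedness(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Genderedness score: z-score of d_i across tropes (population std is an assumption)."""
    d = raw_genderedness(counts)
    values = list(d.values())
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return {t: (v - mu) / sigma for t, v in d.items()}


def highly_gendered(scores: dict[str, float], bound: float = 1.0) -> dict[str, float]:
    """Tropes whose genderedness score lies outside [-bound, bound]."""
    return {t: s for t, s in scores.items() if abs(s) > bound}
```

With counts `{"evil_genius": (3, 1), "damsel": (1, 3), "mentor": (2, 2)}`, the raw scores are −0.25, 0.25, and 0; after normalization the first two (about −1.22 and 1.22) fall outside $[-1, 1]$, so `highly_gendered` returns them. Negative scores are male-leaning and positive scores female-leaning, because $f$ appears in the numerator.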
In our work, we will try to improve this metric: we will use more sophisticated methods of measuring trope genderedness and expand the vocabulary of the gender lexicon provided by Gala et al. Furthermore, we will draw on the methodology of Ananya et al. [7], who proposed methods for measuring gender bias that use contextual cues to improve probabilistic estimates. We expect this to be of considerable help in our research.
IV. RESULTS ANTICIPATED
In the modern world, old-fashioned social biases have become less prevalent, but popular media still reflects them. In this research, we hope that the results will demonstrate that the difference in diversity between male and female gendered tropes is decreasing over time. This would show that popular media is changing with respect to gender bias and becoming more open to the development of complex female characters.
We also expect that our analysis of the relationship between the diversity of female gendered tropes in films and TV shows and the gender of their creators (e.g., directors, screenwriters, producers) will show results similar to the analogous study by Gala et al., in which the researchers predicted author gender identity from the tropes in their books. That experiment demonstrated that female authors use more female-leaning tropes and far fewer stereotypical female-oriented tropes than male authors.
Moreover, we will use tropes to predict film and TV show awards, and we believe the results could demonstrate a direct correlation between more developed female characters and awards. We believe this is plausible given the recent attention of the film community to gender equality in the modern film industry. We will also predict box office results using tropes, and we expect the analysis of the relationship between box office results and gendered tropes to be informative.
V. CONCLUSION
To summarize, analyzing gender bias through tropes is currently an important topic that deserves public attention. This research will show how gendered tropes have changed over time and will most likely indicate that social biases have not yet disappeared from the modern film industry. However, we believe that the results will also reveal the emergence of new female tropes, the deepening of existing female-dominated tropes, and a move away from stereotypical characters in popular media.
This area has further research potential: in the present paper we consider only cisgender men and women, but the study of gender bias could be extended by expanding the current dataset of tropes. This would allow societal biases to be analyzed across other gender identities and open new directions for future research.
REFERENCES
[1] Dhruvil Gala, Mohammad Omar Khursheed, Hannah Lerner, Brendan O'Connor, and Mohit Iyyer. 2020. “Analyzing Gender Bias within Narrative Tropes.”
[2] Karen Boyle. 2014. “Gender, comedy and reviewing culture on the Internet Movie Database.” Participations, Volume 11, Issue 1.
[3] Robin Redmon Wright. 2008. “Gender Consciousness, British Women, and The Avengers.” Conference Proceedings, St. Louis, MO.
[4] Rubén H. García-Ortega, Juan J. Merelo-Guervós, Pablo García Sánchez, and Gad Pitaru. 2018. “Overview of PicTropes, a film trope dataset.”
[5] Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam Kalai. 2016. “Man is to computer programmer as woman is to homemaker? Debiasing word embeddings.”
[6] Edward Loper and Steven Bird. “NLTK: The Natural Language Toolkit.”
[7] Ananya, Nitya Parthasarthi, and Sameer Singh. 2019. “GenderQuant: Quantifying mention-level genderedness.”
Word count: 1525