Natural Language Processing Meets Computational Social Science

Sofya Nikolaeva
Higher School of Economics
Faculty of Computer Science
Moscow, Russia
spnikolaeva@edu.hse.ru

Abstract: Analyzing gender bias through tropes, i.e., recurring narrative patterns, is a popular field of study in social science. In this paper, we review several articles on this topic and outline our planned research on analyzing changes in gendered tropes over time.

Index Terms: NLP, gender bias, tropes, popular media

I. INTRODUCTION

Popular media has a powerful influence on all of us and can manipulate us and shape our philosophy of life. Unfortunately, popular media also reflects old-fashioned societal biases and reifies gender stereotypes. In particular, such bias appears in the form of tropes, which are frequently occurring narrative patterns within popular media. This paper focuses on the analysis of gender bias within narrative tropes in films and TV shows. We consider only cis-gender males and females because of the lack of reliable lexicons, which limits the ability to investigate bias across other gender identities.

For an in-depth study of gender bias within narrative tropes, we will use modern Natural Language Processing (NLP) methods. NLP has become essential to the social sciences, which work extensively with text data about people. It achieves strong results in text analysis tasks and is changing the way humans and machines interact.

Recent work on gender bias within tropes in popular media has analyzed, for example, the relationship between gendered tropes and movie ratings and genres, as well as the relationship between the gender of authors and the diversity of female-leaning tropes in their books. In this research, we build on the work of Dhruvil Gala et al. [1]: we will evaluate how gendered tropes have changed over time and use tropes to predict film awards and box office results. Moreover, we will analyze the diversity of gendered tropes in films and TV shows depending on the gender of their creators (directors, screenwriters, producers, etc.).

As data for analysis, we use the large-scale dataset collected by Dhruvil Gala et al. [1], which contains 1.9M examples of 30K tropes in popular media from TVtropes.org. Information about movies and TV series (year of release, box office, rating, etc.) will be collected from IMDb.com and kinopoisk.ru. In this research, we hope to find out whether the gap between the diversity of male and female gendered tropes is decreasing over time, which would suggest that our society has become more tolerant.

The following sections discuss the reviewed literature, the methodology and the expected results; the last section draws conclusions.

II. LITERATURE REVIEW

As stated before, there are various articles on analyzing gender bias through tropes in popular media. This work is mostly inspired by the article by Dhruvil Gala, Mohammad Omar Khursheed, Hannah Lerner, Brendan O'Connor and Mohit Iyyer [1]. The authors analyzed gender bias within a large collection of tropes. For their study, they collected a large-scale dataset from TVtropes.org, an online user-created repository. They introduced a new metric, the genderedness score, which is computed for each trope individually by counting gendered pronouns and other gendered terms. Using this metric, they evaluated the genderedness of tropes in various settings.
For example, their analysis of genres showed that films and TV shows about sports, science fiction and war leaned toward male tropes, while romance, musicals and horror movies relied on female tropes. Another interesting experiment predicted the ratings of movies and TV shows from their tropes. It demonstrated that low-rated movies had much higher genderedness scores than high-rated films, which stands in contrast to the conclusions drawn from analyses of IMDb reviews by Karen Boyle [2]. In her research, Boyle concluded that women remained anomalous within film comedy outside the narrowly defined roles of romantic comedy.

In our work, we will deepen the research of Dhruvil Gala et al. and focus more on predictions based on gendered tropes. Another of their experiments showed that male-leaning tropes cover a wider range of topics (such as religion, war, sport and money) than female-leaning tropes, which often focus only on sexuality and maternalism. Male and female tropes also differ sharply in representation: for example, the Evil Genius trope includes 108 male and only 15 female examples in films and TV shows.

The paper by Hansen [4] showed that the female character (the titular princess) of the video game The Legend of Zelda is representative of the Damsel in Distress trope. The importance of studying gender bias in movies and TV shows was also discussed by Robin Redmon Wright [3]. In her paper, Wright gave an example of television's influence: some young adult women who watched The Avengers (the British espionage television programme) from 1962 to 1964 were able to liberate themselves from the person they had been socially constructed to be. Wright attributes this to a strong female character, played by Honor Blackman, who ended up replacing a male lead in the crime drama.

III. METHODOLOGY

This section describes the methods we will use in our work. First, we describe in more detail the genderedness score introduced by Dhruvil Gala et al. [1]. To measure how strongly tropes are associated with the male and female genders, the authors matched tokens against gender lexicons built with methods from Tolga Bolukbasi et al. [5]. These lexicons contain gendered terms such as gendered pronouns and gendered occupations (e.g., actress, actor). The authors also validated how well the lexicon captures gender identity: they took 150 random examples and classified each of them as male or female using the lexicon.

For each trope, Dhruvil Gala et al. concatenated the trope's description with all of the trope's examples, then tokenized, preprocessed and sorted the resulting document $X_i$ using the Natural Language Toolkit (NLTK) [6]. NLTK is a suite of libraries and programs for Natural Language Processing written in the Python programming language. It provides a wide range of tools for tasks such as tokenization, stemming, tagging and parsing.

They then defined the raw genderedness score of trope $i$ as

$$
d_i = \underbrace{\frac{f(X_i)}{f(X_i) + m(X_i)}}_{r_i} - \underbrace{\frac{f(\mathrm{TVTROPES})}{f(\mathrm{TVTROPES}) + m(\mathrm{TVTROPES})}}_{r_{\mathrm{TVTROPES}}}
$$

where $m(X_i)$ is the number of tokens in the document $X_i$ that match the male lexicon, $f(X_i)$ is the corresponding count for the female lexicon, and $f(\mathrm{TVTROPES})$ and $m(\mathrm{TVTROPES})$ are the numbers of female and male matches, respectively, across all trope documents. Finally, they defined the genderedness score of trope $i$ as the z-score of $d_i$ across all tropes. The authors considered tropes with genderedness scores outside of [−1, 1] to be highly gendered; the scores range from −1.84 for the most male-dominated trope to 4.02 for the most female-dominated one.
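The computation of the genderedness score described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation of Gala et al. [1]: the two lexicons are tiny hypothetical stand-ins for the much larger lexicons based on Bolukbasi et al. [5], a simple regular-expression tokenizer replaces the NLTK preprocessing, the trope texts are invented toy examples, tropes with no gendered tokens are skipped (an assumption, since $r_i$ is undefined for them), and the z-score uses the population standard deviation.

```python
import re
import statistics

# HYPOTHETICAL toy lexicons: stand-ins for the much larger gender lexicons
# used by Gala et al. [1], which are based on Bolukbasi et al. [5].
MALE_LEXICON = {"he", "him", "his", "himself", "man", "men", "actor", "father", "king"}
FEMALE_LEXICON = {"she", "her", "hers", "herself", "woman", "women", "actress", "mother", "queen"}


def tokenize(text):
    """Lowercase word tokenizer (a simplified stand-in for NLTK preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def gender_counts(tokens):
    """Return (f, m): the number of tokens matching the female and male lexicons."""
    f = sum(1 for t in tokens if t in FEMALE_LEXICON)
    m = sum(1 for t in tokens if t in MALE_LEXICON)
    return f, m


def genderedness_scores(tropes):
    """Map each trope name to the z-score of its raw genderedness d_i.

    `tropes` maps a trope name to (description, list of example texts).
    """
    # X_i: the trope description concatenated with all of its examples.
    counts = {
        name: gender_counts(tokenize(" ".join([desc, *examples])))
        for name, (desc, examples) in tropes.items()
    }
    # r_TVTROPES: female share of all gendered matches across every trope document.
    f_all = sum(f for f, _ in counts.values())
    m_all = sum(m for _, m in counts.values())
    r_all = f_all / (f_all + m_all)

    # d_i = r_i - r_TVTROPES; tropes with no gendered tokens are skipped (assumption).
    raw = {name: f / (f + m) - r_all for name, (f, m) in counts.items() if f + m > 0}

    # Genderedness score: z-score of d_i (population standard deviation assumed).
    mu = statistics.mean(raw.values())
    sigma = statistics.pstdev(raw.values())
    return {name: (d - mu) / sigma for name, d in raw.items()}


# HYPOTHETICAL toy tropes, invented for illustration only.
EXAMPLE_TROPES = {
    "Damsel in Distress": ("A woman in danger waits for rescue.",
                           ["She is captured and her father sends a hero."]),
    "Evil Genius": ("A villain whose intellect is his weapon.",
                    ["He builds a doomsday device.", "The man hides in his lair."]),
    "Mentor": ("An older character guides the hero.",
               ["She teaches him to fight.", "He trains with her."]),
}

if __name__ == "__main__":
    scores = genderedness_scores(EXAMPLE_TROPES)
    print({name: round(z, 2) for name, z in scores.items()})
    # {'Damsel in Distress': 1.07, 'Evil Genius': -1.34, 'Mentor': 0.27}
```

Because $r_{\mathrm{TVTROPES}}$ is the same constant for every trope, subtracting it shifts all $d_i$ equally and does not change the final z-scores; the sketch keeps it only so that the code mirrors the formula for $d_i$.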
In our work, we will try to improve the genderedness score by using more sophisticated methods of measuring trope genderedness and by expanding the gender lexicon used by Dhruvil Gala et al. Furthermore, we will draw on the methodology of Ananya et al. [7], who proposed methods that measure gender bias using contextual cues to improve probabilistic estimates. We expect this to be particularly helpful in our research.

IV. RESULTS ANTICIPATED

Although many old-fashioned social biases have largely faded in the modern world, popular media still reflects them. We hope that the results of this research will demonstrate that the difference in diversity between male and female gendered tropes is decreasing over time. This would show that popular media is changing with respect to gender bias and becoming more open to the development of complex female characters.

We also expect our analysis of the relationship between the diversity of female gendered tropes in films and TV shows and the gender of their creators (e.g., directors, screenwriters, producers) to show results similar to the analogous analysis by Dhruvil Gala et al., who predicted authors' gender identities from the tropes in their books. That experiment demonstrated that female authors use more female-leaning tropes, and far fewer stereotypical female-oriented tropes, than male authors.

Moreover, we will use tropes to predict film and TV show awards, and we believe the results could demonstrate a direct correlation between more developed female characters and awards. We expect this because of the recent attention of the film community to equality in the modern film industry. Finally, we will use tropes to predict box office results and analyze how these results depend on gendered tropes.

V. CONCLUSION

In summary, analyzing gender bias through tropes is an important topic that deserves public attention. This research aims to show how gendered tropes have changed over time, and we expect it to indicate that social biases have not yet disappeared from the modern film industry. However, we believe the results will also show the emergence of new female tropes, the deepening of existing female-dominated tropes and a retreat from stereotypical characters in popular media. This area has further research potential: the present paper considers only cis-gender males and females, but the study of gender bias could be extended by expanding the current dataset of tropes. This would allow future research to analyze societal biases across other gender identities.

REFERENCES

[1] Dhruvil Gala, Mohammad Omar Khursheed, Hannah Lerner, Brendan O'Connor, Mohit Iyyer. 2020. “Analyzing Gender Bias within Narrative Tropes.”
[2] Karen Boyle. 2014. “Gender, comedy and reviewing culture on the Internet Movie Database.” Participations, Volume 11, Issue 1.
[3] Robin Redmon Wright. 2008. “Gender Consciousness, British Women, and The Avengers.” Conference Proceedings, St. Louis, MO.
[4] Rubén H. García-Ortega, Juan J. Merelo-Guervós, Pablo García Sánchez, Gad Pitaru. 2018. “Overview of PicTropes, a film trope dataset.”
[5] Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, Adam Kalai. 2016. “Man is to computer programmer as woman is to homemaker? Debiasing word embeddings.”
[6] Edward Loper, Steven Bird. “NLTK: The Natural Language Toolkit.”
[7] Ananya, Nitya Parthasarthi, Sameer Singh. 2019. “GenderQuant: Quantifying mention-level genderedness.”