Trump vs Biden 2020 Election Stylometric Analysis

Trump’s and Biden’s Styles during the 2020 US Presidential Election

Jacques Savoy (ORCID 0000-0002-4486-0067), Marylène Wehren
University of Neuchatel, rue Emile Argand 11, 2000 Neuchatel, Switzerland
{Jacques.Savoy, Marylene.Wehren}@unine.ch
To appear in Digital Scholarship in the Humanities (2021-2022)

Abstract
This study analyzes the stylistic and rhetorical characteristics of Donald Trump and Joe Biden during the 2020 US presidential election. Three communication channels covering the last two months of the campaign are examined: oral (TV debates and radio interviews), written (speeches), and tweets. Among stylometric markers, the most frequent functional words indicate that Trump uses a wider variety of personal pronouns, and uses them more frequently, than Biden. The Democratic nominee mainly uses the first-person pronouns (I and we), together with a higher rate of prepositions than Trump, signaling a larger number of nouns. When comparing the three media of communication, this study shows that the oral form is close to the written one, while tweet-based communication constitutes a distinct type. Usually, written texts tend to contain more functional words than either the oral form or tweets. When comparing the two nominees, the frequency of familiar words (e.g., the, to, of, a) tends to be higher in Biden’s addresses, while Trump opts more often for grandiose terms (e.g., great, just, strong). As for rhetorical features, the Republican president uses more emotional words, both positive and negative. The Democratic candidate tends to avoid negative messages and opts for symbolic terms (e.g., nation, country) and achievement terms (e.g., plan, win, work), with numerous references to people and family (e.g., folks, Americans, son).

Keywords: Stylometry; Stylistic measurement; Political stylometric analysis.

1 Introduction
This study focuses on the 2020 US presidential campaigns led by Donald Trump and Joe Biden.
During an electoral campaign, each political leader must convince his electoral base of his leadership and demonstrate his strength and ability to govern the country. For Trump, these functions were mainly fulfilled by social networks, particularly Twitter, rather than by traditional speeches. Moreover, the coronavirus pandemic reinforced the relevance of social networks during this election, because political meetings were difficult to organize.

In this context, our main objective is to determine the stylistic characteristics of the two main candidates. As each nominee can deliver a speech at a meeting, answer questions during interviews or TV debates, and send tweets, this study analyzes the style found in the oral, written, and tweet-based communication channels. More precisely, we aim to identify stylistic markers specific to a given political leader and to a particular communication channel. Going a step further, this analysis also identifies rhetorical features specific to each candidate. In this view, rhetoric is defined as the art of effective and persuasive speaking, the way to motivate an audience, while language style is defined as the pervasive and frequent forms used by an author mainly for aesthetic reasons (Biber & Conrad, 2009).

Although the speeches examined here are attributed to Trump and Biden, behind each well-known politician there is usually a speechwriter (or a team of ghostwriters). For example, behind Kennedy one finds Sorensen (Carpenter & Seltzer, 1970), Favreau behind Obama, and even Madison and Hamilton behind some speeches delivered by Washington. But as Sorensen noted: “If a man in a high office speaks words which convey his principles and policies and ideas and he’s willing to stand behind them and take whatever blame or therefore credit go with them, [the speech is] his”.
Similarly, White House spokesman Hogan Gidley stated: “The president is a best-selling author and deeply gifted orator who packs arenas and has a meticulous and carefully honed method for writing his speeches, whether it be at a rally, a manufacturing plant opening or the State of the Union. What the American people hear is 100 percent President Trump’s own words”. (Rogers, 2020)

The rest of this article is organized as follows. Section 2 presents related work, while Section 3 describes the corpora used in this study together with some overall statistics. Simple stylistic markers are analyzed in Section 4, and Section 5 discusses the distribution of functional words over the three communication channels, namely oral, written, and tweet-based. Section 6 presents a more advanced stylistic and rhetorical analysis of the electoral speeches and tweets. The characteristic vocabulary associated with the two candidates is described in Section 7. Finally, a conclusion reports the main findings of this study.

2 State of the Art
Freely available, easy to understand, and influential, political texts have been studied from different perspectives. The most studied are governmental speeches focusing on a given president or prime minister in power (Mayaffre, 2004), (Labbé et al., 2021) (e.g., Speeches from the Throne (Canada and Quebec) and general policy statements of French governments in Labbé & Monière (2003; 2008), State of the Union addresses (Rule et al., 2015), (Savoy, 2015), or US inaugural addresses (Kubát & Cech, 2016)). In other countries, the president is only the head of state, and presidential messages, though viewed as less important, can still be of stylometric interest (Pauli & Tuzzi, 2009), (Kubát et al., 2020).
A second, less studied category is electoral speeches (Arnold & Labbé, 2015), (Savoy, 2018a), along with press releases issued during a presidential campaign (Labbé & Monière, 2013). When governmental speeches are studied over decades, constitutional institutions tend to smooth out the differences between the political parties exercising power, so stylistic and rhetorical variations between presidents or prime ministers can mainly be explained by the period in which they governed. The arrival of a strong leader or exceptional events (e.g., a world war, a deep economic depression) can, however, reveal a real change in vocabulary and style (Labbé & Monière, 2003), (Savoy, 2015), (Kubát & Cech, 2016).

A third source of political speeches, parliamentary debates, has been studied from several perspectives. For example, Laver et al. (2003) and Grimmer & Stewart (2013) describe methodologies to extract topics and political positions from texts. Yu (2008) demonstrates that machine learning methods (e.g., SVM and naïve Bayes) can be trained to classify congressional speeches according to political party. In a subsequent analysis, Yu (2013) shows that the author’s gender can be determined and that female political figures tend to use emotional words and personal pronouns more frequently than men. Differences between political parties can also be observed in tweets (Sylwester & Purver, 2015). Such differences are correlated with psychological factors: positive emotional terms occur more frequently in Democrats’ tweets, as do swear expressions (e.g., alien, asshole, hell) and first-person singular pronouns (e.g., I, me). For Raubach (2019), positive emotion words should be associated with the party in power rather than with a given party.
Stylistic fingerprints defined by the frequency of functional words (articles, pronouns, prepositions, conjunctions, and auxiliary verbs) have been proposed to discriminate between political leaders (Savoy, 2020). However, sentiments and emotions also tend to play an essential role in today’s rhetoric and style. To quantify this aspect, O’Connor et al. (2010) and Young & Soroka (2012) suggest counting the frequency of words appearing in a dictionary of positive or negative emotional terms. Similarly, Hart (1984) designed and implemented a political text analyzer called DICTION that generalizes this idea by representing emotions, or more generally concepts, through lists of terms. In a first book, Hart (1984) describes the rhetorical and stylistic variations between US presidents from Truman to Reagan, while a follow-up study (Hart et al., 2013) covers the stylistic variations from G.W. Bush to Obama. More recently, Hart (2020) analyzed Trump’s rhetoric and concluded that Trump is the president of extremes, presenting either a high or a low level depending on the target emotion or rhetorical concept. As another example, LIWC (Linguistic Inquiry and Word Count) (Tausczik & Pennebaker, 2010) groups words into categories used to evaluate the author’s psychological state (e.g., feminine, emotional, leadership) as well as her/his style (e.g., mainly based on personal pronouns (Pennebaker, 2011)). The underlying hypothesis is that words reveal the way the author thinks, acts, or feels. Using this system, Slatcher et al. (2007) were able to determine the personalities of different political candidates in the 2004 US presidential election. They based each psychological portrait both on single measurements (e.g., the relative frequency of pronouns, social words) and on a set of composite indices reflecting the cognitive complexity, presidentiality, or honesty of each candidate.
These personality measurements agreed with different opinion polls. For example, G.W. Bush used the pronoun I, positive emotion words (e.g., happy, truly, win), and the future tense more frequently. Kerry was perceived by the public as a somewhat depressed, serious, somber, and cold person, and he uttered negative emotion expressions (e.g., sad, worthless, lost) and physical words (e.g., head, ache, sleep) more frequently.

In brief, previous studies have mainly analyzed governmental speeches and, less frequently, electoral speeches (Boller, 2004). A few studies focus on the legislative level (e.g., the Congress), and these studies are based on the written form. Other communication channels are more difficult to obtain (e.g., transcripts of TV debates or interviews, Facebook pages, blogs, or tweets). The current study focuses precisely on three forms together, namely the written, oral, and web-based channels, during a recent electoral campaign.

3 Corpora
To accurately analyze the style adopted by the two main candidates of the 2020 US presidential election, some general background information is required. This campaign was marked by the coronavirus pandemic. Therefore, the number of electoral meetings was rather limited, and social networks played a more important role. Other significant events during this campaign include the declaration of a state of emergency for Covid-19 on March 13th. After B. Sanders dropped out (April 16th), Joe Biden was the only candidate for the Democratic party. The death of G. Floyd (May 25th) highlighted the racial question, generating demonstrations and sometimes violent riots. In August, the two national conventions took place (Democratic, Aug. 17th-20th; Republican, Aug. 24th-27th). The death of Supreme Court Justice R. Bader Ginsburg (September 18th) allowed the nomination of the conservative judge A. Coney Barrett.
On October 1st, President Donald Trump announced that he was infected by the coronavirus.

In this context and to convince voters, the candidates can rely on traditional meetings as well as on TV debates and interviews. The first form corresponds to written communication, while the second is oral. One could argue that speeches delivered by the nominees correspond to an oral communication form, while (written) messages must be categorized as a distinct text genre. However, as mentioned by Biber & Conrad (2009, p. 262), “Language that has its source in writing but performed in speech does not necessarily follow the generalization (written vs. oral). That is, a person reading a written text aloud will produce speech that has the linguistic characteristics of the written text. Similarly, written texts can be memorized and then spoken”. When comparing these two communication forms, the written one is more precise and permanent, while the oral form is more spontaneous, direct, and usually less formal. Usually a (written) message presents a more complex sentence construction, including longer words. Looser structure and filler phrases (e.g., uh, um) occur more frequently in oral productions, as do repetitions of the same expressions (Crystal, 2018). In addition to these two forms, we study social networks, and particularly Twitter. Such web-based communication channels have been described as new forms lying between classical oral and written usage (Crystal, 2006).

All three corpora were produced during the same period, with discourses aimed at the same goal and covering similar topics. Thus, several factors affecting the style are kept constant, such as the time period, the subjects, and the objective. To generate our corpus, all tweets sent by Trump (Twitter account @realDonaldTrump) were downloaded from a dedicated website [1]. This corpus runs from September 1st up to November 3rd, 2020 and contains 2,278 tweets.
For Biden, a dedicated program querying the Twitter API was used to extract his 871 tweets directly from the Twitter server. Some overall statistics are depicted in Table 1a, while Table 1b gives an overview of the three communication forms. Among the speeches and interviews, the acceptance speeches delivered at the national conventions are the first in our collection. The transcripts of the two TV debates have been included in the oral corpus, together with a few radio interviews.

Table 1a. Some statistics about Donald Trump’s and Joe Biden’s Twitter accounts (RT, @, #, and URL are given per 100 tweets)
Name    Tweets   Tweets per day   Mean tokens per tweet   Uppercase words   RT     @      #      URL
Trump   2,278    43.4             23.7                    5.36%             51.7   87.0   16.0   62.3
Biden   871      13.6             33.3                    0.54%             3.1    13.9   4.1    81.4

Table 1b. Some statistics according to authors and communication channels
Form             Tokens   Vocabulary size   TTR     Lexical density   Big words
Oral Trump       37,759   2,569             0.331   43.1%             16.7%
Written Trump    63,731   3,816             0.367   46.4%             17.9%
Tweets Trump     54,455   6,308             0.481   58.5%             30.7%
Oral Biden       53,016   3,695             0.360   43.1%             19.8%
Written Biden    32,286   3,463             0.411   47.7%             23.6%
Tweets Biden     25,185   2,882             0.405   49.9%             27.5%

The data in Table 1a show that Trump sent more tweets per day than Biden (43.4 vs. 13.6). Trump’s tweets are, on average, shorter (23.7 tokens vs. 33.3) and contain more mentions (e.g., @WhiteHouse, @FoxNews) and hashtags (e.g., #MAGA, #Vote). Trump also retweets far more often than Biden (51.7 vs. 3.1 per 100 tweets). These retweets usually contain a single URL pointing to a video. Another stylistic marker strongly associated with Trump is the presence of words in uppercase letters (e.g., AMERICA, GREAT, VOTE). As shown in Table 1a, 5.36% of tweeted words longer than three letters belong to this category (in this count, shorter words such as “I”, “US”, “FBI”, etc. have been ignored). Biden’s tweets contain more URLs, usually providing videos or webpages that support his claims.

[1] See www.trumptwitterarchive.com/archive.
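The per-100-tweet counts and the uppercase share reported in Table 1a can be approximated with simple pattern matching. The Python sketch below is illustrative only: the regular expressions, the `RT @` retweet convention, and the function names `per_100_tweets` and `uppercase_share` are assumptions, not the procedure actually used to build Table 1a. Following the text, only words of four letters or more enter the uppercase count, and mentions, hashtags, and URLs are removed before words are extracted.

```python
import re

# Assumed surface patterns; a real pipeline would likely use a tweet-aware tokenizer.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
WORD_RE = re.compile(r"\b[A-Za-z]+\b")


def per_100_tweets(tweets):
    """Retweets, mentions, hashtags and URLs per 100 tweets (Table 1a style)."""
    n = len(tweets)
    counts = {
        "RT": sum(1 for t in tweets if t.startswith("RT @")),
        "@": sum(len(MENTION_RE.findall(t)) for t in tweets),
        "#": sum(len(HASHTAG_RE.findall(t)) for t in tweets),
        "URL": sum(len(URL_RE.findall(t)) for t in tweets),
    }
    return {key: 100 * value / n for key, value in counts.items()}


def uppercase_share(tweets, min_len=4):
    """Percentage of words with at least min_len letters written fully in uppercase."""
    words = []
    for t in tweets:
        # Drop URLs, mentions and hashtags before extracting plain words.
        cleaned = HASHTAG_RE.sub(" ", MENTION_RE.sub(" ", URL_RE.sub(" ", t)))
        words += [w for w in WORD_RE.findall(cleaned) if len(w) >= min_len]
    if not words:
        return 0.0
    return 100 * sum(1 for w in words if w.isupper()) / len(words)
```

With real data, such figures depend on tokenization choices (e.g., whether `COVID19` counts as a word), so they are only comparable within a single processing pipeline.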
To differentiate the latent characteristics of the three communication channels, one can measure vocabulary richness by computing the type-token ratio (TTR) (Baayen, 2008). The precise definition is given in Equation 1, where, for a given text t, the number of distinct word types is denoted by Voc(t) and its length (number of tokens) by Token(t).

$$\mathrm{TTR}(t) = \frac{\mathrm{Voc}(t)}{\mathrm{Token}(t)} \qquad (1)$$

High values indicate a rich vocabulary, showing that the underlying text covers many different topics or that the author discusses a few themes from several angles, with different expressions and formulations. Conversely, a small TTR value signifies that the author’s vocabulary is limited or that words and expressions are repeated. TTR values are, however, sensitive to text length: as the length increases, the TTR decreases (Baayen, 2008). To avoid this problem, the values reported in Table 1b correspond to the mean of the TTRs computed over successive segments of 1,000 tokens (Covington & McFall, 2010).

The values in Table 1b indicate that the oral form presents the smallest TTR values, due to a higher rate of repetition, while the written communications show higher values. The difference between oral and written is also larger for Biden (0.360 vs. 0.411) than for Trump (0.331 vs. 0.367), an indication that Trump focuses mainly on a few subjects. The TTR scores of the tweets are high, and for Trump clearly higher than in his written form. Based only on the TTR values, writing on Twitter thus seems closer to the written form than to the oral one.

As a second overall measurement, the lexical density (LD) (Biber et al., 2002) has been computed as a mean over successive segments of 1,000 tokens.
For each segment of 1,000 tokens, we applied Equation 2, in which Lexical words(t) indicates the number of lexical words in t and Function words(t) the number of functional words in t. The lexical set is composed of nouns, proper names, adjectives, verbs, and adverbs, while functional words comprise all other grammatical categories. Since every token is either lexical or functional, the two forms of Equation 2 are equivalent.

$$\mathrm{LD}(t) = \frac{\mathrm{Lexical\ words}(t)}{\mathrm{Token}(t)} = 1 - \frac{\mathrm{Function\ words}(t)}{\mathrm{Token}(t)} \qquad (2)$$

A relatively high LD percentage indicates a more complex text containing more information. As depicted in the penultimate column of Table 1b, the written communication medium shows higher values than the oral one. Interestingly, tweets present the highest LD percentages. This finding can be explained by the fact that each tweet is an independent unit centered on a specific subject that must be clearly identified. Thus, pronouns and determiners occur less frequently (e.g., “Law and order”, “Kat has my Complete and Total Endorsement!”).

As depicted in the last column of Table 1b, the percentage of big words (composed of six letters or more) is higher in written messages than in oral productions. Moreover, Trump uses fewer long words than Biden, both in oral form (16.7% vs. 19.8%) and in written speeches (17.9% vs. 23.6%). A high percentage of big words can be viewed as an indication of complex formulations (Hart, 1984), (Pennebaker, 2011). When the communication channel is oral, it is recommended to favor shorter words: “One finding of cognitive science is that words have the most powerful effect on our minds when they are simple. The technical term is basic level. Basic-level words tend to be short. … Basic-level words are easily remembered; those messages will be best recalled that use basic-level language.” (Lakoff & Wehling, 2012, p. 41)

In both oral and written forms, Trump has adopted a simple and direct rhetoric, possibly an indication of a poverty of thought.
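The three overall measures of Table 1b (TTR, lexical density, and the share of big words) can be sketched as follows, each averaged over successive 1,000-token segments as described above. This is a minimal sketch, not the authors’ code: it assumes the text is already tokenized and, for lexical density, already POS-tagged with Universal POS tags; the `LEXICAL_TAGS` set is an assumed mapping of “nouns, names, adjectives, verbs, and adverbs”. Leftover tokens after the last complete segment are ignored, and a text shorter than one segment is measured as a whole. Passing `step=1` turns the segments into a moving window, in the spirit of Covington & McFall (2010).

```python
# Assumed mapping of the paper's lexical classes onto Universal POS tags.
LEXICAL_TAGS = {"NOUN", "PROPN", "ADJ", "VERB", "ADV"}


def segments(items, size=1000, step=None):
    """Windows of `size` items; step=size gives consecutive segments, step=1 a moving window.

    Items after the last complete window are dropped; a text shorter than
    `size` is returned as a single segment.
    """
    step = size if step is None else step
    windows = [items[i:i + size] for i in range(0, len(items) - size + 1, step)]
    return windows or [items]


def ttr(tokens):
    """Equation 1: distinct word types divided by number of tokens."""
    return len(set(tokens)) / len(tokens)


def lexical_density(tags):
    """Equation 2: share of tokens whose POS tag marks a lexical word."""
    return sum(1 for tag in tags if tag in LEXICAL_TAGS) / len(tags)


def big_word_share(tokens, min_chars=6):
    """Share of tokens with at least `min_chars` characters ("big words")."""
    return sum(1 for w in tokens if len(w) >= min_chars) / len(tokens)


def mean_over_segments(measure, items, size=1000, step=None):
    """Average a per-segment measure (ttr, lexical_density, ...) over a text."""
    values = [measure(chunk) for chunk in segments(items, size, step)]
    return sum(values) / len(values)
```

Note that `big_word_share` counts characters rather than letters, so hashtags, mentions, or URL placeholders kept by the tokenizer also count as big words, which is the effect the text invokes to explain the high percentages observed in tweets.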
A second hypothesis is that this lexical choice was deliberate: Trump wanted to demonstrate that he can talk like the people, without complex formulations, so as to be easily understood by citizens. Biden’s written speeches reach a higher percentage of such complex words (23.6%), indicating formulations that are harder to understand. For both candidates, tweets present the highest percentages (30.7% and 27.5%). These values could be explained by the presence of specific features such as mentions, URLs, or hashtags, which usually contain more than six characters.

4 Simple Stylistic Fingerprints
As a first analysis to identify the stylistic markers associated with the three media of communication and the two authors, one can focus on the most frequent functional words (MFFWs). As each text is a composite signal, various factors in addition to style can explain word occurrence frequencies, such as the text genre, the author’s background, the time period, the topics, the audience, etc. (Savoy, 2020).

Table 2. Most frequent functional words
        Oral            Written         Tweets
Rank    Trump   Biden   Trump   Biden   Trump   Biden
1       the     the     the     the     the     the
2       I       to      and     to      to      to
3       and     and     I       and     and     and
4       it      I       we      of      a       is
5       a       that    to      a       of      in
6       you     we      you     we      is      we
7       to      in      a       I       in      a
8       they    it      it      in      for     of
9       that    a       of      it      you     I
10      we      of      they    that    will    it

Table 2 lists the ten most frequent functional words per author and communication channel. In the first rank, one recognizes the definite article (the), considered the trademark of the English language. The ranks of the personal pronouns present interesting characteristics. Pronouns usually appear more frequently in oral communications, and Table 2 confirms this finding. For Trump, the pronoun “I” occurs as the second MFFW. A high frequency of I-words is a common stylistic marker of successful candidates in an electoral campaign (Arnold & Labbé, 2015), (Savoy, 2018a; 2018b).
Its appearance in the second position, however, indicates that Trump is egocentric (Pennebaker, 2011). The pronoun “you” appears in the sixth rank, a clear indication that Trump tries to establish a dialogue with the audience. When comparing Trump and Biden in the oral form, eight terms appear in both lists. Trump, however, uses more pronouns, and “you” and “they” do not appear in Biden’s top-ranked terms.

In the written form, Trump uses the same four pronouns (I, you, they, we), but “we” occupies a higher position. The frequent use of this pronoun indicates the author’s wish to involve the public in his statements (e.g., “we, together, will …”). For a political leader, we-words also have the advantage of being ambiguous. Who is behind a we? The future president and his cabinet? The president and the people? Moreover, the preposition “of” is more frequent than in the oral form. As in the oral form, and compared to Biden, Trump resorts more to pronouns, with “you” and “they” occurring in his ten MFFWs but not in Biden’s list.

For both candidates, one can observe a high similarity between the ten words appearing in the oral and written columns. For Trump, nine words appear in both columns, though not in the same ranking; only “that” is specific to the oral list and only “of” to the written one. For Biden, the same ten functional words occur in both lists.

In Trump’s tweets, the frequency of personal pronouns tends to decrease. In contrast, Trump uses more verbs (is, will), producing a text more oriented towards action. One can also observe a clear distinction between the ten MFFWs occurring in Trump’s tweets and those in his oral and written forms. For example, only five terms are common to his oral and tweet lists. With Biden, the stylistic markers are very similar across tweets and either oral or written forms, with the same three terms in the first three ranks (the, to, and).
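The rankings in Table 2 and the overlap counts discussed above amount to a frequency ranking restricted to a functional-word list, followed by a set intersection. The Python sketch below assumes tokens are already normalized and that the caller supplies the functional-word list (the paper’s exact list is not reproduced here); the helper names are hypothetical. The four top-10 lists are copied from Table 2, and the printed counts reproduce the overlaps stated in the text.

```python
from collections import Counter


def top_functional_words(tokens, functional_words, k=10):
    """The k most frequent tokens that belong to the given functional-word list."""
    counts = Counter(t for t in tokens if t in functional_words)
    return [word for word, _ in counts.most_common(k)]


def shared_terms(list_a, list_b):
    """Terms present in both top-k lists, regardless of rank."""
    return set(list_a) & set(list_b)


# Top-10 lists taken from Table 2.
ORAL_TRUMP = ["the", "I", "and", "it", "a", "you", "to", "they", "that", "we"]
ORAL_BIDEN = ["the", "to", "and", "I", "that", "we", "in", "it", "a", "of"]
WRITTEN_TRUMP = ["the", "and", "I", "we", "to", "you", "a", "it", "of", "they"]
TWEETS_TRUMP = ["the", "to", "and", "a", "of", "is", "in", "for", "you", "will"]

print(len(shared_terms(ORAL_TRUMP, ORAL_BIDEN)))     # 8
print(len(shared_terms(ORAL_TRUMP, WRITTEN_TRUMP)))  # 9
print(len(shared_terms(ORAL_TRUMP, TWEETS_TRUMP)))   # 5
```

`Counter.most_common` breaks ties by first occurrence, so with real corpora equal counts near rank 10 may deserve a closer look before interpreting small rank differences.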
5 Functional Words Distribution
Instead of focusing on each individual feature, one can generate an overview using all functional terms to display the stylistic relationships between the three communication channels and the two authors. To achieve this, the percentage of personal pronouns has been computed, subdivided by person and number. Thus, a class denoted Self [2] comprises the words “I”, “me”, “myself”, and “mine” and the possessive adjective “my”. Similarly, the categories You, She/He, We, and They have been defined. A supplementary category containing all other pronouns (e.g., it, that, who, whose, …) has also been defined. As other feature classes, the articles (the, a, an), the prepositions, the conjunctions, and the modal and auxiliary verbs have been added. In addition, the overall percentage of functional words for each communication channel and author can be computed. Table 3 reports these percentages of occurrence according to the communication channels and authors.

[2] In this study, the name of a wordlist is capitalized and presented in italics.

From this dataset, one can apply an automatic classification scheme (Lebart et al., 1998). The first step is to compute an intertextual distance between each pair of representations. As a simple measure, the Manhattan distance has been computed according to Equation 3.

$$D_{\mathrm{Manhattan}}(A, B) = \sum_{i=1}^{m} |a_i - b_i| \qquad (3)$$

In this formula, m indicates the number of components of the vectors A and B, and a_i indicates the value of the ith component (or category in our context) of vector A (and similarly b_i for vector B). For example, limited to the first two categories depicted in Table 3, vector A could represent Trump in oral form (A = [3.62, 2.60]) while B is associated with Trump in written form (B = [2.90, 2.63]).
The Manhattan distance between A and B is then computed as follows:

$$D_{\mathrm{Manhattan}}(A, B) = |3.62\% - 2.90\%| + |2.60\% - 2.63\%| = 0.72\% + 0.03\% = 0.75\% = 0.0075 \qquad (4)$$

In our context, m = 10 indicates the number of stylistic categories reported in Table 3, without considering the last row (percentage of functional words). This last category has excessively high values compared to the others and thus might have a preeminent impact when computing the distance between two representations (Savoy, 2020). When adding this last row to our example, we need to add |56.72% - 53.11%| = 3.61%, and the final distance becomes 0.75% + 3.61% = 4.36%. As we can see, this last component dominates the distance computation and is therefore discarded.

Table 3. Percentage of functional words per author and communication channel
                      Oral                Written             Tweets
Category              Trump     Biden     Trump     Biden     Trump     Biden
Self                  3.62%     3.09%     2.90%     2.35%     1.33%     2.09%
You                   2.60%     1.63%     2.63%     1.71%     1.39%     2.07%
She / He              1.68%     1.50%     1.77%     1.28%     1.26%     0.93%
We                    2.19%     2.56%     3.29%     3.01%     1.41%     3.06%
They                  2.76%     1.48%     2.15%     1.25%     0.63%     0.59%
Other pronouns        8.48%     8.45%     6.68%     6.68%     3.42%     5.20%
Articles              8.48%     8.45%     6.68%     6.68%     3.42%     5.20%
Prepositions          10.36%    13.17%    10.18%    13.38%    11.03%    13.25%
Conjunctions          6.51%     5.39%     5.56%     6.32%     4.03%     5.51%
Modal & aux. verbs    7.85%     8.52%     6.99%     5.69%     7.13%     8.33%
Functional words      56.72%    56.80%    53.11%    52.37%    40.82%    50.01%

After applying this distance measure, we obtain a symmetric matrix of 6 x 6 = 36 values (given in the Appendix), which does not provide a simple, comprehensive view. To obtain a better visualization, a clustering method produces a dendrogram showing clusters with similar profiles. Inspired by genomic studies, tree-based visualization models have been proposed in which the distances between all vectors are mostly respected (Barthélemy et al., 1991), (Paradis, 2011). Figure 1 was produced according to this strategy.
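Equation 3 and the distance matrix behind Figure 1 can be sketched in a few lines of Python. The helper names are hypothetical, and profiles are assumed to be lists of percentages in the same category order (the ten rows of Table 3, without the functional-words row). Distances then come out in percentage points, so 0.75 corresponds to the proportion 0.0075. The tree construction itself (Paradis, 2011) is not reproduced here.

```python
from itertools import combinations


def manhattan(a, b):
    """Equation 3: sum of absolute differences between matching categories."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same categories")
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(profiles):
    """Symmetric matrix (dict of dicts) of Manhattan distances between named profiles."""
    names = list(profiles)
    matrix = {name: {name: 0.0} for name in names}
    for p, q in combinations(names, 2):
        matrix[p][q] = matrix[q][p] = manhattan(profiles[p], profiles[q])
    return matrix


# Worked example from the text: first two categories of Table 3 (Self, You), in %.
trump_oral, trump_written = [3.62, 2.60], [2.90, 2.63]
print(round(manhattan(trump_oral, trump_written), 2))  # 0.75
# Adding the functional-words row (56.72 vs. 53.11) lets it dominate the result.
print(round(manhattan(trump_oral + [56.72], trump_written + [53.11]), 2))  # 4.36
```

Rounding is applied only for display, since differences such as 3.62 - 2.90 are not exact in binary floating point.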
In Figure 1, each point is labeled with two uppercase letters denoting the author’s name (DT = Donald Trump, JB = Joe Biden), followed by the communication channel (e.g., DTtweets means Trump’s tweets). The distance between two points is given by the length of the lines needed to connect them. For example, starting from the point JBoral, we follow the branch until we reach the backbone, go along the backbone, and then take the line leading to JBtweets, a path of length 0.075 joining these two points. In this figure, the longest distance (0.159) connects DToral and DTtweets. The smallest (0.065) is found between DToral and DTwritten, and the second smallest between JBoral and JBwritten (0.075). However, not all distances are fully respected in such a graph, and some deformations are always present. To partially compensate for this problem, the exact values of all distances are given in the Appendix.

Figure 1. Tree-based representation of Manhattan distance between communication channels

Based on these two smallest distance values, one can identify two main clusters, each composed of the oral and written communication forms of one candidate. The two tweet-based stylistic representations appear in the bottom part of Figure 1. Moreover, DTtweets lies clearly apart from all other communication channels. In the Appendix, Table A.1 indicates that all distance values from DTtweets are larger than 0.1, confirming that this communication channel is distinct from the others.

6 Advanced Stylistic Markers
Instead of limiting the analysis to functional words, one can group selected terms under a semantic tag. In the category Symbolism, for example, Hart (1984) and Hart et al. (2013) include a list of words related to country (e.g., nation, America), ideology (e.g., democracy, freedom, rights), or, more generally, political concepts and institutions (e.g., law, government).
A high value in this category signals a text discussing general political concepts in an abstract way. The Blame category groups terms with a negative connotation, such as angry, deceptive, and wrong. The Human class includes various references to citizens, such as folks, people, we, he, their, etc., while the Praise class (e.g., best, good, popular, true) measures verbal affirmations. As an additional measurement, the Familiarity group consists of words encountered in everyday speech (e.g., as, in, this, with, …). A high score in this class reveals a more colloquial style. Based on a similar strategy, the LIWC (Linguistic Inquiry & Word Count) system (Tausczik & Pennebaker, 2010) groups words under emotional, topical, or psychological categories. For example, LIWC defines positive emotions (Posemo) (e.g., like, hope, win) and negative emotions (Negemo) (e.g., fear, fake, blam*). In the current study, we also measured the percentage of Family terms, covering expressions related to family members or to the family itself (e.g., family, folks, son, …). Finally, words related to accomplishment (e.g., first, plan, win, …) form the dedicated Achieve category.

During an election, the emotional aspect can play a key role in motivating voters (Marcus & MacKuen, 1993), (Brader, 2005), (McDonald & Lenz, 2008). Negative emotions such as anger and fear, for instance, have long been present in political advertisements. According to Ansolabehere & Iyengar (1995), the effectiveness of such messages depends on three main factors, namely the tone, the party of the sponsoring candidate, and the topic. In this view, a Republican voter would be more receptive to negative ads about crime, illegal immigration, or big government. On the other hand, a Democrat would be more convinced by ads on civil rights or jobs programs.
Such deceptive messages are usually protected by the First Amendment and therefore generally cannot be prosecuted.

To analyze these emotional components, the percentages of words appearing in the categories Posemo, Negemo, and Blame are reported in Table 4. Clearly, positive terms are more frequent than negative ones. For example, in oral form, Trump uses 3.33% positive words vs. 2.06% + 0.85% = 2.91% negative ones (Negemo + Blame), while for Biden the gap is larger (2.97% vs. 1.86% + 0.35% = 2.21%). When comparing the two nominees, Trump uses more emotional words (sum of the Posemo and Negemo categories) in every channel. In addition, Trump uses more negative terms (categories Negemo and Blame), in particular in his tweets (2.65% + 0.75%) and in his oral responses to journalists (2.06% + 0.85%).

Table 4. Frequencies of some semantics-based categories
               Oral                Written             Tweets
Tag            Trump     Biden     Trump     Biden     Trump     Biden
Posemo         3.33%     2.97%     4.59%     3.94%     4.66%     3.79%
Negemo         2.06%     1.86%     2.05%     2.01%     2.65%     1.82%
Blame          0.85%     0.35%     0.69%     0.44%     0.75%     0.37%
Praise         1.66%     1.05%     2.45%     1.23%     1.83%     1.27%
Human          10.33%    8.63%     11.03%    8.88%     5.40%     7.88%
Familiarity    20.56%    24.75%    19.93%    23.96%    18.72%    21.42%
Symbolism      1.15%     1.46%     1.85%     2.40%     1.76%     2.25%
Achieve        1.96%     2.95%     2.58%     4.11%     3.62%     4.38%
Family         0.10%     0.26%     0.17%     0.69%     0.18%     0.58%

Overall, and according to Hart (2020), Trump is more of an intuitive person, following his feelings rather than thinking rationally. These positive emotional terms can be found in expressions such as “I love you”, uttered by Trump during his meetings. Trump wants to be viewed as an ordinary person, able to understand ordinary citizens. Based on Table 4, Trump’s rhetoric can also be characterized by a high degree of Praise, with terms such as “great”, “good”, “strong”, or “brave”. Trump likes such grandiose adjectives and adverbs.
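Category scores such as those in Tables 3 and 4 boil down to dictionary look-ups: the share of tokens matching a word list, where LIWC-style entries like blam* match any word starting with the given prefix. The Python sketch below uses tiny, hypothetical word lists built only from examples quoted in the text; the actual LIWC and DICTION dictionaries are much larger and are not reproduced. Each category is counted independently, so a word such as win may contribute to both Posemo and Achieve.

```python
# Hypothetical miniature dictionaries using only examples cited in the text.
CATEGORIES = {
    "Posemo": ["like", "hope", "win"],
    "Negemo": ["fear", "fake", "blam*"],
    "Achieve": ["first", "plan", "win"],
}


def make_matcher(entries):
    """Return a predicate matching exact entries and '*'-terminated prefixes."""
    exact = {e for e in entries if not e.endswith("*")}
    prefixes = tuple(e[:-1] for e in entries if e.endswith("*"))
    # str.startswith accepts a tuple of prefixes; an empty tuple never matches.
    return lambda word: word in exact or word.startswith(prefixes)


def category_percentages(tokens, categories=CATEGORIES):
    """Percentage of tokens falling into each category (categories may overlap)."""
    words = [t.lower() for t in tokens]
    result = {}
    for name, entries in categories.items():
        matches = make_matcher(entries)
        result[name] = 100 * sum(1 for w in words if matches(w)) / len(words)
    return result
```

For example, in the ten-token sentence “We hope to win but they blame the fake news”, this sketch assigns 20% of the tokens to Posemo (hope, win), 20% to Negemo (blame, fake), and 10% to Achieve (win).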
When analyzing the distribution of the Human class, one can also observe that these terms are more strongly associated with Trump than with Biden in the oral and written forms (in tweets, the reverse holds). The presence of several personal pronouns in this category can explain this result. On the other hand, Biden tends to adopt a more colloquial tone, with a higher frequency of Familiarity terms (words used frequently in daily conversation). The Symbolism category is more present in Biden’s communications, with messages discussing more abstract notions (e.g., peace, justice, nation). This is also a characteristic of the 2020 election, with Biden repeatedly talking about moral values (e.g., honesty, hope, generosity) and avoiding deceptive expressions. The Democratic nominee more often employs Achieve terms (e.g., win, work, better, overcome) as well as words related to Family (e.g., children, folks, family).

7 Characteristic Vocabulary

All political leaders tend to speak with similar words or expressions; the differences between them reside mainly in their frequencies. Thus, one can expect that some terms will be regularly employed by a given person and ignored or rarely used by another. To determine such overused terms, Muller (1992) suggests comparing the number of occurrences of a term in a subset (e.g., the texts of a target person) with its number of occurrences in the entire corpus. Such a technique was applied to reveal the rhetorical differences between past US presidents (Savoy, 2017) and to identify the particular expressions of each French president from 1958 to 2018 (Labbé & Savoy, 2021).

Table 5.
Most characteristic words of the two candidates

      Oral                   Written                 Tweets
Rank  Trump    Biden         Trump    Biden          Trump            Biden
1     they     fact          great    quote          RT               urllink
2     gonna    sure          right    families       urllink          tune
3     I        able          they     lives          realDonaldTrump  Donald
4     but      number        you      virus          Biden            vote
5     Rush     making        we       communities    MAGA             chip
6     you      deal          know     their          complete         Trump
7     because  significant   very     promise        endorsement      head
8     not      invest        got      justice        news             today
9     very     idea          want     veterans       total            folks
10    say      should        good     failure        fake             we

Table 5 lists, for the three communication channels, the ten most characteristic words of each candidate. Let’s start with the traditional oral and written forms. As shown previously, Trump’s style can be characterized by a recurrent use of personal pronouns (they, I, you, we). To emphasize some of his claims, the adverb “very” and the adjectives “great” and “good” are also stylistic fingerprints of the former US president. Trump is also a person who knows (“I know” or “you know”), who confirms facts and claims as “correct” (“That’s right.” or “Right?”), and who reports statements made by others (“They say …”). In addition, the Republican president liked numbers, real or imagined, to justify his actions or plans.

Biden’s style can be characterized by a few expressions such as “in fact”, “make sure”, “to be able”, “Number one … Number two …”, and “I quote “We’re not …””. As a candidate, he must also specify some of his main intentions with the statement “we should” (with only one occurrence of “I should”, in “I should nominate …”). Certainly, his first target is the COVID-19 “virus” problem that cost so many lives, a total “failure” of the current administration. As secondary aims, Biden talks about “racial justice” and “economic justice”, in response to “a cry for justice from communities”. The targets of his future actions are also families and veterans, in addition to communities.
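Muller’s criterion compares how often a term occurs in a subset with what its frequency in the whole corpus would predict. One common way to operationalize it, assumed here rather than taken from this section, is a standardized Z-score under a binomial approximation: with a occurrences of the term in a subset of n′ tokens, and a relative frequency p = f/n in the full corpus (which includes the subset), Z = (a − n′p) / √(n′p(1 − p)). Large positive values flag overused terms. The function names, the toy data, and the min_count threshold below are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def overuse_z_scores(subset: list[str], corpus: list[str],
                     min_count: int = 5) -> dict[str, float]:
    """Z-score of each subset term against its frequency in the whole corpus.

    `corpus` is assumed to contain the subset (e.g., all texts of both
    candidates, with `subset` being one candidate's texts).
    """
    sub_counts = Counter(subset)
    corpus_counts = Counter(corpus)
    n_sub, n_all = len(subset), len(corpus)
    scores: dict[str, float] = {}
    for term, a in sub_counts.items():
        f = corpus_counts[term]
        if f < min_count:  # skip rare terms, whose scores are unstable
            continue
        p = f / n_all
        expected = n_sub * p
        variance = n_sub * p * (1.0 - p)
        if variance == 0.0:  # only possible if the term makes up the whole corpus
            continue
        scores[term] = (a - expected) / math.sqrt(variance)
    return scores


def most_characteristic(scores: dict[str, float], k: int = 10) -> list[str]:
    """Terms sorted by decreasing Z-score (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    subset_tokens = ["folks"] * 6 + ["the"] * 4   # toy "target person"
    rest_tokens = ["the"] * 16 + ["great"] * 10   # toy rest of the corpus
    scores = overuse_z_scores(subset_tokens, subset_tokens + rest_tokens)
    print(most_characteristic(scores))            # ['folks', 'the']
```

In the toy data, “folks” occurs six times in a ten-token subset although it accounts for only one sixth of the corpus, so it receives a large positive score, while “the” is slightly underused.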
Biden’s style is clearly more serene and peaceful, and free from malicious or disrespectful expressions (e.g., “illegal aliens” or “criminal aliens” when talking about immigrants).

With tweets, the most characteristic words are distinct from those of the two previous modes of communication. The specific features of this social network appear with the retweet function (RT), the presence of hyperlinks (urllink), or Twitter account names (e.g., @realDonaldTrump). As in other elections, the most specific terms include the candidates’ names (Joe, Biden, Donald, Trump), mottos (MAGA), as well as encouragements to vote (“vote today”), to watch a video, or to visit a given webpage (“Tune in as we …”, “Head to urllink to learn …”, “Chip in to help …”). The president also supports Republican candidates for Congress with the phrase “… has my complete and total endorsement!”. As in the 2016 US election, Trump continues to call the traditional media “fake news.” In the tweets sent by Biden, there is a clear intent to include the people in the proposed claims or statements. The personal pronoun “we” and the noun “folks” occur frequently in this perspective (e.g., “Folks, we are just …”).

During an electoral campaign, the candidates do not address all pertinent political subjects, and the terms “budget”, “debt”, or “deficit” occur rarely. The 2020 election was no exception, and the word “budget” never appears. The word “debt” is related to the issue of student debt or to the expression “debt of gratitude”. When using “deficit”, Trump tweets about “reduce deficit through cuts to social security”, while Biden talks about the trade deficit with China. Among the other missing or marginal issues in this campaign, words beginning with “immigr*” or “education” appear only six times in the tweets sent by both candidates in September and October.
The word “energy” was mainly used to qualify Biden (e.g., “low energy Joe” in tweets sent by Trump) and not to discuss the energy issue. As another example, “liberty” appears only twice in tweets written by Trump and never in tweets sent by Biden.

8 Conclusion

During an electoral campaign, candidates hold rallies and speak on TV and radio to convince citizens of their capability to govern the country. In addition to the oral and written communication forms, social networks offer a more direct channel to the general public. Due to the pandemic in 2020, Twitter played a more important role for both Donald Trump and Joe Biden. To identify the stylistic features of both candidates, a corpus was built with speeches, TV debates, radio interviews, and tweets produced during the last two months of the 2020 US presidential election. When comparing the three communication channels for both nominees, one can observe that the oral form presents more repetitions and thus shows a lower type-token ratio (TTR) (see Table 1b). This measure is higher for both the written and tweet-based channels, but the lexical density is higher for tweets than for written messages. This finding reflects the fact that tweets are more context-free messages: the author must name the things he or she is talking about. Thus, one can observe more nouns and fewer pronouns or conjunctions (the latter usually indicating longer sentences). As distinctive stylistic markers, Trump employs more personal pronouns in the oral and written forms (see Tables 2 and 3). In oral communication, the pronoun “I” is the second most frequent word from Trump’s mouth, a clear indication of a strong ego (Pennebaker, 2011). When considering the ten most frequent functional words, Trump’s style is rather similar in the oral and written forms. With tweets, one can observe a clear dissimilarity with the two other forms (see Table 2 and Figure 1).
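The conclusion relies on two lexical measures. The type-token ratio (TTR) divides the number of distinct word forms by the number of running words, so repetitions lower it; lexical density is the share of content words among all tokens. Because raw TTR decreases as texts get longer, the reference list includes the moving-average variant (MATTR) of Covington and McFall (2010), which averages TTR over every window of a fixed size. The sketch below assumes a tiny, hypothetical function-word list as a stand-in for part-of-speech tagging; it is not the list used in this study.

```python
from __future__ import annotations

# Hypothetical, deliberately small function-word list (illustration only).
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "for", "with",
    "i", "we", "you", "he", "she", "it", "they", "is", "are", "was", "be",
    "that", "this", "not",
})


def ttr(tokens: list[str]) -> float:
    """Type-token ratio: distinct forms divided by running words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mattr(tokens: list[str], window: int = 100) -> float:
    """Moving-average TTR: mean TTR over every window of `window` tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(tokens) < window:
        return ttr(tokens)  # text shorter than one window: fall back to plain TTR
    ratios = [ttr(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)


def lexical_density(tokens: list[str]) -> float:
    """Share of tokens that are not in the function-word list."""
    if not tokens:
        return 0.0
    content = sum(1 for tok in tokens if tok not in FUNCTION_WORDS)
    return content / len(tokens)


if __name__ == "__main__":
    sample = "we will win we will win again".split()
    print(round(ttr(sample), 3), round(lexical_density(sample), 3))  # 0.571 0.714
```

The sample repeats “we will win”, so only four of its seven tokens are distinct, which is exactly the kind of repetition that lowers TTR in the oral channel.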
The dissimilarity of Trump’s tweets could be partially explained by a high rate of retweets (51.7%, see Table 1a). When focusing on rhetoric, Trump uses emotional words or expressions more frequently (see Table 4). For both candidates, one can count more positive emotions than negative ones. With Trump, however, negative words occur more frequently in tweets. Trump’s rhetoric is also based on praise words (e.g., “true”, “brave”) and grandiose adverbs or adjectives (e.g., “great”, “strong”, “good”). The Democratic nominee does not opt for negative terms and prefers to put forward moral values (e.g., “peace”, “honesty”, “hope”) to contrast with the negative ads and expressions of his opponent. This positive attitude adopted by Biden was somewhat surprising because, according to Ansolabehere and Iyengar (1995), negative ads tend to call for negative responses. When comparing this electoral campaign with the previous one (Savoy, 2018b), (Sides et al., 2018), (Hart, 2020), we observe that in both cases, the emotional component played an important role in motivating voters (Marcus & MacKuen, 1993), (Brader, 2005), (McDonald & Lenz, 2008). In this view, negative emotions such as anger and fear have been present in US political advertisements for a few decades (Ansolabehere & Iyengar, 1995). In 2020, the negative tone was adopted mainly by the Republicans, with a focus on two main targets: the press and the media (reporting fake news) on the one hand, and, on the other, the expected violence of the “radical left” once in power. In 2016, Trump’s campaign targeted four topics, namely (criminal) immigrants, China (destroying US jobs), the political establishment in Washington, and the press and media (supporting only the elites). Finally, we must recognize that stylistic and rhetorical analysis based on a set of wordlists is not without concerns.
One can emphasize that a word can have more than one meaning, and this ambiguity is ignored by a simple count of occurrences. For example, the word energy occurring in the context of “low energy Joe” (in Trump’s tweets) is not related to the energy problem. Moreover, the immediate context of a term is disregarded by a simple word-count procedure. For example, in the expression “hope, not fear, peace, not violence” (in Biden’s tweets), counting two words as positive emotional terms and two as negative is not fully correct. This tweet clearly presents a positive tone and rejects the negative attitude. This semantic distinction is not captured by the computer when simply counting word occurrences.

Acknowledgments

This study was fully supported by the Hasler Foundation (Bern, Switzerland).

References

Ansolabehere, S., and Iyengar, S. (1995). Going Negative. How Political Advertisements Shrink & Polarize the Electorate. New York: The Free Press.
Arnold, E., and Labbé, D. (2015). Vote for me. Don’t vote for the other one. Journal of World Languages, 2(1), 32–49.
Baayen, H.R. (2008). Analyzing Linguistic Data. A Practical Introduction Using R. Cambridge: Cambridge University Press.
Barthélemy, J.P., and Guénoche, A. (1991). Trees and Proximity Representations. New York: John Wiley.
Biber, D., Conrad, S., and Leech, G. (2002). The Longman Student Grammar of Spoken and Written English. London: Longman.
Biber, D., and Conrad, S. (2009). Register, Genre, and Style. Cambridge: Cambridge University Press.
Boller, P.F. Jr. (2004). Presidential Campaigns. From George Washington to George W. Bush. Oxford: Oxford University Press.
Brader, T. (2005). Striking a responsive chord: How political ads motivate and persuade voters by appealing to emotions. American Journal of Political Science, 49(2), 388–405.
Carpenter, R.H., and Seltzer, R.V. (1970). On Nixon's Kennedy style. Speaker and Gavel, vol. 7, no. 41.
Covington, M.A., and McFall, J.D. (2010). Cutting the Gordian knot: The moving-average type-token ratio (MATTR). Journal of Quantitative Linguistics, 17(2), 94–100.
Crystal, D. (2006). Language and the Internet. Cambridge: Cambridge University Press.
Crystal, D. (2018). The Cambridge Encyclopedia of the English Language. Cambridge: Cambridge University Press.
Grimmer, J., and Stewart, B.M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297.
Hansen, K.M., and Pedersen, R.T. (2008). Negative campaigning in a multiparty system. Scandinavian Political Studies, 31(4), 408–427.
Hart, R.P. (1984). Verbal Style and the Presidency. A Computer-based Analysis. Orlando: Academic Press.
Hart, R.P. (2020). Trump and Us. What He Says and Why People Listen. Cambridge: Cambridge University Press.
Hart, R.P., Childers, J.P., and Lind, C.J. (2013). Political Tone. How Leaders Talk and Why. Chicago: The University of Chicago Press.
Kubát, M., and Čech, R. (2016). Quantitative analysis of US presidential inaugural addresses. Glottometrics, 34, 14–27.
Kubát, M., Mačutek, J., and Čech, R. (2020). Communists spoke differently: An analysis of Czechoslovak and Czech annual presidential speeches. Digital Scholarship in the Humanities, 35(4), to appear.
Labbé, D., and Monière, D. (2003). Le discours gouvernemental. Canada, Québec, France (1945-2000). Paris: Honoré Champion.
Labbé, D., and Monière, D. (2008). Les mots qui nous gouvernent. Le discours des premiers ministres québécois: 1960-2005. Montréal: Monière-Wollank.
Labbé, D., and Monière, D. (2013). La campagne présidentielle de 2012. Votez pour moi! Paris: L’Harmattan.
Labbé, D., and Savoy, J. (2021). Stylistic analysis of the French presidential speeches: Is Macron really different? Digital Scholarship in the Humanities, to appear.
Lakoff, G., and Wehling, E. (2012). The Little Blue Book: The Essential Guide to Thinking and Talking Democratic. New York: Free Press.
Laver, M., Benoit, K., and Garry, J. (2003). Extracting policy positions from political texts using words as data. American Political Science Review, 97(2), 311–331.
Marcus, G.E., and MacKuen, M.B. (1993). Anxiety, enthusiasm, and the vote: The emotional underpinning of learning and involvement during presidential campaigns. American Political Science Review, 87(3), 672–685.
Mayaffre, D. (2004). Le Discours Présidentiel Sous la Ve République. Paris: Presses de la Fondation Nationale des Sciences Politiques.
McDonald Ladd, J., and Lenz, G.S. (2008). Reassessing the role of anxiety in vote choice. Political Psychology, 29(2), 275–296.
Muller, C. (1992). Principes et Méthodes de Statistique Lexicale. Paris: Honoré Champion.
O’Connor, B., Balasubramanyan, R., Routledge, B.R., and Smith, N.A. (2010). From tweets to polls: Linking text sentiment to public opinion time series. In: Proceedings of the 4th International AAAI Conference on Weblogs and Social Media, 122–129.
Paradis, E. (2011). Analysis of Phylogenetics and Evolution with R. New York: Springer.
Pauli, F., and Tuzzi, A. (2009). The end of year addresses of the Presidents of the Italian Republic (1948–2006): Discourse similarities and differences. Glottometrics, 18, 40–51.
Pennebaker, J.W. (2011). The Secret Life of Pronouns. What Our Words Say About Us. New York: Bloomsbury Press.
Raubach, E.E. (2019). Does political ideology influence how politicians speak? A linguistic analysis of one-minute speeches from the US House of Representatives of the 115th Congress. Document numérique, 22(1–2), 127–138.
Rogers, K. (2020). The State of the Union is Trump’s biggest speech. Who writes it? The New York Times, February 3.
Rule, A., Cointet, J.-P., and Bearman, P.S. (2015). Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790–2014. Proceedings of the National Academy of Sciences, 112(35), 1–8.
Savoy, J. (2015). Text clustering: An application with the State of the Union addresses. Journal of the American Society for Information Science & Technology, 66(8), 1645–1654.
Savoy, J. (2017). Analysis of the style and the rhetoric of the American presidents over two centuries. Glottometrics, 38(1), 55–76.
Savoy, J. (2018a). Analysis of the style and the rhetoric of the 2016 US presidential primaries. Digital Scholarship in the Humanities, 33(1), 143–159.
Savoy, J. (2018b). Trump and Clinton's Style and Rhetoric during the 2016 Presidential Election. Journal of Quantitative Linguistics, 25(2), 168–189.
Savoy, J. (2020). Machine Learning Methods for Stylometry: Authorship Attribution and Author Profiling. Cham: Springer.
Sides, J., Tesler, M., and Vavreck, L. (2018). Identity Crisis. The 2016 Presidential Campaign and the Battle for the Meaning of America. Princeton: Princeton University Press.
Slatcher, R.B., Chung, C.K., Pennebaker, J.W., and Stone, L.D. (2007). Winning words: Individual differences in linguistic style among U.S. presidential and vice presidential candidates. Journal of Research in Personality, 41(1), 63–75.
Sylwester, K., and Purver, M. (2015). Twitter language use reflects psychological differences between Democrats and Republicans. PLoS One, 10(9).
Tausczik, Y.R., and Pennebaker, J.W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54.
Young, L., and Soroka, S. (2012). Affective news: The automated coding of sentiment in political texts. Political Communication, 29, 205–231.
Yu, B. (2008). Classifying party affiliation from political speech. Journal of Information Technology & Politics, 5(1), 33–48.
Yu, B. (2013). Language and gender in congressional speech. Literary and Linguistic Computing, 29(1), 118–132.

Appendix

Table A.1. Manhattan distance between the different representations (DT = Donald Trump, JB = Joe Biden)

Manhattan   DToral     JBoral     DTwritten  JBwritten  DTtweets   JBtweets
DToral      0.000      0.086      0.065      0.127      0.159      0.139
JBoral      0.086      0.000      0.101      0.075      0.149      0.081
DTwritten   0.065      0.101      0.000      0.092      0.125      0.102
JBwritten   0.127      0.075      0.092      0.000      0.137      0.078
DTtweets    0.159      0.149      0.125      0.137      0.000      0.104
JBtweets    0.139      0.081      0.102      0.078      0.104      0.000
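Table A.1 compares the six representations with the Manhattan (L1) distance, i.e., the sum of absolute differences between two frequency profiles. The exact feature set behind these values is not restated in this appendix, so the sketch below simply assumes that each representation is a vector of relative frequencies over a shared word list. It also copies the published matrix to look up each representation’s nearest neighbour; the helper names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def relative_profile(tokens: list[str], vocabulary: list[str]) -> list[float]:
    """Relative frequency of each vocabulary word in `tokens`."""
    counts = Counter(tokens)
    n = len(tokens)
    return [counts[w] / n if n else 0.0 for w in vocabulary]


def manhattan(p: list[float], q: list[float]) -> float:
    """L1 distance: sum of absolute coordinate differences."""
    if len(p) != len(q):
        raise ValueError("profiles must share the same vocabulary")
    return sum(abs(x - y) for x, y in zip(p, q))


# Distances copied from Table A.1 (DT = Trump, JB = Biden).
LABELS = ["DToral", "JBoral", "DTwritten", "JBwritten", "DTtweets", "JBtweets"]
TABLE_A1 = [
    [0.000, 0.086, 0.065, 0.127, 0.159, 0.139],
    [0.086, 0.000, 0.101, 0.075, 0.149, 0.081],
    [0.065, 0.101, 0.000, 0.092, 0.125, 0.102],
    [0.127, 0.075, 0.092, 0.000, 0.137, 0.078],
    [0.159, 0.149, 0.125, 0.137, 0.000, 0.104],
    [0.139, 0.081, 0.102, 0.078, 0.104, 0.000],
]


def nearest(label: str) -> str:
    """Closest other representation according to Table A.1."""
    i = LABELS.index(label)
    return min((TABLE_A1[i][j], LABELS[j]) for j in range(len(LABELS)) if j != i)[1]


if __name__ == "__main__":
    vocab = ["the", "we", "great"]
    a = relative_profile("the the we great".split(), vocab)
    b = relative_profile("the we we we".split(), vocab)
    print(manhattan(a, b))  # 1.0
    for label in LABELS:
        print(label, "->", nearest(label))
```

With the published values, each oral representation is closest to the written one of the same candidate (DToral to DTwritten, JBoral to JBwritten), in line with the observation that the oral form is closer to the written one than to tweets.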
