Trump’s and Biden’s Styles during the
2020 US Presidential Election
Jacques Savoy[0000−0002−4486−0067], Marylène Wehren
To appear in Digital Scholarship in the Humanities (2021-2022)
University of Neuchatel
rue Emile Argand 11
2000 Neuchatel, Switzerland
{Jacques.Savoy, Marylene.Wehren}@unine.ch
Abstract
This study analyzes the stylistic and rhetorical characteristics of Donald Trump and Joe Biden
during the 2020 US presidential election. As communication channels, the oral (TV debates and
radio interviews), written speeches, and tweets sent during the last two months have been
examined. As stylometric markers, the most frequent functional words indicate that Trump
uses a wider range of personal pronouns, and uses them more frequently, than Biden. The Democratic
nominee relies mainly on the first-person pronouns (I and we) and, compared to Trump, shows a higher
rate of prepositions, signaling the use of a larger number of nouns. When comparing the three
media of communication, this study shows that the oral form is closer to the written one and that
tweet-based communication presents a distinct type compared to the other two. Usually, written
texts tend to contain more functional words than either the oral form or tweets. When comparing
the two nominees, the frequency of familiar words (e.g., the, to, of, a, etc.) tends to be higher in
Biden’s addresses, while Trump opts more often for grandiose terms (e.g., great, just, strong,
etc.). As rhetorical features, the Republican president uses more emotional words, both positive
and negative. The Democratic candidate does not propound negative messages and opts for
symbolic (e.g., nation, country) and achievement terms (e.g., plan, win, work) with numerous
references to people and family (e.g., folks, Americans, son, etc.).
Keywords: Stylometry; Stylistic measurement; Political stylometric analysis.
1 Introduction
This study is focused on the 2020 US presidential campaigns led by Donald Trump and Joe
Biden. During such an electoral campaign, each political leader must convince his electoral base
of his leadership and demonstrate his strength and ability to govern the country. For Trump,
these functions were mainly fulfilled by social networks and particularly by Twitter rather than
by traditional speeches. Moreover, during this presidential election, the corona virus pandemic
reinforced the relevance of social networks due to the complexities in organizing political meetings.
In this context, our main objective is to determine the stylistic characteristics of the two main
candidates. As each nominee can utter a speech during a meeting, provide answers during
interviews or TV debates, and send tweets, this study will analyze the style occurring in the oral,
written, and tweet-based communication channels. Thus, the main objective is to identify
stylistic markers specific to a given political leader and to a particular communication channel.
Going a step further, this analysis will identify rhetorical features specific to each candidate. In
this view, rhetoric is defined as the art of effective and persuasive speaking, the way to motivate
an audience, while language style is present as pervasive and frequent forms used by an author
for mainly aesthetic reasons (Biber & Conrad, 2009).
Even if Trump’s or Biden’s speeches are examined, we know that behind each well-known
politician there is usually a speechwriter (or a team of ghostwriters). For example, behind
Kennedy one can find the name of Sorensen (Carpenter & Seltzer, 1970), Favreau behind
Obama, and even Madison & Hamilton behind some speeches delivered by Washington. But as
mentioned by Sorensen:
“If a man in a high office speaks words which convey his principles and policies and
ideas and he’s willing to stand behind them and take whatever blame or therefore credit
go with them, [the speech is] his”.
More precisely, the White House spokesman Hogan Gidley said:
“The president is a best-selling author and deeply gifted orator who packs arenas and has
a meticulous and carefully honed method for writing his speeches, whether it be at a rally,
a manufacturing plant opening or the State of the Union. What the American people hear
is 100 percent President Trump’s own words”. (Rogers, 2020)
To analyze both candidates’ styles, the rest of this article is subdivided as follows. Section 2
exposes some related work while Section 3 presents the corpora used in this study as well as
some overview statistics. Simple stylistic markers are analyzed in Section 4, and the next one
discusses the distribution of functional words over the three communication channels, namely
the oral, written, and tweet-based. Section 6 presents a more advanced stylistic and rhetoric
analysis of the electoral speeches and tweets. The characteristic vocabulary associated with the
two candidates is exposed in Section 7. Finally, a conclusion reports the main findings of this
study.
2 State of the Art
Freely available, easy to understand, and having an important impact, political texts have been
studied according to different perspectives. The most significant are governmental speeches
focusing on a given president or prime minister in power (Mayaffe, 2004), (Labbé et al., 2021),
(e.g., Speeches from the Throne (Canada and Quebec) and general policy statements of French
governments in Labbé & Monière (2003; 2008), State of the Union (Rule et al., 2015), (Savoy,
2015) or US inaugural addresses (Kubát & Cech, 2016)). In other countries, the presidential
function is limited to the head of the state and their messages, though viewed as less important,
can still present interesting stylometric applications (Pauli & Tuzzi, 2009), (Kubát et al., 2020).
As a second less studied category, one can focus on electoral speeches (Arnold & Labbé, 2015),
(Savoy, 2018a) as well as press releases during a presidential campaign (Labbé & Monière,
2013).
When studying governmental speeches over decades, the constitutional institutions tend to
smooth out the differences between political parties when exercising power. Stylistic and rhetorical variations between presidents or prime ministers could be mainly explained by their temporal differences. The arrival of a strong leader as well as exceptional events (e.g., worldwide
war, deep economic depression) could however reveal a real vocabulary and stylistic change
(Labbé & Monière, 2003), (Savoy, 2015), (Kubát & Cech, 2016).
As a third source of political speeches, discussions in parliament have been studied according
to several perspectives. For example, Laver et al. (2003) or Grimmer & Stewart (2013) describe
a methodology to extract topical and political positions from texts. Yu (2008) demonstrates that
machine learning methods (e.g., SVM and naïve Bayes) can be trained to classify congressional
speeches according to political parties. In a following analysis, Yu (2013) shows that author
gender can be determined and that female political figures tend to use emotional words
and personal pronouns more frequently than men. Based on tweets, the differentiation between
political parties can also be observed (Sylwester & Purver, 2015). Such differences are correlated with psychological factors, with positive emotional terms occurring more frequently in
Democrats’ tweets as well as swear expressions (e.g., alien, asshole, hell, etc.), or first-person
singular pronouns (e.g., I, me). For Raubach (2019), positive emotion words should be associated
with the party in power and not simply attached to a given party.
Stylistic fingerprints defined by the frequency of functional words (articles, pronouns, prepositions, conjunctions, and auxiliary verbs) have been proposed to discriminate between political
leaders (Savoy, 2020). However, sentiments and emotions tend also to play an essential role
nowadays in rhetoric and style. To quantify this aspect, O’Connor et al. (2010) or Young &
Soroka (2012) suggest counting the frequency of words appearing in a dictionary of positive or
negative emotional terms.
In a similar way, Hart (1984) has designed and implemented a political text analyzer called
DICTION that generalizes the idea of representing emotions, or more generally concepts, by defining lists of terms. In a first book, Hart (1984) exposes the rhetoric and stylistic variations
between the US presidents from Truman to Reagan, while a follow-up study (Hart et al., 2013)
exposes the stylistic variations from G.W. Bush to Obama. Recently, Hart (2020) analyzed
Trump’s rhetoric and concluded that Trump is the president of the extremes presenting either a
high or low level depending on the target emotion or rhetorical concept.
As another example, LIWC (Linguistic Inquiry and Word Count) (Tausczik & Pennebaker,
2010) groups different categories used to evaluate the author’s psychological status (e.g., feminine, emotional, leadership), as well as her/his style (e.g., mainly based on personal pronouns
(Pennebaker, 2011)). The underlying hypothesis is to assume that the words serve as guides to
the way the author thinks, acts or feels. Using this system, Slatcher et al. (2007) were able to
determine the personalities of different political candidates (2004 US presidential election).
They defined the psychological portrait both on single measurements (e.g., the relative frequency
of pronouns, social words, etc.) and using a set of composite indices reflecting the cognitive
complexity, presidentiality or honesty of each candidate. These personality measurements were
in agreement with different opinion polls. For example, G.W. Bush uses the pronoun I, positive
emotion words (e.g., happy, truly, win), and future tense more frequently. The public perceives
Kerry as a kind of depressed person, serious, somber, and cold, uttering negative emotion expressions (e.g., sad, worthless, lost) and physical words (e.g., head, ache, sleep) more frequently.
In brief, previous studies have mainly analyzed governmental speeches, and less frequently
the electoral speeches (Boller, 2004). A few studies focus on the legislative level (e.g., the Congress) and these studies are based on the written form. Other communication channels are more
difficult to obtain (e.g., transcripts of TV debates or interviews, webpages available on Facebook,
blogs, or tweets). The current study focuses precisely on these three forms together, namely the
written, oral, and web-based channels during a recent electoral campaign.
3 Corpora
To accurately analyze the style adopted by the two main candidates of the 2020 US presidential
election, some general background information is required. This campaign was marked by the
corona virus pandemic. Therefore, the number of electoral meetings was rather limited and social networks have played a more important role.
As other significant events during this campaign, one can mention that a state of emergency
for Covid-19 was declared on March 13th. After B. Sanders dropped out (April 16th), Joe Biden
was the only candidate for the Democratic party. The death of G. Floyd (May 25th) highlighted
the racial question, generating demonstrations and sometimes violent riots. In August, the two
national conventions took place (Democratic, Aug. 17th-20th; Republican, Aug. 24th-27th). The
death of the Supreme Court judge R. Bader Ginsburg (September 18th) allowed the nomination
of the conservative judge A. Coney Barrett. On October 1st, president Donald Trump announced
that he was infected by the corona virus.
In this context and to convince the voters, the candidates can rely on traditional meetings as
well as on TV debates and interviews. The first form corresponds to a written communication
while the second is an oral one. One can consider that speeches delivered by the nominees
correspond to an oral communication form, while (written) messages must be categorized as a
distinct text genre. However, as mentioned by Biber & Conrad (2009, p. 262)
“Language that has its source in writing but performed in speech does not necessarily
follow the generalization (written vs. oral). That is, a person reading a written text aloud
will produce speech that has the linguistic characteristics of the written text. Similarly,
written texts can be memorized and then spoken”.
When comparing these two communication forms, the written one is more precise and
permanent while the oral form is more spontaneous, direct, and usually less formal. Usually a
(written) message presents a more complex sentence construction including longer words.
Looser structure and filler phrases (e.g., uh, um) occur more frequently in oral productions, as
well as several repetitions of the same expressions (Crystal, 2018).
In addition to these two forms, we also study social networks, and particularly Twitter. It was
recognized that such web-based communication channels might be viewed as new forms
between the classical oral and written usage (Crystal, 2006). This corpus is generated from the
same period, with discourses produced to achieve the same aim and covering similar topics.
Thus, several factors affecting the style are kept constant such as the time period, the subjects,
and the objective.
To generate our corpus, all tweets sent by Trump (Twitter account @realDonaldTrump) have
been downloaded from a dedicated website (see footnote 1). This corpus runs from September 1st up to
November 3rd, 2020 and contains 2,278 tweets. For Biden, an API has been implemented to
directly extract his 871 tweets from the Twitter server.
Some overall statistics are depicted in Table 1a, while Table 1b shows an overview of the three
communication forms. Among the speeches and interviews, the acceptance speeches delivered at the
national conventions come first in our collection. The transcripts of the two TV debates have
been included in the oral corpus together with a few radio interviews.
Table 1a. Some statistics about Donald Trump’s and Joe Biden’s Twitter accounts
                                                                        Per 100 tweets
Name    Number   Mean per day   Mean tokens   Uppercase words   RT      @       #       URL
Trump   2,278    43.4           23.7          5.36%             51.7    87.0    16.0    62.3
Biden   871      13.6           33.3          0.54%             3.1     13.9    4.1     81.4
Table 1b. Some statistics according to authors and communication channels
Form            Tokens   Vocabulary size   TTR     Lexical density   Percent. big words
Oral Trump      37,759   2,569             0.331   43.1%             16.7%
Written Trump   63,731   3,816             0.367   46.4%             17.9%
Tweets Trump    54,455   6,308             0.481   58.5%             30.7%
Oral Biden      53,016   3,695             0.360   43.1%             19.8%
Written Biden   32,286   3,463             0.411   47.7%             23.6%
Tweets Biden    25,185   2,882             0.405   49.9%             27.5%
The data shown in Table 1a indicate that Trump sent more tweets per day than Biden (43.4 vs.
13.6). Trump’s tweets are, on average, shorter (23.7 tokens vs. 33.3) and contain more mentions
(e.g., @WhiteHouse, @FoxNews) and hashtags (e.g., #MAGA, #Vote). Trump retweets
significantly more (51.7 per 100 tweets) than Biden. Those retweets usually contain a
single URL pointing to a video. Another stylistic marker strongly associated with Trump is the
presence of words in uppercase letters (e.g., AMERICA, GREAT, VOTE). As mentioned in
Table 1a, 5.36% of tweeted words longer than three letters belong to this category (in this count,
shorter words such as “I”, “US”, “FBI”, etc. have been ignored). Biden’s tweets comprise
more URLs, usually to provide videos or webpages supporting his claims.
1 See www.trumptwitterarchive.com/archive.
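The markers reported in Table 1a can be approximated with a few regular expressions. The Python sketch below is only an illustration: the function name tweet_stats, the regular expressions, and the two sample tweets are assumptions introduced here, not the procedure actually used to build the table.

```python
import re

# Hypothetical patterns; real tweet tokenization may need more care.
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")
WORD = re.compile(r"[A-Za-z]+")

def tweet_stats(tweets):
    """Return (markers per 100 tweets, % of uppercase words longer than 3 letters)."""
    counts = {"@": 0, "#": 0, "URL": 0}
    long_words = upper_words = 0
    for text in tweets:
        counts["URL"] += len(URL.findall(text))
        stripped = URL.sub(" ", text)            # remove URLs before counting words
        counts["@"] += len(MENTION.findall(stripped))
        counts["#"] += len(HASHTAG.findall(stripped))
        stripped = HASHTAG.sub(" ", MENTION.sub(" ", stripped))
        for word in WORD.findall(stripped):
            if len(word) > 3:                    # ignore short words such as "I", "US", "FBI"
                long_words += 1
                upper_words += word.isupper()
    per_100 = {key: 100 * value / len(tweets) for key, value in counts.items()}
    upper_pct = 100 * upper_words / long_words if long_words else 0.0
    return per_100, upper_pct

# Invented sample tweets, for illustration only.
sample = [
    "MAKE AMERICA GREAT again! #MAGA @WhiteHouse",
    "Vote early https://example.com #Vote",
]
per_100, upper_pct = tweet_stats(sample)
```

Mentions and hashtags are removed before the uppercase count so that a tag such as #MAGA is not counted as an uppercase word; whether the original count did the same is not stated in the text.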
To differentiate the latent characteristics of the three communication channels, one can
measure the vocabulary richness by computing the type-token ratio (TTR) (Baayen, 2008). The
precise definition is described in Equation 1 where, for a given text t, the number of distinct
word-types is denoted by Voc(t) and its length (number of tokens) is returned by the function
Token(t).
$$\mathrm{TTR}(t) = \frac{\mathit{Voc}(t)}{\mathit{Token}(t)} \tag{1}$$
High values indicate the presence of a rich vocabulary showing that the underlying text
exposes many different topics or that the author writes on a few themes from several angles with
different expressions and formulations. On the other hand, a small TTR value signifies that the
vocabulary used by the author is limited or that the words and expressions are repeated.
TTR values are however sensitive to the text length and as the length increases, the resulting
TTR decreases (Baayen, 2008). To avoid this problem, the values reported in Table 1b
correspond to the mean over a sample of TTRs computed after each segment of 1,000 tokens
(Covington & McFall, 2010). The values in Table 1b indicate that the oral form presents the
smallest TTR values due to a higher rate of repetition. Higher TTR values are shown with the
written communications. One can also observe that the difference between oral and written is
larger for Biden (0.360 vs. 0.411) than for Trump (0.331 vs. 0.367), an indication that Trump is
focusing mainly on a few subjects. When considering the tweets, the TTR scores are high, and
for Trump clearly higher than in his written form. Thus, based only on the TTR values, writing
on Twitter seems closer to the written form than to the oral one.
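The segment-based averaging described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the text is assumed to be already tokenized, and a trailing segment shorter than the segment size is simply dropped (the treatment of such a remainder is not specified in the text).

```python
def ttr(tokens):
    """Type-token ratio (Equation 1): distinct word-types divided by tokens."""
    return len(set(tokens)) / len(tokens)

def mean_segmental_ttr(tokens, segment_size=1000):
    """Mean of the TTR values computed over consecutive, non-overlapping segments."""
    n_segments = len(tokens) // segment_size
    if n_segments == 0:
        raise ValueError("text is shorter than one segment")
    scores = [
        ttr(tokens[i * segment_size:(i + 1) * segment_size])
        for i in range(n_segments)
    ]
    return sum(scores) / n_segments

# Toy example with a segment size of 4 instead of 1,000 tokens.
tokens = "the cat and the dog and the bird".split()
whole_text = ttr(tokens)                      # 5 types / 8 tokens = 0.625
segmented = mean_segmental_ttr(tokens, 4)     # (3/4 + 4/4) / 2 = 0.875
```

The toy example also shows why segmenting matters: the TTR of the whole text (0.625) is lower than the mean over short segments (0.875), because repetitions accumulate as the text grows.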
As a second overall measurement, the lexical density (LD) (Biber et al., 2002) has been
computed as a mean over a sequence of 1,000 tokens. For each bucket of 1,000 tokens, we
applied Equation 2 in which Function words(t) indicates the number of functional words in t,
and Lexical words(t) the number of lexical words in t. This latter set is composed of nouns,
names, adjectives, verbs, and adverbs. On the other hand, functional words comprise all other
grammatical categories.
$$\mathrm{LD}(t) = \frac{\mathit{Lexical\ words}(t)}{\mathit{Token}(t)} = 1 - \frac{\mathit{Function\ words}(t)}{\mathit{Token}(t)} \tag{2}$$
A relatively high LD percentage indicates a more complex text, containing more information.
As depicted in the last but one column of Table 1b, the written communication medium shows
higher values than the oral one. It is interesting to note that tweets present the highest LD
percentages. This finding can be explained by the fact that each tweet is an independent unit
centered on a specific subject that must be clearly identified. Thus, pronouns and determiners
occur less frequently (e.g., “Law and order”, “Kat has my Complete and Total Endorsement!”).
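Equation 2 can be illustrated with a short Python sketch. The function-word set below is a tiny hypothetical sample, not the list used in this study, and the example is applied to a single short text rather than to buckets of 1,000 tokens.

```python
# Hypothetical, deliberately small sample of functional words.
FUNCTION_WORDS = {
    "the", "a", "an", "and", "of", "to", "in",
    "i", "we", "you", "it", "that", "my",
}

def lexical_density(tokens, function_words=FUNCTION_WORDS):
    """LD(t) = 1 - Function words(t) / Token(t), as in Equation 2."""
    n_function = sum(token.lower() in function_words for token in tokens)
    return 1 - n_function / len(tokens)

# "Law and order": one functional word out of three tokens, so LD is 2/3.
density = lexical_density("Law and order".split())
```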
As depicted in the last column of Table 1b, the percentage of big words (composed of six
letters or more) is higher in written messages than in oral production. Moreover, Trump utilizes
fewer long words than Biden, both in oral (16.7% vs. 19.8%) and in speeches (17.9% vs.
23.6%). A high percentage of big words can be viewed as an indication of complex formulations
(Hart, 1984), (Pennebaker, 2011). When the communication channel is the oral one, it is
recommended to favor shorter words.
“One finding of cognitive science is that words have the most powerful effect on our
minds when they are simple. The technical term is basic level. Basic-level words tend
to be short. … Basic-level words are easily remembered; those messages will be best
recalled that use basic-level language.” (Lakoff & Wehling, 2012, p. 41)
Both in oral and written form, Trump has adopted a simple and direct rhetoric, perhaps an indication
of a poverty of thought. As a second hypothesis, one can argue that this lexical choice was
made to demonstrate that Trump can talk like the people, without complex formulations and
with the objective of being easily understood by the citizens.
An overall higher percentage of such complex words is achieved by Biden’s speeches (23.6%),
indicating formulations that are harder to understand. For both candidates, tweets present the
highest percentages (30.7% and 27.5%). These values could be explained by the presence of
specific features such as mentions, URLs, or hashtags, which usually contain more than six characters.
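The big-word percentage is a simple length count, sketched below in Python. The function name and the toy token lists are assumptions for illustration; the second call shows how a mention kept as a single token raises the percentage in tweets.

```python
def big_word_percentage(tokens, min_length=6):
    """Percentage of tokens composed of min_length characters or more."""
    return 100 * sum(len(token) >= min_length for token in tokens) / len(tokens)

# Two of the four tokens have six letters or more.
plain = big_word_percentage(["simple", "and", "direct", "words"])

# A mention kept as one token ("@WhiteHouse", 11 characters) counts as a big word.
tweet = big_word_percentage(["@WhiteHouse", "#MAGA", "Vote"])
```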
4 Simple Stylistic Fingerprints
As a first analysis to identify the stylistic markers associated with the three media of communication and the two authors, one can focus on the most frequent functional words (MFFWs). As
each text is a composite signal, various factors, in addition to the style, can explain the word
occurrence frequencies such as the text genre, author’s background, time period, topics, audience, etc. (Savoy, 2020).
Table 2. Most Frequent Functional Words
        Oral             Written          Tweets
Rank    Trump   Biden    Trump   Biden    Trump   Biden
1       the     the      the     the      the     the
2       I       to       and     to       to      to
3       and     and      I       and      and     and
4       it      I        we      of       a       is
5       a       that     to      a        of      in
6       you     we       you     we       is      we
7       to      in       a       I        in      a
8       they    it       it      in       for     of
9       that    a        of      it       you     I
10      we      of       they    that     will    it
In Table 2, the top ten most frequent functional words per author and communication channel
are depicted. In the first rank, one can recognize the definite article (the), considered the
trademark of the English language. The ranks of the personal pronouns present interesting characteristics. Usually, they appear more frequently in oral communications and Table 2 confirms
this finding. For Trump, the pronoun “I” occurs as the second MFFW. Observing a high frequency for the I-words is a common stylistic marker of successful candidates in an electoral
campaign (Arnold & Labbé, 2015), (Savoy, 2018a; 2018b). However, occurring in the second
position indicates that Trump is egocentric (Pennebaker, 2011). The pronoun “you” arises in the
sixth rank, as a clear indication that Trump tries to establish a dialogue with the audience. When
comparing Trump and Biden in oral form, eight terms appear in common. Trump, however,
prefers using more pronouns, and “you” and “they” do not appear in Biden’s top ranked terms.
When looking at the written form, Trump uses the same four pronouns but with “we”
in a higher position. The frequent use of this pronoun indicates the author’s wish to involve
the public in his statements (e.g., “we, together, will …”). For a political leader, the we-words
also have the advantage of being ambiguous. What is behind a we? The future president and the
cabinet, the president and the people? Moreover, the preposition “of” is more frequent than in
the oral form. As in the oral form, and compared to Biden, Trump resorts more to pronouns,
with “you” and “they” occurring in his ten MFFWs, but not in Biden’s list.
For both candidates, one can observe a high similarity between the ten words appearing in the
oral and written columns. For Trump, nine words appear in both columns but not in the same
ranking; “that” occurs only in the oral list and “of” only in the written one. For Biden, the same ten functional
words
occur in both lists.
In Trump’s tweets, the frequency of personal pronouns tends to decrease. In contrast, Trump
uses more verbs (is, will) producing a text more oriented towards action. One can also observe
a clear distinction between the ten MFFWs occurring in Trump’s tweets and those in oral and
written forms. For example, only five terms appear in common in the oral set and tweets. With
Biden the stylistic markers are very similar when comparing tweets and either oral or written
forms with the same three terms in the first three ranks (the, to, and).
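The overlaps discussed in this section can be checked mechanically. The Python sketch below transcribes the six top-ten lists of Table 2 (the labels DToral, JBoral, etc. follow the naming used later for Figure 1) and counts the terms shared by two lists; the helper name shared is an assumption introduced for this illustration.

```python
# Top ten most frequent functional words, transcribed from Table 2.
TOP10 = {
    "DToral":    ["the", "I", "and", "it", "a", "you", "to", "they", "that", "we"],
    "JBoral":    ["the", "to", "and", "I", "that", "we", "in", "it", "a", "of"],
    "DTwritten": ["the", "and", "I", "we", "to", "you", "a", "it", "of", "they"],
    "JBwritten": ["the", "to", "and", "of", "a", "we", "I", "in", "it", "that"],
    "DTtweets":  ["the", "to", "and", "a", "of", "is", "in", "for", "you", "will"],
    "JBtweets":  ["the", "to", "and", "is", "in", "we", "a", "of", "I", "it"],
}

def shared(first, second):
    """Return the functional words appearing in both top-ten lists."""
    return sorted(set(TOP10[first]) & set(TOP10[second]))

oral_overlap = len(shared("DToral", "JBoral"))            # eight terms in common
trump_oral_written = len(shared("DToral", "DTwritten"))   # nine terms in common
biden_oral_written = len(shared("JBoral", "JBwritten"))   # all ten terms
trump_oral_tweets = len(shared("DToral", "DTtweets"))     # five terms in common
```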
5 Functional Words Distribution
Instead of focusing on each individual feature, one can generate an overview using all functional
terms to display the stylistic relationships between the three communication channels and two
authors. To achieve this, the percentage of personal pronouns has been computed, subdivided
by person and number. Thus, a class denoted Self (see footnote 2) is represented by the words “I”, “me”, “myself”,
“mine” and the adjective “my”. Similarly, the categories You, She/He, We and They have been defined. A supplementary category containing all other pronouns (e.g., it, that, who, whose, …)
has also been defined. As other feature classes, the articles (the, a, an), the prepositions, the
conjunctions, and the modal and auxiliary verbs have been added. In addition, the percentage of
functional words for each communication channel and author can be computed.
2 In this study, the denomination of a wordlist is capitalized and presented in italics.
Table 3 reports the percentages of occurrence according to the communication channels and authors. The largest values in this table are depicted in bold and the smallest are shown
in italics. From this dataset, one can apply an automatic classification scheme (Lebart et al.,
1998). The first step is to compute an intertextual distance between each representation. As a
simple measure, the Manhattan distance has been computed according to Equation 3.
$$D_{\mathrm{Manhattan}}(A, B) = \sum_{i=1}^{m} |a_i - b_i| \tag{3}$$
In this formula, m indicates the number of components of the vector A or B and the variable
a_i indicates the value of the ith component (or category in our context) inside the vector A (and
similarly b_i for the vector B). For example, limited to the first two categories depicted in
Table 3, the vector A could represent Trump in oral (A = [3.62, 2.60]) while B is associated with
Trump in written (B = [2.90, 2.63]). The Manhattan distance between A and B is computed according to the following formula:
$$D_{\mathrm{Manhattan}}(A, B) = |3.62\% - 2.90\%| + |2.60\% - 2.63\%| = 0.72\% + 0.03\% = 0.75\% = 0.0075 \tag{4}$$
In our context, m = 10 indicates the number of stylistic categories reported in Table 3 without
considering the last row (percentage of functional words). This last category depicts excessively
high values compared to the others and thus might have a preeminent impact when computing a
distance between two representations (Savoy, 2020). When adding this last row in our example,
we need to add |56.72% – 53.11%| = 3.61%, and the final distance is 0.75%+3.61% = 4.36%.
As we can see, this last component dominates the distance computation and thus must be discarded.
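The worked example can be reproduced with a few lines of Python. This is a minimal sketch: the function name manhattan is an assumption, the values are percentages taken from the example above, and results are compared with a small tolerance because of floating-point rounding.

```python
def manhattan(a, b):
    """Manhattan distance (Equation 3) between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return sum(abs(x - y) for x, y in zip(a, b))

# First two categories (Self, You), in percent, for Trump oral and Trump written.
trump_oral = [3.62, 2.60]
trump_written = [2.90, 2.63]
two_categories = manhattan(trump_oral, trump_written)      # about 0.75

# Adding the percentage of functional words shows how this row dominates.
with_total = manhattan(trump_oral + [56.72], trump_written + [53.11])  # about 4.36
```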
Table 3. Percentage of functional words per author and communication channel
                      Oral               Written            Tweets
Category              Trump    Biden     Trump    Biden     Trump    Biden
Self                  3.62%    3.09%     2.90%    2.35%     1.33%    2.09%
You                   2.60%    1.63%     2.63%    1.71%     1.39%    2.07%
She / He              1.68%    1.50%     1.77%    1.28%     1.26%    0.93%
We                    2.19%    2.56%     3.29%    3.01%     1.41%    3.06%
They                  2.76%    1.48%     2.15%    1.25%     0.63%    0.59%
Other pronouns        8.48%    8.45%     6.68%    6.68%     3.42%    5.20%
Articles              8.48%    8.45%     6.68%    6.68%     3.42%    5.20%
Prepositions          10.36%   13.17%    10.18%   13.38%    11.03%   13.25%
Conjunctions          6.51%    5.39%     5.56%    6.32%     4.03%    5.51%
Modal & aux. verbs    7.85%    8.52%     6.99%    5.69%     7.13%    8.33%
Functional words      56.72%   56.80%    53.11%   52.37%    40.82%   50.01%
After applying this distance measure, we obtain a symmetric matrix composed of 6 x 6
= 36 values (depicted in the Appendix), which is difficult to interpret at a glance.
To obtain a better visualization, a clustering method produces a dendrogram tree showing clusters having similar profiles. Based on genomic studies, tree-based visualization models have
been suggested in which the distances between all vectors are mostly respected (Bartélémy et
al., 1991), (Paradis, 2011). Figure 1 is produced according to this strategy. In this graph, each
point is indicated by two uppercase letters denoting the author’s name (JB = Joe Biden) followed
by the communication channel (e.g., DTtweets means Trump’s tweets).
In this figure, the distance between two points is indicated by the length of the lines needed to
connect them. For example, starting with the point JBoral, we follow the branch until we reach
the backbone, go along the backbone and then select the line leading to JBtweets, a path representing a length of 0.075 to join these two points. In this figure, the longest distance (0.159)
connects DToral and DTtweets. The smallest (0.065) can be found between DToral and DTwritten and the second smallest one between JBoral and JBwritten (0.075). However, not all distances are fully respected in such a graph and some deformations are always present. To partially mitigate
this problem, the exact values of all distances are given in the Appendix.
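The construction of the 6 x 6 distance matrix can be sketched as follows in Python. To keep the illustration short, each profile is restricted to the first two categories of Table 3 (Self and You, in percent), so the resulting values will not reproduce those of the Appendix, which rely on all ten categories. The names PROFILES and distance_matrix are assumptions introduced here, and the tree-based visualization itself is not covered.

```python
def manhattan(a, b):
    """Manhattan distance between two vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Self and You percentages from Table 3 (two categories only, for brevity).
PROFILES = {
    "DToral":    [3.62, 2.60],
    "JBoral":    [3.09, 1.63],
    "DTwritten": [2.90, 2.63],
    "JBwritten": [2.35, 1.71],
    "DTtweets":  [1.33, 1.39],
    "JBtweets":  [2.09, 2.07],
}

def distance_matrix(profiles):
    """Return the labels and the symmetric matrix of pairwise distances."""
    names = list(profiles)
    matrix = [
        [manhattan(profiles[row], profiles[col]) for col in names]
        for row in names
    ]
    return names, matrix

names, matrix = distance_matrix(PROFILES)
```

The diagonal contains only zeros and the matrix is symmetric, so only 15 of the 36 values carry distinct information.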
Figure 1. Tree-based representation of Manhattan distance between
communication channels
Based on these two smallest distance values, one can identify two main clusters, each composed of the oral and written communication forms of one candidate. The two tweet-based
stylistic representations appear in the bottom part of Figure 1. Moreover, DTtweets depicts a
clear distance from all other communication channels. In the Appendix, Table A.1 indicates that
all distance values from DTtweets are larger than 0.1, specifying a communication channel distinct from the others.
6 Advanced Stylistic Markers
Instead of limiting the analysis to the functional words, one can group selected terms under a
semantic tag. In the category Symbolism for example, Hart (1984) and Hart et al. (2013) include
a list of words related to country (e.g., nation, America), ideology (e.g., democracy, freedom,
rights), or generally, political concepts and institutions (e.g., law, government). A high value in
this category signals a text discussing general political concepts in an abstract way. The Blame
category groups terms having a negative connotation such as angry, deceptive, wrong, etc. The
Human class includes various references to citizens such as folks, people, we, he, their, etc. while
the Praise class (e.g., best, good, popular, true) is used to measure verbal affirmations. As additional measurements, the Familiarity group consists of words encountered in everyday speech
(e.g., as, in, this, with, …). A high score in this class reveals a more colloquial style.
Based on a similar strategy, the LIWC (Linguistic Inquiry & Word Count) (Tausczik &
Pennebaker, 2010) system groups words under emotional, topical or psychological categories.
In this perspective, the LIWC system defines positive emotions (Posemo) (e.g., like, hope, win)
or negative (Negemo) (e.g., fear, fake, blam*). In the current study, we also measured the
percentage of Family terms covering expressions related to the family members or to the family
itself (e.g., family, folks, son, …). In this study, words belonging to the Achieve category (e.g.,
first, plan, win, …) form a dedicated category.
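The dictionary-based counting behind such categories can be sketched in Python. The wordlists below contain only the example terms quoted above and are in no way the actual LIWC or DICTION dictionaries; the function names and the prefix interpretation of the trailing asterisk (as in blam*) are assumptions for this illustration.

```python
# Tiny illustrative wordlists built from the examples quoted in the text.
CATEGORIES = {
    "Posemo": ["like", "hope", "win"],
    "Negemo": ["fear", "fake", "blam*"],
}

def matches(token, pattern):
    """A trailing '*' is treated as a prefix wildcard (blam* matches blame, blaming)."""
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern

def category_percentage(tokens, patterns):
    """Percentage of tokens matching at least one entry of the wordlist."""
    lowered = [token.lower() for token in tokens]
    hits = sum(any(matches(token, p) for p in patterns) for token in lowered)
    return 100 * hits / len(lowered)

# Invented sentence: 2 of 9 tokens are Posemo, 2 of 9 are Negemo.
tokens = "We hope to win but they blame fake news".split()
posemo = category_percentage(tokens, CATEGORIES["Posemo"])
negemo = category_percentage(tokens, CATEGORIES["Negemo"])
```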
During an election, the emotional aspect could play a key role in motivating the voters (Marcus
& MacKuen, 1993), (Brader, 2005), (McDonald & Lenz, 2008). In this view, the negative
emotions such as anger and fear have been present in political advertisements for a while. The
effectiveness of such messages depends on three main factors (Ansolabehere & Iyengar, 1995),
namely the tone, the party of the supporting candidate, and the topic. In this view, a Republican
voter would be more receptive to negative tweets about crime, illegal immigration, or large
government. On the other hand, a Democrat would be more convinced by ads on civil rights or
jobs programs. We must mention that such deceptive messages are usually
protected by the First Amendment and therefore cannot usually be prosecuted.
To analyze these emotional components, the percentage of words appearing in the category
Posemo, Negemo, and Blame are reported in Table 4. To facilitate the reading of this table, the
largest values are shown in bold. Clearly, the positive terms are more frequent than the negative
ones. For example, in oral, Trump uses 3.33% positive words vs. 2.06% + 0.85% = 2.91%
(Negemo + Blame) while for Biden the difference is more important (2.97% vs. 1.86% + 0.35%
= 2.21%).
When comparing the two nominees, Trump uses more emotional words (sum of the Posemo
and Negemo categories). In addition, Trump utilizes more negative terms (categories Negemo
and Blame) in his tweets (2.65% + 0.75%) and his oral responses to journalists (2.06% + 0.85%).
Table 4. Frequencies of some semantics-based categories
               Oral               Written            Tweets
Tag            Trump    Biden     Trump    Biden     Trump    Biden
Posemo         3.33%    2.97%     4.59%    3.94%     4.66%    3.79%
Negemo         2.06%    1.86%     2.05%    2.01%     2.65%    1.82%
Blame          0.85%    0.35%     0.69%    0.44%     0.75%    0.37%
Praise         1.66%    1.05%     2.45%    1.23%     1.83%    1.27%
Human          10.33%   8.63%     11.03%   8.88%     5.40%    7.88%
Familiarity    20.56%   24.75%    19.93%   23.96%    18.72%   21.42%
Symbolism      1.15%    1.46%     1.85%    2.40%     1.76%    2.25%
Achieve        1.96%    2.95%     2.58%    4.11%     3.62%    4.38%
Family         0.10%    0.26%     0.17%    0.69%     0.18%    0.58%
Overall, and according to Hart (2020), Trump is more of an intuitive person, following his
feelings rather than thinking rationally. These positive emotional terms can be found in
expressions such as “I love you” uttered by Trump during his meetings. Trump wants to be
viewed as an ordinary person, able to understand his audience.
Based on Table 4, Trump’s rhetoric can also be characterized by a high degree of Praise with
terms such as “great”, “good”, “strong”, or “brave”. Trump likes such grandiose adjectives and
adverbs. When analyzing the distribution of the Human class, one can also observe that those
terms are more strongly associated with Trump than with Biden. The presence of several personal
pronouns in this category can explain this result.
On the other hand, Biden tends to adopt a more colloquial tone with a higher frequency of
Familiarity terms (words used frequently in our daily conversations). The Symbolism category
is more present in Biden’s communications, with messages discussing more abstract notions
(e.g., peace, justice, nation). This is also a characteristic of the 2020 election, with Biden
recurrently talking about moral values (e.g., honesty, hope, generosity) and avoiding deceptive
expressions. The Democratic nominee more often employs Achieve terms (e.g., win, work,
better, overcome) as well as words related to Family (e.g., children, folks, family).
7
Characteristic Vocabulary
All political leaders tend to speak with similar words or expressions; the differences between
them reside mainly in their frequencies. Thus, one can expect that some terms will regularly be
employed by a given person and ignored or rarely used by another. To determine overused
terms, Muller (1992) suggests comparing the number of occurrences of a term in a subset (e.g.,
a target person) with its number of occurrences in the entire corpus. Such a technique was applied to reveal the rhetorical differences between past US presidents (Savoy, 2017) or to identify the particular expressions of each French president from 1958 to 2018 (Labbé et al., 2021).
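To make the idea of an overused term more concrete, the following Python sketch compares the observed count of each term in a subset with the count expected from the whole corpus, and standardizes the difference. The binomial approximation used here is an assumption chosen for illustration and is not necessarily the exact formula applied in this study; the function name and the toy token lists are hypothetical.

```python
import math
from collections import Counter

def characteristic_scores(subset_tokens, corpus_tokens):
    """Standardized over-use score of each subset term (binomial approximation).

    Assumption: subset_tokens is a part of corpus_tokens, and a term with
    relative frequency p in the corpus is expected n * p times in a subset
    of n tokens, with variance n * p * (1 - p).
    """
    n = len(subset_tokens)
    corpus_size = len(corpus_tokens)
    subset_counts = Counter(subset_tokens)
    corpus_counts = Counter(corpus_tokens)
    scores = {}
    for term, observed in subset_counts.items():
        p = corpus_counts[term] / corpus_size
        variance = n * p * (1.0 - p)
        if variance > 0:
            scores[term] = (observed - n * p) / math.sqrt(variance)
    return scores

# Hypothetical toy data, not taken from the corpus.
trump_tokens = "they say they know".split()
biden_tokens = "in fact we should".split()
scores = characteristic_scores(trump_tokens, trump_tokens + biden_tokens)
top_term = max(scores, key=scores.get)
print(top_term, round(scores[top_term], 3))
```

Sorting the terms by decreasing score and keeping the first ten would produce a ranked list of characteristic words for one candidate and one channel.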
Table 5. Most characteristic words of the two candidates

        Oral                     Written                  Tweets
Rank    Trump      Biden         Trump    Biden           Trump             Biden
1       they       fact          great    quote           RT                urllink
2       gonna      sure          right    families        urllink           tune
3       I          able          they     lives           realDonaldTrump   Donald
4       but        number        you      virus           Biden             vote
5       Rush       making        we       communities     MAGA              chip
6       you        deal          know     their           complete          Trump
7       because    significant   very     promise         endorsement       head
8       not        invest        got      justice         news              today
9       very       idea          want     veterans        total             folks
10      say        should        good     failure         fake              we
For each of the three communication channels, the ten most characteristic words of both
candidates are listed in Table 5. Let’s start with the traditional oral and written forms. As shown
previously, Trump’s style can be characterized by a recurrent use of personal pronouns (they, I,
you, we). To emphasize some of his claims, the adverb “very” and the adjectives “great” and “good”
were also stylistic fingerprints of the former US president. Trump is also a person who knows
(“I know” or “you know”), reports “correct” facts and claims (“That’s right.” or “Right?”) or
sentences spoken by others (“They say …”). In addition, the Republican president liked
numbers, real or imagined, to justify his actions or plans.
Biden’s style can be characterized by a few expressions such as “in fact”, “make sure”, “to be
able”, “Number one … Number two …”, and “I quote ‘We’re not …’”. As a candidate, he must
also specify some of his important intents with the statement “we should” (but only
once “I should” in “I should nominate…”). Certainly, his first target will be the COVID-19
“virus” problem that costs so many lives, a total “failure” of the current administration. As
secondary aims, Biden talks about “racial justice”, “economic justice”, in response to “a cry for
justice from communities”. His future actions also target families and
veterans, in addition to communities. The adopted style is clearly more serene, peaceful, and free
from malicious or disrespectful expressions (e.g., illegal aliens or criminal aliens when talking
about immigrants).
With tweets, the most characteristic words are distinct from the two previous modes of
communication. The specific features of this social network appear with the retweet function
(RT), the presence of hyperlinks (urllink), or Twitter account names (e.g., @realDonaldTrump).
As in other elections, the set of most specific terms contains the candidates’ names (Joe,
Biden, Donald, Trump), mottos (MAGA), as well as encouragements to vote (“vote today”) or to
see a video or a given webpage (“Tune in as we …”, “Head to urllink to learn …”, “Chip in to
help …”). The president also supports Republican candidates for Congress with the
phrase “… has my complete and total endorsement!”. As in the 2016 US election, Trump
continues to call the traditional media “fake news.” In tweets sent by Biden, there is a clear
intent to include the people in the proposed claims or statements. The personal pronoun “we”
and the noun “folks” occur frequently in this perspective (e.g., “Folks, we are just …”).
During an electoral campaign, the candidates do not speak about all pertinent political
subjects, and the terms “budget”, “debt”, or “deficit” occur rarely. This 2020 election was not
an exception and the word “budget” never appears. The word “debt” is related to the issue of
student debt or to the expression “debt of gratitude”. When using “deficit”, Trump tweets about
“reduce deficit through cuts to social security” while Biden talks about the trade deficit with
China.
As other missing or marginal issues in this campaign, words beginning with “immigr*” and the
word “education” occur only six times in tweets sent by both candidates in September and
October. The word “energy” was mainly used to qualify Biden (e.g., “low energy Joe” in tweets
sent by Trump) and not to discuss the energy issue. As another example, one can cite “liberty”,
appearing only twice in tweets written by Trump and never in tweets sent by Biden.
8
Conclusion
During an electoral campaign, candidates hold meetings and speak on TV and radio to convince
the citizens of their capabilities to govern the country. In addition to the oral and written communication forms, social networks offer a more direct channel to the general public. Due to
the pandemic situation in 2020, Twitter played a more important role for both Donald Trump
and Joe Biden. To identify the stylistic features of both candidates, a corpus was generated with
speeches, TV debates, radio interviews, and tweets produced during the last two months of the
2020 US presidential election.
When comparing the three communication channels for both nominees, one can observe that
the oral form presents more repetitions and thus shows a lower TTR value (see Table 1b). This
measure is higher for both the written and tweet-based channels, but the lexical density is higher
for the tweets than for written messages. This finding reflects the fact that tweets are more
context-free messages. In such a case, the author must name the things he is talking about. Thus,
one can observe more nouns and fewer pronouns or conjunctions (usually indicating longer
sentences).
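As an illustration of these two measures, the following Python sketch computes a type-token ratio (number of distinct word types divided by the number of tokens) and a lexical density (share of tokens that are not functional words). The short list of functional words and the two sample texts are hypothetical, and the definitions may differ in detail from those underlying Table 1b. Because the plain TTR decreases with text length, samples of comparable size, or a moving-average variant such as MATTR (Covington & McFall, 2010), would be preferable in practice.

```python
# Hypothetical, very short list of functional words (assumption).
FUNCTION_WORDS = {"the", "a", "of", "to", "and", "i", "we", "you", "they", "it", "is", "in"}

def type_token_ratio(tokens):
    """Number of distinct word types divided by the number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def lexical_density(tokens, function_words=FUNCTION_WORDS):
    """Share of tokens that are not functional words."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in function_words]
    return len(content) / len(tokens)

# Hypothetical samples: a repetitive oral-like text and a compact tweet-like text.
oral = "we will win and we will win and we will win".split()
tweet = "vote today to protect health care".split()

print(type_token_ratio(oral), lexical_density(oral))
print(type_token_ratio(tweet), lexical_density(tweet))
```

In this toy example, the repetitive text obtains the lower TTR and the lower lexical density, which mirrors the direction of the contrast described above.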
As distinctive stylistic markers, Trump employs more personal pronouns in the oral and
written forms (see Tables 2 and 3). In the oral form, the pronoun “I” appears as the second most
frequent word from Trump’s mouth, a clear indication of a strong ego (Pennebaker, 2011). When
considering the top ten most frequent functional words, Trump’s style is rather similar in the oral
and written forms. With tweets, one can observe a clear dissimilarity with the two other forms
(see Table 2 and Figure 1). This could be partially explained by a high rate of retweets (51.7%,
see Table 1a).
When focusing on the rhetoric, Trump uses emotional words or expressions more frequently
(see Table 4). Clearly, and for both candidates, one can count more positive emotions than
negative ones. With Trump, however, those negative words occur more frequently in tweets.
Trump’s rhetoric is also based on praise words (e.g., “true”, “brave”) and grandiose adverbs or
adjectives (e.g., “great”, “strong”, “good”). The Democratic nominee does not opt for negative
terms and prefers to put forward moral values (e.g., “peace”, “honesty”, “hope”) to contrast with
the negative ads and expressions of his opponent. This positive attitude adopted by Biden was
a surprise because, according to Ansolabehere & Iyengar (1995), negative ads usually trigger
negative answers.
When comparing this electoral campaign with the previous one (Savoy, 2018b), (Sides et al.,
2018), (Hart, 2020), we observe that in both cases, the emotional component has played an
important role in motivating the voters (Marcus & MacKuen, 1993), (Brader, 2005), (McDonald
& Lenz, 2008). In this view, the negative emotions such as anger and fear have been present in
US political advertisements for a few decades (Ansolabehere & Iyengar, 1995). In 2020, the
negative tone was mainly adopted by the Republicans with a focus on two main targets,
the press and the media (reporting fake news) on the one hand, and on the other the expected
violence of the “radical left” once in power. In 2016, Trump’s campaign targeted four topics,
namely the (criminal) immigrants, China (destroying US jobs), the political establishment in
Washington, and the press and media (supporting only the elites).
Finally, we must recognize that stylistic and rhetorical analysis based on a set of wordlists is
not without concerns. One can emphasize that a word could have more than one meaning and
this language ambivalence is ignored by a simple count of the number of occurrences. For
example, the word energy occurring in the context of “low energy Joe” (in Trump’s tweets) is
not related to the energy problem. Moreover, the short context of a term is disregarded by a
simple word count procedure. For example, in the expression “hope, not fear, peace, not
violence” (in Biden’s tweets), counting two words as positive emotional terms and two as
negative is not fully correct. This tweet clearly presents a positive tone and rejects the negative
attitude. This semantic distinction is not captured by the computer when simply counting
word occurrences.
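This limitation can be reproduced with a few lines of Python. The sketch below applies a context-free count to the expression quoted above; the two tiny wordlists are hypothetical and contain only the four words needed for this example.

```python
import re

# Hypothetical wordlists restricted to the words of the example (assumption).
POSITIVE = {"hope", "peace"}
NEGATIVE = {"fear", "violence"}

def naive_emotion_counts(text):
    """Count positive and negative words while ignoring their context."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positive = sum(1 for t in tokens if t in POSITIVE)
    negative = sum(1 for t in tokens if t in NEGATIVE)
    return positive, negative

counts = naive_emotion_counts("hope, not fear, peace, not violence")
print(counts)  # (2, 2): the two negations are not taken into account
```

The count returns two positive and two negative terms, although both negative words are rejected by the preceding “not”.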
Acknowledgments
This study was fully supported by Hasler Foundation (Bern, Switzerland).
References
Ansolabehere, S., and Iyengar, S. (1995). Going Negative. How Political Advertisements
Shrink & Polarize the Electorate. The Free Press: New York.
Arnold, E., and Labbé, D. (2015). Vote for me. Don’t vote for the other one. Journal of World
Languages, 2(1), 32–49.
Baayen, H.R. (2008). Analyzing Linguistic Data. A Practical Introduction Using R. Cambridge:
Cambridge University Press.
Barthélemy, J.P., and Guénoche, A. (1991). Trees and Proximity Representations. New York:
John Wiley.
Biber, D., Conrad, S., and Leech, G. (2002). The Longman Student Grammar of Spoken and
Written English. London: Longman.
Biber, D., and Conrad, S. (2009). Register, Genre, and Style. Cambridge: Cambridge
University Press.
Boller, P.F. Jr. (2004). Presidential Campaigns. From George Washington to George W. Bush.
Oxford: Oxford University Press.
Brader, T. (2005). Striking a responsive chord: How political ads motivate and persuade voters
by appealing to emotions. American Journal of Political Science, 49(2), 388–405.
Carpenter, R.H., and Seltzer, R.V. (1970). On Nixon's Kennedy style. Speaker and Gavel, vol. 7,
no 41.
Covington, M.A., and McFall, J.D. (2010). Cutting the Gordian knot: The moving-average
type-token ratio (MATTR). Journal of Quantitative Linguistics, 17(2), 94-100.
Crystal, D. (2006). Language and the Internet. Cambridge: Cambridge University Press.
Crystal, D. (2018). The Cambridge Encyclopedia of the English Language. Cambridge:
Cambridge University Press.
Grimmer, J., and Stewart, B.M. (2013). Text as data: The promise and pitfalls of automatic
content analysis methods for political texts. Political Analysis, 21(3), 267–297.
Hansen, K.M., and Pedersen, R.T. (2008). Negative campaigning in a multiparty system. Scandinavian Political Studies, 31(4), 408–427.
Hart, R.P. (1984). Verbal Style and the Presidency. A Computer-based Analysis. Orlando:
Academic Press.
Hart, R.P. (2020). Trump and Us. What He Says and Why People Listen. Cambridge:
Cambridge University Press.
Hart, R.P., Childers, J.P., and Lind, C.J. (2013). Political Tone. How Leaders Talk and Why.
Chicago: The University of Chicago Press.
Kubát, M., and Cech, R. (2016). Quantitative analysis of US presidential inaugural addresses.
Glottometrics, 34, 14–27.
Kubát, M., Macutek, J., and Cech, R. (2020). Communists spoke differently: An analysis of
Czechoslovak and Czech annual presidential speeches. Digital Scholarship in the Humanities,
35(4), to appear.
Labbé, D., and Monière, D. (2003). Le discours gouvernemental. Canada, Québec, France
(1945-2000). Paris: Honoré Champion.
Labbé, D., and Monière, D. (2008). Les mots qui nous gouvernent. Le discours des premiers
ministres québécois: 1960-2005. Montréal: Monière-Wollank.
Labbé, D., and Monière, D. (2013). La campagne présidentielle de 2012. Votez pour moi!
Paris: L’Harmattan.
Labbé, D., and Savoy, J. (2021). Stylistic analysis of the French presidential speeches: Is
Macron really different? Digital Scholarship in the Humanities, to appear.
Lakoff, G., and Wehling, E. (2012). The Little Blue Book: The Essential Guide to Thinking
and Talking Democratic. New York: Free Press.
Laver, M., Benoit, K., and Garry, J. (2003). Extracting policy positions from political texts
using words as data. American Political Science Review, 97(2), 311–331.
Marcus, G.E., and MacKuen, M.B. (1993). Anxiety, enthusiasm, and the vote: The emotional underpinning of learning and involvement during presidential campaigns. American Political
Science Review, 87(3), 672–685.
Mayaffre, D. (2004). Le Discours Présidentiel Sous la Ve République. Paris: Presses de la Fondation Nationale des Sciences Politiques.
McDonald Ladd, J., and Lenz, G.S. (2008). Reassessing the role of anxiety in vote choice.
Political Psychology, 29(2), 275–296.
Muller, C. (1992). Principes et Méthodes de Statistique Lexicale. Paris: Honoré Champion.
O’Connor, B., Balasubramanyan, R., Routledge, B.R., and Smith, N.A. (2010). From tweets
to polls: Linking text sentiment to public opinion time series. Proceedings 4th International
AAAI Conference on Weblogs and Social Media, 122–129.
Paradis, E. (2011). Analysis of Phylogenetics and Evolution with R. New York: Springer.
Pauli, F., and Tuzzi, A. (2009). The end of year addresses of the Presidents of the Italian
Republic (1948–2006): Discourse similarities and differences. Glottometrics, 18, 40–51.
Pennebaker, J.W. (2011). The Secret Life of Pronouns. What our Words Say About us. New
York: Bloomsbury Press.
Raubach, E.E. (2019). Does political ideology influence how politicians speak? A linguistic
analysis of one-minute speeches from the US House of Representatives of the 115th Congress.
Document numérique, 22(1-2), 127–138.
Rogers, K. (2020). The State of the Union is Trump’s biggest speech. Who writes it? The New
York Times, Feb. 3rd.
Rule, A., Cointet, J.-P., and Bearman, P.S. (2015). Lexical shifts, substantive changes, and
continuity in State of the Union discourse. 1790–2014. In: Proceedings of the National
Academy of Sciences, 112, 35, 1-8.
Savoy, J. (2015). Text clustering: An application with the State of the Union addresses. Journal
of the American Society for Information Science & Technology, 66(8), 1645–1654.
Savoy, J. (2017). Analysis of the style and the rhetoric of the American presidents over two
centuries. Glottometrics, 38(1), 55–76.
Savoy, J. (2018a). Analysis of the style and the rhetoric of the 2016 US presidential primaries.
Digital Scholarship in the Humanities, 33(1), 143–159.
Savoy, J. (2018b). Trump and Clinton's Style and Rhetoric during the 2016 Presidential Election.
Journal of Quantitative Linguistics, 25(2), 168-189.
Savoy, J. (2020). Machine Learning Methods for Stylometry: Authorship Attribution and Author
Profiling. Cham: Springer.
Sides, J., Tesler, M., and Vavreck, L. (2018). Identity crisis. The 2016 presidential campaign
and the battle for the meaning of America. Princeton University Press: Princeton.
Slatcher, R.B., Chung, C.K., Pennebaker, J.W., and Stone, L.D. (2007). Winning words:
Individual differences in linguistic style among U.S. presidential and vice presidential
candidates. Journal of Research in Personality, 41(1), 63–75.
Sylwester, K., and Purver, M. (2015). Twitter language use reflects psychological differences between Democrats and Republicans. PLoS One, 10(9).
Tausczik, Y.R., and Pennebaker, J.W. (2010). The psychological meaning of words: LIWC
and computerized text analysis methods. Journal of Language and Social Psychology, 29(1),
24–54.
Young, L., and Soroka, S. (2012). Affective news: The automated coding of sentiment in
political texts. Political Communication, 29, 205–231.
Yu, B. (2008). Classifying party affiliation from political speech. Journal of Information Technology & Politics, 5(1), 33–48.
Yu, B. (2013). Language and gender in congressional speech. Literary and Linguistic Computing, 29(1), 118–132.
Appendix
Table A.1. Manhattan distance between the different representations

Manhattan    DToral   JBoral   DTwritten   JBwritten   DTtweets   JBtweets
DToral       0.0      0.086    0.065       0.127       0.159      0.139
JBoral       0.086    0.0      0.101       0.075       0.149      0.081
DTwritten    0.065    0.101    0.0         0.092       0.125      0.102
JBwritten    0.127    0.075    0.092       0.0         0.137      0.078
DTtweets     0.159    0.149    0.125       0.137       0.0        0.104
JBtweets     0.139    0.081    0.102       0.078       0.104      0.0
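As a reminder of how such a distance can be obtained, the following Python sketch computes the Manhattan distance between two profiles, each mapping a word to its relative frequency. The two profiles are hypothetical, and the exact word list and normalization behind Table A.1 are not reproduced here.

```python
def manhattan_distance(profile_a, profile_b):
    """Sum of absolute differences between two relative-frequency profiles.

    A term missing from one profile is treated as having frequency 0.
    """
    terms = set(profile_a) | set(profile_b)
    return sum(abs(profile_a.get(t, 0.0) - profile_b.get(t, 0.0)) for t in terms)

# Hypothetical relative frequencies of three functional words (assumption).
profile_a = {"the": 0.05, "i": 0.03, "we": 0.02}
profile_b = {"the": 0.06, "i": 0.01, "we": 0.02}

distance = manhattan_distance(profile_a, profile_b)
print(round(distance, 3))
```

This distance is symmetric and equals zero when a profile is compared with itself, which matches the symmetric layout and the zero diagonal of Table A.1.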