Tweets and Votes: A Study of the 2011 Singapore General Election

Predicting Elections with Social Media:
Opportunities and Challenges
Marko M. Skoric
Nanyang Technological University, Singapore
Nathaniel D. Poor
Independent Scholar, Brooklyn, NY, USA
Palakorn Achananuparp, Ee-Peng Lim, and Jing Jiang
Singapore Management University, Singapore
Previous Research
Studies examined whether the content and structure
of Twitter can be used to predict the results of
elections in:
• Germany (Tumasjan et al., 2010)
• Portugal (Fonseca, 2011)
• United Kingdom (Tweetminster, 2009)
• United States (Gayo-Avello et al., 2011)
Findings are very mixed, with mean absolute error
(MAE) ranging from less than 2% to more than 17%.
• Mostly simple frequency counts
In general, national-level predictions are more
accurate than regional and local.
Predicting Elections from Social
Media: Methodological Issues
Open (relatively) Twitter API offers good
opportunities for research.
• How to select/sample content?
• Keyword search, geo-location, “seeded” sample, etc.
• Sentiment analysis?
Facebook is less suitable for electoral predictions
at present.
In the absence of published scientific polls, social
media data may be of considerable value.
• For example, Weibo (200 million users in China)
The Question
Can the content of tweets containing names
of...
• political parties,
• political candidates, and
• contested constituencies...
be used to make predictions about the share
of votes on the election day?
2011 Singapore General Elections:
A Case Study
Introduction
We studied the relation between tweets and election
results during the 2011 Singapore General Election,
examining if and how tweets could forecast the election
results.
In line with some previous studies, we found that during
the elections the frequency of some Twitter content could
be used to make predictions about the share of votes at
the national level.
While the predictions may not be as accurate as those
done via traditional public opinion polling methods, they
can be useful, especially in the absence of published polls.
Background on Singapore Elections
Singapore is classified as a “partly free” society
• Elections held periodically and free of fraud
• The ruling party dominates traditional media and politics
• Internet is mostly free
• 2011 Elections as the most competitive elections in
history
• Social media as a “balancing force” (Lin, Bagrow, Lazer 2011)
Voting in Singapore is compulsory, and the electoral system
is a version of the Westminster system that uses the
first-past-the-post method of electing MPs.
Two types of electoral divisions in Singapore: Single
Member Constituencies (SMCs) and Group Representation
Constituencies (GRCs).
Research Questions
RQ1: Is the share of Twitter messages mentioning
political parties and their candidates predictive of
their respective share of the vote at the national
level?
RQ2: Is the relative frequency of Twitter messages
mentioning the names of opposition candidates
predictive of the opposition’s share of the vote at
the constituency level?
Sampling
We developed our own Twitter crawler:
• Perl
• Twitter API
• MySQL
Seeded sample:
• 59 Core Users: Known political figures, political
candidates, political parties and organizations,
activists, journalists, and bloggers.
• Plus Followed & Followers: ~13,000 accounts.
• Singapore location in Twitter profile.
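The seeded sampling described above can be sketched as a one-hop expansion from the core users followed by a location filter. This is a minimal illustration, not the authors' Perl crawler: the `Account` class, the `seeded_sample` function, and the substring match on the profile location are all hypothetical, and it assumes that core users are always kept while the location filter applies only to the accounts they follow or are followed by.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical in-memory stand-in for a crawled Twitter profile."""
    handle: str
    location: str = ""
    follows: set = field(default_factory=set)    # handles this account follows
    followers: set = field(default_factory=set)  # handles following this account


def seeded_sample(accounts, seeds, place="singapore"):
    """Return the seeds plus their one-hop neighbours whose profile mentions `place`.

    Assumption: seeds are kept unconditionally; neighbours must pass the
    (case-insensitive) location filter.
    """
    seed_set = set(seeds)
    candidates = set(seed_set)
    for handle in seed_set:
        seed = accounts[handle]
        candidates |= seed.follows | seed.followers

    sample = set()
    for handle in candidates:
        profile = accounts.get(handle)
        if handle in seed_set:
            sample.add(handle)
        elif profile is not None and place in profile.location.lower():
            sample.add(handle)
    return sample


if __name__ == "__main__":
    demo = {
        "seed": Account("seed", "Singapore", follows={"a", "b"}, followers={"c"}),
        "a": Account("a", "Singapore, SG"),
        "b": Account("b", "Brooklyn, NY"),
        "c": Account("c", "singapore"),
    }
    print(sorted(seeded_sample(demo, ["seed"])))
```

A real crawler would page through the follower and friend lists via the Twitter API and store them in a database; the filtering logic would stay the same.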
Measures
Tweets mentioning the names of the seven political
parties and the candidates contesting the elections
as well as the names of 26 contested SMCs and
GRCs.
Tweets used...
• from April 27, 2011 (nomination day),
• to May 7, 2011 (polling day).
Almost 1.5 million tweets were collected.
110,815 political tweets were identified and
counted in this study.
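A simple frequency count of this kind might look like the sketch below. The keyword lists are an assumption: only the party abbreviations from the results table are used, because the exact search terms of the study are not given here. The sketch also assumes that a tweet naming two parties counts once for each.

```python
import re
from collections import Counter
from datetime import date

# Collection window stated on the slide.
START = date(2011, 4, 27)  # nomination day
END = date(2011, 5, 7)     # polling day

# Hypothetical keyword lists (party abbreviations only).
PARTY_KEYWORDS = {
    "PAP": ["pap"],
    "WP": ["wp"],
    "NSP": ["nsp"],
    "SDP": ["sdp"],
    "RP": ["rp"],
    "SPP": ["spp"],
    "SDA": ["sda"],
}

# One case-insensitive, whole-word pattern per party.
PATTERNS = {
    party: re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    for party, keywords in PARTY_KEYWORDS.items()
}


def count_party_mentions(tweets):
    """Count tweets per party for an iterable of (date, text) pairs."""
    counts = Counter()
    for day, text in tweets:
        if not (START <= day <= END):
            continue  # outside the campaign window
        for party, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[party] += 1
    return counts


def shares(counts):
    """Convert raw counts to percentage shares of all party mentions."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {party: 100.0 * n / total for party, n in counts.items()}
```

Short abbreviations such as "RP" or "WP" can match unrelated text, so a real keyword list would need manual checking against a sample of matched tweets.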
% of Tweets and Votes
Party    % Tweets     % Votes      Error
PAP*     42.80 (1)    60.14 (1)    -17.34
WP       20.83 (2)    12.83 (2)      8.00
NSP      13.86 (3)    12.04 (3)      1.82
SDP      11.07 (4)     4.83 (4)      6.24
RP        5.22 (5)     4.28 (5)      0.94
SPP       4.41 (6)     3.11 (6)      1.30
SDA       1.81 (7)     2.78 (7)     -0.97
MAE                                  5.23
Numbers in parentheses indicate relative rank.
*Ruling party (PAP).
MAE = mean absolute error.
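The Error column and the MAE can be recomputed from the table. The sketch below takes the error to be % Tweets minus % Votes, which matches every row, and averages the absolute errors over the seven parties. The function names are illustrative.

```python
# Values copied from the table above.
TWEET_SHARE = {"PAP": 42.80, "WP": 20.83, "NSP": 13.86, "SDP": 11.07,
               "RP": 5.22, "SPP": 4.41, "SDA": 1.81}
VOTE_SHARE = {"PAP": 60.14, "WP": 12.83, "NSP": 12.04, "SDP": 4.83,
              "RP": 4.28, "SPP": 3.11, "SDA": 2.78}


def signed_errors(predicted, actual):
    """Per-party error: predicted share minus actual share, in percentage points."""
    return {party: round(predicted[party] - actual[party], 2) for party in predicted}


def mean_absolute_error(predicted, actual):
    """Average of the absolute per-party errors."""
    return sum(abs(predicted[p] - actual[p]) for p in predicted) / len(predicted)


if __name__ == "__main__":
    for party, err in signed_errors(TWEET_SHARE, VOTE_SHARE).items():
        print(f"{party:4s} {err:+.2f}")
    print(f"MAE  {mean_absolute_error(TWEET_SHARE, VOTE_SHARE):.2f}")  # MAE  5.23
```

A negative error means the party received a smaller share of tweets than of votes.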
National Level, Votes/Tweets
R² = 0.912
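One way to check this figure is to treat R² as the squared Pearson correlation between the seven parties' tweet shares and vote shares, which equals the R² of a simple linear regression of one on the other. That reading is an assumption, since the slide does not say how R² was obtained. Computed this way from the table values, the result should come out close to the reported 0.912.

```python
import math

# Party order: PAP, WP, NSP, SDP, RP, SPP, SDA (values from the results table).
TWEETS = [42.80, 20.83, 13.86, 11.07, 5.22, 4.41, 1.81]
VOTES = [60.14, 12.83, 12.04, 4.83, 4.28, 3.11, 2.78]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    r = pearson_r(TWEETS, VOTES)
    print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```

With only seven data points, one of them an outlier as large as the ruling party, the fit is dominated by that single observation.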
Correlations, Tweets/Votes
Constituency Level
               % vote opp   Opp tweets   PAP tweets   Const tweets
% vote opp     1            .416*        .294         .764**
Opp tweets                  1            .803**       .211
PAP tweets                               1            -.006
Const tweets                                          1
N=26 (26 constituencies)
* p < .05 (2-tailed), ** p < .01 (2-tailed)
Constituency Level, Votes/Tweets
R² = 0.173
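The constituency-level R² is consistent with squaring the .416 correlation between opposition tweets and opposition vote share. The significance stars in the correlation table can also be checked with the standard t-test for a Pearson correlation, t = r·sqrt(N − 2) / sqrt(1 − r²). The sketch below assumes that this is the test behind the stars and hard-codes the two-tailed critical t values for 24 degrees of freedom (2.064 at .05 and 2.797 at .01).

```python
import math

N = 26              # constituencies
T_CRIT_05 = 2.064   # two-tailed critical t, df = N - 2 = 24, alpha = .05
T_CRIT_01 = 2.797   # two-tailed critical t, df = N - 2 = 24, alpha = .01


def t_statistic(r, n=N):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def stars(r):
    """Significance marker for N = 26, using the hard-coded critical values."""
    t = abs(t_statistic(r))
    if t > T_CRIT_01:
        return "**"
    if t > T_CRIT_05:
        return "*"
    return ""


if __name__ == "__main__":
    for label, r in [("% vote opp / Opp tweets", 0.416),
                     ("% vote opp / PAP tweets", 0.294),
                     ("% vote opp / Const tweets", 0.764),
                     ("Opp tweets / PAP tweets", 0.803)]:
        print(f"{label:28s} r = {r:+.3f}{stars(r):2s}  r^2 = {r * r:.3f}")
```

The critical values apply only to N = 26; a different sample size would need its own values from a t table.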
Summary of Results
Tweets can be used to predict votes, although more
so on the national level than the constituency level.
Opposition parties were generally “overhyped” on
Twitter, and the ruling party was significantly
“underhyped”.
• Compensating for the ruling party’s dominance in
traditional media?
• Frequency of tweets containing the names of
constituencies was a strong predictor of the opposition
vote share.
Discussion
Public opinion polls vs. Twitter
• Solicited opinions vs. short political conversations
• Increasing difficulties with traditional surveys
• Should we benchmark Twitter findings against public
opinion data or against media use behavior?
• Effect sizes are medium to large for tweet
predictions
Strong correlation between the number of tweets for the
opposition parties and the number of tweets for the ruling
party – tweets are election-related conversations.
Social media can act as a discussion space for those citizens
who are kept out of older, traditional, media forms.
Conclusion
Analysis of Twitter messages may represent an
inexpensive, unobtrusive and reasonably accurate
method for gauging public opinion.
• In the absence of scientific polls, tweet predictions
may be useful
What should be done in the future?
• Specify robust data collection/sampling
methods and analytical approaches
• Identify demographics and other biases and apply
weights
• Offer suitable theoretical frameworks
• Pursue comparative studies, across different
political contexts and cultures
Thank you!