Predicting Elections with Social Media: Opportunities and Challenges

Marko M. Skoric
Nanyang Technological University, Singapore

Nathaniel D. Poor
Independent Scholar, Brooklyn, NY, USA

Palakorn Achananuparp, Ee-Peng Lim, and Jing Jiang
Singapore Management University, Singapore

Previous Research

Studies examined whether the content and structure of Twitter can be used to predict the results of elections in:
• Germany (Tumasjan et al., 2010)
• Portugal (Fonseca, 2011)
• United Kingdom (Tweetminster, 2009)
• United States (Gayo-Avello et al., 2011)

Findings are very mixed, with mean absolute error (MAE) ranging from less than 2% to more than 17%.
• Mostly simple frequency counts

In general, national-level predictions are more accurate than regional and local ones.

Predicting Elections from Social Media: Methodological Issues

The (relatively) open Twitter API offers good opportunities for research.
• How to select/sample content?
• Keyword search, geo-location, “seeded” sample, etc.
• Sentiment analysis?

Facebook is less suitable for electoral predictions at present.

In the absence of published scientific polls, social media data may be of considerable value.
• For example, Weibo (200 million users in China)

The Question

Can the content of tweets containing names of...
• political parties,
• political candidates, and
• contested constituencies...
be used to make predictions about the share of votes on election day?

2011 Singapore General Elections: A Case Study

Introduction

We studied the relation between tweets and election results during the 2011 Singapore General Election, examining if and how tweets could forecast the election results. In line with some previous studies, we found that during the elections the frequency of some Twitter content could be used to make predictions about the share of votes at the national level.
While the predictions may not be as accurate as those obtained through traditional public opinion polling, they can be useful, especially in the absence of published polls.

Background on Singapore Elections

Singapore is classified as a “partly free” society
• Elections held periodically and free of fraud
• The ruling party dominates traditional media and politics
• Internet is mostly free
• 2011 elections as the most competitive elections in history
• Social media as a “balancing force” (Lin, Bagrow, Lazer 2011)

Voting in Singapore is compulsory, and the electoral system is a version of the Westminster system that uses the first-past-the-post method of electing MPs.

Two types of electoral divisions in Singapore: Single Member Constituencies (SMCs) and Group Representation Constituencies (GRCs).

Research Questions

RQ1: Is the share of Twitter messages mentioning political parties and their candidates predictive of their respective share of the vote at the national level?

RQ2: Is the relative frequency of Twitter messages mentioning the names of opposition candidates predictive of the opposition’s share of the vote at the constituency level?

Sampling

We developed our own Twitter crawler:
• Perl
• Twitter API
• MySQL

Seeded sample:
• 59 core users: known political figures, political candidates, political parties and organizations, activists, journalists, and bloggers.
• Plus the accounts they follow and their followers: ~13,000 accounts.
• Singapore location in Twitter profile.

Measures

Tweets mentioning the names of the seven political parties and the candidates contesting the elections, as well as the names of the 26 contested SMCs and GRCs.

Tweets used...
• from April 27, 2011 (nomination day),
• to May 7, 2011 (polling day).

Almost 1.5 million tweets were collected. Of these, 110,815 political tweets were identified and counted in this study.
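The measure described above (counting tweets that mention each party within the campaign window and converting the counts to shares) can be sketched roughly as follows. This is an illustration only, not the authors' crawler, which was written in Perl with MySQL. The term lists, the function names, and the toy tweets are all hypothetical; the slides do not give the study's actual keyword lists, and the rule that a tweet naming several parties counts once for each is an assumption.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical search terms; the study's real party and candidate
# keyword lists are not given in the slides.
PARTY_TERMS = {
    "PAP": ["PAP", "People's Action Party"],
    "WP": ["WP", "Workers' Party"],
}

START = date(2011, 4, 27)  # nomination day
END = date(2011, 5, 7)     # polling day


def compile_terms(party_terms):
    """Build one case-insensitive whole-word pattern per party."""
    return {
        party: re.compile(
            r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
            re.IGNORECASE,
        )
        for party, terms in party_terms.items()
    }


def mention_shares(tweets, party_terms, start, end):
    """Return each party's percentage of party mentions between start and end.

    `tweets` is an iterable of (date, text) pairs. A tweet naming several
    parties is counted once for each of them (an assumption of this sketch).
    """
    patterns = compile_terms(party_terms)
    counts = Counter({party: 0 for party in party_terms})
    for day, text in tweets:
        if not (start <= day <= end):
            continue  # outside the campaign window
        for party, pattern in patterns.items():
            if pattern.search(text):
                counts[party] += 1
    total = sum(counts.values())
    if total == 0:
        return {party: 0.0 for party in party_terms}
    return {party: 100.0 * n / total for party, n in counts.items()}


# Toy data, invented for illustration only.
TOY_TWEETS = [
    (date(2011, 4, 26), "PAP rally announced"),        # before the window
    (date(2011, 4, 28), "Big crowd at the WP rally"),
    (date(2011, 5, 1), "pap and WP both trending"),    # counts for both
    (date(2011, 5, 3), "Workers' Party speech tonight"),
    (date(2011, 5, 6), "Lunch was great today"),       # not political
]

shares = mention_shares(TOY_TWEETS, PARTY_TERMS, START, END)
print(shares)
```

Whole-word matching is used so that a short abbreviation such as "WP" does not match inside unrelated words; a real keyword list would also need to handle hashtags, misspellings, and candidate names.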
% of Tweets and Votes

Party    % Tweets     % Votes      Error
PAP*     42.80 (1)    60.14 (1)    -17.34
WP       20.83 (2)    12.83 (2)      8.00
NSP      13.86 (3)    12.04 (3)      1.82
SDP      11.07 (4)     4.83 (4)      6.24
RP        5.22 (5)     4.28 (5)      0.94
SPP       4.41 (6)     3.11 (6)      1.30
SDA       1.81 (7)     2.78 (7)     -0.97
MAE                                  5.23

Numbers in parentheses indicate relative rank. Error = % tweets minus % votes.
*Ruling party (PAP). MAE = mean absolute error.

National Level, Votes/Tweets

R² = 0.912

Correlations, Tweets/Votes, Constituency Level

               % vote opp   Opp tweets   PAP tweets   Const tweets
% vote opp     1            .416*        .294         .764**
Opp tweets                  1            .803**       .211
PAP tweets                               1            -.006
Const tweets                                          1

N = 26 (26 constituencies)
* p < .05 (2-tailed), ** p < .01 (2-tailed)

Constituency Level, Votes/Tweets

R² = 0.173

Summary of Results

Tweets can be used to predict votes, although more so at the national level than at the constituency level.

Opposition parties were generally “overhyped” on Twitter, and the ruling party was significantly “underhyped”.
• Compensating for the ruling party’s dominance in traditional media?
• Frequency of tweets containing the names of constituencies was a strong predictor of the opposition vote share.

Discussion

Public opinion polls vs. Twitter
• Solicited opinions vs. short political conversations
• Increasing difficulties with traditional surveys
• Should we benchmark Twitter findings against public opinion data or against media use behavior?
• Effect sizes are medium to large for tweet predictions

Strong correlation between the number of tweets for the opposition parties and the number of tweets for the ruling party: tweets are election-related conversations.

Social media can act as a discussion space for those citizens who are kept out of older, traditional media forms.

Conclusion

Analysis of Twitter messages may represent an inexpensive, unobtrusive and reasonably accurate method for gauging public opinion.
• In the absence of scientific polls, tweet predictions may be useful

What should be done in the future?
• Specify robust data collection/sampling methods and analytical approaches
• Identify demographics and other biases and apply weights
• Offer suitable theoretical frameworks
• Pursue comparative studies, across different political contexts and cultures

Thank you!
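As a backup note, the national-level error and fit figures can be rechecked from the percentages in the “% of Tweets and Votes” table. The sketch below assumes that the reported R² is the squared Pearson correlation between tweet share and vote share (as in a simple linear regression); the slides do not state how it was computed. The function names are hypothetical.

```python
from math import sqrt

# (party, % tweets, % votes) as reported in the "% of Tweets and Votes" table.
RESULTS = [
    ("PAP", 42.80, 60.14),
    ("WP", 20.83, 12.83),
    ("NSP", 13.86, 12.04),
    ("SDP", 11.07, 4.83),
    ("RP", 5.22, 4.28),
    ("SPP", 4.41, 3.11),
    ("SDA", 1.81, 2.78),
]


def mean_absolute_error(rows):
    """Average of |% tweets - % votes| over all parties."""
    return sum(abs(tweets - votes) for _, tweets, votes in rows) / len(rows)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


mae = mean_absolute_error(RESULTS)
r = pearson_r([t for _, t, _ in RESULTS], [v for _, _, v in RESULTS])
national_r2 = r ** 2

# Constituency level: squaring the reported correlation between opposition
# tweets and opposition vote share (.416) gives the reported R².
constituency_r2 = 0.416 ** 2

print(f"MAE = {mae:.2f}")                            # MAE = 5.23
print(f"National R^2 = {national_r2:.3f}")           # National R^2 = 0.912
print(f"Constituency R^2 = {constituency_r2:.3f}")   # Constituency R^2 = 0.173
```

Under that assumption, the computed values agree with the figures reported on the slides. With only seven parties, the national R² rests heavily on the ruling party's data point, which is worth keeping in mind when reading the fit.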