Mixed-Initiative Social Media Analytics at the World Bank: Observations of Citizen Sentiment in Twitter Data to Explore “Trust” of Political Actors and State Institutions and its Relationship to Social Protest

20 October, 2015
IEEE Big Data and Humanities
Nadya Calderon, Brian Fisher, Jeff Hemsley, Bill Cescavich, Greg Jansen, Richard Marciano and Victoria L. Lemieux

Motivation

WE FEEL FINE PROJECT

Main Goal of the Project:
– Use Big Data Analytics to contribute to research on trust in government, specifically the relationship among trust in government, trust in state institutions and citizens’ collective behavior

Research Questions:
– how Brazilian citizens felt about their state institutions
– how these feelings connected to their sentiments about Brazilian Federal and State government services and politicians
– how such sentiments translated into collective behaviors

INFORMATION LANDSCAPE IN BRAZIL

Number of internet users in Brazil: 107.7m
Internet user penetration in Brazil: 53.1%
Average duration of monthly internet usage in Brazil: 29.4h
Google is the most popular search engine in Brazil based on market share: 96.7%
Mobile phone internet users in Brazil: 72.1m
Active social media penetration in Brazil: 45%
Number of social network users in Brazil: 78.1m
Number of Twitter users: 41.2m

CONTEXT OF THE STUDY

2014 FIFA WORLD CUP

MIXED INITIATIVE SOCIAL MEDIA ANALYTICS (MISMA)

Interleaved contributions by the user and the system, together converging on a solution to a single problem.
Asymmetric division of labour such that the contributions made by the computer and the user are distinct.

Kirkpatrick, A. E., Dilkina, B., & Havens, W. S. (2005, June). A framework for designing and evaluating mixed-initiative optimization systems.
In ICAPS Workshop on Mixed-Initiative Planning and Scheduling, held in conjunction with the Fifteenth International Conference on Automated Planning and Scheduling, Monterey, California, June.

OVERVIEW OF THE MISMA METHODOLOGY

1. Using the sentiment expressed in the content of Twitter data to measure “Trust”
2. Instrument choice = sentiment analysis (SentiStrength)
3. Initial “big picture” harvest of Twitter data
4. Visual analysis and text analysis of “big picture”
5. Use of search terms to harvest historical Twitter data
6. Sentiment classification of harvested tweets
7. Development and use of VA tool to explore historical Twitter collections
8. Pair analysis
9. Analysis of competing hypotheses (ACH)

SENTIMENT ANALYSIS

• Rule-Based Classifier
• Representative Features: Dictionary of sentiment words, booster words, negating words, question words.
• We integrated ANEW-BR valence words
• Evaluation with Portuguese speakers

Sentiment Classification with SentiStrength

Scale from +4 to -4. Valence: + or -. Magnitude: 0 to 4.

HISTORICAL TWEET COLLECTION DETAILS

WORD CO-OCCURRENCE

serviço, serviços, saúde, hospital, hospitais, policia, polícia, policiais, educação, faculdade, dilma, governo, lula, presidente, presidenta, federal, prefeito, prefeitura, ministério, ministro, municipal, vigilância, politica, política, oposição, justiça, justo, petista, pt, corrupção, corrupto, corruptos, brasileiro, brasileiros, brasileira, brasileiras, crise, água, emergência, falta, petrobras, petrobrás

COLLECTION THEMES: SEARCH PHRASES

Political Opinion dilma lula pt, dilma lula política, Public Services **Water falta água, crise água, organização criminosa pt, crise hídrica, dilma governo, pt governo, política dilma, política governo, brasileiros dilma, brasileiro governo, oposição governo, acabar corrupção, impeachment dilma, dilma precisa, reforma politica, dilma vítima, dilma pobres, água dilma, água governo, água saúde, falta d água, água pt, água
acabando, água corrupção, água brasileiro, seca dilma, água educação, educação saúde, educação dilma, educação serviços, saúde serviços, água educação, governo saúde, dilma saúde educação governo, educação federal, federal polícia Petrobras dilma petrobrás, petrobrás pt, petrobrás corrupto, corrupção petrobrás, petrobrás crise, petrobrás presidente, petrobrás brasileiro, petrobrás dinheiro, graça foster petrobrás, petrobrás mesma coisa, dilma graça povo brasileiro, brasileira política

VISUAL ANALYSIS

The sense-making loop for visual analysis, based on a simple model of visualization (Van Wijk, 2005).

PROJECT FINDINGS

Brazilian citizens were expressing negative sentiment about the national government’s low level of investment in education, health and water, and to a lesser extent security and electricity, relative to spending on the World Cup.

At the state level, water was the key issue.

The study also found support for theories of relative deprivation as a cause of protest and for theories of digitally-mediated modes of political contestation.

The use of big data analytics made it possible to observe the protests from a distance, both in terms of space (i.e., the research team did not travel to Brazil) and time (i.e., the study used historical tweets).

LIMITATIONS OF THE STUDY

• Representativeness of the sample
• Performance of sentiment classification for political opinion analysis: take an interactive machine learning approach to refine during exploration
• Historical metadata: limitations with geographical analysis
• Topical themes introduce biases that need to be explicit
• A-Historicity
• Network analysis
• Tool design
• Domain expertise
• Privacy

FINAL THOUGHT

Evaluation persuades rather than convinces, argues rather than demonstrates, is credible rather than certain, is variously accepted rather than compelling.
- Ernest R. House, Evaluating with Validity, Beverly Hills, 1980
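
APPENDIX SKETCH: RULE-BASED SENTIMENT SCORING

The SENTIMENT ANALYSIS slide describes a rule-based classifier built on a dictionary of sentiment words, booster words and negating words, with output expressed as a valence (+ or -) and a magnitude (0 to 4). A minimal sketch of that general idea follows. It is not SentiStrength's actual algorithm: the lexicon entries, weights and the function name `score_tweet` are hypothetical, chosen only for illustration, and no values are taken from SentiStrength or ANEW-BR.

```python
import re

# Hypothetical toy lexicon: term -> signed strength in [-4, +4].
# Entries and weights are illustrative assumptions only.
LEXICON = {"bom": 2, "ótimo": 3, "ruim": -2, "péssimo": -3, "corrupção": -3}
BOOSTERS = {"muito": 1, "pouco": -1}  # raise or lower the next sentiment word's magnitude
NEGATORS = {"não", "nunca"}           # flip the valence of the next sentiment word


def clamp(value, low=-4, high=4):
    """Keep a score inside the -4..+4 scale."""
    return max(low, min(high, value))


def score_tweet(text):
    """Return (valence, magnitude): valence is '+', '-' or '0'; magnitude is 0 to 4."""
    tokens = re.findall(r"\w+", text.lower())
    strongest = 0   # strongest signed score seen so far
    boost = 0       # pending booster adjustment
    negate = False  # pending negation
    for token in tokens:
        if token in BOOSTERS:
            boost += BOOSTERS[token]
        elif token in NEGATORS:
            negate = True
        elif token in LEXICON:
            strength = LEXICON[token]
            sign = 1 if strength > 0 else -1
            magnitude = max(abs(strength) + boost, 0)
            if negate:
                sign = -sign
            scored = clamp(sign * magnitude)
            if abs(scored) > abs(strongest):
                strongest = scored
            # Modifiers are consumed by the sentiment word they precede.
            boost, negate = 0, False
    if strongest > 0:
        return "+", strongest
    if strongest < 0:
        return "-", -strongest
    return "0", 0


if __name__ == "__main__":
    for tweet in ["serviço muito ruim", "não é bom", "hospital"]:
        print(tweet, score_tweet(tweet))
```

In this sketch a tweet takes the score of its strongest sentiment word, and a booster or negator carries forward to the next sentiment word it meets. Both are simplifying design choices; a real classifier would need evaluation with Portuguese speakers, as the slides note.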