The study of social phenomena via social media Alessandro Flammini Indiana University Thanks to: And to: Why do we care about Online Social Media? InformaBon shared on SM maCers People increasingly makes choices based on info share on SM: purchasing product poliBcal informaBon health lifestyle eternal informaBon consumed (TV shows, blogs, news, books, music ….) SM have been leveraged to: anBcipate stock market forecast movie revenue probe consumer preferences coordinate acBon in crisis predict outcome of poliBcal elecBons The “world” is paying aCenBon Social Media: many actors, many agendas • DetecBng persuasion campaigns on TwiCer • AutomaBc detecBon of arBficial users Visit us @truthy.indiana.edu Social Networks as a macroscopic system he network dsiffusion tructure aaffects ffects the informaBon How How does dtoes he itnformaBon evoluBon doiffusion? f the social network, and ? How does the network evolves in response to the spread of informaBon it supports? How different pieces informaBon compete for our aCenBon? How do memes compete for our aCenBon? the lifeBme, popularity and diversity How compeBBon for aCenBon affects How oes our finite aCenBon affects the lifeBme, popularity, and diversity of of dinformaBon we consume? the informaBon we consume? PoliBcal discourse on TwiCer & the issue of polarizaBon Is people polarized as their congressmen? Data and Networks How the poliBcal network looks like 3 months of data. ~300k tweets. ~20k users. On the right the result of label propagaBon clustering algorithm Looking inside the groups Retweet MenBon Le[ Right Und. Nodes 1.1% 93.4% 5.3% 7,115 80.1% 8.7% 11.1% 11,355 39.5% 52.2% 8.1% 7,021 ClassificaBon of users poliBcal leaning Features Accuracy Text (TF-­‐IDF) 79% Hashtags 91% Retweet network 95% Tags + Network 95% Standard machine learning classificaBon with different classes of features. Text is in bag of words + TF-­‐IDF representaBon Words were ranked according to their Term Frequency – Inverse Document Frequency, Represented here in Log-­‐scale Occupy Wall Street TwiCer traffic & Bmeline Event Occupy Wall Street Traffic Volume ● Initial Protests ● Initial Arrests ● Foley Square March ● Times Square Protests ● Union Square March ● Zuccotti Cleared 5000 ● Counter Protest 0 ● Egyptian Elections ● May Day Strike ● 30000 ● ● Tweets 25000 20000 ● ● 15000 ● 10000 ● 09/11 ● ● 10/11 11/11 12/11 01/12 02/12 03/12 Date 04/12 05/12 06/12 07/12 08/12 09/12 On-­‐the-­‐ground events (circles). TwiCer data-­‐stream outages (blue bands) are highlighted. Bins have 12-­‐hours length. The geography of online OWS NY NY CA DC CA MulB-­‐scale backbone extracBon – confidence level a = 0.15 Occupy-­‐related discourse (right) shows a prominent hub-­‐and-­‐spoke structure with features disBnct from conversaBons on domesBc poliBcs (le[). DC CollecBve Framing vs Resource mobilizaBon CollecBve framing Resource mobilizaBon RaBo = Collec've framing: the social processes whereby parBcipants negoBate the shared language and narraBve frames to define the movement's idenBty and goals. Resource mobiliza'on: the work to marshal the physical and technological infrastructure, human resources, and financial capital to sustain ongoing acBvity. People engagement Topic Attention Allocation Occupy 0.4 Domestic Engaged User Ratio Revolution 0.3 Engaged User Attention Ratio 0.0 0.2 0.1 0.2 0.1 0.3 0.4 0.5 0.0 06/11 07/11 08/11 09/11 10/11 11/11 12/11 01/12 02/12 03/12 04/12 05/12 06/12 07/12 08/12 09/12 Date Point size is proporBonal to user engagement, measured as average (across users) raBo of OWS-­‐related tweets over enBre producBon 0.6 At the core of the protest Proportion of Activity In−Group In−Group Connectivity Among Occupy Users 0.40 0.35 Tweet Type retweets 0.30 mentions 0.25 0.20 06/11 07/11 08/11 09/11 10/11 11/11 12/11 01/12 02/12 03/12 04/12 05/12 06/12 07/12 08/12 09/12 Date Longitudinal connecBvity level of 25k users – reported as mean and 95% confidence intervals Conclusions SM data can be leveraged to answer quesBons relevant to social science Technical issues I carefully hide: data collecBon, filtering, archiving, retrieval, analysis May possible biases Opportunity to observe in the “wild”, without relying o what people says about the self or others Complement tradiBonal approaches