Committee Members: Dr. Subbarao Kambhampati (Chair), Dr. Eric Horvitz, Dr. John Krumm, Dr. Huan Liu, Dr. Hari Sundaram
Since the dawn of civilization, people have congregated in town squares to discuss events. The emergence of social media has now created a sprawling virtual town square whose scope is vast and whose chatter can be captured, opening exciting possibilities for analyzing what people are actually saying.
Examples: a debate, the I-5 bridge collapse, the Super Bowl, Obama’s selfie
What’s the relation between an event and its tweets?
Which part of the event did a tweet refer to?
What were the topics of the event and the tweets?
What sentiments did the event elicit in tweets?
How to characterize the crowd’s tweeting behavior?
How to detect an event from social media responses?
How to predict the crowd’s engagement in future events?
How to find social media responses about an event?
How to model the relations between an event and its responses?
How to address these challenges?
How to link social media responses to events
How to infer the topics and sentiments of social media responses
How to characterize the crowd’s behavior in response to events
How to distill insights about an event from social media responses
How to predict the future development of an event
How to predict the crowd’s engagement in future events
The event master: “Fox News Unveils New State-Of-The-Art Newsroom” (The Verge, Oct 17, 2013)
Tweet volume on Egypt & Morsi: ~12k per hour
Most existing event analytics solutions are primitive.
Simply combining other solutions ignores the connections between events and their responses. Given the vast amount of social media responses and the complex nature of events, we need automated tools to conduct in-depth analysis.
Task 1: Event sensemaking (event topics, segments, event-tweet alignment, event sentiments)
Task 2: Event recognition (trending events with their associated Twitter responses)
Task 3: Event engagement prediction (predict users’ engagement in future events)
[Figure: tweets labeled specific or general, aligned to event segments]
ET-LDA [AAAI’12, ICWSM’12, MMW’12]: event-tweet alignment; frequency of specific tweets; evolution of specific tweets
SocSent [IJCAI’13]
DeMA [CHI’13], e.g., Fire happened at 5 St and Pike, heard sirens, lots smoke
Alice [under review], e.g., “Hey Mike: we found this event may be of interest to you based on our prediction on your potential engagement! Our predictions were made based on your Twitter engagement history. Regards, Alice”
Eventics: an automated toolbox for in-depth analysis of three core tasks in event analytics:
ET-LDA and SocSent for event sensemaking
DeMA for event recognition
Alice for event engagement prediction
Our toolbox enables a richer perspective on how people respond to events on Twitter and on what factors affect the crowd’s engagement in events.
Motivation: Republican Primary Debate, 09/07/2011; tweets tagged with #ReaganDebate
What’s the relation between an event and its tweets? Which part of the event did a tweet refer to? What were the topics of the event and the tweets? How to characterize the crowd’s tweeting behavior?
Given an event’s transcript S and its associated tweets T: characterize the event in terms of its topics and segments, and its influence (w.r.t. nature and magnitude) on the crowd’s Twitter responding behavior.
Requirements:
1. Extract topics in the event and tweets
2. Segment the event into topically coherent chunks
3. Establish the alignment between the event and tweets
4. Measure the influence of the event on its associated tweets
Both topics and segments are latent.
Tweets are topically influenced by the content of the event. A tweet’s topics can be general (high-level and constant across the entire event) or specific (concrete and related to specific segments of the event).
An event is formed by discrete, sequentially ordered segments, each of which discusses a particular set of topics.
A naive approach:
1. Segment the event with existing tools, e.g., time windows
2. For each <tweet, segment> pair, measure similarity, e.g., with TF-IDF
3. Count the related tweets for each segment
Unfortunately, these approaches cannot discover latent topics or segments; moreover, they model the event and its Twitter responses independently.
ET-LDA (joint Event and Tweets LDA) is a hierarchical, fully Bayesian model that jointly models an event and its Twitter responses via their inter-dependency, i.e., topical influences.
Yuheng Hu, Ajita John, Fei Wang, Subbarao Kambhampati. “ET-LDA: Joint Topic Modeling for Aligning Events and their Twitter Feedback.” In AAAI Conference on Artificial Intelligence (AAAI), 2012.
Yuheng Hu, Ajita John, Doree Duncan Seligmann, Fei Wang. “What were the Tweets about? Topical Associations between Public Events and Twitter Feeds.” In ICWSM, 2012.
Yuheng Hu, Ajita John, Doree Duncan Seligmann. “Event Analytics via Social Media.” In Proc. ACM Multimedia 2011 Workshop on Social and Behavioral Networked Media Access (SBNMA), 2011.
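The naive approach (time-window segmentation, TF-IDF similarity for each <tweet, segment> pair, and counting related tweets per segment) can be sketched in a few lines of Python. This is a minimal sketch, assuming pre-tokenized text, a fixed window size measured in tokens, and assignment of each tweet to its single most similar segment; all function names are hypothetical. It also makes the stated limitation concrete: segments are fixed windows rather than latent topical units, and each tweet is scored against the event independently of every other tweet.

```python
import math
from collections import Counter

def time_window_segments(tokens, window):
    """Baseline segmentation: cut the event transcript into fixed-size token windows."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (term -> weight), one per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def align_and_count(segments, tweets):
    """Assign each tweet to its most similar segment (None if no overlap), then count per segment."""
    vecs = tfidf_vectors(segments + tweets)
    seg_vecs, tweet_vecs = vecs[:len(segments)], vecs[len(segments):]
    alignment, counts = [], [0] * len(segments)
    for tv in tweet_vecs:
        sims = [cosine(tv, sv) for sv in seg_vecs]
        best = max(range(len(segments)), key=lambda j: sims[j])
        if sims[best] > 0:
            alignment.append(best)
            counts[best] += 1
        else:
            alignment.append(None)
    return alignment, counts

transcript = "jobs jobs economy taxes reagan government size government".split()
segments = time_window_segments(transcript, window=4)
tweets = [["jobs", "teachers", "jobs"], ["reagan", "government"], ["boring"]]
print(align_and_count(segments, tweets))  # ([0, 1, None], [1, 1])
```

The general tweet ("boring") shares no terms with any window and is left unaligned, which illustrates why a purely lexical baseline cannot separate general from specific responses.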
[Figure: ET-LDA model of the event transcript and its tweets]
Event:
Determine segment topics: θ(s) ~ Dirichlet(α) when a new segment starts, or θ(s) ~ δ(θ(s-1), θ(s)) when the current segment continues
Determine event segmentation: C(s) ~ Bernoulli(δ)
Determine each event word’s topic: Zs ~ Multinomial(θ)
Tweets:
Determine tweet type (general or specific): C(t) ~ Bernoulli(λ)
Determine which segment a tweet (word) refers to: S(t) ~ Categorical(γ)
General topics: ψ(t) ~ Dirichlet(α)
Determine each tweet word’s topic: Zt ~ Multinomial(ψ) for general tweets, or Zt ~ Multinomial(θ) for specific tweets
We need to infer P(Zs, Zt, Cs, Ct, St | Ws, Wt). Gibbs sampling approximates this posterior distribution by iteratively updating each latent variable given the current values of the remaining variables.
Evaluation: 1. Event segmentation 2. Topic extraction 3. Alignment
Event segmentation (Pk, lower is better):
Dataset | ET-LDA | LCSeg
MESpeech | 0.295 | 0.361
ReaganDebate | 0.31 | 0.397
Pk = probability that two words a fixed distance apart in the event are classified inconsistently by the hypothesized segmentation relative to the reference, i.e., incorrectly separated by a hypothesized segment boundary or incorrectly placed in the same segment.
Performance based on Likert scale, ReaganDebate:
Method | S1 | S2 | S3
ET-LDA | 0.51 | 0.61 | 0.69
LDA | 0.48 | 0.51 | 0.52
Performance based on Likert scale:
Dataset | Method | S1 | S2 | S3 | S4 | S5
MESpeech | ET-LDA | 0.49 | 0.51 | 0.56 | 0.58 | 0.63
MESpeech | LDA | 0.48 | 0.49 | 0.54 | 0.51 | 0.57
ReaganDebate | ET-LDA | 0.51 | 0.52 | 0.57 | 0.62 | 0.61
ReaganDebate | LDA | 0.48 | 0.49 | 0.51 | 0.51 | 0.58
[Figure annotations: rapid increase from 33% to 54%; when a controversial topic was mentioned, the responses were pronounced; most responses were either tangential or about the high-level themes]
Observation 1: Before and after the event, the crowd’s responses tended to be general and steady, while during the event they were more specific and episodic.
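The Pk scores reported for ET-LDA and LCSeg can be computed for any pair of segmentations with the standard sliding-window formulation: slide a window of width k over the word sequence and count how often the reference and the hypothesis disagree about whether the two words at the window's ends fall in the same segment. This is a minimal sketch, assuming each segmentation is given as a per-word list of segment ids and that k defaults to half the mean reference segment length (a common convention, not necessarily the one used for the reported numbers); `pk` is a hypothetical helper name.

```python
def pk(reference, hypothesis, k=None):
    """Pk segmentation error: fraction of word pairs (i, i + k) on which the
    reference and hypothesis disagree about being in the same segment.
    Each argument is a per-word list of segment ids, e.g. [0, 0, 1, 1, 1]."""
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same words")
    n = len(reference)
    if k is None:
        # Number of reference segments = number of id changes + 1.
        num_segments = 1 + sum(a != b for a, b in zip(reference, reference[1:]))
        k = max(1, round(n / (2 * num_segments)))
    if k >= n:
        raise ValueError("window k must be smaller than the number of words")
    errors = sum(
        (reference[i] == reference[i + k]) != (hypothesis[i] == hypothesis[i + k])
        for i in range(n - k)
    )
    return errors / (n - k)

ref = [0] * 5 + [1] * 5
print(pk(ref, ref, k=2))       # 0.0: identical segmentation
print(pk(ref, [0] * 10, k=2))  # 0.25: missed boundary, 2 of 8 windows disagree
```

Both error types count: a missed boundary (words wrongly kept together) and a spurious boundary (words wrongly separated) each raise Pk, which is why lower values indicate better segmentation.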
People can talk about things that have already been discussed or are currently being discussed. People can also talk about things that are expected to be discussed later.
[Figure: ET-LDA alignment]
Observation 2: The topical context of the tweets did not always correlate with the timeline of the event. An event segment can be referred to by specific tweets at any time, irrespective of whether it has already occurred, is occurring, or will occur later on.
Specific tweets:
Something the #GOP candidates won't mention about Reagan - Reagan grew the size of the federal government tremendously. #reagandebate
Yes, we need to talk about jobs and teachers needing jobs! #Reagandebate
General tweets:
Boring #GOPDebate #tcot #ReaganDebate
Ron Paul. Gogogog :) . #reagandebate
Proposed Work: robustness of ET-LDA; predictive power of ET-LDA
What other tasks can we do based on this alignment?
[Figure: tweets labeled specific or general, aligned to event segments]
What were the sentiments elicited on Twitter by the segments and topics of the event?
Applications: event analysis, stock market, advertisement
Is this sufficient? Unfortunately, no.
How to overcome these challenges?
Yuheng Hu, Fei Wang, Subbarao Kambhampati. “Listen to the Crowd: Automated Analysis of Events via Aggregated Twitter Sentiment.” In International Joint Conference on Artificial Intelligence (IJCAI), 2013.
[Figure: the factorization is regularized by three priors: tweet-event alignment from ET-LDA, sentiment labels for a small set of tweets, and a term sentiment lexicon]
We require that the factors respect the prior knowledge to the extent possible.
[Figure: SocSent factorization over tweets, segments, terms, and sentiments, with factors G, T, S, F and priors G0, R0, F0]
R0 regulates G, T, and S together: T × S represents the segment-sentiment matrix, and G × T × S represents the tweet-sentiment matrix.
Example G0 (rows: tweets, columns: segments):
0.53 0.2 0.01 …
0.05 0.5 0.3 …
0.4 0.23 0.21 …
0.06 0.2 0.12 …
F0 (term × sentiment) is obtained from the MPQA sentiment lexicon: F0(i, 1) = 1 if word i is positive, and F0(i, 2) = 1 if word i is negative.
G0 is obtained from ET-LDA inference: each of its nt rows represents a tweet and each of its ns columns represents a segment of the event; each entry is the posterior probability that the tweet refers to that segment.
R0 (tweet × sentiment) is obtained by asking people to label the sentiment of a few tweets (e.g., fewer than 1,000), in order to capture domain-specific connotations.
Optimization: multiplicative update rules. The coupling between G, T, S, and F makes it difficult to find optimal solutions for all factors simultaneously, so we adopt an alternating optimization scheme [Ding et al., 2006]. In the Lagrangian for F, Ψ is the matrix of Lagrange multipliers that enforce the nonnegativity constraints on F, and C represents the terms irrelevant to F.
Baselines:
1. LexRatio: counts the ratio of sentiment words from a subjectivity lexicon in a tweet to determine its sentiment orientation [Wilson et al., 2009]
2. MinCuts: uses contextual information via the minimum-cut framework to improve polarity-classification accuracy [Pang and Lee, 2004]
3. MFLK: a supervised matrix factorization method [Li et al., 2009]
[Figure annotation: SocSent improves other approaches by 7.3% to 18.8%]
SocSent uses the partially available knowledge on tweet-event alignment from ET-LDA to improve the quality of sentiment classification for both events. SocSent improves on the three baselines by 6.5% to 17.3% on both datasets.
Priors: F0 for the sentiment lexicon, R0 for tweet labels, G0 for prior tweet-event alignment knowledge from ET-LDA.
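SocSent's full objective couples G, T, S, and F with all three priors, and its exact update rules are not reproduced here. As a simplified illustration of the same idea (factors that respect prior knowledge, fitted with multiplicative updates), the sketch factorizes a nonnegative term × tweet matrix X ≈ F Hᵀ, where F (term × sentiment) is pulled toward a lexicon prior F0 by a penalty α‖F − F0‖², and both factors are updated with standard multiplicative rules that keep them nonnegative. Everything here (the two-factor form, X, H, α, the toy lexicon, and the function name) is a hypothetical assumption for illustration, not SocSent's model; only the role of F0 as a lexicon prior comes from the slides.

```python
import numpy as np

def nmf_with_lexicon_prior(X, F0, alpha=1.0, iters=500, seed=0, eps=1e-9):
    """Minimize ||X - F H^T||_F^2 + alpha * ||F - F0||_F^2 subject to F, H >= 0.

    Hypothetical simplified model (not SocSent's objective):
    X:  nonnegative term x tweet matrix.
    F0: term x sentiment lexicon prior (column 0 = positive, 1 = negative).
    Returns F (term x sentiment) and H (tweet x sentiment).
    """
    rng = np.random.default_rng(seed)
    n_terms, n_tweets = X.shape
    k = F0.shape[1]
    F = rng.random((n_terms, k)) + 0.1
    H = rng.random((n_tweets, k)) + 0.1
    for _ in range(iters):
        # F rule: negative part of the gradient divided by its positive part.
        F *= (X @ H + alpha * F0) / (F @ (H.T @ H) + alpha * F + eps)
        # Standard NMF rule for H (no prior on tweets in this sketch).
        H *= (X.T @ F) / (H @ (F.T @ F) + eps)
    return F, H

# Toy data: rows = terms [good, great, bad, awful]; columns = 4 tweets.
X = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 1, 1]], dtype=float)
F0 = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
F, H = nmf_with_lexicon_prior(X, F0)
print(H.argmax(axis=1))  # predicted sentiment per tweet (0 = positive, 1 = negative)
```

Because each factor is multiplied by a ratio of nonnegative terms, nonnegativity is preserved without explicit projection; this is the role the Lagrange multipliers Ψ play in the derivation. Additional priors such as G0 or R0 could be added as further penalty terms in the same way.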
How to detect events from social media responses?
Given a set of tweets, find an event, where an event consists of a set of topically related trending features extracted from tweets at a given time, and trending means a time interval over which the rate of change of momentum is positive.
Challenges: 1. Be versatile
DeMA is an unsupervised, feature-pivot, online event detector that recognizes trending events and their associated Twitter responses from a stream of noisy Twitter messages in three steps:
1. Trending feature identification
2. Trending feature ranking
3. Trending feature grouping
Yuheng Hu, Shelly Farnham, Andrés Monroy-Hernández. “Whoo.ly: Facilitating Information Seeking For Hyperlocal Communities Using Social Media.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), 2013.
Description of event | Terms | Time of trending
Westminster Dog Show | Westminster, dog, show, club | Oct 2nd 2012, 10:10am
Gas leak | Gas, Leak, Pike, 10th, St, Pine, Blocked, Siren | May 23rd 2012, 9:55am
Opening of gluten-free kitchen | Gluten, Free, Dedicated, Kitchen, Bar, Cap, Hill | June 2nd 2012, 8:08am
Sea Toy Fair | #Toyfairsea, Starwars, Hasbro, Lego | Aug 18th 2012, 11:23am
How to predict the crowd’s engagement in future events?
Engagement: Involvement, Intimacy, Interest, Interaction, Influence
Alice is a statistical framework that can be used to understand people’s engagement with events that are trending in a local community and to predict their engagement with unseen events.
Yuheng Hu, Shelly Farnham. “Understanding and Predicting People’s Community Engagement in Social Media.” Under submission.
Accuracy of results on binary classification:
Method | Precision | Recall | F-1
SVM | 0.88 | 0.75 | 0.81
Accuracy of results on multiclass classification:
Class | #Users | Accuracy
0 | 1611 | 80.41%
1 | 2013 | 84.45%
2 | 1255 | 72.11%
3 | 43 | 74.01%
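The definition of trending used for DeMA's event detection (a time interval over which the rate of change of momentum is positive) can be made concrete for a single feature's per-interval frequency counts. In this minimal sketch, momentum is assumed to be the difference between a short and a long trailing moving average of the counts (a MACD-style choice); the window sizes and all function names are hypothetical, not DeMA's actual parameters. A trending interval is then a maximal run of time steps where momentum increases. DeMA's other two steps, ranking and grouping trending features, are not shown.

```python
def moving_average(xs, window):
    """Trailing simple moving average; the first points average over what is available."""
    out = []
    for i in range(len(xs)):
        lo = max(0, i - window + 1)
        out.append(sum(xs[lo:i + 1]) / (i + 1 - lo))
    return out

def momentum(counts, short=2, long=4):
    """Assumed momentum: short-window average minus long-window average."""
    return [s - l for s, l in zip(moving_average(counts, short), moving_average(counts, long))]

def trending_intervals(counts, short=2, long=4):
    """Maximal (start, end) runs of time steps where momentum is increasing."""
    m = momentum(counts, short, long)
    intervals, start = [], None
    for t in range(1, len(m)):
        if m[t] > m[t - 1]:
            if start is None:
                start = t
        elif start is not None:
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        intervals.append((start, len(m) - 1))
    return intervals

# Hourly mentions of a hypothetical feature, e.g. "siren", with a burst at t = 4.
counts = [1, 1, 1, 1, 8, 9, 9, 2]
print(trending_intervals(counts))  # [(4, 5)]
```

Taken literally, this rule also flags the recovery after a burst has peaked (momentum rising back toward zero while still negative), so a practical detector would likely add further conditions, such as requiring positive momentum.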