ET-LDA: Joint Topic Modeling for Aligning Events and their Twitter Feedback
Yuheng Hu (@hyheng), Arizona State Univ.
Ajita John, Avaya Labs
Fei Wang, IBM T.J. Watson Research
Subbarao Kambhampati, Arizona State Univ.

Motivation
Republican Primary Debate, 09/07/2011
Tweets tagged with #ReaganDebate
Which part of the event did a tweet refer to?
What were the topics of the event and tweets?
Applications: event playback/analysis, sentiment analysis, advertisement, etc.

Event-Tweet Alignment: The Problem
• Given an event’s transcript S and its associated tweets T
  – Find the segment s (s ∈ S) that tweet t (t ∈ T) topically refers to [t could instead be a general tweet]
• Alignment requires:
  1. Extracting topics in the tweets and the event
  2. Segmenting the event into topically coherent chunks
  3. Classifying the tweets: general vs. specific

Event-Tweet Alignment: A Model

Event-Tweet Alignment: Challenges
• Both topics and segments are latent
• Tweets are topically influenced by the content of the event.
  A tweet’s words’ topics can be
  – general (high-level and constant across the entire event), or
  – specific (concrete and related to specific segments of the event)
• General tweet = weakly influenced by the event
• Specific tweet = strongly influenced by the event
• An event is formed by discrete, sequentially ordered segments, each of which discusses a particular set of topics

Event-Tweet Alignment: Approaches
• Prior work
  – Event segmentation
    • HMM-based, etc.
  – Topic modeling
    • LDA, PLSI
• Possible solution
  – Apply LDA to the event and the tweets separately
  – Measure the closeness by the JS-divergence of their topic distributions
  – Problem: the event and its Twitter feeds are modeled largely independently
• Our solution: joint modeling
  – ET-LDA (event-tweets LDA) considers an event and its Twitter feeds jointly and characterizes the topic influences between them in a fully Bayesian model
• Potential advantages
  – Tweets provide a richer context about the topic evolution in the event
  – Can measure the influence of the event on the twitterati

ET-LDA

ET-LDA Model
[Figure: ET-LDA model diagram with an event part and a tweets part. Annotations: determine event segmentation; determine which segment a tweet (word) refers to; determine tweet type; determine a word’s topic in the event; a tweet word’s topic.]

ET-LDA Model
For more details of the inference, please refer to our paper.

Learning ET-LDA: Gibbs sampling
Coupling between a and b makes the posterior computation of the latent variables intractable, so inference is approximated with Gibbs sampling.

Experimental Evaluation
Evaluation Plan for ET-LDA
• Performance of topic extraction
• Performance of topic influence prediction
• Performance of event segmentation
Experimental Setup
• Tweets for President Obama’s speech on the Middle East (#MESpeech) and the Republican Primary debate in the US (#ReaganDebate)
• Event transcripts from the New York Times
• Tweets expanded with search snippets for context

Topic Extraction (#MESpeech)
MESpeech: specific topics are sensitive to the event’s context and keep evolving as the event progresses

Examples of segments of (#MESpeech)
7 segments
• 1st segment
  Thank you. Thank you. (Applause.) Thank you very much. Thank you. Please, have a seat. Thank you very much. I want to begin by thanking Hillary Clinton, who has traveled so much these last six months that she is approaching a new landmark – one million frequent flyer miles. I count on Hillary every single day, and I believe that she will go down as one of the finest Secretaries of State in our nation's history.
• 2nd segment
  The State Department is a fitting venue to mark a new chapter in American diplomacy. For six months, we have witnessed an extraordinary change taking place in the Middle East and North Africa. Square by square, town by town, country by country, the people have risen up to demand their basic human rights. Two leaders have stepped aside. More may follow. And though these countries may be a great distance from our shores, we know that our own future is bound to this region by the forces of economics and security, by history and by faith. Today, I want to talk about this change -- the forces that are driving it and how we can respond in a way that advances our values and strengthens our security. Now, already, we've done much to shift our foreign policy following a decade defined by two costly conflicts. After years of war in Iraq, we've removed 100,000 American troops and ended our combat mission there. In Afghanistan, we've broken the Taliban's momentum, …
Segment labels shown on the slide: Introduction; Overview of US foreign policy

Event Segmentation (#MESpeech)
• Participants were asked to assess the quality of segmentation by ET-LDA and LCSeg (an HMM-based event segmentation tool, trained on a 15-state HMM)
  – Participants: 5 graduate students
  – Method: questionnaire
• ET-LDA performed consistently better than the baselines (lower Pk values)
Pk: probability that a randomly chosen pair of words is incorrectly separated by a segment boundary

Examples of Specific/General tweets
• ReaganDebate
  – Specific
    • Something the #GOP candidates won't mention about Reagan - Reagan grew the size of the federal government tremendously. #reagandebate
    • Yes, we need to talk about jobs and teachers needing jobs! #Reagandebate
  – General
    • Huntsman said Ronnie!! Take a shot! #GOPDebate #tcot #ReaganDebate
    • Wow, Ron Paul. Really, you think airlines would give a rip about security? Free market nonsense. #reagandebate

Topic Influence Prediction (#MESpeech)
• Predict the topical influence of the event (whether tweets are strongly or weakly influenced by it) on the unseen tweets in our test set (20% of total tweets)
• Baseline: run LDA on the event and the tweets, then rank tweets by JS-divergence, deeming the top-ranked ones strongly influenced tweets
• Human study to evaluate the “goodness” of the prediction results
  – (e.g., do you think this tweet is strongly correlated to this segment of the event?)
The improvements are statistically significant

Conclusion
• Motivated joint modeling for event-tweet alignment
• Developed the ET-LDA model
• Provided evaluations on two tweet datasets
  – Demonstrated that ET-LDA significantly outperformed the traditional models
For details: Twitter: @hyheng
Thank you!
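
Backup: sketch of the JS-divergence baseline
The separate-LDA baseline (the "possible solution" under Approaches, reused as the baseline for topic influence prediction) comes down to comparing a tweet's topic distribution with each segment's topic distribution. The Python below is a minimal sketch of that comparison only. It assumes the topic mixtures have already been inferred by some LDA run; the function names and the toy numbers are hypothetical and are not taken from the ET-LDA implementation.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p[i] == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence of two discrete distributions.

    With base-2 logs the value lies in [0, 1]; 0 means identical.
    """
    # Mixture distribution; m[i] > 0 whenever p[i] > 0 or q[i] > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def align_tweet(tweet_topics, segment_topics):
    """Return (segment index, divergence) for the closest segment."""
    scores = [js_divergence(tweet_topics, s) for s in segment_topics]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy topic mixtures over three topics (made-up numbers for illustration).
segments = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
tweet = [0.2, 0.7, 0.1]

best, score = align_tweet(tweet, segments)
print(f"closest segment: {best}, JS-divergence: {score:.3f}")
```

Ranking tweets by this score and keeping the lowest-divergence ones is one way to read "deeming top ones as strongly influenced". Because the event and tweet topics are fit independently here, nothing ties the two topic spaces together, which is the limitation the joint model is meant to address.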