ET-LDA: Joint Topic Modeling for Aligning Events and their Twitter Feedback Yuheng Hu

Yuheng Hu (@hyheng) Arizona State Univ.
Ajita John Avaya Labs
Fei Wang IBM T.J. Watson Research
Subbarao Kambhampati Arizona State Univ.
Motivation
Republican Primary Debate, 09/07/2011
Tweets tagged with #ReaganDebate
Which part of the event did a tweet refer to?
What were the topics of the event and tweets?
Applications: event playback/analysis, sentiment analysis, advertising, etc.
Event-Tweet Alignment: The Problem
• Given an event’s transcript S and its associated tweets T
– Find the segment s (s ∈ S) that tweet t (t ∈ T) topically refers to [t may instead be a general tweet that refers to no particular segment]
• Alignment requires:
1. Extracting topics in the tweets and the event
2. Segmenting the event into topically coherent chunks
3. Classifying the tweets as general vs. specific
Event-Tweet Alignment: A Model
Event-Tweet Alignment: Challenges
• Both topics and segments are latent
• Tweets are topically influenced by the content of the event. The topics of a tweet’s words can be
– general (high-level and constant across the entire event), or
– specific (concrete and related to particular segments of the event)
• General tweet = weakly influenced by the event
• Specific tweet = strongly influenced by the event
• An event is formed by discrete, sequentially ordered segments, each of which discusses a particular set of topics
Event-Tweet Alignment: Approaches
• Prior work
– Event segmentation
• HMM-based, etc.
– Topic modeling
• LDA, PLSI
• Possible solution
– Apply LDA to the event and the tweets separately
– Measure their closeness by the JS-divergence of their topic distributions
– Problem: the event and its Twitter feeds are modeled largely independently
• Our solution: joint modeling
– ET-LDA (event-tweets LDA) considers an event and its Twitter feeds jointly and characterizes the topic influences between them in a fully Bayesian model
• Potential advantages
– Tweets provide a richer context about the topic evolution in the event
– Can measure the influence of the event on the twitterati
ET-LDA
ET-LDA Model
[Figure: ET-LDA graphical model, split into an Event part and a Tweets part. Its components determine the event segmentation, the topic of each word in the event, the tweet type, which segment a tweet (word) refers to, and the topic of each tweet word.]
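To make these components concrete, here is a toy generative sketch in Python of the kind of process the diagram describes. It is a simplification under stated assumptions, not the paper's exact model: paragraphs start a new segment with a fixed probability, each tweet word independently switches between a "general" and a "specific" topic source, and a specific word picks the segment it refers to uniformly at random. All function and parameter names (`generate`, `p_new_segment`, `p_specific`, etc.) are hypothetical.

```python
import random


def dirichlet(alpha, k, rng):
    """Symmetric Dirichlet(alpha) sample of length k via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate(num_paragraphs=6, num_tweets=4, words_per_doc=5, num_topics=3,
             vocab_size=10, p_new_segment=0.4, p_specific=0.6,
             alpha=0.5, beta=0.1, seed=0):
    """Toy event/tweet generator (simplified; NOT the exact ET-LDA process)."""
    rng = random.Random(seed)
    topics, vocab = range(num_topics), range(vocab_size)
    # One word distribution per topic, shared by the event and the tweets.
    phi = [dirichlet(beta, vocab_size, rng) for _ in topics]

    def draw_word(theta):
        z = rng.choices(topics, weights=theta)[0]     # the word's topic
        return rng.choices(vocab, weights=phi[z])[0]  # the word itself

    # Event segmentation: each paragraph after the first starts a new segment
    # with probability p_new_segment; each segment has its own topic mix.
    seg_of_par, seg_thetas = [], []
    for i in range(num_paragraphs):
        if i == 0 or rng.random() < p_new_segment:
            seg_thetas.append(dirichlet(alpha, num_topics, rng))
        seg_of_par.append(len(seg_thetas) - 1)
    event = [[draw_word(seg_thetas[s]) for _ in range(words_per_doc)]
             for s in seg_of_par]

    # Tweets: each word is "specific" (topic drawn from a segment it refers to)
    # or "general" (topic drawn from the tweet's own general topic mix).
    tweets = []
    for _ in range(num_tweets):
        general_theta = dirichlet(alpha, num_topics, rng)
        words = []
        for _ in range(words_per_doc):
            if rng.random() < p_specific:
                s = rng.randrange(len(seg_thetas))  # uniform choice: a simplification
                words.append((draw_word(seg_thetas[s]), "specific", s))
            else:
                words.append((draw_word(general_theta), "general", None))
        tweets.append(words)
    return seg_of_par, event, tweets


seg_of_par, event, tweets = generate(seed=1)
print(seg_of_par)  # segment index of each event paragraph
print(tweets[0])   # (word id, word type, referenced segment) triples
```

Inference runs this story in reverse: given only the words, it recovers the segments, switches, and topics.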
ET-LDA Model
For more details of the inference, please refer to our paper: http://bit.ly/MBHjyZ
Learning ET-LDA: Gibbs sampling
The coupling between a and b makes exact computation of the posterior over the latent variables intractable, so inference uses Gibbs sampling
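ET-LDA's own sampler, with its coupled event and tweet variables, is given in the paper. As a hedged point of reference only, the sketch below shows collapsed Gibbs sampling for plain LDA (a swapped-in, simpler model): each token's topic is resampled in turn from its conditional given all other assignments. The function name and default hyperparameters are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA. docs: lists of integer word ids."""
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # total tokens per topic
    z = []                                                # topic of every token

    def update(d, k, w, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, k, w, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, z[d][i], w, -1)  # take the token out of the counts
                # p(z_i = k | all other z) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ)
                weights = [(n_dk[d][k] + alpha) * (n_kw[k][w] + beta)
                           / (n_k[k] + vocab_size * beta)
                           for k in range(num_topics)]
                z[d][i] = rng.choices(range(num_topics), weights=weights)[0]
                update(d, z[d][i], w, +1)

    # Point estimates of topic-word (phi) and document-topic (theta) distributions.
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in range(num_topics)]
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
             for d, doc in enumerate(docs)]
    return phi, theta


docs = [[0, 0, 1, 1], [2, 2, 3, 3], [0, 1, 0, 1], [2, 3, 3, 2]]
phi, theta = lda_gibbs(docs, num_topics=2, vocab_size=4, iters=50)
```

In a coupled model the same idea applies, but each conditional also depends on the variables that tie tweets to event segments.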
Experimental Evaluation
Evaluation Plan for ET-LDA
• Performance of topic
extraction
• Performance of topic
influence prediction
• Performance of event
segmentation
Experimental Setup
• Tweets for President Obama’s speech on the Middle East (#MESpeech) and the Republican Primary debate in the US (#ReaganDebate), expanded with search snippets for context
• Event transcripts from the New York Times
Topic Extraction (#MESpeech)
MESpeech: specific topics are sensitive to the event’s context and keep evolving as the event progresses
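One simple way to see topics evolving as the event progresses is to track the dominant topic of each segment and note where it changes. This is a hypothetical helper for illustration, not the analysis from the paper; it assumes per-segment topic distributions are already available, and the values in the usage example are made up.

```python
def dominant_topics(segment_topics):
    """Index of the most probable topic in each segment's topic distribution."""
    return [max(range(len(theta)), key=theta.__getitem__) for theta in segment_topics]


def topic_shifts(segment_topics):
    """Segment indices where the dominant topic differs from the previous segment's."""
    dom = dominant_topics(segment_topics)
    return [i for i in range(1, len(dom)) if dom[i] != dom[i - 1]]


# Made-up distributions over 3 topics for 4 consecutive segments.
segments = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]
print(dominant_topics(segments))  # → [0, 0, 1, 2]
print(topic_shifts(segments))     # → [2, 3]
```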
Examples of segments (#MESpeech)
7 segments
• 1st segment (Introduction)
Thank you. Thank you. (Applause.) Thank you very much. Thank you. Please, have a seat.
Thank you very much. I want to begin by thanking Hillary Clinton, who has traveled so much
these last six months that she is approaching a new landmark – one million frequent flyer miles.
I count on Hillary every single day, and I believe that she will go down as one of the finest
Secretaries of State in our nation's history.
• 2nd segment (Overview of US foreign policy)
The State Department is a fitting venue to mark a new chapter in American diplomacy. For
six months, we have witnessed an extraordinary change taking place in the Middle East and
North Africa. Square by square, town by town, country by country, the people have risen up to
demand their basic human rights. Two leaders have stepped aside. More may follow. And
though these countries may be a great distance from our shores, we know that our own future
is bound to this region by the forces of economics and security, by history and by faith.
Today, I want to talk about this change -- the forces that are driving it and how we can
respond in a way that advances our values and strengthens our security.
Now, already, we've done much to shift our foreign policy following a decade defined by
two costly conflicts. After years of war in Iraq, we've removed 100,000 American troops and
ended our combat mission there. In Afghanistan, we've broken the Taliban's momentum, …
Event Segmentation (#MESpeech)
• Participants were asked to assess the quality of the segmentations produced by ET-LDA and the baselines (LCSeg and an HMM-based event segmenter with 15 states)
– Participants: 5 graduate students
– Method: questionnaire
• ET-LDA performed consistently better than the baselines (lower Pk values)
Pk: probability that a random pair of words a fixed distance k apart is incorrectly classified as lying in the same or in different segments (lower is better)
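Pk, introduced by Beeferman, Berger and Lafferty, can be computed by sliding a window of width k over the words and counting disagreements between the reference and hypothesized segmentations. A minimal sketch, assuming each segmentation is given as one segment label per word (labels unique per contiguous segment) and, when k is not supplied, the common convention of half the mean reference segment length; the function name `pk` is hypothetical.

```python
def pk(reference, hypothesis, k=None):
    """Pk segmentation error; each argument gives one segment label per word."""
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same words")
    n = len(reference)
    if k is None:
        # Convention: half the mean segment length in the reference.
        num_segments = 1 + sum(reference[i] != reference[i - 1] for i in range(1, n))
        k = max(1, round(n / num_segments / 2))
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of words")
    # Count window positions where reference and hypothesis disagree on
    # whether the two words at the window's ends share a segment.
    errors = sum(
        (reference[i] == reference[i + k]) != (hypothesis[i] == hypothesis[i + k])
        for i in range(n - k)
    )
    return errors / (n - k)
```

For example, with reference `[0, 0, 0, 1, 1, 1]`, a hypothesis that puts all six words in one segment gives Pk = 0.5 for k = 2, while an exact match gives 0.0.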
Examples of Specific/General tweets
• ReaganDebate
– Specific
Something the #GOP candidates won't mention about Reagan - Reagan
grew the size of the federal government tremendously. #reagandebate
Yes, we need to talk about jobs and teachers needing jobs!
#Reagandebate
– General
Huntsman said Ronnie!! Take a shot! #GOPDebate #tcot #ReaganDebate
Wow, Ron Paul. Really, you think airlines would give a rip about security?
Free market nonsense. #reagandebate
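Once a sampler has assigned each tweet word a general/specific switch (and, for specific words, a segment), a tweet-level label is needed. The rule below is a hypothetical one chosen for illustration, not necessarily the paper's: call a tweet specific when at least a `threshold` fraction of its words are specific, and align it to the segment its specific words refer to most often. The function name and the 0.5 default are assumptions.

```python
from collections import Counter


def classify_tweet(word_assignments, threshold=0.5):
    """word_assignments: (switch, segment) pairs; segment is None for general words.

    Returns ("specific", segment) or ("general", None).
    """
    if not word_assignments:
        return "general", None
    specific = [seg for switch, seg in word_assignments if switch == "specific"]
    if len(specific) / len(word_assignments) < threshold:
        return "general", None
    segment, _ = Counter(specific).most_common(1)[0]  # most-referenced segment
    return "specific", segment


print(classify_tweet([("specific", 2), ("specific", 2), ("general", None), ("specific", 3)]))
# → ('specific', 2)
print(classify_tweet([("general", None), ("general", None), ("specific", 1)]))
# → ('general', None)
```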
Topic Influence Prediction (#MESpeech)
• Predict the topical influence of the event on unseen tweets in our test set (20% of all tweets): is each tweet strongly or weakly influenced by the event?
• Baseline: apply LDA to the event and the tweets, measure their closeness by JS-divergence, and deem the top-ranked (closest) tweets strongly influenced
• Human study to evaluate the “goodness” of the prediction results
– (e.g., do you think this tweet is strongly correlated to this segment of the event?)
ET-LDA’s improvements over the baseline are statistically significant
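The evaluation protocol can be sketched as two steps: hold out 20% of the tweets for testing, then, for the baseline, rank test tweets by their JS-divergence from the event and label the closest ones as strongly influenced. The slide does not say how many top tweets count as "top ones", so `top_fraction` is a hypothetical parameter, as are the function names; the divergence scores are assumed to be precomputed.

```python
import random


def split_tweets(tweets, test_fraction=0.2, seed=0):
    """Shuffle and split tweets into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(tweets)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def label_by_divergence(divergences, top_fraction=0.3):
    """Label the lowest-divergence tweets "strong" and the rest "weak"."""
    n_strong = round(len(divergences) * top_fraction)
    order = sorted(range(len(divergences)), key=divergences.__getitem__)
    strong = set(order[:n_strong])
    return ["strong" if i in strong else "weak" for i in range(len(divergences))]


print(label_by_divergence([0.9, 0.1, 0.5, 0.2, 0.8], top_fraction=0.4))
# → ['weak', 'strong', 'weak', 'strong', 'weak']
```

A fixed cut-off like this is the weak point of the baseline: the labels depend on an arbitrary fraction rather than on a model of how strongly each tweet is tied to the event.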
Conclusion
• Motivated joint modeling
for event-tweet alignment
• Developed ET-LDA model
• Provided evaluations on
two tweet datasets
– Demonstrated that ET-LDA
significantly outperformed
the traditional models
For details: yuheng@asu.edu
Web: http://bit.ly/Mkie7l
Twitter: @hyheng
Thank you!