Committee Members
Dr. Subbarao Kambhampati, Chair
Dr. Eric Horvitz
Dr. John Krumm
Dr. Huan Liu
Dr. Hari Sundaram
Since the dawn of civilization, people have congregated
in town squares to discuss events.
The emergence of social media has now created a sprawling virtual town square,
whose scope is vast and whose chatter can be captured,
opening exciting possibilities for analyzing what people are actually saying.
debate
I-5 bridge collapse
Super Bowl
Obama’s selfie
What’s the relation between the event and tweets?
Which part of the event did a tweet refer to?
What were the topics of the event and tweets?
What sentiments did the event elicit in tweets?
How to characterize the crowds’ tweeting behavior?
How to detect an event from social media responses?
How to predict the crowds' engagement in future events?
How to find social media responses about the events
How to model relations between event and its responses
How to address
these challenges?
How to link social media responses to events
How to infer topics and sentiments of social media responses
How to characterize the crowds’ behavior in response to events
How to distill insights about event based on social media responses
How to predict future development of event
How to predict crowds’ engagement in future event
The event
master
“Fox News Unveils New State-Of-The-Art
Newsroom” – the Verge, Oct 17, 2013
Tweet volume on Egypt & Morsi: ~12k per hour
Most existing event analytics solutions
are primitive. Simply combining other
solutions ignores connections between
events and responses
Given the vast amounts of social
media responses and complex nature
of events, we need automated tools to
conduct in-depth analysis
Task 1: Event sensemaking
Event topics, segments, event-tweet alignment, event sentiments
Task 2: Event recognition
Trending events with associated Twitter responses
Task 3: Event engagement prediction
Predict users’ engagement in future events
ET-LDA
[AAAI’12, ICWSM’12, MMW’12]
[Figure: tweets labeled Specific or General; panels: event-tweet alignment, frequency of specific tweets, evolution of specific tweets]
SocSent
[IJCAI’13]
Fire happened at 5
St and Pike, heard
sirens, lots smoke
DeMA
[CHI’13]
Hey Mike: we found this event may be of
interest to you based on our prediction
on your potential engagement ! Our
predictions were made based on your
Twitter engagement history.
Regards,
Alice
Alice
[under review]
Eventics, automated toolbox to conduct
in-depth analysis of 3 core tasks in
event analytics
ET-LDA & SocSent for Event sensemaking
DeMA for Event recognition
Alice for Event engagement prediction
Our toolbox enables a richer
perspective about
How people respond to events on Twitter
What factors affect crowd’s engagement in events
Motivation
Republican Primary Debate, 09/07/2011
Tweets tagged with #ReaganDebate
What’s the relation between an event and tweets?
Which part of the event did a tweet refer to?
What were the topics of the event and tweets?
How to characterize the crowds’ tweeting behavior?
Given an event’s transcript S, and its
associated tweets T
– Characterize the event in terms of its topics and segments, and its
influences (w.r.t the nature and magnitude) on the crowds’ Twitter
responding behavior
Requirements:
1. Extract topics in the event and tweets
2. Segment the event into topically coherent chunks
3. Establish the alignment between the event and tweets
4. Measure the influence of the event on its associated tweets
Both topics and segments are latent
Tweets are topically influenced by the content of
the event. A tweet’s topics can be general (high-level and constant across the entire event), or
specific (concrete and related to specific segments
of the event)
An event is formed by discrete, sequentially ordered segments, each of which discusses a
particular set of topics
Applying existing event segmentation tools
e.g., time-windows
For each <tweet, segment> pair, measuring
similarities e.g., TF-IDF
Counting related tweets for each segment
Unfortunately, these approaches cannot
discover latent topics or segments, and
they model the event and its Twitter
responses independently
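The similarity baseline above can be sketched as follows. This is a minimal illustration, assuming a smoothed TF-IDF weighting and cosine similarity; the function names (`tfidf_vectors`, `align_tweets`) and the two toy segments are hypothetical and not taken from the slides.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphanumeric tokens only (an assumed preprocessing step).
    return re.findall(r"[a-z0-9']+", text.lower())

def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (dict term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def align_tweets(segments, tweets):
    # For each tweet, return the index of the most similar segment.
    vectors = tfidf_vectors(segments + tweets)
    seg_vecs = vectors[:len(segments)]
    tweet_vecs = vectors[len(segments):]
    return [max(range(len(segments)), key=lambda s: cosine(tv, seg_vecs[s]))
            for tv in tweet_vecs]

# Toy data, invented for illustration.
segments = ["we must create jobs and cut taxes for small business",
            "border security and immigration reform are needed"]
tweets = ["Yes, we need to talk about jobs!", "what about immigration?"]
alignment = align_tweets(segments, tweets)
```

Each tweet is matched only on surface word overlap, which is why such a baseline cannot recover latent topics or segment boundaries.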
ET-LDA (joint Event and
Tweets LDA) is a
hierarchical fully Bayesian
model, which jointly
models an event and its
Twitter responses via
their inter-dependency,
i.e., topical influences
Yuheng Hu, Ajita John, Fei Wang, Subbarao Kambhampati. “ET-LDA: Joint Topic Modeling for Aligning
Events and their Twitter Feedback.” In AAAI Conference on Artificial Intelligence (AAAI) 2012
Yuheng Hu, Ajita John, Doree Duncan Seligmann, Fei Wang. “What were the Tweets about? Topical
Associations between Public Events and Twitter Feeds.” ICWSM’12
Yuheng Hu, Ajita John, Doree Duncan Seligmann. “Event Analytics via Social Media.” In Proc. ACM
Multimedia 2011 Workshop on Social and Behavioral Networked Media Access (SBNMA) , 2011
Event transcript
Event
Determine segment topics: θ(s) ~ Dirichlet(α), or θ(s) ~ δ(θ(s-1), θ(s))
Determine event segmentation: C(s) ~ Bernoulli(δ)
Determine which segment a tweet (word) refers to: S(t) ~ Categorical(γ)
Tweets
Determine tweet type: C(t) ~ Bernoulli(λ)
General topics: ψ(t) ~ Dirichlet(α)
Determine word’s topic in event: Zs ~ Multinomial(θ)
Tweet word’s topic: Zt ~ Multinomial(ψ) or Zt ~ Multinomial(θ)
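The generative story above can be simulated with a short forward sampler. This is a sketch under stated assumptions, not the model's reference implementation: the topic-word distributions `phi` and their prior `beta` are not on the slide and are hypothetical, γ is taken to be uniform, and which Bernoulli outcome means "new segment" or "specific tweet word" is assumed.

```python
import random

def dirichlet(alpha, k, rng):
    # Symmetric Dirichlet draw via normalized Gamma variates.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def categorical(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(n_paragraphs, n_tweets, words_per_doc, n_topics, vocab_size,
             alpha=0.5, beta=0.5, delta=0.3, lam=0.6, seed=0):
    rng = random.Random(seed)
    # Hypothetical topic-word distributions (not shown on the slide).
    phi = [dirichlet(beta, vocab_size, rng) for _ in range(n_topics)]
    thetas, event = [], []
    for s in range(n_paragraphs):
        # C(s) ~ Bernoulli(delta): assumed to mean "start a new segment".
        new_segment = s == 0 or rng.random() < delta
        theta = dirichlet(alpha, n_topics, rng) if new_segment else thetas[-1]
        thetas.append(theta)
        # Zs ~ Multinomial(theta), then a word from the chosen topic.
        event.append([categorical(phi[categorical(theta, rng)], rng)
                      for _ in range(words_per_doc)])
    gamma = [1.0 / n_paragraphs] * n_paragraphs  # assumed uniform
    tweets = []
    for _ in range(n_tweets):
        psi = dirichlet(alpha, n_topics, rng)  # general topics of the tweet
        words = []
        for _ in range(words_per_doc):
            if rng.random() < lam:
                # C(t) = specific (assumed): pick a segment, use its topics.
                s = categorical(gamma, rng)
                z = categorical(thetas[s], rng)
            else:
                # C(t) = general: use the tweet's own topic mixture.
                z = categorical(psi, rng)
            words.append(categorical(phi[z], rng))
        tweets.append(words)
    return event, tweets

event, tweets = generate(n_paragraphs=4, n_tweets=3, words_per_doc=5,
                         n_topics=2, vocab_size=10)
```

Words are represented as integer vocabulary ids; inference runs this story in reverse, recovering the hidden choices from the observed words.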
We need to infer P(Zs, Zt, Cs, Ct, St | Ws, Wt)
What the joint distribution looks like:
Gibbs sampling approximates the posterior distribution by iteratively updating each latent variable given the remaining variables
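To make the update pattern concrete, here is a minimal collapsed Gibbs sampler for plain LDA, a simpler model swapped in for illustration. ET-LDA's sampler also updates the segmentation, tweet-type and segment-choice variables, which this sketch omits. The function name `lda_gibbs` and the hyperparameter defaults are assumptions.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=50, seed=0):
    # docs: list of documents, each a list of integer word ids.
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    z = []
    # Random initialization of every word's topic assignment.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        z.append(zs)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional of this topic given all other assignments.
                weights = [(doc_topic[d][j] + alpha)
                           * (topic_word[j][w] + beta)
                           / (topic_total[j] + vocab_size * beta)
                           for j in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights, k=1)[0]
                # Record the new assignment.
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return z, doc_topic, topic_word

docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
z, doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=4)
```

Each sweep resamples one latent variable at a time while holding the rest fixed, which is the scheme the slide describes.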
1. Event segmentation
2. Topic extraction
3. Alignment
Pk            ET-LDA  LCSeg
MESpeech      0.295   0.361
ReaganDebate  0.31    0.397
Pk = probability that a randomly chosen pair of words from the event will
be incorrectly separated by a hypothesized segment boundary
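A minimal sketch of how Pk is commonly computed, assuming the usual sliding-window form with window size k set to half the average reference segment length. The slides do not state the exact variant used, so the function `pk` and its defaults are assumptions. Lower values are better.

```python
def pk(reference, hypothesis, k=None):
    # reference, hypothesis: one segment label per word, same length.
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same words")
    n = len(reference)
    if k is None:
        # Half the average reference segment length (assumed convention).
        n_segments = 1 + sum(reference[i] != reference[i - 1]
                             for i in range(1, n))
        k = max(1, round(n / n_segments / 2))
    errors = 0
    for i in range(n - k):
        ref_same = reference[i] == reference[i + k]
        hyp_same = hypothesis[i] == hypothesis[i + k]
        # An error is any window where the two segmentations disagree on
        # whether the words k apart belong to the same segment.
        errors += ref_same != hyp_same
    return errors / (n - k)

reference = [0, 0, 0, 0, 1, 1, 1, 1]
perfect = pk(reference, reference)
no_boundary = pk(reference, [0] * 8)
```

In this toy case the hypothesis with no boundary misjudges two of the six windows, so its Pk is 2/6.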
ReaganDebate  S1    S2    S3
ET-LDA        0.51  0.61  0.69
LDA           0.48  0.51  0.52
Performance based on Likert scale
MESpeech      S1    S2    S3    S4    S5
ET-LDA        0.49  0.51  0.56  0.58  0.63
LDA           0.48  0.49  0.54  0.51  0.57
ReaganDebate
ET-LDA        0.51  0.52  0.57  0.62  0.61
LDA           0.48  0.49  0.51  0.51  0.58
Performance based on Likert scale
rapid increase
from 33% to 54%
Controversial
topic mentioned,
the responses
were pronounced
Most responses were either
tangential or about the high-level themes
Observation 1: crowds’ responses tended to be general
and steady before and after the event, while during
the event they were more specific and episodic.
People can talk about
things that have been
discussed before or are being
discussed currently
People can also talk
about things which
are expected to be
discussed later
ET-LDA alignment
Observation 2: topical context of the tweets did not always
correlate with the timeline of the event – an event segment
can be referred to by specific tweets at any time irrespective
of whether it has already occurred or is occurring currently
or will occur later on
Specific
Something the #GOP candidates won't mention about Reagan - Reagan
grew the size of the federal government tremendously. #reagandebate
Yes, we need to talk about jobs and teachers needing jobs! #Reagandebate
General
Boring #GOPDebate #tcot #ReaganDebate
Ron Paul. Gogogog :) . #reagandebate
Proposed Work
Robustness
of ET-LDA
Predictive power
of ET-LDA
Proposed Work
What other
tasks can we do
based on this
alignment?
[Figure: tweets labeled Specific or General]
What were the sentiments
elicited by the segments and
topics of the event on Twitter?
Applications: Event analysis, Stock market, Advertisement
Is this sufficient?
Unfortunately, no.
How to
overcome these
challenges?
Yuheng Hu, Fei Wang, Subbarao Kambhampati. “Listen to the Crowd: Automated Analysis of
Events via Aggregated Twitter Sentiment.” In International Joint Conference on Artificial
Intelligence (IJCAI) 2013
[Figure: factorization over tweets, segments, topics, sentiments and terms, regularized by three priors: the tweet-event alignment from ET-LDA, sentiment labels for a small set of tweets, and a prior over terms]
We require that the factors respect the prior knowledge to
the extent possible.
[Figure: factors G, T, S, F with priors G0, R0 and F0 (sentiment lexicon)]
R0 regularizes G, T and S together
T × S represents the segment-sentiment matrix
G × T × S represents the tweet-sentiment matrix
G0 (tweet × segment):
0.53  0.2   0.01  …
0.05  0.5   0.3   …
0.4   0.23  0.21  …
0.06  0.2   0.12  …
F0 (term × sentiment): binary indicator matrix
Obtain the F0 sentiment lexicon from the MPQA
corpus. F0(i, 1) = 1 if word i is positive, and
F0(i, 2) = 1 if word i is negative
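A minimal sketch of building such a term-by-sentiment prior matrix. The word sets below are a tiny hypothetical lexicon invented for illustration, not the MPQA lexicon, and `build_f0` is an assumed helper name. Columns are 0-indexed in the code, so column 0 corresponds to F0(i, 1) and column 1 to F0(i, 2).

```python
def build_f0(vocabulary, positive_words, negative_words):
    # One row per vocabulary term: [positive indicator, negative indicator].
    # Terms absent from the lexicon get an all-zero row (no prior knowledge).
    f0 = []
    for term in vocabulary:
        f0.append([1 if term in positive_words else 0,
                   1 if term in negative_words else 0])
    return f0

# Hypothetical toy vocabulary and lexicon.
vocabulary = ["great", "boring", "debate", "jobs"]
positive_words = {"great"}
negative_words = {"boring"}
f0 = build_f0(vocabulary, positive_words, negative_words)
```

Rows for terms such as "debate" stay at zero, leaving their sentiment to be learned from the data.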
R0 (tweet × sentiment): binary label matrix
Obtain G0, the prior tweet-segment alignment, from ET-LDA
inference. Its rows represent the nt tweets and
its columns represent the ns segments of the
event. Each entry is the posterior
probability of a tweet referring to a
segment.
Obtain R0 by asking people to label the sentiment of a few
tweets (e.g., fewer than 1000) in order to
capture some domain-specific
connotations
Multiplicative update rules
The coupling between G,T, S, F makes
it difficult to find optimal solutions for
all factors simultaneously.
We adopt an alternating optimization
scheme [Ding et al., 2006]
Ψ is the Lagrangian multiplier that enforces the non-negativity constraint on F; C represents terms
irrelevant to F
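The alternating scheme can be illustrated on a simpler problem. The sketch below swaps in plain two-factor non-negative matrix factorization (X ≈ W H) with the classic Lee and Seung multiplicative updates. It is not the SocSent objective, which has four factors and prior-knowledge terms. The names `nmf`, `w` and `h` are assumptions.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(x, w, h):
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))

def nmf(x, rank, n_iters=200, eps=1e-9, seed=0):
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive initialization keeps every update well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(x, w, h)]
    for _ in range(n_iters):
        # Update H with W fixed: H <- H * (W^T X) / (W^T W H).
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # Update W with H fixed: W <- W * (X H^T) / (W H H^T).
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(x, w, h))
    return w, h, errors

x = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
w, h, errors = nmf(x, rank=2)
```

Because each update multiplies a non-negative factor by a non-negative ratio, non-negativity is preserved without an explicit projection step, and only one factor changes at a time.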
Baselines:
1. LexRatio: Counts the ratio of sentiment words from a subjectivity lexicon in a tweet to determine its sentiment orientation [Wilson et al., 2009]
2. MinCuts: Utilizes contextual information via the minimum-cut framework to improve polarity-classification accuracy [Pang and Lee, 2004]
3. MFLK: Supervised matrix factorization method [Li et al. 2009]
SocSent improves other approaches by 7.3% to 18.8%
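As an illustration of the simplest baseline, a lexicon-ratio classifier might look like the sketch below. The exact ratio and decision rule used by LexRatio are not given on the slide, so the positive-share threshold of 0.5, the function name `lex_ratio` and the toy lexicon are all assumptions.

```python
import re

def lex_ratio(tweet, positive_words, negative_words):
    # Count lexicon hits and classify by the share of positive hits.
    tokens = re.findall(r"[a-z']+", tweet.lower())
    pos = sum(t in positive_words for t in tokens)
    neg = sum(t in negative_words for t in tokens)
    if pos + neg == 0:
        return "neutral"  # no sentiment words found
    share = pos / (pos + neg)
    if share > 0.5:
        return "positive"
    if share < 0.5:
        return "negative"
    return "neutral"

# Hypothetical toy lexicon.
positive_words = {"great", "love"}
negative_words = {"boring", "awful"}
label = lex_ratio("Boring #GOPDebate #tcot #ReaganDebate",
                  positive_words, negative_words)
```

Such a classifier looks at each tweet in isolation and ignores which part of the event the tweet refers to.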
SocSent utilizes the partially available knowledge on tweet-event alignment
from ET-LDA to improve the quality of sentiment classification in both events.
SocSent improves on the three baselines by
6.5% to 17.3% on both datasets
F0 for sentiment Lexicon, R0 for tweets labels, G0 for prior tweet/event
alignment knowledge from ET-LDA.
Proposed Work
Fire happened at 5
St and Pike, heard
sirens, lots smoke
How to detect events from
social media responses
Given a set of tweets
Find events, where an event consists of a set of topically related trending
features extracted from tweets at a given time, and trending refers to a time interval
over which the rate of change of momentum is positive
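A sketch of how a trending interval could be found for one feature. The slides do not define momentum, so this example assumes momentum is the change in a feature's count between consecutive time bins; the function name `trending_intervals` and the toy counts are hypothetical.

```python
def trending_intervals(counts):
    # counts: number of tweets mentioning the feature in each time bin.
    # Assumption: momentum at bin t is counts[t] - counts[t - 1].
    momentum = [counts[i] - counts[i - 1] for i in range(1, len(counts))]
    intervals = []
    start = None
    for i in range(1, len(momentum)):
        rising = momentum[i] - momentum[i - 1] > 0
        t = i + 1  # momentum[i] belongs to time bin i + 1
        if rising and start is None:
            start = t
        elif not rising and start is not None:
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        intervals.append((start, len(counts) - 1))
    return intervals

# Toy hourly counts for one feature, e.g. the term "gas".
counts = [1, 1, 2, 4, 8, 9, 9, 8]
intervals = trending_intervals(counts)
```

For these counts the feature is trending over bins 2 to 4, where its growth is accelerating, and stops trending once growth slows.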
Challenges
1. Be versatile
DeMA is an unsupervised feature-pivot online event detector,
which recognizes trending events and their associated Twitter
responses from a stream of noisy Twitter messages, in 3 steps:
1. Trending feature identification
2. Trending feature ranking
3. Trending feature grouping
Yuheng Hu, Shelly Farnham, Andrés Monroy-Hernández. “Whoo.ly: Facilitating
Information Seeking For Hyperlocal Communities Using Social Media.” In Proceedings of
the SIGCHI Conference on Human Factors in Computing Systems (CHI) 2013
Description of Events | Terms | Time of trending
Westminster Dog Show | Westminster, dog, show, club | Oct 2nd 2012 10:10am
Gas leaked | Gas, Leak, Pike, 10th, St, Pine, Blocked, Siren | May 23rd 2012 9:55am
Opening of Gluten-free kitchen | Gluten, Free, Dedicated, Kitchen, Bar, Cap, Hill | June 2nd 2012 8:08am
Sea Toy Fair | #Toyfairsea, Starwars, Hasbro, Lego | Aug 18th 2012 11:23am
Hey Mike: we found this event may be of
interest to you based on our prediction
on your potential engagement! Our
predictions were made based on your
Twitter engagement history.
Regards,
Alice
How to predict crowds'
engagement in future events
Involvement
Intimacy
Interest
Interaction
Influence
Alice is a statistical framework
which can be used to understand
people’s engagement with
events that are trending in
a local community, and to
predict the engagement on
unseen events
Yuheng Hu, Shelly Farnham. “Understanding and Predicting People’s Community Engagement in Social
Media” Under submission
Methods  Precision  Recall  F-1
SVM      0.88       0.75    0.81
Accuracy of results on binary classification
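As a quick consistency check, the F-1 value follows from the reported precision and recall, since F-1 is their harmonic mean. The helper name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values reported for the SVM classifier above.
f1 = f1_score(0.88, 0.75)
```

Rounded to two decimals this gives 0.81, matching the table.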
Class  #Users  Accuracy
0      1611    80.41%
1      2013    84.45%
2      1255    72.11%
3      43      74.01%
Accuracy of results on multiclass classification
Proposed Work