NAACL Tutorial
Social Media Predictive Analytics
Svitlana Volkova1, Benjamin Van Durme1,2, David Yarowsky1 and Yoram Bachrach3
1Center for Language and Speech Processing, Johns Hopkins University
2Human Language Technology Center of Excellence
3Microsoft Research Cambridge
Tutorial Schedule
Part I: Theoretical Session (2:00 – 4:30pm)
Batch Prediction
Online Inference
Coffee Break (3:30 – 4:00pm)
Dynamic Learning and Prediction
Part II: Practice Session (4:30 – 5:30pm)
Code and Data
Tutorial Materials
• Slides:
– http://www.cs.jhu.edu/~svitlana/slides.pptx
• Code and Data:
– https://bitbucket.org/svolkova/queryingtwitter
– https://bitbucket.org/svolkova/attribute
– https://bitbucket.org/svolkova/psycho-demographics
• References:
– http://www.cs.jhu.edu/~svitlana/references.pdf
Social Media Obsession
• Diverse: billions of messages from millions of users
• What do they think and feel?
• What are their demographics and personality?
• What do they like? What do they buy? Where do they go?
First: a comment on privacy and ethics…
Why is language in social media so
interesting?
• Very short – 140 characters
• Lexically divergent
• Abbreviated
• Multilingual
Why is language in social media so
challenging?
• Data drift
• User activeness => generalization
• Topical sparsity => relationship, politics
• Dynamic streaming nature
DEMO
Predictive Analytics Services
• Social Network Prediction –
https://apps.facebook.com/snpredictionapp/
• Twitter Psycho-Demographic Profile and Affect Inference –
http://twitterpredictor.cloudapp.net (pswd: twitpredMSR2014)
• My personality Project – http://mypersonality.org/wiki/doku.php
• You Are What You Like – http://youarewhatyoulike.com/
• Psycho-demographic trait predictions –
http://applymagicsauce.com/
• IBM Personality – https://watson-pi-demo.mybluemix.net
• World Well Being Project – http://wwbp.org
Applications: Retail
Personalized marketing
• Detecting opinions and emotions
users express about products or
services within targeted
populations
Personalized
recommendations and search
• Making recommendations based on
user emotions, demographics and
personality
Applications: Advertising
Online targeted advertising
• Targeting ads based on predicted user demographics
• Matching the emotional tone the user expects
• Delivering ads fast, and to a true crowd
Applications: Polling
Real-time live polling
• Mining political opinions
• Voting predictions within certain demographics
Large-scale passive polling
• Passive polling regarding products and services
Applications: Health
Large-scale real-time healthcare analytics
• Identifying smokers, drug addicts, healthy eaters,
people into sports (Paul and Dredze 2011)
• Monitoring flu trends, food poisoning, chronic
illnesses (Culotta et al. 2015)
Applications: HR
Recruitment and human resource
management
• Estimating emotional stability and personality of
potential and current employees
• Measuring the overall well-being of employees,
e.g., life satisfaction, happiness (Schwartz et al.
2013; Volkova et al. 2015)
• Monitoring depression and stress levels
(Coppersmith et al. 2014)
User Attribute Prediction Task
Predicting attributes from user communications:
• Political Preference – Rao et al., 2010; Conover et al., 2011;
Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths,
2013; Volkova et al., 2014
• Age – Rao et al., 2010; Zamal et al., 2012; Cohen and Ruths, 2013;
Nguyen et al., 2011, 2013; Sap et al., 2014
• Gender – Garera and Yarowsky, 2009; Rao et al., 2010; Burger et al.,
2011; Van Durme, 2012; Zamal et al., 2012; Bergsma and Van Durme, 2013
AAAI 2015 Demo (joint work with Microsoft Research)
Income, Education Level, Ethnicity, Life Satisfaction, Optimism,
Personality, Showing Off, Self-Promoting
Tweets Revealing User Attributes
Supervised Models
Classification: binary (SVM) – gender, age, political, ethnicity
• Goswami et al. 2009; Rao et al. 2010; Burger et al. 2011; Mislove et al.
2012; Nguyen et al. 2011, 2013;
• Pennacchiotti and Popescu 2011; Conover et al. 2011; Filippova et al.
2012; Van Durme 2012; Bergsma et al. 2012, 2013; Bergsma and Van
Durme 2013;
• Zamal et al. 2012; Ciot et al. 2013; Cohen and Ruths 2013;
• Schwartz et al. 2013; Sap et al. 2014; Kern et al. 2014;
Golbeck et al. 2011; Kosinski et al. 2013;
• Volkova et al. 2014; Volkova et al. 2015.
Unsupervised and Generative Models
• Name morphology for gender & ethnicity prediction – Rao et al. 2011
• Large-scale clustering – Bergsma et al. 2013; Culotta et al. 2015
• Demographic language variations – Eisenstein et al. 2010; O'Connor et
al. 2010; Eisenstein et al. 2014
*Rely on more than lexical features, e.g., network and streaming features
Existing Approaches ~1K Tweets*
• Treat all of a user's tweets as a single document
• But does an average Twitter user produce thousands of tweets?
*Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu,
2011a; Burger et al., 2011; Zamal et al., 2012; Nguyen et al., 2013
How Active are Twitter Users?
Attributed Social Network
User Local Neighborhoods a.k.a. Social Circles
Approaches
Static (Batch) Prediction
• Offline training
• Offline predictions + neighbor content
Streaming (Online) Inference
• Offline training + online predictions over time
• Exploring 6 types of neighborhoods
Dynamic (Iterative) Learning and Prediction
• Online predictions relying on neighbors
• Iterative re-training, active learning, rationale annotation
Challenges addressed: data drift, streaming nature,
model generalization, topical sparsity
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
How to get data? Twitter API
• Twitter API: https://dev.twitter.com/overview/api
• Twitter API Status: https://dev.twitter.com/overview/status
• Twitter API Rate Limits:
https://dev.twitter.com/rest/public/rate-limits
Querying Twitter API
• Twitter Developer Account => access key and token
https://dev.twitter.com/oauth/overview/applicationowner-access-tokens
twitter = Twython(APP_KEY, APP_SECRET,
                  OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
I. Access the 1% Twitter Firehose sample and sample from it
II. Query the Twitter API to get:
• user timelines (up to 3,200 tweets) from userIDs
• tweet JSON objects from tweetIDs
• lists of friendIDs (5,000 per query) from userIDs
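Because a single timeline call returns at most 200 tweets, collecting the full ~3,200 requires paginating with `max_id`. A minimal sketch of that loop, with `fetch` standing in for a Twython call such as `lambda **kw: twitter.get_user_timeline(screen_name=name, count=200, **kw)` (the function and limit names here are illustrative):

```python
def collect_timeline(fetch, limit=3200):
    """Paginate backwards through a user timeline using max_id.
    `fetch(**kwargs)` returns one page of tweet dicts, newest first."""
    tweets, max_id = [], None
    while len(tweets) < limit:
        kwargs = {} if max_id is None else {"max_id": max_id}
        page = fetch(**kwargs)
        if not page:                      # no older tweets left
            break
        tweets.extend(page)
        max_id = page[-1]["id"] - 1       # avoid re-fetching the oldest tweet
    return tweets[:limit]
```

The same pattern applies to friend lists, except those are paginated with a `cursor` instead of `max_id`.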
JSON Objects
Add predictions: sentiment,
attributes, emotions
MongoDB: http://docs.mongodb.org/manual/tutorial/getting-started/
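One way to attach predictions (sentiment, attributes, emotions) to the stored JSON objects is to enrich each tweet dict before inserting it, e.g., with pymongo's `db.tweets.insert_one(enriched)`. A small sketch (the field name `predictions` and the `models` mapping are illustrative, not the tutorial's exact schema):

```python
def add_predictions(tweet_json, models):
    """Attach model outputs to a tweet's JSON object before storing it.
    `models` maps an output field name to a function of the tweet text."""
    enriched = dict(tweet_json)           # keep the raw object intact
    enriched["predictions"] = {name: fn(tweet_json["text"])
                               for name, fn in models.items()}
    return enriched
```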
How to get labeled data?
• Supervised classification in a new domain:
– Labeled data ≈ ground truth
– Costly and time consuming to get!
[Diagram: labeled users U_L train an attribute model Φ_A(u),
which is then applied to predict unlabeled users U_P]
• Ways to get ≈"ground truth" annotations:
• Fun psychological tests (voluntary): myPersonality project
• Profile info: Facebook, e.g., relationship, gender, age – but sparse for Twitter
• Self-reports: "I am a republican…" (Volkova et al. 2013), "Happy
##th/st/nd/rd birthday to me" (Zamal et al. 2012), "I have been diagnosed
with …" (Coppersmith et al. 2014), "I am a writer …" (Beller et al. 2014)
• Distant supervision: following Obama vs. Romney (Zamal et al. 2012),
emotion hashtags (Mohammad et al. 2014), user name (Burger et al. 2011)
• Crowdsourcing: subjective perceived annotations (Volkova et al. 2015),
rationales (Bergsma et al. 2013; Volkova et al. 2014, 2015)
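Self-report labeling of the kind listed above can be approximated with a few regular expressions. A minimal sketch (these two patterns are illustrative simplifications, not the cited authors' exact rules):

```python
import re

# Heuristic self-report patterns in the spirit of the examples above.
SELF_REPORTS = {
    "political": re.compile(r"\bi(?:'m| am) a (democrat|republican)\b", re.I),
    "age":       re.compile(r"\bhappy (\d{1,2})(?:th|st|nd|rd) birthday to me\b", re.I),
}

def label_from_tweet(text):
    """Return {attribute: value} for any self-report matched in the tweet."""
    labels = {}
    for attr, pat in SELF_REPORTS.items():
        m = pat.search(text)
        if m:
            labels[attr] = m.group(1).lower()
    return labels
```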
Twitter Social Graph
Neighbor types: friend, follower, @mention, reply, retweet, hashtag
10–20 neighbors of 6 types per user
Balanced datasets:
I. Candidate-Centric (distant supervision): 1,031 users
II. Geo-Centric (self-reports): 270 users
III. Politically Active (distant supervision)*: 371 users (Dem; Rep)
IV. Age (self-reports)*: 387 users (18–23; 23–25)
V. Gender (name)*: 384 users (Male; Female)
What types of neighbors lead to the best attribute predictions?
*Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013
Code, data and trained models for gender, age, political preference prediction:
http://www.cs.jhu.edu/~svitlana/
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
Classification Model
• Logistic regression = max entropy = log-linear models
– Maps discrete inputs w to a binary output y, e.g., y = {M, F}
– Feature vector: binary word indicators w_i = {0, 1} over the vocabulary
Labeled users (training), vocabulary = (hair, eat, cool, work, …, xbox):
Female: 1 1 0 0 … 0
Male:   0 0 1 1 … 1
Test user: 1 0 1 0 … 1 => ? (predicted Male)
• Other options: SVM, NB
http://scikit-learn.org/stable/modules/generated/
sklearn.linear_model.LogisticRegression.html
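The setup above maps directly onto the scikit-learn class the slide links. A toy sketch with made-up training texts (the words mirror the vocabulary in the example table; real models would train on full timelines):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy binary-unigram gender classifier; texts are invented for illustration.
train_texts = ["hair eat", "hair eat cool", "eat hair"] * 5 + \
              ["xbox work", "cool work xbox", "work xbox"] * 5
train_labels = ["F"] * 15 + ["M"] * 15

vec = CountVectorizer(binary=True)      # w_i in {0, 1}, as on the slide
X = vec.fit_transform(train_texts)
clf = LogisticRegression().fit(X, train_labels)

def predict_gender(text):
    """Classify one unseen user document with the trained model."""
    return clf.predict(vec.transform([text]))[0]
```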
Features (I)
• Lexical:
– normalized count/binary ngrams (Goswami et al. 2010; Rao et al.
2010; Pennacchiotti and Popescu 2011; Nguyen et al. 2013; Ciot et al.
2013; Van Durme 2012; Kern et al. 2014; Volkova et al. 2014;
Volkova and Van Durme 2015)
– class-based highly predictive ngrams (Bergsma and Van Durme 2013),
rationales (Volkova and Yarowsky 2014); character-based
(Peersman et al. 2011), stems, co-stems, lemmas (Zamal et al.
2012; Cohen et al. 2014)
• Socio-linguistic, syntactic and stylistic:
– syntax and style (Schler et al. 2006; Cheng et al. 2011), smiles,
excitement, emoticons and psycho-linguistic features (Rao et al. 2010;
Marquardt et al. 2014; Kokkos et al. 2014; Hovy 2015)
– lexicon features (Sap et al. 2014); Linguistic Inquiry and Word
Count (LIWC) (Mukherjee et al. 2010; Fink et al. 2012)
Features (II)
• Communication behavior: response/retweet/tweet frequency,
retweeting tendency (Conover et al. 2011; Golbeck et al. 2011;
Pennacchiotti and Popescu 2011; Preoţiuc-Pietro et al. 2015)
• Network structure: follower-following ratio, neighborhood size,
in/out degree, degree of connectivity (Bamman et al. 2012;
Filippova 2012; Zamal et al. 2012; Culotta et al. 2015)
• Other: likes (Bachrach et al. 2012; Kosinski et al. 2014), name or
census data (Burger et al. 2011; Liu and Ruths 2013), links/images
(Rosenthal and McKeown 2011)
• Topics: word embeddings, LDA topics, word clusters
(Preoţiuc-Pietro et al. 2015)
[Example feature vector for a Female user combining binary unigrams
(hair, eat, cool, work, …, xbox) with behavior and network features
(retweet rate, neighborhood size, images)]
Batch Experiments
• Log-linear word unigram models:
(I) User vs. (II) Neighbor and (III) User-Neighbor
F = argmax_a P(A = a | T)
• Evaluate different neighborhood types:
– varying neighborhood size n = [1, 2, 5, 10] and
content amount t = [5, 10, 15, 25, 50, 100, 200]
– 10-fold cross-validation with 100 random
restarts for every (n, t) parameter combination
User Model
F = { D if 1 / (1 + e^(−θ · f^u)) ≥ 0.5; R otherwise }
Each user is represented by a binary word vector over his own tweets, e.g.:
t: Washington Post Columnist… => f^{v_i} = [w1 = 1, w2 = 1, …, wn = 0]
t: We're watching you House @GOP => f^{v_k} = [w1 = 1, w2 = 1, …, wn = 0]
t: … Ron Paul not a fan of Chris Christie => f^{v_j} = [w1 = 0, w2 = 0, …, wn = 0]
Train graph => test graph: predict v_k – ?
Neighbor Model
F = { D if 1 / (1 + e^(−θ · f^{N(u)})) ≥ 0.5; R otherwise }
Each user is represented by the aggregated tweets of his neighborhood N(u), e.g.:
t: Obama: I'd defend @MajorCBS => f^{N(v_i)} = [w1, w2, …, wn]
t: @FoxNews: WATCH LIVE => f^{N(v_k)} = [w1, w2, …, wn]
t: The Lyin King #RepMovies => f^{N(v_j)} = [w1, w2, …, wn]
Train graph => test graph: predict v_k – ?
Joint User-Neighbor Model
F = { D if 1 / (1 + e^(−θ · f^{u+N(u)})) ≥ 0.5; R otherwise }
Each user is represented by his own tweets plus his neighbors' tweets, e.g.:
t: Washington Post Columnist… / Obama: I'd defend @MajorCBS
=> f^{v_i+N(v_i)} = [w1, w2, …, wn]
t: @FoxNews: WATCH LIVE / We're watching you House @GOP
=> f^{v_k+N(v_k)} = [w1, w2, …, wn]
t: … Ron Paul not a fan of Christie / The Lyin King #RepublicanMovies
=> f^{v_j+N(v_j)} = [w1, w2, …, wn]
Train graph => test graph: predict v_k – ?
Learning on user and neighbor features jointly (not prefixing features)
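Pooling user and neighbor text without prefixing features can be sketched as a single binary bag of words over both sources (the function and its tokenization are an illustrative simplification):

```python
from collections import Counter

def joint_features(user_tweets, neighbor_tweets):
    """Binary unigram features f^{u+N(u)} over the union of user and
    neighbor text; tokens are pooled without a user/neighbor prefix,
    matching the joint model above."""
    tokens = Counter()
    for tweet in list(user_tweets) + list(neighbor_tweets):
        tokens.update(tweet.lower().split())
    return {w: 1 for w in tokens}
```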
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
Gender Prediction
[Plots: accuracy vs. tweets per user and vs. neighborhood size.
User model: 0.82; Neighbor model: 0.63; User-Neighbor model: 0.73.
Most predictive neighbor features: friend.counts, usermention.binary,
retweet.counts, usermention.counts. User model variants compared:
unigram, binary, bigram, trigram, UserOnlyZLR.]
Lexical Markers for Gender
Gender Prediction Quality
Approach             | Users | Tweets | Features       | Accuracy
Rao et al., 2010     | 1K    | 405    | BOW+socioling  | 0.72
Burger et al., 2011  | 184K  | 22     | username, BOW  | 0.92
Zamal et al., 2012   | 384   | 10K    | neighbor BOW   | 0.80
Bergsma et al., 2013 | 33.8K | −      | BOW, clusters  | 0.90
JHU models           | 383   | 200/2K | BOW user/neigh | 0.82/0.73
• This is not a direct comparison, due to Twitter data sharing restrictions
• Poor generalization: different datasets = different sampling and annotation biases
Age Prediction
[Plots: accuracy vs. tweets per user and vs. neighborhood size
for the 18–23 vs. 23–25 age groups. User model: 0.77; Neighbor
model: 0.72; User-Neighbor model: 0.77. Most predictive neighbor
features: follower.counts, friend.counts, retweet.counts.]
Lexical Markers for Age
Age Prediction Quality
Approach           | Users | Tweets | Groups       | Features      | Accuracy
Rao et al., 2010   | 2K    | 1183   | ≤30; >30     | BOW+socioling | 0.74
Zamal et al., 2012 | 386   | 10K    | 18–23; 23–25 | neighbor BOW  | 0.80
JHU models         | 381   | 200/2K | 18–23; 23–25 | BOW/neighbors | 0.77/0.74
• This is not a direct comparison!
• Performance differs across age groups
• Sampling and annotation biases
Political Preference
[Plots: accuracy vs. tweets per user and vs. neighborhood size.
User model: 0.89; Neighbor model: 0.91; User-Neighbor model: 0.92.
Most predictive neighbor features: friend.counts, retweet.binary,
usermention.binary.]
Lexical Markers for Political
Preferences
Model Generalization
[Bar chart: accuracy of the User, Neighbor and User-Neighbor models
across graphs. Politically active: 0.87–0.92; candidate-centric:
0.72–0.80; geo-centric: 0.57–0.69.]
• Political preference classification is not easy!
• Topical sparsity: average users rarely tweet about politics
Political Preference Prediction Quality
Politically Active Users (sampling/annotation bias):
Approach             | Users | Tweets | Features       | Accuracy
Bergsma et al., 2013 | 400   | 5K     | BOW, clusters  | 0.82
Pennacchiotti 2011   | 10.3K | −      | BOW, network   | 0.89
Conover et al., 2011 | 1K    | 1K     | BOW, network   | 0.95
Zamal et al., 2012   | 400   | 1K     | neighbor BOW   | 0.91
JHU active           | 371   | 200    | BOW user/neigh | 0.89/0.92
Random/Average Users:
JHU cand-centric     | 1,051 | 200    | BOW user/neigh | 0.72/0.75
JHU geo-centric      | 270   | 200    | BOW user/neigh | 0.57/0.67
Cohen et al., 2013   | 262   | 1K     | BOW, network   | 0.68
Optimizing Twitter API Calls
Cand-Centric Graph: Friend Circle
Given limited Twitter API calls, querying more neighbors with fewer
tweets each is better than querying more tweets from the existing
neighbors.
Summary: Static Prediction
• Features: binary (political) vs. count-based features (age, gender)
• Homophily: "neighbors give you away" => can classify users with no content
• Attribute assortativity: similarity with neighbors depends on attribute type
• Content from more neighbors per user >> additional content from
the existing neighbors
• Generalization of the classifiers
[Most predictive neighbor types for the neighbor (N) and
user-neighbor (UN) models: friend, follower, mention, retweet]
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
Iterative Bayesian Predictions
Posterior updates over a tweet stream t1, t2, …, tk:
P_t1(R | T_t1) = 0.52, P_t2(R | T_t2) = 0.62, …,
P_tk−1(R | T_tk−1) = 0.65, P_tk(R | T_tk) = 0.77

P(a = R | T) = [∏_k P(t_k | a = R) · P(a = R)] /
[∏_k P(t_k | a = R) · P(a = R) + ∏_k P(t_k | a = D) · P(a = D)]
(posterior = per-tweet likelihoods × class prior, normalized over classes)
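The product form above is equivalent to summing per-tweet log-likelihood ratios, which is how it is conveniently computed over a stream. A minimal sketch (the likelihood pairs are assumed to come from per-class language models; the function name is illustrative):

```python
import math

def update_posterior(prior_R, tweet_likelihoods):
    """Iteratively apply the Bayesian update for the two-class (R vs. D)
    case. tweet_likelihoods: (P(t|a=R), P(t|a=D)) pairs, one per tweet.
    Returns the posterior P(a=R | T) after each tweet."""
    log_odds = math.log(prior_R / (1 - prior_R))
    posteriors = []
    for p_R, p_D in tweet_likelihoods:
        log_odds += math.log(p_R) - math.log(p_D)   # add per-tweet evidence
        posteriors.append(1 / (1 + math.exp(-log_odds)))
    return posteriors
```

With a uniform prior of 0.5 and one tweet four times likelier under R than D, the posterior moves to 0.8, matching the product formula directly.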
Cand-Centric Graph: Posterior
[Plots: posterior p(Republican | T) trajectories over the tweet stream
(0–60 updates) for individual users, converging toward 1.0 or 0.0 over time]
Cand-Centric: Prediction Time (1)
[Plots: number of users classified correctly over time (in weeks) for
the user stream vs. the joint user-neighbor stream, at prediction
confidence 0.95 vs. 0.75; Dem and Rep curves shown separately]
• Democrats are easier to predict than Republicans
Cand-Centric Graph: Prediction Time (2)
How much time does it take to classify 100 users with 75% confidence?
Compare: user stream vs. joint user-neighbor stream
[Bar chart, weeks on a log scale: the joint user-neighbor stream is
orders of magnitude faster than the user-only stream on the
cand-centric, geo-centric and politically active graphs]
Batch vs. Online Performance
[Bar charts: accuracy of batch models (user: 0.57–0.86; neighbor:
0.67–0.75) vs. streaming models (user stream: 0.84–0.99;
user-neighbor stream: 0.88–0.99) on the geo, cand and active graphs]
Summary: Online Inference
• Homophily: Neighborhood content is useful*
• Lessons learned from batch predictions:
– Age: user-follower or user-mention joint stream
– Gender: user-friend joint stream
– Political: user-mention and user-retweet joint stream
• Streaming models >> batch models
• Activeness: tweeting frequency matters a lot!
• Generalization of the classifiers: data sampling and
annotation biases
*Pennacchiotti and Popescu, 2011a, 2011b; Conover et al., 2011a, 2011b;
Golbeck et al., 2011; Zamal et al., 2012; Volkova et al., 2014
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
Iterative Batch Learning
Retrain models over time as predictions become confident, moving users
from the unlabeled to the labeled pool; user features F(u, t1) and
neighbor features F(n, t1), n_i ∈ N(u), are recomputed at each step:
P_t1(R | t1) = 0.52 => … => P_tk(R | t1 … tm) = 0.77
Iterative batch models:
• Iterative Batch Retraining (IB)
• Iterative Batch with Rationale Filtering (IBR)
Active learning models:
• Active Without Oracle (AWOO)
• Active With Rationale Filtering (AWR)
• Active With Oracle (AWO)
Example over the timeline t0 = 1-Jan-2011, t1 = 1-Feb-2011, …,
tk−1 = 1-Nov-2011, tk = 1-Dec-2011:
P_t0(R | t1) = 0.5, P_t1(R | t1 … t5) = 0.55, P_tk−1(R | t1 … t100) = 0.77 > q
Annotator Rationales
Rationales are explicitly highlighted ngrams in tweets that best
justify why annotators made their labeling decisions
(related to feature norms in psychology and feature sparsity).
Bergsma and Van Durme, 2013; Volkova and Yarowsky, 2014; Volkova and Van Durme, 2015
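Rationale filtering (as in the IBR/AWR models) can be sketched as keeping only the tweets that contain at least one rationale ngram; plain substring matching here is an illustrative simplification of the actual matching:

```python
def filter_by_rationales(tweets, rationales):
    """Keep only tweets containing at least one annotator-rationale
    ngram (case-insensitive substring match)."""
    rationales = [r.lower() for r in rationales]
    return [t for t in tweets if any(r in t.lower() for r in rationales)]
```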
Alternative: Rationale Weighting
• Annotator rationales for gender, age and political:
http://www.cs.jhu.edu/~svitlana/rationales.html
• Multiple languages: English, Spanish
• Portable to other languages
Improving Gender Prediction of Social Media Users via Weighted Annotator Rationales. Svitlana
Volkova and David Yarowsky. NIPS Workshop on Personalization: Methods and Applications 2014.
Performance Metrics
• Accuracy over time:
A_{q,t} = #correctly classified / #above threshold = (T_D + T_R) / (D + R)
• Find optimal models by:
– data stream type (user, friend, user + friend)
– time (more correctly classified users, faster)
– prediction quality (better accuracy over time)
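The metric above counts only users whose posterior has crossed the confidence threshold q. A small sketch (the function name and the decision to also return the decided count are illustrative choices):

```python
def accuracy_at(confidences, predictions, gold, q=0.75):
    """A_{q,t}: accuracy over users whose posterior confidence is >= q
    at time t; users below q remain unclassified and are excluded."""
    decided = [(p, g) for c, p, g in zip(confidences, predictions, gold)
               if c >= q]
    if not decided:
        return 0.0, 0
    correct = sum(p == g for p, g in decided)
    return correct / len(decided), len(decided)
```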
Results: Iterative Batch Learning
[Plots: accuracy and number of correctly classified users from March
to September, for the user and user + friend streams]
• IB: higher recall; IBR: higher precision
• Time: the number of correctly classified users increases over time
(IB faster, IBR slower)
• Data stream selection: user + friend stream > user stream
Results: Active Learning
[Plots: accuracy and number of correctly classified users from March
to September, for the user and user + friend streams]
• AWOO: higher recall; AWR: higher precision
• Time: unlike the IB/IBR models, AWOO/AWR classify more users
correctly faster (by March) but then plateau
Results: Model Quality
[Plots: accuracy from March to September for the IB, IBR, AWOO and
AWR models on the user and user + friend streams]
• user + friend > user
• batch < active
Active with Oracle Annotations
[Plots, February–December: cumulative requests to the oracle (assumed
100% correct) alongside training-set growth, from 16.8K to 271K tweets
and from 160 to 234 users for the user-only model; user vs.
user + friend streams compared]
Summary: Dynamic Learning and
Prediction
• Active learning > iterative batch
• N, UN > U: “neighbors give you away”
• Higher confidence => higher precision,
lower confidence => higher recall (as expected)
• Rationales significantly improve results
Practical Recommendations:
Models for Targeted Advertising
• Optimizing for time (correctly classified users, faster):
models without rationale filtering (IB, AWOO),
lower confidence threshold (0.55)
• Optimizing for prediction quality (better accuracy over time):
models with rationale filtering (IBR, AWR),
higher confidence threshold (0.95)
• Data stream (user, friend or joint): user + friend > user
Recap: Why are these models good?
• They model the streaming nature of social media
• Limited user content => take advantage of neighbor content
• Actively learn from crowdsourced rationales
• Learn on the fly => handle data drift
• Predict from multiple streams => handle topical sparsity
• Flexible, extendable framework:
– more features: word embeddings, interests, profile info,
tweeting behavior
Software Requirements
• Python: https://www.python.org/downloads/ (python -V)
• pip: https://pip.pypa.io/en/latest/installing.html (python get-pip.py)
• Twython: https://pypi.python.org/pypi/twython/ (pip install twython)
• matplotlib 1.3.1: http://sourceforge.net/projects/matplotlib/files/matplotlib/
• numpy 1.8.0: http://sourceforge.net/projects/numpy/files/NumPy/
• scipy 0.13: http://sourceforge.net/projects/scipy/files/scipy/
• scikit-learn 0.14.1: http://sourceforge.net/projects/scikit-learn/files/
Check installed versions:
python -c "import sklearn; print sklearn.__version__"
python -c "import numpy; print numpy.version.version"
python -c "import scipy; print scipy.version.version"
python -c "import matplotlib; print matplotlib.__version__"
Part II. Practice Session Outline
• Details on data collection and annotation
– JHU: gender, age and political preferences
– MSR: emotions, opinions and psycho-demographics
• Python examples for static inference
– Tweet-based: emotions
– User-based: psycho-demographic attributes
• Python examples for online inference
– Bayesian updates from multiple data streams
JHU: Data Overview and Annotation Scheme
Political Preferences:
– Candidate-Centric = 1,031 users (follow candidates)
– Geo-Centric = 270 users (self-reports in DE, MD, VA)
– Politically Active* = 371 users (active & follow candidates)
Age (self-reports)*: 387 users
Gender (name)*: 384 users
10–20 neighbors of each of 6 types: friend, follower, @mention,
reply, retweet, hashtag (explain relationships)
Details on Twitter data collection:
http://www.cs.jhu.edu/~svitlana/data/data_collection.pdf
*Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013
Links to Download JHU Attribute Data
• What does the data look like?
– graph_type.neighbor_type.tsv, e.g., cand-centric.follower.tsv
• JHU gender and age:
http://www.cs.jhu.edu/~svitlana/data/graph_gender_age.tar.gz
• JHU politically active*:
http://www.cs.jhu.edu/~svitlana/data/graph_zlr.tar.gz
• JHU candidate-centric:
http://www.cs.jhu.edu/~svitlana/data/graph_cand.tar.gz
• JHU geo-centric:
http://www.cs.jhu.edu/~svitlana/data/geo_cand.tar.gz
Code to Query Twitter API
• Repo: https://bitbucket.org/svolkova/queryingtwitter
– get lists of friends/followers for a user
– 200 recent tweets for k randomly sampled
retweeted or mentioned users
– tweets for a list of userIDs
Pipeline: userIDs/tweetIDs => JSON objects => extract text fields
(time, #friends) => tweet collection
Part II. Practice Session Outline
• Data and annotation schema description
– JHU: gender, age and political preferences
– MSR: emotions, opinions and psycho-demographics
• Python examples for static inference:
– Tweet-based: emotions
– User-based: psycho-demographic attributes
• Python examples for streaming inference:
– Bayesian updates from multiple data streams
MSR: Psycho-Demographic Annotations via Crowdsourcing
5K profiles annotated by a trusted crowd ($6/hour, with quality
control); attribute models Φ_A(u) trained on the 5K labeled users
are applied to millions of unlabeled users.
[Bar chart: Cohen's Kappa per attribute on a 2% random sample, from
highest to lowest: Ethnicity, Gender, Children, Age, Life Satisfaction,
Income, Optimism, Education, Political, Religion, Relationship,
Intelligence]
MSR: Emotion Annotations via
Distant Supervision
Hashtags for Ekman's 6 emotions (Mohammad et al. 2014)
+ emotion synonym hashtags
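Hashtag-based distant supervision can be sketched as labeling a tweet by its emotion hashtag and then removing hashtags so the label does not leak into the features. The seed lists below are illustrative stand-ins; the synonym lists used with Mohammad et al. 2014-style labeling are much larger:

```python
import re

# Illustrative seed hashtags per Ekman emotion (not the full lists).
EMOTION_TAGS = {
    "joy": {"#joy", "#happy"}, "sadness": {"#sadness", "#sad"},
    "fear": {"#fear", "#scared"}, "anger": {"#anger", "#angry"},
    "surprise": {"#surprise"}, "disgust": {"#disgust"},
}

def distant_label(tweet):
    """Label a tweet via its emotion hashtag; return (label, cleaned
    text) or None if zero or multiple emotions match."""
    tags = set(t.lower() for t in re.findall(r"#\w+", tweet))
    matches = [emo for emo, seeds in EMOTION_TAGS.items() if tags & seeds]
    if len(matches) != 1:          # skip ambiguous / unlabeled tweets
        return None
    cleaned = re.sub(r"#\w+", "", tweet).strip()
    return matches[0], cleaned
```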
Part II. Practice Session
• Data and annotation schema description
– JHU: gender, age and political preferences
– MSR: emotions, opinions and psycho-demographics
• Python examples for static inference:
– Tweet-based: emotions
– User-based: psycho-demographic attributes
• Python examples for streaming inference:
– Bayesian updates from multiple data streams
How to get MSR models and code?
https://bitbucket.org/svolkova/psycho-demographics
1. Load models for 15 psycho-demographic attributes + emotions
2. Extract features from input tweets
3. Apply pre-trained models to make predictions for input tweets
Predictive Models
Supervised text classification with log-linear models:
F(u) = { a0 if 1 / (1 + e^(−θ · f)) ≥ 0.5; a1 otherwise }
User-based:
• Lexical: normalized binary/count-based ngrams
• Affect: emotions, sentiments
Tweet-based:
• BOW + negation, stylistic (+0.3 F1)
• Socio-linguistic and stylistic:
– elongations (Yaay, woooow)
– capitalization (COOL), mixed punctuation (???!!!)
– hashtags and emoticons
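The stylistic cues listed above can be extracted with simple regular expressions; these patterns are a rough approximation for illustration, not the exact feature definitions used in the models:

```python
import re

def stylistic_features(tweet):
    """Socio-linguistic / stylistic cues: elongations, all-caps words,
    mixed punctuation, hashtag and emoticon counts."""
    return {
        "elongation":  bool(re.search(r"(\w)\1{2,}", tweet)),    # woooow
        "all_caps":    bool(re.search(r"\b[A-Z]{3,}\b", tweet)), # COOL
        "mixed_punct": bool(re.search(r"[?!]{2,}", tweet)),      # ???!!!
        "hashtags":    len(re.findall(r"#\w+", tweet)),
        "emoticons":   len(re.findall(r"[:;]-?[)(DP]", tweet)),
    }
```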
Tweet-based: Emotion Prediction
6 classes: joy, sadness, fear, surprise, disgust, anger
F1 = 0.78 overall (Roberts'12: 0.67; Qadir'13: 0.53; Mohammad'14: 0.49)
Per-class F1 (higher is better): disgust 0.92, anger 0.80, joy 0.79,
fear 0.77, surprise 0.64, sadness 0.62
User-Based: Attribute Prediction
F(u) = { a0 if 1 / (1 + e^(−θ · f)) ≥ 0.5; a1 otherwise }
[Bar chart: ROC AUC per attribute for emotion/sentiment features
(EmoSentOut, EmoSentDiff) vs. those plus lexical features, e.g.:
Gender 0.95 => 0.97, Education 0.77 => 0.88, Income 0.75 => 0.85,
Life Satisfaction 0.73 => 0.84, Optimism 0.72 => 0.83,
Intelligence 0.72 => 0.83, Age 0.72 => 0.83, Political 0.72 => 0.82,
Children 0.66 => 0.80, Religion 0.63 => 0.74, Relationship 0.63 => 0.74]
[Second chart: ROC AUC gains over a BOW baseline per attribute
(+0.04 to +0.17). Heatmap: column z-scores across attributes
(life satisfaction, optimism, relationship, religion, political,
children, age, gender, income, education, race/ethnicity, intelligence)]
Predicting Demographics from User Outgoing Emotions and Opinions
[Heatmap: emotion and opinion features (neutral, positive, negative,
joy, sadness, fear, surprise, anger, disgust, emotion score, sentiment
score) vs. predicted attribute classes, e.g., Male, Female, No Kids,
Below 25 y.o., Satisfied, Dissatisfied, Optimist, Pessimist]
1/3 of the attributes reach AUC >= 0.75
How to get JHU models and code?
https://bitbucket.org/svolkova/attribute
• Ex1: Train and test batch models
• Ex2: Train a model from a training file and save it
• Ex3: Predict an attribute using a pre-trained model and plot
iterative updates
• Ex4: Predict and plot iterative updates for multiple attributes
using pre-trained models from a single communication stream
• Ex5: Predict and plot iterative updates for multiple attributes
from multiple communication streams
Ex1. Train/Test Batch Models
• Run as shown in the repo, e.g., for gender
• Customize features and model type/parameters
Ex2. Save Pre-trained Models
• Run as shown in the repo, e.g., for age
• Customize features (process.py), model type and
parameters (predict.py)
Part II. Practice Session
• Data and annotation schema description
– JHU: gender, age and political preferences
– MSR: emotions, opinions and psycho-demographics
• Python examples for static inference:
– User-based: psycho-demographics
– Tweet-based: emotions, opinions
• Python examples for streaming inference:
– Bayesian updates from multiple data streams
Recap: Iterative Bayesian Updates
Posterior updates over a tweet stream t1, t2, …, tk:
P_t1(R | T_t1) = 0.52, P_t2(R | T_t2) = 0.62, …,
P_tk−1(R | T_tk−1) = 0.65, P_tk(R | T_tk) = 0.77

P(a = R | T) = [∏_k P(t_k | a = R) · P(a = R)] /
[∏_k P(t_k | a = R) · P(a = R) + ∏_k P(t_k | a = D) · P(a = D)]
(posterior = per-tweet likelihoods × class prior, normalized over classes)
Ex3. Iterative Updates for a Single
Attribute from a Single Stream
Ex4. Iterative Updates for Multiple
Attributes from a Single Stream
Steps:
1. Loading Models
2. Processing data
3. Setting up train/test priors
4. Making Predictions
5. Plotting results
Joint User-Neighbor Streams
Neighbor types: friend, follower, @mention, reply, retweet, hashtag
Ex5. Iterative Updates for Multiple
Attributes from Joint Streams
Likelihood
Posterior
Questions?
http://www.cs.jhu.edu/~svitlana/
svitlana@jhu.edu
References: http://www.cs.jhu.edu/~svitlana/references.pdf
Slides: http://www.cs.jhu.edu/~svitlana/slides.pptx