NAACL Tutorial
Social Media Predictive Analytics
Svitlana Volkova1, Benjamin Van Durme1,2, David Yarowsky1 and Yoram Bachrach3
1Center for Language and Speech Processing, Johns Hopkins University
2Human Language Technology Center of Excellence
3Microsoft Research Cambridge
Tutorial Schedule
Part I: Theoretical Session (2:00 – 4:30pm)
Batch Prediction
Online Inference
Coffee Break (3:30 – 4:00pm)
Dynamic Learning and Prediction
Part II: Practice Session (4:30 – 5:30pm)
Code and Data
Tutorial Materials
• Slides:
– http://www.cs.jhu.edu/~svitlana/slides.pptx
• Code and Data:
– https://bitbucket.org/svolkova/queryingtwitter
– https://bitbucket.org/svolkova/attribute
– https://bitbucket.org/svolkova/psycho-demographics
• References:
– http://www.cs.jhu.edu/~svitlana/references.pdf
Social Media Obsession
• Diverse: billions of messages from millions of users
• What do they think and feel?
• What are their demographics and personality?
• What do they like? What do they buy? Where do they go?
First: a comment on privacy and ethics…
Why is language in social media so
interesting?
• Very short – 140 characters
• Lexically divergent
• Abbreviated
• Multilingual
Why is language in social media so
challenging?
• Data drift
• User activeness => generalization
• Topical sparsity => relationship, politics
• Dynamic streaming nature
DEMO
Predictive Analytics Services
• Social Network Prediction –
https://apps.facebook.com/snpredictionapp/
• Twitter Psycho-Demographic Profile and Affect Inference –
http://twitterpredictor.cloudapp.net (pswd: twitpredMSR2014)
• My personality Project – http://mypersonality.org/wiki/doku.php
• You Are What You Like – http://youarewhatyoulike.com/
• Psycho-demographic trait predictions –
http://applymagicsauce.com/
• IBM Personality – https://watson-pi-demo.mybluemix.net
• World Well Being Project – http://wwbp.org
Applications: Retail
Personalized marketing
• Detecting opinions and emotions
users express about products or
services within targeted
populations
Personalized
recommendations and search
• Making recommendations based on
user emotions, demographics and
personality
Applications: Advertising
Online targeted advertising
• Targeting ads based on predicted user demographics
• Matching the emotional tone the user expects
• Delivering ads fast, and to a true crowd
Applications: Polling
Real-time live polling
• Mining political opinions
• Voting predictions within certain demographics
Large-scale passive polling
• Passive polling regarding products and services
Applications: Health
Large-scale real-time healthcare analytics
• Identifying smokers, drug addicts, healthy eaters,
people into sports (Paul and Dredze 2011)
• Monitoring flu trends, food poisoning, chronic
illnesses (Culotta et al. 2015)
Applications: HR
Recruitment and human resource
management
• Estimating emotional stability and personality of
potential and current employees
• Measuring the overall well-being of employees,
e.g., life satisfaction, happiness (Schwartz et al.
2013; Volkova et al. 2015)
• Monitoring depression and stress levels
(Coppersmith et al. 2014)
User Attribute Prediction Task
Predicting attributes from user communications:
• Political Preference – Rao et al., 2010; Conover et al., 2011;
Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths,
2013; Volkova et al., 2014
• Age – Rao et al., 2010; Zamal et al., 2012; Cohen and Ruths, 2013;
Nguyen et al., 2011, 2013; Sap et al., 2014
• Gender – Garera and Yarowsky, 2009; Rao et al., 2010; Burger et al.,
2011; Van Durme, 2012; Zamal et al., 2012; Bergsma and Van Durme, 2013
AAAI 2015 Demo (joint work with Microsoft Research)
Income, Education Level, Ethnicity, Life Satisfaction, Optimism,
Personality, Showing Off, Self-Promoting
Tweets Revealing User Attributes
Supervised Models
Classification: binary (SVM) – gender, age, political, ethnicity
• Goswami et al. 2009; Rao et al. 2010; Burger et al. 2011; Mislove et al.
2012; Nguyen et al. 2011, 2013;
• Pennacchiotti and Popescu 2011; Conover et al. 2011; Filippova et al.
2012; Van Durme 2012; Bergsma et al. 2012, 2013; Bergsma and Van
Durme 2013;
• Zamal et al. 2012; Ciot et al. 2013; Cohen and Ruths 2013;
• Schwartz et al. 2013; Sap et al. 2014; Kern et al. 2014;
Golbeck et al. 2011; Kosinski et al. 2013;
• Volkova et al. 2014; Volkova et al. 2015.
Unsupervised and Generative Models
• Name morphology for gender & ethnicity prediction – Rao et al. 2011
• Large-scale clustering – Bergsma et al. 2013; Culotta et al. 2015
• Demographic language variations – Eisenstein et al. 2010; O'Connor et
al. 2010; Eisenstein et al. 2014
*Rely on more than lexical features, e.g., network and streaming features
Existing Approaches ~1K Tweets*
• Treat all of a user's tweets as a single document
• But does an average Twitter user produce thousands of tweets?
*Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu,
2011a; Burger et al., 2011; Zamal et al., 2012; Nguyen et al., 2013
How Active are Twitter Users?
Attributed Social Network
User Local Neighborhoods a.k.a. Social Circles
Approaches
Static (Batch) Prediction
• Offline training
• Offline predictions + neighbor content
Streaming (Online) Inference
• Offline training + online predictions over time
• Exploring 6 types of neighborhoods
Dynamic (Iterative) Learning and Prediction
• Online predictions relying on neighbors
• Iterative re-training, active learning, rationale annotation
Challenges addressed: data drift, streaming nature,
model generalization, topical sparsity
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
How to get data? Twitter API
• Twitter API: https://dev.twitter.com/overview/api
• Twitter API Status: https://dev.twitter.com/overview/status
• Twitter API Rate Limits:
https://dev.twitter.com/rest/public/rate-limits
Querying Twitter API
• Twitter Developer Account => access key and token
https://dev.twitter.com/oauth/overview/applicationowner-access-tokens
twitter = Twython(APP_KEY, APP_SECRET,
                  OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
I. Access the 1% Twitter Firehose sample and sample from it
II. Query the Twitter API to get:
• user timelines (up to 3,200 tweets) from userIDs
• tweet JSON objects from tweetIDs
• lists of friendIDs (5,000 per query) from userIDs
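Because a single timeline call returns at most 200 tweets, collecting the full ~3,200 requires paginating with `max_id`. A minimal sketch of that loop, with `fetch` standing in for a Twython call such as `lambda **kw: twitter.get_user_timeline(screen_name=name, count=200, **kw)` (the function and limit names here are illustrative):

```python
def collect_timeline(fetch, limit=3200):
    """Paginate backwards through a user timeline using max_id.
    `fetch(**kwargs)` returns one page of tweet dicts, newest first."""
    tweets, max_id = [], None
    while len(tweets) < limit:
        kwargs = {} if max_id is None else {"max_id": max_id}
        page = fetch(**kwargs)
        if not page:                      # no older tweets left
            break
        tweets.extend(page)
        max_id = page[-1]["id"] - 1       # avoid re-fetching the oldest tweet
    return tweets[:limit]
```

The same pattern applies to friend lists, except those are paginated with a `cursor` instead of `max_id`.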
JSON Objects
Add predictions: sentiment,
attributes, emotions
MongoDB: http://docs.mongodb.org/manual/tutorial/getting-started/
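One way to attach predictions (sentiment, attributes, emotions) to the stored JSON objects is to enrich each tweet dict before inserting it, e.g., with pymongo's `db.tweets.insert_one(enriched)`. A small sketch (the field name `predictions` and the `models` mapping are illustrative, not the tutorial's exact schema):

```python
def add_predictions(tweet_json, models):
    """Attach model outputs to a tweet's JSON object before storing it.
    `models` maps an output field name to a function of the tweet text."""
    enriched = dict(tweet_json)           # keep the raw object intact
    enriched["predictions"] = {name: fn(tweet_json["text"])
                               for name, fn in models.items()}
    return enriched
```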
How to get labeled data?
• Supervised classification in a new domain:
– Labeled data ≈ ground truth
– Costly and time consuming to get!
[Diagram: labeled users U_L train an attribute model Φ_A(u),
which is then applied to predict unlabeled users U_P]
• Ways to get ≈"ground truth" annotations:
• Fun psychological tests (voluntary): myPersonality project
• Profile info: Facebook, e.g., relationship, gender, age – but sparse for Twitter
• Self-reports: "I am a republican…" (Volkova et al. 2013), "Happy
##th/st/nd/rd birthday to me" (Zamal et al. 2012), "I have been diagnosed
with …" (Coppersmith et al. 2014), "I am a writer …" (Beller et al. 2014)
• Distant supervision: following Obama vs. Romney (Zamal et al. 2012),
emotion hashtags (Mohammad et al. 2014), user name (Burger et al. 2011)
• Crowdsourcing: subjective perceived annotations (Volkova et al. 2015),
rationales (Bergsma et al. 2013; Volkova et al. 2014, 2015)
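Self-report labeling of the kind listed above can be approximated with a few regular expressions. A minimal sketch (these two patterns are illustrative simplifications, not the cited authors' exact rules):

```python
import re

# Heuristic self-report patterns in the spirit of the examples above.
SELF_REPORTS = {
    "political": re.compile(r"\bi(?:'m| am) a (democrat|republican)\b", re.I),
    "age":       re.compile(r"\bhappy (\d{1,2})(?:th|st|nd|rd) birthday to me\b", re.I),
}

def label_from_tweet(text):
    """Return {attribute: value} for any self-report matched in the tweet."""
    labels = {}
    for attr, pat in SELF_REPORTS.items():
        m = pat.search(text)
        if m:
            labels[attr] = m.group(1).lower()
    return labels
```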
Twitter Social Graph
Neighbor types: friend, follower, @mention, reply, retweet, hashtag
10–20 neighbors of 6 types per user
Balanced datasets:
I. Candidate-Centric (distant supervision): 1,031 users
II. Geo-Centric (self-reports): 270 users
III. Politically Active (distant supervision)*: 371 users (Dem; Rep)
IV. Age (self-reports)*: 387 users (18–23; 23–25)
V. Gender (name)*: 384 users (Male; Female)
What types of neighbors lead to the best attribute predictions?
*Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013
Code, data and trained models for gender, age, political preference prediction:
http://www.cs.jhu.edu/~svitlana/
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
Classification Model
• Logistic regression = max entropy = log-linear models
– Maps discrete inputs w to a binary output y, e.g., y = {M, F}
– Feature vector: binary word indicators w_i = {0, 1} over the vocabulary
Labeled users (training), vocabulary = (hair, eat, cool, work, …, xbox):
Female: 1 1 0 0 … 0
Male:   0 0 1 1 … 1
Test user: 1 0 1 0 … 1 => ? (predicted Male)
• Other options: SVM, NB
http://scikit-learn.org/stable/modules/generated/
sklearn.linear_model.LogisticRegression.html
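The setup above maps directly onto the scikit-learn class the slide links. A toy sketch with made-up training texts (the words mirror the vocabulary in the example table; real models would train on full timelines):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy binary-unigram gender classifier; texts are invented for illustration.
train_texts = ["hair eat", "hair eat cool", "eat hair"] * 5 + \
              ["xbox work", "cool work xbox", "work xbox"] * 5
train_labels = ["F"] * 15 + ["M"] * 15

vec = CountVectorizer(binary=True)      # w_i in {0, 1}, as on the slide
X = vec.fit_transform(train_texts)
clf = LogisticRegression().fit(X, train_labels)

def predict_gender(text):
    """Classify one unseen user document with the trained model."""
    return clf.predict(vec.transform([text]))[0]
```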
Features (I)
• Lexical:
– normalized count/binary ngrams (Goswami et al. 2010; Rao et al.
2010; Pennacchiotti and Popescu 2011; Nguyen et al. 2013; Ciot et al.
2013; Van Durme 2012; Kern et al. 2014; Volkova et al. 2014;
Volkova and Van Durme 2015)
– class-based highly predictive ngrams (Bergsma and Van Durme 2013),
rationales (Volkova and Yarowsky 2014); character-based
(Peersman et al. 2011), stems, co-stems, lemmas (Zamal et al.
2012; Cohen et al. 2014)
• Socio-linguistic, syntactic and stylistic:
– syntax and style (Schler et al. 2006; Cheng et al. 2011), smiles,
excitement, emoticons and psycho-linguistic features (Rao et al. 2010;
Marquardt et al. 2014; Kokkos et al. 2014; Hovy 2015)
– lexicon features (Sap et al. 2014); Linguistic Inquiry and Word
Count (LIWC) (Mukherjee et al. 2010; Fink et al. 2012)
Features (II)
• Communication behavior: response/retweet/tweet frequency,
retweeting tendency (Conover et al. 2011; Golbeck et al. 2011;
Pennacchiotti and Popescu 2011; Preoţiuc-Pietro et al. 2015)
• Network structure: follower-following ratio, neighborhood size,
in/out degree, degree of connectivity (Bamman et al. 2012;
Filippova 2012; Zamal et al. 2012; Culotta et al. 2015)
• Other: likes (Bachrach et al. 2012; Kosinski et al. 2014), name or
census data (Burger et al. 2011; Liu and Ruths 2013), links/images
(Rosenthal and McKeown 2011)
• Topics: word embeddings, LDA topics, word clusters
(Preoţiuc-Pietro et al. 2015)
[Example feature vector for a Female user combining binary unigrams
(hair, eat, cool, work, …, xbox) with behavior and network features
(retweet rate, neighborhood size, images)]
Batch Experiments
• Log-linear word unigram models:
(I) User vs. (II) Neighbor and (III) User-Neighbor
F = argmax_a P(A = a | T)
• Evaluate different neighborhood types:
– varying neighborhood size n = [1, 2, 5, 10] and
content amount t = [5, 10, 15, 25, 50, 100, 200]
– 10-fold cross-validation with 100 random
restarts for every (n, t) parameter combination
User Model
F = { D if 1 / (1 + e^(−θ · f^u)) ≥ 0.5; R otherwise }
Each user is represented by a binary word vector over his own tweets, e.g.:
t: Washington Post Columnist… => f^{v_i} = [w1 = 1, w2 = 1, …, wn = 0]
t: We're watching you House @GOP => f^{v_k} = [w1 = 1, w2 = 1, …, wn = 0]
t: … Ron Paul not a fan of Chris Christie => f^{v_j} = [w1 = 0, w2 = 0, …, wn = 0]
Train graph => test graph: predict v_k – ?
Neighbor Model
F = { D if 1 / (1 + e^(−θ · f^{N(u)})) ≥ 0.5; R otherwise }
Each user is represented by the aggregated tweets of his neighborhood N(u), e.g.:
t: Obama: I'd defend @MajorCBS => f^{N(v_i)} = [w1, w2, …, wn]
t: @FoxNews: WATCH LIVE => f^{N(v_k)} = [w1, w2, …, wn]
t: The Lyin King #RepMovies => f^{N(v_j)} = [w1, w2, …, wn]
Train graph => test graph: predict v_k – ?
Joint User-Neighbor Model
F = { D if 1 / (1 + e^(−θ · f^{u+N(u)})) ≥ 0.5; R otherwise }
Each user is represented by his own tweets plus his neighbors' tweets, e.g.:
t: Washington Post Columnist… / Obama: I'd defend @MajorCBS
=> f^{v_i+N(v_i)} = [w1, w2, …, wn]
t: @FoxNews: WATCH LIVE / We're watching you House @GOP
=> f^{v_k+N(v_k)} = [w1, w2, …, wn]
t: … Ron Paul not a fan of Christie / The Lyin King #RepublicanMovies
=> f^{v_j+N(v_j)} = [w1, w2, …, wn]
Train graph => test graph: predict v_k – ?
Learning on user and neighbor features jointly (not prefixing features)
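Pooling user and neighbor text without prefixing features can be sketched as a single binary bag of words over both sources (the function and its tokenization are an illustrative simplification):

```python
from collections import Counter

def joint_features(user_tweets, neighbor_tweets):
    """Binary unigram features f^{u+N(u)} over the union of user and
    neighbor text; tokens are pooled without a user/neighbor prefix,
    matching the joint model above."""
    tokens = Counter()
    for tweet in list(user_tweets) + list(neighbor_tweets):
        tokens.update(tweet.lower().split())
    return {w: 1 for w in tokens}
```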
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
Gender Prediction
[Plots: accuracy vs. tweets per user and vs. neighborhood size.
User model: 0.82; Neighbor model: 0.63; User-Neighbor model: 0.73.
Most predictive neighbor features: friend.counts, usermention.binary,
retweet.counts, usermention.counts. User model variants compared:
unigram, binary, bigram, trigram, UserOnlyZLR.]
Lexical Markers for Gender
Gender Prediction Quality
Approach             | Users | Tweets | Features       | Accuracy
Rao et al., 2010     | 1K    | 405    | BOW+socioling  | 0.72
Burger et al., 2011  | 184K  | 22     | username, BOW  | 0.92
Zamal et al., 2012   | 384   | 10K    | neighbor BOW   | 0.80
Bergsma et al., 2013 | 33.8K | −      | BOW, clusters  | 0.90
JHU models           | 383   | 200/2K | BOW user/neigh | 0.82/0.73
• This is not a direct comparison, due to Twitter data sharing restrictions
• Poor generalization: different datasets = different sampling and annotation biases
Age Prediction
[Plots: accuracy vs. tweets per user and vs. neighborhood size
for the 18–23 vs. 23–25 age groups. User model: 0.77; Neighbor
model: 0.72; User-Neighbor model: 0.77. Most predictive neighbor
features: follower.counts, friend.counts, retweet.counts.]
Lexical Markers for Age
Age Prediction Quality
Approach           | Users | Tweets | Groups       | Features      | Accuracy
Rao et al., 2010   | 2K    | 1183   | ≤30; >30     | BOW+socioling | 0.74
Zamal et al., 2012 | 386   | 10K    | 18–23; 23–25 | neighbor BOW  | 0.80
JHU models         | 381   | 200/2K | 18–23; 23–25 | BOW/neighbors | 0.77/0.74
• This is not a direct comparison!
• Performance differs across age groups
• Sampling and annotation biases
Political Preference
[Plots: accuracy vs. tweets per user and vs. neighborhood size.
User model: 0.89; Neighbor model: 0.91; User-Neighbor model: 0.92.
Most predictive neighbor features: friend.counts, retweet.binary,
usermention.binary.]
Lexical Markers for Political
Preferences
Model Generalization
[Bar chart: accuracy of the User, Neighbor and User-Neighbor models
across graphs. Politically active: 0.87–0.92; candidate-centric:
0.72–0.80; geo-centric: 0.57–0.69.]
• Political preference classification is not easy!
• Topical sparsity: average users rarely tweet about politics
Political Preference Prediction Quality
Politically Active Users (sampling/annotation bias):
Approach             | Users | Tweets | Features       | Accuracy
Bergsma et al., 2013 | 400   | 5K     | BOW, clusters  | 0.82
Pennacchiotti 2011   | 10.3K | −      | BOW, network   | 0.89
Conover et al., 2011 | 1K    | 1K     | BOW, network   | 0.95
Zamal et al., 2012   | 400   | 1K     | neighbor BOW   | 0.91
JHU active           | 371   | 200    | BOW user/neigh | 0.89/0.92
Random/Average Users:
JHU cand-centric     | 1,051 | 200    | BOW user/neigh | 0.72/0.75
JHU geo-centric      | 270   | 200    | BOW user/neigh | 0.57/0.67
Cohen et al., 2013   | 262   | 1K     | BOW, network   | 0.68
Optimizing Twitter API Calls
Cand-Centric Graph: Friend Circle
Given limited Twitter API calls, querying more neighbors with fewer
tweets each is better than querying more tweets from the existing
neighbors.
Summary: Static Prediction
• Features: binary (political) vs. count-based features (age, gender)
• Homophily: "neighbors give you away" => can classify users with no content
• Attribute assortativity: similarity with neighbors depends on attribute type
• Content from more neighbors per user >> additional content from
the existing neighbors
• Generalization of the classifiers
[Most predictive neighbor types for the neighbor (N) and
user-neighbor (UN) models: friend, follower, mention, retweet]
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
Iterative Bayesian Predictions
Posterior updates over a tweet stream t1, t2, …, tk:
P_t1(R | T_t1) = 0.52, P_t2(R | T_t2) = 0.62, …,
P_tk−1(R | T_tk−1) = 0.65, P_tk(R | T_tk) = 0.77

P(a = R | T) = [∏_k P(t_k | a = R) · P(a = R)] /
[∏_k P(t_k | a = R) · P(a = R) + ∏_k P(t_k | a = D) · P(a = D)]
(posterior = per-tweet likelihoods × class prior, normalized over classes)
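The product form above is equivalent to summing per-tweet log-likelihood ratios, which is how it is conveniently computed over a stream. A minimal sketch (the likelihood pairs are assumed to come from per-class language models; the function name is illustrative):

```python
import math

def update_posterior(prior_R, tweet_likelihoods):
    """Iteratively apply the Bayesian update for the two-class (R vs. D)
    case. tweet_likelihoods: (P(t|a=R), P(t|a=D)) pairs, one per tweet.
    Returns the posterior P(a=R | T) after each tweet."""
    log_odds = math.log(prior_R / (1 - prior_R))
    posteriors = []
    for p_R, p_D in tweet_likelihoods:
        log_odds += math.log(p_R) - math.log(p_D)   # add per-tweet evidence
        posteriors.append(1 / (1 + math.exp(-log_odds)))
    return posteriors
```

With a uniform prior of 0.5 and one tweet four times likelier under R than D, the posterior moves to 0.8, matching the product formula directly.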
Cand-Centric Graph: Posterior
[Plots: posterior p(Republican | T) trajectories over the tweet stream
(0–60 updates) for individual users, converging toward 1.0 or 0.0 over time]
Cand-Centric: Prediction Time (1)
[Plots: number of users classified correctly over time (in weeks) for
the user stream vs. the joint user-neighbor stream, at prediction
confidence 0.95 vs. 0.75; Dem and Rep curves shown separately]
• Democrats are easier to predict than Republicans
Cand-Centric Graph: Prediction Time (2)
How much time does it take to classify 100 users with 75% confidence?
Compare: user stream vs. joint user-neighbor stream
[Bar chart, weeks on a log scale: the joint user-neighbor stream is
orders of magnitude faster than the user-only stream on the
cand-centric, geo-centric and politically active graphs]
Batch vs. Online Performance
[Bar charts: accuracy of batch models (user: 0.57–0.86; neighbor:
0.67–0.75) vs. streaming models (user stream: 0.84–0.99;
user-neighbor stream: 0.88–0.99) on the geo, cand and active graphs]
Summary: Online Inference
• Homophily: Neighborhood content is useful*
• Lessons learned from batch predictions:
– Age: user-follower or user-mention joint stream
– Gender: user-friend joint stream
– Political: user-mention and user-retweet joint stream
• Streaming models >> batch models
• Activeness: tweeting frequency matters a lot!
• Generalization of the classifiers: data sampling and
annotation biases
*Pennacchiotti and Popescu, 2011a, 2011b; Conover et al., 2011a, 2011b;
Golbeck et al., 2011; Zamal et al., 2012; Volkova et al., 2014
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
Iterative Batch Learning
Retrain models over time as predictions become confident, moving users
from the unlabeled to the labeled pool; user features F(u, t1) and
neighbor features F(n, t1), n_i ∈ N(u), are recomputed at each step:
P_t1(R | t1) = 0.52 => … => P_tk(R | t1 … tm) = 0.77
Iterative batch models:
• Iterative Batch Retraining (IB)
• Iterative Batch with Rationale Filtering (IBR)
Active learning models:
• Active Without Oracle (AWOO)
• Active With Rationale Filtering (AWR)
• Active With Oracle (AWO)
Example over the timeline t0 = 1-Jan-2011, t1 = 1-Feb-2011, …,
tk−1 = 1-Nov-2011, tk = 1-Dec-2011:
P_t0(R | t1) = 0.5, P_t1(R | t1 … t5) = 0.55, P_tk−1(R | t1 … t100) = 0.77 > q
Annotator Rationales
Rationales are explicitly highlighted ngrams in tweets that best
justify why annotators made their labeling decisions
(related to feature norms in psychology and feature sparsity).
Bergsma and Van Durme, 2013; Volkova and Yarowsky, 2014; Volkova and Van Durme, 2015
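Rationale filtering (as in the IBR/AWR models) can be sketched as keeping only the tweets that contain at least one rationale ngram; plain substring matching here is an illustrative simplification of the actual matching:

```python
def filter_by_rationales(tweets, rationales):
    """Keep only tweets containing at least one annotator-rationale
    ngram (case-insensitive substring match)."""
    rationales = [r.lower() for r in rationales]
    return [t for t in tweets if any(r in t.lower() for r in rationales)]
```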
Alternative: Rationale Weighting
• Annotator rationales for gender, age and political:
http://www.cs.jhu.edu/~svitlana/rationales.html
• Multiple languages: English, Spanish
• Portable to other languages
Improving Gender Prediction of Social Media Users via Weighted Annotator Rationales. Svitlana
Volkova and David Yarowsky. NIPS Workshop on Personalization: Methods and Applications 2014.
Performance Metrics
• Accuracy over time:
A_{q,t} = #correctly classified / #above threshold = (T_D + T_R) / (D + R)
• Find optimal models by:
– data stream type (user, friend, user + friend)
– time (more correctly classified users, faster)
– prediction quality (better accuracy over time)
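The metric above counts only users whose posterior has crossed the confidence threshold q. A small sketch (the function name and the decision to also return the decided count are illustrative choices):

```python
def accuracy_at(confidences, predictions, gold, q=0.75):
    """A_{q,t}: accuracy over users whose posterior confidence is >= q
    at time t; users below q remain unclassified and are excluded."""
    decided = [(p, g) for c, p, g in zip(confidences, predictions, gold)
               if c >= q]
    if not decided:
        return 0.0, 0
    correct = sum(p == g for p, g in decided)
    return correct / len(decided), len(decided)
```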
Results: Iterative Batch Learning
[Plots: accuracy and number of correctly classified users from March
to September, for the user and user + friend streams]
• IB: higher recall; IBR: higher precision
• Time: the number of correctly classified users increases over time
(IB faster, IBR slower)
• Data stream selection: user + friend stream > user stream
Results: Active Learning
[Plots: accuracy and number of correctly classified users from March
to September, for the user and user + friend streams]
• AWOO: higher recall; AWR: higher precision
• Time: unlike the IB/IBR models, AWOO/AWR classify more users
correctly faster (by March) but then plateau
Results: Model Quality
[Plots: accuracy from March to September for the IB, IBR, AWOO and
AWR models on the user and user + friend streams]
• user + friend > user
• batch < active
Active with Oracle Annotations
[Plots, February–December: cumulative requests to the oracle (assumed
100% correct) alongside training-set growth, from 16.8K to 271K tweets
and from 160 to 234 users for the user-only model; user vs.
user + friend streams compared]
Summary: Dynamic Learning and
Prediction
• Active learning > iterative batch
• N, UN > U: “neighbors give you away”
• Higher confidence => higher precision,
lower confidence => higher recall (as expected)
• Rationales significantly improve results
Practical Recommendations:
Models for Targeted Advertising
• Optimizing for time (correctly classified users, faster):
models without rationale filtering (IB, AWOO),
lower confidence threshold (0.55)
• Optimizing for prediction quality (better accuracy over time):
models with rationale filtering (IBR, AWR),
higher confidence threshold (0.95)
• Data stream (user, friend or joint): user + friend > user
Recap: Why are these models good?
• They model the streaming nature of social media
• Limited user content => take advantage of neighbor content
• Actively learn from crowdsourced rationales
• Learn on the fly => handle data drift
• Predict from multiple streams => handle topical sparsity
• Flexible, extendable framework:
– more features: word embeddings, interests, profile info,
tweeting behavior
Software Requirements
• Python: https://www.python.org/downloads/ (python -V)
• pip: https://pip.pypa.io/en/latest/installing.html (python get-pip.py)
• Twython: https://pypi.python.org/pypi/twython/ (pip install twython)
• matplotlib 1.3.1: http://sourceforge.net/projects/matplotlib/files/matplotlib/
• numpy 1.8.0: http://sourceforge.net/projects/numpy/files/NumPy/
• scipy 0.13: http://sourceforge.net/projects/scipy/files/scipy/
• scikit-learn 0.14.1: http://sourceforge.net/projects/scikit-learn/files/
Check installed versions:
python -c "import sklearn; print sklearn.__version__"
python -c "import numpy; print numpy.version.version"
python -c "import scipy; print scipy.version.version"
python -c "import matplotlib; print matplotlib.__version__"
Part II. Practice Session Outline
• Details on data collection and annotation
– JHU: gender, age and political preferences
– MSR: emotions, opinions and psycho-demographics
• Python examples for static inference
– Tweet-based: emotions
– User-based: psycho-demographic attributes
• Python examples for online inference
– Bayesian updates from multiple data streams
JHU: Data Overview and Annotation Scheme
Political Preferences:
– Candidate-Centric = 1,031 users (follow candidates)
– Geo-Centric = 270 users (self-reports in DE, MD, VA)
– Politically Active* = 371 users (active & follow candidates)
Age (self-reports)*: 387 users
Gender (name)*: 384 users
10–20 neighbors of each of 6 types: friend, follower, @mention,
reply, retweet, hashtag (explain relationships)
Details on Twitter data collection:
http://www.cs.jhu.edu/~svitlana/data/data_collection.pdf
*Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013
Links to Download JHU Attribute Data
• What does the data look like?
– graph_type.neighbor_type.tsv, e.g., cand-centric.follower.tsv
• JHU gender and age:
http://www.cs.jhu.edu/~svitlana/data/graph_gender_age.tar.gz
• JHU politically active*:
http://www.cs.jhu.edu/~svitlana/data/graph_zlr.tar.gz
• JHU candidate-centric:
http://www.cs.jhu.edu/~svitlana/data/graph_cand.tar.gz
• JHU geo-centric:
http://www.cs.jhu.edu/~svitlana/data/geo_cand.tar.gz
Code to Query Twitter API
• Repo: https://bitbucket.org/svolkova/queryingtwitter
– get lists of friends/followers for a user
– 200 recent tweets for k randomly sampled
retweeted or mentioned users
– tweets for a list of userIDs
Pipeline: userIDs/tweetIDs => JSON objects => extract text fields
(time, #friends) => tweet collection
Part II. Practice Session Outline
• Data and annotation schema description
– JHU: gender, age and political preferences
– MSR: emotions, opinions and psycho-demographics
• Python examples for static inference:
– Tweet-based: emotions
– User-based: psycho-demographic attributes
• Python examples for streaming inference:
– Bayesian updates from multiple data streams
MSR: Psycho-Demographic Annotations via Crowdsourcing
5K profiles annotated by a trusted crowd ($6/hour, with quality
control); attribute models Φ_A(u) trained on the 5K labeled users
are applied to millions of unlabeled users.
[Bar chart: Cohen's Kappa per attribute on a 2% random sample, from
highest to lowest: Ethnicity, Gender, Children, Age, Life Satisfaction,
Income, Optimism, Education, Political, Religion, Relationship,
Intelligence]
MSR: Emotion Annotations via
Distant Supervision
Hashtags for Ekman's 6 emotions (Mohammad et al. 2014)
+ emotion synonym hashtags
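Hashtag-based distant supervision can be sketched as labeling a tweet by its emotion hashtag and then removing hashtags so the label does not leak into the features. The seed lists below are illustrative stand-ins; the synonym lists used with Mohammad et al. 2014-style labeling are much larger:

```python
import re

# Illustrative seed hashtags per Ekman emotion (not the full lists).
EMOTION_TAGS = {
    "joy": {"#joy", "#happy"}, "sadness": {"#sadness", "#sad"},
    "fear": {"#fear", "#scared"}, "anger": {"#anger", "#angry"},
    "surprise": {"#surprise"}, "disgust": {"#disgust"},
}

def distant_label(tweet):
    """Label a tweet via its emotion hashtag; return (label, cleaned
    text) or None if zero or multiple emotions match."""
    tags = set(t.lower() for t in re.findall(r"#\w+", tweet))
    matches = [emo for emo, seeds in EMOTION_TAGS.items() if tags & seeds]
    if len(matches) != 1:          # skip ambiguous / unlabeled tweets
        return None
    cleaned = re.sub(r"#\w+", "", tweet).strip()
    return matches[0], cleaned
```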
Part II. Practice Session
• Data and annotation schema description
– JHU: gender, age and political preferences
– MSR: emotions, opinions and psycho-demographics
• Python examples for static inference:
– Tweet-based: emotions
– User-based: psycho-demographic attributes
• Python examples for streaming inference:
– Bayesian updates from multiple data streams
How to get MSR models and code?
https://bitbucket.org/svolkova/psycho-demographics
1. Load models for 15 psycho-demographic attributes + emotions
2. Extract features from input tweets
3. Apply pre-trained models to make predictions for input tweets
Predictive Models
Supervised text classification with log-linear models:
F(u) = { a0 if 1 / (1 + e^(−θ · f)) ≥ 0.5; a1 otherwise }
User-based:
• Lexical: normalized binary/count-based ngrams
• Affect: emotions, sentiments
Tweet-based:
• BOW + negation, stylistic (+0.3 F1)
• Socio-linguistic and stylistic:
– elongations (Yaay, woooow)
– capitalization (COOL), mixed punctuation (???!!!)
– hashtags and emoticons
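The stylistic cues listed above can be extracted with simple regular expressions; these patterns are a rough approximation for illustration, not the exact feature definitions used in the models:

```python
import re

def stylistic_features(tweet):
    """Socio-linguistic / stylistic cues: elongations, all-caps words,
    mixed punctuation, hashtag and emoticon counts."""
    return {
        "elongation":  bool(re.search(r"(\w)\1{2,}", tweet)),    # woooow
        "all_caps":    bool(re.search(r"\b[A-Z]{3,}\b", tweet)), # COOL
        "mixed_punct": bool(re.search(r"[?!]{2,}", tweet)),      # ???!!!
        "hashtags":    len(re.findall(r"#\w+", tweet)),
        "emoticons":   len(re.findall(r"[:;]-?[)(DP]", tweet)),
    }
```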
Tweet-based: Emotion Prediction
6 classes: joy, sadness, fear, surprise, disgust, anger
F1 = 0.78 overall (Roberts'12: 0.67; Qadir'13: 0.53; Mohammad'14: 0.49)
Per-class F1 (higher is better): disgust 0.92, anger 0.80, joy 0.79,
fear 0.77, surprise 0.64, sadness 0.62
User-Based: Attribute Prediction
F(u) = { a0 if 1 / (1 + e^(−θ · f)) ≥ 0.5; a1 otherwise }
[Bar chart: ROC AUC per attribute for emotion/sentiment features
(EmoSentOut, EmoSentDiff) vs. those plus lexical features, e.g.:
Gender 0.95 => 0.97, Education 0.77 => 0.88, Income 0.75 => 0.85,
Life Satisfaction 0.73 => 0.84, Optimism 0.72 => 0.83,
Intelligence 0.72 => 0.83, Age 0.72 => 0.83, Political 0.72 => 0.82,
Children 0.66 => 0.80, Religion 0.63 => 0.74, Relationship 0.63 => 0.74]
[Second chart: ROC AUC gains over a BOW baseline per attribute
(+0.04 to +0.17). Heatmap: column z-scores across attributes
(life satisfaction, optimism, relationship, religion, political,
children, age, gender, income, education, race/ethnicity, intelligence)]
Predicting Demographics from User Outgoing Emotions and Opinions
[Heatmap: emotion and opinion features (neutral, positive, negative,
joy, sadness, fear, surprise, anger, disgust, emotion score, sentiment
score) vs. predicted attribute classes, e.g., Male, Female, No Kids,
Below 25 y.o., Satisfied, Dissatisfied, Optimist, Pessimist]
1/3 of the attributes reach AUC >= 0.75
How to get JHU models and code?
https://bitbucket.org/svolkova/attribute
• Ex1: Train and test batch models
• Ex2: Train a model from a training file and save it
• Ex3: Predict an attribute using a pre-trained model and plot
iterative updates
• Ex4: Predict and plot iterative updates for multiple attributes
using pre-trained models from a single communication stream
• Ex5: Predict and plot iterative updates for multiple attributes
from multiple communication streams
Ex1. Train/Test Batch Models
• Run as shown in the repo, e.g., for gender
• Customize features and model type/parameters
Ex2. Save Pre-trained Models
• Run as shown in the repo, e.g., for age
• Customize features (process.py), model type and
parameters (predict.py)
Part II. Practice Session
• Data and annotation schema description
– JHU: gender, age and political preferences
– MSR: emotions, opinions and psycho-demographics
• Python examples for static inference:
– User-based: psycho-demographics
– Tweet-based: emotions, opinions
• Python examples for streaming inference:
– Bayesian updates from multiple data streams
Recap: Iterative Bayesian Updates
Posterior updates over a tweet stream t1, t2, …, tk:
P_t1(R | T_t1) = 0.52, P_t2(R | T_t2) = 0.62, …,
P_tk−1(R | T_tk−1) = 0.65, P_tk(R | T_tk) = 0.77

P(a = R | T) = [∏_k P(t_k | a = R) · P(a = R)] /
[∏_k P(t_k | a = R) · P(a = R) + ∏_k P(t_k | a = D) · P(a = D)]
(posterior = per-tweet likelihoods × class prior, normalized over classes)
Ex3. Iterative Updates for a Single
Attribute from a Single Stream
Ex4. Iterative Updates for Multiple
Attributes from a Single Stream
Steps:
1. Loading Models
2. Processing data
3. Setting up train/test priors
4. Making Predictions
5. Plotting results
Joint User-Neighbor Streams
Neighbor types: friend, follower, @mention, reply, retweet, hashtag
Ex5. Iterative Updates for Multiple
Attributes from Joint Streams
Likelihood
Posterior
Questions?
http://www.cs.jhu.edu/~svitlana/
svitlana@jhu.edu
References: http://www.cs.jhu.edu/~svitlana/references.pdf
Slides: http://www.cs.jhu.edu/~svitlana/slides.pptx