A Systematic Framework for Sentiment Identification by Modeling User Social Effects

Kunpeng Zhang
Assistant Professor
Department of Information and Decision Sciences
University of Illinois at Chicago
kzhang6@uic.edu

Agenda
• Introduction
• Problem statement
• Methodology
• Experiments and results
• Conclusion and future work

A World-Class Education, A World-Class City

Co-authors
• Yi Yang, Ph.D. student at Northwestern University
• Aaron Sun, Research Scientist, Samsung Research America
• Hengchang Liu, Assistant Professor at University of Science and Technology of China

Introduction
• User-generated content on social media platforms
• Data analysis for intelligent marketing decisions
• Voice of consumers
  – Positive / negative aspects

Problem Statement
• Given a sentence (usually user-generated content on a social media platform, such as a comment on Facebook, a tweet on Twitter, or a review on Amazon.com), we classify it into one of three categories:
  – Positive: directly or indirectly praises something, e.g. “I love it! (^_^)”
  – Negative: directly or indirectly criticizes something, e.g. “We don’t like it at all.”
  – Objective: expresses no sentiment or states a fact, e.g.
    “Apple will release a new iPhone in the next two months.”

Previous Work
• Bag-of-words approaches
  – Collecting keywords [5, 7, 21, 26]
• Rule-based methods
  – From the perspective of language characteristics [6, 22]
• Machine-learning-based methods
  – Sentence-level and document-level [7, 8, 10, 29]
• However,
  – None of them considers user social effects…

Methodology
• Systematic framework
• Classification problem
• 4 major features:
  – Peer influence
  – User preference
  – User profile
  – Textual sentiment

Methodology 1 – User Preference (UserPref)
• User preference partly reflects user sentiment.
• Item-based collaborative filtering on the user-item matrix
  – Row: user (millions)
  – Column: brand (thousands)
  – The element $m_{ij}$ is 1 if user i “likes” brand j, otherwise 0

\[
\begin{pmatrix}
m_{11} & m_{12} & \cdots & m_{1n} \\
m_{21} & m_{22} & \cdots & m_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
m_{m1} & m_{m2} & \cdots & m_{mn}
\end{pmatrix}
\]

Note: “like” means liking a brand on Facebook, following a brand on Twitter, giving a high rating to a product on Amazon, etc.

Methodology 1 – User Preference (UserPref)
• Two important issues when using collaborative filtering
  – Data sparsity
    • Integrate multiple low-level items into fewer high-level items
      – “Mac” and “iPhone” → “Computer and Electronics”
  – Similarity calculation and preference prediction
    • Which similarity measure is better?
      – Cosine, Pearson correlation, Tanimoto coefficient, log-likelihood-based, Euclidean-distance-based
    • Weighted-sum strategy to approximate user preference

Methodology 2 – Peer Influence (PeerInf)
• Herding behavior in social psychology
  – We assume that if most of the previous comments in a discussion are positive, the next comment is likely to be positive as well, and similarly for the negative case.
  – We randomly pick 1,000 posts from 5 different Facebook pages and 1,000 discussion threads from 5 different airlines on the Flyertalk.com forum. The average number of comments per post and per thread is 794 and 32, respectively.
  – The sentiments are identified by the state-of-the-art textual algorithm.

Methodology 2 – Peer Influence
[Figure]

Methodology 2 – Peer Influence Modeling
[Figure]

Methodology 3 – User Profile (GenCat)
• On Facebook and Twitter, women are more positive than men, and fashion pages have a higher percentage of positive sentiment than politician pages.

Name (Topic)                 Gender   Positive ratio   Number of comments + tweets
Barack Obama (Politician)    M        0.61             6,837,096
                             F        0.69
Chicago Bulls (Sports)       M        0.68             462,092
                             F        0.79
DKNY (Fashion)               M        0.94             14,284
                             F        0.96

Methodology 4 – Textual Sentiment (TextSent)
• State-of-the-art textual sentiment identification algorithm
• Ensemble method integrating three individual algorithms
  – Semantic rules based on language characteristics
  – Numeric strength computing
  – Bag-of-words
• Accuracy: ~86%

Experiments and Results
• Data collection
  – Facebook: posts, comments, likes, user profiles
  – Twitter: tweets, followers, user profiles
  – Amazon: products and reviews
  – Flyertalk (airline discussion forum): discussions
• Data cleaning
  – Remove spam users

Experiments and Results
• The features of the learning model for the 4 datasets and their differences. Topic is modified based on the raw Facebook category. “×”: missing; “√”: present.
Data source   TextSent              UserPref                        PeerInf   GenCat: Gender   GenCat: Topic
Facebook      Comments              User-post likes on category     √         √                Predefined category
Twitter       Tweets                User-category following         √         √                Predefined category
Amazon        Product reviews       User-product rating             √         ×                Product category
Flyertalk     Airline discussions   ×                               √         ×                Airline types

Experiments and Results
• Similarity measure check
  – MAE and RMSE to compare the average estimation error between real and predicted preference
• Hadoop-based collaborative filtering implemented with Mahout
  – Takes 34 and 21 minutes to approximate user preferences for Facebook and Twitter, respectively
  – Cannot complete within 10 hours on a single CPU

Experiments and Results
• Facebook data
• Twitter data
• Amazon.com data

Experiments and Results
• Classification accuracy (SS: semantic + syntactic features used in [28])

Conclusion and Future Work
• We propose a systematic framework to identify social media sentiments by modeling user social effects: user preference, peer influence, user profile, and textual sentiment itself.
• However,
  – More networked data could be incorporated.
  – More efficient algorithms are needed to calculate user preference.

Thank you
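
The UserPref steps from Methodology 1 (rolling low-level items up into categories, computing item-item similarity, and approximating preference with a weighted sum) can be illustrated with a small pure-Python sketch. This is not the Mahout implementation used in the experiments; the function names (`roll_up`, `tanimoto`, `cosine`, `predict_preference`), the toy data, and the exact weighted-sum normalization are assumptions made for illustration.

```python
import math

def roll_up(likes, category_of):
    """Merge low-level items into higher-level categories to reduce sparsity.

    likes maps an item to the set of users who "like" it (the 1-entries of
    one matrix column). category_of is a hypothetical item -> category map.
    """
    merged = {}
    for item, fans in likes.items():
        merged.setdefault(category_of.get(item, item), set()).update(fans)
    return merged

def tanimoto(a, b):
    """Tanimoto coefficient of two binary columns given as sets of users."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def cosine(a, b):
    """Cosine similarity of two binary columns given as sets of users."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def predict_preference(likes, user, item, similarity=tanimoto):
    """Weighted-sum estimate of `user`'s preference for `item`.

    Each other item votes with the user's 0/1 value for it, weighted by
    its similarity to `item` (an assumed normalization).
    """
    num = 0.0
    den = 0.0
    for other, fans in likes.items():
        if other == item:
            continue
        s = similarity(likes[item], fans)
        num += s * (1.0 if user in fans else 0.0)
        den += abs(s)
    return num / den if den else 0.0

# Toy data: item -> users who like it (hypothetical).
likes = {
    "Mac": {"u1", "u2", "u3"},
    "iPhone": {"u1", "u2"},
    "iPad": {"u2", "u3", "u4"},
    "DKNY": {"u4"},
}
score = predict_preference(likes, "u4", "iPhone")
```

Swapping `similarity=cosine` into `predict_preference` is one way to compare measures on the same data, in the spirit of the similarity measure check on the Experiments and Results slide.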
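
The herding assumption behind PeerInf (Methodology 2) suggests a simple feature: the share of positive and negative comments posted earlier in the same discussion. The slides' own peer influence model is not recoverable from the text, so the sketch below is only one plausible encoding; the label strings and function names are assumptions.

```python
def peer_influence_features(prior_labels):
    """Return (positive share, negative share) of earlier comments.

    prior_labels is a list of "pos" / "neg" / "obj" labels, assumed to come
    from a textual sentiment classifier. An empty history yields (0.0, 0.0).
    """
    n = len(prior_labels)
    if n == 0:
        return 0.0, 0.0
    pos = sum(1 for label in prior_labels if label == "pos")
    neg = sum(1 for label in prior_labels if label == "neg")
    return pos / n, neg / n

def features_for_thread(labels):
    """One feature pair per comment, using only the comments before it."""
    return [peer_influence_features(labels[:i]) for i in range(len(labels))]

# Hypothetical thread, in posting order.
thread = ["pos", "neg", "obj", "pos"]
features = features_for_thread(thread)
```

Each pair can then be appended to a comment's feature vector alongside UserPref, GenCat, and TextSent.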
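
For reference, the two error measures named in the similarity measure check have the following standard definitions. The symbols are assumptions introduced here: $p_k$ is the real preference and $\hat{p}_k$ the predicted preference for the $k$-th of $N$ held-out user-item pairs.

```latex
\[
\mathrm{MAE} = \frac{1}{N} \sum_{k=1}^{N} \left| p_k - \hat{p}_k \right|,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left( p_k - \hat{p}_k \right)^2 }
\]
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both shows whether a similarity measure's error is spread evenly or concentrated in a few badly predicted pairs.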