A World-Class Education, A World-Class City

A Systematic Framework for Sentiment
Identification by Modeling User Social Effects
Kunpeng Zhang
Assistant Professor
Department of Information and Decision Sciences
University of Illinois at Chicago
kzhang6@uic.edu
Agenda
• Introduction
• Problem statement
• Methodology
• Experiments and results
• Conclusion and future work
Co-authors
• Yi Yang, Ph.D. student at Northwestern University
• Aaron Sun, Research Scientist, Samsung Research America
• Hengchang Liu, Assistant Professor at the University of Science and Technology of China
Introduction
• User-generated content on social media platforms
• Data analysis for intelligent marketing decisions
• Voice of consumers
– Positive / negative aspects
Problem Statement
• Given a sentence (usually user-generated content on social media platforms, such as comments on Facebook, tweets on Twitter, or reviews on Amazon.com), we classify it into one of three categories:
– Positive: directly or indirectly praises something, e.g. “I love it! (^_^)”
– Negative: directly or indirectly criticizes something, e.g. “We don’t like it at all.”
– Objective: expresses no sentiment or states a fact, e.g. “Apple will release a new iPhone in the next two months.”
Previous Work
• Bag-of-words approaches
– Collecting keywords [5, 7, 21, 26]
• Rule-based methods
– From the perspective of language characteristics [6, 22]
• Machine-learning-based methods
– Sentence-level and document-level [7, 8, 10, 29]
• However,
– None of them considers user social effects…
Methodology
• Systematic framework
• Classification problem
• Four major features:
– Peer influence
– User preference
– User profile
– Textual sentiment
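These four feature groups are combined in a single classification problem. A minimal sketch of how one comment might be turned into a numeric feature vector for an off-the-shelf classifier follows. The encodings are assumptions, not the framework's documented layout: three peer-influence shares, one scalar preference score, and one-hot gender, topic, and text-sentiment label. `SentimentFeatures`, `to_vector`, and the topic list are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical vocabularies used for one-hot encoding.
SENTIMENT_LABELS = ["positive", "negative", "objective"]
GENDERS = ["M", "F"]


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """Encode a categorical value as a one-hot list over a fixed vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


@dataclass
class SentimentFeatures:
    """Illustrative container for the four feature groups of one comment."""
    peer_influence: List[float]  # assumed: shares of earlier positive/negative/objective comments
    user_preference: float       # assumed: predicted preference of the author for the page's category
    gender: str                  # user profile (GenCat): "M" or "F"
    topic: str                   # user profile (GenCat): page topic
    text_sentiment: str          # label from the textual sentiment algorithm

    def to_vector(self, topics: List[str]) -> List[float]:
        """Concatenate all feature groups into one numeric vector."""
        return (
            list(self.peer_influence)
            + [self.user_preference]
            + one_hot(self.gender, GENDERS)
            + one_hot(self.topic, topics)
            + one_hot(self.text_sentiment, SENTIMENT_LABELS)
        )


topics = ["Politician", "Sports", "Fashion"]
features = SentimentFeatures([0.7, 0.2, 0.1], 0.8, "F", "Fashion", "positive")
vector = features.to_vector(topics)
print(len(vector))  # 3 + 1 + 2 + 3 + 3 = 12
```

Keeping each group as a separate, fixed-width slice makes it easy to drop a group (for example, gender on data sources that lack it) and compare accuracy with and without it.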
Methodology 1 – User Preference (UserPref)
• User preferences can, to some extent, reflect user sentiments.
• Item-based collaborative filtering on a user-item matrix
– Row: user (millions)
– Column: brand (thousands)
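– The element $m_{ij}$ is 1 if user $i$ “likes” brand $j$, and 0 otherwise:
$$
M = \begin{pmatrix}
m_{11} & m_{12} & \cdots & m_{1n} \\
m_{21} & m_{22} & \cdots & m_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
m_{m1} & m_{m2} & \cdots & m_{mn}
\end{pmatrix}
$$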
Note: a “like” means liking a brand on Facebook, following a brand on Twitter, giving a high rating to a product on Amazon, etc.
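To make the matrix definition concrete, here is a small sketch that builds the binary user-brand matrix from (user, brand) “like” records. The function name `build_like_matrix` and the input format are assumptions. The dense list of lists is only for illustration; with millions of users, a sparse representation would be needed.

```python
from typing import Iterable, List, Tuple


def build_like_matrix(
    likes: Iterable[Tuple[str, str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a binary user x brand matrix: entry (i, j) is 1 if user i likes brand j."""
    pairs = set(likes)  # repeated likes of the same brand count once
    users = sorted({user for user, _ in pairs})
    brands = sorted({brand for _, brand in pairs})
    row_of = {user: i for i, user in enumerate(users)}
    col_of = {brand: j for j, brand in enumerate(brands)}
    matrix = [[0] * len(brands) for _ in users]
    for user, brand in pairs:
        matrix[row_of[user]][col_of[brand]] = 1
    return users, brands, matrix


likes = [("alice", "Nike"), ("alice", "DKNY"), ("bob", "Nike"), ("bob", "Nike")]
users, brands, matrix = build_like_matrix(likes)
print(users)   # ['alice', 'bob']
print(brands)  # ['DKNY', 'Nike']
print(matrix)  # [[1, 1], [0, 1]]
```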
Methodology 1 – User Preference (UserPref)
• Two important issues in using collaborative filtering
– Data sparsity
• Integrate multiple low-level items into fewer high-level items
– “Mac” and “iPhone” → “Computer and Electronics”
– Similarity calculation and preference prediction
• Which similarity measure is better?
– Cosine, Pearson correlation, Tanimoto coefficient, log-likelihood-based, Euclidean distance-based
• A weighted-sum strategy approximates user preferences
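The roll-up of low-level items into categories, several of the listed similarity measures, and the weighted-sum prediction can be sketched on a toy binary matrix. This is a hedged illustration, not the Mahout-based implementation used in the experiments. The helper names are hypothetical. The Euclidean-based similarity uses the common 1 / (1 + d) form as an assumption, and log-likelihood similarity is omitted for brevity. The prediction uses the standard item-based weighted sum: pred(u, j) = Σ_{k≠j} sim(j, k) · m_uk / Σ_{k≠j} |sim(j, k)|.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Similarity = Callable[[Vector, Vector], float]


def roll_up(matrix: List[List[int]], items: List[str],
            item_to_category: Dict[str, str]) -> Tuple[List[str], List[List[int]]]:
    """Merge items into categories: a user likes a category if they like any item in it."""
    categories = sorted(set(item_to_category[item] for item in items))
    col_of = {c: j for j, c in enumerate(categories)}
    rolled = [[0] * len(categories) for _ in matrix]
    for i, row in enumerate(matrix):
        for item, value in zip(items, row):
            if value:
                rolled[i][col_of[item_to_category[item]]] = 1
    return categories, rolled


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norms if norms else 0.0


def pearson(a: Vector, b: Vector) -> float:
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var = sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var) if var else 0.0


def tanimoto(a: Vector, b: Vector) -> float:
    # For binary vectors this equals |A and B| / |A or B| (Jaccard).
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


def euclidean_similarity(a: Vector, b: Vector) -> float:
    # Assumed form: 1 / (1 + Euclidean distance).
    return 1.0 / (1.0 + math.dist(a, b))


def column(matrix: List[List[int]], j: int) -> List[int]:
    return [row[j] for row in matrix]


def predict(matrix: List[List[int]], user: int, item: int, sim: Similarity) -> float:
    """Item-based weighted sum over the user's entries for all other items."""
    target = column(matrix, item)
    num = den = 0.0
    for k in range(len(matrix[0])):
        if k == item:
            continue
        s = sim(target, column(matrix, k))
        num += s * matrix[user][k]
        den += abs(s)
    return num / den if den else 0.0


categories, rolled = roll_up(
    [[1, 1, 0], [0, 1, 1]],
    ["Mac", "iPhone", "Nike"],
    {"Mac": "Computer and Electronics", "iPhone": "Computer and Electronics", "Nike": "Sports"},
)
print(categories, rolled)

matrix = [[1, 1, 0],
          [1, 1, 1],
          [0, 0, 1]]
for name, sim in [("cosine", cosine), ("pearson", pearson),
                  ("tanimoto", tanimoto), ("euclidean", euclidean_similarity)]:
    print(name, round(predict(matrix, user=2, item=0, sim=sim), 3))
```

On this toy matrix, the measures disagree: for the user who likes only the third item, cosine predicts about 0.33 for the first item, Tanimoto 0.25, and Pearson a negative value. This spread is why the choice of similarity measure is checked empirically.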
Methodology 2 – Peer Influence (PeerInf)
• Herding behavior in social psychology.
– We assume that if most of the previous comments in a discussion are positive, the next comment is likely to be positive as well, and similarly for the negative case.
– We randomly pick 1,000 posts from 5 different Facebook pages and 1,000 discussion threads from 5 different airlines on the Flyertalk.com forum. The average number of comments per post and per thread is 794 and 32, respectively.
– The sentiments are identified by the state-of-the-art textual sentiment algorithm (TextSent).
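The herding assumption suggests a simple peer-influence signal: for each comment, the share of earlier comments in the same thread that are positive, negative, or objective. The following minimal sketch assumes comments are ordered by time and already labeled by a textual sentiment classifier. The function name and the use of all earlier comments (rather than a recent window) are assumptions; the peer-influence model used in the framework may differ.

```python
from collections import Counter
from typing import List, Sequence

LABELS = ("positive", "negative", "objective")


def peer_influence_features(thread: Sequence[str]) -> List[List[float]]:
    """For each comment t, return the share of comments 0..t-1 carrying each label."""
    counts: Counter = Counter()
    features = []
    for t, label in enumerate(thread):
        if t == 0:
            features.append([0.0] * len(LABELS))  # the first comment has no peers
        else:
            features.append([counts[lab] / t for lab in LABELS])
        counts[label] += 1  # update only after computing this comment's features
    return features


thread = ["positive", "positive", "negative", "positive"]
for label, share in zip(thread, peer_influence_features(thread)):
    print(label, [round(x, 2) for x in share])
```

Counts are updated after each comment's features are computed, so a comment never sees its own label.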
Methodology 2 – Peer Influence
Methodology 2 – Peer Influence Modeling
Methodology 3 – User Profile (GenCat)
• Female users are more positive than male users, and fashion pages have a higher percentage of positive sentiments than politician pages on Facebook and Twitter.
Name (Topic)               Gender  Positive ratio  Number of comments + tweets
Barack Obama (Politician)  M       0.61            6,837,096
                           F       0.69
Chicago Bulls (Sports)     M       0.68            462,092
                           F       0.79
DKNY (Fashion)             M       0.94            14,284
                           F       0.96
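A quick arithmetic check of the gender gap in the table: subtracting the male positive ratio from the female one for each page gives the differences below. The values are copied from the table; `POSITIVE_RATIO` is just an illustrative name.

```python
# Positive ratios copied from the table: page -> (male, female).
POSITIVE_RATIO = {
    "Barack Obama": (0.61, 0.69),
    "Chicago Bulls": (0.68, 0.79),
    "DKNY": (0.94, 0.96),
}

# Female minus male, rounded to the table's two-decimal precision.
gaps = {page: round(female - male, 2) for page, (male, female) in POSITIVE_RATIO.items()}
print(gaps)  # {'Barack Obama': 0.08, 'Chicago Bulls': 0.11, 'DKNY': 0.02}
```

Every gap is positive, consistent with the claim above. The gap is smallest on the fashion page, where both groups are already highly positive.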
Methodology 4 – Textual Sentiment (TextSent)
• State-of-the-art textual sentiment identification algorithm
• Ensemble method integrating three individual algorithms
– Semantic rules based on language characteristics
– Numeric strength computation
– Bag-of-words
• Accuracy: ~86%
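The slides do not say how the three algorithms are combined, so the following is only a toy sketch assuming a majority vote, with ties going to the first classifier in the list. The lexicon, the negation and “but” rules, and all function names are hypothetical stand-ins for the three components. They are not the algorithm behind the ~86% accuracy figure.

```python
import re
from collections import Counter
from typing import Callable, Iterator, List, Sequence

# Hypothetical toy lexicon: word -> signed strength (illustrative values only).
LEXICON = {"love": 3, "great": 2, "like": 1, "hate": -3, "awful": -3, "bad": -2}
NEGATIONS = {"not", "don't", "never", "no"}


def tokenize(text: str) -> List[str]:
    # Normalize curly apostrophes so "don’t" matches "don't".
    return re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))


def label_from_score(score: float) -> str:
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "objective"


def negated_scores(tokens: Sequence[str]) -> Iterator[int]:
    """Yield lexicon strengths, flipping the sign of the first sentiment word after a negation."""
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
        elif tok in LEXICON:
            yield -LEXICON[tok] if negate else LEXICON[tok]
            negate = False


def semantic_rules(text: str) -> str:
    """Toy language rules: in 'X but Y' only Y counts; negation flips polarity."""
    tokens = tokenize(text)
    if "but" in tokens:
        last_but = len(tokens) - 1 - tokens[::-1].index("but")
        tokens = tokens[last_but + 1:]
    return label_from_score(sum(1 if s > 0 else -1 for s in negated_scores(tokens)))


def numeric_strength(text: str) -> str:
    """Toy strength scoring: sum signed word strengths, with negation."""
    return label_from_score(sum(negated_scores(tokenize(text))))


def bag_of_words(text: str) -> str:
    """Toy bag-of-words: count positive vs. negative words, ignoring order and negation."""
    tokens = tokenize(text)
    pos = sum(1 for t in tokens if LEXICON.get(t, 0) > 0)
    neg = sum(1 for t in tokens if LEXICON.get(t, 0) < 0)
    return label_from_score(pos - neg)


def ensemble_predict(
    text: str,
    classifiers: Sequence[Callable[[str], str]] = (semantic_rules, numeric_strength, bag_of_words),
) -> str:
    """Majority vote; ties go to the earliest classifier in the list."""
    votes = [clf(text) for clf in classifiers]
    counts = Counter(votes)
    best = max(counts.values())
    return next(v for v in votes if counts[v] == best)


for sentence in ["I love it! (^_^)", "We don\u2019t like it at all.",
                 "Apple will release a new iPhone in the next two months."]:
    print(ensemble_predict(sentence), sentence)
```

On the second example sentence, the bag-of-words component alone is fooled by the negation and votes positive. The other two components vote negative, so the ensemble returns the negative label. This is the kind of complementarity an ensemble is meant to exploit.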
Experiments and Results
• Data collection
– Facebook: posts, comments, likes, user profiles
– Twitter: tweets, followers, user profiles
– Amazon: products and reviews
– Flyertalk (airline discussion forum): discussions
• Data cleaning
– Remove spam users
Experiments and Results
• Features of the learning model for the 4 datasets and their differences. Topic is adapted from the raw Facebook category. “×”: missing; “√”: available.
Data source  TextSent             UserPref                     PeerInf  GenCat: Gender  GenCat: Topic
Facebook     Comments             User-post likes on category  √        √               Predefined category
Twitter      Tweets               User-category following      √        √               Predefined category
Amazon       Product reviews      User-product rating          √        ×               Product category
Flyertalk    Airline discussions  ×                            √        ×               Airline types
Experiments and Results
• Similarity measure comparison
– MAE and RMSE measure the average error between actual and predicted preferences
• Hadoop-based collaborative filtering implemented with Apache Mahout
– Takes 34 and 21 minutes to approximate user preferences for Facebook and Twitter, respectively
– Cannot complete within 10 hours on a single CPU
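For reference, MAE = (1/N) Σ |r_i − p_i| and RMSE = sqrt((1/N) Σ (r_i − p_i)²), where r_i is an actual preference and p_i the predicted one. A minimal sketch follows; the held-out values are hypothetical, and the evaluation protocol behind the reported comparison is not specified in the slides.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between actual and predicted preferences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error; penalizes large errors more heavily than MAE."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [1, 0, 1, 1]             # held-out "likes" (hypothetical)
predicted = [0.5, 0.0, 1.0, 0.0]  # predictions from some similarity measure (hypothetical)
print(mae(actual, predicted))             # 0.375
print(round(rmse(actual, predicted), 4))  # 0.559
```

Reporting both is useful because a measure that is usually close but occasionally far off has a much larger RMSE than MAE.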
Experiments and Results
• Facebook data
• Twitter data
• Amazon.com data
Experiments and Results
• Classification accuracy (SS: semantic + syntactic features
used in [28])
Conclusion and Future Work
• We propose a systematic framework to identify
social media sentiments by modeling user social
effects: user preference, peer influence, user
profile, and textual sentiment itself.
• However,
– More networked data could be incorporated.
– More efficient algorithms could be developed to calculate user preferences.
Thank you