Visual Sentiment Analysis – UI Framework

VISA: A VIsual Sentiment Analysis System
Dongxu Duan1, Weihong Qian1, Shimei Pan2, Lei Shi3, Chuang Lin4
1 IBM Research - China
2 IBM T. J. Watson Research Center
3 Institute of Software, Chinese Academy of Sciences
4 Tsinghua University
Sept. 2012
What is Sentiment Analysis
• Sentiment analysis or opinion mining refers to the application
of natural language processing, computational linguistics,
and text analytics to identify and extract subjective information
in source materials. ---- From Wikipedia
• A 2008 survey of sentiment analysis work by Pang and Lee, “Opinion mining and sentiment analysis”, has been cited 1189 times on Google Scholar and includes 326 references
Probably the earliest study:
Motivation
The truth: sentiment analysis is becoming even more important
– Corporate
* Brand analysis, sales campaign design, etc.
* Crisis relationship management
– Government
• As we all know ..
Observations:
– Sentiment analysis technologies are becoming deeper and more versatile:
* Aspect-oriented analysis, domain-specific lexicon expansion, MT technology
– Average users still rely on rather simple sentiment results
* It’s hard for them (even domain experts) to understand sophisticated SA results
– There is a big gap and huge potential for sentiment visualization (visual opinion mining)
Agenda
• Related Works
• Research Problem and Challenges
• Sentiment-Tuple based Data Model
• VISA System Framework
• Visualization Optimizations
• Cases
• User Studies
• Summary
Basic Sentiment Representation
• Raw text/table or simple visualization
• COBRA (COrporate Brand and Reputation Analysis). Behal et al. (HCI 2009). Figure: Brand Association Map
• Opinion Observer. Liu et al. (KDD 2005); Liu et al. (IW3C2 2005)
• Visual Sentiment Analysis of RSS News Feeds. Wanner et al. (VISSW 2009)
• Pulse: Mining Customer Opinions from Free Text. Gamon et al. (IDA 2005)
• Visualizing Sentiments in Financial Texts. Ahmad and Almas (IV 2005)
• Visual Analysis of Conflicting Opinions. Chen et al. (VAST 2006)
• Who Votes For What? A Visual Query Language for Opinion Data. Draper and Riesenfeld (Vis 2008)
• Visual Opinion Analysis of Customer Feedback Data. Oelke et al. (VAST 2009). Figures: summary report of printers; scatterplot of customer reviews on printers; circular correlation map
• OpinionSeer: Interactive Visualization of Hotel Customer Feedback. Wu et al. (InfoVis 2010)
• Taking the Pulse of the Web: Assessing Sentiment on Topics in Online Media. Brew et al. (WebSci 2010)
• Understanding Text Corpora with Multiple Facets. Shi et al. (VAST 2010)
Research Problem
• Can we design a sentiment visualization system that:
– Shows how the sentiment evolves over time (trend)
– Visualizes both the sentiment analysis results and the structured facet data, e.g. the profile of the reviewer (facet)
– Rather than only showing which document or feature tends to be positive or negative, also demonstrates how the positives/negatives are described in documents (context)
• Most existing sentiment visualizations fail to meet all the requirements simultaneously
– Our VISA design is based on the TIARA prototype, which already brings together most features (trend, context, facet switching)
Retrospect on TIARA Visualization
(Emergency Room Record)
Challenges for TIARA Sentiment Visualization
• Failure of the document trend visualization
– Binary/ternary/scored classification of document-level sentiments drops valuable details
– Example review fragment: “BUT: It has BED BUGS and they BITE me!!!”
Challenges for TIARA Sentiment Visualization
• Keyword Summarization
– The visualized content is keywords summarized from all the text, which does not reflect the sentiment-centric design
• Structured Facet
– Sentiment-aware facet associations and distributions
– Spatial (location) information
• Comparison
– Categorical, temporal comparison, and sentiment comparison
as well
• Compatibility with sentiment analysis engines
– Consumability of all kinds of sentiment analysis results
Sentiment Tuple
• {Aspect, feature, opinion, polarity}
– Aspect: a sub-topic shared by some documents
* e.g. in a hotel review: the room, the view, or the service
– Feature: the specific object the users are commenting on
* e.g. an entity, person, location, or abstract concept
– Opinion: a particular word or phrase describing a feature
– Polarity: the polarity of the opinion word/phrase in its context
{ “view”, + }
[Diagram: the Sentiment Analysis Model produces lists of "aspect: feature: opinion: polarity" tuples, which are then aggregated]
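The tuple definition and the aggregation step can be illustrated with a minimal Python sketch. The class name `SentimentTuple`, the `aggregate` helper, the +1/-1 polarity encoding, and the sample tuples are all assumptions made for illustration; the slides do not specify how VISA stores or aggregates tuples.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentTuple:
    """One {aspect, feature, opinion, polarity} tuple (hypothetical layout)."""
    aspect: str    # sub-topic, e.g. "room" or "service"
    feature: str   # specific object being commented on
    opinion: str   # word or phrase describing the feature
    polarity: int  # +1 positive, -1 negative (encoding is an assumption)


def aggregate(tuples):
    """Count tuples per (aspect, polarity) pair."""
    return Counter((t.aspect, t.polarity) for t in tuples)


# Hypothetical tuples, as a sentiment analysis model might emit them.
tuples = [
    SentimentTuple("room", "view", "great", +1),
    SentimentTuple("room", "bed", "bed bugs", -1),
    SentimentTuple("service", "staff", "friendly", +1),
    SentimentTuple("room", "view", "stunning", +1),
]
counts = aggregate(tuples)
print(counts[("room", +1)])  # 2
```

Counting per (aspect, polarity) is only one possible aggregation; grouping by feature or by time segment would follow the same pattern.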
Keyword Summarization (TIARA)
• kth document in the collection: a set of topic probabilities {…, P(Ti | Dk), …}
• A set of topics {T1, …, Ti, …, TN}: rank the topics to present the most valuable ones first
• A set of keywords {W1, …, Wj, …, WM}: a set of word probabilities {…, P(Wj | Ti), …}
• Select a keyword sub-set for each time segment for content summary: {…}t-1, {…, Wj, …}t, {…}t+1
VISA Sentiment Keyword Summarization
• kth document in the collection: a set of topic probabilities {…, P(Ti | Dk), …}
• Aspects/Hotels {C1, …, Ci, …, CN}: let the user select to compare aspects of a hotel or an aspect of several hotels
• A set of sentiment keywords (opinions/features) {W1, …, Wj, …, WM}: a set of word probabilities {…, P(Wj | Ti), …}
• Select a keyword sub-set for each time segment for sentiment summary: {…}t-1, {…, Wj, …}t, {…}t+1
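The per-segment keyword selection can be sketched as follows. This is a simplified stand-in, not the VISA or TIARA method: it ranks sentiment keywords by raw frequency within each time segment instead of using the word probabilities P(Wj | Ti). The function name `keywords_per_segment` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def keywords_per_segment(records, category, top_k=2):
    """Pick the top_k most frequent sentiment keywords per time segment.

    records: iterable of (segment, category, keyword) triples, where
    category is an aspect or a hotel chosen by the user.
    Returns {segment: [keyword, ...]} for the given category only.
    """
    counts = defaultdict(Counter)
    for segment, cat, keyword in records:
        if cat == category:
            counts[segment][keyword] += 1
    return {
        segment: [word for word, _ in counter.most_common(top_k)]
        for segment, counter in sorted(counts.items())
    }


# Hypothetical sentiment keywords extracted from reviews.
records = [
    ("2011-06", "room", "clean"),
    ("2011-06", "room", "clean"),
    ("2011-06", "room", "small"),
    ("2011-06", "service", "friendly"),
    ("2011-07", "room", "noisy"),
    ("2011-07", "room", "noisy"),
    ("2011-07", "room", "clean"),
]
summary = keywords_per_segment(records, "room", top_k=1)
print(summary)  # {'2011-06': ['clean'], '2011-07': ['noisy']}
```

Calling the function once per selected aspect or hotel gives the side-by-side keyword sets that a comparison view would need.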
VISA Mashup Visualization
Search
Filters
Sentiment Tuple Trend
Sentiment-Centric Document Ranking
Sentiment Snippets
Facet Correlations
VISA Sentiment Visualization Framework
• Offline:
– Document pre-processing
– Sentiment analysis
– Meta data parsing
– Indexing
• Online:
– Data Retrieval
– Visualization
– Interactions
Offline Analysis
Data Analysis Framework
• Input: Raw Data, Reader, Filter
• OpenNLP Extractor: Segment Extractor, Sentence Extractor, Text Extractor
• Dictionary: Sentiment, Entity Class, No/Not
• StatisticManager, Entity Policy
• Output: Sentiment Data (aspect: feature: opinion: polarity), Meta Data
• Index: IndexWriter
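The dictionary step, with its Sentiment and No/Not entries, suggests a lexicon lookup with negation handling. The sketch below is a minimal illustration under that assumption; the lexicon contents, the one-token negation window, and the function name `opinion_polarities` are hypothetical and not taken from the VISA implementation.

```python
# Hypothetical sentiment dictionary: opinion word -> polarity.
SENTIMENT_LEXICON = {
    "clean": 1, "friendly": 1, "great": 1,
    "dirty": -1, "noisy": -1, "rude": -1,
}
# Hypothetical negation dictionary (the "No/Not" entries).
NEGATIONS = {"no", "not", "never"}


def opinion_polarities(sentence):
    """Return (opinion_word, polarity) pairs found in one sentence.

    A negation word directly before an opinion word flips its polarity.
    """
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    results = []
    for i, token in enumerate(tokens):
        if token in SENTIMENT_LEXICON:
            polarity = SENTIMENT_LEXICON[token]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                polarity = -polarity
            results.append((token, polarity))
    return results


result = opinion_polarities("The room was not clean, but the staff were friendly.")
print(result)  # [('clean', -1), ('friendly', 1)]
```

A real pipeline would rely on proper sentence and token extraction (the slide names OpenNLP for this) rather than the naive whitespace split used here.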
Offline Analysis
• Input: Raw Data, Reader
• 3rd Party Sentiment Analysis Framework
• Output: Sentiment Data (aspect: feature: opinion: polarity), Meta Data
• Index: IndexWriter
Data Server
VISA
Hermes
HttpServlet
Query Parser
Data Adapter
Data Retrieval
Lucene Index
Sentiment Trend Optimizations
• Sentiment tuple based negative/positive/(neutral) trends
• Time-sensitive feature/opinion words
• Y axis: sentiment value (positive, negative)
• X axis: time
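A tuple-based trend can be sketched by counting positive and negative tuples per time bucket, which yields the two series drawn against the time axis. The bucket labels, the +1/-1 polarity encoding, and the function name `sentiment_trend` are assumptions for illustration only.

```python
from collections import defaultdict


def sentiment_trend(tuples):
    """Count positive and negative sentiment tuples per time bucket.

    tuples: iterable of (time_bucket, polarity) with polarity +1 or -1.
    Returns a list of (time_bucket, positive_count, negative_count),
    sorted by time bucket.
    """
    positive = defaultdict(int)
    negative = defaultdict(int)
    for bucket, polarity in tuples:
        if polarity > 0:
            positive[bucket] += 1
        elif polarity < 0:
            negative[bucket] += 1
    buckets = sorted(set(positive) | set(negative))
    return [(b, positive[b], negative[b]) for b in buckets]


# Hypothetical tuples reduced to (month, polarity).
trend = sentiment_trend([
    ("2012-01", +1), ("2012-01", -1), ("2012-01", +1), ("2012-02", -1),
])
print(trend)  # [('2012-01', 2, 1), ('2012-02', 0, 1)]
```

Because the counts come from individual tuples rather than one label per document, a mostly positive review with a single strong complaint still contributes to the negative series.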
Sentiment-Centric Interactions
Case Study ---- Summarizing Hotel Reviews
• Initial View
Case Study ---- Summarizing Hotel Reviews
• Switch to the “Family” type only (the user is traveling as this type)
Case Study ---- Summarizing Hotel Reviews
• Click on the “Free” sentiment word (want to enjoy the free time or free breakfast?)
• It’s a 30-minute distance from the harbor!
Case Study ---- Summarizing Hotel Reviews
• For two selected hotels
• Drill down to the “cleanliness” and “room” aspects
• Switch to the negative sentiments
Case Study ---- Summarizing Hotel Reviews
• Comparing the recent reviews
Case Study ---- NFL on Twitter
• Crawled tweets from Twitter on the topic of the National Football League (NFL), from 03/2011 to 08/2011 (when the famous lockout happened)
• 665,360 tweets from 307,973 users, with an average length of 16.8 words
• Tweet collection pre-processing:
– Classify into 5 content topics: “season play”, “player draft”, “lockout bad”, “lockout end” and “football return”
– Categorize according to the subject of the sentiments (32 NFL teams) by manually creating a relevant subject keyword list for each team (full name/nickname, city, stadium, head coach, owner and superstars)
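The subject categorization by manually created keyword lists can be sketched as a simple lookup. The keyword sets below are hypothetical examples for two of the 32 teams, and plain substring matching is an assumption; the slides do not describe the actual matching rules.

```python
# Hypothetical, hand-made keyword lists (name, city, stadium) per team.
TEAM_KEYWORDS = {
    "Green Bay Packers": {"packers", "green bay", "lambeau"},
    "Pittsburgh Steelers": {"steelers", "pittsburgh", "heinz field"},
}


def teams_mentioned(tweet):
    """Return the sorted list of teams whose keywords occur in the tweet."""
    text = tweet.lower()
    return sorted(
        team
        for team, keywords in TEAM_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    )


print(teams_mentioned("Steelers vote NO on the CBA"))  # ['Pittsburgh Steelers']
print(teams_mentioned("Lockout is over!"))             # []
```

Substring matching is crude and can produce false matches on ambiguous city names, so a production list would likely need word-boundary checks or disambiguation.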
Case Study ---- NFL on Twitter
• Overview of sentiments on content topics
– Reaches its peak in July, when the new CBA was signed
Case Study ---- NFL on Twitter
• Subject-comparing view on 4 NFL teams
– “Green Bay Packers”, “Pittsburgh Steelers”, “New York Jets”, “New England Patriots”
– A very large red “CBA” for the Steelers: the only team to vote “NO” on the CBA
– “Brett Favre” for the Packers: the former all-star Packers quarterback, who has claimed several times that he would return; fans are tired of such news
User Study ---- Setup
• Subject systems
– VISA System with all functionalities
– TripAdvisor.com
– A plain text editor with search function
• Data
– HK hotel cases with 3 hotels’ reviews
– Both structured (ratings) and unstructured (review comments) data inputs
• User
– 12 users (7 male, 5 female), aged 26 to 35
– Each is given a gift as incentive
• Task
– T1: look up specific sentiment-related information of a hotel (e.g. travelers’ ratings)
– T2: summarize opinions on a general aspect of a hotel (e.g. the view of a hotel)
• Procedure
– Within-subject design: each user performs all tasks with all the systems
– Record user demographics, completion time, satisfaction, and answers to open-ended questions
[Screenshots: TripAdvisor, Text Editor, VISA]
User Study ---- Objective Results
• Three metrics: elapsed time (in minutes), task completion rate and task correctness.
• Significant advantages of VISA over the compared systems (t-test significance p < 0.004~0.034)

System        Time (min)   Completion   Correctness
VISA          1.66         1            0.75
TripAdvisor   2.94         0.81         0.42
TextEditor    2.69         0.86         0.67
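To put the reported mean times in perspective, the relative time saving of VISA over each baseline can be worked out directly from the figures above. This is plain arithmetic on the reported means; it does not reproduce the t-test, since per-user measurements are not given in the slides. The helper name `relative_reduction` is hypothetical.

```python
# Mean task time in minutes, as reported in the objective results.
mean_time_min = {"VISA": 1.66, "TripAdvisor": 2.94, "TextEditor": 2.69}


def relative_reduction(baseline, system="VISA"):
    """Fraction of the baseline's mean time saved by the given system."""
    return (mean_time_min[baseline] - mean_time_min[system]) / mean_time_min[baseline]


for baseline in ("TripAdvisor", "TextEditor"):
    print(f"{baseline}: {relative_reduction(baseline):.1%}")
# TripAdvisor: 43.5%
# TextEditor: 38.3%
```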
User Study ---- Subjective Results
• Three metrics: usefulness, usability and satisfaction.

Subjective Evaluation Results
System        Usefulness   Usability   Satisfaction
VISA          4.58         4.08        4.29
TripAdvisor   2.46         2.67        2.38
TextEditor    2.5          2.33        2.17
User Study ---- Open Surveys
• Why users considered VISA better than the baseline systems:
– “mash-up visualizations” and “rich interactions”
– “Mash-up visualizations provide more information and it’s quite intuitive”, “rich interactions make it easy to search what I want to know”
• Suggested improvements to VISA: “it now needs some learning efforts to use VISA”, “It could introduce better UI design and richer interactions”
Summary
• We have presented the VISA system for general-purpose sentiment visualization
– The backend core is the new sentiment-tuple definition, together with the faceted data model
– In visualization, we introduce several critical optimizations over TIARA for sentiment visualization scenarios: sentiment-tuple based trending, sentiment keywords, comparison, sentiment in document context, interactions
– Evaluated with two real-life case studies
– Conducted a formal user study comparing VISA with two baseline systems, which demonstrated a clear advantage
Thank You