Opinion Observer: Analyzing and Comparing Opinions on the Web

Bing Liu, Minqing Hu, Junsheng Cheng
Paper Presentation: Vinay Goel
Introduction
• Web: an excellent source of consumer opinions
• Online customer reviews of products
• Useful information for customers and product manufacturers
• Novel framework for analyzing and comparing customer opinions
• Technique based on language pattern matching to extract product features
Opinion Observer
Technical Tasks
• Identify the product features on which customers have expressed opinions
• For each feature, identify whether the opinion is positive or negative
• Review Format (2): Pros, Cons and detailed review
• The paper proposes a technique to identify product features from the Pros and Cons in this format
Problem Statement
Let P = {P1, P2, …, Pn} be a set of products that the user is interested in
Each product Pi has a set of reviews Ri = {r1, r2, …, rk}
Each review rj is a sequence of sentences rj = {sj1, sj2, …, sjm}
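The definitions above map naturally onto a small data model. The following is a minimal sketch only; the class names `Product` and `Review`, the helper `total_sentences`, and the sample products are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Review:
    # A review r_j is a sequence of sentences s_j1 ... s_jm.
    sentences: List[str]


@dataclass
class Product:
    # A product P_i carries its own set of reviews R_i.
    name: str
    reviews: List[Review] = field(default_factory=list)


def total_sentences(products: List[Product]) -> int:
    """Count every sentence across all reviews of all products."""
    return sum(len(r.sentences) for p in products for r in p.reviews)


# Hypothetical sample data, reusing example sentences from the slides.
products = [
    Product("Camera A", [Review(["The battery life of this camera is too short."])]),
    Product("Camera B", [Review(["This camera is too large.",
                                 "The picture quality is good."])]),
]
```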
Product Feature
• A product feature f in rj is an attribute or component of the product that has been commented on in rj
• If f appears in rj, it is an explicit feature
  “The battery life of this camera is too short”
• If f does not appear in rj but is implied, it is an implicit feature
  “This camera is too large” (size)
Opinions and features
• Opinion segment of a feature
  A set of consecutive sentences that expresses a positive or negative opinion on f
  “The picture quality is good, but the battery life is short”
• Positive opinion set of a feature (Pset)
  The set of opinion segments of f, taken from all the reviews of the product, that express positive opinions about f
  The negative opinion set (Nset) is defined similarly
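As a rough illustration of the Pset and Nset definitions, the sketch below groups already-labeled opinion segments by feature and polarity. The tuple layout `(feature, polarity, text)` and the function name `build_opinion_sets` are assumptions made for this example; the slides do not prescribe a data format.

```python
from collections import defaultdict


def build_opinion_sets(labeled_segments):
    """Split opinion segments into per-feature positive and negative sets."""
    pset = defaultdict(list)  # feature -> positive opinion segments
    nset = defaultdict(list)  # feature -> negative opinion segments
    for feature, polarity, text in labeled_segments:
        if polarity == "positive":
            pset[feature].append(text)
        elif polarity == "negative":
            nset[feature].append(text)
        else:
            raise ValueError(f"unknown polarity: {polarity!r}")
    return dict(pset), dict(nset)


# Hypothetical labeled segments built from the example sentences in the slides.
segments = [
    ("picture quality", "positive", "The picture quality is good"),
    ("battery life", "negative", "the battery life is short"),
    ("battery life", "negative", "The battery life of this camera is too short"),
]
pset, nset = build_opinion_sets(segments)
```

Here the single sentence from the slide contributes one segment to the Pset of "picture quality" and one to the Nset of "battery life".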
Visualizing Opinion Comparison
Automated opinion analysis
Explicit and implicit features
Synonyms
Granularity of features
Extracting Product Features: Labeling
• Perform POS tagging and remove digits
  “<V>included<N>MB<V>is<Adj>stingy”
• Replace actual feature words with [feature]
  “<V>included<N>[feature]<V>is<Adj>stingy”
• Use n-grams to produce shorter segments
  “<V>included<N>[feature]<V>is”
  “<N>[feature]<V>is<Adj>stingy”
• Distinguish duplicate tags
  “<N1>[feature]<N2>usage”
• Perform word stemming
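The labeling steps above can be sketched on a segment that has already been POS-tagged. This is a minimal sketch under several assumptions: the input is a list of `(tag, word)` pairs, the original word behind "MB" is taken to be "16MB", every tag is numbered by its order of occurrence within the segment, and stemming is left out. All function names are hypothetical.

```python
import re


def remove_digits(tagged):
    """Strip digits from each word and drop words that become empty."""
    cleaned = [(tag, re.sub(r"\d+", "", word)) for tag, word in tagged]
    return [(tag, word) for tag, word in cleaned if word]


def mark_features(tagged, feature_words):
    """Replace manually labeled feature words with the [feature] placeholder."""
    return [(tag, "[feature]" if word in feature_words else word)
            for tag, word in tagged]


def ngrams(tagged, n=3):
    """Produce all windows of n consecutive items (the whole segment if shorter)."""
    if len(tagged) <= n:
        return [tagged]
    return [tagged[i:i + n] for i in range(len(tagged) - n + 1)]


def number_tags(segment):
    """Append an occurrence index to every tag so repeated tags stay distinct."""
    counts = {}
    numbered = []
    for tag, word in segment:
        counts[tag] = counts.get(tag, 0) + 1
        numbered.append((f"{tag}{counts[tag]}", word))
    return numbered


def render(segment):
    """Format a segment in the <Tag>word notation used on the slides."""
    return "".join(f"<{tag}>{word}" for tag, word in segment)


tagged = [("V", "included"), ("N", "16MB"), ("V", "is"), ("Adj", "stingy")]
marked = mark_features(remove_digits(tagged), {"MB"})
shorter = [render(s) for s in ngrams(marked, 3)]
numbered = render(number_tags([("N", "[feature]"), ("N", "usage")]))
```

With this input, `shorter` holds the two 3-gram segments shown on the slide, and `numbered` reproduces the `<N1>[feature]<N2>usage` form.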
Rule Generation
• Association rule mining
• Only rules that have [feature] on the right-hand side are needed (<N1>, <N2> --> [feature])
• Consider the sequence of items in the conditional part (left-hand side) of each rule
• Generate language patterns
  (<N1>[feature]<N2>)
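To make the rule-generation step more concrete, here is a simplified sketch that computes support and confidence for rules of the form X --> [feature]. It is not a full association rule miner such as Apriori: it brute-forces left-hand sides of at most two items, and the item representation, the thresholds, and the sample transactions are all assumptions chosen for illustration. Turning a surviving rule into an ordered language pattern would additionally require the item order from the original segments, which this sketch does not track.

```python
from itertools import combinations

FEATURE = "[feature]"


def mine_feature_rules(transactions, min_support=0.1, min_confidence=0.5, max_lhs=2):
    """Return (lhs, support, confidence) for rules lhs --> [feature]."""
    n = len(transactions)
    lhs_count = {}   # how often each candidate left-hand side occurs
    both_count = {}  # how often it occurs together with [feature]
    for items in transactions:
        item_set = set(items)
        others = sorted(item_set - {FEATURE})
        for size in range(1, max_lhs + 1):
            for lhs in combinations(others, size):
                lhs_count[lhs] = lhs_count.get(lhs, 0) + 1
                if FEATURE in item_set:
                    both_count[lhs] = both_count.get(lhs, 0) + 1
    rules = []
    for lhs, both in both_count.items():
        support = both / n
        confidence = both / lhs_count[lhs]
        if support >= min_support and confidence >= min_confidence:
            rules.append((lhs, support, confidence))
    # Highest confidence first, then highest support, then by item names.
    return sorted(rules, key=lambda r: (-r[2], -r[1], r[0]))


# Hypothetical transactions: each is the bag of items from one labeled segment.
transactions = [
    ["<N1>", "<N2>", FEATURE],
    ["<N1>", "<N2>", FEATURE],
    ["<N1>", "<Adj1>"],
    ["<V1>", "<Adj1>"],
]
rules = mine_feature_rules(transactions)
by_lhs = {lhs: (support, confidence) for lhs, support, confidence in rules}
```

In this toy data the rule `<N1>, <N2> --> [feature]` holds in every transaction that contains both tags, so it survives with full confidence, while `<Adj1>` never co-occurs with `[feature]` and yields no rule.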
Feature Refinement strategies
• There may be a more likely feature in the sentence segment that is not extracted by any pattern
  “slight hum from subwoofer when not in use”
• Frequent-Noun
  Only a noun replaces another noun
• Frequent-Term
  A term of any type may replace another
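One way to read the two refinement strategies is as a frequency-based replacement within a segment. The sketch below assumes a candidate feature has already been extracted, that the segment is available as `(tag, word)` pairs, and that `frequency` counts how often each word was extracted as a feature elsewhere. The function name, the tags, and the counts are hypothetical, and supposing that a pattern picked "hum" is an assumption made only for the example.

```python
def refine_feature(candidate, segment, frequency, nouns_only=True):
    """Swap the candidate for a more frequent term found in the same segment.

    nouns_only=True approximates Frequent-Noun (a noun may only replace a noun);
    nouns_only=False approximates Frequent-Term (any word type may replace).
    """
    candidate_tag = next((tag for tag, word in segment if word == candidate), None)
    best = candidate
    for tag, word in segment:
        if nouns_only and not (tag == "N" and candidate_tag == "N"):
            continue
        if frequency.get(word, 0) > frequency.get(best, 0):
            best = word
    return best


# Hypothetical tagging and counts for the example segment from the slide.
segment = [("Adj", "slight"), ("N", "hum"), ("Prep", "from"), ("N", "subwoofer"),
           ("Adv", "when"), ("Adv", "not"), ("Prep", "in"), ("N", "use")]
frequency = {"hum": 1, "subwoofer": 5}
refined = refine_feature("hum", segment, frequency)

# A second hypothetical case where the candidate is not a noun.
segment2 = [("Adj", "heavy"), ("N", "weight")]
frequency2 = {"heavy": 1, "weight": 4}
kept = refine_feature("heavy", segment2, frequency2, nouns_only=True)
swapped = refine_feature("heavy", segment2, frequency2, nouns_only=False)
```

The second case shows the difference between the strategies: the noun-only variant leaves an adjective candidate untouched, whereas the any-type variant replaces it with the more frequent term.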
Semi-Automated Tagging of Reviews
Extracting Reviews from Web Pages
Non-trivial task
MDR-2
The system finds patterns from a page containing reviews
The system uses these patterns to extract reviews from other pages of the site
System Architecture
Experimental Results
Time saved by semi-automatic tagging is around 45%
Synonyms are grouped using WordNet (52% recall and 100% precision)
Context-dependent synonyms are not handled
Conclusion
• Novel visual analysis system
• Supervised pattern discovery method
• Interactive correction of errors made by the automatic system
• Future work: improve techniques, study strength of opinions