Opinion Observer: Analyzing and Comparing Opinions on the Web
Bing Liu, Minqing Hu, Junsheng Cheng
Paper presentation: Vinay Goel

Introduction
- The Web is an excellent source of consumer opinions
  - Online customer reviews of products
  - Useful information for customers and product manufacturers
- Novel framework for analyzing and comparing customer opinions
- Technique based on language pattern matching to extract product features

Opinion Observer

Technical Tasks
- Identify the product features on which customers have expressed opinions
- For each feature, determine whether the opinion is positive or negative

Review Format (2): Pros, Cons and detailed review
- The paper proposes a technique to identify product features from the pros and cons in this format

Problem Statement
- Let P = {P1, P2, …, Pn} be a set of products that the user is interested in
- Each product Pi has a set of reviews Ri = {r1, r2, …, rk}
- Each review rj is a sequence of sentences rj = {sj1, sj2, …, sjm}

Product Feature
- A product feature f in rj is an attribute or component of the product that has been commented on in rj
- If f appears in rj, it is an explicit feature: “The battery life of this camera is too short”
- If f does not appear in rj but is implied, it is an implicit feature: “This camera is too large” (size)

Opinions and Features
- Opinion segment of a feature f: a set of consecutive sentences that expresses a positive or negative opinion on f
  “The picture quality is good, but the battery life is short”
- Positive opinion set of a feature (Pset): the set of opinion segments of f, taken from all the reviews of the product, that express positive opinions about f
- The negative opinion set (Nset) is defined similarly

Visualizing Opinion Comparison

Automated Opinion Analysis
- Explicit and implicit features
- Synonyms
- Granularity of features

Extracting Product Features

Labeling
- Perform POS tagging and remove digits: “<V>included<N>MB<V>is<Adj>stingy”
- Replace the actual feature words with [feature]: “<V>included<N>[feature]<V>is<Adj>stingy”
- Use n-grams to produce shorter segments:
  “<V>included<N>[feature]<V>is”
  “<N>[feature]<V>is<Adj>stingy”
- Distinguish duplicate tags: “<N1>[feature]<N2>usage”
- Perform word stemming

Rule Generation
- Association rule mining
- Only rules that have [feature] on the right-hand side are needed, e.g. <N1>, <N2> → [feature]
- Consider the sequence of items in the conditional part (left-hand side) of each rule
- Generate language patterns, e.g. <N1>[feature]<N2>

Feature Refinement Strategies
- There may be a more likely feature in the sentence segment that is not extracted by any pattern: “slight hum from subwoofer when not in use”
- Frequent-Noun: only a noun can replace another noun
- Frequent-Term: a term of any type can be the replacement

Semi-Automated Tagging of Reviews

Extracting Reviews from Web Pages
- A non-trivial task
- MDR-2
  - The system finds patterns in a page containing reviews
  - The system uses these patterns to extract reviews from other pages of the site

System Architecture

Experimental Results
- The time saved by semi-automatic tagging is around 45%
- Synonyms are grouped using WordNet (52% recall, 100% precision)
  - Context-dependent synonyms are not handled

Conclusion
- Novel visual analysis system
- Supervised pattern discovery method
- Interactive correction of errors made by the automatic system
- Improve the techniques; study the strength of opinions
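The labeling and rule-generation steps from the Extracting Product Features slides can be sketched roughly in Python as below. This is an illustrative sketch, not the authors' implementation, and it rests on several assumptions: segments arrive already POS-tagged as (tag, word) pairs (no particular tagger is implied), the tag names are made up, stemming is reduced to lowercasing, every 3-gram window is kept as a transaction, and the rule mining is a brute-force enumeration of rules X → [feature] with arbitrary min_support/min_conf values rather than a real Apriori implementation. The function names (label, ngrams, number_tags, mine_patterns) and all toy segments other than the slide's “included MB is stingy” and “usage” examples are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

FEATURE = "[feature]"


def label(tagged, feature_words):
    """Lowercase words, drop pure-digit tokens, replace feature words with [feature].

    `tagged` is a list of (tag, word) pairs from some POS tagger (assumed input).
    Lowercasing stands in for real word stemming in this sketch.
    """
    out = []
    for tag, word in tagged:
        if word.isdigit():
            continue
        w = word.lower()
        out.append((tag, FEATURE if w in feature_words else w))
    return out


def ngrams(tokens, n=3):
    """Split a labeled segment into all length-n windows (whole segment if shorter)."""
    if len(tokens) <= n:
        return [tokens]
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


def number_tags(gram):
    """Turn a window into an item sequence, numbering repeated tags (<N1>, <N2>)."""
    totals = Counter(tag for tag, _ in gram)
    seen = Counter()
    items = []
    for tag, word in gram:
        if totals[tag] > 1:
            seen[tag] += 1
            tag = f"{tag}{seen[tag]}"
        items.extend([f"<{tag}>", word])
    return items


def mine_patterns(transactions, min_support=2, min_conf=0.6):
    """Brute-force rules X -> [feature], then order each rule's items into a pattern."""
    count_x = Counter()            # transactions containing every item of X
    count_xf = Counter()           # ... that also contain [feature]
    orders = defaultdict(Counter)  # observed item orders for each left-hand side
    for seq in transactions:
        others = sorted(set(seq) - {FEATURE})
        has_feature = FEATURE in seq
        for k in range(1, len(others) + 1):
            for lhs in combinations(others, k):
                count_x[lhs] += 1
                if has_feature:
                    count_xf[lhs] += 1
                    keep = set(lhs) | {FEATURE}
                    # Record the order in which the rule's items occur in this segment.
                    orders[lhs]["".join(t for t in seq if t in keep)] += 1
    patterns = set()
    for lhs, support in count_xf.items():
        if support >= min_support and support / count_x[lhs] >= min_conf:
            # Use the most frequently observed ordering as the language pattern.
            patterns.add(orders[lhs].most_common(1)[0][0])
    return sorted(patterns)


# Toy training segments: (tagged words, known feature words). Hypothetical data.
segments = [
    ([("V", "included"), ("N", "MB"), ("V", "is"), ("Adj", "stingy")], {"mb"}),
    ([("N", "battery"), ("V", "is"), ("Adj", "short")], {"battery"}),
    ([("N", "screen"), ("N", "usage")], {"screen"}),
    ([("Det", "the"), ("N", "lens"), ("V", "is"), ("Adj", "sharp")], {"lens"}),
    ([("Pron", "it"), ("V", "takes"), ("Adj", "great"), ("N", "pictures")], {"pictures"}),
]
transactions = [
    number_tags(g) for tagged, feats in segments for g in ngrams(label(tagged, feats))
]
patterns = mine_patterns(transactions)
print("<N>[feature]<V>" in patterns)  # True
```

In this toy run the rule <N>, <V> → [feature] is supported by five windows, and four of them place the items in the order <N>[feature]<V>, so that ordering becomes the pattern. The single window containing <N1> and <N2> falls below min_support, which is why <N1>[feature]<N2> is not produced here even though it is a valid pattern shape. Enumerating every subset is only feasible because 3-gram windows contain at most a handful of items; a larger setting would call for a proper frequent-itemset algorithm.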