Opinion Mining: A Multifaceted Problem
Lei Zhang
University of Illinois at Chicago
Some slides are based on Prof. Bing Liu’s presentation

Introduction
- Most text information processing methods (e.g., web search, text mining) work with factual information but do not deal with opinion information.
- Opinion mining: the computational study of opinions and sentiments expressed in text.
- Why opinion mining now? Mainly because of the Web, where we can get huge volumes of opinionated text.

Why opinion mining is important
- Whenever we need to make a decision, we would like to hear others’ advice.
- In the past:
  - Individuals: friends or family.
  - Businesses: surveys and consultants.
- Word of mouth on the Web: people can express their opinions in reviews, forum discussions, blogs…

A popular problem
- Intellectually challenging, with major applications.
- A popular research topic in recent years in NLP (Natural Language Processing) and Web data mining.
- Many companies in the US are working on it.
- It touches every aspect of NLP and is well-scoped.
- Potentially it would be a major application for NLP.
- But this problem is NOT easy.

An example review
“I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …”
What do we see? Opinions, targets of opinions, and opinion holders.

Target entity
- Definition (entity): an entity e is a product, person, event or organization. e is represented as a hierarchy of components, sub-components, and so on.
- Each node represents a component and is associated with a set of attributes of the component.
- An opinion can be expressed on any node or attribute of the node.
- To simplify our discussion, we use the term features to represent both components and attributes.
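
The entity hierarchy defined above can be pictured as a small tree. The following is a minimal Python sketch and not part of the original slides: the class name `Component`, the helper `iter_features`, and the example iPhone hierarchy are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Component:
    """A node in the entity hierarchy: a component with its own attributes."""
    name: str
    attributes: List[str] = field(default_factory=list)
    subcomponents: List["Component"] = field(default_factory=list)


def iter_features(node: Component) -> Iterator[str]:
    """Yield every feature below `node`, treating components and attributes alike."""
    for attribute in node.attributes:
        yield attribute
    for sub in node.subcomponents:
        yield sub.name
        yield from iter_features(sub)


# Hypothetical hierarchy for the iPhone in the example review (an assumption).
iphone = Component(
    name="iPhone",
    attributes=["price", "voice quality"],
    subcomponents=[
        Component(name="touch screen"),
        Component(name="battery", attributes=["battery life"]),
    ],
)

features = list(iter_features(iphone))
```

Flattening the tree this way mirrors the simplification on the slide: once components and attributes are both called features, an opinion target is just an (entity, feature) pair.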

What is an opinion
An opinion is a quintuple (ej, fjk, soijkl, hi, tl), where
- ej is a target entity.
- fjk is a feature of the entity ej.
- soijkl is the sentiment value of the opinion of the opinion holder hi on feature fjk of entity ej at time tl. soijkl is +ve, -ve, or neu, or a more granular rating.
- hi is an opinion holder.
- tl is the time when the opinion is expressed.

Opinion mining objective
- Objective: given an opinionated document, discover all quintuples (ej, fjk, soijkl, hi, tl), i.e., mine the five corresponding pieces of information in each quintuple.
- Or, solve some simpler problems.
- With the quintuples, unstructured text becomes structured data:
  - Traditional data and visualization tools can be used to slice, dice and visualize the results in all kinds of ways.
  - This enables qualitative and quantitative analysis.

Sentiment classification: doc-level
- Classify a document (e.g., a review) based on the overall sentiment expressed by the opinion holder.
- Classes: positive or negative (and neutral).
- It assumes that each document focuses on a single entity and contains opinions from a single opinion holder.

Subjectivity analysis: sentence-level
Sentence-level sentiment analysis has two tasks:
- Subjectivity classification: subjective or objective.
  - Objective: e.g., “I bought an iPhone a few days ago.”
  - Subjective: e.g., “It is such a nice phone.”
- Sentiment classification: for subjective sentences or clauses, classify as positive or negative.
  - Positive: e.g., “It is such a nice phone.”
  - Negative: e.g., “The screen is bad.”

Feature-based sentiment analysis
- Sentiment classification at the document and sentence (or clause) levels is NOT sufficient: it does not tell what people like and/or dislike.
- A positive opinion on an entity does not mean that the opinion holder likes everything.
- A negative opinion on an entity does not mean that the opinion holder dislikes everything.

Feature-based opinion summary
“I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …”
…
Feature-based summary:
Feature1: Touch screen
  Positive: 212
    The touch screen was really cool.
    The touch screen was so easy to use and can do amazing things.
    …
  Negative: 6
    The screen is easily scratched.
    I have a lot of difficulty in removing finger marks from the touch screen.
    …
Feature2: battery life
  …

Feature-based opinion summary: visual comparison
[Figure: bar charts of positive (+) and negative (_) opinion counts per feature (Voice, Screen, Battery, Size, Weight): a summary of reviews of Cell Phone 1, and a comparison of reviews of Cell Phone 1 and Cell Phone 2.]

Opinion mining is challenging
“This past Saturday, I bought a Nokia phone and my girlfriend bought a Moto phone with Bluetooth. We called each other when we got home. The voice on my phone was not so clear, worse than my previous phone. The battery life was long. My girlfriend was quite happy with her phone. I wanted a phone with good sound quality. So my purchase was a real disappointment. I returned the phone yesterday.”

Opinion mining is a multifaceted problem
(ej, fjk, soijkl, hi, tl)
- ej, an entity: named entity extraction (more)
- fjk, a feature of ej: information extraction
- soijkl, the sentiment: sentiment determination
- hi, an opinion holder: information/data extraction
- tl, the time: data extraction
Related tasks:
- Co-reference resolution
- Relation extraction
- Synonym match (voice = sound quality)
- …

Entity extraction (competing entities)
- An entity can be a product, service, person, organization or event in an opinion document.
- “This past Saturday, I bought a Nokia phone and my girlfriend bought a Moto phone with Bluetooth.”
- Nokia and Moto (Motorola) are entities.

Why we need entity extraction
- Without knowing the entity, a piece of opinion has little value.
- Companies want to know the competitors in the market.
- This is the first step to understanding the competitive landscape from opinion documents.

Related work
Named entity recognition (NER)
- Aims to identify entities such as names of persons, organizations and locations in natural language text.
Our problem is similar to the NER problem, but with some differences:
1. Fine-grained entity classes (products, services) rather than coarse-grained entity classes (people, locations, organizations).
2. Only a specific type is wanted: e.g., a particular type of drug name.
3. Neologisms: e.g., “Sammy” (Sony), “SE” (Sony-Ericsson).
4. Feature sparseness (lack of contextual patterns).
5. Data noise (over-capitalization, under-capitalization).

NER methods: supervised learning methods
- The current dominant technique for addressing the NER problem:
  - Hidden Markov Models (HMM)
  - Maximum Entropy Models (ME)
  - Support Vector Machines (SVM)
  - Conditional Random Fields (CRF)
- Shortcomings: they rely on large sets of labeled examples. Labeling is labor-intensive and time-consuming.

NER methods: unsupervised learning methods
- Mainly clustering: gathering named entities from clustered groups based on the similarity of context.
- The techniques rely on lexical resources (e.g., WordNet), on lexical patterns and on statistics computed on a large unannotated corpus.
- Shortcomings: low precision and recall in the results.

NER methods: semi-supervised learning methods
- Show promise for identifying and labeling entities.
- Starting with a set of seed entities, semi-supervised methods use either class-specific patterns to populate an entity class or distributional similarity to find terms similar to the seeds.
- Specific methods:
  - Bootstrapping
  - Co-training
  - Distributional similarity

Set expansion problem
- To find competing entities, the extracted entities must be relevant, i.e., they must be of the same class/type as the user-provided entities.
- The user can only provide a few names because there are so many different brands and models.
- Our problem is actually a set expansion problem, which expands a set of given seed entities.

Set expansion problem
- Given a set Q of seed entities of a particular class C, and a set D of candidate entities, we wish to determine which of the entities in D belong to C. That is, we “grow” the class C based on the set of seed examples Q.
- This is a classification problem. However, in practice, the problem is often solved as a ranking problem.

Distributional similarity
- Distributional similarity is a classical method for the set expansion problem.
- It compares the distribution of the words surrounding a candidate entity with that of the seed entities, and then ranks the candidate entities by the similarity values.
- Our experiment shows this approach is inaccurate.

Positive and unlabeled learning model (PU learning model)
- A two-class classification model.
- Given a set P of positive examples of a particular class and a set U of unlabeled examples (containing hidden positive and negative cases), a classifier is built using P and U for classifying the data in U or future test cases.
- The set expansion problem can be mapped exactly onto PU learning.

S-EM algorithm
- S-EM is an algorithm under the PU learning model.
- It is based on Naïve Bayes classification and the Expectation-Maximization (EM) algorithm.
- The main idea of S-EM is to use the spy technique to identify some reliable negatives (RN) from the unlabeled set U, and then use an EM algorithm to learn from P, RN and U − RN.
- We use the classification score to rank entities.

S-EM algorithm (Liu et al., ICML 2002)

Our algorithm (Li, Zhang, et al., ACL 2010)
Given the positive set P and the unlabeled set U, S-EM produces a Bayesian classifier C, which is used to classify each vector u ∈ U and to assign a probability P(+|u) indicating the likelihood that u belongs to the positive class.

Entity ranking
- Rank candidate d: let Md be the median of {P(+|Vector 1), P(+|Vector 2), P(+|Vector 3), …, P(+|Vector n)}.
- The final score (fs) for d is defined as: fs(d) = Md * log(1 + n), where n is the frequency count of candidate entity d in the corpus.
- A high fs(d) implies a high likelihood that d is in the expanded entity set.
- Candidate entities with a higher median score and a higher frequency count in the corpus will be ranked high.

Feature extraction
“The voice on my phone was not so clear, worse than my previous phone. The battery life was long”

Feature indicators
(1) Dependency relation
- Opinion words modify object features, e.g., “This camera takes great pictures”.
- Exploit the dependency relations between opinion words and features to extract features.
- Given a set of seed opinion words (no feature input), we can extract features and also opinion words iteratively.

Extraction rules

Feature extraction
(2) Part-whole relation pattern
- A part-whole pattern indicates that one object is part of another object. It is a good indicator for features if the class concept word (the “whole” part) is known.
(3) “No” pattern
- A specific pattern for product reviews and forum posts. People often express their comments or opinions on features with this short pattern (e.g., “no noise”).

Feature ranking
- Rank extracted feature candidates by feature importance. If a feature candidate is correct and important, it should be ranked high. An unimportant feature or noise should be ranked low.
- Two major factors affect feature importance:
  - Feature relevance: how likely a feature candidate is to be a correct feature.
  - Feature frequency: a feature is important if it appears frequently in opinion documents.

HITS algorithm for feature relevance
- There is a mutual reinforcement relation between the feature indicators (opinion words, part-whole relations and “no” patterns) and features.
- If an adjective modifies many correct features, it is highly likely to be a good opinion word.
- Similarly, if a feature candidate can be extracted by many opinion words, part-whole patterns, or the “no” pattern, it is also highly likely to be a correct feature.
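
The relation described above between feature indicators and feature candidates can be sketched as a hub/authority iteration in the style of HITS. This is a minimal illustration, not the authors' implementation: the function name `hits_feature_scores`, the fixed iteration count, the L2 normalization, and the toy edge list are all assumptions. Indicators play the role of hubs and feature candidates the role of authorities.

```python
import math
from typing import Dict, List, Tuple


def hits_feature_scores(
    edges: List[Tuple[str, str]], iterations: int = 50
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub, authority) scores for a bipartite indicator -> candidate graph."""
    indicators = sorted({i for i, _ in edges})
    candidates = sorted({c for _, c in edges})
    hub = {i: 1.0 for i in indicators}
    auth = {c: 1.0 for c in candidates}

    for _ in range(iterations):
        # Authority update: a candidate scores high if strong indicators extract it.
        auth = {c: 0.0 for c in candidates}
        for i, c in edges:
            auth[c] += hub[i]
        # Hub update: an indicator scores high if it extracts strong candidates.
        new_hub = {i: 0.0 for i in indicators}
        for i, c in edges:
            new_hub[i] += auth[c]
        hub = new_hub
        # L2-normalize both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for key in scores:
                scores[key] /= norm
    return hub, auth


# Hypothetical extraction results: (indicator, feature candidate) pairs.
edges = [
    ("great", "screen"),
    ("great", "battery"),
    ("great", "thing"),
    ("part-whole", "screen"),
    ("part-whole", "battery"),
    ("clear", "voice"),
    ("no-pattern", "noise"),
]

hub, auth = hits_feature_scores(edges)
ranked = sorted(auth, key=auth.get, reverse=True)
```

In this toy graph, candidates reached by several indicators ("screen", "battery") outrank a candidate reached by only one of them ("thing"). Plain HITS also concentrates its scores on the largest connected group of nodes, so isolated pairs such as "clear"/"voice" end up with scores near zero.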
- The Web page ranking algorithm HITS is applicable.

Our algorithm (Zhang, et al., COLING 2010)
(1) Extract features by dependency relation, part-whole pattern, etc.
(2) Compute the feature score using HITS without considering frequency.
(3) Compute the final score, which takes feature frequency into account: S = S(f) * log(freq(f)), where freq(f) is the frequency count of feature f and S(f) is the authority score of feature f.

Identify opinion orientation
- For each feature, we identify the sentiment or opinion orientation expressed by a reviewer.
- Almost all approaches make use of opinion words and phrases (lexicon-based methods).
  - Some opinion words have context-independent orientations, e.g., “great”.
  - Some other opinion words have context-dependent orientations, e.g., “short”.
- There are many ways to use opinion words.
- Machine learning methods for sentiment classification at the sentence and clause levels are also applicable.

Aggregation of opinion words
- Input: a pair (f, s), where f is a feature and s is a sentence that contains f.
- Output: whether the opinion on f in s is positive, negative, or neutral.
- Two steps:
  - Step 1: split the sentence if needed based on BUT words (but, except that, etc.).
  - Step 2: work on the segment sf containing f. Let the set of opinion words in sf be w1, …, wn. Sum up their orientations (1, -1, 0), and assign the orientation to (f, s) accordingly.
- Step 2 can be changed to $\sum_{i=1}^{n} \frac{w_i.o}{d(w_i, f)}$ with better results, where wi.o is the opinion orientation of wi and d(wi, f) is the distance from f to wi.

Basic opinion rules (Liu, Ch. in NLP Handbook)
- Negation rules: a negation word or phrase usually reverses the opinion expressed in a sentence. Negation words include “no”, “not”, etc. E.g., “this cellphone is not good.”
- But-clause rules: a sentence containing “but” also needs special treatment. The opinions before “but” and after “but” are usually opposite to each other. Phrases such as “except that” and “except for” behave similarly. E.g., “I love Nicholas Cage but I really have no desire to see the Sorcerer’s Apprentice”
- More…

Two main types of opinion
- Direct opinions: direct sentiment expressions on some entity or feature, e.g., “the picture quality of this camera is great.”
- Comparative opinions: comparisons expressing similarities or differences between more than one entity or feature, usually stating an ordering or preference, e.g., “car x is cheaper than car y.”

Comparative opinions
Gradable comparisons:
- Non-equal gradable: relations of the type greater or less than, e.g., “optics of camera A is better than that of camera B”
- Equative: relations of the type equal to, e.g., “camera A and camera B both come in 7MP”
- Superlative: relations of the type greater or less than all others, e.g., “camera A is the cheapest camera available in market”

Mining comparative opinions (Jindal and Liu, SIGIR 2006; Ding, Liu, Zhang, KDD 2009)
Objective: given an opinionated document d, extract comparative opinions (O1, O2, F, po, h, t), where O1 and O2 are the object sets being compared based on their shared features F, po is the preferred object set of the opinion holder h, and t is the time when the comparative opinion is expressed.

Thank you