Nikolay Archak, Anindya Ghose, Panagiotis G. Ipeirotis
------------------------------------------------------------
Class Presentation By: Arunava Bhattacharya

INDEX
•Introduction
•Importance of Consumer Product Reviews
•Opinion Mining Problems
•Possible Solutions
•Background
•Proposed Model
•Proposed Algorithm
•Experimental Results
•Related Works

Importance of Consumer Product Reviews
Consumer product reviews have a significant impact on consumers’ buying decisions, and consumer-generated product information on the Internet attracts more product interest than vendor-supplied information.
Reasons:
•More user-oriented
•Evaluate the product from the user’s perspective
•Often considered trustworthy by customers

Opinion Mining Problems
•Earlier methods failed to achieve high accuracy.
Reasons:
•They were targeted primarily at evaluating the polarity of a review.
•Review sentiment was classified as positive or negative by looking for occurrences of specific sentiment phrases.

Possible Solutions
•Identify not only the opinions of customers but also the importance of those opinions.
•Reliably capture the pragmatic meaning of customer evaluations.
•E.g.: Is “good battery life” better than “nice battery life”?
•Follow a hedonic regression model, in which the weights of individual features determine the overall price of a product.

Background
Hedonic Regressions
•The hedonic model assumes that differentiated goods can be described by vectors of objectively measured features.
•It is designed to estimate the value that different product aspects contribute to a consumer’s utility.
•E.g., a backpacking tent can be decomposed into characteristics such as weight (w), capacity (c), and pole material (p), and the tent’s utility can be given by a function u(w, c, p, ...).
•Weakness: product features and their measurement scales must be identified manually.

Product Feature Identification
•Part-of-speech (POS) tagger: identifies whether a word is a noun or an adjective. Nouns and noun phrases are popular candidates for product features.
•Search for statistical patterns in the text (words and phrases that appear frequently in reviews).
•Hybrid model: a POS tagger is used as a preprocessing step before an association rule mining algorithm is applied to discover nouns and noun phrases.

Mining Consumer Opinions
•A feature mining technique is used to identify product features.
•Algorithms extract sentences that express positive or negative opinions about a product feature.
•A summary is produced from the discovered information.
Such techniques fail to capture the strength of the underlying evaluations.

Proposed Model
Identifying Customer Opinions
•Each of the n product features can be expressed by a noun chosen from the set of all nouns that appear in the reviews.
•Consumers typically use adjectives such as “bad”, “good”, and “amazing” to evaluate quality, so a syntactic dependency parser is used to identify the adjectives that evaluate each feature.
•The result is a set of pairs of product features and their evaluations. These pairs are referred to as opinion phrases.

Structuring the Opinion Phrase Space I
•Model (multi)sets of the n product features as elements of a vector space with basis $f_1, \dots, f_n$, called the feature space (F).
•Similarly, model evaluations as elements of a vector space with basis $e_1, \dots, e_m$, called the evaluation space (E).
•The review space (R) is the tensor product of the feature and evaluation spaces: $R = F \otimes E$.

Structuring the Opinion Phrase Space II
•The opinion phrases $f_i \otimes e_j$ form a basis of the review space, denoted V.
•The weight of opinion phrase phrase in review rev for product prod is:
$$w(phrase, rev, prod) = \frac{N(phrase, rev, prod) + s}{\sum_{y \in V} \left( N(y, rev, prod) + s \right)} \qquad (1)$$
where N(y, rev, prod) is the number of occurrences of opinion phrase y in review rev for product prod, and s is a smoothing constant.

Econometric Model of Product Reviews I
•Product demand can be modeled as a function of product characteristics and price:
$$\ln(D_{kt}) = a_k + \beta \ln(p_{kt}) + \epsilon_{kt} \qquad (2)$$
where $D_{kt}$ = demand for product k at time t, $p_{kt}$ = price of product k at time t, β = price elasticity, and $a_k$ = product-specific constant term.
•Drawback: the model cannot evaluate different product characteristics separately; it mixes all product features into the single term $a_k$.

Econometric Model of Product Reviews II
Solution:
•Replace $a_k$ with
$$a_k = \alpha + \Psi(W_{kt}) \qquad (3)$$
where α = time- and product-invariant constant, $W_{kt}$ = all opinions about product k available at time t (including all reviews posted before t), and ψ = bilinear form over features and evaluations (equivalently, a linear functional on the review space), so that
$$\Psi(W_{kt}) = \sum_{x \in V} \psi(x)\, w(x, reviews_t, product_k) = \sum_{i=1}^{n} \sum_{j=1}^{m} \psi(f_i \otimes e_j)\, w(f_i \otimes e_j, reviews_t, product_k)$$

Econometric Model of Product Reviews III
•Combining Equations 2 and 3 extends the linear model:
$$\ln(D_{kt}) = \alpha + \beta \ln(p_{kt}) + \Psi(W_{kt}) + \epsilon_{kt}$$
Drawback: the model has a large number of parameters (one value of ψ per opinion phrase, n × m in total) and requires a very large training set of product reviews to estimate.
Solution: reduce the model dimension by placing a rank constraint on the matrix ψ. In other words, ψ(x) is decomposed into the product of a feature component and an evaluation component, e.g.
$$\psi(\text{shots} \otimes \text{fantastic}) = \gamma(\text{shots})\, \delta(\text{fantastic})$$

Econometric Model of Product Reviews IV
•Using this rank-1 approximation of the tensor product functional, the model can be rewritten as:
$$\ln(D_{kt}) = \alpha + \beta \ln(p_{kt}) + \gamma^T W_{kt}\, \delta + \epsilon_{kt} \qquad (4)$$
where γ = vector of n elements containing the weight of each product feature, δ = vector of m elements containing the implicit score that each evaluation assigns to a product feature, and $W_{kt}$ = n × m matrix of opinion phrase weights.
•This decreases the total number of parameters (from n × m to n + m) but loses the linearity of the original model.
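To make Equations 1, 3, and 4 concrete, the following Python sketch computes the smoothed opinion phrase weights for one product and then evaluates the review term twice: with a full ψ matrix (Equation 3) and with the rank-1 form γᵀWδ (Equation 4). The feature and evaluation words, the counts, the smoothing constant s = 1, and the γ and δ values are all made up for illustration, and the function names are hypothetical rather than taken from the paper.

```python
def opinion_weights(counts, features, evaluations, s=1.0):
    """Eq. (1): w(f_i (x) e_j) = (N_ij + s) / sum over all phrases y in V of (N_y + s).

    counts maps (feature, evaluation) pairs to occurrence counts N in one
    product's reviews; phrases that never occur simply have count 0.
    Returns an n x m matrix W as a list of lists.
    """
    denom = sum(counts.get((f, e), 0) + s for f in features for e in evaluations)
    return [[(counts.get((f, e), 0) + s) / denom for e in evaluations] for f in features]


def full_score(psi, W):
    """Eq. (3): sum_i sum_j psi(f_i (x) e_j) * W[i][j]; psi has n * m free values."""
    return sum(p * w for p_row, w_row in zip(psi, W) for p, w in zip(p_row, w_row))


def rank1_score(gamma, W, delta):
    """Eq. (4) term gamma^T W delta, i.e. psi(f_i (x) e_j) = gamma_i * delta_j (n + m values)."""
    return sum(g * sum(w * d for w, d in zip(row, delta)) for g, row in zip(gamma, W))


# Hypothetical example: 2 features, 3 evaluations, smoothing constant s = 1.
features = ["battery", "screen"]
evaluations = ["great", "bad", "nice"]
counts = {("battery", "great"): 3, ("screen", "bad"): 1}
W = opinion_weights(counts, features, evaluations, s=1.0)

gamma = [0.5, 2.0]        # made-up feature weights
delta = [1.0, -1.0, 0.5]  # made-up evaluation scores
psi = [[g * d for d in delta] for g in gamma]  # rank-1 psi built from gamma and delta

print(W)
print(full_score(psi, W), rank1_score(gamma, W, delta))
```

Because ψ is built as the outer product of γ and δ, both scores agree; that equality is exactly what the rank constraint buys, at the cost of n + m instead of n × m free parameters.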
Proposed Algorithm
Algorithm:
•Based on the observation that if one of the vectors γ or δ is fixed, Equation 4 becomes linear in the other.
•Steps:
1. Set γ to a vector of initial feature weights.
2. Minimize the fit function by choosing the optimal evaluation weights (δ), assuming that the feature weights (γ) are fixed.
3. Minimize the fit function by choosing the optimal feature weights (γ), assuming that the evaluation weights (δ) are fixed.
4. Repeat steps 2 and 3 until the algorithm converges.

Experimental Evaluation
Data
•The data set covered the “Camera & Photo” (115 products) and “Audio & Video” (127 products) categories on Amazon.com.
•Each observation contains the collection date, the product ID, the price (with possible discounts), the suggested retail price, the sales rank of the product, and its rating.
•Amazon Web Services was also used to collect the full set of reviews for each product.
•Products in both categories had about 20 reviews on average.

Selecting Feature and Evaluation Words
•Steps:
1. Used a part-of-speech tagger to analyze the reviews and assign a part-of-speech tag to each word.
2. Selected a subset of approximately 30 nouns to use as product features. For example, in the “Camera & Photo” category the features included “battery/batteries”, “screen/lcd/display”, “software”, etc.
3. Used a syntactic dependency parser to extract the adjectives that evaluate the selected product features, and kept the 30 most frequent adjectives to form the evaluation space. Words like “amazing”, “bad”, and “great” appeared here.

Experimental Setup I
•Amazon.com reports the sales rank instead of product demand.
•The following Pareto relationship converts sales rank into product demand:
$$\ln(D) = a + b \ln(S) \qquad (5)$$
where D = unobserved product demand, S = its observed sales rank, and a > 0, b < 0 are industry-specific parameters.
•Include both the suggested retail price (P1) and the price on Amazon.com (P2), because both prices may influence product demand.
•Include the review rating variable (R).
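The alternating procedure on the Proposed Algorithm slide can be sketched as alternating least squares on Equation 4. This is a minimal, stdlib-only sketch under several assumptions that the slides do not state: the "fit function" is taken to be the sum of squared residuals, γ starts at all ones, α and β are re-estimated in both steps, and the example data are randomly generated. It ignores the sales-rank conversion and extra controls of the experimental setup, and all helper names are hypothetical. Note also that γ and δ are identified only up to scale (cγ and δ/c give the same fit), so only the products γ_i δ_j are meaningful to compare.

```python
import random


def solve_least_squares(X, y):
    """OLS via the normal equations (X^T X) b = X^T y, solved by Gaussian elimination."""
    p = len(X[0])
    A = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    c = [sum(row[a] * t for row, t in zip(X, y)) for a in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= f * A[col][k]
            c[r] -= f * c[col]
    b = [0.0] * p
    for r in reversed(range(p)):  # back substitution
        b[r] = (c[r] - sum(A[r][k] * b[k] for k in range(r + 1, p))) / A[r][r]
    return b


def predict(log_price, W, alpha, beta, gamma, delta):
    """Right-hand side of Eq. (4) without the error term."""
    bilinear = sum(g * W[i][j] * d for i, g in enumerate(gamma) for j, d in enumerate(delta))
    return alpha + beta * log_price + bilinear


def sse(obs, alpha, beta, gamma, delta):
    """Sum of squared residuals, assumed here to be the fit function."""
    return sum((y - predict(lp, W, alpha, beta, gamma, delta)) ** 2 for lp, W, y in obs)


def fit_rank1(obs, n, m, max_iter=200, tol=1e-12):
    """Alternating least squares; obs is a list of (ln p_kt, W_kt, ln D_kt)."""
    ys = [y for _, _, y in obs]
    gamma = [1.0] * n  # step 1: initial feature weights (all ones is an assumption)
    history = []
    for _ in range(max_iter):
        # Step 2: gamma fixed, so the model is linear in (alpha, beta, delta).
        X = [[1.0, lp] + [sum(gamma[i] * W[i][j] for i in range(n)) for j in range(m)]
             for lp, W, _ in obs]
        alpha, beta, *delta = solve_least_squares(X, ys)
        # Step 3: delta fixed, so the model is linear in (alpha, beta, gamma).
        X = [[1.0, lp] + [sum(W[i][j] * delta[j] for j in range(m)) for i in range(n)]
             for lp, W, _ in obs]
        alpha, beta, *gamma = solve_least_squares(X, ys)
        history.append(sse(obs, alpha, beta, gamma, delta))
        # Step 4: stop once a full iteration no longer improves the fit noticeably.
        if len(history) > 1 and history[-2] - history[-1] < tol:
            break
    return alpha, beta, gamma, delta, history


# Hypothetical synthetic data: 2 features, 3 evaluations, 40 observations.
rng = random.Random(0)
true = dict(alpha=1.0, beta=-1.5, gamma=[1.0, 2.0], delta=[0.5, -1.0, 0.25])
obs = []
for _ in range(40):
    lp = rng.uniform(0.0, 3.0)
    W = [[rng.random() for _ in range(3)] for _ in range(2)]
    obs.append((lp, W, predict(lp, W, **true) + rng.gauss(0.0, 0.01)))

alpha, beta, gamma, delta, history = fit_rank1(obs, n=2, m=3)
print(alpha, beta, gamma, delta, history[-1])
```

Because each half-step minimizes the squared error exactly over its own block of parameters (and the previous values remain feasible), the fit can never get worse from one iteration to the next. A production version would replace the hand-written solver with a library routine such as `numpy.linalg.lstsq`.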
Experimental Setup II
•Modify Equation 4 as follows:
$$\ln(S_{kt}) = \alpha + \beta_1 R_{kt} + \beta_2 \ln(P1_{kt}) + \beta_3 \ln(P2_{kt}) + \sum_{i=1}^{n} \sum_{j=1}^{m} W_{kt}^{ij} \gamma_i \delta_j + \epsilon_{kt} = \alpha + \beta^T y_{kt} + \gamma^T W_{kt}\, \delta + \epsilon_{kt} \qquad (6)$$
Here $W_{kt}$ is the review matrix, $W_{kt}^{ij}$ is calculated using Equation 1, $\beta = (\beta_1, \beta_2, \beta_3)$, and $y_{kt} = (R_{kt}, \ln(P1_{kt}), \ln(P2_{kt}))$.

Experimental Results I
•After the review matrix is obtained, the model can predict future sales.
•The model can identify the product feature weights and the evaluation scores associated with the adjectives, within the context of an electronic market.

Experimental Results II
•Feature and evaluation tables for “Camera & Photo”.
•An evaluation that increases sales appears with a negative score in the evaluation table, because the dependent variable is the Amazon.com sales rank, which falls as demand rises.

Experimental Results III
•Partial effects for the “Camera & Photo” product category.
•A negative sign implies a decrease in sales rank, which means higher sales.

Evaluation Conclusions
•The results show that the model can identify the features that are important to customers.
•Implicit evaluation scores can be derived for each adjective.
•Evaluations like “best camera”, “excellent camera”, and “perfect camera” have a negative effect on demand.
•Weak positive opinions such as “nice” and “decent” are also evaluated negatively.

Related Work
•The feature selection in this model is very close to the one presented by Hu and Liu (2004).
•Opinion strength analysis by Popescu and Etzioni (2005).
•Das and Chen (2006): an examination of Yahoo! message boards that combines economic methods with text mining.
•Ghose and Ipeirotis (2006): work on econometric analysis.

Thank You