Arunava_Presentation

Nikolay Archak,Anindya Ghose,Panagiotis G. Ipeirotis
------------------------------------------------------------
Class Presentation By: Arunava Bhattacharya
INDEX
•Introduction
•Importance of Consumer product reviews
•Opinion mining problems
•Possible Solutions
•Background
•Proposed Model
•Proposed Algorithm
•Experimental Results
•Related Works
Importance of consumer product reviews
Consumer product reviews have a significant impact on
consumer buying decisions, and consumer-generated
product information on the Internet attracts more product
interest than vendor information.
Reasons:
•More user oriented
•Evaluate the product from user’s perspective
•Often considered trustworthy by the customers
Opinion Mining Problems
•Earlier methods failed to achieve high accuracy
Reasons:
•They targeted primarily the polarity of the review.
•Review sentiments were classified as positive or negative
by looking for occurrences of specific sentiment
phrases.
Possible Solutions
•Identify not only the opinions of the customers but
also the importance of these opinions.
•Reliably capture the pragmatic meaning of the
customer evaluations.
•E.g.: Is “good battery life” better than “nice battery
life”?
•Follow a hedonic regression model in which the weights
of individual features determine the overall price of a
product.
Background
Hedonic Regressions
•The hedonic model assumes that differentiated
goods can be described by vectors of objectively
measured features.
•Designed to estimate the value that different
product aspects contribute to a consumer’s utility.
•A backpacking tent can be decomposed into
characteristics such as weight (w), capacity (c), and
pole material (p). Tent utility can be given by the
function u(w, c, p, ...).
•Weakness: product features and their measurement
scales must be identified manually.
Product Feature Identification
•Part-of-speech tagger: identifies whether a word is a noun
or an adjective. Nouns and noun phrases are popular
candidates for product features.
•Search for statistical patterns in the text (words
and phrases that appear frequently in the reviews).
•Hybrid model: a POS tagger is used as a
preprocessing step before applying an association rule
mining algorithm to discover nouns and noun
phrases.
Mining Consumer Opinions
•Feature mining technique is used to identify
product features.
•Algorithms extract sentences that give positive or
negative opinions for a product feature.
•A summary is produced using the discovered
information.
Such techniques fail to capture the strength of the
underlying evaluations.
Proposed Model
Identifying Customer Opinions
•Each of the n features can be expressed by a noun chosen
from the set of all nouns that appear in the reviews.
•Consumers typically use adjectives such as “bad”,
“good”, “amazing” to evaluate quality, so a
syntactic dependency parser is used to identify the
adjectives that evaluate each feature.
•The result is a set of pairs of product features and their
respective evaluations. These pairs are referred to as opinion
phrases.
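The pairing of features and evaluations can be sketched in Python. This is only an illustration under stated assumptions: the tokens are assumed to be already tagged, the tag names `ADJ` and `NOUN` and the function `extract_opinion_phrases` are hypothetical, and a simple adjacency rule (an adjective directly followed by a feature noun) stands in for the syntactic dependency parser mentioned above.

```python
def extract_opinion_phrases(tagged_tokens, features):
    """Return (feature, evaluation) pairs from a list of (word, tag) tokens.

    Adjacency heuristic, NOT a dependency parser: an adjective counts as an
    evaluation only when the very next token is one of the feature nouns.
    """
    pairs = []
    for (word, tag), (next_word, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        if tag == "ADJ" and next_tag == "NOUN" and next_word.lower() in features:
            pairs.append((next_word.lower(), word.lower()))
    return pairs

# Hypothetical pre-tagged review fragment: "Good battery but a bad screen"
tagged = [("Good", "ADJ"), ("battery", "NOUN"), ("but", "CONJ"),
          ("a", "DET"), ("bad", "ADJ"), ("screen", "NOUN")]
phrases = extract_opinion_phrases(tagged, {"battery", "screen"})
print(phrases)
```

A real dependency parser would also catch constructions such as “the battery is good”, which this adjacency rule misses.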
Structuring the opinion phrase space I
•Model multiple sets of n product features as
elements of a vector space with basis $f_1, \dots, f_n$. This is
called the feature space (F).
•Construct evaluations as a vector space with basis
$e_1, e_2, \dots, e_m$, called the evaluation space (E).
•The review space (R) is constructed as the tensor
product of the feature and evaluation spaces:
$$R = F \otimes E$$
Structuring the opinion phrase space II
•The set of opinion phrases $f_i \otimes e_j$ forms a basis of the review
space, denoted V.
•The weight of opinion phrase ‘phrase’ in review ‘rev’
for product ‘prod’ is given by:
$$w(\mathit{phrase}, \mathit{rev}, \mathit{prod}) = \frac{N(\mathit{phrase}, \mathit{rev}, \mathit{prod}) + s}{\sum_{y \in V} \left( N(y, \mathit{rev}, \mathit{prod}) + s \right)} \quad (1)$$
N(y, rev, prod) = number of occurrences of opinion
phrase y in review rev for product prod
s = smoothing constant
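Equation (1) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `opinion_weights`, the example counts, and the smoothing value `s=0.01` are assumptions chosen for the example.

```python
from itertools import product

def opinion_weights(counts, features, evaluations, s=0.01):
    """Smoothed, normalized weight of every opinion phrase in one review.

    counts maps (feature, evaluation) pairs to the number of times the
    phrase occurs in the review; s is the smoothing constant (assumed value).
    """
    basis = list(product(features, evaluations))  # the basis V of the review space
    total = sum(counts.get(phrase, 0) + s for phrase in basis)
    return {phrase: (counts.get(phrase, 0) + s) / total for phrase in basis}

# Hypothetical review mentioning "good battery" twice and "bad screen" once.
counts = {("battery", "good"): 2, ("screen", "bad"): 1}
weights = opinion_weights(counts, ["battery", "screen"], ["good", "bad"])
```

Because of the smoothing constant, phrases that never occur in the review still receive a small positive weight, and the weights over V sum to 1.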
Econometric model of product reviews I
•Product demand can be modeled as a function of
product characteristics and price:
$$\ln(D_{kt}) = a_k + \beta \ln(p_{kt}) + \varepsilon_{kt} \quad (2)$$
$D_{kt}$ = demand for product k at time t
$p_{kt}$ = price of product k at time t
$\beta$ = price elasticity
$a_k$ = product-specific constant term
•Drawback: it cannot evaluate different product
characteristics separately; it mixes all product features into the
single term $a_k$.
Econometric model of product reviews II
Solution:
•Replace $a_k = \alpha + \psi(W_{kt})$ (3)
where $\alpha$ = constant that does not vary across products or time
$W_{kt}$ = all opinions for product k available at
time t, including all reviews before t
$\psi$ = bilinear form of features and evaluations
$$\psi(W_{kt}) = \sum_{x \in V} \psi(x)\, w(x, \mathit{reviews}_t, \mathit{product}_k) = \sum_{i=1}^{n} \sum_{j=1}^{m} \psi(f_i \otimes e_j)\, w(f_i \otimes e_j, \mathit{reviews}_t, \mathit{product}_k)$$
Econometric model of product reviews III
•Using Equations 2 and 3 we can extend the linear model:
$$\ln(D_{kt}) = \alpha + \beta \ln(p_{kt}) + \psi(W_{kt}) + \varepsilon_{kt}$$
Drawback: the model has a large number of parameters and requires a very
large training set of product reviews to estimate them.
Solution: reduce the model dimension by placing a rank
constraint on the matrix ψ. In other words, ψ(x) can be
decomposed as a product of a feature component and an
evaluation component:
$$\psi(\text{shots} \otimes \text{fantastic}) = \gamma(\text{shots})\, \delta(\text{fantastic})$$
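The effect of the rank constraint can be sketched in Python. The function names and the numeric weights below are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
def parameter_counts(n_features, n_evaluations):
    """Return (full, rank_one) parameter counts for the functional psi."""
    return n_features * n_evaluations, n_features + n_evaluations

def rank_one_psi(gamma, delta):
    """Build psi(feature, evaluation) = gamma(feature) * delta(evaluation)."""
    return {(f, e): g * d for f, g in gamma.items() for e, d in delta.items()}

# Hypothetical feature weights and evaluation scores.
gamma = {"shots": 2.0, "battery": 0.5}
delta = {"fantastic": 1.5, "bad": -1.0}
psi = rank_one_psi(gamma, delta)

full, reduced = parameter_counts(30, 30)
print(full, reduced)
```

With about 30 features and 30 evaluation words, an unconstrained ψ has one parameter per opinion phrase, while the rank-constrained form needs only one per feature plus one per evaluation word.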
Econometric model of product reviews IV
•Using the rank-1 approximation of the tensor
product functional, we can rewrite the extended model (Equations 2 and 3) as:
$$\ln(D_{kt}) = \alpha + \beta \ln(p_{kt}) + \gamma^{T} W_{kt}\, \delta + \varepsilon_{kt} \quad (4)$$
$\gamma$ = vector of n elements, the
weight of each product feature.
$\delta$ = vector of m elements, the implicit score that each
evaluation assigns to a product feature.
•This decreases the total number of parameters but loses
the linearity of the original model.
Proposed Algorithm
Algorithm:
•Based on the observation that if one of the vectors γ or δ
is fixed, the equation becomes linear in the other.
•Steps:
1. Set δ to a vector of initial evaluation scores.
2. Minimize the fit function by choosing the optimal
feature weights (γ), assuming that the evaluation
scores (δ) are fixed.
3. Minimize the fit function by choosing the optimal
evaluation scores (δ), assuming that the feature
weights (γ) are fixed.
4. Repeat steps 2 and 3 until the algorithm converges.
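The alternating steps above can be sketched in pure Python. This is a simplified illustration under stated assumptions, not the authors' implementation: it fits only the bilinear term (the intercept and price terms are omitted), uses squared error as the fit function, runs a fixed number of iterations instead of testing convergence, and uses synthetic data. All function names and numeric values are hypothetical.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) coef = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    Xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(XtX, Xty)

def bilinear(gamma, W, delta):
    """Evaluate gamma^T W delta for one review matrix W (n x m)."""
    return sum(gamma[i] * W[i][j] * delta[j]
               for i in range(len(gamma)) for j in range(len(delta)))

def sse(Ws, y, gamma, delta):
    """Sum of squared errors of the bilinear fit."""
    return sum((yi - bilinear(gamma, W, delta)) ** 2 for W, yi in zip(Ws, y))

def alternating_least_squares(Ws, y, n, m, iters=20):
    delta = [1.0] * m                      # step 1: initial evaluation scores
    gamma = [0.0] * n
    history = []
    for _ in range(iters):
        # step 2: delta fixed, so the model is linear in gamma (regress on W delta)
        X = [[sum(W[i][j] * delta[j] for j in range(m)) for i in range(n)] for W in Ws]
        gamma = least_squares(X, y)
        # step 3: gamma fixed, so the model is linear in delta (regress on W^T gamma)
        X = [[sum(gamma[i] * W[i][j] for i in range(n)) for j in range(m)] for W in Ws]
        delta = least_squares(X, y)
        history.append(sse(Ws, y, gamma, delta))
    return gamma, delta, history

# Synthetic data generated from hypothetical "true" weights plus small noise.
random.seed(0)
n, m, T = 3, 2, 40
true_gamma, true_delta = [1.0, -2.0, 0.5], [1.0, 3.0]
Ws = [[[random.random() for _ in range(m)] for _ in range(n)] for _ in range(T)]
y = [bilinear(true_gamma, W, true_delta) + random.gauss(0, 0.01) for W in Ws]
gamma, delta, history = alternating_least_squares(Ws, y, n, m)
```

Each half-step is an exact least-squares minimization over one block of parameters, so the squared error recorded in `history` cannot increase from one iteration to the next. Note that γ and δ are identified only up to a common scale factor, since multiplying γ by c and dividing δ by c leaves the fit unchanged.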
Experimental Evaluation
Data
• The data set covered “Camera & Photo” (115 products)
and “Audio & Video” (127 products) from Amazon.com.
•Each observation contains the collection date, the
product ID, the price (with possible discounts), the suggested
retail price, the sales rank of the product, and the rating.
•Amazon Web Services was also used to collect the full set
of reviews for each product.
•Products in both categories had about 20 reviews each on
average.
Selecting Feature and Evaluation Words
•Steps:
1. Used a part-of-speech tagger to analyze the reviews
and assign a part-of-speech tag to each word.
2. Selected a subset of approximately 30 nouns to use as
product features. For example, in the “Camera & Photo”
category the set of features included “battery/batteries”,
“screen/lcd/display”, “software”, etc.
3. Extracted the adjectives that evaluated the selected
product features using a syntactic dependency parser.
Kept the 30 most frequent adjectives to create the
evaluation space. Words like “amazing”, “bad”, “great”
appeared here.
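The frequency cut-off in step 3 can be sketched with the standard library. The helper name `most_frequent` and the sample word list are hypothetical; the sketch assumes the adjectives have already been extracted.

```python
from collections import Counter

def most_frequent(words, k=30):
    """Return the k most frequent words, ignoring case."""
    counts = Counter(word.lower() for word in words)
    return [word for word, _ in counts.most_common(k)]

# Hypothetical adjectives extracted from reviews.
adjectives = ["great", "bad", "great", "amazing", "Great", "bad"]
top = most_frequent(adjectives, k=2)
print(top)
```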
Experimental Setup I
•Amazon.com reports the sales rank instead of product
demand.
•Convert sales rank into product demand using the
following Pareto relationship:
$$\ln(D) = a + b \ln(S) \quad (5)$$
where D = unobserved product demand
S = its observed sales rank
a > 0, b < 0 are industry-specific parameters.
•Include both the suggested retail price (P1) and the
price on Amazon.com (P2), because prices influence
product demand.
•Include the review rating variable (R).
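The Pareto relationship in equation (5) can be sketched as a small Python function. The parameter values used here are hypothetical placeholders, since a and b are industry-specific and not given on the slides.

```python
import math

def demand_from_rank(sales_rank, a, b):
    """Demand implied by ln(D) = a + b * ln(S), with a > 0 and b < 0."""
    return math.exp(a + b * math.log(sales_rank))

# Hypothetical parameters: a = 2.0, b = -1.0.
top_seller = demand_from_rank(10, a=2.0, b=-1.0)
slow_seller = demand_from_rank(100, a=2.0, b=-1.0)
```

Because b is negative, a lower (better) sales rank maps to higher demand, which is why coefficient signs flip when sales rank replaces demand as the dependent variable.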
Experimental Setup II
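•Modify equation (4) as follows:
$$\ln(S_{kt}) = \alpha + \beta_1 R_{kt} + \beta_2 \ln(P1_{kt}) + \beta_3 \ln(P2_{kt}) + \sum_{i=1}^{n} \sum_{j=1}^{m} \gamma_i\, W_{kt}^{ij}\, \delta_j + \varepsilon_{kt}$$
$$= \alpha + \beta \cdot y_{kt} + \gamma^{T} W_{kt}\, \delta + \varepsilon_{kt} \quad (6)$$
Here $y_{kt}$ collects the control variables $R_{kt}$, $\ln(P1_{kt})$, and $\ln(P2_{kt})$;
$W_{kt}$ is the review matrix, and its entry $W_{kt}^{ij}$ is
calculated using equation (1).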
Experimental Results
•After obtaining the review matrix, this model can
predict future sales.
•This model can identify the product feature
weights and the evaluation scores associated with
the adjectives, within the context of an electronic
market.
Experimental Results
•Feature and Evaluation table for “Camera & Photo”
•An evaluation that increases sales has a negative score,
since sales rank on Amazon.com decreases as demand
increases.
Experimental Results
•Partial effects for the “Camera & Photo” product category.
•A negative sign implies a decrease in sales rank,
which means higher sales.
Evaluation Conclusions
•Results show that this model can identify the features
important to the customers.
• Implicit evaluation scores for each adjective can be
derived.
•Evaluations like “best camera”, “excellent camera”,
“perfect camera” have a negative effect on demand.
•Weak positive opinions like “nice” and “decent” are also
evaluated negatively.
Related Work
•The feature selection in this model is very close to
the one presented by Hu and Liu (2004).
•Opinion strength analysis by Popescu and
Etzioni (2005).
•Das and Chen's examination of Yahoo bulletin
boards, which combines economic methods with
text mining (2006).
•Ghose and Ipeirotis's work on econometric
analysis (2006).
Thank You