Product feature extraction from customer reviews is an important task in the field of opinion mining.
Extracted features make it possible to assess feature-based opinions written by customers who bought a particular product and expressed their satisfaction and criticism. This helps prospective customers and vendors learn the pros and cons of the product under consideration.
Because the opinion text is unstructured in most cases, it is necessary to formulate ways to extract product features that are both implicitly and explicitly expressed in plain text reviews. In this paper, a process is discussed in which words frequently associated with specific product features are identified with the help of a previously prepared corpus of product reviews.
The process involves finding the keywords or N-grams that frequently co-occur with specific product features. These frequent associations are then normalized within each product feature scope using the popular tf.idf metric. Two different classification techniques are applied to assign unclassified review lines to appropriate product feature classes using the discovered word associations. Results are then evaluated by comparison with human-identified product features.
Studying feature-level reviews of specific products prior to buying has become widely popular among customers. Both expert and relatively naïve customers look for feature-oriented opinions of other customers and their experiences with a purchased product. Due to the vast number of reviews and their unstructured presentation on the web, it is quite inconvenient and time consuming for customers to summarize opinions related to specific product features by reading plain text reviews one after another. In the opinion lines, some of these features are mentioned explicitly, such as “This product is very reasonably priced” indicating a ‘Price’ feature, whereas most features are only implicitly visible within the text. An example is “I do travel a lot so they get banged about.”, indicating a ‘Portability’ feature.
The task of processing product reviews is very challenging because, in most cases, the written reviews contain informal terms and expressions and, in some cases, grammatically incorrect sentences and misspelled words. This poses a potential threat to the training of a system. Sometimes very indicative terms for a product feature are used infrequently by a few expert users, which makes them difficult to identify. Also, some terms can be associated with more than one product feature in similar or different product domains. There are also sentences containing general comments or descriptions of events that do not relate to any specific product feature but only express an opinion about the product. These issues complicate the task of identifying product features in customer reviews.
There are many websites that publish reviews written by customers. Opinion sites such as Epinions and CNET, e-commerce sites such as Amazon, blogs, etc. are very popular sources of reviews in which customers express their opinions after buying or using a product. Most of these reviews are written in plain text; some sites let customers provide an overall rating or recommendation by means of a ‘thumbs up/thumbs down’ notation. However, these reviews do not contain formatted opinions indicating a positive or negative recommendation specific to individual product features. A system that can identify various implicit and explicit product features can therefore contribute significantly towards product feature based review and recommendation classification. The process proposed in this paper uses the frequent keywords and N-grams that are most likely to appear with specific product features.
Yi et al. [1],[5] worked on designing a sentiment analyzer that extracts topic-specific features. To select candidate feature terms, they used three term selection heuristics to extract noun phrases of specific patterns. They applied a mixture language model algorithm and a likelihood ratio algorithm on the extracted noun phrases to select the product features. Their approach is restricted to finding explicit product features only.
Hu and Liu [2] used association rule mining based on the Apriori algorithm to extract frequent phrases. To reduce the list of extracted phrases and identify potential product features, they applied compactness pruning to phrases of more than one word by determining whether or not the words appear together. For extracted single words, they applied redundancy pruning, where a candidate feature is pruned if it is a subset of another feature. They also tried to identify infrequent product features by examining the noun phrases near opinion words in review sentences where no frequent feature is present.
Popescu et al. [3] introduced an unsupervised information extraction system, OPINE, that extracts explicit product features by recursively identifying parts and properties of a given product. It extracts noun phrases and computes pointwise mutual information between the phrases and meronymy discriminators associated with the product classes. Their system distinguishes between parts and properties using WordNet’s IS-A hierarchy. Again, their approach is limited to extracting explicit product features.
Ghani et al. [4] approached explicit product feature extraction as a classification problem. They used Naive Bayes with a multi-view semi-supervised algorithm. In their process, the output of an unsupervised seed generation algorithm is combined with unlabeled data, which the semi-supervised algorithm uses to extract product attributes and values; these are later linked together by dependency information and a correlation score. For implicitly mentioned attributes, they built a labeled training corpus, trained the system, and then performed classification using a baseline, Naive Bayes, and Expectation-Maximization, an iterative statistical technique for maximum likelihood estimation.
The approach described in this paper differs from those mentioned above in that it uses statistical methods, counting the frequency of N-grams and then calculating a tf.idf weight score to assign a review line to a product feature. It does not extract noun phrases and apply pruning to reduce candidate features, nor does it try to identify part and property relations. Also, the described approach does not differentiate between explicit and implicit features and works equally for both.
Pang et al. [6] evaluated machine learning approaches for classifying documents by positive and negative sentiment. They applied Bayes’ rule to derive their Naive Bayes classifier. An adaptation of their Naive Bayes classifier is utilized and described later in this paper for assigning product features to reviews.
The described process is divided into three parts: corpus creation; normalized weight calculation to identify frequent N-grams associated with product features; and product feature identification using a classification scheme.
A corpus is created by obtaining reviews of a particular type of product from Amazon, using Amazon AWS. The review texts are then split into individual sentences, each indicating an individual product feature. If more than one product feature is present in a single sentence, the sentence is segmented accordingly. Complex and compound sentences are likewise segmented if they contain separate feature information. These units relating to specific product features are then tagged manually with feature titles. To avoid noisy information, general sentences that do not relate to any product feature are removed.
For this experiment, 120 reviews were collected by searching for the product ‘harddisk’ on amazon.co.uk. Among them, 100 random reviews are used for training, keeping the remaining 20 for testing.
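As an illustration, each tagged unit can be viewed as a (feature title, review segment) pair. The Python sketch below shows one hypothetical representation of such a corpus; the pairs are illustrative examples, not the actual training data.

    # Hypothetical shape of the manually tagged corpus: each segmented
    # review unit is paired with the product feature title assigned to it.
    tagged_corpus = [
        ("Price", "this product is very reasonably priced"),
        ("Portability", "i do travel a lot so they get banged about"),
        ("Working Smoothness", "the drive works with no problems"),
    ]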
Frequent N-grams are counted within the specific feature scopes obtained from different reviews. Both unigrams and bigrams are considered in the counting process. If all the words in an N-gram are function words, that N-gram is removed; this eliminates N-grams that are common to any text and carry no product feature specific terms. A sketch of this counting step is given below.
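In the following minimal sketch, the function-word list is a small illustrative subset, since the paper does not enumerate the list actually used.

    from collections import Counter, defaultdict

    # Illustrative subset of function words; the actual stop list is assumed.
    FUNCTION_WORDS = {"the", "a", "an", "of", "to", "is", "it", "so", "and", "do"}

    def ngrams(tokens, n):
        """All contiguous N-grams of length n, joined into strings."""
        return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def count_feature_ngrams(tagged_corpus):
        """Count unigrams and bigrams within each product feature scope,
        dropping N-grams made up entirely of function words."""
        counts = defaultdict(Counter)
        for feature, text in tagged_corpus:
            tokens = text.lower().split()
            for gram in ngrams(tokens, 1) + ngrams(tokens, 2):
                if all(word in FUNCTION_WORDS for word in gram.split()):
                    continue  # common to any text, carries no feature-specific term
                counts[feature][gram] += 1
        return counts

Note that a bigram such as "the drive" survives this filter because only one of its words is a function word; N-grams are removed only when every word is a function word.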
To normalize the influence of N-grams that are product specific but not feature specific, the tf.idf metric has been used.
If n_{i,j} is the number of occurrences of term t_i in document d_j containing k terms, |D| is the number of documents in the corpus, and |{d : t_i ∈ d}| is the number of documents in which term t_i appears, then the tf.idf weight is calculated by multiplying the term frequency tf_{i,j} with the inverse document frequency idf_i, where

    tf_{i,j} = n_{i,j} / Σ_k n_{k,j} ,    idf_i = log( |D| / |{d : t_i ∈ d}| )
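Under the mapping used here, each product feature scope plays the role of a document. A minimal sketch of the weighting, assuming the per-feature counts produced by the counting sketch above (the logarithm base is an implementation choice):

    import math
    from collections import Counter

    def tfidf_weights(counts):
        """counts: feature -> Counter of N-gram occurrences in that feature scope.
        Returns feature -> {N-gram: tf.idf weight}."""
        num_features = len(counts)      # |D|: number of tagged feature scopes
        df = Counter()                  # feature scopes in which each N-gram appears
        for grams in counts.values():
            for gram in grams:
                df[gram] += 1
        weights = {}
        for feature, grams in counts.items():
            total = sum(grams.values())  # total N-grams within this feature scope
            weights[feature] = {
                gram: (n / total) * math.log(num_features / df[gram])
                for gram, n in grams.items()
            }
        return weights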
To calculate the term frequency, the number of occurrences of an N-gram within a product feature scope is counted and divided by the total number of N-grams within that scope. The inverse document frequency is calculated by dividing the total number of unique product features previously tagged by the number of product features associated with the N-gram under consideration, and then taking the logarithm of the quotient. A sample of the identified N-grams associated with the top five popular product features from the training data set using the tf.idf weight score is shown in Table 1.
Table 1. Product feature associated N-grams

Product Features      Unigram     Bigram
Working Smoothness    drive       the drive
Working Smoothness    works       western digital
Working Smoothness    passport    no problems
Working Smoothness    problems    the same
Usability             use         to use
Usability             drive       easy to
Usability             passport    hard drive
Usability             easy        the drive
Outlook               drive       the drive
Outlook               just        of kit
Outlook               good        it looks
Outlook               looks       the same
Software Support      drive       the software
Software Support      software    the drive
Software Support      use         hard drive
Software Support      just        sync software
Accessories           usb         the usb
Accessories           power       western digital
Accessories           case        usb cable
Accessories           cable       power supply
Identifying product features in reviews has been treated as a document classification problem in which each review line is a document to be classified and the product features are the classes. Two classification schemes have been used. The first is Naive Bayes classification. According to Bayes’ rule,
    P(c | d) = P(c) P(d | c) / P(d)
Pang et al. [6] derived their NB classifier by rewriting Bayes’ rule as follows:

    P_NB(c | d) := P(c) ( ∏_{i=1}^{m} P(f_i | c)^{n_i(d)} ) / P(d)

where the probability of class c given document d is calculated by multiplying the probability of class c, P(c), by the probability of each feature f_i given class c, and dividing by the probability of document d, P(d). Their feature set is denoted f = {f_1, …, f_m}, and n_i(d) is the frequency of feature f_i in document d.
Adapting their NB classifier to classify review lines, with N-grams as the feature set, product features as the classes, and each review line as a document, the classifier can be rewritten as

    P_NB(c | d) := P(c) ( ∏_{i=1}^{m} P(f_i | c)^{w_i(c)} ) / P(d)

where {f_1, …, f_m} denotes the set of N-grams in a review line and w_i(c) is the tf.idf weight of N-gram f_i for product feature class c. Laplace smoothing is used to avoid obtaining zero as the result of the multiplication.
Because P(d) makes no contribution towards selecting a class, it has been ignored. The review line d is assigned to the product feature class c* where

    c* = argmax_c P(c | d)
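A sketch of this adapted classifier follows. The paper does not spell out how P(f_i | c) is estimated, so Laplace-smoothed relative counts are assumed here, and the product is computed in log space to avoid numerical underflow; both are implementation choices, numerically equivalent to multiplying the smoothed probabilities directly.

    import math

    def nb_classify(line_grams, priors, counts, weights):
        """line_grams: N-grams of the review line; priors: P(c) per class;
        counts: class -> Counter of training N-grams; weights: tf.idf weights
        as produced by tfidf_weights above."""
        vocab = {gram for grams in counts.values() for gram in grams}
        best_class, best_score = None, float("-inf")
        for c, prior in priors.items():
            total = sum(counts[c].values())
            score = math.log(prior)
            for gram in line_grams:
                # Laplace smoothing keeps unseen N-grams from zeroing the product
                p = (counts[c][gram] + 1) / (total + len(vocab))
                # w_i(c), the tf.idf weight, plays the role of the exponent;
                # a zero weight makes the factor P(f_i | c)^0 = 1 (log-term 0)
                score += weights[c].get(gram, 0.0) * math.log(p)
            if score > best_score:
                best_class, best_score = c, score
        return best_class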
In the second classification scheme, a summation based approach has been used with the tf.idf weights of the N-grams. A review line is assigned to the class c* where

    c* = argmax_c Σ_{i=1}^{m} w_i(c)

and w_i(c) is the tf.idf weight of N-gram f_i in the feature set for product feature class c.
Tests have been performed on the small set of 20 reviews kept aside for testing. Table 2 below shows the result of product feature identification using unigrams as the selected N-grams. Product features are sorted by their popularity in the training corpus. A few product features that were present in the training corpus but absent from the test dataset are omitted. NB denotes Naive Bayes classification and SB denotes Summation Based classification.
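The per-class scores in Tables 2 and 3 follow the standard precision, recall and F-measure definitions; a minimal sketch of their computation from parallel lists of gold and predicted labels is given below, with NA represented as NaN when a denominator is zero.

    def per_class_scores(gold, predicted, cls):
        """Precision, recall and F-measure for one product feature class."""
        tp = sum(1 for g, p in zip(gold, predicted) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(gold, predicted) if g != cls and p == cls)
        fn = sum(1 for g, p in zip(gold, predicted) if g == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else float("nan")
        recall = tp / (tp + fn) if tp + fn else float("nan")
        f = (2 * precision * recall / (precision + recall)
             if precision + recall else float("nan"))
        return precision, recall, f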
Table 2. Result of product feature identification using Unigram

                      Precision         Recall            F-measure
Product Features      NB      SB        NB      SB        NB      SB
Working Smoothness    0.4118  1.0000    0.2917  0.2857    0.3415  0.4444
Usability             0.1429  0.3333    0.0909  0.2500    0.1111  0.2857
Capacity              0.2000  1.0000    0.2500  0.6667    0.2222  0.8000
Outlook               0.2857  0.5000    0.3333  0.6667    0.3077  0.5714
Software Support      0.5000  0.8000    0.4167  0.6667    0.4545  0.7273
Accessories           0.7500  0.5385    0.2500  0.6364    0.3750  0.5833
Compatibility         0.4000  1.0000    0.2857  0.1429    0.3333  0.2500
Working Speed         1.0000  0.5000    1.0000  1.0000    1.0000  0.6667
Size                  0.1429  0.8000    0.2000  1.0000    0.1667  0.8889
Portability           1.0000  0.0000    0.3333  0.0000    0.5000  0.0000
Longevity             1.0000  NA        1.0000  0.0000    1.0000  NA
Customer Support      0.5000  1.0000    0.2000  0.1250    0.2857  0.2222
Noise                 0.4000  0.3750    0.5000  1.0000    0.4444  0.5455
Product Information   0.0000  0.2000    0.0000  0.5000    NA      0.2857
Average               0.4095  0.6462    0.2965  0.5671    0.3785  0.5593
Table 3. Result of product feature identification using Unigram+Bigram

                      Precision         Recall            F-measure
Product Features      NB      SB        NB      SB        NB      SB
Working Smoothness    0.7500  1.0000    0.1250  0.2273    0.2143  0.3704
Usability             1.0000  0.4286    0.2727  0.3000    0.4286  0.3529
Capacity              0.2000  1.0000    0.2500  0.7500    0.2222  0.8571
Outlook               0.6667  0.5000    0.3333  0.5000    0.4444  0.5000
Software Support      0.6667  0.8000    0.3333  0.6667    0.4444  0.7273
Accessories           0.0000  0.7778    0.0000  0.5833    NA      0.6667
Compatibility         1.0000  0.0000    0.1429  0.0000    0.2500  NA
Working Speed         1.0000  0.6250    0.6000  1.0000    0.7500  0.7692
Size                  0.4000  0.5000    0.4000  1.0000    0.4000  0.6667
Portability           1.0000  0.0000    0.3333  0.0000    0.5000  0.0000
Longevity             0.0000  NA        0.0000  0.0000    NA      NA
Customer Support      0.5000  1.0000    0.5000  0.1250    0.5000  0.2222
Noise                 0.0000  0.3750    0.0000  1.0000    0.0000  0.5455
Product Information   0.3333  0.2000    0.2000  0.5000    0.2500  0.2857
Average               0.4417  0.5469    0.2112  0.5823    0.3654  0.5100
Table 3 shows the result of product feature identification when both unigrams and bigrams are used as N-grams.
Table 4. Accuracy rate of classification

N-grams           Accuracy (%)      Not Classified (%)
                  NB      SB        NB      SB
Unigram           29.09   41.82     0       11.82
Unigram+Bigram    19.09   41.82     0       7.27
Table 4 gives the overall accuracy rate of identifying product features using the two classification schemes for the different lengths of N-grams tested.
From the test performed using only unigrams, Table 2 shows that the summation based classification scheme performed better than Naive Bayes classification in terms of both precision and recall, thus yielding a better F-measure score. Table 3 shows that when both unigrams and bigrams are used, Naive Bayes performed slightly better than the other scheme in precision for some of the popular product features, but for the less popular product features its precision remained very low. On average, the summation based classification scheme still achieved better results.
Using only bigrams was also tested but showed poor results, because the frequencies of the bigrams decreased notably and thus were no longer indicative of individual product features. A bigger training corpus might improve performance in this respect. The statistical independence assumed by Naive Bayes classification also deteriorates significantly with bigrams.
Using unigrams and bigrams together yielded a slightly better result than using only unigrams when the summation based classification technique was applied, as can be seen in the average recall score; average precision, however, is still higher when only unigrams are used.
Table 4 shows that the accuracy rate remained unchanged while the percentage of undetermined classes decreased. The summation based classification technique failed to assign a product feature to review lines whose N-grams were not present in the training corpus; increasing the size of the training corpus would reduce this problem. On the other hand, because of the default probability of finding a product feature class, the Naive Bayes classification scheme was always able to assign a class to a review line, even when the N-grams in the review line were absent from the training corpus. This increases the possibility of wrong classification when the Naive Bayes scheme is used.
The classification accuracy rates for both schemes are still very low. Because no lemmatization or stemming was used, counts for different surface forms of the same word were fragmented and thus contributed less towards indicating the product feature associated with a review line. The use of word synonyms might also improve accuracy, as more accurate and relevant frequency counts would then be possible.
The presence of these frequently associated N-grams alone might not be sufficient to identify product features in unstructured plain text reviews. More tests in varying domains and with bigger corpora are needed to improve performance. However, it is quite evident that such N-grams can contribute significantly towards identifying implicit and explicit product features. Future continuations of this work will involve fine tuning the association identification process, using synonyms of the found words, lemmatization and stemming, and applying other classification techniques to find an optimal solution for identifying product features.
[1] Yi, J., Nasukawa, T., Bunescu, R. and Niblack, W., “Sentiment Analyzer: Extracting Sentiments about a Given Topic using Natural Language Processing Techniques”, In Proceedings of the IEEE International Conference on Data Mining (ICDM), IEEE Computer Society, 2003, pp. 427-434.
[2] Hu, M. and Liu, B., “Mining Opinion Features in Customer Reviews”, In Proceedings of AAAI, AAAI Press, San Jose, USA, July 2004, pp. 755-760.
[3] Popescu, A.-M. and Etzioni, O., “Extracting Product Features and Opinions from Reviews”, In Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), Association for Computational Linguistics, Vancouver, British Columbia, Canada, 2005, pp. 339-346.
[4] Ghani, R., Probst, K., Liu, Y., Krema, M. and Fano, A., “Text Mining for Product Attribute Extraction”, SIGKDD Explorations Newsletter, 8(1), pp. 41-48, 2006.
[5] Yi, J. and Niblack, W., “Sentiment Mining in WebFountain”, In Proceedings of the International Conference on Data Engineering (ICDE), IEEE Computer Society, 2005, pp. 1073-1083.
[6] Pang, B., Lee, L. and Vaithyanathan, S., “Thumbs up? Sentiment Classification using Machine Learning Techniques”, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, 2002, pp. 79-86.