Extraction of Reasons for Products and Store Satisfaction and Dissatisfaction from
Anonymous BBS
Tomoaki Hirano, Takeshi Kaneko
Musashi Graduate School Of Technology
1-28-1 Tamazutumi Setagaya Tokyo Japan
g0791702@sc.musashi-tech.ac.jp
ABSTRACT
Today companies usually have customer service desks to receive opinions directly from consumers, but since more
than 90 percent of dissatisfied people stay silent, few consumers state their complaints directly.
The purpose of this research is to search anonymous bulletin board services (BBS) for text expressing dissatisfaction
and satisfaction and to extract the reasons for them. Extracting the reasons for dissatisfaction can show
product manufacturers and stores where they need to improve. Moreover, extracting the reasons for satisfaction will
assist product manufacturers and stores in differentiating themselves from other companies in the same industry.
The method searches for dissatisfaction or satisfaction phrases using “common expressions” as hints.
“Reason expressions” for the feelings are then extracted broadly from this large volume of phrases, which makes it
possible to determine whether each reason expression truly expresses satisfaction or dissatisfaction.
The results for dissatisfaction were not good, but the results for satisfaction were good.
Keywords: Marketing research, BBS, Correspondence analysis
Research Background
Nowadays, much information about the reputation of products and stores is available on BBS and blogs on the
Internet. Companies usually have customer service desks to receive opinions directly from consumers, but
since more than 90 percent of dissatisfied people stay silent, few consumers state their complaints
directly. So it is not surprising that much of the information truly necessary for marketing exists on BBS.
Much research has been done on analyzing reputation by extracting and analyzing evaluation information from text
about products or stores. Examples are given in Nozomi Kobayashi (2005) and Kenji Tateishi (2001). However,
knowing only whether an opinion is good or bad makes it difficult to utilize this information in marketing. In addition,
given the limited reliability of anonymous BBS, simply tabulating the number of good and bad comments yields little
of value.
Research Objective
The objective of this research is to search for text that expresses dissatisfaction or satisfaction and then extract the
reasons for it, in order to learn where product manufacturers and stores need to improve. Moreover, extracting the
reasons for satisfaction will assist product manufacturers and stores in differentiating themselves from other
companies in the same industry.
This research was limited to anonymous BBS because a sufficiently large number of comments is required. This
again raises the problem of reliability, but the results of this research are used as obtained. The aim of this
research is to assist with marketing.
Research Method
This research plan is explained using the following example sentences.
A) There was not enough rice with the curry. I was looking forward to it so I was disappointed.
B) I was looking forward to it so I was disappointed.
C) There was not enough rice with the curry.
All three of these sentences indicate dissatisfaction, but only A and B directly express dissatisfaction, and only
A and C contain the reason for the dissatisfaction. In practice, these three kinds of sentences are mixed together, and
sentence types A and B are easy to find. For example, “disappointed” in A and B shows dissatisfaction regardless of
domain or theme. Words indicating dissatisfaction like “disappointed” are called “common expressions” in this study.
Common expressions are divided into “common dissatisfaction expressions” and “common satisfaction
expressions”.
So when phrases meaning dissatisfaction or satisfaction are found using common expressions, the reasons are
extracted, and the extracted reasons are used to search for dissatisfied or satisfied phrases that do not contain common
expressions. Hereafter dissatisfaction and satisfaction are called “feelings”.
Actual sentences, however, are complicated, and it is difficult to determine which reasons to select from feeling phrases.
That is why this research uses a roundabout method.
Expressions that may be reasons for a feeling are extracted broadly from the sentences. These expressions are called
“reason expressions”. When many reason expressions are collected from dissatisfaction and satisfaction sentences, a
reason expression that appears mostly in dissatisfaction sentences can be judged to truly indicate dissatisfaction.
This research followed the steps given below.
Sample collection
Common expression collection
Dissatisfaction or satisfaction phrase search
Reason expression extraction
Reason expression organization
Reason expression identification
A simple explanation is given below.
First, sample BBS are collected from communities hosting many anonymous BBS. Then, sample sentences are
screened and some of the wording in the samples is modified.
Second, the two kinds of common expressions are collected from many anonymous BBS other than the samples.
Third, sentences containing common expressions are searched for and extracted. If such a sentence does not
contain a particular expression that signals a reason expression, or if the writer of the previous and
present sentences is the same, the previous sentence is also extracted.
Fourth, reason expressions are extracted broadly from the sentences by matching them with prepared models. Reason
expressions consist of at least two words. One is the target of the feeling and is called an “element”. The other is the
opinion regarding the element and is called an “evaluation”. Whether each reason expression occurs in a
dissatisfaction or satisfaction sentence is then recorded.
Fifth, the reason expressions are screened because the reason expression elements contain many terms that are not
the object of the feeling. The reason expressions are then organized while taking negative expressions into account.
Finally, the reason expressions are totaled, and correspondence analysis is used to determine which feeling each
reason expression indicates.
Next, the details of each process are explained.
Sample collection
Sample BBS should contain many statements related to feelings. The samples were taken from the
restaurant industry and, in particular, were limited to sushi bars for this research.
Chasen (2000) was used for the morphological analysis. Phrases without sentence structure, phrases with exactly the
same meaning as another, phrases referring to the BBS itself, and others were omitted from the sample sentences.
Further, words that have the same reading but different notations were consolidated if the
words are nouns and do not include different Chinese characters. In Japanese, homonyms containing different
Chinese characters often have different meanings.
In addition, compound nouns that were separated by the morphological analysis were rejoined. No strict rules were
applied; for example, consecutive words written in the Japanese syllabary, or consecutive common nouns, were joined.
Further, dependency analysis was conducted for the downstream steps. CaboCha (2001) was used for dependency analysis.
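The rejoining of compound nouns can be sketched as follows. This is only a minimal illustration, assuming that tokens arrive as (surface, part-of-speech) pairs with a coarse "noun" tag. The tag names and the function `join_compound_nouns` are hypothetical and do not reproduce the actual output format of the morphological analyzer.

```python
def join_compound_nouns(tokens):
    """Merge runs of consecutive noun tokens into a single compound noun.

    tokens: list of (surface, pos) pairs; "noun" is an assumed coarse tag.
    """
    merged = []
    for surface, pos in tokens:
        if pos == "noun" and merged and merged[-1][1] == "noun":
            # Extend the previous noun instead of starting a new token.
            previous_surface, _ = merged[-1]
            merged[-1] = (previous_surface + surface, "noun")
        else:
            merged.append((surface, pos))
    return merged


# Illustrative input: a compound noun that was split into two nouns.
tokens = [("回転", "noun"), ("寿司", "noun"), ("に", "particle"), ("行く", "verb")]
print(join_compound_nouns(tokens))
```

A loose rule like this joins more than a strict grammar would, which matches the intent of not applying strict rules at this stage.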
Common expression collection
Common dissatisfaction expressions and common satisfaction expressions were collected from many anonymous
BBS other than the samples.
The results were as follows.
[Common dissatisfaction expression] 26 words including “Disappointed”, “Terrible”, “Bad”
[Common satisfaction expression] 13 words including “Happy”, “Like it”, “Awesome”
Dissatisfaction or satisfaction phrase search
Sentences containing common expressions were searched for and extracted. What we want, however, is the reason
expressions, and these often exist in the previous sentence. If the sentence containing the common expression does not
contain a reason expression, or if it does not have a junction particle, the previous sentence was also extracted. This is
because the actual reason expression exists after the junction particle.
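The search step can be sketched roughly as follows. This is a simplified illustration on English text, not the procedure used on the Japanese samples: the word lists are small excerpts, the English words in `REASON_MARKERS` are assumed stand-ins for junction particles, and the function name is hypothetical.

```python
import re

# Small excerpts of the common expression lists (English glosses).
COMMON_DISSATISFACTION = {"disappointed", "terrible", "bad"}
COMMON_SATISFACTION = {"happy", "awesome"}
# Assumption: English stand-ins for junction particles that signal a reason.
REASON_MARKERS = {"because", "so"}


def search_feeling_phrases(sentences):
    """Return (feeling, extracted_sentences) for each sentence with a common expression."""
    results = []
    for i, sentence in enumerate(sentences):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if words & COMMON_DISSATISFACTION:
            feeling = "dissatisfaction"
        elif words & COMMON_SATISFACTION:
            feeling = "satisfaction"
        else:
            continue
        extracted = [sentence]
        # Without a reason marker, the reason is assumed to be in the previous sentence.
        if not (words & REASON_MARKERS) and i > 0:
            extracted.insert(0, sentences[i - 1])
        results.append((feeling, extracted))
    return results


post = [
    "There was not enough rice with the curry.",
    "I was disappointed.",
    "The tuna was fresh so I was happy.",
]
for feeling, extracted in search_feeling_phrases(post):
    print(feeling, extracted)
```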
Reason expression extraction
Reason expressions that might express the reason for a feeling were extracted broadly from the feeling sentences. Reason
expressions are composed of two words, where the first one is the object of the feeling and is called an element, and
the second one is the opinion regarding the element and is called the evaluation. Whether each reason expression
occurs in a dissatisfaction or satisfaction sentence is then recorded.
Elements are only nouns, and evaluations are nouns (adjectival verb stems), verbs, and adjectives. Reason expressions
always have an evaluation, but do not necessarily have an element. This is because elements are often omitted. In this
case, the element is recorded as “unclear”. If the evaluation has a dependency, there must be an element in front of the
dependency.
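The extraction rule can be sketched as follows. This is a minimal illustration under assumptions: each token is a dictionary with a surface form, a coarse part-of-speech tag, and the index of the token it depends on. The tag names, the `head` field, and the function name are hypothetical and are not the output format of the dependency analyzer.

```python
# Assumed coarse tags for words that can serve as an evaluation.
EVALUATION_POS = {"adjective", "verb", "adjectival_noun"}


def extract_reason_expressions(tokens, feeling):
    """Return (evaluation, element, feeling) triples for one feeling sentence.

    tokens: list of dicts with "surface", "pos" and "head", where "head" is the
    index of the token this token depends on (or None).
    """
    expressions = []
    for index, token in enumerate(tokens):
        if token["pos"] not in EVALUATION_POS:
            continue
        # Elements are nouns in front of the evaluation that depend on it.
        elements = [
            t["surface"]
            for t in tokens[:index]
            if t["pos"] == "noun" and t["head"] == index
        ]
        # An omitted element is recorded as "unclear".
        for element in elements or ["unclear"]:
            expressions.append((token["surface"], element, feeling))
    return expressions


sentence = [
    {"surface": "rice", "pos": "noun", "head": 1},
    {"surface": "little", "pos": "adjective", "head": None},
]
print(extract_reason_expressions(sentence, "dissatisfaction"))
```

Because every verb, adjective, and adjectival noun is taken as an evaluation, this kind of rule deliberately over-extracts, and the result is screened in the next step.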
Reason expression organization
Reason expressions have been acquired, but now they must be screened. Because reason expression elements were
extracted broadly, some of them are not the object of a feeling.
Therefore, reason expressions whose elements match words in the elimination list are omitted. The
elimination list contains words that often appear in BBS in general. These words are not suitable as targets of feelings.
Moreover, if a negative word appears after a common expression, the reason expression extracted from the sentence
containing that common expression is omitted as well. This is done because, for example, the sentence “she doesn’t
like him” does not necessarily mean “she hates him”. If the negative word appears after the evaluation, however,
dissatisfaction and satisfaction in the record are switched, so that dissatisfaction becomes satisfaction and vice versa.
For instance, it is not so unnatural to assume from the sentence “she doesn’t like how short he is” that “she likes tall
men”. This may be a little crude, but this processing was done to collect the reason expressions.
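The screening and negation handling can be sketched as follows. This is only an illustration: the entries of `ELIMINATION_LIST`, the record fields, and the function name are hypothetical, and detecting where a negative word appears is assumed to have been done already.

```python
# Hypothetical entries; the actual elimination list is not reproduced here.
ELIMINATION_LIST = {"thread", "post"}

FLIP = {"dissatisfaction": "satisfaction", "satisfaction": "dissatisfaction"}


def organize(records):
    """Screen reason expressions and apply the negation rules.

    records: list of dicts with "evaluation", "element", "feeling",
    "negated_common" (negative word after the common expression) and
    "negated_evaluation" (negative word after the evaluation).
    """
    kept = []
    for record in records:
        if record["element"] in ELIMINATION_LIST:
            continue  # not a suitable target of a feeling
        if record["negated_common"]:
            continue  # "doesn't like" does not necessarily mean "hates"
        feeling = record["feeling"]
        if record["negated_evaluation"]:
            feeling = FLIP[feeling]  # switch dissatisfaction and satisfaction
        kept.append((record["evaluation"], record["element"], feeling))
    return kept


records = [
    {"evaluation": "read", "element": "thread", "feeling": "satisfaction",
     "negated_common": False, "negated_evaluation": False},
    {"evaluation": "fresh", "element": "tuna", "feeling": "satisfaction",
     "negated_common": False, "negated_evaluation": True},
]
print(organize(records))
```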
Reason expression identification
The reason expressions were tabulated with the element and evaluation combinations placed in rows and the feelings
placed in columns to obtain a contingency table. Each cell of the contingency table is the number of times an element
and evaluation combination was recorded for dissatisfaction or satisfaction.
An example of the contingency table is shown in Table 1.
Table 1. Partial contingency table for Element and Evaluation combinations and Dissatisfaction or Satisfaction
Combination          Dissatisfaction  Satisfaction
think->anonymous     28               17
eat->anonymous       2                2
leave->anonymous     2                1
go->shop             3                5
bad->manners         1                1
big->sushi item      1                3
broil->salmon        0                2
delicious->sushiro   0                2
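The tabulation can be sketched as follows. This is a minimal illustration, assuming that each recorded reason expression is an (evaluation, element, feeling) tuple; the function name and the sample records are hypothetical and are not the counts obtained in this research.

```python
from collections import Counter


def build_contingency_table(records):
    """Count each evaluation->element combination per feeling.

    records: list of (evaluation, element, feeling) tuples.
    Returns {"evaluation->element": (dissatisfaction_count, satisfaction_count)}.
    """
    counts = Counter(records)
    combinations = sorted({(evaluation, element) for evaluation, element, _ in records})
    return {
        f"{evaluation}->{element}": (
            counts[(evaluation, element, "dissatisfaction")],
            counts[(evaluation, element, "satisfaction")],
        )
        for evaluation, element in combinations
    }


# Illustrative records only.
records = [
    ("bad", "manners", "dissatisfaction"),
    ("big", "sushi item", "satisfaction"),
    ("big", "sushi item", "satisfaction"),
    ("big", "sushi item", "dissatisfaction"),
]
for combination, row in build_contingency_table(records).items():
    print(combination, row)
```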
Whether each reason expression truly indicates dissatisfaction or satisfaction was determined using correspondence
analysis based on the contingency table.
The feeling was determined for each element and evaluation combination, rather than for each evaluation alone, for the
following reason. For example, one of the evaluations is “little”. This generally indicates dissatisfaction when it
refers to an amount, but in the case of fat content “little” generally indicates satisfaction. Therefore, it is
unsuitable to set the feeling for each evaluation alone.
As in the case of “dirty”, however, there are some evaluations whose feeling is almost always fixed within a
particular domain. The “unclear” element was prepared for this reason.
The results of the correspondence analysis are given in the next section.
Research Results
We will look at the reason elements determined to be dissatisfaction or satisfaction to understand the results of the
correspondence analysis.
Fig. 1. Correspondence analysis results
Correspondence analysis allows the optimum simultaneous plotting of row and column items. Correspondence
analysis deals with the data ratio rather than raw data, so the tendencies of each row item and each column item can
be understood without concern for the bias of the row or column.
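How correspondence analysis works on ratios can be sketched for the two-column case. When a table has only two columns, the analysis has a single dimension, and the coordinates can be written in closed form from the row profiles, so no matrix decomposition is needed. The code below is a minimal sketch under that assumption. It uses principal coordinates, which is an assumed scaling choice, and the counts and the function name are illustrative rather than taken from this research.

```python
import math


def correspondence_1d(table):
    """One-dimensional correspondence analysis of a two-column table.

    table: {label: (dissatisfaction_count, satisfaction_count)}; every row and
    both columns must have a positive total.
    Returns (row_coordinates, column_coordinates) in principal coordinates,
    oriented so that dissatisfaction is positive.
    """
    total = sum(d + s for d, s in table.values())
    mass_dis = sum(d for d, _ in table.values()) / total  # average dissatisfaction share
    mass_sat = 1.0 - mass_dis
    scale = math.sqrt(mass_dis * mass_sat)
    # Row coordinate: how far the row's dissatisfaction share is from the average share.
    rows = {label: (d / (d + s) - mass_dis) / scale for label, (d, s) in table.items()}
    # Singular value: square root of the mass-weighted variance of the row coordinates.
    sigma = math.sqrt(sum((d + s) / total * rows[label] ** 2 for label, (d, s) in table.items()))
    cols = {
        "dissatisfaction": math.sqrt(mass_sat / mass_dis) * sigma,
        "satisfaction": -math.sqrt(mass_dis / mass_sat) * sigma,
    }
    return rows, cols


# Illustrative counts only.
table = {"bad->treatment": (4, 0), "eat->sushi": (3, 3), "clean->appearance": (0, 3)}
rows, cols = correspondence_1d(table)
print(rows)
print(cols)
```

Only the share of dissatisfaction within each row enters the row coordinate, so a frequent combination and a rare one with the same share are plotted at the same position.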
Figure 1 is read as follows. The result is one-dimensional because there are only two columns,
but it is shown on a plane to make it easier to read. The vertical and the horizontal axes are the
same.
The numbers in the center of the two red circles represent the plots of the feelings, and the other numbers represent the
element and evaluation combinations. The origin represents the respective averages for row items and column items.
Dissatisfaction is at the upper right, and satisfaction is at the lower left. This means that the further to the upper right
an element and evaluation combination is, the higher the probability that the combination truly indicates dissatisfaction.
Conversely, the further to the lower left an element and evaluation combination is, the higher the probability that the
combination truly indicates satisfaction.
Since the combination numbers overlap in the graph, the results are shown separately below. There are 85
combinations plotted further to the upper right than dissatisfaction. These are judged to be dissatisfaction with a high
probability. There are 37 combinations plotted further to the lower left than satisfaction. These are judged to be
satisfaction with a high probability.
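The judgment rule can be sketched as follows. This is a minimal illustration, assuming that one-dimensional coordinates are already available and that dissatisfaction lies on the positive side. The coordinates, the label "undetermined", and the function name are hypothetical.

```python
def judge_by_position(row_coords, col_coords):
    """Judge each combination by its position relative to the two feelings.

    row_coords: {combination: coordinate}
    col_coords: {"dissatisfaction": coordinate, "satisfaction": coordinate},
    with dissatisfaction assumed to have the larger coordinate.
    """
    dissatisfaction = col_coords["dissatisfaction"]
    satisfaction = col_coords["satisfaction"]
    judged = {}
    for combination, coordinate in row_coords.items():
        if coordinate > dissatisfaction:
            judged[combination] = "dissatisfaction"  # beyond the dissatisfaction plot
        elif coordinate < satisfaction:
            judged[combination] = "satisfaction"  # beyond the satisfaction plot
        else:
            judged[combination] = "undetermined"  # between the two feelings
    return judged


# Illustrative coordinates only.
row_coords = {"bad->treatment": 0.93, "eat->sushi": -0.08, "clean->appearance": -1.08}
col_coords = {"dissatisfaction": 0.68, "satisfaction": -0.79}
print(judge_by_position(row_coords, col_coords))
```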
For each of dissatisfaction and satisfaction, 10 examples of items judged to be suitable and 5 examples of items
judged to be unsuitable are shown in Tables 2 and 3.
Table 2. Suitable and unsuitable examples for dissatisfaction
Suitableness
undelicious->anonymous
warm->anonymous
bankrupt->anonymous
short->term
cheat->anonymous
bad->feel
yell->person
bad->treatment
forget->order
big->burden
Unsuitableness
delicious->anonymous
go->sushi bar
eat->sushi
float->salmon roe
do->expectation
Table 3. Suitable and unsuitable examples for satisfaction
Suitableness
tasty->anonymous
float->anonymous
go->Kura sushi bar
broil->salmon
eat->salmon
get->anonymous
big->sushi item
clean->appearance
honest->employee
do->smiling
Unsuitableness
do->kickback
force->task
heavy->task
severe->woman
bad->shop
Considerations
The following considerations apply to Tables 2 and 3.
First, in regards to dissatisfaction, the results are not good if the element is unclear. Both “Delicious” and “Not
tasty” are judged as dissatisfaction at the same time. This indicates that the dissatisfaction sentence search did not
perform as well as expected.
Even when the element is not unclear, the results were not only unsuitable in places but also lacking in variety.
Looking at the suitable examples, no elements about sushi were observed. The sample BBS are almost exclusively
about belt-conveyor sushi bars, yet the results were still poor. (Belt-conveyor sushi bars are generally inexpensive.)
Second, in regards to satisfaction, the results were good regardless of whether or not the element was unclear.
Looking at the suitable examples, there were some elements about sushi and some good memories about going to a
sushi bar. There are some unsuitable examples, but almost none of them concern customers of sushi bars.
Conclusion
The dissatisfaction results were not good, but the satisfaction results were good. Possible reasons for this are given
below.
First, it is easy for dissatisfied writers to get emotional and disregard grammar. Second, when writing satisfaction
sentences, writers are sincerely satisfied and are not playing around or joking.
When this research is completed, the authors expect to be able to extract specific reasons for dissatisfaction or
satisfaction without using a dictionary or manual work. This will be useful for marketing research.
Future Issues
Four issues to be addressed before the presentation are given below.
The first is the method for searching for dissatisfied and satisfied sentences. Although negative words are taken into
consideration, the current method, which only checks whether common expressions are present, must be improved. For
example, in the sentence “I went to a sushi bar yesterday with a man I like”, this “like” is a common expression that
indicates feeling, but it is not about the sushi bar in the sentence.
The second is how to extract reason expressions. Because extraction is based only on part of speech, too many
results are obtained. Conditions need to be applied to the elements.
The third is the number of sample BBS. Writing on anonymous BBS is not bound by grammar, so if the
number of samples is increased, it is expected that the peculiar expressions will be buried.
The fourth is how to determine that a reason expression that has never been seen before is similar to a reason
expression that has already been determined. For instance, if it is possible to determine that “There was not enough
rice (with the curry)” is dissatisfaction, it should also be possible to determine that “There was not enough (curry
sauce)” is dissatisfaction. This will immensely increase the number of reason expressions that can be extracted.
Moreover, future issues also include extracting from question and demand BBS. Questions may be useful for finding
latent needs, and demands in and of themselves show what is wanted. Information extraction from BBS is desired for
its usefulness throughout marketing research.
References
CaboCha (2001) http://chasen.org/~taku/software/cabocha/
Chasen (2000) http://chasen-legacy.sourceforge.jp/
Kenji Tateishi (2001), Opinion Information Retrieval from the Internet, IPSJ SIG Notes, Vol. 2001, No. 69 (NL-144), pp. 75-82
Nozomi Kobayashi (2005), Collecting Evaluative Expressions for Opinion Extraction, Journal of Natural Language Processing, Vol. 12, No. 3, pp. 203-222