Extraction of Reasons for Products and Store Satisfaction and Dissatisfaction from Anonymous BBS

Tomoaki Hirano, Takeshi Kaneko
Musashi Graduate School Of Technology
1-28-1 Tamazutumi Setagaya Tokyo Japan
g0791702@sc.musashi-tech.ac.jp

ABSTRACT

Today companies usually have customer service desks to receive opinions directly from consumers, but since more than 90 percent of dissatisfied people stay silent, few consumers state their complaints directly. The purpose of this research is to search anonymous bulletin board services (BBS) for text expressing dissatisfaction or satisfaction and to extract the reasons for those feelings. Extracting the reasons for dissatisfaction can show product manufacturers and stores where they need to improve. Moreover, extracting the reasons for satisfaction will help product manufacturers and stores differentiate themselves from other companies in the same industry. The method is to search for dissatisfaction or satisfaction phrases using “common phrases” as hints. Candidate “reason expressions” for the feelings are then extracted broadly from this large volume of phrases, which makes it possible to determine whether each reason expression truly indicates satisfaction or dissatisfaction. The results for dissatisfaction were not good, but the results for satisfaction were good.

Keywords: Marketing research, BBS, Correspondence analysis

Research Background

Nowadays, much information about the reputation of products and stores is available on BBS and blogs on the Internet. Companies usually have customer service desks to receive opinions directly from consumers, but since more than 90 percent of dissatisfied people stay silent, few consumers state their complaints directly. So it is not surprising that much of the information truly necessary for marketing exists on BBS.
Much research has been done on analyzing reputation by extracting and analyzing evaluation information from text about products or stores. Examples are given in Nozomi Kobayashi (2005) and Kenji Tateishi (2001). However, merely judging whether a comment is good or bad makes it difficult to use this information in marketing. In addition, because the reliability of anonymous BBS is uncertain, simply tabulating the number of good and bad comments yields little that is meaningful.

Research Objective

The objective of this research is to search for text that expresses dissatisfaction or satisfaction and then extract the reasons for it, in order to learn where product manufacturers and stores need to improve. Moreover, extracting the reasons for satisfaction will help product manufacturers and stores differentiate themselves from other companies in the same industry. This research was limited to anonymous BBS because a sufficiently large number of comments is required. This again raises the problem of reliability, but the results of this research have to be used as obtained. The aim of this research is to assist with marketing.

Research Method

The research plan is explained using the following example sentences.

A) There was not enough rice with the curry. I was looking forward to it so I was disappointed.
B) I was looking forward to it so I was disappointed.
C) There was not enough rice with the curry.

All three of these sentences indicate dissatisfaction, but only A and B express dissatisfaction directly, and only A and C contain the reason for the dissatisfaction. In practice these three kinds of sentences appear mixed together, and sentences of type A or B are easy to find. For example, “disappointed” in A and B shows dissatisfaction regardless of domain or theme. Words indicating a feeling in this way, like “disappointed”, are called “common expressions” in this study.
Common expressions are divided into “common dissatisfaction expressions” and “common satisfaction expressions”. When phrases meaning dissatisfaction or satisfaction are found using common expressions, the reasons are extracted, and the extracted reasons are then used to search for dissatisfied or satisfied phrases that do not contain common expressions. Hereafter dissatisfaction and satisfaction are called “feelings”.

Actual sentences, however, are complicated, and it is difficult to determine which reason to select from a feeling phrase. For that reason this research uses a roundabout method. Expressions that may be reasons for a feeling are extracted widely from the sentences. These expressions are called “reason expressions”. When many reason expressions are collected from dissatisfaction and satisfaction sentences, a reason expression that appears mostly in dissatisfaction sentences can be taken to truly indicate dissatisfaction.

This research followed the steps given below.

1. Sample collection
2. Common expression collection
3. Dissatisfaction or satisfaction phrase search
4. Reason expression extraction
5. Reason expression organization
6. Reason expression identification

A simple explanation is given below. First, sample BBS are collected from communities hosting many anonymous BBS. The sample sentences are then screened and some of the wording in the samples is modified. Second, the two kinds of common expressions are collected from many anonymous BBS other than the samples. Third, sentences containing common expressions are searched for and extracted; if such a sentence does not contain a particular expression that signals a reason expression, or if the previous sentence was written by the same writer, the previous sentence is also extracted. Fourth, reason expressions are widely acquired from the sentences by matching them with prepared models. Reason expressions consist of at least two words. One is the target of the feeling and is called an “element”.
The other is the opinion regarding the element and is called an “evaluation”. Whether each reason expression occurs in a dissatisfaction or a satisfaction sentence is then recorded. Fifth, the reason expressions are screened, because the extracted elements contain many terms that are not the object of a feeling. The reason expressions are then organized with attention to negative expressions. Finally, the reason expressions are totaled and correspondence analysis is used to determine which feeling each reason expression indicates.

Next, the details of each process are explained.

Sample collection

It is desirable that the sample BBS contain many statements related to feelings. The samples were taken from the restaurant industry and, in particular, were limited to sushi bars for this research. Chasen (2000) was used for the morphological analysis. Phrases without sentence structure, phrases with exactly the same meaning as another, phrases referring to the BBS itself, and similar items were omitted from the sample sentences. Further, variant notations of words that have the same reading were consolidated if the words are nouns and do not contain different Chinese characters. In Japanese, homonyms containing different Chinese characters often have different meanings. In addition, complex nouns that were separated by the morphological analysis were rejoined. No strict rules were applied; for example, consecutive words written in the Japanese syllabary, or consecutive common nouns, were joined. Further, dependency analysis was conducted for the downstream steps. CaboCha (2001) was used for the dependency analysis.

Common expression collection

Common dissatisfaction expressions and common satisfaction expressions were collected from many anonymous BBS other than the samples. The results were as follows.
[Common dissatisfaction expression] 26 words including “Disappointed”, “Terrible”, “Bad”
[Common satisfaction expression] 13 words including “Happy”, “Like it”, “Awesome”

Dissatisfaction or satisfaction phrase search

Sentences containing common expressions were searched for and extracted. What we want, however, is reason expressions, and these often appear in the previous sentence. If the sentence containing the common expression does not have a reason expression, or if it does not have a junction particle, the previous sentence was also extracted. This is because the actual reason expression exists after the junction particle.

Reason expression extraction

Reason expressions that might express the reason for a feeling were widely extracted from the feeling sentences. A reason expression is composed of two words: the first is the object of the feeling and is called the element, and the second is the opinion regarding the element and is called the evaluation. Whether each reason expression occurs in a dissatisfaction or a satisfaction sentence is then recorded. Elements are only nouns, and evaluations are nouns (adjectival verb stems), verbs, and adjectives. Reason expressions always have an evaluation but do not necessarily have an element, because elements are often omitted. In this case, the element is recorded as “unclear” (written “anonymous” in the tables). If the evaluation has a dependency, there must be an element in front of the dependency.

Reason expression organization

Reason expressions have now been acquired, but they must be screened, because the wide extraction means that some elements are not the object of a feeling. Therefore, reason expressions whose elements match words on an elimination list are omitted. The elimination list contains words that often appear in BBS in general. These words are not suitable as targets of feelings.
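The extraction and screening described so far can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the implementation used in this research: the input is taken to be already tagged by a morphological analyzer, the nearest preceding noun stands in for the element that dependency analysis would identify, and the function name, tag names, and elimination list entries are all hypothetical.

```python
# Assumption: sentences arrive as lists of (word, part_of_speech) pairs,
# glossed here in Japanese word order. Tag names are hypothetical.
EVALUATION_POS = {"adjective", "verb", "adjectival-noun"}

# Illustrative entries only; the actual elimination list is not given in the text.
ELIMINATION_LIST = {"thread", "post"}


def extract_reason_expressions(tagged_sentence, feeling):
    """Return (feeling, evaluation, element) records for one feeling sentence.

    Simplification: the element is the nearest preceding noun, used here
    in place of real dependency analysis. A missing element is "unclear".
    """
    records = []
    last_noun = None
    for word, pos in tagged_sentence:
        if pos == "noun":
            last_noun = word
        elif pos in EVALUATION_POS:
            element = last_noun if last_noun is not None else "unclear"
            # Screening: drop elements that are not plausible targets of a feeling.
            if element not in ELIMINATION_LIST:
                records.append((feeling, word, element))
    return records


# "There was not enough rice with the curry."
sentence_c = [("curry", "noun"), ("rice", "noun"), ("little", "adjective")]
# "I was looking forward to it so I was disappointed." (element omitted)
sentence_b = [("look-forward", "verb"), ("disappointed", "adjective")]
# A comment about the BBS itself, removed by the elimination list.
sentence_x = [("thread", "noun"), ("long", "adjective")]

records_c = extract_reason_expressions(sentence_c, "dissatisfaction")
records_b = extract_reason_expressions(sentence_b, "dissatisfaction")
records_x = extract_reason_expressions(sentence_x, "dissatisfaction")

for feeling, evaluation, element in records_c + records_b + records_x:
    print(f"{evaluation}->{element} ({feeling})")
```

Because extraction is deliberately wide, sentence B yields two records with an unclear element; deciding which records truly indicate a feeling is left to the later identification step.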
Moreover, if a negative word appears after a common expression, the reason expressions extracted from the sentence containing that common expression are omitted as well. If the negative word appears after the evaluation, however, the recorded feeling is switched, so that dissatisfaction becomes satisfaction and vice versa. This is done because, for example, the sentence “she doesn’t like him” does not necessarily mean “she hates him”, whereas it is not so unnatural to assume from the sentence “she doesn’t like how short he is” that “she likes tall men”. This may be a little crude, but this processing was done to collect the reason expressions.

Reason expression identification

The reason expressions were tabulated with the feelings placed in rows and the element and evaluation combinations placed in columns to obtain a contingency table. Each cell of the contingency table holds the number of times that element and evaluation combination was recorded for dissatisfaction or satisfaction. An excerpt of the contingency table is shown in Table 1 (transposed, with one combination per row).

Table 1. Partial contingency table for element and evaluation combinations and dissatisfaction or satisfaction

Combination            Dissatisfaction   Satisfaction
think->anonymous                    28             17
eat->anonymous                       2              2
leave->anonymous                     2              1
go->shop                             3              5
bad->manners                         1              1
big->sushi item                      1              3
broil->salmon                        0              2
delicious->sushiro                   0              2

Whether each reason expression truly indicates dissatisfaction or satisfaction was determined using correspondence analysis based on the contingency table. There is a reason why the feeling was determined for each combination of element and evaluation rather than for each evaluation alone. For example, one of the evaluations is “little”. This generally indicates dissatisfaction, in that the amount is small, but in the case of fat content “little” generally indicates satisfaction. Therefore, it is unsuitable to set the feeling for each evaluation. As in the case of “dirty”, however, there are some evaluations where the feeling is almost always fixed within a particular domain.
For this reason “Unclear” was prepared. The results of the correspondence analysis are given in the next section.

Research Results

We will look at the reason expressions determined to be dissatisfaction or satisfaction to understand the results of the correspondence analysis.

Fig. 1. Correspondence analysis results

Correspondence analysis allows the optimum simultaneous plotting of row and column items. Correspondence analysis deals with proportions rather than raw counts, so the tendencies of each row item and each column item can be understood without concern for differences in the row or column totals.

The way to read Figure 1 is as follows. The result is one dimensional because there are only two feeling categories, but it is shown in a plane to make it easier to read. The vertical and the horizontal axes are the same. The numbers in the center of the two red circles represent the plots of the feelings, and the other numbers represent the element and evaluation combinations. The origin represents the respective averages for row items and column items. Dissatisfaction is at the upper right, and satisfaction is at the lower left. This means that the further to the upper right an element and evaluation combination is, the higher the probability that the combination truly indicates dissatisfaction. Conversely, the further to the lower left an element and evaluation combination is, the higher the probability that the combination truly indicates satisfaction.

Since the combination numbers overlap in the graph, the results are shown separately below. There are 85 combinations plotted further to the upper right than dissatisfaction. These are judged to be dissatisfaction with a high probability. There are 37 combinations plotted further to the lower left than satisfaction. These are judged to be satisfaction with a high probability.
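As a rough illustration of how the single axis and the two judgments above can be computed, the sketch below handles the special case of a table with only two feeling categories. In that case the one axis has a closed form, so no matrix decomposition is needed; a table with more categories would require a singular value decomposition. The counts are invented for illustration and are not the data of this research, the function and variable names are hypothetical, and plotting both feelings and combinations in principal coordinates is an assumption, since the scaling used for Figure 1 is not stated.

```python
from math import sqrt


def correspondence_axis(counts):
    """One-dimensional correspondence analysis of a two-feeling table.

    counts maps (evaluation, element) -> (dissatisfaction, satisfaction).
    Returns (combination coordinates, feeling coordinates), both in
    principal coordinates, with dissatisfaction on the positive side.
    Assumes both feelings occur at least once and no combination is empty.
    """
    n = sum(dis + sat for dis, sat in counts.values())
    r_dis = sum(dis for dis, _ in counts.values()) / n  # mass of dissatisfaction
    r_sat = 1.0 - r_dis                                 # mass of satisfaction
    scale = sqrt(r_dis * r_sat)

    coords = {}
    inertia = 0.0
    for combo, (dis, sat) in counts.items():
        total = dis + sat
        share = dis / total             # dissatisfaction share of this combination
        g = (share - r_dis) / scale     # deviation from the average profile
        coords[combo] = g
        inertia += (total / n) * g * g  # mass-weighted squared coordinate
    sigma = sqrt(inertia)               # singular value of the single axis

    feelings = {
        "dissatisfaction": sigma * sqrt(r_sat / r_dis),
        "satisfaction": -sigma * sqrt(r_dis / r_sat),
    }
    return coords, feelings


# Invented counts, for illustration only.
toy_counts = {
    ("think", "unclear"): (5, 5),
    ("bad", "treatment"): (4, 0),
    ("forget", "order"): (3, 1),
    ("broil", "salmon"): (1, 3),
    ("clean", "appearance"): (0, 4),
}

coords, feelings = correspondence_axis(toy_counts)

# Combinations plotted beyond a feeling point are the high-probability cases.
strong_dissatisfaction = [c for c, g in coords.items() if g > feelings["dissatisfaction"]]
strong_satisfaction = [c for c, g in coords.items() if g < feelings["satisfaction"]]

print(strong_dissatisfaction)
print(strong_satisfaction)
```

In this special case a combination's coordinate depends only on its share of dissatisfaction records, so a combination that occurs equally often under both feelings sits at the origin however frequent it is.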
Ten examples each of combinations judged suitable as dissatisfaction or satisfaction, and five examples each of combinations judged unsuitable, are shown in Tables 2 and 3.

Table 2. Suitable and unsuitable examples for dissatisfaction

Suitable                  Unsuitable
undelicious->anonymous    delicious->anonymous
warm->anonymous           go->sushi bar
bankrupt->anonymous       eat->sushi
short->term               float->salmon roe
cheat->anonymous          do->expectation
bad->feel
yell->person
bad->treatment
forget->order
big->burden

Table 3. Suitable and unsuitable examples for satisfaction

Suitable                  Unsuitable
tasty->anonymous          do->kickback
float->anonymous          force->task
go->Kura sushi bar        heavy->task
broil->salmon             severe->woman
eat->salmon               bad->shop
get->anonymous
big->sushi item
clean->appearance
honest->employee
do->smiling

Considerations

The following considerations apply to Tables 2 and 3. First, in regard to dissatisfaction, the results are not good when the element is unclear. Both “Delicious” and “Not tasty” are judged as dissatisfaction at the same time. For this reason, the dissatisfaction sentence search was not as good as expected. For elements other than unclear as well, not only were some results unsuitable, but variation was lacking. Looking at the suitable examples, no elements about sushi were observed. The sample BBS deal almost exclusively with belt-conveyor sushi bars, yet even so the results were poor. (Belt-conveyor sushi bars are generally inexpensive.) Second, in regard to satisfaction, the results were good regardless of whether or not the element was unclear. Looking at the suitable examples, there were some elements about sushi and some good memories about going to a sushi bar. There are some unsuitable examples, but almost none of them concern customers of sushi bars.

Conclusion

The dissatisfaction results were not good, but the satisfaction results were good. Possible reasons for this are given below.
First, it is easy for dissatisfied writers to get emotional and disregard grammar. Second, when writing satisfaction sentences, the writer is sincerely satisfied and is not playing around or joking. When this research is completed, the authors expect to be able to extract specific reasons for dissatisfaction or satisfaction without using a dictionary or manual work. This will be useful for marketing research.

Future Issues

Four issues to be addressed before the presentation are described here. The first is the method for searching for dissatisfied and satisfied sentences. Although negative words are taken into consideration, the current method must be improved in how it decides whether a word is really a common expression. For example, in the sentence “I went to a sushi bar yesterday with a man I like”, this “like” is a common expression that indicates a feeling, but the feeling is not about the sushi bar in the sentence. The second is how to extract reason expressions. Because we extract by part of speech only, too many candidates are produced. Conditions need to be applied to the elements. The third is the number of sample BBS. Grammar is not restricted when writing on anonymous BBS, so if the number of samples is increased, it is expected that peculiar expressions will be buried. The fourth is how to determine that a reason expression that has never been seen before is similar to a reason expression that has already been determined. For instance, if it is possible to determine that “There was not enough rice (with the curry)” is dissatisfaction, it should also be possible to determine that “There was not enough (curry sauce)” is dissatisfaction. This will immensely increase the number of reason expressions that can be extracted.

Moreover, future issues also include extracting from question and demand BBS. Questions may be useful for finding latent needs, and demands in and of themselves show what is wanted.
Information extraction from BBS is desired for its usefulness throughout marketing research.

References

CaboCha (2001) http://chasen.org/~taku/software/cabocha/
Chasen (2000) http://chasen-legacy.sourceforge.jp/
Kenji Tateishi (2001), Opinion Information Retrieval from the Internet, IPSJ SIG Notes, Vol.2001, No.69 (NL-144), pp.75-82
Nozomi Kobayashi (2005), Collecting Evaluative Expressions for Opinion Extraction, Journal of Natural Language Processing, Vol.12, No.3, pp.203-222