
CHAPTER 1

INTRODUCTION

1.1 Introduction

With the advent of Web 2.0, many new technologies and platforms, such as blogs, discussion forums and e-commerce sites, have emerged to enable people to procure products and services and to provide their opinions and feedback online.

Consumers have at their disposal different types of information on the web that enable them to share their experiences and opinions (positive or negative) on any product or service (Zabin and Jefferies 2008). Different people express their experiences, opinions and thoughts on almost anything, on different occasions and in different places. One person may find a particular feature interesting, whereas it may not make sense to another.

It is estimated that more than 75,000 new blogs emerge daily, with 1.2 million new posts each day, many of which contain consumer opinions on products and services (Esuli and Sebastiani 2005; Pang and Lee 2008). Moreover, statistics show that more than 81% of Internet users have researched a product online at least once, and that this has a significant influence on their purchases (Abbasi 2007; Rainie and Horrigan 2007; Pang and Lee 2008). This wealth of online information has helped customers, firms, manufacturers, service providers, and social and government bodies to make proper decisions when procuring or enhancing products and services. It has also triggered the need to enhance existing methods and techniques for extracting and summarizing opinions from online reviews (Pang and Lee, 2004).

Opinion mining is a field that offers a number of tools and techniques for finding people's and customers' opinions on products, services, events, occasions, etc. The mining process can be as simple as learning the polarity (positive or negative) and sentiment of words, or as complicated as deeply parsing the data to identify the grammar and structure of sentences. Opinion mining seeks to extract useful information from opinionated sentences written in forums, articles, books, product reviews, etc., and then to present these details as textual summaries or visual presentations for quick reference and decision making.

Opinion mining is an important field as it helps to achieve the following objectives (Pang and Lee 2008):

To understand customers' feelings and opinions on a particular product or service, as expressed in everyday communications, in order to improve the quality and delivery of such goods and services.

To scientifically record the opinions and positions of people and various parties on a specific event, accident, incident, occasion, etc. This in turn helps to put proper measures in place for handling such cases based on people's opinions.

To improve the social services that governments and social organizations provide to the public by understanding the public's demands and suggestions.

To obtain the opinions that people express on goods and services.

Companies, suppliers and manufacturers can use online reviews to respond to consumer insights by modifying their marketing messages, brand positioning, product development and other activities accordingly (Zabin and Jefferies 2008).

Factors behind the rapid rise in attention to opinion mining include (Shishir and Suresh 2009):

The rise of many machine learning methods in natural language processing (NLP) and information retrieval (IR);

The availability of datasets on which machine learning algorithms can be trained, owing to the blossoming of the World Wide Web and, specifically, the development of review-aggregation websites;

Awareness of the intellectual challenges and the commercial and intelligence applications that the area offers.

English opinionated sentences can be positive, negative or neutral. The following sentences illustrate a few of the challenges involved:

Jane Austen’s books madden me so that I can’t conceal my frenzy from reader

=> Positive

The Power puff girls learned that with great power comes great responsibility

=> Neutral

At movies gonna watch the mechanic, hope this thing is good

=> Neutral

I don’t think I’ve seen one Adam Sandler movie that’s not good

=> Positive

If I don’t see Source Code this weekend it will have been a complete waste.

=> Positive

The battery life of this laptop is very very low

=> Negative
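To see why these sentences are hard, consider a deliberately naive lexicon-plus-negation baseline. The sketch below is an illustration under stated assumptions, not a method from this thesis: the lexicon words and their weights are hypothetical toy values (not SentiWordNet scores), and the rule that a negator flips the next sentiment word is a common simplification.

```python
import re

# Toy polarity lexicon: hypothetical values for illustration, NOT SentiWordNet scores.
LEXICON = {
    "good": 1.0, "great": 1.0, "hope": 0.5,
    "low": -1.0, "waste": -1.0, "madden": -1.0, "frenzy": -0.5,
}
NEGATORS = {"not", "no", "never", "don't", "can't"}


def naive_polarity(sentence: str) -> str:
    """Sum lexicon scores; an active negation flips the next sentiment word."""
    text = sentence.lower().replace("\u2019", "'")  # normalise curly apostrophes
    total, negated = 0.0, False
    for token in re.findall(r"[a-z']+", text):
        if token in NEGATORS:
            negated = not negated  # a second negator cancels the first
        elif token in LEXICON:
            score = LEXICON[token]
            total += -score if negated else score
            negated = False  # negation scope ends at the first sentiment word
    if total > 0:
        return "Positive"
    if total < 0:
        return "Negative"
    return "Neutral"


EXAMPLES = [
    ("Jane Austen's books madden me so that I can't conceal my frenzy from reader", "Positive"),
    ("The Power puff girls learned that with great power comes great responsibility", "Neutral"),
    ("At movies gonna watch the mechanic, hope this thing is good", "Neutral"),
    ("I don't think I've seen one Adam Sandler movie that's not good", "Positive"),
    ("If I don't see Source Code this weekend it will have been a complete waste.", "Positive"),
    ("The battery life of this laptop is very very low", "Negative"),
]

for sentence, gold in EXAMPLES:
    print(f"{naive_polarity(sentence):<8} gold={gold:<8} {sentence}")
```

On these six sentences the sketch disagrees with the gold label three times (the Jane Austen, Power puff girls and "hope this thing is good" sentences), and it labels the Source Code sentence correctly only because its crude negation scope happens to reach "waste".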

Owing to their high commercial importance, mining and summarizing user reviews is a widely studied application (Wei et al., 2010). Regardless of the application, opinion mining involves two main tasks: (1) identifying opinion-bearing phrases or sentences in free text and (2) finding the sentiment polarity of the opinionated phrases. Descriptors such as adjectives or adverbs that describe the features are present in an opinion sentence mainly to indicate the polarity of the expressed opinion. However, the strength and polarity of opinionated phrases are also affected by the presence of linguistic hedges such as modifiers (e.g., “not”), concentrators (e.g., “very,” “extremely”) and dilators (e.g., “quite,” “almost,” and “nearly”). Zadeh developed the concept of fuzzy linguistic variables and linguistic hedges, which modify the meaning and intensity of their operands (Huynh et al., 2002).
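Zadeh's classical hedge operators can be written down directly: concentration (CON) squares a membership degree, dilation (DIL) takes its square root, and the standard fuzzy complement is commonly used for negation. The sketch below assumes that a concentrator such as "very" maps to CON, that a dilator such as "quite" maps to DIL and that "not" maps to the complement; the 0.8 membership degree is a hypothetical input. Note that these operators act on the degree to which an item belongs to a fuzzy set, so "very good" yields a smaller degree than "good" for the same item because the set "very good" is stricter; this is different from raising an intensity score.

```python
import math


def concentrate(mu: float) -> float:
    """CON, e.g. for 'very': mu -> mu ** 2 (a stricter fuzzy set)."""
    return mu ** 2


def dilate(mu: float) -> float:
    """DIL, e.g. for dilators such as 'quite': mu -> sqrt(mu) (a looser fuzzy set)."""
    return math.sqrt(mu)


def complement(mu: float) -> float:
    """Standard fuzzy complement, commonly used for 'not': mu -> 1 - mu."""
    return 1.0 - mu


mu_good = 0.8  # hypothetical degree to which an item belongs to the fuzzy set 'good'
for label, value in [("good", mu_good), ("very good", concentrate(mu_good)),
                     ("quite good", dilate(mu_good)), ("not good", complement(mu_good))]:
    print(f"{label:<10} {value:.3f}")
```

For the hypothetical degree 0.8, this prints 0.800, 0.640, 0.894 and 0.200 for "good", "very good", "quite good" and "not good" respectively.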

Recent papers in this field have also pointed out that the task of opinion mining is sensitive to such hedges and that taking the effect of linguistic hedges into consideration can improve the efficiency of the sentiment classification task (Dalal and Zaveri 2013).

1.2 Problem Background

Opinions of people about a specific subject, product or service require effective techniques and methods to extract, classify, represent and summarize them for better decision making (Haji 2009; Dan 2010; Ayesha et al., 2012; Jayashri and Mayura 2013). Several rich research areas are still not well addressed by scholars in this field (Haji 2009; Yongyong et al., 2010; Bjørkelund et al., 2012; Moraes et al., 2013). Among these areas are the following:

From the existing literature on opinion mining, one can easily notice that the focus is mostly on adverbs and adjectives (Farah et al., 2007; Pang and Lee 2009; Hana 2011; Moraes et al., 2013). Other parts of speech, such as verbs and nouns, are not widely addressed, especially when analyzing and processing opinion scores. Opinion words can be adjectives, adverbs, verbs or nouns, and the analysis of such a combination had not been addressed at the time of this research. Addressing this combination to enhance opinion classification, scoring and summarization remains an important problem (Bjørkelund et al., 2012; Moraes et al., 2013).
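The AVAN idea, treating adjectives, verbs, adverbs and nouns all as potential opinion words, can be sketched as a filter over part-of-speech-tagged tokens. This is a minimal sketch under assumptions: the input uses Penn Treebank tags (the tag set produced by, for example, NLTK's `nltk.pos_tag`), the sentence is hand-tagged so that the code runs without a tagger, and `OPINION_WORDS` is a hypothetical toy lexicon standing in for a resource such as SentiWordNet.

```python
from typing import List, Optional, Tuple

# Penn Treebank tag prefixes grouped into the four AVAN classes.
AVAN_PREFIXES = {"JJ": "adjective", "VB": "verb", "RB": "adverb", "NN": "noun"}

# Hypothetical toy opinion lexicon; a real system would look words up in SentiWordNet.
OPINION_WORDS = {"helpful", "loved", "comfort", "badly", "rude"}


def avan_class(tag: str) -> Optional[str]:
    """Map a Penn Treebank tag such as 'VBD' or 'NNS' to its AVAN class (None if not AVAN)."""
    return AVAN_PREFIXES.get(tag[:2])


def extract_opinion_words(
    tagged: List[Tuple[str, str]],
    classes: Tuple[str, ...] = ("adjective", "verb", "adverb", "noun"),
) -> List[Tuple[str, str]]:
    """Keep (word, class) pairs whose class is allowed and whose word is an opinion word."""
    found = []
    for word, tag in tagged:
        cls = avan_class(tag)
        if cls in classes and word.lower() in OPINION_WORDS:
            found.append((word.lower(), cls))
    return found


# Hand-tagged example sentence (hypothetical): "The staff were helpful and I loved the comfort".
TAGGED = [("The", "DT"), ("staff", "NNS"), ("were", "VBD"), ("helpful", "JJ"),
          ("and", "CC"), ("I", "PRP"), ("loved", "VBD"), ("the", "DT"), ("comfort", "NN")]

print(extract_opinion_words(TAGGED))                                   # all four AVAN classes
print(extract_opinion_words(TAGGED, classes=("adjective", "adverb")))  # adjective/adverb only
```

Restricting `classes` to adjectives and adverbs reproduces the narrower focus described above: only "helpful" survives, while the verb "loved" and the noun "comfort" are lost.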

In addition, extracted opinions need to be properly represented to enhance classification, polarity and score calculations. Kobayashi et al. (2005) extracted opinions as tuples (Subject, Attribute, Value). Qingliang et al. (2008) also proposed representing opinions as tuples (“opinion holder”, “opinion object”, “opinion creation date”, “opinion polarity” and “detailed opinions”). Neither of these representations captures all characteristics of an opinion. Elements such as product/service features, the opinion score, the intensity of the opinion and the group to which a product feature belongs are missing and need to be considered when representing an extracted opinion. Moreover, better representation structures need to be designed to enable further processing of extracted opinions. Existing structures can be improved by adding the elements suggested above.
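To make the missing elements concrete, an extended opinion representation could be sketched as an ordered record such as the following. The field names, their order, the 0-100 score scale and the example values are all hypothetical illustrations; the opinion predicate actually proposed in this thesis is defined in Chapter 4.

```python
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass(frozen=True)
class OpinionPredicate:
    """Hypothetical opinion predicate: an ordered vector of opinion components."""
    holder: str                    # who expressed the opinion
    aspect: str                    # product/service feature being evaluated
    aspect_group: str              # group the feature belongs to
    opinion_word: str              # the opinion-bearing word
    part_of_speech: str            # adjective, verb, adverb or noun
    senti_strength: Optional[str]  # intensifying word, if any (e.g. "very")
    polarity: str                  # "positive", "negative" or "neutral"
    score: float                   # opinion score, assumed to be on a 0-100 scale


# Illustrative values only, e.g. for a review saying "the seat was very comfortable".
p = OpinionPredicate("reviewer_1", "seat", "cabin comfort", "comfortable",
                     "adjective", "very", "positive", 90.0)
print(astuple(p))  # the predicate as an ordered vector
```

Freezing the dataclass and converting it with `dataclasses.astuple` gives a fixed-order vector, which fits the idea of an ordered opinion structure that later stages such as scoring and summarization can consume.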

Moreover, to properly identify or measure the strength of an opinion, it is important to convert the opinion to a value (called a score) that represents its strength. For example, the opinion word ‘beautiful’ can be given a score of 85 to show that it is a highly positive opinion. Research on scoring opinion words, sentences and documents is still in its infancy and needs further work; there are many permutations and options to be considered and studied when analyzing opinion scoring in depth (Elomaa et al., 2011). SentiWordNet, a widely used lexical resource in the field of opinion mining, provides scores for various subjective words.

However, this scoring method is basic, and in many cases the assigned scores do not reflect the actual meaning and level of opinion words. For example, the score for the word “large” is 0.50, which does not really reflect the meaning of this adjective. Such scoring needs to be enhanced by introducing better formulas that produce scores reflecting the meaning of the subjective word.
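SentiWordNet assigns each synset a positive, a negative and an objective score that sum to 1, so a word with several senses has several such triples. The sketch below shows one possible way of turning them into a single 0-100 word score: average (positive minus negative) over the senses, then map the result linearly so that 50 is neutral. Both the averaging and the mapping are assumptions for illustration, and the sense scores are hypothetical, not real SentiWordNet entries. With NLTK's SentiWordNet corpus reader (`from nltk.corpus import sentiwordnet as swn`), real triples can be read through `swn.senti_synsets(word, pos)` and each result's `pos_score()`, `neg_score()` and `obj_score()` methods; here they are hard-coded so that the sketch runs without downloading the corpus.

```python
from typing import List, Tuple

# One (positive, negative) pair per word sense (synset). In SentiWordNet the
# positive, negative and objective scores of a synset sum to 1.
Sense = Tuple[float, float]


def objective(pos: float, neg: float) -> float:
    """Objective score implied by the positive and negative scores."""
    return 1.0 - pos - neg


def word_polarity(senses: List[Sense]) -> float:
    """Mean of (pos - neg) over all senses: one common aggregation, assumed here."""
    return sum(pos - neg for pos, neg in senses) / len(senses)


def to_scale(polarity: float) -> float:
    """Map a polarity in [-1, 1] linearly onto 0-100, with 50 meaning neutral (assumed)."""
    return 50.0 + 50.0 * polarity


# Hypothetical sense scores for an imaginary adjective; NOT real SentiWordNet values.
toy_senses = [(0.75, 0.0), (0.25, 0.0), (0.0, 0.25)]
print(word_polarity(toy_senses), to_scale(word_polarity(toy_senses)))
print([objective(pos, neg) for pos, neg in toy_senses])
```

Alternatives such as using only the first (most frequent) sense, or weighting senses by rank, fit into `word_polarity` without changing the rest of the sketch.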

Opinions can be expressed in different degrees (referred to in this thesis as ‘senti strength’). For example, the subjective sentence “This house is beautiful” differs from the opinion sentence “The house is very beautiful”; the second sentence expresses a stronger opinion. At the time of this research, opinion strength had not been researched in depth as a subject in its own right. So far, adverbs have been considered the words that give more strength to adjectives. However, this does not give the full picture of opinion strength, as other words, prefixes or suffixes can increase or decrease the strength of an opinion. For example, “nice”, “extra nice”, “bright” and “bright less” are opinion words that express different senti strengths and need to be analyzed. Moreover, proper scores need to be assigned to each senti strength. For example, if a score of 85 is assigned to the opinion word “beautiful,” a higher score (such as 90) needs to be assigned to the opinion “very beautiful”, and an even higher score to “extremely beautiful”. Such opinion strengths and the assignment of strength scores are not well addressed in the literature, and this is one of the major areas that need to be studied.
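One way to reproduce the example above (85 for “beautiful”, 90 for “very beautiful” and more for “extremely beautiful”) is to let each intensifier close a fixed fraction of the distance between the score and the nearest extreme. The fractions in `INTENSIFIERS` are hypothetical, chosen only so that "very" maps 85 to 90; this is a sketch of the idea, not the thesis's scoring formula.

```python
from typing import Optional

NEUTRAL = 50.0
# Hypothetical concentrator weights: the fraction of the remaining distance to the
# nearest extreme (100 for positive opinions, 0 for negative ones) that each word closes.
INTENSIFIERS = {"very": 1 / 3, "extremely": 2 / 3}


def apply_senti_strength(base: float, modifier: Optional[str] = None) -> float:
    """Move a 0-100 opinion score away from neutral according to its intensifier."""
    if modifier is None or base == NEUTRAL:
        return base
    target = 100.0 if base > NEUTRAL else 0.0
    return base + (target - base) * INTENSIFIERS[modifier]


for phrase, modifier in [("beautiful", None), ("very beautiful", "very"),
                         ("extremely beautiful", "extremely")]:
    print(f"{phrase:<20} {apply_senti_strength(85.0, modifier):.1f}")
```

Because each intensifier closes only part of the remaining gap, applying the function repeatedly (for example for "very very") never pushes a score outside 0-100, and negative scores move toward 0 instead of toward 100.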

Although opinions are fuzzy in nature, fuzzy logic has not been properly used to enhance the opinion mining field, as explained in Chapter 2. Since opinion words are vague, fuzzy logic is a well-known tool for addressing this vagueness to achieve better sentiment analysis. Only very few studies, by Samaneh et al. (2010) and Animesh et al. (2011), have applied it. Animesh et al. did not utilize the full features of fuzzy logic covering the fuzzification, fuzzy rules and defuzzification processes. These are very important phases of fuzzy logic, and implementing them can enhance the scoring of opinion words. Samaneh et al., on the other hand, did define membership functions for positive and negative opinions and for opinion intensities. However, these values are predefined based on a classification module defined at the beginning. Moreover, their fuzzy rules were based on predefined linguistic patterns, which cannot be assured to be comprehensive enough to cover all cases in reviews. In addition, the crisp defuzzification values are also predefined using a set of expected results. These issues need to be addressed, and the full features of fuzzy calculus need to be applied to produce better crisp opinion values or scores.
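The three phases mentioned above can be illustrated with a minimal Mamdani-style sketch: triangular membership functions for fuzzification, min-based rule evaluation, and max aggregation followed by centroid defuzzification to a crisp 0-100 score. The inputs (a polarity such as the positive minus the negative score, and a senti strength in [0, 1]), all membership shapes and the five-rule base are hypothetical; they are not the ones used by Samaneh et al. or in Chapter 5.

```python
from typing import List, Optional, Tuple


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside [a, c], 1 at the peak b (a == b or b == c allowed)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Input fuzzy sets (hypothetical shapes). Polarity = pos - neg, in [-1, 1].
POLARITY = {
    "negative": lambda x: tri(x, -1.0, -1.0, 0.0),
    "neutral": lambda x: tri(x, -0.5, 0.0, 0.5),
    "positive": lambda x: tri(x, 0.0, 1.0, 1.0),
}
# Senti strength of the modifier, in [0, 1].
STRENGTH = {
    "low": lambda s: tri(s, 0.0, 0.0, 1.0),
    "high": lambda s: tri(s, 0.0, 1.0, 1.0),
}
# Output fuzzy sets over the crisp opinion score, 0-100.
SCORE = {
    "very_low": lambda y: tri(y, 0, 0, 25),
    "low": lambda y: tri(y, 0, 25, 50),
    "medium": lambda y: tri(y, 25, 50, 75),
    "high": lambda y: tri(y, 50, 75, 100),
    "very_high": lambda y: tri(y, 75, 100, 100),
}
# Rule base: IF polarity is P [AND strength is S] THEN score is O (None = any strength).
RULES: List[Tuple[str, Optional[str], str]] = [
    ("negative", "high", "very_low"),
    ("negative", "low", "low"),
    ("neutral", None, "medium"),
    ("positive", "low", "high"),
    ("positive", "high", "very_high"),
]


def fuzzy_score(polarity: float, strength: float) -> float:
    # Phase 1 - fuzzification: degree of membership in each input set.
    mu_p = {name: f(polarity) for name, f in POLARITY.items()}
    mu_s = {name: f(strength) for name, f in STRENGTH.items()}
    # Phase 2 - rule evaluation: AND is min; the result is each rule's firing strength.
    fired = [
        (mu_p[p] if s is None else min(mu_p[p], mu_s[s]), out)
        for p, s, out in RULES
    ]
    # Phase 3 - defuzzification: clip each output set by its firing strength,
    # aggregate with max, and take the centroid over y = 0, 1, ..., 100.
    num = den = 0.0
    for y in range(101):
        mu = max(min(w, SCORE[out](y)) for w, out in fired)
        num += y * mu
        den += mu
    if den == 0.0:
        raise ValueError("no rule fired for these inputs")
    return num / den


print(fuzzy_score(0.0, 0.5), fuzzy_score(1.0, 0.0), fuzzy_score(1.0, 1.0))
```

With these shapes a neutral polarity defuzzifies to 50, while a fully positive polarity gives about 75 with low strength and about 92 with high strength. The resulting crisp value is the kind of number that could be stored as the score of an opinion representation.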

1.3 The Problem Statement

In view of the problem background and previous approaches, it can be concluded that the following major problems remain and need further improvement:

Not all opinion-bearing parts of speech are considered when analyzing and processing extracted opinion words and sentences. The combination of adjectives, verbs, adverbs and nouns has not been used with existing powerful techniques to improve opinion mining and sentiment analysis.

The lack of proper representation of extracted opinions for better scoring and summarization.

Existing scoring methods depend on SentiWordNet, with limited improvement in the scoring formulas used.

Opinion intensity, that is, the degree or strength of the expressed sentiment, is judged only from adverbs. Opinions can be expressed at different senti strengths and levels using words other than adverbs, and such senti strengths are not properly measured.

Fuzzy logic is an important tool developed to address vague and imprecise problems, yet it is not properly used in opinion mining. It can be used to enhance opinion representation by placing the crisp values produced by the fuzzy process in opinion predicates.

Hence, this research focuses on resolving the above problems by addressing the following questions:

Can parts of speech other than adjectives and adverbs be used to enhance opinion mining and opinion scoring, especially when combined with tools like SentiWordNet and other effective techniques?

How can knowledge extraction of opinions be enhanced?

How can representation of extracted opinions be improved?

Can extracted opinions be represented in vectors called predicates?

How can the strength of an opinion be extracted and represented?

How can summarization of opinions be enhanced?

How can SentiWordNet be used to score opinion words and opinion strengths?

How can fuzzy logic be used to enhance opinion scores obtained from SentiWordNet?


1.4 Research Objectives

This research is intended to meet the following key objectives:

To enhance aspect level opinion mining knowledge extraction by introducing opinion predicates for Adjectives, Verbs, Adverbs and Nouns (AVAN).

To enhance aspect level opinion mining knowledge representation by introducing opinion Senti Strength.

To enrich aspect level opinion mining representation by using fuzzy logic.

1.5 Research Questions

This study will answer the following questions:

Can aspect level opinion mining knowledge extraction be enhanced by introducing opinion predicates for Adjectives, Verbs, Adverbs and Nouns (AVAN)?

How can aspect level opinion mining knowledge representation be enhanced by introducing opinion senti strength?

How can aspect level opinion mining representation be enriched by using fuzzy logic?

1.6 Scope of the Study

Opinion mining is a huge field that contains many different research areas. This thesis focuses on certain of these areas, as explained below.


Sentiment Analysis can be done on document, sentence and aspect levels.

This study focuses on aspect level opinion mining. The main focus of this study is to enhance the extraction and representation of opinion mining knowledge on aspect level only. This thesis does not address sentiment analysis on document and sentence levels.

This study handles explicit opinions only and does not deal with other types of opinions such as hidden, emotion, implicit, spam and sarcastic opinions. Hidden opinions are opinions that are not explicitly stated in the sentence, such as “I have stayed in this hotel for more than 10 times.” This implies that the person likes the hotel, but it is not explicitly stated. Spam opinions are fake opinions made about a product or service for the purpose of broadcasting false news. Sarcastic opinions are expressed in a manner opposite to what their words literally say, such as “What a great car! It stopped working in two days.” This example also expresses an emotion opinion, which is likewise not covered in this thesis. Emotion opinions are opinions that are expressed through emotions and are not explicitly stated (Bing Liu 2012).

In addition, this thesis does not address context-aware sentiment analysis, in which the meaning of an opinion word changes with the context in which it is used. A word can be viewed as positive in one situation and as negative in another. For example, the word ‘long’ is positive when used for the battery life of a mobile phone, but it indicates a negative meaning when used for a process that takes a long time. In many cases, it is difficult to distinguish between the two contexts. Another challenge is that opinion words sometimes occur in objective sentences, such as “it is a long distance between Florida and California.” Here the word ‘long’ is used in a factual sentence. This thesis does not cover the “context awareness” of opinions in text.

On the dataset side, this thesis focuses on the opinions and sentiments expressed in online reviews by passengers who traveled with Oman Air.


First, the thesis addresses opinion mining through the extraction and representation of data as opinion predicates; this representation is then enhanced by opinion scoring and accounting. Second, fuzzy logic analysis is used as a supporting method to enhance the representation of opinions by determining their polarity and strength.

In addition, this study uses SentiWordNet as a lexical resource to enhance the extraction, representation and scoring of opinions, for the following reasons (Esuli and Sebastiani 2006; Elomaa et al., 2011; Kennedy et al., 2002; Burns et al., 1990):

SentiWordNet is built specifically for the field of opinion mining and sentiment analysis. Hence, it is a powerful resource for extraction, representation and scoring of opinions.

SentiWordNet is built on the well-known lexical resource WordNet, which groups words into sets of synonyms called synsets and records the relations among these synonym sets or their members (Pang and Lee 2008).

SentiWordNet is widely used in the opinion mining literature for opinion representation, classification and scoring. Hence, using SentiWordNet makes it more rigorous and accurate to compare the work of this thesis with previous works.

SentiWordNet provides three scores (Positive, Negative and Objective) for each word sense, or synset. These scores make it much easier to analyze and classify sentiments.

The other existing lexical resources, such as HowNet, ConceptNet, SenticNet, WordNet and other dictionaries, lack the above characteristics.

1.7 Significance and Contributions of the Study

This study has the following contributions to the field of opinion mining:


The main contribution of this study is the enhanced way to represent aspect level opinion mining knowledge using a structure called ‘Opinion Predicate.’

A predicate is a vector consisting of the main components of an opinion in a fixed order. Such a rich structure will empower opinion processing, analysis and scoring. A few previous studies represented opinions as tuples, which hold only very basic elements of an opinion. This study introduces an opinion predicate structure that covers important elements of an opinion (such as opinion strength, score, opinion aspect and aspect group) that were not covered in previous representations.

This is the first study to introduce the combination of AVAN (Adjectives, Verbs, Adverbs and Nouns) for extracting, classifying, scoring and summarizing opinion words using SentiWordNet, opinion senti strength and fuzzy logic tools and techniques. Most previous studies focused on adjectives and adverbs as the main source of opinion words. Nouns and verbs can also be opinion words but have been considered by only a few previous studies. The reasons for selecting SentiWordNet are highlighted above and in Chapter 2, after analyzing various opinion mining lexical resources and databases.

This study introduces opinion strength, or degree, to properly capture the strength of an opinion (referred to in the literature as ‘Senti Strength’).

At the time of writing, no study had introduced opinion strength for opinion words of all four parts of speech (AVAN), which makes this an important contribution of this research. Previous studies used only adverbs as words that can intensify opinion adjectives.

Based on the opinion predicate and opinion senti strength, this study introduces opinion accounting, a new approach to grouping and summarizing opinions. Opinion accounting, another vital contribution, offers a better way to summarize opinions at different levels and a simple way to present opinion scores for various aspect groups. It differs from previous studies in that summarization can be done using opinion predicates and at different levels.
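Opinion accounting, as described above, summarizes scored opinions per aspect group. A minimal sketch of such a summary could count the positive and negative opinions and average the scores in each group. The aspect groups, the 0-100 scale with 50 as neutral and the choice of count and mean as summary statistics are assumptions for illustration; the thesis's own accounting scheme is defined in Chapter 4.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def opinion_accounts(scored: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Summarise (aspect_group, score) pairs per aspect group.

    Scores are assumed to lie on a 0-100 scale where > 50 is positive and < 50 negative.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for group, score in scored:
        groups[group].append(score)
    return {
        group: {
            "count": len(scores),
            "positive": sum(s > 50 for s in scores),
            "negative": sum(s < 50 for s in scores),
            "mean_score": sum(scores) / len(scores),
        }
        for group, scores in groups.items()
    }


# Hypothetical scored opinions from airline reviews, one entry per opinion predicate.
scored_opinions = [("cabin", 90.0), ("cabin", 70.0), ("staff", 20.0), ("food", 50.0)]
for group, account in opinion_accounts(scored_opinions).items():
    print(group, account)
```

Because the input is just (group, score) pairs, the same function can summarize at a coarser level by first mapping aspects to broader groups, which mirrors the idea of accounting at different levels.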


Using fuzzy logic is another major contribution that further enhances the representation of opinions by adding a more accurate score to the opinion predicate. This study uses the power of fuzzy logic to analyze opinion words in depth; the reasons for using fuzzy logic are given in Section 2.7. All three major phases of fuzzy logic (fuzzification, fuzzy rules and defuzzification) are implemented. In addition, this research demonstrates the importance of an integrative use of NLP and fuzzy logic as a basis for modeling abstract relationships. None of the existing studies has performed such multi-level analysis using fuzzy logic in the way applied in this thesis.

1.8 Structure of the Study

The thesis is laid out in seven chapters, as follows:

The first chapter introduces the reader to the concept of opinion mining and describes the aims and objectives of the study. It also indicates the scope of the research and sheds light on the contributions of this study.

The study proper begins in Chapter 2 with a review of the existing literature in this field. The second chapter analyzes in depth the existing approaches, methods, techniques and challenges found in the area of sentiment analysis.

Chapter three reviews and explains the stages of this research and the methodology which is adopted to effectively complete this research. It also highlights various techniques that will be used to meet the set objectives. All definitions and experiment setups are explained in this chapter.

Chapter four describes how aspect level opinion mining knowledge extraction and representation can be enhanced using opinion predicates and opinion accounting for Adjectives, Verbs, Nouns and Adverbs (AVAN). It introduces the AVAN structure covering opinion predicates, senti strengths and accounting. It also explains SentiWordNet, its structure and its scoring norms. In addition, the chapter explains the AVAN scoring formulas and the enhancements made to effectively score opinion words using opinion senti strength.

Chapter five explains how aspect level opinion mining knowledge representation can be enriched by using fuzzy logic. All stages of the fuzzy process are explained with examples to illustrate these concepts.

Chapter six gives detailed results of classifying customer reviews with an SMO classifier, using SentiWordNet, AVAN, opinion senti strength and fuzzy logic as classification features.

Chapter seven summarizes the thesis, highlights all the contributions and lists major open issues and challenges that need to be addressed in future research.

1.9 Chapter Summary

Opinion mining aims to track and summarize public opinions about a product or a service. Research on opinion mining is a vast area covering many aspects, among which is how to properly extract, classify, represent and score opinion strengths in order to make proper decisions. Opinion mining systems should study the degrees and strengths of opinions rather than classifying them as either zero or one.

Existing studies on scoring opinion words, sentences and documents still have a long way to go. Many rich areas need to be addressed and solved in order to scientifically present proper opinions to both service/product providers and customers. This study was motivated by these needs and by the many open areas in this rich field of research.

The main objective of this thesis is to enhance aspect level opinion mining knowledge extraction and representation. The focus is on the aspect level of sentiment analysis, in order to analyze opinions at phrase and word levels and to produce quality results in opinion representation, polarity assessment and summarization. Moreover, aspect level analysis of opinions makes more sense, as people may like some features and dislike others, and hence polarity can be measured more accurately. The aspect level also allows opinions to be captured and better represented together with sentiment strengths. The representation of aspect-level sentiments as predicates is the main contribution of this thesis. These predicates also capture sentiment strengths, which can be exploited for various uses. To achieve this, powerful resources and tools such as SentiWordNet and fuzzy logic are used to enhance opinion knowledge extraction and representation. Furthermore, this thesis proposes novel ideas and interesting directions for coupling fuzzy logic with an integrated use of NLP in representing and characterizing aspect-level sentiments.

In addition, since this thesis focuses on aspect-level analysis of sentiments, there is a clear connection between opinion mining and both SentiWordNet and fuzzy logic. SentiWordNet is a lexical resource that assigns positive, negative and objective scores to the synsets of WordNet. Hence, this resource can help greatly in analyzing opinions at the aspect level. Fuzzy logic, on the other hand, is built to solve complex problems such as handling the vagueness of words based on defined classes, a knowledge base and fuzzy rules. Hence, fuzzy logic is an important tool for addressing the fuzzy nature of opinions. In view of the above, opinion mining, SentiWordNet and fuzzy logic form an excellent combination for enhancing opinion extraction and representation at the aspect level.

This chapter introduced the subject and the relevant background of this study and set an important basis for this research. It explained the problems addressed and accordingly defined the objectives and research questions that this study answers. Moreover, it defined the contributions made by this research. Chapter 2 gives a detailed analysis of the studies and research done in this field.
