Republic of the Philippines
BATANGAS STATE UNIVERSITY
The National Engineering University
Lipa Campus
Marawoy, Lipa City
COLLEGE OF INFORMATICS AND COMPUTING SCIENCES
BEHAVIORAL SEGMENTATION ANALYSIS OF ONLINE
CONSUMERS IN SHOPEE BY USING CLUSTERING
IN K-MEANS ALGORITHM
A Thesis Presented to the Faculty of the
College of Informatics and Computing Sciences
Batangas State University
The National Engineering University
Lipa Campus
Marawoy, Lipa City
In Partial Fulfillment of the Requirements for the Degree
Bachelor of Science in Computer Science
LAYLO, FRANK B.
REYES, CYMBELLYN ATHENA C.
ROSALES, GAYLE MARIE L.
MR. JAYSON A. BALAYANTOC
Adviser
May 2022
APPROVAL SHEET
This undergraduate thesis entitled “BEHAVIORAL SEGMENTATION
ANALYSIS OF ONLINE CONSUMERS IN SHOPEE BY USING
CLUSTERING IN K-MEANS ALGORITHM,” prepared by Frank B. Laylo,
Cymbellyn Athena C. Reyes, and Gayle Marie L. Rosales in partial fulfillment of
the requirements for the degree of Bachelor of Science in Computer Science, has
been examined and recommended for oral examination.
MR. JAYSON A. BALAYANTOC
Adviser
PANEL OF EXAMINERS
Approved by the committee on Oral Examination with a grade of PASSED.
FRANCIS G. BALAZON, DIT
Chairperson
RICHELLE M. SULIT, MSCS
Member
DIONECES O. ALIMOREN, MSCS
Member
Approved in partial fulfillment of the requirements for the degree of Bachelor
of Science in Computer Science.
FRANCIS G. BALAZON, DIT
Dean, College of Informatics and
Computing Sciences
May 2022
ACKNOWLEDGEMENT
We would like to offer our deepest gratitude to everyone who assisted us in
completing this study.
We would like to thank Dr. Francis G. Balazon for his patience and
understanding;
To Mr. Dioneces O. Alimoren, for the inspiration, encouragement, and support
during the dissertation process;
To our thesis advisor, Mr. Jayson A. Balayantoc, for helping us to better
understand this research;
To our Parents, who have always been there for us and given us their full support;
To all of the Respondents who took the time to complete our survey;
And lastly, to all the Panelists, for their hard work and for being great motivators.
The Researchers
DEDICATION
This study is wholeheartedly dedicated to our beloved parents and guardians, who
have inspired and strengthened us throughout this research.
To our classmates and friends who helped us accomplish our study with their
words of advice and encouragement;
To our Dean of College of Informatics and Computing Sciences, Dr. Francis G.
Balazon and to our Program Chairperson, Mr. Dioneces O. Alimoren;
To our professors and to the Batangas State University;
To our advisor, Mr. Jayson A. Balayantoc, for his tremendous effort and patience
in guiding us throughout the study;
To Almighty God, for his guidance, power of mind, and protection and for giving
us a healthy life to be able to carry out this research;
All of these, we offer to you.
The Researchers
TABLE OF CONTENTS
TITLE PAGE
APPROVAL SHEET
ACKNOWLEDGEMENT
DEDICATION
LIST OF TABLES
LIST OF FORMULAS
LIST OF FIGURES
ABSTRACT
CHAPTER I
    Introduction
    Background of the Study
    Statement of the Problem
    Objectives of the Study
    Significance of the Study
    Scope and Limitations of the Study
    Definition of Terms
CHAPTER II
    Review of Related Literature
    E-commerce
    Consumer Behavior
    Consumer Segmentation
    Behavioral Segmentation
    K-means Algorithm
    Review of Related Studies
    Synthesis
    Conceptual Framework
    Theoretical Framework
CHAPTER III
    Project Concepts
    Analysis and Design
    Required Packages
    Model Requirements
    Functional Requirements
    Non-Functional Requirements
    Questionnaire Design
    Data Analysis
    Segmentation Process
    K-means Clustering Algorithm
    Model Evaluation
CHAPTER IV
    Computation for the Accuracy, Precision & Recall
    Number of Demographic Profiles in each Behaviors
CHAPTER V
    Summary
    Conclusion
    Recommendations
REFERENCES
APPENDICES
LIST OF TABLES
Table No.   Title
1           Software Requirements for Development
2           Confusion Matrix Result
3           Age in each Behavior
4           Females in each Behavior
5           Males in each Behavior
LIST OF FORMULAS
Formula No.   Title
1             Formula for Evaluation
LIST OF FIGURES
Figure No.   Title
1            Conceptual Framework
2            Methodology of the Study
3            Dataset for Clustering
4            Scaled Data
5            Distribution Age
6            Distribution of Shopping Rate
7            Distribution of Price Payment
8            Distribution of Product Diversity
9            Silhouette Analysis
10           Clustered Data
11           Clustered Behavioral Factors
12           Behavioral Characteristics Model
13           Confusion Matrix
14           Behavioral Segmentation Analysis
ABSTRACT
TITLE: BEHAVIORAL SEGMENTATION ANALYSIS OF ONLINE CONSUMERS IN SHOPEE BY USING K-MEANS ALGORITHM
AUTHORS: Laylo, Frank B.; Reyes, Cymbellyn Athena C.; Rosales, Gayle Marie L.
INSTITUTION: Batangas State University Lipa Campus
ADDRESS: Marawoy, Lipa City
DEGREE: Bachelor of Science in Computer Science
YEAR: 2021 - 2022
ADVISER: Mr. Jayson A. Balayantoc
E-commerce represents a major shift in today's world of globalization.
During the last decade, the majority of business organizations have kept up with
technological progress and innovation. Consumer segmentation is a marketing
method that divides consumers into groups based on shared qualities, needs, and
interests. This study determines the different behaviors that Shopee
consumers have. Behavioral segmentation analysis is a marketing strategy in which
consumers are grouped based on behavioral factors such as Shopping
Rate, Product Payment, and Product Diversity. A total of 4,000 Shopee customers who have
made purchases from the business were selected at random to participate in the
survey as respondents. This study discovered four types of online consumer segments:
opportunist consumers, transient consumers, need-based consumers, and repetitive
consumers. The behavioral characteristics of each segment are discussed in depth,
and Shopee, along with other selling companies, will be able
to design strategies once they understand each segment. The behavioral
segmentation method discussed in this paper is based on the k-means clustering algorithm. The visualization was performed using the Orange application, while the k-means algorithm was implemented in Python in a Jupyter Notebook environment.
CHAPTER I
This chapter includes an introduction, the background of the study, the statement
of the problem, the objectives of the study, the significance of the study, the scope and
limitations of the study, and the definition of terms.
Introduction
Since the arrival of the Internet in the Philippines in 1994, businesses have
been able to sell their products online and make sales transactions via email.
Although e-commerce benefits businesses, questions have been raised concerning
its impact on economic growth and, in particular, productivity growth. Previous
technological revolutions improved living standards over time, meeting one of
development's primary objectives.
Shopee, a social-first, mobile-centric marketplace where users can explore,
shop, and sell goods and services, started in Singapore in February 2015. A
leading mobile e-commerce platform in Southeast Asia, it started as a C2C platform
before turning into a B2C marketplace serving consumers across the region.
Payoneer and Shopee have teamed up to provide sellers with simple and affordable
payment options. The app-based platform created a website to compete with other
e-commerce websites in the region. Shopee's "Shopee Guarantee" escrow service
withholds payment from merchants until consumers receive their purchases. [1]
Due to the rapid growth of technology, organizations have moved from the
traditional way of selling items to selling them electronically. Business groups rely
on the internet to transact business. Online buying allows analytical consumers to
find a product after a comprehensive search. Online shopping simplifies and
improves the lives of consumers. It saves time and money by allowing them to pay
for their purchases without lining up at cash registers. Online shoppers can also track
their orders and shipments. Because they avoid maintenance and real estate
costs, businesses can sell items online at attractive prices.
Although the internet is a convenient way to shop, some people only use it
in certain situations. They use the internet to research products before buying them
in stores. Some worry about becoming addicted to online shopping. The following are
drawbacks of online shopping: the lack of touch-feel-try raises questions about
the quality of the goods on offer; a consumer must buy a product without seeing it
in person; and online payments are not sufficiently secure. The drawbacks of
online purchasing will not impede its expansion; in fact, internet shopping assisted
firms in recovering from the recession. To make online buying productive,
merchants should pay attention to stumbling blocks and provide a safe payment
mechanism.
For a consumer to make a purchasing decision, many factors must be
considered. These factors could include friends, social structure, ethical or personal
factors, economic factors, technological factors, cultural values, and
recommendations. Online consumer behavior describes how consumers decide
whether or not to buy anything from an online store. Although each consumer’s
needs are different, the new expectations driving online consumer behavior are
general. Product availability, transparency of delivery, low shipping costs, and,
more recently, a smooth buying journey have all influenced whether consumers buy
online or not and whether they will become regular customers.
The focus of this study is to use the k-means algorithm to analyze online
consumer behavioral segmentation to help sellers understand their consumers'
diverse habits. The k-means algorithm divides data points into discrete,
non-overlapping categories. One of the most prominent applications of k-means
clustering is the segmentation of consumers’ behavior to understand them better and
increase revenue.
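The partitioning idea can be stated more formally. As a sketch under the usual squared-Euclidean formulation (the symbols x_i, C_k, mu_k, n, and K are introduced here for illustration and are not notation taken from this study), k-means looks for K non-overlapping clusters that minimize the total squared distance between each data point and the centroid of its cluster:

```latex
\[
\min_{C_1,\dots,C_K} \; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i,
\]
\[
C_k \cap C_l = \emptyset \ \text{for } k \neq l,
\qquad
\bigcup_{k=1}^{K} C_k = \{x_1, \dots, x_n\}.
\]
```

The last two conditions express that every data point belongs to exactly one cluster.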
Background of the Study
The act of purchasing goods or services directly from a seller over the
Internet using e-commerce is known as online shopping. In recent years, the
internet-based or "Click and Order" business model has replaced the traditional
brick-and-mortar business strategy. More people than ever before are turning to the
internet to purchase a wide range of products, from houses to shoes to plane tickets
and everything in between. [2] Individuals now have a greater number of options
when it comes to selecting their products and services while buying through an
online platform. Consumers have adopted online shopping as a preferred method of
shopping. This new shopping technology not only provides a large quantity and
variety of products to potential consumers, but it also provides a wide range of
business opportunities and a large market.
Despite the numerous benefits, some consumers may consider online
shopping to be risky and untrustworthy. There is no face-to-face interaction
between the seller and the consumer, which makes it difficult to socialize, and the
consumer may be unable to establish trust in the seller. To improve the
online shopping rate, especially in countries where it
is low, it is important to carry out a careful examination of consumer shopping
behavior. Understanding consumer shopping behavior is a very important step in
developing successful marketing strategies.
Consumers' behavior toward sustainable production and
consumption is an important issue today. The core concept of this research is to
analyze general consumer purchasing behavior as well as consumers’ attitudes
toward buying products online using the k-means algorithm. Since physical stores are
no longer the only way to achieve retail success, an increasing number of businesses
are now providing online shop interfaces for consumers.
Statement of the Problem
The research examines online consumer behavior by gathering demographic
and behavioral data such as shopping rate, price payment, and product diversity. In
addition, the study intends to employ the k-means algorithm to cluster the
demographic and behavioral characteristics of Shopee online shoppers. Segmenting
consumers in this way allows a seller to approach a client or prospect precisely, based on
their individual needs and preferences, and to increase consumer loyalty by
personalizing services and improving consumer service.
Objectives of the Study
In response to the issues mentioned, this study is conducted to achieve the
following objectives:
1. To collect Shopee consumer data based on:
1.1 Demographic
1.2 Shopping Rate
1.3 Price Payment
1.4 Product Diversity
2. To process the collected data using a K-means clustering algorithm.
3. To develop a model for consumer segmentation using the K-means
algorithm.
4. To evaluate the performance of the model using:
4.1 Accuracy
4.2 Precision
4.3 Recall
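The three measures named in objective 4 can be computed from a confusion matrix. The sketch below uses the standard per-class definitions and assumes that each respondent has a reference segment label to compare against the assigned segment; how such reference labels are obtained is not described in this section, and the function names and sample labels are hypothetical.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def evaluate(matrix):
    """Return overall accuracy plus per-class precision and recall."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    precision, recall = [], []
    for i in range(n):
        true_pos = matrix[i][i]
        predicted_i = sum(matrix[r][i] for r in range(n))  # column total
        actual_i = sum(matrix[i])                          # row total
        precision.append(true_pos / predicted_i if predicted_i else 0.0)
        recall.append(true_pos / actual_i if actual_i else 0.0)
    return accuracy, precision, recall


# Hypothetical labels for six respondents (illustration only).
classes = ["opportunist", "transient", "repetitive"]
actual = ["opportunist", "opportunist", "opportunist",
          "transient", "transient", "repetitive"]
predicted = ["opportunist", "opportunist", "transient",
             "transient", "transient", "repetitive"]

matrix = confusion_matrix(actual, predicted, classes)
accuracy, precision, recall = evaluate(matrix)
print(matrix)
print(accuracy, precision, recall)
```

Precision and recall are reported per class here; averaging them across classes is one common way to obtain a single figure for each.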
Significance of the Study
This study aimed to use the k-means algorithm to segment consumer
behavior when it comes to making a product purchase decision. The outcome of this
study would be beneficial to the seller, consumers, future researchers and the
university.
To sellers, this study will help them understand consumers’ needs and
aspirations, offer something unique based on consumers' behavior when purchasing
a product or service, and optimize product advertising for better relevance by
segmenting consumers based on their behavior.
To consumers, this study will aid in the production and provision of the
goods and services they demand.
To future researchers, this study will assist them in applying k-means for
consumer segmentation and guide their future work on consumer
segmentation using the k-means algorithm.
To the University, this study will benefit the university library by providing
additional information to the students.
Scope and Limitations of the Study
Shopee consumer behaviors are the primary emphasis of this study. The
respondents of the study were 4,000 Shopee consumers who had purchased products
from the company.
To gather the data, the researchers used Google Forms to determine
Shopee consumer behaviors based on demographics, shopping rate, product
payment, and product diversity. After collecting the data, the researchers used the
k-means algorithm to identify distinct consumer behaviors, such as the opportunist
consumer, transient consumer, skeptical consumer, and repetitive consumer.
Definition of Terms
In order to provide a better understanding of the terminology used in the
study, the following words are conceptually and operationally explained.
Algorithm - A method for resolving a well-defined computational problem. It
requires knowledge of the various options for solving a computational problem, as well as the
hardware, networking, programming language, and performance constraints that
accompany any particular solution. [3]
B2C - Business to Customer (B2C) refers to the process of selling products and services
directly from a business to the consumers who are the end-users of those products and services.
It is also known as direct marketing. [4]
Behavioral Segmentation - The process of sorting and grouping clients based on
their behavior. [5] These behaviors include the
items and material consumers consume, as well as the frequency with which they
interact with an app, website, or business.
C2C - Customer to Customer (C2C) is a business strategy that allows consumers to
trade with one another, often over the internet. C2C is a form of business
model that arose as a result of the sharing economy and e-commerce technology.
[6]
Clustering - Involves grouping the population or data points so that members of the
same group are more similar to each other than to members of other groups. In short, the
goal is to sort comparable data points into clusters. [7]
Customer Segmentation - The process of dividing consumers into groups based
on common characteristics so companies can market to each group effectively and
appropriately.
K-means Algorithm - An iterative algorithm that tries to partition the dataset into
K pre-defined, distinct, non-overlapping subgroups (clusters) where each data point
belongs to only one group. [8]
Python - An object-oriented, high-level programming language with dynamic
semantics. Its high-level data structures, dynamic typing, and dynamic binding make it well suited for
Rapid Application Development and for use as a scripting or glue language for existing
components. [9]
Shopee - Southeast Asia's and Taiwan's leading e-commerce platform. Shopee
offers consumers a simple, secure, fast, and fun online shopping experience that tens
of millions of people use every day. [10]
CHAPTER II
This chapter discusses the relevant literature and studies that the researchers used
to support the importance of the current study. It also provides the theoretical background and
a synthesis to give a better understanding of the research.
Review of Related Literature
This section evaluates the existing
literature on the topic and discusses the relevant information and
findings drawn from it.
E-commerce
UNCTAD, also known as the United Nations Conference on Trade and
Development 2017, has emphasized the significance of e-commerce, particularly
online shopping, for developing countries in recent years. According to UNCTAD 2020,
its e-commerce and law reform program has for over a decade assisted developing
countries in Africa, Asia, and Latin America in their efforts to establish legal
regimes that address the issues raised by the electronic nature of information and
communications technologies (ICTs), such as ensuring trust in online transactions,
facilitating the conduct of domestic and international trade online, and providing
legal protection for users and providers of e-commerce and electronic government
services. [11]
According to Vicente 2016, e-commerce was a rising trend in the Philippines
due to the widespread adoption of mobile technologies, particularly among younger
consumers. Despite its growing popularity and acceptance throughout the country,
it was not exempt from the problems that the country was experiencing. [12]
Segovia 2016 claims that e-commerce is falling behind the country's rapid
growth and development. This was due to a lack of infrastructure, including internet
connectivity, electronic payments, a legal framework, and logistical support. It was
also stated that there are obstacles and challenges to developing e-commerce in the
Philippines due to the difficulties associated with accessing the internet. [13]
Companies that conduct business outside of their home country, according to
Babenko et al. 2017, may be more interested in reducing operational expenses
through the use of information technology. With rich information flows to simplify
and optimize the movement of physical commodities in the supply chain, it is
possible to save both time and money on product delivery. This is made possible
through the use of the Internet for transactions and coordination. It is usually thought
that the implementation of e-commerce is a worldwide process overseen by a
common set of actors. [14] Lazada, Shopee, and Zalora are examples of e-commerce
platforms that have long dominated the online shopping business in Southeast Asia.
These platforms sell goods from their own fulfillment centers while also allowing
third-party sellers to sell through their platforms. Because of the growing number
of online users, retailers have turned to the digital medium in order to more readily
reach their target consumers. According to the Statista Research Department 2021,
these platforms offer tempting sales bargains such as free shipping, substantial
discounts on a variety of items, and payment methods like cash-on-delivery. [15]
Consumer Behavior
Consumer behavior is something that needs to be better understood in order
for its growth to succeed. Using consumer behavior research, according to Vrender
2016, we can develop a general model of purchasing behavior that shows the stages
that consumers go through while making a purchase decision. [16] Further to the
topic, Panaitescu 2021 explained that consumer behavior could be divided into four
categories: habitual purchasing behavior, variety-seeking behavior, dissonance-reducing purchasing behavior, and complex purchasing behavior. Customer
behavior types are determined by the sort of goods a consumer requires, the amount
of engagement, and brand distinctions. When buyers purchase a high-priced item,
they exhibit complex purchasing behavior. Consumers are heavily involved in the
buying choice in this infrequent transaction and will conduct extensive
research before making an investment decision. When a client engages in variety-seeking purchase behavior, he or she is not highly involved in the purchasing process,
yet there are still differences among the products offered by different brands. Consumer
engagement is quite high in dissonance-reducing purchasing behavior. This might
be owing to the high cost and infrequent purchases. Furthermore, there are few
options, with little distinction across brands. In this case, a buyer purchases readily
available goods. When a consumer has little involvement in a purchasing choice, this is
described as habitual buying behavior. In this situation, the consumer notices only
a few notable differences across brands. [17] In the event of service failures, consumers
who are influenced by negative emotions have higher switching intentions,
especially when the failures stem from factors that can be controlled and
avoided. However, Urea and Hidalgo 2016 say that if consumers are handled
appropriately after a failure, distributive justice methods such as monetary
compensation might increase positive emotions while decreasing negative ones.
Furthermore, negative emotions have an impact on repurchase intent. Also,
procedural justice, such as obtaining fair treatment from the company, increases
consumer satisfaction, particularly when a consumer complains and the company
attempts service recovery after the complaint has been filed. [18] Chou and Hsu
2016 asserted that individuals' intentions to continue purchasing online are
influenced by their level of satisfaction with and perceived usefulness of the website
they are currently using. In contrast, shopping habits increase the influence of
emotional assessment on continuation intention but decrease the
influence of rational evaluation on continuation intention. [19] There is a direct
correlation between the level of involvement with an online business and the level of
enjoyment experienced. Emotional pleasure increases the likelihood of purchase
during online interactions, and consumers' previous emotional experiences may
have an impact on their purchasing decisions. According to Chechen et al. 2016,
the use of human brands, enhanced impressions of human connection, and the
building of emotional bonds could give businesses an edge over their
competitors in these circumstances. [20]
Consumer Segmentation
Consumer segmentation has always been important in business. According
to a recent Forrester survey, only 33% of organizations implementing consumer
segmentation believe that it significantly impacts their business. Specifically, the
report states that one of the primary reasons organizations fail is that they continue
to rely on traditional consumer segmentation methods rather than embracing the
vast amount of consumer data and advanced analytics approaches that are now
available. [21] According to Barman et al. 2019, consumer segmentation is
significant when a firm aims to reach a certain market area. It assists marketing
teams in effectively managing their budgets to achieve the best possible results in
terms of product sales. Following the proper implementation of consumer
segmentation, marketers can create separate content for each market category
they wish to target. Marketers will benefit from this method of communication as
they work to build relationships with their target audiences. [22] A single client
attribute, such as age, gender, country of origin, or stage in the family life cycle,
could be utilized as a segmentation criterion, according to Dolnicar et al., 2018.
However, it could also contain a broader range of consumer characteristics, such as
the number of perks desired when purchasing a product, the number of activities
accomplished while on vacation, environmental ideals, or a spending habit. [23]
Consumer segmentation may be used to assist businesses in personalizing marketing
plans, analyzing trends, planning product development and advertising campaigns, and
offering relevant products by utilizing a range of distinct customer attributes.
Srivastava 2016 also added that consumer segmentation personalizes individual
communications in order to communicate more effectively with the desired groups.
Location, age, gender, income, lifestyle, and past purchasing history are the most
typical factors utilized in consumer segmentation. [24]
Behavioral Segmentation
The simplest method for understanding how a product provides value or
overcomes obstacles is to categorize consumers based on their actions. According
to Jones 2017, behavioral segmentation refers to a process in marketing that divides
consumers into segments depending on their behavior patterns when interacting
with a particular business or website. [25] In line with what Jones asserted,
Kotler and Keller 2016 stated that people could be divided into segments based on
whether they have an enthusiastic or positive attitude toward a product, or whether
they are indifferent, negative, or hostile toward it. By considering
consumers' attitudes toward a company's brand or product, a company will gain a
broad view of the market and its segments. [26] Behavioral segmentation, according to
Fahed Yoseph 2019, is considered one of the essential principles in modern
marketing. Traditional consumer segmentation models necessitate months of
analytical work, resulting in discrete consumer insights that are out of date when
compared to the dynamic body of consumers they are meant to reflect.
Personalization and the overall consumer experience are factors in the retail
industry’s success or failure. [27]
K-means Algorithm
According to Bhade et al. 2018, k-means clustering is the best appropriate
method for clustering the dataset. Consumer loyalty and attention span are two
important issues affecting today’s business sector, and they are both difficult to
maintain. With the help of the k-means algorithms, it is simple to determine which
consumer is the most profitable based on the clusters that have been identified. [28]
Segmentation, as defined by Mohammed Muzammil 2021, is the process of
grouping similar elements into groups or clusters. It can be defined as
the process of categorizing data into groups that contain data points or elements that
are comparable to one another. Similar data points are grouped together in one
group using the k-means algorithm, and all of the data points in that group share
common characteristics but are distinct when compared to data points in other
groups. [29] Furthermore, according to Abhinav Sagar 2019, the purpose of k-means
is to divide data points into subgroups that are distinct from one another and
do not overlap. K-means clustering is commonly employed in customer
segmentation to acquire a deeper understanding of customers, which may then be used
to increase a company's revenue. [30] In addition, Trevino 2016 stated that the k-means
clustering algorithm is used to identify groupings in data that have not been
explicitly classified by the researcher. This can be used to confirm business
assumptions about what types of groups exist or to identify unknown groups in
complex data sets. As soon as the algorithm has been run and the groups have been
defined, any new data may be quickly and easily assigned to the appropriate
category. The k-means clustering algorithm produces a final result by iteratively
refining the initial results. The algorithm inputs are the number of clusters K and the
data set. The data set is a collection of features for each data point. [31] According to
Shihab et al. 2019, many clustering algorithms have been developed to classify
consumers and achieve superior clustering results. K-means clustering is a
well-known data mining technique; it is an unsupervised, iterative, partitioning
learning method that addresses a wide range of clustering problems, particularly
on large datasets. The algorithm has two phases. In the first phase, K centers are
chosen at random, with K fixed at the start. In the second phase, each data object
is assigned to the nearest center. A self-organizing map can be used to select the
starting value of K. [32]
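The two phases described above, together with the iterative refinement mentioned by Trevino, can be illustrated with a minimal sketch. It is written in plain Python and is not the implementation used in this study. The function names, the toy points, and the fixed random seed are assumptions made only for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of numeric tuples) into k groups."""
    rng = random.Random(seed)
    # Phase 1: choose k distinct data points at random as the initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Phase 2: assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Refinement: move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers


# Hypothetical toy data: two well-separated groups of 2-D points.
toy_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
toy_labels, toy_centers = kmeans(toy_points, k=2)
```

On the toy data, the first three points end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initial centers.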
Review of Related Studies
This section summarizes the studies related to our topic. These studies are significant
for their findings and for revealing research gaps or weaknesses.
According to Blut 2016, individuals or firms participating in e-commerce,
whether buyers or sellers, rely on Internet-based technology to conduct their
transactions to succeed. Due to the huge power of e-commerce, transactions may
take place at any time and from any location, and geophysical boundaries can be
removed. [33] According to a study conducted by Panula in 2017, a company's
website and its product and business information must be clear and easily
accessible to consumers in order to increase their trust in the brand
and increase sales. The availability of electronic commerce to online consumers, the
speeding up of delivery, and the rapid response to feedback requests were all used
to develop trustworthiness among consumers. [34] According to Babenko et al. 2019,
e-commerce technology has benefited businesses. It has permitted businesses and
corporations to sell their products and services worldwide and conveniently, enabling
consumers to purchase from any location suitable for their schedules. Since the
beginning of e-commerce, there have been no constraints on searching for
innovative technologies to meet the existing scenario, which both specialists and
companies have pursued. This suggests that e-commerce, as we know it today, will
be significantly different in five years. Electronic commerce will witness
considerable development and technological advancements as it becomes more
prevalent in business globally, in both developed and developing countries. [35]
The study "Target Market (Customer) Segmentation" claims that every salesperson
and marketer understands that their products and services cannot be supplied to
everyone. To effectively sell your product or service, you must first identify the
exact kind of consumers that will benefit the most from your
offering. Segmentation research may significantly help your company identify how
people differ in their perspectives, desires, and motivations, allowing you to build a
strong brand portfolio and adapt marketing messages to different groups of people.
As mentioned by the researchers, segmentation analysis for each product category
must be purpose-built and unique, and design and analysis require an investigative
and iterative approach. [36] After segmenting the market into consumer
groups with comparable characteristics, the organization should establish strategies
for each category, according to Nilsson et al. 2016. These strategies should be designed
based on how value is generated in the various segments and, as a result, should be
tailored to each group of clients. As a result, the products become segment-specific
to fulfill each segment’s needs. Market segmentation is thus a useful strategy for a
firm seeking to become more customer-focused. Market segmentation may be
required for a supplier operating in an industrial market that wishes to
become more client-oriented. A segmentation algorithm may lead to a better
knowledge of consumers and raise awareness of the importance of meeting
customer demands in all operations within the organization, particularly marketing
and sales activities. This discussion indicates that efficient segmentation necessitates a
significant amount of work and expertise from the whole organization. [37] Since it
is directly related to a company's consumer satisfaction, consumer segmentation is
an important method in literature and software related to consumer relationship
management (CRM). Ozan and L. O. Iheme conducted a study in 2019 titled
"Artificial Neural Networks in Customer Segmentation." They utilized one of the
most advanced machine learning algorithms available, termed MLP (Multilayer
Perceptron), which is a feed-forward neural network structure optimized by using
backpropagation. The findings demonstrate that even with a small amount of
training data, it is possible to construct a generalizing model capable of reproducing
the understanding that supports its customer classification methodology. When this
model is integrated into their workflow, it can check clients regularly, automatically
determining whether or not to promote a customer and notifying supervisors. [38]
With the help of an accurate clustering procedure, businesses can effectively identify
consumers and establish their attributes, according to the practical approach in
Customers Segmentation by Using the K-Means Algorithm, a thesis by E. Y. L.
Nandapala and K. P. N. Jayasena 2020. A successful clustering process lets
businesses identify consumers accurately, so they will be better able to make
accurate decisions, supply new products and services, and adjust existing products
and services in response to consumer demand. When applied to a dataset in any
industry, the method given
in this study may serve as an accurate segmentation methodology. [39]
According to the findings of a study conducted by Farid et al. in 2017,
consumer segmentation is a marketing strategy that involves first dividing
customers into groups based on their underlying characteristics, needs, and interests,
and then designing and implementing strategies to target those groups. A prominent
form of segmentation approach is behavioral segmentation analysis, in which
consumers are categorized according to particular behavioral traits such as decision-making,
spending, and usage. This study conducted a behavioral segmentation
analysis using real e-commerce transaction records from 10,000 online customers
and discovered five consumer segments, including opportunist customers, transient
customers, need-based shoppers, skeptical newcomers, and repeat purchasers.
Opportunists take advantage of opportunities as soon as they come up, never paying
full price for a product. They always try to find a deal on what they want to buy.
These people never buy anything that isn't on sale. Transient consumers are
customers who are just passing through a dealer's area of sales responsibility for a
short period. Need-based shoppers only purchase goods when they have an
immediate need for them. When they walk into a store, they already have the section
in mind that they want to go to. They typically do not need the assistance of a
salesperson to select a product because they are typically well-informed regarding
the item that they intend to buy. Skeptical newcomers are consumers who don't need
anything specific and are attracted by the store's atmosphere. These consumers
usually like to talk to people and ask you questions about random products, but they
don't want to buy them. The most common example of this kind of consumer is a
group of college students who go to malls to kill time. They go into any store and
ask about random items there. Repetitive consumers repeatedly buy from a company
and are extremely valuable to the company. It is important to meet the expectations
of this particular consumer. They not only remain dedicated to the brand, but they
also praise and recommend it to their circle of friends and relatives. Detailed
discussions were held on the behavioral features of each segment, and
recommendations were made about how to approach each segment to increase their
online buying rates. Identifying the behavioral features of each segment will enable
the selling organizations to build marketing tactics that are appropriate for each
segment's needs. [40] The findings of a behavioral segmentation study conducted
by Yu. Liu et al. 2015, which was based on real e-commerce transaction records of
Chinese online consumers, revealed the existence of six different types of online
consumers: economic purchasers, active-star purchasers, direct purchasers, high-loyalty
purchasers, risk-averse purchasers, and credibility-first purchasers. [41]
Psychographic and behavioral segmentation analysis conducted on Generation Y
female online shoppers by Ladhari et al. 2019 identified four different
approaches to online shopping: trend shopping, pleasure shopping, price shopping,
and brand shopping. Six shopping profiles have also been identified, each with
different objectives: price shoppers, discovery shoppers, emotional shoppers,
strategic shoppers, fashionistas, and shopping fans. [42] Apart from that, Huseynov
and Yıldırım 2017 conducted a behavioral segmentation analysis using real
e-commerce transaction data from Turkish online consumers and discovered five
different types of consumer segments, including opportunist consumers, transient
clients, need-based shoppers, skeptical newcomers to the market, and repeat
purchasers. It was demonstrated in both studies that each identified online
consumer segment had distinct behavioral traits that distinguished it from the other
segments. [43] One of the study hypotheses, "Online Consumer Typologies and
Their Shopping Behaviors in B2C E-Commerce Platforms," which was conducted
as a result of the psychographic and behavioral research, is that a large online
consumer audience does not represent a single market segment. It is more
appropriately described as a collection of several consumer segments in the
following way: each segment's members engage in distinct online purchasing habits
and behave distinctly in response to marketing efforts. Although the research
findings revealed that certain characteristics of the broad online consumer audience
were shared in terms of their perception of e-commerce, the findings also revealed
several distinct groups of consumers who were significantly different from one
another in a variety of ways. Consequently, the marketing mix of e-retailers should
differ based on whether or not the majority of their customers are online. Instead of
employing a single marketing strategy that applies to all consumers, marketers
should customize their products and services to meet the individual demands of each
consumer category they serve. [44]
The research "An Improved K-means Clustering Algorithm," conducted by
H. Xu et al. 2016, presents how to improve the K-means clustering method by
removing the impact of noise while simultaneously enhancing the selection of
starting points. Gridding optimization also eliminates the computing complexity
associated with the peak density approach, and the need for an excessive number of
human evaluations and the resulting errors that can result from this. By including
the idea of granularity, the edge of the dense zone is removed without gridding, and
the precision of cluster initialization center points is raised in the absence of
gridding. [45] A technique known as K-means clustering can group data similar to
one another when a data collection has been divided into multiple clusters. It is an
unsupervised approach that may be used to generate several clusters of data from a
single data set of information. In terms of sum squared errors and the number of
successfully categorized instances, the genetic K-means clustering exceeds the
regular K-means clustering. By examining various large-scale data sets with high
dimensionality, it is possible to determine the performance of different clustering
algorithms and the performance of different machine learning algorithms for
different distance metric combinations. [46] Researchers S. Na et al. 2016
published "Research on k-means Clustering Algorithm: An Improved k-means
Clustering Algorithm" explaining the k-means approach and exploring the
drawbacks of the traditional k-means clustering technique. The conventional k-means
clustering method has low efficiency because of the high computational
complexity that results from the necessity to reassign data points several times
during each iteration. Using
the approach proposed in this study, it is guaranteed that the entire clustering
procedure will be finished in O(nk) time without sacrificing cluster accuracy. The
results of the experiments indicate that the improved strategy can reduce the time it
takes for the k-means algorithm to run. As a result, the k-means technique that has
been proposed is viable. [47] In k-means iterations, each point needs to be examined
if it is closer to its center than any other center. Hence, each point has a larger
searching space. For example, if there are k clusters, each point needs to calculate
and compare distance (k−1) times in each iteration. J. Qi et al. 2016 propose a novel
optimized hierarchical clustering method incorporated with three optimization
principles. They state that the use of K*initial centers significantly increases the
probability of attaining the finest local optima, and multi-round top-n nearest
clusters merging approaches the optimal result in a more progressive manner than
before. As an alternative to re-calculating feature values from scratch, the top-n and
update principle optimizations update feature values of clusters by preceding
clusters or relocated items. Furthermore, the pruning method minimizes the adjusted
searching space for each point in the k-means iteration by a significant amount. [48]
In P. Divya and K. Anusudha’s "Segmentation of Defected Regions in Leaves using
k-Means and OTSU's Method" (2018), they concluded that k-means and Otsu's
methods could be used for segmentation of defected regions in leaf images. The
defective regions are segmented in the k-means clustering approach by splitting
them into many clusters, each of which comprises a group of pixels, whereas in
Otsu's method, the defected picture is divided based on the automated thresholding
process each time a new threshold is determined. To generate the result, both
proposed approaches are iterative. According to the results, the suggested technique
offers more details about the picture’s defective regions and delivers higher PSNR
values than the existing methods. [49] The paper “Image Segmentation
Algorithm Based on Particle Swarm Optimization with the k-means Optimization”
by X. Chen et al. (2019) achieves accurate and efficient image segmentation. The authors
propose a hybrid image segmentation algorithm that combines particle swarm
optimization (PSO) with k-means clustering, which aims to solve the problem of selecting
the initial centers of k-means clustering and to reduce its tendency to fall
into local optima. The local optimization ability of the k-means clustering
algorithm is combined with PSO to improve the accuracy of image segmentation. The
optimized algorithm retains the fast convergence speed of the k-means
clustering algorithm and overcomes the tendency of the particle swarm
optimization algorithm to fall into local optima. Experimental
results demonstrate that the combination of the PSO and k-means algorithms has
better stability. [50]
Synthesis
To better understand the issue, the researchers looked at similar articles. To
build and carry out the current work, it was necessary to research e-commerce,
consumer purchasing behavior, consumer segmentation, behavioral segmentation,
and the k-means algorithm. The researchers found that there are many types of
consumer segmentation, including behavioral segmentation, demographic
segmentation, firmographic segmentation, psychographic segmentation, and need-based
segmentation, that can help companies such as Shopee split consumers into
groups based on their similar attributes or affinities. When it comes to consumer
behavior segmentation, it is essential to understand how and when consumers decide to
purchase a product or service. It necessitates a marketer's attention to consumer
behavior, such as an existing consumer's purchasing activity or the behavior patterns
of a target audience, to change a brand's marketing message, boost brand loyalty,
and secure client retention. Companies employ a divide-and-conquer strategy
toward markets, which serves as the basis for consumer segmentation.
Marketers can gain a competitive advantage over their competitors by effectively
using segmentation. Because of market segmentation, marketers can focus on
managing consumer relationships, which was previously impossible with traditional
mass marketing methods.
The k-means algorithm divides data points into non-overlapping groupings.
K-means clustering is commonly used to segment consumers; it allows businesses to
understand more about their consumers, which can then be utilized to improve sales.
The purpose of the k-means clustering algorithm is to divide data into k clusters so
that data points in the same cluster are similar and data points in different clusters
are further apart. This study's purpose is to
apply the k-means algorithm using Python in segmenting consumers, efficiently use
the k-means algorithm, and identify the primary behavior of an online consumer
when it comes to online shopping.
Conceptual Framework
The conceptual framework is divided into three parts: input,
process, and output. This structure lets the study focus its attention on a specific
aspect of the topic.
Figure 1. Conceptual Framework
Figure 1 shows the conceptual framework of our study. The input contains
the data of the surveyed Shopee consumers. The data supplied by each
consumer will be collected using Google Forms, and the collected data will be
grouped and segmented using the k-means algorithm. The output will
classify various consumer behaviors and demonstrate whether the k-means
algorithm is efficient in consumer segmentation.
Theoretical Framework
Marketers use behavioral segmentation to target their customers based on
their actual purchasing habits rather than their demographics. It
divides the market into groups of consumers based on their knowledge of,
attitude toward, use of, or reaction to a particular product. American economist
Culbertson developed a market segmentation theory in 1957, which explains that
long-term and short-term interest rates are not related to one another since different
investors finance them. He asserted that market segmentation theory further says
that the buyers and sellers who make up the market for short-term securities have
distinct characteristics and motivations from the buyers and sellers who make up
the market for intermediate and long-term maturity securities. [51] The idea is based
partly on the investment patterns of various institutional investors, such as banks
and insurance companies, and it is meant to be applied broadly in practice.
According to market segmentation theory, there is no relationship between
bond markets with different maturity lengths. Those interest rates impact the supply
and demand for bonds and other financial products. When it comes to investing
in fixed-income securities, the theory holds that investors and borrowers have
preferences for specific yields. Individual smaller markets are created due to these
preferences, each subject to the supply and demand pressures particular to that
market. When looking at fixed-income assets with the same credit value as one
another, the theory attempts to explain the form of the yield curve by stating that
bonds of different maturities are not interchangeable with one another. Because of
this, the yield curve is influenced by the forces of supply and demand at each
maturity length.
CHAPTER III
DESIGN AND METHODOLOGY
This chapter discusses how the algorithm was used, the data gathering
process, and the significance of the algorithm's implementation to the study's
outcome.
Project Concept
Behavioral segmentation is one of the most significant applications of unsupervised
learning in consumer segmentation. Using clustering algorithms, businesses may
discover several consumer categories, allowing them to target the prospective user
base. The researchers used k-means clustering in this study since it is a widely
used approach for grouping unlabeled datasets.
Companies that use consumer segmentation believe that each client has
unique needs that must be addressed with a tailored marketing strategy. They want
to better understand the clients they are trying to reach. Their goal must be specific
and designed to meet the needs of every individual consumer. By collecting data,
companies may better understand consumer preferences and find valuable categories.
This helps them build marketing campaigns efficiently while limiting financial risk.
Behavioral segmentation analysis was performed using data derived from
online purchasing transactions made by Shopee customers. With 54.6 million
monthly web visits, Shopee is one of the most popular online shopping websites in
the Philippines. Shopee members have access to a wide range of selected
brands in a variety of categories, including clothing, accessories, cosmetics, home
décor, and lifestyle.
Analysis and Design
Figures and tables were used to demonstrate the research study's
methodology. Segmenting consumers using the k-means algorithm allows the
classification of similar consumers into the same segment. This study can help the
sellers better understand consumers in terms of both static and dynamic behaviors.
Figure 2. Methodology of the Study
Figure 2 shows the methodology of the study, that is, the process that the
researchers implemented. The first step is to import the required packages
and then load the collected data. Then, the 4,000-row dataset is
preprocessed to check for missing values, noisy data, and other irregularities before
the algorithm is applied. Silhouette analysis is then used to find the ideal value
of k, which in this case is 4 clusters. Next, the k-means algorithm is implemented to
build the model, and the model’s accuracy, precision, and recall are obtained using the
evaluation matrix. The behaviors were then classified using the k-means algorithm,
obtaining four distinct behaviors (Opportunist, Skeptical, Transient, and
Repetitive). Lastly, the demographic profile associated with each
behavior, such as age and gender, is determined.
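The silhouette analysis step can be illustrated with a short sketch. For each point, the silhouette coefficient compares a, the mean distance to the other members of its own cluster, with b, the mean distance to the members of the nearest other cluster, as (b - a) / max(a, b). The code below is a plain-Python illustration under assumed toy data and is not the code used in this study. In practice, the labeling for each candidate value of k would come from running k-means, and the k with the highest mean score would be kept.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient of a labeling (range -1 to 1, higher is better).

    Assumes at least two clusters; the score is undefined for a single cluster.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from p to itself is 0, so it does not affect the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight pairs of points that are far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # each tight pair forms its own cluster
poor_labels = [0, 1, 0, 1]  # each cluster mixes points from both pairs
good_score = silhouette_score(toy_points, good_labels)
poor_score = silhouette_score(toy_points, poor_labels)
```

The labeling that keeps each tight pair together receives a score close to 1, while the mixed labeling receives a negative score.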
Required Packages
Clustering refers to the grouping of related elements into clusters, that is,
grouping data so that each group contains similar data points or
elements. The k-means algorithm uses the clustering approach to group similar
data points in one group, with all data points in that group sharing common
properties but being distinct from data points in other groups. In the algorithm
process, every task must begin with importing the necessary packages in the
appropriate environment (Python in our case). Pandas is used to work on the data,
NumPy is used to work with arrays, matplotlib and seaborn are used for
visualization, mplot3d is used for three-dimensional visualization, and scikit-learn
is used to develop the k-means model.
Model Requirements
The researchers gathered ideas, data, program needs, and potential
difficulties. Specifically, the researchers specified all of the system's software,
functional, and nonfunctional requirements in detail.
In Table 1, you will find a list of the suggested software requirements to
ensure that the model functions accurately. Python was used to implement the k-means
algorithm in the model, and the Orange application was utilized to visualize
the model and other graphs.
Table 1. Software Requirements for Development
Software Requirements      Specifications
Operating System           Windows 7 or higher
Programming Language       Python
Tools                      Jupyter Notebook & Orange App
Functional Requirements
The model's characteristics are defined by the functional requirements. This
also demonstrates how and where the model may be applied.
1. The model will be useful in developing recommendations that can predict
which items or features each consumer would be interested in next.
2. The model will be useful in helping businesses understand the distinct groups
of people that make up their market.
3. The model can generate behavioral segmentation.
Non-Functional Requirements
These are features or standards that the model must meet in order to be
considered functional. To achieve excellent segmentation performance, the
accuracy, precision, and recall of the model are calculated.
Accuracy. The evaluation needs to show that the model is accurate between 50 and
90 percent of the time.
Precision. After going through the evaluation process, the model should generate
reliable findings.
Recall. For the model to be useful, it needs to have the capacity to capture results
with precise segmentation.
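The three measures can be computed as in the sketch below. It assumes that a reference segment label is available for each consumer and that each cluster has already been mapped to a segment name. Neither assumption is stated in this section, and the labels shown are hypothetical. Precision and recall are macro-averaged over the segments, which is one common choice among several.

```python
def evaluate(true_labels, predicted_labels):
    """Return (accuracy, macro precision, macro recall) for two label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    # Accuracy: share of consumers whose predicted segment matches the reference.
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    macro_precision = sum(precisions) / len(classes)
    macro_recall = sum(recalls) / len(classes)
    return accuracy, macro_precision, macro_recall


# Hypothetical reference and predicted segments for six consumers.
reference = ["Opportunist", "Opportunist", "Skeptical", "Skeptical", "Transient", "Repetitive"]
predicted = ["Opportunist", "Skeptical", "Skeptical", "Skeptical", "Transient", "Repetitive"]
accuracy, precision, recall = evaluate(reference, predicted)
```

In this toy case one Opportunist consumer is placed in the Skeptical segment, which lowers the recall of the Opportunist segment and the precision of the Skeptical segment.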
Questionnaire Design
In terms of data gathering, the researchers utilized Google Forms to conduct
surveys over social media as their major strategy. The survey questions focused on the
demographic profile and behavioral factors of Shopee consumers. The questionnaire was
carefully designed to meet the requirements of the research, and the questions were
taken from previous literature on behavioral segmentation analysis. In that
literature, Huseynov and Yıldırım 2017 discovered five different types of consumer
segments, including opportunist consumers, transient consumers, need-based
shoppers, skeptical consumers, and repetitive consumers. The researchers’ tool for
measuring the responses is a Likert scale, which is commonly used for
questionnaires in survey research. After conducting the survey,
the researchers computed the answers using standard deviation. The questionnaire
consists of two main parts. The first part focuses on the demographic
profile of Shopee consumers. The second part covers the questions pertaining to the
behavioral factors of Shopee consumers.
Part I. Demographics
The first section of the questionnaire focused on demographic information.
This portion of the survey contained questions relating to the age and gender of
Shopee consumers.
Part II. Behavioral Factors
This section covered the behavioral factors: Shopping Rate, Price Payment,
and Product Diversity. Shopping Rate includes five questions. As mentioned above,
the questions were selected from previous literature, and some were
self-structured. Price Payment also has five questions. Lastly, Product Diversity
has ten questions and includes a comment box where respondents could add
other items they felt were missing.
Data Analysis
Figure 3. Dataset for Clustering
The dataset used for the study is shown in Figure 3. It contains 4,000 rows
and 5 columns. The demographic profile is covered by the age and gender columns,
and the behavioral factors by the shopping rate, price payment, and product
diversity columns.
Figure 4. Scaled Data
Figure 4 shows the scaled data. The researchers applied the standard scaler
method so that each column has zero mean and unit variance. Because k-means
relies on Euclidean distance, this keeps columns with larger numeric ranges from
dominating the clustering.
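The effect of standard scaling can be sketched on a few made-up rows. The values below are hypothetical and only stand in for behavioral columns; `StandardScaler` is the same scikit-learn class imported in Appendix 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical raw values: columns are shopping rate, price payment, product diversity.
raw = np.array([
    [2.0, 150.0, 1.0],
    [10.0, 900.0, 4.0],
    [5.0, 300.0, 2.0],
    [8.0, 650.0, 6.0],
])

scaler = StandardScaler()
scaled = scaler.fit_transform(raw)

# Each column now has mean 0 and (population) standard deviation 1.
print(scaled.mean(axis=0).round(6))
print(scaled.std(axis=0).round(6))
```

After scaling, the price payment column no longer outweighs the other two columns simply because its raw numbers are larger.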
Segmentation Process
Figure 5. Distribution of Age
Figure 5 shows the distribution of the consumers' ages, plotted in Python
from the encoded data. Out of 4,000 Shopee consumers, age 19 had the most
responses, with 723, while age 60 had the fewest, with only 1 response.
Consumers aged 18 to 25 made up around 61.73% of the respondents.
Behavioral factors are used in the segmentation process. These factors are
shopping rate, price payment, and product diversity. Each behavioral factor
forms a separate dimension of the segmentation analysis.
Figure 6. Distribution of Shopping Rate
The shopping rate data in Figure 6 represents the total number of products
purchased by each consumer. The shopping rate values were standardized by
computing the z-scores of the consumers' data. The researchers made a histogram
to visualize the number of consumers according to their shopping rate. The
majority of the consumers have a shopping rate of 1-10.
Figure 7. Distribution of Price Payment
In Figure 7, price payment data is the average cost of a consumer's online
purchases. The price payment values were standardized by computing the z-scores
of the consumers' data. The researchers made a histogram to visualize the number
of consumers according to their price payment. The majority of the consumers
have a price payment in the range of 11-15.
Figure 8. Distribution of Product Diversity
In Figure 8, product diversity data refers to how many different types of
products a consumer purchases. The product diversity values were standardized by
computing the z-scores of the consumers' data. The researchers made a histogram
to visualize the number of consumers according to their product diversity. The
majority of the consumers have a product diversity in the range of 1-10.
Figure 9. Silhouette Analysis
The plots in Figure 9 come from a silhouette analysis used to determine the best
value for n_clusters. The x-axis represents the data, and the y-axis represents
the silhouette score. For the given data, n_clusters values of 2 and 3 appear to
be suboptimal because some clusters have below-average silhouette scores.
Values of 4 and 5 appear to be the best candidates: each cluster has a higher
silhouette score than the average, and the fluctuation in cluster size is
comparable. The deciding factor is the thickness of the silhouette plot that
represents each cluster. The thickness is more uniform in the plot with
n_clusters = 4 (bottom left) than in the plot with n_clusters = 5 (bottom right),
where one cluster is significantly thicker than the others. As a result, 4
clusters is the best choice.
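A simpler numeric companion to the plot inspection described above is to compare the average silhouette score for each candidate number of clusters. The sketch below does this on synthetic data that was generated for illustration only; it is not the survey dataset, and the seed, centers, and spread are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: four well-separated groups in three dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [6, 6, 0], [0, 6, 6], [6, 0, 6]])
data = np.vstack([c + rng.normal(scale=0.5, size=(50, 3)) for c in centers])

scores = {}
for k in [2, 3, 4, 5]:
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(data)
    scores[k] = silhouette_score(data, labels)
    print(f"n_clusters = {k}: average silhouette score = {scores[k]:.3f}")

# The candidate with the highest average silhouette score.
best_k = max(scores, key=scores.get)
print("best n_clusters:", best_k)
```

The average score ignores the per-cluster thickness that the silhouette plots show, so it complements the visual check rather than replacing it.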
Figure 10. Clustered Data
Using the k-means algorithm, the researchers clustered the data as shown in
Figure 10. The four clusters correspond to the behavioral segments Opportunist,
Transient, Skeptical, and Repetitive. Cluster 0 is the Transient consumer,
cluster 1 the Skeptical consumer, cluster 2 the Opportunist consumer, and
cluster 3 the Repetitive consumer.
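Attaching segment names to cluster indices can be sketched with a plain dictionary lookup. The mapping is the one stated for Figure 10; the label values in the data frame are hypothetical and only stand in for `kmeans.labels_`.

```python
import pandas as pd

# Mapping of cluster index to segment name, as stated in the text for Figure 10.
segment_names = {0: "Transient", 1: "Skeptical", 2: "Opportunist", 3: "Repetitive"}

# Hypothetical cluster labels, standing in for kmeans.labels_.
clustered = pd.DataFrame({"Cluster": [3, 0, 3, 1, 2, 3]})
clustered["Segment"] = clustered["Cluster"].map(segment_names)
print(clustered)
```

Cluster indices produced by k-means are arbitrary and depend on the initialization, so a mapping like this applies only to the specific fitted model it was derived from.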
K-means Clustering Algorithm
Figure 11. Clustered Behavioral Factors
According to related studies about Behavioral Segmentation, a consumer is
Transient if product diversity is greater than price payment and price payment is
greater than shopping rate, Skeptical if product diversity is greater than shopping
rate and shopping rate is greater than price payment, Opportunist if price payment
is greater than shopping rate and shopping rate is greater than product diversity, and
Repetitive if shopping rate is greater than price payment and price payment is
greater than product diversity.
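The four ordering rules above can be written as a small function. This is a minimal sketch: the function name and the choice to return `None` for unmatched cases are assumptions, since the text does not say how ties or the two remaining orderings are handled.

```python
from typing import Optional


def label_segment(shopping_rate: float, price_payment: float,
                  product_diversity: float) -> Optional[str]:
    """Return the segment implied by the ordering rules, or None if no rule matches."""
    sr, pp, pd_ = shopping_rate, price_payment, product_diversity
    if pd_ > pp > sr:
        return "Transient"
    if pd_ > sr > pp:
        return "Skeptical"
    if pp > sr > pd_:
        return "Opportunist"
    if sr > pp > pd_:
        return "Repetitive"
    # Ties and the two remaining orderings are not covered by the stated rules.
    return None


# Product diversity > price payment > shopping rate
print(label_segment(0.2, 0.5, 0.9))
```

Labels produced this way are independent of the k-means output, so they can be set side by side with the cluster assignments.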
Figure 12. Behavioral Characteristics Model
Figure 12 shows the behavioral characteristics fed into the k-means cluster
visualization tool of the Orange application. C1 (Blue) represents the Transient
consumer, C2 (Red) represents the Skeptical consumer, C3 (Green) represents the
Opportunist consumer, and C4 (Orange) represents the Repetitive consumer.
Model Evaluation
Using the following formulas, the researchers were able to calculate the
accuracy, precision, and recall of the model using the confusion matrix.
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \]
Formula 1. Formula for Evaluation
Formula 1 is used to assess the performance of the k-means algorithm. In
the formulas, TP is the number of true positives, FP the number of false
positives, TN the number of true negatives, and FN the number of false
negatives. These formulas are used to evaluate the k-means algorithm and
determine the efficacy of the study.
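The three formulas translate directly into code. The sketch below is illustrative: the function name, the hypothetical counts, and the choice to return 0.0 when a denominator is zero are assumptions and not part of the study.

```python
def evaluation_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        # A zero denominator yields 0.0 here; this convention is an assumption.
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = evaluation_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```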
CHAPTER IV
RESULTS AND DISCUSSION
This chapter contains a detailed presentation and discussion of the results
of behavioral segmentation by implementing the k-means algorithm.
Figure 13. Confusion Matrix
Figure 13 shows the confusion matrix, with the x-axis representing data
prediction, while the y-axis represents the actual data. True positive (upper left),
true negative (lower right), false positive (upper right), and false negative (lower
left) are the classifications used to determine the algorithm's accuracy, precision,
and recall.
Computation for the Accuracy, Precision and Recall
In this situation, the value of TP is 972, the value of TN is 3028, the value
of FP is 0, and the value of FN is also 0, resulting in the following computation of
the percentage of accuracy, precision, and recall as shown below:
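Substituting the stated counts (TP = 972, TN = 3028, FP = 0, FN = 0) into Formula 1 gives the following worked computation, reconstructed here from the values in the text:

```latex
\begin{align*}
\text{Accuracy}  &= \frac{972 + 3028}{972 + 3028 + 0 + 0} = \frac{4000}{4000} = 100\% \\
\text{Precision} &= \frac{972}{972 + 0} = 100\% \\
\text{Recall}    &= \frac{972}{972 + 0} = 100\%
\end{align*}
```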
Based on the confusion matrix, the segmentation achieved an accuracy of
100 percent, a precision of 100 percent, and a recall of 100 percent. As a result, we
can conclude that the k-means algorithm is accurate and precise in its calculations.
Table 2. Confusion Matrix Result
Metric       Result
Accuracy     100 %
Precision    100 %
Recall       100 %
Figure 14. Behavioral Segmentation Analysis
The results of the behavioral segmentation analysis are shown in Figure 14.
The study was conducted with 4,000 Shopee consumers. Among them, 517
respondents are Transient consumers, 506 respondents are Skeptical consumers, 88
respondents are Opportunist consumers, and 2,889 respondents are Repetitive
consumers. Overall, 72.22% of the Shopee consumers are Repetitive.
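The segment shares follow from the reported counts. The sketch below recomputes them; the counts are the ones given in the text, while the variable names are illustrative.

```python
# Segment counts as reported for Figure 14.
segment_counts = {"Transient": 517, "Skeptical": 506, "Opportunist": 88, "Repetitive": 2889}

total = sum(segment_counts.values())
shares = {name: 100 * count / total for name, count in segment_counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.3f}%")
```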
Number of Demographic Profiles in Behaviors
Age and gender are the demographic profiles in this study. Table 3 shows the
age distribution within each behavior: 46 respondents are younger than 18, 2,463
respondents are ages 18-25, 1,013 respondents are ages 26-35, 295 respondents are
ages 36-45, and 63 respondents are ages 55 and above. Overall, 18 to 25-year-olds
had the greatest response rate, at 61.57 percent.
Table 3. Age in each Behavior
The number of females in each behavior is shown in Table 4. There are 284
transient females, 386 skeptical females, 54 opportunist females, and 1,377
repetitive females. Overall, repetitive consumers made up the largest share of
female respondents, at 65.54%.
Table 4. Females in each Behavior
The number of males in each behavior is shown in Table 5. There are 233
transient males, 120 skeptical males, 34 opportunist males, and 1,512 repetitive
males. Overall, repetitive consumers made up the largest share of male
respondents, at 79.62%.
Table 5. Males in each Behavior
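The percentages quoted for female and male respondents can be reproduced from the per-segment counts given in the text. The helper name below is hypothetical.

```python
# Counts per segment as reported in the text for Tables 4 and 5.
females = {"Transient": 284, "Skeptical": 386, "Opportunist": 54, "Repetitive": 1377}
males = {"Transient": 233, "Skeptical": 120, "Opportunist": 34, "Repetitive": 1512}


def repetitive_share(counts: dict) -> float:
    """Percentage of a group's respondents that fall in the Repetitive segment."""
    return 100 * counts["Repetitive"] / sum(counts.values())


print(f"Female respondents: {sum(females.values())}, "
      f"Repetitive share: {repetitive_share(females):.2f}%")
print(f"Male respondents: {sum(males.values())}, "
      f"Repetitive share: {repetitive_share(males):.2f}%")
```

Each percentage is taken within one gender: the denominator is the number of female or male respondents, not the full 4,000.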
CHAPTER V
SUMMARY, CONCLUSIONS AND RECOMMENDATIONS
In this final chapter, the researchers present the summary, conclusions, and
recommendations of the study.
Summary
In this study, behavioral segmentation analysis was carried out on 4,000
Shopee consumers using the k-means algorithm. Several behavioral factors were
extracted from the dataset and used to identify groups of consumers with similar
traits. The k-means algorithm was applied in the segmentation phase. The
segmentation shows that online consumers do not all behave in the same way.
Instead, they form a collection of distinct consumer groups, each with its own set
of online shopping habits. Online sellers can use the results of this study to
improve their sales by developing more effective marketing strategies for each
specific segment.
Conclusion
Based on the results of the study, the following conclusions were drawn:
1. We were able to define four behavioral segments by gathering data from
Shopee consumers based on their shopping rate, price payment, and product
diversity.
2. Through the use of the k-means clustering algorithm, we were able to identify
four distinct Shopee consumer behavioral segments: Transient, Skeptical,
Opportunist, and Repetitive consumers. Each identified segment was found
to have distinct features that set it apart from the others.
Transient consumers are coupon-prone consumers who make extensive use
of promotional offers. Attractive promotions and discount coupons can
increase the frequency with which these consumers shop online. They are
easily drawn away by competitors who offer equivalent or greater
incentives. Skeptical consumers are relatively new to the online store
and are hesitant to make a purchase. When shopping, these
consumers choose among various options at extremely low prices.
Coupon redemption rates for skeptical consumers are extremely
low, similar to those among need-based consumers. As a result, discount
coupon rewards may have no major impact on their online spending
rates. Opportunist consumers regularly visit and shop at online stores. They
like to use discount coupons and free shipping deals when buying things
online. The availability of free shipping and discount coupons can encourage
opportunist consumers to shop online more often. The high prevalence of
product refunds among opportunist consumers indicates weak
decision-making when selecting products online. Repetitive
consumers are primarily long-term consumers who visit the online store and
spend significantly more money there than anywhere else. This consumer
segment is extremely loyal to online retailers. In e-commerce,
consumer loyalty refers to a favorable attitude toward an online store that
leads to repeated purchasing behavior.
3. The k-means algorithm can identify consumer segments that share similarities
in different aspects. In the Transient, Skeptical, and Opportunist segments,
female consumers outnumber male consumers, while male consumers outnumber
female consumers in the Repetitive segment. Based on the number of
demographic profiles per behavior, the age group between 18 and 25 years
old had the highest response rate, at 61.57 percent. Among male respondents,
79.62 percent showed repetitive behavior, compared with 65.54 percent of
female respondents.
4. The segmentation achieved an accuracy of 100 percent, a precision of 100
percent, and a recall of 100 percent. We can conclude that the k-means
algorithm is accurate and precise in its calculations.
Recommendations
Based on the conclusion, the following are the recommendations.
1. Future researchers should collect more behavioral factors, such as coupon
redemption and refund rate.
2. Since we used the K-means algorithm, future researchers should utilize
different algorithms and behavioral features than those used in this study to
conduct behavioral segmentation analysis.
3. Future researchers should examine the changes in segment types and features
based on the e-commerce platform used, and the numerous behavioral factors
that influence them.
4. The collection of a large amount of data is preferable because it assists in the
assessment of the model.
REFERENCES
[1] Media One. 2021. “What is Shopee?” Retrieved October 28, 2021 from
https://mediaonemarketing.com.sg/shopee-review-expanding-ecommerce/
[2] Rahman et al. 2018. "Consumer buying behavior towards online shopping: An
empirical study on Dhaka city, Bangladesh" Retrieved October 28, 2021 from
econstor.eu/bitstream/10419/206108/1/23311975.2018.1514940.pdf
[3] Britannica. "Algorithm and Complexity" Retrieved October 28, 2021 from
https://www.britannica.com/science/computer-science/Algorithms-and-complexity
[4] Investopedia. Business-to-Consumer (B2C). Retrieved October 28, 2021 from
https://www.investopedia.com/terms/b/btoc.asp
[5] Nikki Jones. "What is Behavioral Segmentation?" Retrieved October 28, 2021
from https://www.yieldify.com/blog/behavioral-segmentation-definition-examples/
[6] Will Kelton. 2021. Business-to-Consumer (B2C). Retrieved October 28, 2021
from https://www.investopedia.com/terms/b/btoc.asp
[7] Sauravkaushik8 Kaushik. 2016. “An Introduction to Clustering and different
methods of clustering” Retrieved October 28, 2021 from
https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/
[8] Srilekha Mule. 2021. K-means Clustering. Retrieved October 28, 2021 from
https://www.linkedin.com/pulse/k-means-clustering-srilekha-mule
[9] Python. What is Python? Executive Summary. Retrieved October 28, 2021 from
https://www.python.org/doc/essays/blurb/
[10] Shopee. 2021. "Shopee Careers" Retrieved October 28, 2021 from
https://careers.shopee.ph/about
[11] UNCTAD. 2017. UNCTAD B2C E-commerce Index 2017, 30. Retrieved October
28, 2021 from http://unctad.org/en/PublicationsLibrary/tn_unctad_ict4d09_en.pdf
and http://unctad.org/en/PublicationsLibrary/tn_unctad_ict4d07_en.pdf
[12] Vicente, J. 2016. Special report: The state of e-commerce in the Philippines.
Retrieved October 26, 2021 from http://www.imadigitalmarketer.com/blog/stateecommerceph
[13] Segovia, O. W. 2016, October 31. Unfinished business: Why e-commerce in
the Philippines is falling behind. Retrieved October 28, 2021 from
https://medium.com/startupph/chronicles/unfinished-business-why-ecommerce-in-the-philippines-is-falling-behind-bc6087796bc3
[14] Babenko, V., Pasmor, M., Pankova, Ju., Sidorov, M.: The place and
perspectives of Ukraine in international integration space. Pr. and Persp. in Man.
15(1), 80–92 (2017). doi:10.21511/ppm.15(1).2017.08
[15] Statista Research Department. Topic: E-commerce in the Philippines.
Retrieved October 28, 2021, from https://www.statista.com/topics/6539/ecommerce-in-the-philippines/#dossierKeyfigures
[16] Vrender. (2016). "Importance of online shopping." Retrieved November 5,
2021, from
http://www.sooperarticles.com/shopping-articles/clothing-articles/importance-online-shopping-1495828.html
[17] Alexandra Panaitescu and Valentin Radu. 2021. Consumer behavior in
marketing - patterns, types, segmentation - Omniconvert. (November 2021).
Retrieved December 5, 2021 from https://www.omniconvert.com/blog/consumer-behavior-in-marketing-patterns-types-segmentation/
[18] Urueña, A., & Hidalgo, A. 2016. Successful loyalty in e-complaints: FsQCA
and structural equation modeling analyses. Journal of Business Research, 69(4),
1384–1389.
[19] Chou, S., & Hsu, C. 2016. Understanding online repurchase intention: Social
exchange theory and shopping habit. Information Systems and eBusiness
Management, 14(1), 19–45. doi:10.1007/s10257-015-0272-9
[20] Chechen, L., Pui-Lai, T., Yun-Chi, W., Palvia, P., & Kakhki, M. D. (2016).
The impact of presentation mode and product type on online impulse buying
decisions. Journal of Electronic Commerce Research, 17(2), 153–168
[21] Gary DeAsi. 2018. 10 Powerful Behavioral Segmentation Methods to
Understand Your Customers. Pointillist (2018). Retrieved November 10, 2021 from
https://www.pointillist.com/blog/behavioral-segmentation/
[22] Debaditya Barman and Nirmalya Chowdhury. 2019. A novel approach for the
customer segmentation using clustering through self-organizing maps. International
Journal of Business Analytics 6, 2 (2019), 23–45. DOI:
http://dx.doi.org/10.4018/ijban.2019040102
[23] Sara Dolnicar, Bettina Grün, and Friedrich Leisch. 2018. Market Segmentation
Analysis: Understanding It, Doing It, and Making It Useful (2018). Retrieved
November 8, 2021 from
https://books.google.com.ph/books?id=b1lDwAAQBAJ&printsec=frontcover&dq=customer%2Bsegmentation%2B2017&hl=en&sa=X&ved=2ahUKEwjTuZeq9Yj0AhUTw4sBHchkDywQ6AF6BAgIEAI#v=onepage&q=customer%20segmentation%202017&f=false
[24] R. Srivastava. 2016. Identification of customer clusters using RFM
model: a case of diverse purchaser classification, Int. J. Bus. Anal. Intell., 4 (2)
(2016), pp. 45-50
[25] Nikki Jones. 2017. Behavioral Segmentation Defined with 4 Real-Life
Examples. Yieldify (2017). Retrieved November 9, 2021 from
https://www.yieldify.com/blog/behavioral-segmentation-definition-examples/
[26] Kotler, P. & Keller, K.L. (2016). Marketing Management, 15th Edition,
Pearson Education,Inc.
[27] Fahed Yoseph, Nurul Hashimah Ahamed Hassain Malim, and Mohammad
AlMalaily. 2019. New behavioral segmentation methods to understand consumers
in the retail industry. (February 2019). Retrieved November 9, 2021 from
http://ischolar.info/index.php/IJCSIT/article/view/186146
[28] K. Bhade, V. Gulalkari, N. Harwani and S. N. Dhage, "A Systematic Approach
to Customer Segmentation and Buyer Targeting for Profit Maximization", 2018 9th
International Conference on Computing Communication and Networking
Technologies (ICCCNT), pp. 1-6, 2018, [online] Available:
https://ieeexplore.ieee.org/document/8494019.
[29] Mohammed Muzammil. July 27, 2021. "Understanding K-Means Clustering
With Customer Segmentation Use Case." Retrieved Nov 9, 2021 from
https://www.analyticsvidhya.com/blog/2021/07/understanding-k-means-clustering-using-customer-segmentation/
[30] Abhinav Sagar. 2019. Customer Segmentation Using K Means Clustering.
Retrieved Nov 9, 2021 from
https://towardsdatascience.com/customer-segmentation-using-k-means-clustering-d33964f238c3
[31] Andrea Trevino. 2016. Introduction to K-means Clustering. Retrieved
November 9, 2021 from
https://blogs.oracle.com/ai-and-datascience/post/introduction-to-k-means-clustering
[32] S. H. Shihab, S. Afroge and S. Z. Mishu, "RFM Based Market Segmentation
Approach Using Advanced K-means and Agglomerative Clustering: A
Comparative Study," 2019 International Conference on Electrical, Computer and
Communication Engineering (ECCE), 2019, pp. 1-4, doi:
10.1109/ECACE.2019.8679376.
[33] Blut, M., Frennea, C. M., Mittal, V., Mothersbaugh, D. L. 2016. How
procedural, financial and relational switching costs affect customer satisfaction,
repurchase intentions, and repurchase behavior: A meta-analysis. International
Journal of Research in Marketing, 32(2), 226-229.
[34] Iiro Panula. 2017. Building Trust in e-commerce - Theseus. (2017). Retrieved
November 6, 2021, from
https://www.theseus.fi/bitstream/handle/10024/133517/Thesis_Iiro_Panula3.pdf?sequence=1&isAllowed=y
[35] Vitalina Babenko, Zdzisław Kulczyk, Irina Perevosova, Olga Syniavska, and
Oksana Davydova. 2019. Factors of the development of international e-commerce
under the conditions of globalization. Retrieved November 8, 2021 from
https://www.shs-conferences.org/articles/shsconf/abs/2019/06/shsconf_m3e22019_04016/shsconf_m3e22019_04016.html
[36] C+R Researchers. 2017. TARGET MARKET (CONSUMER) SEGMENTATION. Retrieved
November 9, 2021 from
https://www.crresearch.com/methods-quantitative-market-research-segmentation
[37] Nilsson & Olsson, Customer Focus through Market Segmentation - The Case
of Volvo CE and the Recycling/Waste Management Segment (2016).
[38] Ş. Ozan and L. O. Iheme, "Artificial Neural Networks in Customer
Segmentation," 2019 27th Signal Processing and Communications Applications
Conference (SIU), 2019, pp. 1-4, doi: 10.1109/SIU.2019.8806558.
[39] E. Y. L. Nandapala and K. P. N. Jayasena, "The practical approach in
Customers segmentation by using the K-Means Algorithm," 2020 IEEE 15th
International Conference on Industrial and Information Systems (ICIIS), 2020, pp.
344-349, doi: 10.1109/ICIIS51140.2020.9342639.
[40] Farid Huseynov & Sevgi Özkan Yıldırım, 2017. "Behavioural
segmentation analysis of online consumer audience in Turkey by using real
e-commerce transaction data," International Journal of Economics and Business
Research, Inderscience Enterprises Ltd, vol. 14(1), pages 12-28.
https://ideas.repec.org/a/ids/ijecbr/v14y2017i1p12-28.html
[41] Liu, Y., Li, H., Peng, G., Lv, B., Zhang, C. (2015). Online purchaser
segmentation and promotion strategy selection: 233, 263-279.
[42] Ladhari, R., Gonthier, J., Lajante, M. 2019. Generation Y and online fashion
shopping: Orientations and profiles. Journal of Retailing and Consumer Services,
48, 113-121. Retrieved November 20, 2021 from
https://ideas.repec.org/a/eee/joreco/v48y2019icp113-121.html
[43] Huseynov, F., Yıldırım, S. O. 2017. Behavioural segmentation analysis of
online consumer audience in Turkey by using real e-commerce transaction data.
International Journal of Economics and Business Research, 14, 12-28.
[44] Huseynov, Farid, and Sevgi Özkan Yıldırım. “Online Consumer Typologies
and Their Shopping Behaviors in B2C E-Commerce Platforms.” SAGE Open, Apr.
2019, doi:10.1177/2158244019854639.
[45] H. Xu, S. Yao, Q. Li and Z. Ye, "An Improved K-means Clustering Algorithm,"
(IDAACS-SWS), 2020, pp. 1-5, doi: 10.1109/IDAACS-SWS50031.2020.9297060.
[46] S. Kapil, M. Chawla and M. D. Ansari, "On K-means data clustering algorithm
with genetic algorithm," 2016 (PDGC), 2016, pp. 202-206, doi:
10.1109/PDGC.2016.7913145.
[47] S. Na, L. Xumin and G. Yong, "Research on k-means Clustering Algorithm:
An Improved k-means Clustering Algorithm," 2016 Third International Symposium
on Intelligent Information Technology and Security Informatics, 2016, pp. 63-67,
doi: 10.1109/IITSI.2010.74.
[48] J. Qi, Y. Yu, L. Wang and J. Liu, "K*-Means: An Effective and Efficient
K-Means Clustering Algorithm," 2016 IEEE International Conferences on Big Data
and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom),
Sustainable Computing and Communications (SustainCom)
(BDCloud-SocialCom-SustainCom), 2016, pp. 242-249, doi:
10.1109/BDCloud-SocialCom-SustainCom.2016.46.
[49] P. Divya and K. Anusudha, "Segmentation of Defected Regions in Leaves
using K-Means and OTSU's Method," 2018 4th International Conference on
Electrical Energy Systems (ICEES), 2018, pp. 111-115, doi:
10.1109/ICEES.2018.8443282.
[50] X. Chen, P. Miao and Q. Bu, "Image Segmentation Algorithm Based on
Particle Swarm Optimization with K-means Optimization," 2019 IEEE
International Conference on Power, Intelligent Computing and Systems (ICPICS),
2019, pp. 156-159, doi: 10.1109/ICPICS47731.2019.8942442.
[51] Yi. 2018. Market Segmentation. Retrieved November 26, 2021 from
https://www.sciencedirect.com/topics/economics-econometrics-and-finance/market-segmentation
APPENDICES
Appendix 1
The Source Code on how to Apply the K-means Algorithm
# Import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
# Read the CSV file
behavior = pd.read_csv("CleanThesisData - ZScores 123456.csv")
behavior
# X refers to the same DataFrame as behavior (no copy is made)
X = behavior
y = behavior['Gender']
# Bar plot of the number of respondents per gender
genders = X.Gender.value_counts()
sns.set_style("darkgrid")
plt.figure(figsize=(10,4))
sns.barplot(x=genders.index, y=genders.values)
plt.show()
# Plot the distribution of the customers' age, shopping rate,
# price payment, and product diversity.
# sns.distplot is deprecated in recent seaborn releases, so
# sns.histplot with kde=True is used here in its place.
# Distribution of age
plt.figure(figsize=(10, 6))
sns.set(style = 'whitegrid')
sns.histplot(X['Age'], kde=True)
plt.title('Distribution of Age', fontsize = 20)
plt.xlabel('Range of Age')
plt.ylabel('Count')
# Distribution of ShoppingRate
plt.figure(figsize=(10, 6))
sns.set(style = 'whitegrid')
sns.histplot(X['ShoppingRate'], kde=True)
plt.title('Distribution of Shopping Rate', fontsize = 20)
plt.xlabel('Range of Shopping Rate')
plt.ylabel('Count')
# Distribution of PricePayment
plt.figure(figsize=(10, 6))
sns.set(style = 'whitegrid')
sns.histplot(X['PricePayment'], kde=True)
plt.title('Distribution of Price Payment', fontsize = 20)
plt.xlabel('Range of Price Payment')
plt.ylabel('Count')
# Distribution of ProductDiversity
plt.figure(figsize=(10, 6))
sns.set(style = 'whitegrid')
sns.histplot(X['ProductDiversity'], kde=True)
plt.title('Distribution of Product Diversity', fontsize = 20)
plt.xlabel('Range of Product Diversity')
plt.ylabel('Count')
# Encode the Gender column as integers
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X['Gender'] = le.fit_transform(X['Gender'])
y = le.transform(y)
# Standardize every column to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
scaler_features = scaler.transform(X)
scaled_data = pd.DataFrame(scaler_features)
scaled_data.rename(columns = {0:'Age', 1:'Gender', 2:'ShoppingRate',
                              3:'PricePayment', 4:'ProductDiversity'}, inplace = True)
scaled_data
# Elbow method: fit k-means for 1 to 9 clusters and record the inertia
from sklearn.cluster import KMeans
cs = []
for i in range(1, 10):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', max_iter = 200, n_init = 10,
                    random_state = 0)
    kmeans.fit(scaled_data)
    cs.append(kmeans.inertia_)
plt.plot(range(1, 10), cs)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('CS')
plt.show()
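# Calculate the silhouette score
# Note: at this point kmeans is the last model from the elbow loop
# (n_clusters = 9), fitted on scaled_data, so the score uses scaled_data.
from sklearn.metrics import silhouette_score
score = silhouette_score(scaled_data, kmeans.labels_, metric='euclidean')
# Print the score
print('Silhouette Score: %.3f' % score)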
# Silhouette analysis for 2, 3, 4, and 5 clusters
from yellowbrick.cluster import SilhouetteVisualizer
fig, ax = plt.subplots(2, 2, figsize=(15, 8))
for i in [2, 3, 4, 5]:
    # Create a KMeans instance for each number of clusters
    km = KMeans(n_clusters=i, init='k-means++', max_iter=100, n_init=10,
                random_state=42)
    # Map k = 2, 3, 4, 5 onto the 2 x 2 grid of subplots
    q, mod = divmod(i, 2)
    # Create a SilhouetteVisualizer with the KMeans instance and fit it
    visualizer = SilhouetteVisualizer(km, colors='yellowbrick', ax=ax[q-1][mod])
    visualizer.fit(X)
# K Means Clustering
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=4, random_state=42)
y = kmeans.fit_predict(behavior)
y
labels = kmeans.labels_
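# Compare the labels returned by fit_predict with kmeans.labels_.
# Both come from the same fitted model, so this is a consistency check
# in which every sample is expected to match.
correct_labels = sum(y == labels)
print("Result: %d out of %d samples were correctly labeled." % (correct_labels, y.size))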
kmeans.inertia_
centroids = kmeans.cluster_centers_
centroids
X = pd.DataFrame(X)
behavior['Cluster'] = kmeans.labels_
behavior
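The steps in this appendix (standardizing the features, comparing candidate cluster counts, and fitting the final model) can be condensed into one short, self-contained sketch. It is only an illustration under stated assumptions: the data are synthetic blobs generated with make_blobs rather than the survey responses, and the names centers, silhouette_by_k, best_k, and final_labels are hypothetical and do not appear in the thesis code. It selects the number of clusters by the highest silhouette score, which is one possible selection rule and not necessarily the one applied in this study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for five numeric features (assumption: NOT the thesis dataset).
centers = np.array([
    [0, 0, 0, 0, 0],
    [10, 10, 10, 10, 10],
    [-10, 10, -10, 10, -10],
    [10, -10, 10, -10, 10],
], dtype=float)
features, _ = make_blobs(n_samples=400, centers=centers,
                         cluster_std=1.0, random_state=0)

# Standardize so that every feature contributes on the same scale.
scaled = StandardScaler().fit_transform(features)


def silhouette_by_k(data, k_values, seed=42):
    """Fit K-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, init="k-means++", n_init=10,
                       random_state=seed)
        labels = model.fit_predict(data)
        scores[k] = silhouette_score(data, labels, metric="euclidean")
    return scores


scores = silhouette_by_k(scaled, [2, 3, 4, 5])

# Pick the candidate k with the highest mean silhouette score.
best_k = max(scores, key=scores.get)

# Refit with the selected k and keep the cluster label of every sample.
final_model = KMeans(n_clusters=best_k, init="k-means++", n_init=10,
                     random_state=42)
final_labels = final_model.fit_predict(scaled)

for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("selected k:", best_k)
```

Because the score is computed on the same scaled matrix that the model was fitted on, the labels and the data always refer to the same rows, which keeps the silhouette values comparable across candidate values of k.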
Appendix 2
Source Code of Confusion Matrix
#confusion matrix
from sklearn.metrics import confusion_matrix,classification_report
cm = confusion_matrix(behavior['Cluster'], kmeans.labels_)
print('Confusion Matrix:\n \n', cm, '\n \n Classification_Report: \n \n',
      classification_report(behavior['Cluster'], kmeans.labels_))
#confusion matrix visualization
import seaborn as sns
import matplotlib.pyplot as plt
f, ax=plt.subplots(figsize=(5,5))
sns.heatmap(cm,annot=True,linewidths=0.5,cmap =
'pink',linecolor="gray",fmt=".0f",ax=ax)
plt.xlabel("prediction")
plt.ylabel("actual")
plt.show()
Appendix 3
Steps in Orange Application for Visualization
Appendix 4
Questionnaire on Behavioral Segmentation Analysis of Online Consumers
The respondents, who are Shopee consumers, are requested to answer the
following questions:
CURRICULUM VITAE
LAYLO, FRANK B.
Sitio Maligaya, Banay-Banay 2nd, San Jose, Batangas
Cell No: (+63) 9552297187
Email Address: frank.laylo@g.batstate-u.edu.ph
OBJECTIVES
• To work in a responsible and challenging entry-level position that promotes personal
growth and utilizes my education and background.
• Computer literate in MS Office (Word, Excel, PowerPoint)
• Responsible, reliable, and able to work independently
• Excellent communication and interpersonal skills
• Works efficiently and productively
EDUCATIONAL ATTAINMENT
Batangas State University: August 2018 – Present
Pinagtongulan National High School: June 2016 – May 2018
Pinagtongulan National High School: June 2012 – May 2016
Pinagtongulan Elementary School: June 2006 – April
TRAININGS AND SEMINARS ATTENDED
• BITS Synergy Conference “Data Science and AI Congress” (October 2018)
• BITS Synergy Conference “Data Science and AI Congress” (March 2019)
• BITS Synergy Conference “Industry 4.0 Solution Toward Challenges” (October 2019)
• DevOps for Beginners, Ground Gurus (October 10, 2020)
• Introduction to Malware Threats (October 22, 2020)
I hereby certify that the above information is true and correct to the best of my knowledge and
belief.
FRANK B. LAYLO
Applicant
CYMBELLYN ATHENA C. REYES
241 Zamora St., Ibabao, Cuenca, Batangas
Cell No: (+63) 9450973875
Email Address: cymbellynathena.reyes@g.batstate-u.edu.ph
OBJECTIVES
To enhance my capabilities and ability to comply with the demands in the field of
Computer Science and gain additional knowledge and pointers as to how all the
information learned is applied in real time.
SKILLS AND ABILITIES
• Able to work under pressure
• Capable of accomplishing a task within the given time
• Computer Literate MS Office
• Good communication skills
EDUCATIONAL ATTAINMENT
Batangas State University, Lipa City: August 2018 – Present
Cuenca Institute: 2015 – 2018
Kalayaan Christian School: 2012 – 2016
Ibabao Elementary School: 2006 – 2012
TRAININGS AND SEMINARS ATTENDED
• BITS Synergy Conference “Data Science and AI Congress” (October 2018)
• BITS Synergy Conference “Data Science and AI Congress” (March 2019)
• BITS Synergy Conference “Industry 4.0 Solution Toward Challenges” (October 2019)
• DevOps for Beginners, Ground Gurus (October 10, 2020)
• Introduction to Malware Threats (October 22, 2020)
I hereby certify that the above information is true and correct to the best of my knowledge and
belief.
CYMBELLYN ATHENA C. REYES
Applicant
ROSALES, GAYLE MARIE L.
Castillo Padre Garcia, Batangas
Cell No: (+63) 9563927044
Email Address: gaylemarie.rosales@g.batstate-u.edu.ph
OBJECTIVES
Seeking a work environment that will challenge me further, promote quality products and service,
and provide me with the opportunity to meet and exceed assigned goals. To attain an internship
in which I can maximize my knowledge and skills in the field of Computer Science by utilizing my
skills in programming and enabling further personal and professional development.
HIGHLIGHTS OF QUALIFICATION
• Works efficiently and productively
• Manages multiple tasks and works well even under pressure
• Willing to be trained
• Very eager to learn new things and abilities.
EDUCATIONAL ATTAINMENT
Batangas State University, Lipa City, Batangas: August 2018 – Present
Holy Trinity School, Padre Garcia, Batangas: 2015 – 2018
Venancio Trinidad Sr. Memorial School: 2006 – 2012
TRAININGS AND SEMINARS ATTENDED
• BITS Synergy Conference “Data Science and AI Congress” (October 2018)
• BITS Synergy Conference “Data Science and AI Congress” (March 2019)
• BITS Synergy Conference “Industry 4.0 Solution Toward Challenges” (October 2019)
• DevOps for Beginners, Ground Gurus (October 10, 2020)
• Introduction to Malware Threats (October 22, 2020)
I hereby certify that the above information is true and correct to the best of my knowledge and
belief.
GAYLE MARIE L. ROSALES
Applicant
CERTIFICATE OF EDITING OF THESIS/DISSERTATION
This is to certify that this Thesis/Dissertation entitled “BEHAVIORAL
SEGMENTATION ANALYSIS OF ONLINE CONSUMERS IN SHOPEE BY
USING CLUSTERING IN K-MEANS ALGORITHM” of FRANK B. LAYLO,
CYMBELLYN ATHENA C. REYES, GAYLE MARIE L. ROSALES in partial
fulfillment of the requirements for the degree Bachelor of Science in Computer Science
has been reviewed and edited by the undersigned based on the minutes of the Final Defense.
It now follows the standard format of the University and conventions of research
writing.
MARVIN DOMINIC B. BUENA, Ph.D.(cand.), MA, LPT
Signature over Printed Name
Grammarian/ Editor
Date Signed: May 24, 2022