Republic of the Philippines
BATANGAS STATE UNIVERSITY
The National Engineering University
Lipa Campus
Marawoy, Lipa City
COLLEGE OF INFORMATICS AND COMPUTING SCIENCES

BEHAVIORAL SEGMENTATION ANALYSIS OF ONLINE CONSUMERS IN SHOPEE BY USING CLUSTERING IN K-MEANS ALGORITHM

A Thesis Presented to the Faculty of the College of Informatics and Computing Sciences, Batangas State University, The National Engineering University, Lipa Campus, Marawoy, Lipa City

In Partial Fulfillment of the Requirements for the Degree Bachelor of Science in Computer Science

LAYLO, FRANK B.
REYES, CYMBELLYN ATHENA C.
ROSALES, GAYLE MARIE L.

MR. JAYSON A. BALAYANTOC
Adviser

May 2022

APPROVAL SHEET

This undergraduate thesis entitled “BEHAVIORAL SEGMENTATION ANALYSIS OF ONLINE CONSUMERS IN SHOPEE BY USING CLUSTERING IN K-MEANS ALGORITHM” prepared by Frank B. Laylo, Cymbellyn Athena C. Reyes, and Gayle Marie L. Rosales in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science has been examined and recommended for oral examination.

MR. JAYSON A. BALAYANTOC
Adviser

PANEL OF EXAMINERS

Approved by the committee on Oral Examination with a grade of PASSED.

FRANCIS G. BALAZON, DIT
Chairperson

RICHELLE M. SULIT, MSCS
Member

DIONECES O. ALIMOREN, MSCS
Member

Approved in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science.

FRANCIS G. BALAZON, DIT
Dean, College of Informatics and Computing Sciences
May 2022

ACKNOWLEDGEMENT

We would like to offer our deepest gratitude to everyone who assisted us in completing this study. We would like to thank Dr. Francis G.
Balazon, for his patience and understanding; to Mr. Dioneces O. Alimoren, for the inspiration, encouragement, and support during the dissertation process; to our thesis adviser, Mr. Jayson A. Balayantoc, for helping us better understand this research; to our parents, who have always been there for us and given us their full support; to all of the respondents who took the time to complete our survey; and lastly, to all the panelists, for their hard work and for being a great motivator.

The Researchers

DEDICATION

This study is wholeheartedly dedicated to our beloved parents and guardians, who have inspired and strengthened us throughout this research; to our classmates and friends, who helped us accomplish our study with their words of advice and encouragement; to our Dean of the College of Informatics and Computing Sciences, Dr. Francis G. Balazon, and to our Program Chairperson, Mr. Dioneces O. Alimoren; to our professors and to Batangas State University; to our adviser, Mr. Jayson A. Balayantoc, for his tremendous effort and patience in guiding us throughout the study; and to Almighty God, for His guidance, strength of mind, and protection, and for giving us a healthy life to be able to carry out this research. All of these, we offer to you.
The Researchers

TABLE OF CONTENTS

TITLE PAGE
APPROVAL SHEET
ACKNOWLEDGEMENT
DEDICATION
LIST OF TABLES
LIST OF FORMULAS
LIST OF FIGURES
ABSTRACT

CHAPTER I
Introduction
Background of the Study
Statement of the Problem
Objectives of the Study
Significance of the Study
Scope and Limitations of the Study
Definition of Terms

CHAPTER II
Review of Related Literature
E-commerce
Consumer Behavior
Consumer Segmentation
Behavioral Segmentation
K-means Algorithm
Review of Related Studies
Synthesis
Conceptual Framework
Theoretical Framework

CHAPTER III
Project Concepts
Analysis and Design
Required Packages
Model Requirements
Functional Requirements
Non-Functional Requirements
Questionnaire Design
Data Analysis
Segmentation Process
K-means Clustering Algorithm
Model Evaluation

CHAPTER IV
Computation for the Accuracy, Precision & Recall
Number of Demographic Profiles in each Behaviors

CHAPTER V
Summary
Conclusion
Recommendations

REFERENCES
APPENDICES

LIST OF TABLES

Table No.
1 Software Requirements for Development
2 Confusion Matrix Result
3 Age in each Behavior
4 Females in each Behavior
5 Males in each Behavior

LIST OF FORMULAS

Formula No.
1 Formula for Evaluation

LIST OF FIGURES

Figure No.
1 Conceptual Framework
2 Methodology of the Study
3 Dataset for Clustering
4 Scaled Data
5 Distribution of Age
6 Distribution of Shopping Rate
7 Distribution of Price Payment
8 Distribution of Product Diversity
9 Silhouette Analysis
10 Clustered Data
11 Clustered Behavioral Factors
12 Behavioral Characteristics Model
13 Confusion Matrix
14 Behavioral Segmentation Analysis

ABSTRACT

TITLE: BEHAVIORAL SEGMENTATION ANALYSIS OF ONLINE CONSUMERS IN SHOPEE BY USING K-MEANS ALGORITHM
AUTHORS: Laylo, Frank B.; Reyes, Cymbellyn Athena C.; Rosales, Gayle Marie L.
INSTITUTION: Batangas State University Lipa Campus
ADDRESS: Marawoy, Lipa City
DEGREE: Bachelor of Science in Computer Science
YEAR: 2021 - 2022
ADVISER: Mr. Jayson A. Balayantoc

E-commerce represents a major shift in today's world of globalization. During the last decade, the majority of business organizations have kept up with technological progress and innovation. Consumer segmentation is a marketing method that divides consumers into groups based on shared qualities, needs, and interests. This study aims to determine the different behaviors of Shopee consumers. Behavioral segmentation analysis is a marketing strategy in which consumers are grouped based on certain behavioral factors such as Shopping Rate, Price Payment, and Product Diversity.
A total of 4,000 Shopee customers who had made purchases from the business were selected at random to participate in the survey as respondents. This study discovered four types of online consumer segments: opportunist consumers, transient consumers, need-based consumers, and repetitive consumers. The behavioral characteristics of each segment are discussed in depth, and Shopee, along with other selling companies, will be able to design strategies once they gain an understanding of each segment. The behavioral segmentation method discussed in this paper is based on the k-means clustering algorithm. The visualization was performed using the Orange application, while the k-means algorithm was implemented in the Jupyter Notebook Python environment.

CHAPTER I

This chapter includes the introduction, background of the study, statement of the problem, objectives of the study, significance of the study, scope and limitations of the study, and definition of terms.

Introduction

Since the arrival of the Internet in the Philippines in 1994, businesses have been able to sell their products online and make sales transactions via email. Although e-commerce benefits businesses, questions have been raised concerning its impact on economic growth and, in particular, productivity growth. Previous technological revolutions improved living standards over time, meeting one of the primary objectives of development.

Shopee, a social-first, mobile-centric marketplace where users can explore, shop, and sell goods and services, started in Singapore in February 2015. It is a leading mobile e-commerce platform in Southeast Asia that started as a C2C platform before turning into a B2C marketplace serving consumers across the region.
Payoneer and Shopee have teamed up to provide sellers with simple and affordable payment options. The app-based platform created a website to compete with other e-commerce websites in the region. Shopee's "Shopee Guarantee" escrow service withholds payment from merchants until consumers receive their purchases. [1]

Due to the rapid growth of technology, organizations have moved from the traditional way of selling items to the electronic way. Business groups rely on the internet to transact business. Online buying allows analytical consumers to find a product after a comprehensive search. Online shopping simplifies and improves the lives of consumers. It saves time and money by allowing them to pay for their purchases without lining up at cash registers. Online shoppers can also track their orders and shipments. Because online stores have lower maintenance and real estate requirements, businesses can sell items online at attractive prices.

Although the internet is a convenient way to shop, some people use it only in certain situations. They use the internet to research products before buying them in stores. Some worry about becoming addicted to online shopping. The drawbacks of online shopping are the following: the lack of touch-feel-try raises questions about the quality of the goods on offer; a consumer must buy a product without seeing it in person; and online payments are not sufficiently secure. These drawbacks will not impede the expansion of online purchasing; in fact, internet shopping helped firms recover from the recession.
To make online buying productive, merchants should pay attention to stumbling blocks and provide a safe payment mechanism.

For a consumer to make a purchasing decision, many factors must be considered. These factors could include friends, social structure, ethical or personal factors, economic factors, technological factors, cultural values, and recommendations. Online consumer behavior describes how consumers decide whether or not to buy from an online store. Although each consumer’s needs are different, the new expectations driving online consumer behavior are general. Product availability, transparency of delivery, low shipping costs, and, more recently, a smooth buying journey all influence whether consumers buy online and whether they become regular customers.

The focus of this study is to use the k-means algorithm to analyze online consumer behavioral segmentation in order to help sellers understand their consumers' diverse habits. The k-means algorithm divides data points into discrete, non-overlapping categories. One of the most prominent applications of k-means clustering is the segmentation of consumers’ behavior to understand them better and increase revenue.

Background of the Study

Online shopping is the act of purchasing goods or services directly from a seller over the Internet using e-commerce. In recent years, the internet-based or "Click and Order" business model has replaced the traditional brick-and-mortar business strategy. More people than ever before are turning to the internet to purchase a wide range of products, from houses to shoes to plane tickets and everything in between.
[2]

Individuals now have a greater number of options when selecting products and services through an online platform. Consumers have adopted online shopping as a preferred method of shopping. This new shopping technology not only provides a large quantity and variety of products to potential consumers, but it also provides a wide range of business opportunities and a large market. Despite the numerous benefits, some consumers may consider online shopping to be hazardous and untrustworthy. There is no face-to-face interaction between the seller and the consumer, which makes it difficult to socialize, and the consumer may be unable to establish trust in the seller.

In order to improve the online shopping rate, especially in countries where it is low, it is important to carry out a careful examination of consumer shopping behavior. Understanding consumer shopping behavior is a very important step in developing successful marketing strategies. Consumers' behavior toward sustainable production and consumption is also an important issue today. The core concept of this research is to analyze general consumer purchasing behavior, as well as consumers’ attitudes toward buying products online, using the k-means algorithm. Since physical stores are no longer the only way to achieve retail success, an increasing number of businesses now provide online shop interfaces for consumers.

Statement of the Problem

The research examines online consumer behavior by gathering demographic and behavioral data such as shopping rate, price payment, and product diversity. In addition, the study intends to employ the k-means algorithm to cluster the demographic and behavioral characteristics of Shopee online shoppers.
Personalizing services to consumers and improving consumer service can increase consumer loyalty and allow a seller to approach a client or prospect precisely, based on their individual needs and preferences.

Objectives of the Study

In response to the issues mentioned, this study is conducted to achieve the following objectives:

1. To collect Shopee consumer data based on:
   1.1 Demographics
   1.2 Shopping Rate
   1.3 Price Payment
   1.4 Product Diversity
2. To process the collected data using the K-means clustering algorithm.
3. To develop a model for consumer segmentation using the K-means algorithm.
4. To evaluate the performance of the model using:
   4.1 Accuracy
   4.2 Precision
   4.3 Recall

Significance of the Study

This study aimed to use the k-means algorithm to segment consumer behavior in making product purchase decisions. The outcome of this study would be beneficial to sellers, consumers, future researchers, and the university.

To the seller, this study will help them understand consumers’ needs and aspirations, offer something unique based on consumers' behavior when purchasing a product or service, and optimize product advertising for better relevance by segmenting consumers based on their behavior.

To the consumer, this study will aid in the production and provision of the goods and services they demand.

To future researchers, this study will assist them in applying k-means for consumer segmentation and guide their future articles on consumer segmentation using the k-means algorithm.

To the University, this study will benefit the university library by providing additional information to students.
Scope and Limitations of the Study

Shopee consumer behaviors are the primary emphasis of this study. The respondents of the study were 4,000 Shopee consumers who had purchased products from the company. To gather the data, the researchers used Google Forms to determine Shopee consumer behaviors based on demographics, shopping rate, price payment, and product diversity. After collecting the data, the researchers will use the k-means algorithm to identify distinct consumer behaviors, such as opportunist, transient, skeptical, and repetitive consumers.

Definition of Terms

In order to provide a better understanding of the terminology used in the study, the following terms are conceptually and operationally defined.

Algorithm - A method for resolving a well-defined computational problem. Developing one requires knowledge of the various options for solving a computational problem, including the hardware, networking, programming language, and performance constraints that accompany any particular solution. [3]

B2C - Business to Customer refers to the process of selling products and services directly between a business and the consumers who are the end users of its products or services. It is also known as direct marketing. [4]

Behavioral Segmentation - The process of sorting and grouping clients based on their behavior. [5] These behaviors include the products and content consumers consume, as well as the frequency with which they interact with an app, website, or business.

C2C - Customer to Customer (C2C) is a business strategy that allows consumers to trade with one another, often over the internet.
C2C firms are a type of business model that arose as a result of the sharing economy and e-commerce technology. [6]

Clustering - Involves grouping a population or set of data points so that the points in a group are more similar to each other than to those in other groups. In short, the goal is to sort similar items into clusters. [7]

Customer Segmentation - The process of dividing consumers into groups based on common characteristics so companies can market to each group effectively and appropriately.

K-means Algorithm - An iterative algorithm that tries to partition the dataset into K pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group. [8]

Python - An object-oriented, high-level programming language with dynamic semantics. Its high-level data structures, dynamic typing, and dynamic binding make it well suited for Rapid Application Development and for use as a scripting or glue language for existing components. [9]

Shopee - The leading e-commerce platform in Southeast Asia and Taiwan. Shopee offers consumers a simple, secure, fast, and fun online shopping experience that tens of millions of people use every day. [10]

CHAPTER II

This chapter discusses the relevant literature and studies that the researchers used to support the importance of the current study. It also gives the theoretical background and synthesis needed to gain a better understanding of the research.

Review of Related Literature

This section presents an evaluation of the existing literature on our topic and discusses the relevant information and findings it contains.
E-commerce

The United Nations Conference on Trade and Development (UNCTAD, 2017) has emphasized the significance of e-commerce, particularly online shopping, for developing countries in recent years. For over a decade, the UNCTAD e-commerce and law reform program (2020) has assisted developing countries in Africa, Asia, and Latin America in their efforts to establish legal regimes that address the issues raised by the electronic nature of information and communications technologies (ICTs), such as ensuring trust in online transactions, facilitating the conduct of domestic and international trade online, and providing legal protection for users and providers of e-commerce and electronic government services. [11]

According to Vicente (2016), e-commerce was a rising trend in the Philippines due to the widespread adoption of mobile technologies, particularly among younger consumers. Despite its growing popularity and acceptance throughout the country, it was not exempt from the problems that the country was experiencing. [12] Segovia (2016) claims that e-commerce is falling behind due to the country's rapid growth and development. This was attributed to a lack of infrastructure, including internet connectivity, electronic payments, a legal framework, and logistical support. It was also stated that there are obstacles and challenges to developing e-commerce in the Philippines due to the difficulties associated with accessing the internet. [13]

According to Babenko et al. (2017), companies that conduct business outside of their home country may be more interested in reducing operational expenses through the use of information technology. With rich information flows to simplify and optimize the movement of physical commodities in the supply chain, it is possible to save both time and money on product delivery.
This is made possible through the use of the Internet for transactions and coordination. It is usually thought that the implementation of e-commerce is a worldwide process overseen by a common set of actors. [14]

Lazada, Shopee, and Zalora are examples of e-commerce platforms that have long dominated the online shopping business in Southeast Asia. These platforms sell goods from their own fulfillment centers while also allowing third-party sellers to sell through their platforms. Because of the growing number of online users, retailers have turned to the digital medium in order to reach their target consumers more readily. According to the Statista Research Department (2021), these platforms offer tempting sales bargains such as free shipping, substantial discounts on a variety of items, and payment methods like cash-on-delivery. [15]

Consumer Behavior

Consumer behavior is something that needs to be better understood in order for its growth to succeed. According to Vrender (2016), consumer behavior research can be used to develop a general model of purchasing behavior that shows the stages consumers go through while making a purchase decision. [16]

Panaitescu (2021) further explained that consumer behavior can be divided into four categories: habitual purchasing behavior, variety-seeking behavior, dissonance-reducing purchasing behavior, and complex purchasing behavior. Consumer behavior types are determined by the sort of goods a consumer requires, the level of engagement, and the differences between brands. When buyers purchase a high-priced item, they exhibit complex purchasing behavior.
Consumers are heavily involved in the buying choice in this uncommon transaction and will conduct extensive research before making an investment decision. When a consumer engages in variety-seeking purchase behavior, he or she is not highly involved in the purchasing process, yet there are still differences among the products offered by different brands. Consumer engagement is quite high in dissonance-reducing purchasing behavior. This may be due to the high cost and infrequent purchase. Furthermore, there are few options, with little distinction across brands. In this case, a buyer purchases readily available goods. When a consumer has little involvement in a purchasing choice, this is described as habitual buying behavior. In this situation, the consumer notices only a few notable differences across brands. [17]

In the event of service failures, consumers who are influenced by negative emotions have higher switching intentions, especially when the failures stem from factors that can be controlled and avoided. However, Urea and Hidalgo (2016) say that if consumers are handled appropriately after a failure, distributive justice methods such as monetary compensation might increase positive feelings while decreasing negative ones. Furthermore, negative emotions have an impact on repurchase intent. Procedural justice, such as receiving fair treatment from the company, also increases consumer satisfaction, particularly when a consumer complains and the company attempts service recovery after the complaint has been filed. [18]

Chou and Hsu (2016) asserted that individuals' intentions to continue purchasing online are influenced by their level of satisfaction with and perceived usefulness of the website they are currently using.
In contrast, shopping habits increase the influence of emotional assessment on continuation intention but decrease the influence of rational evaluation on continuation intention. [19]

There is a direct correlation between the level of involvement with an online business and the level of enjoyment experienced. Emotional pleasure increases the likelihood of purchase during online interactions, and consumers' previous emotional experiences may have an impact on their purchasing decisions. According to Chechen et al. (2016), the use of human brands, enhanced impressions of human connection, and the building of emotional bonds could give businesses an edge over their competitors in these circumstances. [20]

Consumer Segmentation

Consumer segmentation has always been important in business. According to a recent Forrester survey, only 33% of organizations implementing consumer segmentation believe that it significantly impacts their business. Specifically, the report states that one of the primary reasons organizations fail is that they continue to rely on traditional consumer segmentation methods rather than embracing the vast amount of consumer data and advanced analytics approaches that are now available. [21]

According to Barman et al. (2019), consumer segmentation is significant when a firm aims to reach a certain market area. It assists marketing teams in effectively managing their budgets to achieve the best possible results in terms of product sales. Once consumer segmentation has been properly implemented, marketers can create separate content for each market category they wish to target. Marketers will benefit from this method of communication as they work to build relationships with their target audiences.
[22]

According to Dolnicar et al. (2018), a single client attribute, such as age, gender, country of origin, or stage in the family life cycle, could be used as a segmentation criterion. However, the criterion could also cover a broader range of consumer characteristics, such as the number of perks desired when purchasing a product, the number of activities undertaken while on vacation, environmental values, or spending habits. [23]

By drawing on a range of distinct customer attributes, consumer segmentation can help businesses personalize marketing plans, analyze trends, plan product development and advertising campaigns, and offer relevant products. Srivastava (2016) also added that consumer segmentation personalizes individual communications in order to communicate more effectively with the desired groups. Location, age, gender, income, lifestyle, and past purchasing history are the most typical factors used in consumer segmentation. [24]

Behavioral Segmentation

The simplest method for understanding how a product provides value or overcomes obstacles is to categorize consumers based on their actions. According to Jones (2017), behavioral segmentation refers to a marketing process that divides consumers into segments depending on their behavior patterns when interacting with a particular business or website. [25]

In line with what Jones asserted, Kotler and Keller (2016) stated that people can be divided into segments based on whether they have an enthusiastic or positive attitude toward a product, or whether they are indifferent, negative, or hostile toward it. By considering consumers' attitudes toward its brand or product, a company gains a broad view of the market and its segments.
[26]

According to Fahed Yoseph (2019), behavioral segmentation is considered one of the essential principles in modern marketing. Traditional consumer segmentation models necessitate months of analytical work, resulting in discrete consumer insights that are out of date when compared to the dynamic body of consumers they are meant to reflect. Personalization and the overall consumer experience are factors in the retail industry’s success or failure. [27]

K-means Algorithm

According to Bhade et al. (2018), k-means clustering is the most appropriate method for clustering the dataset. Consumer loyalty and attention span are two important issues affecting today’s business sector, and both are difficult to maintain. With the help of the k-means algorithm, it is simple to determine which consumers are the most profitable based on the clusters that have been identified. [28]

Segmentation, as defined by Mohammed Muzammil (2021), is the process of grouping similar elements into groups or clusters. It can be described as the process of categorizing data into groups that contain data points or elements that are comparable to one another. The k-means algorithm groups similar data points together, so that all of the data points in a group share common characteristics but are distinct from the data points in other groups. [29]

Furthermore, according to Abhinav Sagar (2019), the purpose of k-means is to divide data points into subgroups that are distinct from one another and do not overlap.
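The non-overlapping partition described above can be made concrete with a short sketch of the standard k-means iteration (often called Lloyd's algorithm): pick K starting centers, assign every point to its nearest center, move each center to the mean of its assigned points, and repeat until the assignments stop changing. This is an illustrative sketch only and is not the implementation used in this study. The function name `kmeans`, its parameters, and the six sample rows are hypothetical; the rows are assumed to be already scaled values of shopping rate, price payment, and product diversity.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Standard (Lloyd's) k-means on a list of equal-length numeric tuples.

    Returns (centers, labels); labels[i] is the index of the single
    cluster that points[i] belongs to.
    """
    rng = random.Random(seed)
    # Start from k distinct data points picked at random as initial centers.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


# Hypothetical, already-scaled rows: (shopping rate, price payment, product diversity).
data = [
    (0.10, 0.20, 0.10), (0.15, 0.25, 0.05), (0.20, 0.10, 0.15),
    (0.90, 0.80, 0.95), (0.85, 0.90, 0.80), (0.95, 0.85, 0.90),
]
centers, labels = kmeans(data, k=2)
print(labels)
```

With these six rows and K = 2, the first three rows end up sharing one label and the last three share the other. Which group receives label 0 and which receives label 1 depends on the random starting centers.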
K-means clustering is commonly employed in customer segmentation to acquire a deeper understanding of customers, which may then be used to increase a company's revenue. [30]

In addition, Trevino (2016) stated that the k-means clustering algorithm is used to identify groupings in data that have not been explicitly classified by the researcher. This can be used to confirm business assumptions about what types of groups exist or to identify unknown groups in complex data sets. Once the algorithm has been run and the groups have been defined, any new data may be quickly and easily assigned to the appropriate group. The k-means clustering algorithm produces a final result by iteratively refining the initial results. The algorithm inputs are the number of clusters K and the data set, where the data set is a collection of features for each data point. [31]

Many clustering algorithms have been developed to classify consumers, according to Shihab et al. (2019), with the aim of achieving superior clustering results. K-means clustering is a well-known data mining technique; it is an unsupervised, iterative, partitioning learning technique. It addresses a wide range of clustering problems, particularly on large datasets, and is a very powerful tool. The algorithm is divided into two sections. In the first section, K centers are chosen at random; K is fixed at the start. In the second section, each data object is assigned to the nearest center. A self-organizing map can be used to select the starting value of K. [32]

Review of Related Studies

This section summarizes studies related to our topic.
The review provides context for the findings and reveals research gaps or weaknesses. According to Blut (2016), individuals or firms participating in e-commerce, whether buyers or sellers, rely on Internet-based technology to conduct their transactions successfully. Due to the huge power of e-commerce, transactions may take place at any time and from any location, and geophysical boundaries can be removed. [33]

According to a study conducted by Panula in 2017, a company's website and its product and business information must be clear and easily accessible to consumers in order to increase their trust in the brand and increase sales. The availability of electronic commerce to online consumers, faster delivery, and rapid response to feedback requests were all used to develop trustworthiness among consumers. [34]

According to Babenko et al. (2019), e-commerce technology has benefited businesses. It has permitted businesses and corporations to sell their products and services worldwide and conveniently, enabling consumers to purchase from any location at times that suit their schedules. Since the beginning of e-commerce, there have been no constraints on the search for innovative technologies to meet the existing scenario, a search that both specialists and companies have pursued. This suggests that e-commerce, as we know it today, will be significantly different in five years. Electronic commerce will witness considerable development and technological advancement as it becomes more prevalent in business globally, in both developed and developing countries. [35]

The work titled "Target Market (Customer) Segmentation" claims that every salesperson and marketer understands that their products and services cannot be supplied to everyone.
To effectively sell your product or service, you must first identify the exact kind of consumers that will benefit the most from your offering. Segmentation research may significantly help your company identify how people differ in their perspectives, desires, and motivations, allowing you to build a strong brand portfolio and adapt marketing messages to different groups of people. As mentioned by the researchers, segmentation analysis for each product category must be purpose-built and unique, and design and analysis require an investigative and iterative approach. [36]

After the market has been segmented into consumer groups with comparable characteristics, the organization should establish strategies for each category, argued Nilsson et al. (2016). These strategies should be designed based on how value is generated in the various segments and should therefore be tailored to each group of clients. As a result, the products become segment-specific to fulfill each segment's needs. Market segmentation is thus a useful strategy for a firm seeking to become more customer-focused, and it may be required for a supplier firm operating in an industrial sector that wishes to become more client-oriented. A segmentation algorithm may lead to a better knowledge of consumers and raise awareness of the importance of meeting customer demands in all operations within the organization, particularly marketing and sales activities. This discussion indicates that efficient segmentation requires a significant amount of work and expertise from the whole organization. [37]

Since it is directly related to a company's consumer satisfaction, consumer segmentation is an important method in literature and software related to consumer relationship management (CRM). Ozan and L. O.
Iheme conducted a study in 2019 titled "Artificial Neural Networks in Customer Segmentation." They used one of the most advanced machine learning algorithms available, the MLP (Multilayer Perceptron), a feed-forward neural network structure optimized by backpropagation. The findings demonstrate that even with a small amount of training data, it is possible to construct a generalizing model capable of reproducing the understanding that supports the company's customer classification methodology. When this model is integrated into the workflow, it can check clients regularly, automatically determining whether or not to promote a customer and notifying supervisors. [38]

With the help of an accurate clustering procedure, businesses can effectively identify consumers and establish their attributes, according to "The Practical Approach in Customers Segmentation by Using the K-Means Algorithm," a thesis by E. Y. L. Nandapala and K. P. N. Jayasena (2020). Because a successful clustering process identifies consumers accurately, businesses will be better able to make accurate decisions, supply new products and services, and adjust existing products and services in response to consumer demand. When applied to a dataset in any industry, the method given in that study may serve as an accurate segmentation methodology. [39]

According to the findings of a study conducted by Farid et al. in 2017, consumer segmentation is a marketing strategy that involves first dividing customers into groups based on their underlying characteristics, needs, and interests, and then designing and implementing strategies to target those groups.
A prominent form of segmentation is behavioral segmentation analysis, in which consumers are categorized according to particular behavioral traits such as decision-making, spending, and usage. This study conducted a behavioral segmentation analysis using real e-commerce transaction records from 10,000 online customers and discovered five consumer segments: opportunist customers, transient customers, need-based shoppers, skeptical newcomers, and repeat purchasers.

Opportunists take advantage of opportunities as soon as they come up, never paying full price for a product. They always try to find a deal on what they want to buy and never buy anything that is not on sale. Transient consumers are customers who are just passing through a dealer's area of sales responsibility for a short period. Need-based shoppers only purchase goods when they have an immediate need for them. When they walk into a store, they already have in mind the section they want to go to. They typically do not need the assistance of a salesperson to select a product because they are well-informed about the item they intend to buy. Skeptical newcomers are consumers who do not need anything specific and are attracted by the store's atmosphere. These consumers usually like to talk to people and ask questions about random products, but they do not want to buy them. The most common example of this kind of consumer is a group of college students who go to malls to kill time, going into any store and asking about random items there. Repetitive consumers repeatedly buy from a company and are extremely valuable to it. It is important to meet the expectations of this particular consumer.
They not only remain dedicated to the brand, but they also praise and recommend it to their circle of friends and relatives. The study discussed the behavioral features of each segment in detail and made recommendations on how to approach each segment to increase its online buying rate. Identifying the behavioral features of each segment will enable selling organizations to build marketing tactics that are appropriate for each segment's needs. [40]

The findings of a behavioral segmentation study conducted by Yu. Liu et al. (2015), which was based on real e-commerce transaction records of Chinese online consumers, revealed six different types of online consumers: economic purchasers, active-star purchasers, direct purchasers, high-loyalty purchasers, risk-averse purchasers, and credibility-first purchasers. [41]

A psychographic and behavioral segmentation analysis of Generation Y female online shoppers by Ladhari et al. (2019) identified four different approaches to online shopping: trend shopping, pleasure shopping, price shopping, and brand shopping. Six shopping profiles were also identified, each with different objectives: price shoppers, discovery shoppers, emotional shoppers, strategic shoppers, fashionistas, and shopping fans. [42]

Apart from that, Huseynov and Yıldırım (2017) conducted a behavioral segmentation analysis using real e-commerce transaction data from Turkish online consumers and discovered five different types of consumer segments: opportunist consumers, transient clients, need-based shoppers, skeptical newcomers to the market, and repeat purchasers. Both studies demonstrated that each identified online consumer segment had distinct behavioral traits that distinguished it from the other segments.
[43]

One hypothesis of the study "Online Consumer Typologies and Their Shopping Behaviors in B2C E-Commerce Platforms," which drew on psychographic and behavioral research, is that a large online consumer audience does not represent a single market segment. It is more appropriately described as a collection of several consumer segments, in which each segment's members engage in distinct online purchasing habits and behave distinctly in response to marketing efforts. Although the research findings revealed that the broad online consumer audience shared certain characteristics in terms of their perception of e-commerce, the findings also revealed several distinct groups of consumers who differed significantly from one another in various ways. Consequently, the marketing mix of e-retailers should differ based on whether or not the majority of their customers are online. Instead of employing a single marketing strategy that applies to all consumers, marketers should customize their products and services to meet the individual demands of each consumer category they serve. [44]

The research "An Improved K-means Clustering Algorithm," conducted by H. Xu et al. (2016), presents how to improve the k-means clustering method by removing the impact of noise while simultaneously enhancing the selection of starting points. Gridding optimization also eliminates the computing complexity associated with the peak density approach, as well as the need for an excessive number of human evaluations and the errors that can result from them.
By including the idea of granularity, the edge of the dense zone is removed without gridding, and the precision of the cluster initialization center points is raised. [45]

K-means clustering groups similar data points together by dividing a data collection into multiple clusters. It is an unsupervised approach that may be used to generate several clusters from a single data set. In terms of the sum of squared errors and the number of successfully categorized instances, genetic k-means clustering exceeds regular k-means clustering. By examining various large-scale data sets with high dimensionality, it is possible to determine the performance of different clustering algorithms and of different machine learning algorithms for different distance metric combinations. [46]

S. Na et al. (2016) published "Research on k-means Clustering Algorithm: An Improved k-means Clustering Algorithm," explaining the k-means approach and exploring the drawbacks of the traditional k-means clustering technique. The conventional k-means clustering method has low efficiency because of the high computational complexity that results from the need to reassign data points several times during each iteration. Using the approach proposed in that study, it is guaranteed that the entire clustering procedure will be finished in O(nk) time without sacrificing cluster accuracy. The results of the experiments indicate that the improved strategy can reduce the time it takes for the k-means algorithm to run. As a result, the proposed k-means technique is viable.
[47]

In k-means iterations, each point must be checked to see whether it is closer to its own center than to any other center. Hence, each point has a large search space. For example, if there are k clusters, each point needs to calculate and compare distances (k−1) times in each iteration. J. Qi et al. (2016) propose a novel optimized hierarchical clustering method that incorporates three optimization principles. They state that the use of K*initial centers significantly increases the probability of attaining the finest local optima, and that multi-round top-n nearest clusters merging approaches the optimal result more progressively than before. As an alternative to recalculating feature values from scratch, the top-n and update principle optimizations update the feature values of clusters from preceding clusters or relocated items. Furthermore, the pruning method significantly reduces the search space for each point in the k-means iteration. [48]

In "Segmentation of Defected Regions in Leaves using K-Means and OTSU's Method" (2018), P. Divya and K. Anusudha concluded that k-means and Otsu's methods could be used to segment defective regions in leaf images. In the k-means clustering approach, the defective regions are segmented by splitting them into many clusters, each of which comprises a group of pixels, whereas in Otsu's method, the defective picture is divided by an automated thresholding process each time a new threshold is determined. Both proposed approaches are iterative. According to the results, the suggested technique offers more detail about the picture's defective regions and delivers higher PSNR values than the existing methods.
[49]

The paper "Image Segmentation Algorithm Based on Particle Swarm Optimization with the k-means Optimization" by X. Chen et al. (2019) aims at accurate and efficient image segmentation. The authors propose a hybrid image segmentation algorithm that combines particle swarm optimization (PSO) with k-means clustering, which aims to solve the problem of selecting the initial centers of k-means clustering and to overcome its tendency to fall into local optima. The local optimization ability of the k-means clustering algorithm is used to improve the accuracy of image segmentation. The hybrid algorithm retains the fast convergence speed of the k-means clustering algorithm and overcomes the disadvantage of the particle swarm optimization algorithm, which is also prone to falling into local optima. Experimental results demonstrate that the combination of the PSO and k-means algorithms has better stability. [50]

Synthesis

To better understand the issue, the researchers looked at similar articles. To build and carry out the current work, it was necessary to research e-commerce, consumer purchasing behavior, consumer segmentation, behavioral segmentation, and the k-means algorithm. The researchers found that there are many types of consumer segmentation, including behavioral, demographic, firmographic, psychographic, and need-based segmentation, that can help companies such as Shopee split consumers into groups based on their similar attributes or affinities. When it comes to consumer behavior segmentation, it is essential to understand how and when consumers decide to purchase a product or service.
It requires a marketer to pay attention to consumer behavior, such as an existing consumer's purchasing activity or the behavior patterns of a target audience, in order to adjust a brand's marketing message, boost brand loyalty, and secure client retention. Companies employ the divide-and-conquer strategy to separate and conquer markets, which serves as the basis for consumer segmentation. Marketers can gain a competitive advantage over their competitors by using segmentation effectively. Because of market segmentation, marketers can focus on managing consumer relationships, which was previously impossible with traditional mass marketing methods.

The k-means algorithm partitions data points into non-overlapping groups. K-means clustering is commonly used to segment consumers; it allows businesses to understand more about their consumers, which can then be used to improve sales. The purpose of the k-means clustering algorithm is to divide data into k clusters so that data points in the same cluster are similar and data points in different clusters are further apart. This study's purpose is to apply the k-means algorithm in Python to segment consumers, to use the k-means algorithm efficiently, and to identify the primary behavior of online consumers when it comes to online shopping.

Conceptual Framework

Because this conceptual framework is divided into three parts (input, process, and output), the study can focus its attention on a specific aspect of the topic.

Figure 1.
Conceptual Framework

Figure 1 shows the conceptual framework of our study. The input contains the data of every Shopee consumer surveyed. The data provided by each consumer will be collected using Google Forms, and the collected data will be grouped and segmented using the k-means algorithm. The output will classify various consumer behaviors and demonstrate whether the k-means algorithm is efficient in consumer segmentation.

Theoretical Framework

Marketers use behavioral segmentation to target their customers based on their actual purchasing habits rather than their demographics. Behavioral segmentation divides the market into groups of consumers based on their knowledge of, attitude toward, use of, or reaction to a particular product. American economist Culbertson developed a market segmentation theory in 1957, which explains that long-term and short-term interest rates are not related to one another since different investors finance them. He asserted that the buyers and sellers who make up the market for short-term securities have distinct characteristics and motivations from the buyers and sellers who make up the market for intermediate and long-term maturity securities. [51]

The idea is based partly on the investment patterns of various institutional investors, such as banks and insurance companies, and it is meant to be applied broadly in practice. According to market segmentation theory, there is no relationship between bond markets with different maturity lengths. Interest rates in each market are affected by the supply of and demand for bonds and other financial products of that maturity. When it comes to investing in fixed-income securities, the theory holds that investors and borrowers have preferences for specific yields.
These preferences create individual smaller markets, each subject to the supply and demand pressures particular to that market. For fixed-income assets with the same credit value, the theory attempts to explain the form of the yield curve by stating that bonds of different maturities are not interchangeable with one another. Because of this, the yield curve is influenced by the forces of supply and demand at each maturity length.

CHAPTER III
DESIGN AND METHODOLOGY

This chapter discusses how the algorithm was used, the data gathering process, and the significance of the algorithm's implementation to the study's outcome.

Project Concept

Behavioral segmentation is one of the most significant uses of unsupervised learning in consumer segmentation. Using clustering algorithms, businesses may discover several consumer categories, allowing them to target the prospective user base. The researchers used k-means clustering in this study since it is a widely used approach for grouping unlabeled datasets. Companies that use consumer segmentation believe that each client has unique needs that must be addressed with a tailored marketing strategy. They want to better understand the clients they are trying to reach. Their goals must be specific and designed to meet the needs of every individual consumer. By collecting data, companies may better understand consumer preferences and find valuable categories. This helps them build marketing campaigns efficiently while limiting financial risk.
Behavioral segmentation analysis was performed using data derived from online purchasing transactions made by Shopee customers. With 54.6 million monthly web visits, Shopee is one of the most popular online shopping websites in the Philippines. Shopee members have access to a wide range of selected brands in a variety of categories, including clothing, accessories, cosmetics, home décor, and lifestyle.

Analysis and Design

Figures and tables are used to demonstrate the research study's methodology. Segmenting consumers using the k-means algorithm allows similar consumers to be classified into the same segment. This study can help sellers better understand consumers in terms of both static and dynamic behaviors.

Figure 2. Methodology of the Study

Figure 2 shows the methodology of the study, that is, the process that the researchers needed to implement. The first step is to import the required packages and then input the collected data. Then, the 4,000-row dataset is preprocessed to check for missing values, noisy data, and other irregularities before the algorithm is applied. Silhouette analysis is then used to find the ideal value of k, which in this case is 4 clusters. The next step is to implement the k-means algorithm to display the model and then obtain the model's accuracy, precision, and recall using the evaluation matrix.
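The steps described up to this point (importing packages, loading the data, checking for missing values, scaling, and fitting k-means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the researchers' actual notebook: the column names, the randomly generated stand-in data, and the helper `segment_consumers` are hypothetical; only the choice of 4 clusters comes from the text.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the survey dataset; the real responses are not reproduced here.
rng = np.random.default_rng(0)
survey = pd.DataFrame({
    "shopping_rate": rng.integers(5, 26, size=200).astype(float),
    "price_payment": rng.integers(5, 26, size=200).astype(float),
    "product_diversity": rng.integers(10, 51, size=200).astype(float),
})
survey.loc[3, "price_payment"] = np.nan  # simulate one incomplete response


def segment_consumers(df, n_clusters=4, seed=0):
    """Drop incomplete rows, standardize the features, and fit k-means."""
    clean = df.dropna().reset_index(drop=True)      # basic missing-value handling
    scaled = StandardScaler().fit_transform(clean)  # zero mean, unit variance per column
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled)              # one cluster label per consumer
    return clean.assign(segment=labels), model


segmented, model = segment_consumers(survey)
print(segmented["segment"].value_counts().sort_index())
```

Fixing `random_state` keeps the cluster assignment reproducible between runs; the numeric labels themselves carry no meaning until each cluster is inspected and named.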
The behaviors were then classified using the k-means algorithm, yielding four distinct behaviors (Opportunist, Skeptical, Transient, and Repetitive). Lastly, the number of demographic profiles associated with each behavior, such as age and gender, must be determined.

Required Packages

Clustering refers to grouping related elements into groups or clusters whose data points or elements are similar. The k-means algorithm uses the clustering approach to place similar data points in one group, with all data points in that group sharing common properties but being distinct from data points in other groups. Every task in the algorithm process must begin with importing the necessary packages in the appropriate environment (Python in our case). Pandas is used to work on the data, NumPy is used to work with arrays, matplotlib and seaborn are used for visualization, mplot3d is used for three-dimensional visualization, and scikit-learn is used to develop the k-means model.

Model Requirements

The researchers gathered ideas, data, program needs, and potential difficulties. Specifically, the researchers specified all of the system's software, functional, and non-functional requirements in detail. Table 1 lists the suggested software requirements to ensure that the model functions accurately. Python was used to implement the k-means algorithm in the model, and the Orange application was used to visualize the model and other graphs.

Table 1.
Software Requirements for Development

Software Requirements    Specifications
Operating System         Windows 7 or higher
Programming Language     Python
Tools                    Jupyter Notebook & Orange App

Functional Requirements

The model's characteristics are defined by the functional requirements. These also demonstrate how and where the model may be applied.

1. The model will be useful in developing recommendations that can predict which items or features each consumer would be interested in next.
2. The model will be useful in helping businesses understand the distinct groups of people that make up their market.
3. The model can generate behavioral segmentation.

Non-Functional Requirements

These are features or standards that the model must meet in order to be considered functional. To achieve excellent segmentation performance, the accuracy, precision, and recall of the model are calculated.

Accuracy. The evaluation needs to show that the model is accurate between 50 and 90 percent of the time.

Precision. After going through the evaluation process, the model should generate reliable findings.

Recall. For the model to be useful, it needs to have the capacity to capture results with precise segmentation.

Questionnaire Design

For data gathering, the researchers used Google Forms to conduct social media surveys as their major strategy. The survey questions focused on the demographic profile and behavioral factors of Shopee consumers. The questionnaire was carefully designed to meet the requirements of the research, and the questions were taken from previous literature on behavioral segmentation analysis.
In that literature, Huseynov and Yıldırım (2017) discovered five different types of consumer segments: opportunist consumers, transient consumers, need-based shoppers, skeptical consumers, and repetitive consumers. The researchers' tool for computing the responses is a Likert scale, which is commonly used for questionnaires, particularly in survey research. After conducting the survey, the researchers computed the responses using the standard deviation. The questionnaire consists of two main parts: the first part focuses on the demographic profile of Shopee consumers, and the second part covers the questions pertaining to the behavioral factors of Shopee consumers.

Part I. Demographics

The first section of the questionnaire focused on demographic information. This portion of the survey contained questions relating to the age and gender of Shopee consumers.

Part II. Behavioral Factors

This section covered the behavioral factors: Shopping Rate, Price Payment, and Product Diversity. Shopping Rate includes five questions. As mentioned above, the questions were selected from previous literature, and some were self-structured. There were also five questions on Price Payment. Lastly, Product Diversity has ten questions and includes a comment box for respondents to fill in if they felt that other items applied.

Data Analysis

Figure 3. Dataset for Clustering

The dataset used for the study is shown in Figure 3. The figure shows that it contains 4,000 rows and 5 columns.
The demographic profile comprises the age and gender columns. The behavioral factors comprise the shopping rate, price payment, and product diversity columns.

Figure 4. Scaled Data

Figure 4 shows the scaled data. The researchers used the standard scaler method to enable a smooth flow of gradient descent and to help algorithms reach the minimum of the cost function as rapidly as possible.

Segmentation Process

Figure 5. Distribution of Age

Figure 5 shows the distribution of the consumers' ages, produced by encoding the data in Python. Out of 4,000 Shopee consumers, age 19 had the most responses, with 723, while age 60 had the fewest, with only 1. Consumers aged 18 to 25 made up around 61.73%.

Behavioral factors are used in the segmentation process. These factors are shopping rate, price payment, and product diversity. Each behavioral segmentation analysis had a different dimension.

Figure 6. Distribution of Shopping Rate

The shopping rate data in Figure 6 represents the total number of products purchased by the consumer. The shopping rate values were obtained by computing the z-scores of the consumers' data. The researchers made a histogram plot to visualize the number of consumers according to their shopping rate. The majority of the consumers have a shopping rate of 1-10.
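The z-score step mentioned for the behavioral factors can be sketched as follows. This is a minimal illustration only, assuming the population standard deviation (the same convention scikit-learn's StandardScaler uses); the function name `z_scores` and the sample values are hypothetical and are not taken from the study's dataset.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Return (x - mean) / std for every value, using the population std."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # All values are identical, so every z-score is defined as 0 here.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical shopping-rate answers, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
scaled = z_scores(sample)
print(scaled)
```

After this transformation the values have a mean of 0 and a standard deviation of 1, which is what makes factors measured on different ranges comparable.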
Figure 7. Distribution of Price Payment

In Figure 7, the price payment data is the average cost of a consumer's online purchases. The price payment values were obtained by computing the z-scores of the consumers' data. The researchers made a histogram plot to visualize the number of consumers according to their price payment. The majority of the consumers have a price payment in the range of 11-15.

Figure 8. Distribution of Product Diversity

In Figure 8, the product diversity data refers to how many different types of products a consumer purchases. The product diversity values were obtained by computing the z-scores of the consumers' data. The researchers made a histogram plot to visualize the number of consumers according to their product diversity. The majority of the consumers have a product diversity in the range of 1-10.

Figure 9. Silhouette Analysis

The plots in Figure 9 come from a silhouette analysis used to determine the best number of clusters. The x-axis represents the data, and the y-axis represents the silhouette score. For the given data, 2 and 3 clusters appear to be suboptimal because they produce clusters with below-average silhouette scores. The values 4 and 5 appear to be the best: each cluster has a silhouette score higher than the average, and the fluctuation in cluster size is comparable. A decisive factor is the thickness of the silhouette plot that represents each cluster.
The thickness is more uniform in the plot with 4 clusters (bottom left) than in the plot with 5 clusters (bottom right), where one cluster is significantly thicker than the others. As a result, 4 clusters is the best choice.

Figure 10. Clustered Data

Using the k-means algorithm, the researchers clustered the data as shown in Figure 10. The four clusters correspond to the behavioral segments Opportunist, Transient, Skeptical, and Repetitive: 0 is the Transient consumer, 1 the Skeptical consumer, 2 the Opportunist consumer, and 3 the Repetitive consumer.

K-means Clustering Algorithm

Figure 11. Clustered Behavioral Factors

According to related studies on behavioral segmentation, a consumer is Transient if product diversity is greater than price payment and price payment is greater than shopping rate; Skeptical if product diversity is greater than shopping rate and shopping rate is greater than price payment; Opportunist if price payment is greater than shopping rate and shopping rate is greater than product diversity; and Repetitive if shopping rate is greater than price payment and price payment is greater than product diversity.

Figure 12. Behavioral Characteristics Model

Figure 12 shows the behavioral characteristics fed into the k-means cluster visualization tool of the Orange application.
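The ordering rules given for Figure 11 can be written out as a small function, which may help readers check a single consumer by hand. This is only a sketch of the rules as stated in the text: the function name `behavior_label` and its parameters are hypothetical, and returning `None` for orderings that the four rules do not cover (ties, or shopping rate > product diversity > price payment, for example) is an assumption, since the text does not say how such cases are treated.

```python
def behavior_label(shopping_rate, price_payment, product_diversity):
    """Map the ordering of the three factor scores to a segment name."""
    sr, pp, pdv = shopping_rate, price_payment, product_diversity
    if pdv > pp > sr:
        return "Transient"
    if pdv > sr > pp:
        return "Skeptical"
    if pp > sr > pdv:
        return "Opportunist"
    if sr > pp > pdv:
        return "Repetitive"
    # Ordering not covered by the four stated rules.
    return None


# Hypothetical scores, for illustration only.
print(behavior_label(shopping_rate=3.0, price_payment=2.0, product_diversity=1.0))
```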
C1 (Blue) represents the Transient consumer, C2 (Red) represents the Skeptical consumer, C3 (Green) represents the Opportunist consumer, and C4 (Orange) represents the Repetitive consumer.

Model Evaluation

Using the following formulas, the researchers calculated the accuracy, precision, and recall of the model from the confusion matrix.

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
\[
\text{Precision} = \frac{TP}{TP + FP}
\]
\[
\text{Recall} = \frac{TP}{TP + FN}
\]

Formula 1. Formula for Evaluation

Formula 1 is used to assess the performance of the k-means algorithm. In the formulas, TP denotes true positives, FP false positives, TN true negatives, and FN false negatives. This formula will be used to examine the k-means algorithm and determine the efficacy of the study.

CHAPTER IV
RESULTS AND DISCUSSION

This chapter contains a detailed presentation and discussion of the results of behavioral segmentation by implementing the k-means algorithm.

Figure 13. Confusion Matrix

Figure 13 shows the confusion matrix, with the x-axis representing the predicted data and the y-axis representing the actual data. True positive (upper left), true negative (lower right), false positive (upper right), and false negative (lower left) are the classifications used to determine the algorithm's accuracy, precision, and recall.
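For readers who want to reproduce the evaluation from the four confusion-matrix counts, the accuracy, precision, and recall definitions can be sketched as a small helper. The function name `evaluation_metrics` is hypothetical, the counts in the usage line are made up for illustration and are not the study's results, and returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts, chosen only to show the three values differing.
acc, prec, rec = evaluation_metrics(tp=40, tn=50, fp=10, fn=0)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```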
Computation for the Accuracy, Precision and Recall

In this case, TP is 972, TN is 3028, FP is 0, and FN is 0, which gives the following computation of accuracy, precision, and recall:

\[
\text{Accuracy} = \frac{972 + 3028}{972 + 3028 + 0 + 0} = \frac{4000}{4000} = 100\%
\]
\[
\text{Precision} = \frac{972}{972 + 0} = 100\%
\]
\[
\text{Recall} = \frac{972}{972 + 0} = 100\%
\]

Based on the confusion matrix, the segmentation achieved an accuracy of 100 percent, a precision of 100 percent, and a recall of 100 percent. As a result, we can conclude that the k-means algorithm is accurate and precise in its calculations.

Table 2. Confusion Matrix Result

Accuracy     100 %
Precision    100 %
Recall       100 %

Figure 14. Behavioral Segmentation Analysis

The results of the behavioral segmentation analysis are shown in Figure 14. The study was conducted with 4,000 Shopee consumers. Among them, 517 respondents are Transient consumers, 506 are Skeptical consumers, 88 are Opportunist consumers, and 2,889 are Repetitive consumers. Overall, 72.22% of the Shopee consumers are Repetitive.

Number of Demographic Profiles in Behaviors

Age and gender are the demographic profiles in this study. Table 3 shows the age distribution across behaviors: 46 respondents are younger than 18, 2,463 respondents are ages 18-25, 1,013 respondents are ages 26-35, 295 respondents are ages 36-45, and 63 respondents are ages 55 and above. Overall, 18 to 25-year-olds had the greatest response rate, at 61.57 percent.

Table 3.
Age in each Behavior

The number of females in each behavior is shown in Table 4. There are 284 transient females, 386 skeptical females, 54 opportunist females, and 1,377 repetitive females. Overall, the Repetitive segment had the greatest response rate among females, at 65.54%.

Table 4. Females in each Behavior

The number of males in each behavior is shown in Table 5. There are 233 transient males, 120 skeptical males, 34 opportunist males, and 1,512 repetitive males. Overall, the Repetitive segment had the greatest response rate among males, at 79.62%.

Table 5. Males in each Behavior

CHAPTER V
SUMMARY, CONCLUSIONS AND RECOMMENDATIONS

This final chapter presents the summary, conclusions, and recommendations of the study.

Summary

In this study, behavioral segmentation analysis was carried out on 4,000 Shopee consumers using the k-means algorithm. Behavioral factors were extracted from the dataset and used to identify groups of consumers with similar traits. In the segmentation phase, the k-means algorithm was applied. The segmentation shows that online consumers do not all behave in the same way; rather, they form a collection of distinct consumer groups, each with its own set of online shopping habits. Online sellers can use the results of this study to improve their online sales rates by developing more effective marketing strategies for each specific segment.
Conclusion

Based on the results of the study, the following conclusions were drawn:

1. We were able to define four behavioral segments by gathering data from Shopee consumers based on their shopping rate, price payment, and product diversity.

2. Through the use of the k-means clustering algorithm, we were able to identify four distinct Shopee consumer behavioral segments: the Transient, Skeptical, Opportunist, and Repetitive consumers. Each identified segment was found to have distinct features that set it apart from the others.

Transient consumers are coupon-prone consumers who make extensive use of promotional offers. Attractive promotions and discount coupons can increase the frequency with which these consumers shop online. They are easily won over by competitors who offer equivalent or greater incentives.

Skeptical consumers are relatively new to the online store and are hesitant to make a purchase. When shopping, these consumers choose various options at extremely low prices. Coupon redemption rates for skeptical consumers are extremely low, similar to those among need-based consumers. As a result, discount coupon rewards may have no major impact on their online spending rates.

Opportunist consumers regularly visit and shop at online stores. They like to use discount coupons and free shipping deals when buying things online. The availability of free shipping and discount coupons can encourage opportunist consumers to shop online more often.
The high prevalence of product refunds among opportunist consumers indicates weak decision-making abilities when making online product selections.

Repetitive consumers are primarily long-term consumers who visit the online store and spend significantly more money there than in any other area. This customer segment is extremely loyal to online retailers. In online e-commerce, consumer loyalty refers to a favorable attitude toward an online store that leads to repeated purchasing behavior.

3. The k-means algorithm can determine whether consumer segments have similarities in different aspects. In the Transient, Skeptical, and Opportunist segments, female consumers outnumber male consumers, whereas in the Repetitive segment male consumers (1,512) outnumber female consumers (1,377). Based on the number of demographic profiles per behavior, the 18 to 25 age group had the highest response rate, at 61.57 percent. Among male consumers, the Repetitive segment had the highest response rate, at 79.62 percent.

4. The segmentation achieved an accuracy of 100 percent, a precision of 100 percent, and a recall of 100 percent. We can conclude that the k-means algorithm is accurate and precise in its calculations.

Recommendations

Based on the conclusions, the following are the recommendations:

1. Future researchers should collect more behavioral factors, such as coupon redemption and refund rate.

2. Since this study used the k-means algorithm, future researchers should use different algorithms and behavioral features to conduct behavioral segmentation analysis.

3. Future researchers should examine how segment types and features change with the e-commerce platform used, and the behavioral factors that influence them.

4. Collecting a larger amount of data is preferable because it assists in the assessment of the model.
REFERENCES

[1] Media One. 2021. "What is Shopee?" Retrieved October 28, 2021 from https://mediaonemarketing.com.sg/shopee-review-expanding-ecommerce/
[2] Rahman et al. 2018. "Consumer buying behavior towards online shopping: An empirical study on Dhaka city, Bangladesh." Retrieved October 28, 2021 from econstor.eu/bitstream/10419/206108/1/23311975.2018.1514940.pdf
[3] Britannica. "Algorithm and Complexity." Retrieved October 28, 2021 from https://www.britannica.com/science/computer-science/Algorithms-and-complexity
[4] Investopedia. Business-to-Consumer (B2C). Retrieved October 28, 2021 from https://www.investopedia.com/terms/b/btoc.asp
[5] Nikki Jones. "What is Behavioral Segmentation?" Retrieved October 28, 2021 from https://www.yieldify.com/blog/behavioral-segmentation-definition-examples/
[6] Will Kelton. 2021. Business-to-Consumer (B2C). Retrieved October 28, 2021 from https://www.investopedia.com/terms/b/btoc.asp
[7] Sauravkaushik8 Kaushik. 2016. "An Introduction to Clustering and different methods of clustering." Retrieved October 28, 2021 from https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-anddifferent-methods-of-clustering/
[8] Srilekha Mule. 2021. K-means Clustering. Retrieved October 28, 2021 from https://www.linkedin.com/pulse/k-means-clustering-srilekha-mule
[9] Python. What is Python? Executive Summary. Retrieved October 28, 2021 from https://www.python.org/doc/essays/blurb/
[10] Shopee. 2021. "Shopee Careers." Retrieved October 28, 2021 from https://careers.shopee.ph/about
[11] UNCTAD. 2017. Unctad B2C E-commerce index 2017, 30.
Retrieved October 28, 2021 from http://unctad.org/en/PublicationsLibrary/tn_unctad_ict4d09_en.pdf and http://unctad.org/en/PublicationsLibrary/tn_unctad_ict4d07_en.pdf
[12] Vicente, J. 2016. Special report: The state of e-commerce in the Philippines. Retrieved October 26, 2021 from http://www.imadigitalmarketer.com/blog/stateecommerceph
[13] Segovia, O. W. 2016, October 31. Unfinished business: Why e-commerce in the Philippines is falling behind. Retrieved October 28, 2021 from https://medium.com/startupph/chronicles/unfinished-business-why-ecommerce-inthe-philippines-is-falling-behind-bc6087796bc3
[14] Babenko, V., Pasmor, M., Pankova, Ju., Sidorov, M.: The place and perspectives of Ukraine in international integration space. Pr. and Persp. in Man. 15(1), 80–92 (2017). doi:10.21511/ppm.15(1).2017.08
[15] Statista Research Department. Topic: E-commerce in the Philippines. Retrieved October 28, 2021, from https://www.statista.com/topics/6539/ecommerce-in-the-philippines/#dossierKeyfigures
[16] Vrender. (2016). "Importance of online shopping." Retrieved November 5, 2021, from http://www.sooperarticles.com/shopping-articles/clothing-articles/importance-online-shopping-1495828.html
[17] Alexandra Panaitescu and Valentin Radu. 2021. Consumer behavior in marketing - patterns, types, segmentation - Omniconvert. (November 2021). Retrieved December 5, 2021 from https://www.omniconvert.com/blog/consumerbehavior-in-marketing-patterns-types-segmentation/
[18] Urueña, A., & Hidalgo, A. 2016. Successful loyalty in e-complaints: FsQCA and structural equation modeling analyses. Journal of Business Research, 69(4), 1384–1389.
[19] Chou, S., & Hsu, C. 2016. Understanding online repurchase intention: Social exchange theory and shopping habit.
Information Systems and e-Business Management, 14(1), 19–45. doi:10.1007/s10257-015-0272-9
[20] Chechen, L., Pui-Lai, T., Yun-Chi, W., Palvia, P., & Kakhki, M. D. (2016). The impact of presentation mode and product type on online impulse buying decisions. Journal of Electronic Commerce Research, 17(2), 153–168.
[21] Gary DeAsi. 2018. 10 Powerful Behavioral Segmentation Methods to Understand Your Customers. Pointillist (2018). Retrieved November 10, 2021 from https://www.pointillist.com/blog/behavioral-segmentation/
[22] Debaditya Barman and Nirmalya Chowdhury. 2019. A novel approach for the customer segmentation using clustering through self-organizing maps. International Journal of Business Analytics 6, 2 (2019), 23–45. DOI: http://dx.doi.org/10.4018/ijban.2019040102
[23] Sara Dolnicar, Bettina Grün, and Friedrich Leisch. 2018. Market Segmentation Analysis: Understanding It, Doing It, and Making It Useful (2018). Retrieved November 8, 2021 from https://books.google.com.ph/books?id=b1lDwAAQBAJ&printsec=frontcover&dq=customer%2Bsegmentation%2B2017&hl=en&sa=X&ved=2ahUKEwjTuZeq9Yj0AhUTw4sBHchkDywQ6AF6BAgIEAI#v=onepage&q=customer%20segmentation%202017&f=false
[24] R. Srivastava. 2016. Identification of customer clusters using RFM model: a case of diverse purchaser classification. Int. J. Bus. Anal. Intell., 4 (2) (2016), pp. 45-50.
[25] Nikki Jones. 2017. Behavioral Segmentation Defined with 4 Real-Life Examples. Yieldify (2017). Retrieved November 9, 2021 from https://www.yieldify.com/blog/behavioral-segmentation-definition-examples/
[26] Kotler, P. & Keller, K.L. (2016).
Marketing Management, 15th Edition, Pearson Education, Inc.
[27] Fahed Yoseph, Nurul Hashimah Ahamed Hassain Malim, and Mohammad AlMalaily. 2019. New behavioral segmentation methods to understand consumers in the retail industry. (February 2019). Retrieved November 9, 2021 from http://ischolar.info/index.php/IJCSIT/article/view/186146
[28] K. Bhade, V. Gulalkari, N. Harwani and S. N. Dhage, "A Systematic Approach to Customer Segmentation and Buyer Targeting for Profit Maximization", 2018 9th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1-6, 2018, [online] Available: https://ieeexplore.ieee.org/document/8494019.
[29] Mohammed Muzammil. July 27, 2021. "Understanding K – Means Clustering With Customer Segmentation Use Case." Retrieved Nov 9, 2021 from https://www.analyticsvidhya.com/blog/2021/07/understanding-k-meansclustering-using-customer-segmentation/
[30] Abhinav Sagar. 2019. Customer Segmentation Using K Means Clustering. Retrieved Nov 9, 2021 from https://towardsdatascience.com/customer-segmentation-using-k-means-clustering-d33964f238c3
[31] Andrea Trevino. 2016. Introduction to K-means Clustering. Retrieved November 9, 2021 from https://blogs.oracle.com/ai-and-datascience/post/introduction-to-k-means-clustering
[32] S. H. Shihab, S. Afroge and S. Z. Mishu, "RFM Based Market Segmentation Approach Using Advanced K-means and Agglomerative Clustering: A Comparative Study," 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), 2019, pp. 1-4, doi: 10.1109/ECACE.2019.8679376.
[33] Blut, M., Frennea, C. M., Mittal, V., Mothersbaugh, D. L. 2016. How procedural, financial and relational switching costs affect customer satisfaction, repurchase intentions, and repurchase behavior: A meta-analysis.
International Journal of Research in Marketing, 32(2), 226-229.
[34] Iiro Panula. 2017. Building Trust in e-commerce - Theseus. (2017). Retrieved November 6, 2021, from https://www.theseus.fi/bitstream/handle/10024/133517/Thesis_Iiro_Panula3.pdf?sequence=1&isAllowed=y
[35] Vitalina Babenko, Zdzisław Kulczyk, Irina Perevosova, Olga Syniavska, and Oksana Davydova. 2019. Factors of the development of international e-commerce under the conditions of globalization. Retrieved November 8, 2021 from https://www.shsconferences.org/articles/shsconf/abs/2019/06/shsconf_m3e22019_04016/shsconf_m3e22019_04016.html
[36] C+R Researchers. 2017. Target Market (Consumer) Segmentation. Retrieved November 9, 2021 from https://www.crresearch.com/methods-quantitative-market-research-segmentation
[37] Nilsson & Olsson, Customer Focus through Market Segmentation - The Case of Volvo CE and the Recycling/Waste Management Segment (2016).
[38] Ş. Ozan and L. O. Iheme, "Artificial Neural Networks in Customer Segmentation," 2019 27th Signal Processing and Communications Applications Conference (SIU), 2019, pp. 1-4, doi: 10.1109/SIU.2019.8806558.
[39] E. Y. L. Nandapala and K. P. N. Jayasena, "The practical approach in Customers segmentation by using the K-Means Algorithm," 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS), 2020, pp. 344-349, doi: 10.1109/ICIIS51140.2020.9342639.
[40] Farid Huseynov & Sevgi Özkan Yıldırım, 2017. "Behavioural segmentation analysis of online consumer audience in Turkey by using real ecommerce transaction data," International Journal of Economics and Business Research, Inderscience Enterprises Ltd, vol. 14(1), pages 12-28. https://ideas.repec.org/a/ids/ijecbr/v14y2017i1p12-28.html
[41] Liu, Y., Li, H., Peng, G., Lv, B., Zhang, C. (2015). Online purchaser segmentation and promotion strategy selection: 233, 263-279.
[42] Ladhari, R., Gonthier, J., Lajante, M. 2019. Generation Y and online fashion shopping: Orientations and profiles. Journal of Retailing and Consumer Services, 48, 113-121. Retrieved November 20, 2021 from https://ideas.repec.org/a/eee/joreco/v48y2019icp113-121.html
[43] Huseynov, F., Yıldırım, S. O. 2017. Behavioural segmentation analysis of online consumer audience in Turkey by using real e-commerce transaction data. International Journal of Economics and Business Research, 14, 12-28.
[44] Huseynov, Farid, and Sevgi Özkan Yıldırım. "Online Consumer Typologies and Their Shopping Behaviors in B2C E-Commerce Platforms." SAGE Open, Apr. 2019, doi:10.1177/2158244019854639.
[45] H. Xu, S. Yao, Q. Li and Z. Ye, "An Improved K-means Clustering Algorithm," (IDAACS-SWS), 2020, pp. 1-5, doi: 10.1109/IDAACS-SWS50031.2020.9297060.
[46] S. Kapil, M. Chawla and M. D. Ansari, "On K-means data clustering algorithm with genetic algorithm," 2016 (PDGC), 2016, pp. 202-206, doi: 10.1109/PDGC.2016.7913145.
[47] S. Na, L. Xumin and G. Yong, "Research on k-means Clustering Algorithm: An Improved k-means Clustering Algorithm," 2016 Third International Symposium on Intelligent Information Technology and Security Informatics, 2016, pp. 63-67, doi: 10.1109/IITSI.2010.74.
[48] J. Qi, Y. Yu, L. Wang and J.
Liu, "K*-Means: An Effective and Efficient K-Means Clustering Algorithm," 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom), 2016, pp. 242-249, doi: 10.1109/BDCloud-SocialCom-SustainCom.2016.46.
[49] P. Divya and K. Anusudha, "Segmentation of Defected Regions in Leaves using K-Means and OTSU's Method," 2018 4th International Conference on Electrical Energy Systems (ICEES), 2018, pp. 111-115, doi: 10.1109/ICEES.2018.8443282.
[50] X. Chen, P. Miao and Q. Bu, "Image Segmentation Algorithm Based on Particle Swarm Optimization with K-means Optimization," 2019 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS), 2019, pp. 156-159, doi: 10.1109/ICPICS47731.2019.8942442.
[51] Yi. 2018. Market Segmentation.
Retrieved November 26, 2021 from https://www.sciencedirect.com/topics/economics-econometrics-andfinance/market-segmentation

APPENDICES

Appendix 1
The Source Code on how to Apply the K-means Algorithm

#Importing the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

#Reading the csv file
behavior = pd.read_csv("CleanThesisData - ZScores 123456.csv")
behavior

X = behavior
y = behavior['Gender']

genders = X.Gender.value_counts()
sns.set_style("darkgrid")
plt.figure(figsize=(10,4))
sns.barplot(x=genders.index, y=genders.values)
plt.show()

# We will use distplot for the distribution of age of the customers, shopping rate, price payment, and product diversity.
#Distribution of age
plt.figure(figsize=(10, 6))
sns.set(style = 'whitegrid')
sns.distplot(X['Age'])
plt.title('Distribution of Age', fontsize = 20)
plt.xlabel('Range of Age')
plt.ylabel('Count')

#Distribution of ShoppingRate
plt.figure(figsize=(10, 6))
sns.set(style = 'whitegrid')
sns.distplot(X['ShoppingRate'])
plt.title('Distribution of Shopping Rate', fontsize = 20)
plt.xlabel('Range of Shopping Rate')
plt.ylabel('Count')

#Distribution of PricePayment
plt.figure(figsize=(10, 6))
sns.set(style = 'whitegrid')
sns.distplot(X['PricePayment'])
plt.title('Distribution of Price Payment', fontsize = 20)
plt.xlabel('Range of Price Payment')
plt.ylabel('Count')

#Distribution of ProductDiversity
plt.figure(figsize=(10, 6))
sns.set(style = 'whitegrid')
sns.distplot(X['ProductDiversity'])
plt.title('Distribution of Product Diversity', fontsize = 20)
plt.xlabel('Range of Product Diversity')
plt.ylabel('Count')

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X['Gender'] = le.fit_transform(X['Gender'])
y = le.transform(y)

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
scaler_features = scaler.transform(X)
scaled_data = pd.DataFrame(scaler_features)
scaled_data.rename(columns = {0:'Age', 1:'Gender',2:'ShoppingRate',3:'PricePayment',4:'ProductDiversity'}, inplace = True)
scaled_data

from sklearn.cluster import KMeans
cs = []
for i in range(1, 10):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', max_iter = 200, n_init = 10, random_state = 0)
    kmeans.fit(scaled_data)
    cs.append(kmeans.inertia_)
plt.plot(range(1, 10), cs)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('CS')
plt.show()

# Calculate Silhoutte Score
from sklearn.metrics import silhouette_score
# kmeans here is the last model fitted in the elbow loop (n_clusters = 9)
score = silhouette_score(behavior, kmeans.labels_, metric='euclidean')
# Print the score
print('Silhouette Score: %.3f' % score)

#Silhouette Analysis for 2, 3, 4, 5 Clusters
from yellowbrick.cluster import SilhouetteVisualizer
fig, ax = plt.subplots(2, 2, figsize=(15,8))
for i in [2, 3, 4, 5]:
    # Create KMeans instance for different number of clusters
    km = KMeans(n_clusters = i, init = 'k-means++', max_iter = 100, n_init = 10, random_state = 42)
    q, mod = divmod(i, 2)
    # Create SilhouetteVisualizer instance with KMeans instance
    # Fit the visualizer
    visualizer = SilhouetteVisualizer(km, colors='yellowbrick', ax=ax[q-1][mod])
    visualizer.fit(X)

# K Means Clustering
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=4, random_state=42)
y = kmeans.fit_predict(behavior)
y
labels = kmeans.labels_

# check how many of the samples were correctly labeled
# y (from fit_predict) and labels (kmeans.labels_) hold the same cluster assignments
correct_labels = sum(y == labels)
print("Result: %d out of %d samples were correctly labeled."
      % (correct_labels, y.size))
kmeans.inertia_
centroids = kmeans.cluster_centers_
centroids
X = pd.DataFrame(X)
behavior['Cluster'] = kmeans.labels_
behavior

Appendix 2
Source Code of Confusion Matrix

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report
# Note: behavior['Cluster'] was assigned from kmeans.labels_ above, so both
# arguments are the same labels and the resulting matrix is purely diagonal.
cm = confusion_matrix(behavior['Cluster'], kmeans.labels_)
print('Confusion Matrix:\n \n', cm,
      '\n \n Classification_Report: \n \n',
      classification_report(behavior['Cluster'], kmeans.labels_))

# Confusion matrix visualization
import seaborn as sns
import matplotlib.pyplot as plt
f, ax = plt.subplots(figsize=(5, 5))
sns.heatmap(cm, annot=True, linewidths=0.5, cmap='pink',
            linecolor="gray", fmt=".0f", ax=ax)
plt.xlabel("prediction")
plt.ylabel("actual")
plt.show()

Appendix 3
Steps in Orange Application for Visualization

Appendix 4
Questionnaire on Behavioral Segmentation Analysis of Online Consumers

The respondents, who are Shopee consumers, are requested to answer the following questions:

CURRICULUM VITAE

LAYLO, FRANK B.
Sitio Maligaya, Banay-Banay 2nd, San Jose, Batangas
Cell No: (+63) 9552297187
Email Address: frank.laylo@g.batstate-u.edu.ph

OBJECTIVES
• To be able to work at a responsible and challenging entry level that promotes personal growth and utilizes my education and background.
• Computer literate in MS Office: Word, Excel, PowerPoint
• Responsible, reliable, and can work independently
• Excellent communication and interpersonal skills
• Works efficiently and productively

EDUCATIONAL ATTAINMENT
Batangas State University : August 2018 – Present
Pinagtongulan National High School : June 2016 – May 2018
Pinagtongulan National High School : June 2012 – May 2016
Pinagtongulan Elementary School : June 2006 – April

TRAININGS AND SEMINARS ATTENDED
• BITS Synergy Conference “Data Science and AI Congress” (October 2018)
• BITS Synergy Conference “Data Science and AI Congress” (March 2019)
• BITS Synergy Conference “Industry 4.0 Solution Toward Challenges” (October 2019)
• DevOps for Beginners, Ground Gurus (October 10, 2020)
• Introduction to Malware Threats (October 22, 2020)

I hereby certify that the above information is true and correct to the best of my knowledge and belief.

FRANK B. LAYLO
Applicant

CYMBELLYN ATHENA C. REYES
241 Zamora St.
Ibabao, Cuenca, Batangas
Cell No: (+63) 9450973875
Email Address: cymbellynathena.reyes@g.batstate-u.edu.ph

OBJECTIVES
To enhance my capabilities and ability to comply with the demands of the field of Computer Science, and to gain additional knowledge of how the information learned is applied in real time.

SKILLS AND ABILITIES
• Able to work under pressure
• Capable of accomplishing a task within the given time
• Computer literate in MS Office
• Good communication skills

EDUCATIONAL ATTAINMENT
Batangas State University, Lipa City : August 2018 – Present
Cuenca Institute : 2015 – 2018
Kalayaan Christian School : 2012 – 2016
Ibabao Elementary School : 2006 – 2012

TRAININGS AND SEMINARS ATTENDED
• BITS Synergy Conference “Data Science and AI Congress” (October 2018)
• BITS Synergy Conference “Data Science and AI Congress” (March 2019)
• BITS Synergy Conference “Industry 4.0 Solution Toward Challenges” (October 2019)
• DevOps for Beginners, Ground Gurus (October 10, 2020)
• Introduction to Malware Threats (October 22, 2020)

I hereby certify that the above information is true and correct to the best of my knowledge and belief.

CYMBELLYN ATHENA C. REYES
Applicant

ROSALES, GAYLE MARIE L.
Castillo, Padre Garcia, Batangas
Cell No: (+63) 9563927044
Email Address: gaylemarie.rosales@g.batstate-u.edu.ph

OBJECTIVES
Seeking a work environment that will challenge me further, promote quality products and service, and provide me with the opportunity to meet and exceed assigned goals. To attain an internship in which I can maximize my knowledge and skills in the field of Computer Science by utilizing my skills in programming and enabling further personal and professional development.

HIGHLIGHTS OF QUALIFICATION
• Works efficiently and productively
• Manages multitasking and can work well even under pressure.
• Willing to be trained
• Very eager to learn new things and abilities.

EDUCATIONAL ATTAINMENT
Batangas State University, Lipa City, Batangas : August 2018 – Present
Holy Trinity School, Padre Garcia, Batangas : 2015 – 2018
Venancio Trinidad Sr. Memorial School : 2006 – 2012

TRAININGS AND SEMINARS ATTENDED
• BITS Synergy Conference “Data Science and AI Congress” (October 2018)
• BITS Synergy Conference “Data Science and AI Congress” (March 2019)
• BITS Synergy Conference “Industry 4.0 Solution Toward Challenges” (October 2019)
• DevOps for Beginners, Ground Gurus (October 10, 2020)
• Introduction to Malware Threats (October 22, 2020)

I hereby certify that the above information is true and correct to the best of my knowledge and belief.

GAYLE MARIE L. ROSALES
Applicant

CERTIFICATE OF EDITING OF THESIS/DISSERTATION

This is to certify that this Thesis/Dissertation entitled “BEHAVIORAL SEGMENTATION ANALYSIS OF ONLINE CONSUMERS IN SHOPEE BY USING CLUSTERING IN K-MEANS ALGORITHM” of FRANK B. LAYLO, CYMBELLYN ATHENA C. REYES, and GAYLE MARIE L. ROSALES, in partial fulfillment of the requirements for the degree Bachelor of Science in Computer Science, has been reviewed and edited by the undersigned based on the minutes of the Final Defense. It now follows the standard format of the University and the conventions of research writing.

MARVIN DOMINIC B. BUENA, Ph.D. (cand.), MA, LPT
Signature over Printed Name
Grammarian/Editor
Date Signed: May 24, 2022