Machine Learning for Customer Segmentation Project Proposal

A Project Proposal submitted for the partial fulfillment of the
requirements of the Advanced Diploma in Data Science (Part time)
COADDS191P -009
Independent Research Project
Advanced Diploma in Data Science
National Institute of Business Management
Colombo, Sri Lanka
4th October 2020
In this paper, based on a data sample from a UK-based non-store online retailer, the
author examines the importance of customer segmentation and market analytics. Further,
the author seeks to identify potential customers and their unmet needs. This enables
marketers to create targeted marketing messages for a specific group of customers,
which increases the chances of a person in that group buying a product.
Key words – Customer Segmentation, Market Analytics
Table of Contents
Chapter 1: Introduction
1.1 Background
1.2 Research Problem
1.3 Objective of the Project
1.4 Scope of the Project
1.5 Justification of Research
1.6 Expected Limitations
1.7 Proposed Work Schedule
Chapter 2: Literature Review
2.1 Introduction to the research theme
2.2 Theoretical explanation about the Key Words in the Topic
2.3 Findings by other researchers
2.4 The research gap
2.5 Table for Variables, their definitions and sources
2.6 Chapter conclusion
Chapter 3: Methodology
3.1 Introduction
3.2 Population, sample and Sampling technique
3.3 Type of Data to be collected and data sources
3.4 Data collection tools and plan
3.5 Conceptual framework
3.6 Hypothesis
3.7 Methods of Data Analysis
Chapter 1: Introduction
Over the years, the commercial world has become more competitive. Organizations
therefore have to satisfy the needs and wants of their customers, attract
new customers, and thereby grow their businesses. In the business sector,
retail chains generate large amounts of data. This data is
generated on a daily or monthly basis across the stores. This extensive
database of customer transactions needs to be analyzed in order to design
profitable strategies. All customers have different tastes and needs. As the
customer base and the number of transactions grow, it becomes harder to
understand the requirements of each customer. Identifying potential customers
can improve marketing campaigns, which ultimately increases sales. Segmentation
can play an important role by grouping those customers into various segments.
Identifying and satisfying the needs and wants of each customer in
a business is a very complex task. This is because customers may differ
in their needs, wants, demography, geography, tastes and preferences,
behaviors and so on. As such, it is poor practice to treat all customers
identically in business. This challenge has motivated the adoption of the idea of
customer segmentation or market segmentation, in which the customers are
subdivided into smaller groups or segments wherein members of each segment
show similar market behaviors or characteristics.
Research Problem
When similar characteristics are found in customers’ behavior and needs,
those customers can be generalized into groups whose demands are met with
different strategies. These strategies can feed into targeted marketing
activities for specific groups, the launch of features aligned with customer
demand, and the development of the product roadmap. In the traditional
method, the existing customer data and the general population data have to be
compared in some way to deduce a relationship between them. A manual
way of doing this is to compare statistics between the customers and the
general population. For example, the mean and standard deviation of age can
be compared to determine which age group is more likely to be a customer, or
salaries can be compared to see which income groups the customers fall into.
But this analysis would produce many results, which in turn have to be
analyzed to come up with a final strategy. This process requires a lot of
time, and by the time the analysis is complete, competitors may have
captured much of the market, putting the company at risk of going out of business.
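The manual comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the age values are hypothetical and are not drawn from any data set used in this proposal, and only the standard library is used.

```python
from statistics import mean, stdev

# Hypothetical ages, invented for illustration only.
customer_ages = [25, 31, 28, 35, 30, 27]
population_ages = [18, 45, 62, 30, 51, 24, 39, 70]


def summarize(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return mean(values), stdev(values)


cust_mean, cust_sd = summarize(customer_ages)
pop_mean, pop_sd = summarize(population_ages)

print(f"customers:  mean={cust_mean:.1f}, sd={cust_sd:.1f}")
print(f"population: mean={pop_mean:.1f}, sd={pop_sd:.1f}")
```

Even in this toy case the comparison has to be repeated for every attribute (age, salary and so on), which is the scaling problem the paragraph above points to.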
Objective of the Project
The objective of the project is to identify potential customers and their
unmet needs. This enables marketers to create targeted marketing messages for
a specific group of customers, which increases the chances of a person in that
group buying a product.
Scope of the Project
The scope of the project is to develop customized marketing campaigns, design an
optimal distribution strategy, choose specific product features for deployment,
and prioritize the new product development efforts of the business.
Justification of Research
Segmentation allows businesses to make better use of their marketing budgets,
gain a competitive edge over rival companies and, importantly, demonstrate a better
knowledge of their customers’ needs and wants. It helps to:
Improve marketing efficiency - Breaking down a large customer base
into more manageable pieces makes it easier to identify your target
audience and launch campaigns to the most relevant people, using the
most relevant channel.
Identify new market opportunities - During the process of grouping
your customers into clusters, you may find that you have identified a
new market segment, which could in turn alter your marketing focus
and strategy to fit.
Better brand strategy - Once you have identified the key motivators for
your customer, such as design or price or practical needs, you can
brand your products appropriately.
Improve distribution strategies - Identifying where and when customers
shop can inform product distribution strategies, such as
what type of products are sold at particular outlets.
Customer retention - Using segmentation, marketers can identify
groups that require extra attention and those that churn quickly, along
with customers with the highest potential value. It can also help with
creating targeted strategies that capture your customers’ attention and
create positive, high-value experiences with your brand.
Expected Limitations
The main barrier to the project is obtaining suitable datasets. Only a limited
number of datasets based on real-world data, with approval for the use of
customer records, were available.
Proposed work schedule
Chapter 2: Literature Review
2.1 Introduction to the research theme
Customer segmentation refers to the process of dividing a market into groups of
buyers with different behaviors and characteristics. This theory proposes to study
and predict the future consumption trends of customers by segmenting customer
information and consumption behavior, and to support the profit and market planning
of enterprises.
2.2 Theoretical explanation about the Key Words in the Topic
2.2.1 Customer Segmentation
The process of grouping customers into sections of individuals who share common
characteristics is called Customer Segmentation.
2.2.2 Market analytics
Marketing analytics is customer lifecycle analytics: the analysis of consumer data
to generate insights that guide marketing activities. Specifically, it
includes analyses such as market segmentation, consumer lifetime value analysis,
acquiring new customers, maintaining old customers, and enhancing customer
2.3 Findings by other researchers
The literature review identifies the conclusions of previous research on the
factors that influence customer segmentation. This past literature helps to develop
a framework for the new research.
In 2015, Chinedu Pascal Ezenkwu, Simeon Ozuomba and Constance Kalu published
"Application of K-Means Algorithm for Efficient Customer Segmentation: A Strategy
for Targeted Customer Services". They found that the K-means algorithm achieved a
purity measure of 0.95, indicating 95% accurate segmentation of the
customers. Insight into the business’s customer segmentation provides it with the
following advantages: the ability of the business to customize market programs that
will be suitable for each of its customer segments; business decision support in
risky situations such as credit relationships with its customers; identification of
products associated with each segment and how to manage the forces of demand and
supply; unravelling some latent dependencies and associations amongst customers,
amongst products, or between customers and products which the business may not be
aware of; the ability to predict customer defection and which customers are most
likely to defect; and raising further market research questions as well as providing
directions to finding the solutions.
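To make the purity figure above easier to interpret, the following is a minimal sketch of the usual definition of cluster purity: each cluster is credited with the size of its largest true class, and the total is divided by the number of points. The labels below are hypothetical, and the cited paper's own computation may differ in detail.

```python
from collections import Counter, defaultdict


def purity(cluster_labels, true_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    members = defaultdict(list)
    for cluster, truth in zip(cluster_labels, true_labels):
        members[cluster].append(truth)
    # For each cluster, count how many points carry its most common true label.
    majority_total = sum(
        Counter(truths).most_common(1)[0][1] for truths in members.values()
    )
    return majority_total / len(true_labels)


# Hypothetical cluster assignments and reference segments.
clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
truth = ["a", "a", "b", "b", "b", "b", "c", "c", "c", "a"]

score = purity(clusters, truth)  # (2 + 3 + 3) / 10
print(f"purity = {score:.2f}")
```

A purity of 1.0 would mean every cluster contains customers from a single reference segment only.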
In April 2019, Balmeet Kaur and Pankaj Kumar Sharma published "Implementation of
Customer Segmentation using Integrated Approach". In their study they found that, in
the competitive e-commerce market, the problem of identifying potential
customers is gaining more and more attention. To address this problem in a timely
manner, the paper proposes an integrated novel approach based on clustering using
K-means and association mining using the Apriori technique. After identification of
targeted customers and their associated buying patterns, business managers can take
strategic, profitable decisions accordingly. According to the authors, this integrated
model could be put directly into practice to provide better profit margins from sales.
2.4 The research gap
Customer segmentation based on stream clustering provides an ongoing picture of the
makeup of the customer base. It also indicates the value that different customer
groups have for the company and shows where increased marketing activities may be
worthwhile. It is not limited to the retail/e-commerce field, of course. It can be
applied in other sectors, too.
It allows companies to lay the foundation for targeted marketing campaigns aimed,
for instance, at rewarding loyal customers, preventing defections or gaining new
customers. The segmentation also helps a company select the right communication
channels. If the intention is to target online shoppers, for example, campaigns using
social media and email are preferable to expensive direct mailings. In other words,
this new approach to customer segmentation allows companies to reach the desired
customers with the right messages via the right channels, which also improves the
customer experience. And these benefits are achieved continuously, because updating
the clusters on an ongoing basis using the streams eliminates the key disadvantage of
traditional customer segmentation.
2.5 Table for Variables, their definitions and sources
InvoiceNo: Invoice number. Nominal. A 6-digit integral number uniquely
assigned to each transaction. If this code starts with the letter 'c', it indicates a
cancellation.
StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely
assigned to each distinct product.
Description: Product (item) name. Nominal.
Quantity: The quantities of each product (item) per transaction. Numeric.
InvoiceDate: Invoice date and time. Numeric. The day and time when a
transaction was generated.
UnitPrice: Unit price. Numeric. Product price per unit in sterling (£).
CustomerID: Customer number. Nominal. A 5-digit integral number uniquely
assigned to each customer.
Country: Country name. Nominal. The name of the country where a customer
resides.
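The variables listed above could be held in a typed record such as the one sketched below. This is an assumption about how the data might be represented, not part of the data set itself: the class name, the `line_total` helper, the date format and the sample row are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    invoice_no: str         # InvoiceNo, kept as text because it may carry a letter prefix
    stock_code: str         # StockCode
    description: str        # Description
    quantity: int           # Quantity
    invoice_date: datetime  # InvoiceDate
    unit_price: float       # UnitPrice, in sterling
    customer_id: str        # CustomerID
    country: str            # Country

    @property
    def line_total(self) -> float:
        """Quantity times unit price (a derived value, not a column of the data set)."""
        return self.quantity * self.unit_price


def parse_row(row, date_format="%d/%m/%Y %H:%M"):
    """Convert a dict of strings into a Transaction; the date format is an assumption."""
    return Transaction(
        invoice_no=row["InvoiceNo"],
        stock_code=row["StockCode"],
        description=row["Description"],
        quantity=int(row["Quantity"]),
        invoice_date=datetime.strptime(row["InvoiceDate"], date_format),
        unit_price=float(row["UnitPrice"]),
        customer_id=row["CustomerID"],
        country=row["Country"],
    )


# Hypothetical row, invented for illustration.
record = parse_row({
    "InvoiceNo": "000001", "StockCode": "10001", "Description": "SAMPLE GIFT ITEM",
    "Quantity": "6", "InvoiceDate": "01/12/2010 08:26", "UnitPrice": "2.55",
    "CustomerID": "12345", "Country": "United Kingdom",
})
print(record.line_total)
```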
2.6 Chapter conclusion
This chapter reviewed the conclusions of previous research on factors significant
to customer segmentation. Based on the analysis of past literature, the variables
for the present study were extracted. Furthermore, new factors were included in the
framework, which will help in studying customer segmentation
and market analytics.
Chapter 3: Methodology
3.1 Introduction
This chapter sets out the research process and methods of analysis used to identify
factors relevant to customer segmentation and market analytics. Furthermore, the
chapter covers the research design, study sample, data collection methods and data
analysis plan.
3.2 Population, sample and Sampling technique
The sample data are drawn from transactions between 01/12/2010 and 09/12/2011 for a
UK-based and registered non-store online retailer.
3.3 Type of Data to be collected and data sources
This is a transnational data set which contains all the transactions occurring between
01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail. The
company mainly sells unique all-occasion gifts. Many customers of the company are
wholesalers.
Data source: https://archive.ics.uci.edu/ml/datasets/Online+Retail+II#
3.4 Data collection tools and plan
The study is conducted on secondary data recorded by the online retail
store. The required data fields were extracted from the original data source, and the
selected fields were stored in a tabular format for the convenience of the study.
3.5 Conceptual framework
3.6 Hypothesis
Developing a hypothesis is necessary because it guides decisions on
how to structure the data in order to cluster customers. For the orders, our
hypothesis is that customers purchase an item online, for example BLUE DIAMANTE
PEN IN GIFT BOX, based on features such as size (big or small) and price tier
(high/premium or low/affordable). Although we will cluster on the gift box model
itself, its features (e.g. price, category, etc.) will be used for assessing the
preferences of the customer.
3.7 Methods of data analysis
3.7.1 Exploratory Data Analysis (EDA)
Exploratory data analysis is an approach to analyzing data sets to summarize their
main characteristics, often with visual methods. A statistical model can be used or not,
but primarily EDA is for seeing what the data can tell us beyond the formal modeling
or hypothesis testing task.
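As a small illustration of what such a summary might look like, the sketch below computes basic descriptive statistics and a per-country row count. The rows are hypothetical, the "line total" (quantity times unit price) is a derived value assumed here for illustration, and only the standard library is used.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical rows: (country, quantity, unit_price).
rows = [
    ("United Kingdom", 6, 2.50),
    ("United Kingdom", 2, 4.00),
    ("France", 12, 1.25),
    ("Germany", 1, 10.00),
    ("United Kingdom", 3, 3.00),
]

# Derived value per row: quantity times unit price.
line_totals = [quantity * price for _, quantity, price in rows]

summary = {
    "count": len(line_totals),
    "min": min(line_totals),
    "max": max(line_totals),
    "mean": mean(line_totals),
    "median": median(line_totals),
}
rows_per_country = Counter(country for country, _, _ in rows)

print(summary)
print(rows_per_country.most_common())
```

In practice these summaries would normally be accompanied by plots such as histograms, in line with the visual emphasis of EDA.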
3.7.2 Cohort analysis
Cohort analysis is a type of behavioral analytics in which users are grouped based
on their shared traits to better track and understand their actions. Cohort analysis
makes it possible to ask more specific, targeted questions and make informed product
decisions that can help reduce churn and increase revenue.
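One common way to apply this to transaction data is to define a cohort as the month of a customer's first purchase and then measure what share of the cohort buys again in later months. The sketch below assumes that definition; the purchase records are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (customer_id, purchase_date) pairs.
purchases = [
    ("A", date(2011, 1, 5)), ("A", date(2011, 2, 9)), ("A", date(2011, 3, 1)),
    ("B", date(2011, 1, 20)), ("B", date(2011, 3, 15)),
    ("C", date(2011, 2, 2)), ("C", date(2011, 3, 7)),
]


def month_index(day):
    """Map a date to a running month number so that months can be subtracted."""
    return day.year * 12 + day.month


# Cohort of a customer = month of that customer's first purchase.
first_month = {}
for customer, day in purchases:
    m = month_index(day)
    first_month[customer] = min(m, first_month.get(customer, m))

# active[cohort][offset] = customers of the cohort who bought `offset` months later.
active = defaultdict(lambda: defaultdict(set))
for customer, day in purchases:
    cohort = first_month[customer]
    active[cohort][month_index(day) - cohort].add(customer)

# Retention = active customers at each offset divided by the cohort size.
retention = {}
for cohort, by_offset in sorted(active.items()):
    size = len(by_offset[0])
    label = f"{(cohort - 1) // 12}-{(cohort - 1) % 12 + 1:02d}"
    retention[label] = {
        offset: len(customers) / size
        for offset, customers in sorted(by_offset.items())
    }

print(retention)
```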
3.7.3 Funnel analysis
A funnel analysis is a method of understanding the steps required to reach an outcome
on a website and how many users get through each of those steps. The set of steps is
referred to as a “funnel” because the typical shape visualizing the flow of users is
similar to a funnel in your kitchen or garage.
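The counting behind a funnel can be sketched as follows. The step names and user events are hypothetical: the transaction variables listed in this proposal do not include browsing events, so such data would have to come from another source.

```python
# Hypothetical ordered funnel steps and the events recorded for each user.
steps = ["visit", "view_product", "add_to_cart", "purchase"]
events = {
    "u1": {"visit", "view_product", "add_to_cart", "purchase"},
    "u2": {"visit", "view_product"},
    "u3": {"visit", "view_product", "add_to_cart"},
    "u4": {"visit"},
}


def funnel_counts(steps, events):
    """Count users who completed every step up to and including each step."""
    counts = []
    remaining = set(events)
    for step in steps:
        remaining = {user for user in remaining if step in events[user]}
        counts.append((step, len(remaining)))
    return counts


def step_conversion(counts):
    """Share of users retained from the previous step (1.0 for the first step)."""
    rates = []
    previous = None
    for step, n in counts:
        if previous is None:
            rates.append((step, 1.0))
        elif previous == 0:
            rates.append((step, 0.0))
        else:
            rates.append((step, n / previous))
        previous = n
    return rates


counts = funnel_counts(steps, events)
rates = step_conversion(counts)
for (step, n), (_, rate) in zip(counts, rates):
    print(f"{step:<13} users={n} conversion={rate:.2f}")
```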
3.7.3 Market basket analysis
Market basket analysis is a data mining technique used by retailers to increase sales
by better understanding customer purchasing patterns. It involves analyzing large data
sets, such as purchase history, to reveal product groupings, as well as products that are
likely to be purchased together.
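The strength of such product groupings is commonly expressed through support, confidence and lift. The sketch below computes these three measures for a single pair of items; the baskets are hypothetical, and a full analysis would typically use an algorithm such as Apriori rather than counting every pair directly.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, one set of items per invoice.
baskets = [
    {"pen", "gift box", "card"},
    {"pen", "gift box"},
    {"pen", "card"},
    {"gift box", "ribbon"},
    {"pen", "gift box", "ribbon"},
]

n = len(baskets)
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    support = pair_counts[pair] / n
    confidence = pair_counts[pair] / item_counts[antecedent]
    lift = confidence / (item_counts[consequent] / n)
    return support, confidence, lift


support, confidence, lift = rule_metrics("pen", "gift box")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.4f}")
```

A lift above 1 suggests the two items are bought together more often than would be expected if they were independent; a lift below 1 suggests the opposite.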
3.7.4 Recency, Frequency and Monetary Value
Recency, frequency, monetary value (RFM) is a marketing analysis tool used to
identify a company's or an organization's best customers by using certain measures.
The RFM model is based on three quantitative factors. Recency: how recently a
customer has made a purchase. Frequency: how often a customer makes a purchase.
Monetary value: how much money a customer spends on purchases.
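An RFM table could be derived from transaction records along the following lines. This is a minimal sketch under stated assumptions: recency is measured in days before a chosen snapshot date, frequency counts distinct invoices, and monetary value sums the amounts. The records and the snapshot date are hypothetical.

```python
from datetime import date

# Hypothetical (customer_id, invoice_no, invoice_date, amount) records.
transactions = [
    ("A", "1001", date(2011, 11, 1), 20.0),
    ("A", "1002", date(2011, 12, 1), 35.0),
    ("B", "1003", date(2011, 6, 15), 120.0),
    ("C", "1004", date(2011, 12, 8), 5.0),
    ("C", "1004", date(2011, 12, 8), 7.5),
]
snapshot = date(2011, 12, 10)  # assumed reference date for measuring recency


def rfm_table(transactions, snapshot):
    """Return {customer: (recency_days, frequency, monetary)}."""
    last_purchase, invoices, spend = {}, {}, {}
    for customer, invoice, day, amount in transactions:
        last_purchase[customer] = max(day, last_purchase.get(customer, day))
        invoices.setdefault(customer, set()).add(invoice)  # distinct invoices only
        spend[customer] = spend.get(customer, 0.0) + amount
    return {
        customer: (
            (snapshot - last_purchase[customer]).days,
            len(invoices[customer]),
            spend[customer],
        )
        for customer in last_purchase
    }


rfm = rfm_table(transactions, snapshot)
for customer, (recency, frequency, monetary) in sorted(rfm.items()):
    print(customer, recency, frequency, monetary)
```

The three values per customer can then be scored or scaled and used as input features for clustering.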
3.7.5 K-means clustering
K-means clustering is a type of unsupervised learning, which is used when you have
unlabeled data (i.e., data without defined categories or groups). The goal of this
algorithm is to find groups in the data, with the number of groups represented by the
variable K. The algorithm works iteratively to assign each data point to one of K
groups based on the features that are provided. Data points are clustered based on
feature similarity.
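The iterative assignment described above can be sketched in plain Python as follows. This is an illustrative implementation, not the one the project will necessarily use: the points are hypothetical two-dimensional features, and the first K points are taken as initial centroids to keep the example deterministic, whereas practical implementations usually start from random or k-means++ initialization.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iterations=100):
    """Cluster points into k groups; returns (assignments, centroids)."""
    # Assumption for this sketch: the first k points serve as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids


# Hypothetical two-feature customer data forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

Because the result depends on the starting centroids and on the scale of each feature, features are usually standardized first and the algorithm is run from several initializations.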
