Uploaded by Rohit Mishra

Machine Learning for Business Analytics

MLBA – Assignment
Customer Segmentation using RFM modelling and
Recommendations for segment
By
Lokesh Kesiraju
BJ22192
Table of Contents
Objective
Customer Segmentation and Segment selection
  Data Transformation
    Step 1
    Step 2
  Clustering
    Step 3
    Inferences
    Table 1
    Table 2
  Inferences
    Cluster 1
    Cluster 2
    Cluster 3
    Cluster 4
    Cluster 5
  Cluster selection
Association Rules
    Step 1
    Step 2
    Step 3
    Step 4
  Inferences
Objective:
1) To perform customer segmentation on the data provided
2) To select a segment on the basis of business value
3) To offer insights and recommendations for the selected segment
Customer Segmentation and Segment selection
Data Transformation:
The dataset provided was transformed into RFM (Recency, Frequency, Monetary) parameters to obtain the working dataset (attached with the document).
The following steps were followed for the data transformation.
Step 1: Missing data rectification – The data is used to evaluate the Recency, Frequency and Monetary parameters, so it is important to have data on these measures. For this purpose, the latest invoice date (Recency), the number of unique invoices (Frequency) and the total basket size (Monetary) are needed. Records for which these parameters were not available were removed.
Step 2: The RFM parameters for each customer were summarised (with the help of a pivot table).
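Steps 1 and 2 can be sketched in pandas as follows. This is a minimal illustration, not the code used for the assignment: the column names (CustomerID, InvoiceNo, InvoiceDate, Quantity, UnitPrice), the sample rows and the choice of days as the recency unit are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical transaction rows; column names and values are assumptions.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 3, None],
    "InvoiceNo": ["A1", "A1", "A2", "B1", "B2", "C1", "D1"],
    "InvoiceDate": pd.to_datetime([
        "2011-01-05", "2011-01-05", "2011-11-20",
        "2011-03-10", "2011-06-01", "2011-12-01", "2011-12-05",
    ]),
    "Quantity": [2, 1, 5, 3, 1, 10, 4],
    "UnitPrice": [3.0, 4.5, 2.0, 10.0, 7.5, 1.2, 2.5],
})

# Step 1: drop rows that lack any field needed for R, F or M.
needed = ["CustomerID", "InvoiceNo", "InvoiceDate", "Quantity", "UnitPrice"]
tx = tx.dropna(subset=needed).assign(
    CustomerID=lambda d: d["CustomerID"].astype(int),
    Amount=lambda d: d["Quantity"] * d["UnitPrice"],
)

# Step 2: summarise to one row per customer.
snapshot = tx["InvoiceDate"].max()  # reference date for recency
rfm = tx.groupby("CustomerID").agg(
    LastInvoice=("InvoiceDate", "max"),   # latest invoice date
    Frequency=("InvoiceNo", "nunique"),   # number of unique invoices
    Monetary=("Amount", "sum"),           # total basket value
)
# Recency measured in days here; the unit is an assumption.
rfm["Recency"] = (snapshot - rfm["LastInvoice"]).dt.days
rfm = rfm[["Recency", "Frequency", "Monetary"]]
print(rfm)
```

The row with no customer ID is dropped before aggregation, which mirrors the removal of incomplete records described in Step 1.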
Clustering:
Step 3:
This data is to be divided into clusters. To obtain a starting point for the optimal number of clusters, a dendrogram and the elbow method are used.
Dendrogram
The dendrogram (constructed using Ward's method) is shown below for reference.
The dendrogram suggests that the data can be divided into 4 clusters.
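As a rough sketch of how a Ward-linkage hierarchy can be built and cut, the example below uses SciPy on synthetic three-column data standing in for RFM values. The data, the use of SciPy and the cut at 4 clusters are assumptions for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Hypothetical stand-in for scaled RFM data: four well-separated groups.
centers = np.array([[0, 0, 0], [5, 5, 5], [0, 5, 0], [5, 0, 5]], dtype=float)
X = np.vstack([c + 0.3 * rng.standard_normal((20, 3)) for c in centers])

# Ward's method merges the pair of clusters that least increases
# the within-cluster variance at each step.
Z = linkage(X, method="ward")

# Cut the tree so that at most 4 flat clusters remain.
labels = fcluster(Z, t=4, criterion="maxclust")
print(np.bincount(labels)[1:])  # cluster sizes
```

Passing Z to scipy.cluster.hierarchy.dendrogram draws the tree itself; large vertical gaps between merges are what suggest a natural number of clusters.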
Elbow method:
Based on the elbow method, 4 to 6 clusters would be suitable.
Considering both methods, 5 is chosen as the number of clusters. K-Means clustering was then performed with this number to obtain 5 clusters.
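The elbow check and the final K-Means run might look like the sketch below, written with scikit-learn. The synthetic data, the standardisation step and the random seeds are assumptions; the report does not state whether the RFM values were scaled before clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical RFM matrix: columns are recency, frequency, monetary.
rfm = np.column_stack([
    rng.integers(0, 13, 300),
    rng.integers(1, 50, 300),
    rng.gamma(2.0, 500.0, 300),
])

# Scaling is an assumption; it stops the monetary column from dominating.
X = StandardScaler().fit_transform(rfm)

# Elbow method: record the within-cluster sum of squares for each k
# and look for the k after which the decrease flattens out.
inertia = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertia[k] = km.inertia_

# Final model with the chosen number of clusters.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
display_labels = labels + 1  # report clusters as 1-5 instead of 0-4
```

K-Means returns labels starting at 0, which is why the cluster numbers in a data file can differ by one from those shown in a report.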
The information on the clusters is summarized in the tables below (output image attached for reference).
Inferences:
(Note: The attached data file shows 0–4 as cluster numbers. These are shown as 1–5 in the tables.)
Table 1:
Cluster      Total consumers   Percentage
Cluster 1    668               15.47%
Cluster 2    2573              59.57%
Cluster 3    942               21.81%
Cluster 4    17                0.39%
Cluster 5    119               2.76%
Table 2:
Cluster 1    Frequency   Recency   Monetary
Min          2           0         525
Mean         11.0        0.9       3541
Median       10          1         3019
Max          24          6         13375

Cluster 2    Frequency   Recency   Monetary
Min          1           0         0
Mean         3           2         815
Median       2           2         616
Max          9           5         6208

Cluster 3    Frequency   Recency   Monetary
Min          1           6         0
Mean         2           9         438
Median       1           8         300
Max          13          12        7741

Cluster 4    Frequency   Recency   Monetary
Min          14          0         27487
Mean         47          0         44384
Median       45          0         50415
Max          81          1         65892

Cluster 5    Frequency   Recency   Monetary
Min          4           0         1296
Mean         29          0         10954
Median       27          0         9231
Max          89          10        30301
The data has been visualised to present the clusters effectively.
Inferences:
From the above analysis, the following clusters have been obtained.
Cluster 1:
This cluster broadly contains customers whose purchases are very recent, with medium frequency and medium monetary value.
Cluster 2:
This cluster broadly contains customers whose purchases are moderately recent (with some spread), with low frequency and low monetary value. These are the regular shoppers.
Cluster 3:
This cluster broadly contains customers whose purchases are not recent, with low frequency and low monetary value.
Cluster 4:
This cluster broadly contains customers whose purchases are very recent, with very high frequency and high monetary value. However, these customers are few in number.
Cluster 5:
This cluster broadly contains customers whose purchases are very recent, with very high frequency and medium monetary value. These customers are also few in number.
Cluster selection:
Given these clusters, a target cluster must be selected for further analysis. The following rationale is used for the selection.
Cluster 3 is eliminated because its purchases are not recent and are of low frequency and low monetary value. Among Clusters 1, 2, 4 and 5, Clusters 4 and 5 are already high value (and few in number), so making recommendations to these customers would not lead to much improvement.
From the business perspective, the value addition lies in improving the monetary output from Cluster 2, as these customers currently have low frequency and low monetary value. Making recommendations to these customers can improve business value. Hence, Cluster 2 is chosen for finding the association rules.
Association Rules
To arrive at the association rules, the following steps were followed.
Step 1: Data modification – The transaction details were obtained for the customers in the selected cluster. For this purpose, the merge function of the pandas library was used (an inner join on Customer ID as the reference column). This produced an output file containing the details of what each customer bought.
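A small sketch of such an inner join is given below. The two frames, their contents and the Cluster column name are hypothetical; only the use of pandas merge with an inner join on Customer ID comes from the step above.

```python
import pandas as pd

# Hypothetical cluster assignments (labels 0-4 as in the data file).
clusters = pd.DataFrame({
    "Customer ID": [101, 102, 103],
    "Cluster": [1, 0, 1],
})

# Hypothetical transaction lines.
tx = pd.DataFrame({
    "Customer ID": [101, 101, 102, 103, 104],
    "Description": [
        "PARTY BUNTING", "SPOTTY BUNTING", "PARTY BUNTING",
        "HEART OF WICKER SMALL", "PARTY BUNTING",
    ],
})

# Keep only the customers of the selected cluster. Label 1 in the
# data file corresponds to Cluster 2 in the report's numbering.
target = clusters[clusters["Cluster"] == 1]

# The inner join drops transactions of customers outside the target cluster.
merged = pd.merge(tx, target, on="Customer ID", how="inner")
print(merged)
```

With an inner join, customers that appear in only one of the two frames (such as 104 here) are left out of the result.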
Step 2: Converting the data to enable association analysis – One-hot encoding was performed on the Description and Customer ID columns. A pivot table was constructed in Python and its values were converted into binary numbers to obtain the encoded dataset.
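One way to build such an encoded table is sketched below: pivot to a customer-by-product matrix, then convert the counts to binary flags. The sample rows and the Quantity column are assumptions made for the example.

```python
import pandas as pd

# Hypothetical purchase lines for the selected cluster.
merged = pd.DataFrame({
    "Customer ID": [101, 101, 102, 102, 103],
    "Description": [
        "PARTY BUNTING", "SPOTTY BUNTING", "PARTY BUNTING",
        "PARTY BUNTING", "HEART OF WICKER SMALL",
    ],
    "Quantity": [2, 1, 3, 1, 4],
})

# One row per customer, one column per product, cell = total quantity.
basket = merged.pivot_table(
    index="Customer ID",
    columns="Description",
    values="Quantity",
    aggfunc="sum",
    fill_value=0,
)

# Convert quantities to binary flags: True if the customer bought the item.
encoded = basket > 0
print(encoded.astype(int))
```

Each row of the encoded table is then one "basket" for the association analysis, regardless of how many units were bought.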
Step 3: Apriori algorithm (to compute support) – The apriori and association rules functions were imported from the mlxtend library in Python. The apriori algorithm was then used to calculate the support for each item and itemset, with the minimum support set to 0.05 (chosen by manually observing the output at different support levels).
Step 4: Association rules – Based on the output of the Apriori algorithm, the association rules were calculated by setting the minimum lift value to 1.
The resulting rules are tabulated below. For all these rules, the confidence is greater than the support. A total of 22 rules were obtained.
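To make the three rule metrics concrete, the sketch below computes support, confidence and lift by hand in plain Python. It is not the mlxtend call used in the analysis, and the five baskets are made up for illustration; the helper names support and rule_metrics are hypothetical.

```python
# Made-up baskets; each set holds the items one customer bought.
baskets = [
    {"PINK TEACUP", "GREEN TEACUP"},
    {"PINK TEACUP", "GREEN TEACUP", "CAKESTAND"},
    {"GREEN TEACUP"},
    {"CAKESTAND"},
    {"PARTY BUNTING"},
]


def support(items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    joint = support(set(antecedent) | set(consequent))
    confidence = joint / support(antecedent)   # P(consequent | antecedent)
    lift = confidence / support(consequent)    # > 1 means positive association
    return joint, confidence, lift


pink_to_green = rule_metrics({"PINK TEACUP"}, {"GREEN TEACUP"})
green_to_pink = rule_metrics({"GREEN TEACUP"}, {"PINK TEACUP"})
print(pink_to_green)
print(green_to_pink)
```

Lift is symmetric in the antecedent and consequent while confidence is not, which is why a rule and its reverse share the same support and lift but can have different confidence values.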
#   Antecedents                          Consequents                          Support  Confidence  Lift
1   PINK REGENCY TEACUP AND SAUCER       GREEN REGENCY TEACUP AND SAUCER      0.05     0.92        13.94
2   BAKING SET SPACEBOY DESIGN           BAKING SET 9 PIECE RETROSPOT         0.05     0.84        6.45
3   RED HANGING HEART T-LIGHT HOLDER     WHITE HANGING HEART T-LIGHT HOLDER   0.05     0.80        4.75
4   GREEN REGENCY TEACUP AND SAUCER      PINK REGENCY TEACUP AND SAUCER       0.05     0.77        13.94
5   GREEN REGENCY TEACUP AND SAUCER      ROSES REGENCY TEACUP AND SAUCER      0.05     0.76        10.79
6   WOODEN STAR CHRISTMAS SCANDINAVIAN   WOODEN HEART CHRISTMAS SCANDINAVIAN  0.06     0.74        9.32
7   WOODEN HEART CHRISTMAS SCANDINAVIAN  WOODEN STAR CHRISTMAS SCANDINAVIAN   0.06     0.74        9.32
8   GARDENERS KNEELING PAD CUP OF TEA    GARDENERS KNEELING PAD KEEP CALM     0.05     0.73        8.29
9   ROSES REGENCY TEACUP AND SAUCER      GREEN REGENCY TEACUP AND SAUCER      0.05     0.71        10.79
10  ROSES REGENCY TEACUP AND SAUCER      REGENCY CAKESTAND 3 TIER             0.05     0.71        4.29
11  PAPER CHAIN KIT VINTAGE CHRISTMAS    PAPER CHAIN KIT 50'S CHRISTMAS       0.06     0.64        4.65
12  SET OF 3 REGENCY CAKE TINS           REGENCY CAKESTAND 3 TIER             0.06     0.59        3.55
13  GARDENERS KNEELING PAD KEEP CALM     GARDENERS KNEELING PAD CUP OF TEA    0.05     0.59        8.29
14  HEART OF WICKER LARGE                HEART OF WICKER SMALL                0.05     0.57        4.97
15  HEART OF WICKER SMALL                HEART OF WICKER LARGE                0.05     0.47        4.97
16  PAPER CHAIN KIT 50'S CHRISTMAS       PAPER CHAIN KIT VINTAGE CHRISTMAS    0.06     0.45        4.65
17  SPOTTY BUNTING                       PARTY BUNTING                        0.05     0.45        3.54
18  PARTY BUNTING                        SPOTTY BUNTING                       0.05     0.40        3.54
19  BAKING SET 9 PIECE RETROSPOT         BAKING SET SPACEBOY DESIGN           0.05     0.39        6.45
20  REGENCY CAKESTAND 3 TIER             SET OF 3 REGENCY CAKE TINS           0.06     0.33        3.55
21  WHITE HANGING HEART T-LIGHT HOLDER   RED HANGING HEART T-LIGHT HOLDER     0.05     0.32        4.75
22  REGENCY CAKESTAND 3 TIER             ROSES REGENCY TEACUP AND SAUCER      0.05     0.30        4.29
The rules are arranged in decreasing order of confidence for ease of reading.
Inferences:
In this analysis, the effect of time has been ignored. Effectively, we are trying to find recommendations for a buyer based on what similar customers have bought in the past year.
Many of the rules are in the teacup and saucer segment, where products in different colours are recommended with high lift and high confidence. Some recommendations are in the Christmas products category, where a star is recommended to people who bought hearts (and vice versa).
An interesting recommendation is to suggest a cake tin and teacup and saucer if a cake stand is bought.
These recommendations can help in improving the overall buying experience of the customers.