MLBA – Assignment

Customer Segmentation using RFM Modelling and Recommendations for Segment

By Lokesh Kesiraju
BJ22192

Table of Contents

Objective
Customer Segmentation and Segment Selection
    Data Transformation
        Step 1
        Step 2
    Clustering
        Step 3
        Inferences
        Table 1
        Table 2
    Inferences
        Cluster 1
        Cluster 2
        Cluster 3
        Cluster 4
        Cluster 5
    Cluster selection
Association Rules
    Step 1
    Step 2
    Step 3
    Step 4
    Inferences

Objective:
1) To perform customer segmentation for the data provided
2) To select a segment on the basis of business value
3) To offer insights and recommendations for the segment

Customer Segmentation and Segment Selection

Data Transformation:
The dataset provided was transformed into RFM parameters to obtain the dataset (attached with the document). The following steps were followed to achieve the data transformation.

Step 1: Missing data rectification – The data is modified to evaluate the Recency, Frequency and Monetary parameters, so it is important to have data on these measures. For this purpose, the latest invoice date (Recency), the number of unique invoices (Frequency) and the total basket size (Monetary) are used. Records for which data on these parameters was not available were removed.

Step 2: The RFM parameters for each customer are summarised (with the help of a pivot table).

Clustering:
Step 3: This data is to be divided into clusters. To obtain a starting point for the optimal number of clusters, a dendrogram and the elbow method are used.

Dendrogram
The dendrogram (constructed using Ward's method) is shown below for reference.
[Figure: dendrogram]
The dendrogram suggests that the data can be divided into 4 clusters.

Elbow method:
[Figure: elbow plot]
Based on the elbow method, 4 to 6 can be an ideal number of clusters.

Considering both factors, 5 is chosen as the number of clusters. With this number, K-Means clustering was performed to obtain 5 clusters. The information on the clusters is summarized in the tables below (output image attached for reference).

Inferences:
(Note: The attached data file shows 0 – 4 as cluster numbers.
These have been shown as 1-5 in the tables.)

Table 1:
Cluster          Cluster 1  Cluster 2  Cluster 3  Cluster 4  Cluster 5
Total consumers  668        2573       942        17         119
Percentage       15.47%     59.57%     21.81%     0.39%      2.76%

Table 2:
Cluster 1  Frequency  Recency  Monetary
Min        2          0        525
Mean       11.0       0.9      3541
Median     10         1        3019
Max        24         6        13375

Cluster 2  Frequency  Recency  Monetary
Min        1          0        0
Mean       3          2        815
Median     2          2        616
Max        9          5        6208

Cluster 3  Frequency  Recency  Monetary
Min        1          6        0
Mean       2          9        438
Median     1          8        300
Max        13         12       7741

Cluster 4  Frequency  Recency  Monetary
Min        14         0        27487
Mean       47         0        44384
Median     45         0        50415
Max        81         1        65892

Cluster 5  Frequency  Recency  Monetary
Min        4          0        1296
Mean       29         0        10954
Median     27         0        9231
Max        89         10       30301

The data has been visualised to present the clusters effectively.
[Figures: cluster visualisations]

Inferences:
From the above analysis, the following clusters have been obtained.

Cluster 1: This cluster broadly contains people whose purchases are very recent, with medium frequency and medium monetary value.

Cluster 2: This cluster broadly contains people whose purchases are moderately recent (with some spread), with low frequency and low monetary value. These are the regular shoppers.

Cluster 3: This cluster broadly contains people whose purchases are not recent, with low frequency and low monetary value.

Cluster 4: This cluster broadly contains people whose purchases are very recent, with very high frequency and high monetary value. However, these are few in number.

Cluster 5: This cluster broadly contains people whose purchases are very recent, with very high frequency and medium monetary value. However, these are also few in number.

Cluster selection:
Given these clusters, it is important to select the target cluster for further analysis. The following rationale is used for the selection. Cluster 3 is eliminated, as its purchases are not recent and are of low frequency and low monetary value. Among Clusters 1, 2, 4 and 5, Clusters 4 and 5 are already high value (and low in number). Hence, making recommendations to these customers would not lead to much improvement.
From the business perspective, the value addition lies in improving the monetary output from Cluster 2, as these are people with low frequency and low monetary value currently. By making recommendations to these people, the business value can be improved. Hence, Cluster 2 is chosen for finding the association rules.

Association Rules

To arrive at the association rules, the following steps were followed.

Step 1: Data modification – The transaction details were obtained for the customers in the cluster. For this purpose, the merge function of the pandas library was used (an inner join with Customer ID as the reference column). This provided an output file containing the details of what each customer bought.

Step 2: Converting data to enable association – One-hot encoding was performed on the Description and Customer ID columns. A pivot table was constructed in Python and the values were converted into binary numbers to obtain the encoded dataset.

Step 3: Apriori algorithm to compute support – The apriori and association rule functions were imported from the mlxtend library in Python. Then, the apriori algorithm was used to calculate the support for each individual item, with the minimum support set to 0.05 (chosen by manually observing the data at different support levels).

Step 4: Association rules – Based on the output of the Apriori algorithm (the computed support), the association rules were calculated by setting the minimum lift value to 1.

The resulting rules are tabulated below. For all these rules, the confidence is greater than the support.
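The support, confidence and lift calculations behind Steps 2 to 4 can be illustrated with a minimal, dependency-free sketch. It is not the mlxtend code used for this assignment: it only considers rules between pairs of single items, and the toy purchases, the item names and the helper names build_baskets and pair_rules are hypothetical. Baskets are built per customer, mirroring the encoding on the Customer ID and Description columns.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (customer_id, item) pairs for the selected cluster.
purchases = [
    (1, "GREEN TEACUP"), (1, "PINK TEACUP"),
    (2, "GREEN TEACUP"), (2, "PINK TEACUP"), (2, "CAKESTAND"),
    (3, "GREEN TEACUP"), (3, "CAKESTAND"),
    (4, "CAKESTAND"),
]


def build_baskets(rows):
    """Collect the distinct items bought by each customer.

    A set per customer carries the same information as one row of a
    binary (one-hot encoded) customer-by-item pivot table.
    """
    baskets = defaultdict(set)
    for customer_id, item in rows:
        baskets[customer_id].add(item)
    return list(baskets.values())


def pair_rules(baskets, min_support=0.05, min_lift=1.0):
    """Return single-item -> single-item rules, sorted by confidence."""
    n = len(baskets)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for basket in baskets:
        for item in basket:
            item_count[item] += 1
        for pair in combinations(sorted(basket), 2):
            pair_count[pair] += 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # share of customers who bought both items
        if support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            # confidence = P(consequent | antecedent)
            confidence = count / item_count[antecedent]
            # lift = confidence / P(consequent)
            lift = confidence / (item_count[consequent] / n)
            if lift >= min_lift:
                rules.append({
                    "antecedent": antecedent,
                    "consequent": consequent,
                    "support": support,
                    "confidence": confidence,
                    "lift": lift,
                })
    return sorted(rules, key=lambda r: r["confidence"], reverse=True)


rules = pair_rules(build_baskets(purchases), min_support=0.05, min_lift=1.0)
for rule in rules:
    print(rule)
```

On this toy data only the two teacup rules pass the lift filter; the rules involving the cakestand have a lift below 1 and are dropped. Lift is symmetric in the two items, so a rule and its mirror image always share the same lift value while their confidences can differ.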
A total of 22 rules were obtained.

No. Antecedents                          Consequents                          Support  Confidence  Lift
1   PINK REGENCY TEACUP AND SAUCER       GREEN REGENCY TEACUP AND SAUCER      0.05     0.92        13.94
2   BAKING SET SPACEBOY DESIGN           BAKING SET 9 PIECE RETROSPOT         0.05     0.84        6.45
3   RED HANGING HEART T-LIGHT HOLDER     WHITE HANGING HEART T-LIGHT HOLDER   0.05     0.80        4.75
4   GREEN REGENCY TEACUP AND SAUCER      PINK REGENCY TEACUP AND SAUCER       0.05     0.77        13.94
5   GREEN REGENCY TEACUP AND SAUCER      ROSES REGENCY TEACUP AND SAUCER      0.05     0.76        10.79
6   WOODEN STAR CHRISTMAS SCANDINAVIAN   WOODEN HEART CHRISTMAS SCANDINAVIAN  0.06     0.74        9.32
7   WOODEN HEART CHRISTMAS SCANDINAVIAN  WOODEN STAR CHRISTMAS SCANDINAVIAN   0.06     0.74        9.32
8   GARDENERS KNEELING PAD CUP OF TEA    GARDENERS KNEELING PAD KEEP CALM     0.05     0.73        8.29
9   ROSES REGENCY TEACUP AND SAUCER      GREEN REGENCY TEACUP AND SAUCER      0.05     0.71        10.79
10  ROSES REGENCY TEACUP AND SAUCER      REGENCY CAKESTAND 3 TIER             0.05     0.71        4.29
11  PAPER CHAIN KIT VINTAGE CHRISTMAS    PAPER CHAIN KIT 50'S CHRISTMAS       0.06     0.64        4.65
12  SET OF 3 REGENCY CAKE TINS           REGENCY CAKESTAND 3 TIER             0.06     0.59        3.55
13  GARDENERS KNEELING PAD KEEP CALM     GARDENERS KNEELING PAD CUP OF TEA    0.05     0.59        8.29
14  HEART OF WICKER LARGE                HEART OF WICKER SMALL                0.05     0.57        4.97
15  HEART OF WICKER SMALL                HEART OF WICKER LARGE                0.05     0.47        4.97
16  PAPER CHAIN KIT 50'S CHRISTMAS       PAPER CHAIN KIT VINTAGE CHRISTMAS    0.06     0.45        4.65
17  SPOTTY BUNTING                       PARTY BUNTING                        0.05     0.45        3.54
18  PARTY BUNTING                        SPOTTY BUNTING                       0.05     0.40        3.54
19  BAKING SET 9 PIECE RETROSPOT         BAKING SET SPACEBOY DESIGN           0.05     0.39        6.45
20  REGENCY CAKESTAND 3 TIER             SET OF 3 REGENCY CAKE TINS           0.06     0.33        3.55
21  WHITE HANGING HEART T-LIGHT HOLDER   RED HANGING HEART T-LIGHT HOLDER     0.05     0.32        4.75
22  REGENCY CAKESTAND 3 TIER             ROSES REGENCY TEACUP AND SAUCER      0.05     0.30        4.29

The rules are arranged in decreasing order of confidence for ease of reading.

Inferences:
When the data is analyzed, the effect of time has been ignored.
Effectively, we are trying to find recommendations for the buyer depending on what similar people have bought in the past year. A lot of rules are present in the teacup and saucer segment, where different colored products are recommended with high lift and high confidence. Some recommendations are present in the Christmas products category, where a star is recommended for people who bought hearts (and vice versa). An interesting recommendation is to suggest a cake tin set and a teacup and saucer if a cake stand is bought. These recommendations can help in improving the overall buying experience of the customers.
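For reference, the RFM transformation described in Steps 1 and 2 of the data transformation could be sketched along the following lines. This is a dependency-free illustration and not the pivot-table workflow actually used: the transaction rows, the reference date, the function name rfm_table and the choice of days as the recency unit are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction rows: (customer_id, invoice_no, invoice_date, amount).
transactions = [
    ("C1", "INV1", date(2011, 1, 10), 120.0),
    ("C1", "INV1", date(2011, 1, 10), 30.0),   # second line of the same invoice
    ("C1", "INV2", date(2011, 11, 20), 50.0),
    ("C2", "INV3", date(2011, 3, 5), 200.0),
    (None, "INV4", date(2011, 6, 1), 75.0),    # missing customer ID: removed
]


def rfm_table(rows, reference_date):
    """Summarise Recency, Frequency and Monetary values per customer."""
    invoices = defaultdict(set)    # unique invoice numbers per customer
    monetary = defaultdict(float)  # total spend per customer
    latest = {}                    # latest invoice date per customer
    for customer_id, invoice_no, invoice_date, amount in rows:
        # Step 1: drop rows where any field needed for RFM is missing.
        if None in (customer_id, invoice_no, invoice_date, amount):
            continue
        invoices[customer_id].add(invoice_no)
        monetary[customer_id] += amount
        if customer_id not in latest or invoice_date > latest[customer_id]:
            latest[customer_id] = invoice_date
    # Step 2: one summary row per customer.
    return {
        cid: {
            # Assumption: recency measured in days before the reference date.
            "recency_days": (reference_date - latest[cid]).days,
            "frequency": len(invoices[cid]),
            "monetary": monetary[cid],
        }
        for cid in invoices
    }


rfm = rfm_table(transactions, reference_date=date(2011, 12, 1))
for customer_id, values in rfm.items():
    print(customer_id, values)
```

A table of this shape, with one row per customer and three numeric columns, is what the clustering step takes as its input.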