DATA DRIVEN APPROACH FOR CONSUMERS’ REPURCHASE OF PRODUCTS

Table of Contents

Abstract
Chapter 1 Introduction
1.1 Background
1.1.2 Competitive Strategy
1.2 Problem Statement
1.3 Research Motivation
1.3.1 Theoretical bodywork
1.3.2 Business practices
1.4 Research Problem
1.5 Research Objective
1.6 Summary of Research Methodology
1.7 Limitations and Main Assumptions of the Research
1.7.1 Limitations of the Research
1.7.2 Main Assumptions
1.8 Structure of Dissertation
Chapter 2 Literature Review
2.1.1 Customer Relationship Management
2.1.2 Customer Satisfaction, Loyalty and Retention
2.2 Customer Modelling Behaviour
2.2.2 Traditional Modelling
2.3 Data mining and Customer churn
2.4 Agent Based Modelling and Simulation (ABMS)
2.5 Related works
Chapter 3 Methodology
3.1 Data Methodology
3.2 Dataset Description
3.3 Selection tool
3.4 Binary classification
3.5 Data Mining Techniques
3.5.1 Decision Trees (DT)
3.5.2 Random Forest (RF)
3.5.3 Naïve Bayes (NB)
3.5.4 Support Vector Machines (SVM)
3.6 Performance metrics
Chapter 4 Analysis, Interpretation and Discussion
4.1 Data exploration
4.2 Pre-processing
4.3 Feature Selection
Figure 4.3 showing the features ranking based on chi_square
4.4 Data modelling
4.4.1 Method 1
Table 4.1 Summary of Classification Report of machine learning methods (See Appendix A for raw data)
4.4.2 Method 2
Table 4.2 Summary of Classification Report of machine learning and statistical methods (See Appendix B for raw data)
4.5 Discussion
Chapter 5 Conclusion
5.1 Conclusion
5.2 Recommendation
Bibliography
Appendix

Abstract

In 2020, global e-commerce retail sales amounted to $4.28 trillion, and e-retail sales are expected to reach $5.4 trillion in 2022 and $6.4 trillion in 2024 (Retail E-Commerce Sales Worldwide from 2014 to 2024, 2021). Online shoppers leave more footprints than ever; large volumes of personal information and clickstream data are usually collected and stored during each online visit for analysis with data mining tools. The Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology was used in the analysis for this research work.
By using clickstream data from an online store, various modelling techniques were considered to identify the customers likely to purchase. The modelling techniques included naïve Bayes, decision trees, random forest and support vector machines. The results showed that the support vector machine was the best at determining a customer's tendency to repurchase.

Chapter 1 Introduction

1.1 Background

In the modern information-based economy, e-commerce is gaining popularity in the retail industry. The most popular kind of e-commerce is Business-to-Customer trade, usually in the form of online shopping stores, which are rapidly replacing physical outlets (Suchacka and Chodak, 2016). In 2020, global e-commerce retail sales amounted to $4.28 trillion, and e-retail sales are expected to reach $5.4 trillion in 2022 and $6.4 trillion in 2024 (Retail E-Commerce Sales Worldwide from 2014 to 2024, 2021). Further data show that in recent years online market activities have created $145.1 billion in economic value for small organizations, including $59 billion in profit from helping organizations reach more consumers, $37 billion in additional brand exposure and almost $30 billion in cost savings (How Emerging Marketplaces Are Simplifying (And Gamifying) Online Shopping, 2020).

1.1.2 Competitive Strategy

Hoe and Shanseen (2018) explained that for an organization to continue as a going concern, it needs to develop and adapt its competitive strategy mix to navigate the constantly changing attributes of the market space. Competitive advantage is crucial for a firm's survival and long-term growth in the business market. Understanding the systematic structure is therefore of utmost importance for long-term operations and growth.
They also explained that an organization can, under some circumstances, win market share without much difficulty by focusing on customer satisfaction, making customer retention one of its competitive advantages. The strategies involved must take into account the target market audience and the organization's strengths and weaknesses. Ma (1999) noted that competitive advantage can be achieved when an organization develops and successfully enforces a strategy that is not implemented by its competitors. Clulow et al. (2003) noted that organizations can sustain competitive advantage when they consistently provide value to their consumers. Information obtained from customer behavioural analyses can aid customer satisfaction and retention. Customer behavioural analysis can be traced to the beginning of e-commerce (Bellman, Lohse and Johnson, 1999) and has various modern applications. Examples include product recommendations for consumers (Tan, Xu and Liu, 2016; Hidasi et al., 2016), the grouping of consumers into distinct segments such as visitors and buyers (Moe, 2003; Fajta, 2014), forecasting the chance of a purchase in order to offer better service to customers who are likely to buy (Lo, Frankowski and Leskovec, 2014; Korpusik et al., 2016; Suchacka and Templewski, 2017; Lang and Rettenmeier, 2017) and early detection of customer churn in order to stop it (Castanedo et al., 2014). This research work shares similarities with customer churn prediction. Customer propensity to purchase and the prevention of customer churn can both be addressed through the display of product and service offerings. Various data mining methods are available to predict customer actions from collected data. These include, but are not limited to, logistic regression, support vector machines and decision trees.

1.2 Problem Statement

The quick growth of e-commerce has changed the shopping landscape and the traditional seller-buyer relationships.
The transformation brings its own challenges that modern organizations need to face. These include the need for businesses to gain a competitive edge amid a volatile interaction process between sellers and consumers: consumers and their particular needs or wants are no longer personally known, which yields less loyal consumers. Importantly, acquiring customers, building trust and retaining customers have become major issues in modern e-commerce (Nakayama, 2009; Salehi et al., 2012). Online shoppers leave more footprints than ever; large volumes of personal information and clickstream data are usually collected and stored during each online visit for analysis with data mining tools. The information obtained from these analyses can increase customer satisfaction. It increases the efficiency of the shopping process, making it more engaging and increasingly personalized (Magrabi, 2016). In the long run, this helps organizations gain a competitive edge through a high consumer turnover rate and increased sales (Hop, 2013; Suchacka and Chodak, 2016).

1.3 Research Motivation

1.3.1 Theoretical bodywork

Despite the broad applicability of machine learning (data mining) techniques, much business research has tended to focus on customer churn or order prediction in the telecommunications and banking industries (Dias, Godinho and Torres 2020; Elsalamony 2014; He et al., 2014; Moro, Cortez and Laureano 2014; Raju et al., 2014; Tounsi, Hassouni and Anoun 2017), or on the personal information of customers in various industries. This work intends to bridge the gap in the machine learning literature for the e-commerce retailing industry by using clickstream datasets. Furthermore, clickstream data reflects the actual behaviour of users more closely than any other category of dataset.

1.3.2 Business practices

This research work intends to sharpen the business focus on customer relationship management.
Coston (2020) categorized the customer experience journey into five phases: Awareness, Consideration, Purchase, Retention and Advocacy. This work focuses on the purchase and retention stages, which involve the activities that are likely to predict whether a consumer will buy a product, and the service and product offerings that help to retain a consumer. This will help to increase revenue, reduce cost through targeted marketing and increase the overall value of a company through profit, brand value, etc.

1.4 Research Problem

Determining whether a customer will place an order for a product during an online visit is a major issue in global e-commerce. An organization strives to gain a competitive edge over other organizations through customer retention, using incentive offerings, special services (e.g., same-day delivery) and promotions to convince customers to make a purchase. This research work aims to apply different data mining models to customer activities on an online store in order to identify the consumers with the higher propensity to purchase a product. This prompts the research question:

1. How can a customer be classified as a purchasing or a non-purchasing customer by using the customer's activities (clickstream data) on an online store?

This original question leads to further questions:

2. How can exploratory data analysis provide information that helps to solve the prediction problem?
3. Which machine learning techniques give the best answers to the prediction problem?

1.5 Research Objective

In accordance with the research question, the objective is to find a predictive pattern for the binary classification of customers based on different data mining techniques. The secondary objectives are to apply exploratory data analysis to the prediction problem and to identify the best machine learning techniques for it.

1.6 Summary of Research Methodology

A single research methodology was chosen for this work.
The Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology was used in the analyses for this research work. CRISP-DM is a framework that sets out the major processes in a data mining project. The methodology has six stages: business understanding, data understanding, data preparation, data modelling, evaluation and deployment. The research makes use of secondary data, and the dataset will be used to answer the research questions. Data modelling techniques such as Decision Trees (DT), Random Forest (RF), Naïve Bayes (NB) and Support Vector Machines (SVM) were identified through literature research. The data modelling techniques were implemented in Python, and performance metrics such as accuracy and Area Under Curve (AUC) were considered.

1.7 Limitations and Main Assumptions of the Research

1.7.1 Limitations of the Research

Because of limited resources for acquiring state-of-the-art computer systems, the analysis could not be run on a server, so a large dataset could not be used. Large datasets require a large amount of memory and high computational capacity. Hence, a smaller dataset was used. It comprises two files whose combination gives 607,056 rows and 25 columns. However, this dataset is a good basis for research (see Customer propensity to purchase dataset | Kaggle). The dataset represents customers' visits to a fictional website on a particular day, so it does not capture larger patterns of customer behaviour. Therefore, further research is needed to analyse customer behaviour over longer periods of time. The methodology employs the CRISP-DM approach. This approach is relatively old but is still the best-known method for data analysis of business-related questions, as shown on www.kdnuggets.com.

1.7.2 Main Assumptions

The main assumption of this research is that data modelling techniques can be used to provide insights into solving the problem of customers' propensity to purchase a product from an online store.
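
As an illustration of the two performance metrics named in Section 1.6, the following minimal sketch computes accuracy and the area under the ROC curve from first principles. The function names, the 0.5 decision threshold and the toy labels and scores are assumptions made for this example only; they are not taken from the study's dataset or code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = purchased, 0 = did not purchase.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.8, 0.6]          # assumed model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f}, AUC={auc:.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes how well the scores rank purchasers above non-purchasers across all thresholds, which is one reason to report both.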
1.8 Structure of Dissertation

The study begins with Chapter 1, which includes the introduction and background of the study, the problem statement, research motivation, research problem, research objectives, a summary of the research methodology, and the limitations and main assumptions of the research. Chapter 2 presents the literature review, covering the main theories, statements and ideologies of the research topic. It also covers related works by other researchers that highlight different data modelling techniques and performance metrics. Chapter 3 discusses the data methodology of the research work. Chapter 4 covers the analysis of the research questions and the interpretation and discussion of results. Chapter 5 presents the conclusions of this research work and recommends further questions that can be addressed in future research.

Chapter 2 Literature Review

2.1.1 Customer Relationship Management

Before modern market systems, organizations and market participants tended to have personal relationships and interactions with their consumers. This facilitated personal and customized services and product offerings (Benoit and Van den Poel, 2012). Coelho and Henseler (2012) noted that the closeness of the market participants generated customer loyalty and trust in the products and services that a customer would receive. As the market progressed, customers traded personalized services for lower prices, with anonymity as an unintended consequence (Peppard, 2000). Hoe and Shanseen (2018) also noted that contemporary organizations operate in a market environment that is less certain, where the tempo and trends are quicker and the market systems are more intricate and complex. The customer is at the heart of a business organization, and analysing consumer satisfaction is a core value retainer for improving financial performance.
The above statement further highlights customer satisfaction as necessary for business survival, market share competition and business growth. Customer loyalty is paramount in business operations, as it not only adds to sales but also enables organizations to spend less on retaining customers than they would on targeting new ones. By developing and sustaining customer loyalty, organizations build long relationships that benefit both the customers and the organizations. McMullan and Gilmore (2008) stress the need for a different method of creating and sustaining consumer loyalty: providing products and services that generate the specific value that customers truly need. Organizations should be customer-value oriented and think outwards to the market space about how customers can be better served (Woodruff, 1997). Mutanen et al. (2010) oppose the mass marketing system, in which every individual consumer receives equal treatment from an organization. Mass marketing can no longer achieve substantial market returns given the varied nature of consumer business today. Therefore, organizations are leaning towards the marketing approach that Peppard (2000) describes as customer relationship management (CRM). This system repeatedly uses current information about existing and prospective consumers so that the organization can prepare for and respond proactively to customer needs. CRM comprises creating methods and patterns for managing interactions with consumers (Kim et al., 2003), including the activities leading to customer generation, satisfaction, retention and the renewal of dormant consumers (Benoit and Van den Poel, 2012). Yang and Peterson (2004) identify that organizations focus on obtaining customer satisfaction and loyalty as a superior value strategy for achieving competitive advantage.
Oliver (2010) noted that customer loyalty is a valued customer metric in marketing because of the positive impact of managing a loyal customer base.

2.1.2 Customer Satisfaction, Loyalty and Retention

Lovelock (1983) explained customer loyalty as a consumer's desire to retain a business relationship with an organization, shown by consistent purchase and use of its products and services and a willingness to recommend them to other people. Cronin and Taylor (1992) explained that customer satisfaction derives from the customer's experience of service and product quality, which has a direct effect on customer loyalty. They also held that retaining the loyalty of a fragment of the market is more profitable than continuously attempting to attract new consumers. Gremler and Brown (1996) also saw customer loyalty as customers repeating purchases from the same organization. Brown and Chen (2001) noted that there is a positive relationship between business profitability and customer loyalty. Oliver (2010) supported the view that loyal customers provide continuous sales and are less inclined to shop at other organizations. Sower et al. (2001) explained that the pursuit of customer loyalty continues throughout the lifespan of a business entity; it is more of a journey than a destination. They also noted that attained customer loyalty should be part of the firm's competitive advantage. Jones (1996) identified customer loyalty as key to business entities continuing as going concerns, because it increases the long-term value of the business. Increased customer satisfaction and customer retention lead to increased sales and increased overall financial value. Word-of-mouth helps to keep marketing costs low (Heskett and Sasser Jr, 2010). Anderson and Mittal (2000) also identify that an organization's profitability is generally increased by customer loyalty.
2.2 Customer Modelling Behaviour

Customer retention can be attained when an organization understands the behavioural patterns that customers exhibit and the conditions that prompt such behaviours. The management of interactions and relationships with customers is crucial in customer management (Mgbemena, 2016). Consumer behaviour can be predicted through consumer behaviour modelling (Olsen et al., 2013). Consumer behaviour modelling comprises the use of methods, tools and techniques to identify behavioural patterns and predict possible future occurrences (Mgbemena, 2016). Nelson et al. (2006) divided customer relationship management models into analytical and behavioural models. Analytical models make use of big datasets that are usually stored in large data warehouses. The datasets require tools and techniques that can analyse the data efficiently and give solutions that increase an organization's revenue or help with cost management. Behavioural models were explained as the use of surveys to compare and contrast consumers' cognitive responses to the products and services received from organizations. Furness (2001) divided consumer behaviour modelling into (1) predictive modelling, (2) descriptive modelling and (3) a mix of predictive and descriptive modelling. Predictive models attempt to answer the “who” questions, for example, who will buy a product or service? They attempt to predict future consumer behaviour from past behavioural patterns. Descriptive modelling attempts to answer the “why” questions. One form of descriptive modelling is clustering, in which consumers that show similar characteristics are grouped into clusters. Verbeke et al. (2012) explained that using both predictive and descriptive models attempts to answer the “who” and “why” questions effectively at the same time.
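
The clustering exercise described in Section 2.2 can be illustrated with a minimal k-means sketch. The one-dimensional feature (session length in minutes), the toy values and the starting centroids are assumptions chosen for this example; the cited authors do not prescribe a particular clustering algorithm.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Lloyd's k-means on one-dimensional data with given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: each value joins the cluster of its nearest centroid.
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(members) / len(members) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical session lengths in minutes for six visitors.
session_minutes = [2, 3, 4, 20, 22, 24]
centroids, clusters = kmeans_1d(session_minutes, centroids=[0, 10])
print(centroids, clusters)
```

Here the short sessions and the long sessions end up in separate clusters. A descriptive analysis would then examine what the members of each cluster have in common, while a predictive model would use such characteristics to estimate who is likely to purchase.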
2.2.2 Traditional Modelling

Diverse tools and techniques have been used to analyse and understand consumer retention. These techniques can be broadly grouped into statistical analysis techniques and data discovery techniques. Statistical models for analysing customer retention include analysis of variance (ANOVA), the chi-square test and correlation analysis. Data discovery techniques for analysing customer retention involve data mining techniques.

2.3 Data mining and Customer churn

Data mining is the extraction of hidden information, patterns, etc. from datasets. Using data mining to analyse behavioural patterns for customer retention is popular in business analysis. Ye et al. (2008) noted that data mining techniques are better than traditional statistical models because they produce predictions. This explains the popularity of data mining in the literature and in business organizations in recent years. Lemmens and Croux (2006) found that these techniques are more effective on large datasets. Data mining models are classified into supervised and unsupervised machine learning. The data mining models include, but are not limited to, decision trees, feed-forward neural networks, higher-order Markov chains, k-nearest neighbour, logistic and linear regression, long short-term memory, random forest, recurrent neural networks, support vector machines, principal component analysis and moving averages for time series analysis. Therefore, this research also focuses on examining data mining models for customer retention analysis. Consumer churn has been studied in many industries, such as banking (Dias, Godinho and Torres 2020; Elsalamony 2014; He et al., 2014; Moro, Cortez and Laureano 2014; Raju et al., 2014; Tounsi, Hassouni and Anoun 2017), retail (Dingli, Marmara and Fournier 2017; Migueis et al., 2012; Requena et al., 2020), manufacturing (Wang et al., 2020) and insurance (Sundarkumar and Ravi, 2015).
Customer churn has been used in several areas of study, such as behavioural and economic studies (Neslin et al., 2006). Dingli, Marmara and Fournier (2017) define churn as a term signifying that a consumer has shifted to a competitor or has stopped transacting. Churn can also be defined in terms of consumers who have a very high tendency to end their relationship with an organization (Cao, Yu and Zhang, 2015). Migueis et al. (2012) explain churn as the point at which a consumer's purchasing drops below a pre-existing level across a specified period of time. Within the Fast-Moving Consumer Goods (FMCG) industry, consumers tend to move gradually to a competitor, so the exact time at which a consumer churns is difficult to identify (Buckinx and Van den Poel, 2005). Migueis et al. (2012) therefore treat churn in the FMCG retail industry as the first time a consumer buys an identical product from a competitor. This helps to indicate the customer's loyalty to a retail store.

2.4 Agent Based Modelling and Simulation (ABMS)

Researchers and business analysts have tried many methods to understand consumer behaviour in the market environment. ABMS is an example of such techniques (Twomey and Cadman, 2002). ABMS attempts to understand how systems operate under specific and different conditions. It operates by imitating real-life practices in a model. ABMS can be used to analyse consumer behaviour in a retail system, and it provides insights into complex relationships in a system. ABMS is divided into two major practices: modelling represents real-life situations in a model, while simulation implements the model features in a manner that reflects the intended system. Agent-based models comprise agents (features or variables) and a system for their interactions. They are usually defined by rules that contextualize the interaction of agents in the model-building structure (Macal and North, 2010).
ABMS is starkly different from traditional modelling, where features are usually manipulated and aggregated (Baxter et al., 2003). Traditional modelling approaches tend to serve their original purpose; however, their results tend not to be detailed enough to analyse the independent interactions of agents. ABMS can analyse the independent interactions of agents on a massive scale, i.e., with a large number of variables, while retaining the variables' full details and their interactions (North et al., 2010). ABMS can be used in many areas of study, including health care (Epstein, 2009), economics (Farmer and Foley, 2009) and management science (Macal and North, 2010). In the business environment, ABMS has been used to help management understand and predict market patterns and dynamics (Macal and North, 2010). The tool has also been used in artificial life research to examine life from various perspectives (Adami, 1998). It is a general and effective tool. ABMS has also been used in customer modelling to understand and forecast customer behaviour (Zhang and Zhang, 2007). ABMS allows the incorporation of social and ecological actions, norms, cultural structures and institutional features. However, ABMS has relatively weak predictive capacity and is also difficult to verify and validate (Matthews et al., 2007). Overall, the tool allows researchers to investigate many natural phenomena and to fill literature gaps across many fields.

2.5 Related works

Several research works attempt to identify factors or features that promote customer retention. These works draw on different industries; however, some of them focus on the data modelling techniques rather than on the business value. The features have been studied using different methods and models. Requena et al. (2020) tackled the issue of user prediction from clickstream data (online activities) of an e-commerce website.
They used two conceptually different approaches: a classification based on hand-crafted features and a deep-learning classification. In both approaches, they intentionally coarse-grained the clickstream data to derive symbolic trajectories with minimal information. The dataset was provided by Coveo, a North American organization that provides artificial intelligence services to the service and retail industries. They tackled the classification of trajectories of arbitrary length and the early forecasting of limited-length trajectories, for both balanced and unbalanced datasets. Their analysis demonstrated that k-gram statistics and visibility graph motifs gave fast and accurate predictions, indicating that purchase forecasting is dependable even for very short observation windows. For the deep-learning technique, a Long Short-Term Memory (LSTM) model demonstrated improved classification accuracy on the clickstream dataset over former state-of-the-art (SOTA) models. The F1 score and AUC were the chosen performance measures. The authors did not appear to be overly concerned with the business value aspect of the research.

Gerpott and Ahmadi (2015) researched the potential of contract features, service usage, the socio-demographic features of telecommunications consumers and their stated reasons for ending their contracts to predict the possibility of consumer re-acquisition. The study further examined whether operator programmes could lead consumers to withdraw their contract cancellation requests. The dataset used for this research was collected through the billing system of one of the four telecommunications operators then existing in the German market. The sample was 607,948 post-paid residential consumers.
The major aim of the research was to examine the features that could influence the operator's attempt to re-acquire consumers who had ended their phone contract but were still legally bound to the operator until the stipulated contract termination date. The researchers used a single-factor analysis of variance (ANOVA), and dependency tests were also conducted on the variables. The empirical analysis showed that the duration of the previous consumer relationship is significantly and positively related to absolute consumer defection. The analysis also enabled the operator's executives to profile the target groups into two, one of which comprises consumers with a high tendency to withdraw their previous termination request.

Dingli, Marmara and Fournier (2017) showed how transactional data traits are developed and how they can be used to forecast customer churn within a local retail industry. The data was collected from a local supermarket. The consumer purchasing details were derived from the Point of Sales (POS) systems. The dataset features included the customer identification, stock identification, customer's frequency, number of receipts, value of purchases, etc. The data therefore reflects real-life transactions. Two machine learning models, Convolutional Neural Networks (CNN) and the Restricted Boltzmann Machine, were used to analyse the data. The dataset was divided into two, one comprising 26 features and the other 9 features, and both were used with both machine learning models. Their research further showed that machine learning techniques are useful for deriving churn patterns. The performance metric was accuracy. The Restricted Boltzmann Machine provided the best results, with 83% accuracy against 68% for the CNN on the first dataset, and 92% against 74% on the second dataset.
From the results of their research, the warehouse manager can make sure that the products consumers need will always be available in the store for purchase.

Vafeiadis et al. (2015) compared several machine learning methods for predicting customer churn. The methods employed were Naïve Bayes, logistic regression analysis, decision tree learning, support vector machines and artificial neural networks. The customer churn dataset is a telecommunications dataset downloaded from the University of California at Irvine (UCI) Machine Learning Repository. The package C5.0 was used to analyse the performance of the tested classifiers. The dataset consisted of 19 independent predictor variables and a dependent binary (yes or no) churn variable, and comprised 5,000 samples of telecommunication consumer data. The analysis was carried out in two parts: with boosting and without boosting. Without boosting, the best performer was the decision tree method, with an accuracy of 94%. The support vector machine classifiers (RBF and POLY kernels) achieved an accuracy of 93%. The boosting part was carried out using the AdaBoost.M1 algorithm. Logistic regression and Naïve Bayes were not boosted, as they lack free parameters and therefore could not be tuned. The work showed that the accuracy of the three remaining classifiers (decision trees, support vector machines and artificial neural networks) increased by between 1% and 4%. The boosted support vector machine (SVM) was the most improved and the best classifier, with an accuracy of approximately 97% and an F-measure of 84%. The main positive implication of this research was that it offered additional information on the performance of common churn prediction methods in the telecommunications industry. It also showed the importance of boosting methods.
However, the low number of samples could be responsible for such high accuracy, so more consumer data ought to be tested with the boosting technique. Other boosting techniques could also be employed, together with a more detailed dataset (with a higher number of predictive variables) from the telecommunications industry, to improve the statistical significance of the results.

Schaeffer and Sanchez (2020) used a machine learning approach with the aim of identifying patterns of future consumer churn. They trained classifiers on time series describing the consumer-side inventory and classified consumers as retained or lost based on periods of inactivity in the usage and pre-purchase of a unitary service offered by the organization in a business-to-business scenario. The dataset was derived from a Mexican organization that provides a prepaid parcel-delivery service. The dataset begins in January 2014 and ends in April 2017. The total prepaid service sales and the total number of service deliveries to each customer were extracted on a monthly basis. Each customer was represented as a series of consumptions and purchases. They examined three parameters: (1) the length of the time period used to produce the forecast, (2) the duration of inactivity after which a customer is assumed to have churned and (3) how far ahead in time the prediction is made. They found the linear support vector machine (SVM) to perform uniformly well across the three parameters. The AdaBoost techniques had the highest sensitivity. The best data modelling technique was identified as the Random Forest, with 92% specificity. The limitation of the work was the use of a small dataset, so fewer features could be extracted from it.
Huang and Kechadi (2013) estimated consumer behaviour using a fusion-based system that combined unsupervised and supervised machine learning models to predict consumer behavioural patterns. The hybrid system comprised a classic rule method (FOIL) and a k-means clustering model. The research comprised three experiments. The purpose of the first experiment was to confirm whether the weighted k-means clustering model was able to provide improved data partitioning. The second experiment analysed the results of the data modelling and compared them with popular techniques. The third experiment compared and contrasted the fusion-based models with various other hybrid data modelling processes. The dataset came from the telecoms industry and comprised 104,199 consumer records, with 6,056 churners and 98,143 non-churners. The dataset characteristics mainly consisted of demographics, call profiles and account details. Five-fold cross validation was also used in this work. The experiments showed that the fusion-based machine learning model has considerable potential and outperformed the other models in understanding customer behavioural patterns. However, the researchers did not consider eliminating potential noise in the dataset before the experiments were performed. Such noise could undermine the effectiveness of the model, and other clustering algorithms could be used to capture more behavioural patterns of customers.

Dias, Godinho and Torres (2020) attempted to forecast customer churn in retail banking. The dataset contained data on over 130,000 consumers of a retail bank along with their monthly interactions over a period of 2 years. The dataset features covered product balances, product acquisitions and use of bank services, numbering 63 characteristics for every month.
The retail bank defined churn through three conditions: the consumer had no interactions with the bank for 6 consecutive months, the assets balance was lower than or equal to €25 and the debts balance was lower than or equal to €25. A customer was considered to have churned the first time all of these conditions were met. In the work, 6 different machine learning models were used to analyse the dataset, with a forecast horizon of 6 months in advance. The methods were: Random Forests (RF), Support Vector Machine (SVM-kernlab), Logistic Regression (LR), Stochastic Boosting (SB ADA), Multivariate Adaptive Regression Splines (MARS-Earth) and Classification and Regression Trees (CART). The measurement metrics were accuracy and AUC. The researchers concluded that the general sample results were good, even with the difficult out-of-sample sets that comprised only churners. These sets directly test the capacity of the models to predict customer churn. Stochastic Boosting (SB) provided the best results. The most significant features for consumer churn in a 1-to-2-month time frame were the overall value of bank products held and the customer having a debit or credit card with another retail bank. For the 3-to-4-month time frame, the number of transactions and the consumer having a mortgage loan with another institution were the most significant features.

Kirui et al. (2013) estimated consumer churn using a dataset collected from a European mobile network organization. The collection period was three months, and the dataset comprised 112 characteristics and 106,405 records. Churners made up 5.6% of the consumers in the dataset; the others were active users. More characteristics were extracted and added to the dataset so that the mining algorithms could better recognise customer churn. A further stratified random sample was drawn and joined to the initial dataset.
This was done to address the problem of imbalance in the initial and modified datasets. Random sampling was employed to modify each stratum individually and reduce the data to the preferred sample size. A new group of data characteristics was identified to improve the prediction accuracy of consumer churn. The findings showed that most of the variable subsets have a similar performance using Naïve Bayes. The findings also showed that as the sample size was increased, the active-to-churn rate also increased. The probabilistic models, Naïve Bayes and the Bayesian network, produced better results than the decision tree (C4.5). The findings further showed that the prediction accuracy rates were higher on the adjusted dataset than on the initial dataset. The new group of variables that were added comprised contract-related traits, call pattern description traits and changes in call patterns. The original traits of the dataset included call profiles and call traffic information. The researchers focused on the feature selection and classification of the dataset and neglected to properly understand customer behaviour. The data imbalance may also have reduced the performance of the data models.

Verbraken et al. (2013) proposed a different framework, a cost-benefit analysis framework that relates classification performance to profit optimization. The researchers applied this cost-benefit framework to consumer churn. Its business potential is that it provides information for identifying the best classifier for profit optimization. The proposed framework further provides guidelines on the fraction of the consumer market that should be targeted in the consumer retention programme. The researchers conducted a sample study by applying 21 methods to 10 different churn datasets. The findings showed that the area under the ROC curve (AUC) makes inaccurate assumptions about misclassification costs.
They concluded that using such performance evaluation benchmarks in the market environment may be misleading and may ultimately lead to lower profits. They considered the H-measure, which was proposed to compensate for the perceived faults of the AUC metric. The H-measure helps to identify the most profitable classifier, although it still does not help in identifying the optimal customer fraction for customer retention programmes. The strength of the Expected Maximum Profit Classification (EMPC) measure is that it identifies the fraction of the consumer base that should be targeted for customer retention programmes. However, this framework only attempts to solve the profit optimization problem in the business world.

Martinez et al. (2020) developed data models that help forecast future customer patterns in a non-contractual setting. They developed a data-driven framework to forecast whether a consumer will buy a product from an organization within a specified duration in the coming time periods. The dataset was provided by a large manufacturer in central Europe. It contained more than 10,000 customers and about 200,000 purchases overall. They applied a group of important features derived from the times and values of former purchases. The features include: number of purchases, mean time between purchases, standard deviation of time of purchases, maximal time between purchases, time since last purchase, frequency classification, moving averages, maximum value of purchases, mean value of purchases, median value of purchases, time frame variations, purchase trend, pairwise products, powers of two and three, and logarithms. These features yielded 274 variables. State-of-the-art machine learning models, namely logistic lasso regression, the extreme learning machine (i.e., a single-hidden-layer feed-forward neural network, SLFN) and the gradient tree boosting method, were used in analysing the dataset.
The performance metrics used were the Area Under the Curve (AUC), accuracy and the confusion matrix. All data models performed well; however, the gradient tree boosting method performed best, with an AUC of 0.95 and an accuracy of 89%. Their work found purchase timing to be an important feature that could be used to increase forecast precision from 68% to 89%. This will greatly help reduce costs and increase revenue.

Table 2.1 Review of relevant studies on customer churn and purchase prediction.

| Author | Study | Method Covered | Variables |
|---|---|---|---|
| Requena et al. (2020) | Shopper intent prediction from clickstream e-commerce data with minimal browsing information | K-gram statistics, Long Short-Term Memory | Clickstream data |
| Gerpott and Ahmadi (2015) | Regaining drifting mobile communication customers: Predicting the odds of success of win back efforts with competing risks regression | Traditional modelling techniques: ANOVA, dependency tests | Contract, service usage and cancellation reason |
| Dingli, Marmara and Fournier (2017) | Comparison of Deep Learning Algorithms to Predict Customer Churn within a Local Retail Industry | Convolutional Neural Networks (CNN) and Restricted Boltzmann Machine | Customer identification, stock identification, customer's frequency, number of receipts, value of purchases |
| Vafeiadis et al. (2015) | A comparison of machine learning techniques for customer churn prediction | Naïve Bayes, logistic regression analysis, decision tree learning, Support Vector Machine and Artificial Neural Networks | Daily usage and International Plan |
| Schaeffer and Sanchez (2020) | Forecasting client retention: A machine-learning approach | Support Vector Machine | Monthly record of client transactions |
| Huang and Kechadi (2013) | An effective hybrid learning system for telecommunication churn prediction | Hybrid method based on FOIL and a k-means clustering model | Demographics, call profiles and account details |
| Dias, Godinho and Torres (2020) | Machine Learning for Customer Churn Prediction in Retail Banking | Random Forests (RF), Support Vector Machine (SVM-kernlab), Logistic Regression (LR), Stochastic Boosting (SB ADA), Multivariate Adaptive Regression Splines (MARS-Earth), Classification and Regression Trees (CART) | Product balances, product acquisitions, use of bank services, etc., up to 63 |
| Kirui et al. (2013) | Predicting customer churn in mobile telephony industry using probabilistic classifiers in data mining | Naïve Bayes and Bayesian network | Call pattern description traits, changes in call patterns, call profiles and call traffic information |
| Verbraken et al. (2013) | A novel profit maximizing metric for measuring classification performance of customer churn prediction models | Approaches based on Support Vector Machine, Decision Tree and ensemble methods | Customer demographic information, call details and account details |
| Martinez et al. (2020) | A machine learning framework for customer purchase prediction in the non-contractual setting | Logistic lasso regression, extreme learning machine (single-hidden-layer feed-forward neural network, SLFN) and gradient tree boosting method | Number of purchases, mean time between purchases, standard deviation of time of purchases, maximal time between purchases, etc. |

Summary

This chapter shows the need for this study. The satisfaction of customers is necessary for a business to be viable. Customer satisfaction aids customer loyalty, and analysing customer behavioural patterns aids customer relationship management. Different methods have been used to study customer churn and repurchase in the literature above. This work attempts to study the clickstream data of an online store. The ABMS tool is to be used in this work. It allows the researcher to develop their own idea of data-driven approaches to understanding customer repurchase.
The researcher makes use of the Cross Industry Standard Process for Data Mining methodology to answer the classification question of whether a customer is purchasing or non-purchasing, using the day's customer activities (clickstream data) on an online store. The dataset is fictional and comes from Customer propensity to purchase dataset | Kaggle.

Chapter 3 Methodology

3.1 Data Methodology

The Cross Industry Standard Process for Data Mining methodology (CRISP-DM) was used in the analyses for this research work. CRISP-DM is a framework that highlights the major processes undertaken in a data mining work. It offers a systematic process for the planning and execution of a data mining project (Chapman, 1999). It is still one of the generally accepted models today (Piatetsky-Shapiro, 2014). The six stages of the model are shown in Figure 3.1. Each stage depends largely on the results of the preceding stages (Seippel, 2018). The arrows identify the strongest interactions. The outer circle shows the cyclic sequence of the data mining tasks. The results of a CRISP-DM cycle raise further business questions and initiate a new process (Chapman et al., 2000). The six stages are briefly described below the figure.

Figure 3.1 Stages of the CRISP-DM model. Source: Chapman et al. (2000)

1. Business understanding: this stage examines the project goal from a business perspective. The business questions are translated into a data mining question.
2. Data understanding: in this stage, data is collected, examined and analysed to gather initial information for data familiarization.
3. Data preparation: this stage involves all the processes executed to derive, from the raw data, the final dataset of features that will act as input for the modelling techniques.
4. Modelling: this stage requires the application of modelling methods. It further comprises the selection of the model and the tuning of the parameters of the model. Various modelling techniques are usually considered; for this work, Naïve Bayes, decision trees, random forest and support vector machines will be considered.
5. Evaluation: this stage involves comparing and contrasting model results and aligning these results with the business objectives.
6. Deployment: this final stage involves the integration of the model with live systems and live data to enable the business analyst to make useful predictions and consider viable decisions for achieving business objectives.

3.2 Dataset Description

The dataset was obtained from Kaggle and can be retrieved at Customer propensity to purchase dataset | Kaggle. The dataset is therefore secondary data. It contains logged shoppers' interactions (a clickstream dataset) on an online store and reflects the customers' propensity to purchase a product. The data represents customers' visits to a fictional website on a particular day. The dataset comprises two files, which combined give 607,056 rows and 25 columns. The columns represent the variables.
The variables and a short description of each are given below:

(1) user id: a unique identifier for the visitor
(2) basket icon click: did the visitor click on the shopping basket icon?
(3) basket add list: did the visitor add a product to their shopping cart on the “list page”?
(4) basket add detail: did the visitor add a product to their shopping cart on the “detail page”?
(5) sort by: did the visitor sort products on a page?
(6) image picker: did the visitor use the image picker?
(7) account page click: did the visitor visit their account page?
(8) promo banner click: did the visitor click on the promo banner?
(9) detail wish list add: did the visitor add to their wish list from the “detail page”?
(10) list size dropdown: did the visitor interact with a product drop down?
(11) closed mini basket click: did the visitor close their mini shopping basket?
(12) checked delivery detail: did the visitor view the delivery FAQ area on a product page?
(13) checked returns detail: did the visitor check the returns FAQ area on a product page?
(14) sign in: did the visitor sign into the website?
(15) saw checkout: did the visitor view the checkout?
(16) saw size charts: did the visitor view a product size chart?
(17) saw delivery: did the visitor view the delivery FAQ page?
(18) saw account upgrade: did the visitor view the account upgrade page?
(19) saw home page: did the visitor view the website homepage?
(20) device mobile: was the visitor on a mobile device?
(21) device computer: was the visitor on a desktop device?
(22) device tablet: was the visitor on a tablet device?
(23) returning user: was the visitor new or returning?
(24) loc uk: was the visitor located in the UK, based on their IP address?
(25) ordered: did the visitor make an order?

3.3 Selection tool

The tool selected for executing the different machine learning models is Python. Python is a general-purpose, high-level programming language (Python, 2017).
It is commonly used throughout the machine learning and deep learning community for data mining because of its large library base, which comprises many predictive algorithms. Scikit-learn will be employed in this research work and is among the most common libraries (Sk-learn, 2017). It contains reference implementations of various machine learning models, has a relatively simple user interface and is well suited to research (Pedregosa et al., 2011). Python libraries are imported into the program to perform different operations and processes. NumPy is used for array operations, pandas deals with data frame operations, and matplotlib and seaborn are used for plotting and visualization. Scikit-learn includes different modules to solve machine learning tasks.

3.4 Binary classification

Data mining has different uses, with classification being the most popular. The purpose of classification is to forecast a categorical target feature from a group of input features. The target feature can be binary (Kotu and Deshpande, 2014). This research work involves binary classification, as the target feature has two categories: to order or not to order. A forecast is made from the relationships learned between the input and target features, and this forecast is then used to classify new data (Tan, Steinbach and Kumar, 2005).

3.5 Data Mining Techniques

3.5.1 Decision Trees (DT)

Decision Trees (DT) comprise a sequence of splits that separate a heterogeneous dataset into smaller, more homogeneous subsets with respect to a specific feature. The objective is to obtain subsets that are as homogeneous as possible. There are different algorithms for finding the most suitable splits, e.g., Hunt's algorithm, which follows a greedy strategy of making locally optimal decisions (Tan et al., 2005). Simple DTs have the benefit that they can be transformed into simple, easy-to-interpret classification rules (Han et al., 2011). DTs easily capture non-linear relationships.
DTs offer an effective tool for decision making, as all the problems are easily identified and all possible options can be explored. DTs offer a structure for quantifying outcome values and analysing the possible effects of a decision. Their comprehensibility reduces when the models are larger and more imbalanced (Rokach and Maimon, 2014). DTs provide fast learning and forecasting speed. Although the different DT models vary in comprehensibility, DTs are far easier to interpret than black-box models such as Feed-forward Neural Networks (FNN) or Support Vector Machines (SVM) (Rokach and Maimon, 2014). The disadvantages include the need for feature engineering, the inability to model time series directly and greater complexity when trees contain categorical features with many categories. Models that consist of a single DT, also known as non-ensemble DTs, are likely to over-fit and are unstable on noisy data (Hop, 2013).

3.5.2 Random Forest (RF)

Random forest is an ensemble method in which many large trees are fitted to bootstrap-sampled versions of the data and the class is decided by majority vote (Breiman, 1996). Random forest has an additional benefit over bagging, as it de-correlates the trees. At each split of a tree, a random sample of the variables is drawn, and only these variables are candidates for that split. The result is then based on the majority vote of the single trees. RF adds benefits such as robustness and resistance to over-fitting that non-ensemble DTs do not have. Random forests train faster than boosted trees but require more time for prediction (Breiman, 2001). However, random forests still need feature engineering and do not model time series.

3.5.3 Naïve Bayes (NB)

This is one of the most widely used classification algorithms. It is a particular case of a Bayesian network. It is regarded as an improvement on the unconstrained Bayesian network classifier.
Bayesian classifiers forecast the probability that an event belongs to a specific set or class in large datasets. The technique is accurate, classifies quickly and is fast to train with simple models. It needs only a small amount of training data to estimate its parameters. It can deal with real, discrete and streaming data. Its disadvantages do not cause major problems: the different plausible priors do not disagree greatly within the regions of interest. Bayesian analysis can be applied to any space of models (Elsalamony, 2014). The Naïve Bayes classifier is built on Bayes' theorem, which gives the probability of an event conditional on another event and is mathematically defined as

P(A|B) = P(B|A) P(A) / P(B)

where A and B are events. P(A) and P(B) are the probabilities of A and B occurring independently of each other. P(A|B) is the probability of A under the condition of B. P(B|A) is the probability of observing event B when event A holds (Tounsi, Hassouni and Anoun, 2017).

3.5.4 Support Vector Machines (SVM)

This is a modelling method that is used for both regression and classification analysis. SVM divides two classes by placing a hyperplane between them. In this method, only one hyperplane is used, in contrast to DTs, where a new hyperplane is introduced at each split. Where several separating hyperplanes exist, the SVM selects the maximum-margin hyperplane, i.e., the one that maximizes the distance between the two classes. This improves generalizability and yields higher test accuracies. A hyperplane is shown in Figure 3.2.

Figure 3.2 showing a hyperplane

When a variable space cannot be separated linearly, a kernel function is used to map the data into a higher-dimensional variable space where the data can be separated linearly (Hofmann, 2006).
Mapping the data inputs into a higher-dimensional variable space enables SVMs to fit a linear separator in that space and thereby perform non-linear classification in the original space (Cortes and Vapnik, 1995). Four simple kernel functions are highlighted below (Hsu, Chang and Lin, 2003).

Linear: K(xᵢ, xⱼ) = xᵢᵀxⱼ
Polynomial: K(xᵢ, xⱼ) = (γxᵢᵀxⱼ + r)ᵈ, γ > 0
Radial basis function (RBF): K(xᵢ, xⱼ) = exp(-γ||xᵢ - xⱼ||²), γ > 0
Sigmoid: K(xᵢ, xⱼ) = tanh(γxᵢᵀxⱼ + r)

K(xᵢ, xⱼ) is the kernel function, which implicitly places the training vectors in a higher-dimensional variable space. r, γ and d are kernel parameters (Hsu et al., 2008). The advantages of SVMs include high accuracy and fast prediction times. SVMs use less memory, as they rely on only a subset of the training points in the decision stage. The disadvantages include their long training times and the difficulty of interpretation. Feature engineering and hyper-parameter tuning are necessary (Han et al., 2011). Overall, SVM works well when there is a clear margin of separation and in high-dimensional spaces.

3.6 Performance metrics

There are many metrics used to examine the performance of machine learning algorithms on a binary classification task. These measures are derived from the confusion matrix (Figure 3.3), where the correctly forecasted events are shown in the blocks of true positives and true negatives. The falsely forecasted events are shown in the blocks of false negatives and false positives. A confusion matrix is a concrete tool for examining the performance of the classifier. The predicted probabilities in a binary classification task range from 0 to 1 and are usually converted into the negative or positive class. The matrix visually reflects the main classification outcomes. The true negatives (TN) are events where a visiting consumer did not buy a product (0) and no purchase was forecasted (0). The true positives (TP) are events where the visiting consumer bought a product (1) and a purchase was forecasted (1).
The false negatives (FN) are events where a purchase was made (1) but was not forecasted (0). The false positives (FP) are events where a purchase was not made (0) but one was forecasted (1). Different performance measures are derived from the confusion matrix. These metrics depend on the classification threshold, which determines which predicted probabilities (ranging from 0 to 1) are assigned to the positive class. The threshold to be used as the basis for analysis usually depends on the specific dataset.

Figure 3.3 showing the confusion matrix of predicted and actual values with positive and negative events.

The most common metrics are the classification accuracy and the error rate. Most classification models therefore aim for a low error rate or a high accuracy percentage (Tan et al., 2005).

1) Error: this is the proportion of falsely classified results.
Error rate = Sum of wrong forecasts / Sum of forecasts = (FP + FN) / (TP + FP + FN + TN)

2) Accuracy: this measures the proportion of forecasts an algorithm gets correct, including both true positives and true negatives.
Accuracy = (TP + TN) / (TP + FP + TN + FN)
Out of all forecasted events, what percentage were right? Accuracy is not always the best measure, specifically when dealing with imbalanced datasets. Other metrics, namely Recall, Precision and F1-score, provide different and sometimes better information about error types.

3) Precision: how precise the positive forecasts are. This is also known as the positive predictive value (PPV).
Precision = TP / (TP + FP), the count of true positives divided by the count of all events classified as positive.
Of all the times the algorithm said the customer would order, how many times did the consumer truly order?

4) Recall: this signifies the percentage of the actual positive class that was captured by the algorithm. This is also called sensitivity and true positive rate.
Recall = TP / (TP + FN) Of all the consumers that we identified as going to order, how much percentage did the model correctly indicate as going to order 5). F1 Score: In the F1-score, recall and precision are considered to be equal (Manning, Raghavan, & Schuetze, 2008). This is the harmonic mean of precision and recall. It is a really a potent significator of precision and recall (cannot have a high F1 score without strong model). F1 = 2(Precision * Recall) / (Precision + Recall) Penalizes models heavily if they are skewed towards precision or recall 6). Specificity- An important measure which stands in contrast to the recall and measures the proportion of negatives that are correctly identified. Specificity = TN / TN + FP 7). ROC AUC score- This is another performance metric that provides a way to comprehensively analyse our model’s wellness is the Area Under Curve (AUC) metric and a Receiver Operator Characteristic curve (ROC). The AUC gives a singular metric of numeric nature against the visual representation of the ROC. The visual graph shows the true positive rate (recall – TPR) against the false positive rate (FPR) of the models’ classifier. The ROC score indicates the average wellness against all available cost ratios between the FP and FN. When the ROC area value is 1.0, this shows a perfect forecast. Additionally, the scores 0.5, 0.65, 0.8 and 0.9 reflects random forecast, average, good and very good respectively. When ROC scores give lower scores than these, it indicates wrongness of the forecast. ROC AUC score offers a trade-off between the recall and specificity metric; therefore, ROC AUC score is threshold-independent (Zou, Omalley and Mauri, 2007). Verbraken et al. 
(2013) raised some doubts about the use of the AUC score, but most peer-reviewed papers still use accuracy and the ROC AUC score, as the latter offers a chance to consider both specificity and sensitivity and to plot their dependency independently of the chosen parameter (Castanedo et al., 2014; Zhang et al., 2014; Lang and Rettenmeier, 2017).

Chapter 4 Analysis, Interpretation and Discussion

4.1 Data exploration
The data analysed in this research is the shopping data of users. It is already split into train and test sets. There are 25 parameters in total, covering user behaviour and other details, on the basis of which it is observed whether an order will be placed or not. There are only three item types: mobile, computer and tablet. All parameters take True or False values, encoded as 1 and 0. The target parameter is "ordered", which indicates whether or not the user ordered an item. The rest of the parameters are predictors. The train and test sets contain 455,401 and 151,655 samples respectively. In the train set the output classes are unevenly distributed: only 19,000 samples belong to class 1, and in the test set class 1 is completely missing.

Figure 4.1 Train Set Class Distribution

The correlation among parameters is plotted as a heatmap to find out which parameters are positively related to each other. Figure 4.2 shows that the basket parameters and checkout are highly correlated with ordered.

Figure 4.2 Correlation of parameters

To observe the data distribution of the different parameters, bar plots and histograms are plotted. The bar plots show that the frequency of the True value is very low, which makes sense because the acquired data is biased towards class 0. More users have shown interest in mobile devices than in computers and tablets.

Figure 4.3 Data distribution

4.2 Pre-processing
All acquired values are either 0 or 1, so all parameters are already on the same scale. Therefore, no pre-processing step is needed.
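The class-distribution check and the correlation with the target described in Section 4.1 can be illustrated with a small sketch. The toy rows and the helper names `class_share` and `pearson` are assumptions made for illustration only; the column names and the sample counts (455,401 training samples, 19,000 of them in class 1) are the ones quoted in the text. For 0/1 columns, the Pearson coefficient computed here is the same quantity as the phi coefficient.

```python
from math import sqrt


def class_share(n_positive, n_total):
    """Fraction of samples that belong to the positive class."""
    return n_positive / n_total


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Counts quoted in Section 4.1.
share_class_1 = class_share(19_000, 455_401)
print(f"class 1 share of the train set: {share_class_1:.2%}")

# Hypothetical toy sessions (1 = True, 0 = False), not the real data.
toy = {
    "saw_checkout": [1, 1, 0, 0, 1, 0, 0, 0],
    "sort_by":      [0, 1, 1, 0, 0, 1, 0, 1],
    "ordered":      [1, 1, 0, 0, 0, 0, 0, 0],
}
for name in ("saw_checkout", "sort_by"):
    print(name, round(pearson(toy[name], toy["ordered"]), 3))
```

With roughly 4% of the training samples in class 1, a model that always predicts class 0 would already be right about 96% of the time, which is why accuracy alone is a weak guide on this dataset.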
4.3 Feature Selection
Correlation: The correlation plotted in Figure 4.2 is used to find out which parameters are important for a user placing an order. We only consider parameters that are positively correlated with the target. These are checked delivery detail, saw checkout, sign in, basket icon click and basket add detail.

K-Selection: Feature selection is very important because, as seen above, many parameters are negatively correlated with each other and a few are also negatively correlated with the ordered parameter. User ids are dropped as they merely duplicate the index values. In the K-selection method, a score is assigned to each parameter (predictor) with respect to the target parameter. This score is measured using the chi-square statistic. A high score means that order placement depends more strongly on that parameter. According to the highest scores, checked delivery details, sign in, saw checkout and the basket parameters are important for placing an order. This answers the sub-research question: how can exploratory data analysis aid with information in solving the prediction problem?

Figure 4.3 showing the feature ranking based on chi_square

The parameters finally selected are checked_delivery_detail, saw_checkout, basket_icon_click, list_size_dropdown, saw_homepage, basket_add_list, basket_add_detail, checked_returns_detail, promo_banner_click, image_picker, sort_by, account_page_click and closed_minibasket_click.

4.4 Data modelling

4.4.1 Method 1
The data analysis showed that the class distribution is uneven and that class 1 observations are missing from the test set. To deal with this issue, modelling is done using two methods. In the first method, the data is taken as it is. Note, however, that the results will not generalise and will be completely biased towards class 0, so even a result of 100% on the test data would only mean that the model is overfitted or biased.
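The K-selection step of Section 4.3 can be sketched as follows. The text does not say which implementation was used, so this is only an illustration under stated assumptions: it scores each 0/1 predictor with the Pearson chi-square statistic of its 2x2 contingency table against the target and keeps the k highest-scoring names. The function names and the toy data are hypothetical. A library routine such as scikit-learn's `SelectKBest` with the `chi2` score function could be used instead, although its statistic is not computed in exactly the same way.

```python
def chi_square_2x2(feature, target):
    """Pearson chi-square statistic for two 0/1 sequences (no continuity correction)."""
    n = len(feature)
    # Observed counts for each (feature value, target value) cell.
    observed = {(f, t): 0 for f in (0, 1) for t in (0, 1)}
    for f, t in zip(feature, target):
        observed[(f, t)] += 1
    stat = 0.0
    for (f, t), obs in observed.items():
        row = observed[(f, 0)] + observed[(f, 1)]  # samples with this feature value
        col = observed[(0, t)] + observed[(1, t)]  # samples with this target value
        expected = row * col / n
        if expected > 0:
            stat += (obs - expected) ** 2 / expected
    return stat


def top_k_features(columns, target, k):
    """Return the k feature names with the highest score, plus all scores."""
    scores = {name: chi_square_2x2(values, target) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: eight sessions, three of which ended in an order.
ordered = [1, 1, 1, 0, 0, 0, 0, 0]
columns = {
    "sort_by":                 [1, 0, 1, 0, 1, 0, 1, 0],
    "saw_checkout":            [1, 1, 0, 1, 0, 0, 0, 0],
    "checked_delivery_detail": [1, 1, 1, 0, 0, 0, 0, 0],
}
selected, scores = top_k_features(columns, ordered, k=2)
print(selected)
for name in selected:
    print(name, round(scores[name], 3))
```

A predictor that matches the target exactly reaches the maximum score (equal to the number of samples), while a predictor that is nearly independent of the target scores close to zero.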
All four models are trained on the train set and achieve around 99% accuracy on the test data. The classification reports and confusion matrices show that the class 1 scores are 0 (as class 1 is missing from the test set) while class 0 accuracy is approximately 99%, which makes these models completely biased. The results are below.

Model type   Accuracy   Precision   Recall   F-measure
SVM          99.25      100         99.26    99.63
DT           99.30      100         99.31    99.65
NB           98.71      100         98.72    99.36
RF           99.28      100         99.28    99.64

Table 4.1 Summary of Classification Report of machine learning methods, all values in % (See Appendix A for raw data).

The decision tree has the highest accuracy of all. It achieved 99.3% accuracy with a precision of 100% and an F-measure of 99.6%. It classified 150,606 observations correctly and mis-classified only 1,049. The parameters found to be important through decision tree training are checked_delivery_detail and saw_checkout, as shown in Figure 4.4.1.1.

Figure 4.4.1.1 showing the important features using decision tree model

SVM achieved 99.2% accuracy, with a precision of 100% and an F1 score of 99.6%. It predicted 1,125 observations wrongly. The checked_delivery_detail and saw_checkout parameters are again found to be important.

Figure 4.4.1.2 showing the most important features using SVM model

Naïve Bayes accuracy is 98.7%, as it predicted 1,942 test observations wrongly. Its precision is 100% and its F-measure is 99.3%. The checked_delivery_detail, sign_in and saw_checkout parameters are found to be important.

Figure 4.4.1.3 showing the most important features at 0 on the x-axis for the Naïve Bayes model.

Random Forest mis-classified 1,090 samples; its accuracy is 99.28%, its precision is 100% and its F1 score is 99.6%. The sign-in, checked delivery details, saw checkout and basket parameters are important. These parameters were also found to be important in the correlation and K-selection analyses.
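The Decision Tree row of Table 4.1 can be checked against the counts quoted above (150,606 correct and 1,049 wrong out of 151,655). The sketch below assumes that the precision and recall in the table refer to class 0, the only class present in the test set, so that every error is a class 0 sample predicted as class 1. That reading is an assumption rather than something the text states, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Metrics from confusion-matrix counts; empty denominators yield 0.0."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Class 0 treated as the class of interest: 150,606 hits, 1,049 misses,
# and no class 1 samples available to be confused with class 0.
dt_class_0 = classification_metrics(tp=150_606, fp=0, fn=1_049, tn=0)

# Class 1 treated as the class of interest: it never occurs in the test set,
# so every class 1 prediction is a false positive.
dt_class_1 = classification_metrics(tp=0, fp=1_049, fn=0, tn=150_606)

for name, value in dt_class_0.items():
    print(f"class 0 {name}: {value * 100:.2f}%")
print(f"class 1 precision: {dt_class_1['precision'] * 100:.2f}%")
```

Under this reading the recall and F-measure come out at 99.31% and 99.65%, matching the table, and precision is 100% simply because no class 1 samples exist. The class 1 view of the same predictions scores 0, which is the bias the text describes.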
Figure 4.4.1.4 showing the most important features using the Random Forest Analysis

The decision tree has the highest performance and is the best model of method 1. The ROC curve cannot be plotted because the test set contains only class 0. This issue is fixed in method 2. This partly answers the sub-research question: which machine learning techniques give the best answers to the prediction problem?

4.4.2 Method 2
In this method, data sampling is used. In data sampling, the given data is either over-sampled or under-sampled. In over-sampling, the number of observations in the class with few observations is increased, whereas in under-sampling, observations of the class with too many samples are dropped from the dataset. Over-sampling is not used here for two reasons. First, we are using classical models and the dataset has more than 400K observations, so over-sampling would produce around 800K samples, which would take too much time to process. Second, so many new class 1 observations would have to be created that the data would become largely artificial. Therefore, the better approach is to under-sample and drop observations randomly to equalize the class distribution. This method cannot be applied to the test data because it contains no class 1 samples. For evaluation we used the hold-out method on the under-sampled data: 20% of the data is randomly taken out as the test set. The models are then trained again. The results obtained are again around 99%, but this time class 1 observations are included and the models predict both classes with about 99% accuracy. These models generalise and can be used. The results are below.

Model type   Accuracy   Precision   Recall   F-measure
SVM          99.18      99.19       99.19    99.19
DT           98.93      98.94       98.94    98.94
NB           98.75      98.76       98.76    98.76
RF           99.04      99.05       99.04    99.04

Table 4.2 Summary of Classification Report of machine learning and statistical methods, all values in % (See Appendix B for raw data).

Decision tree accuracy is 98.93%, its precision is 98.94% and its F-measure is 98.94%.
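The random under-sampling and the 20% hold-out split described for method 2 can be sketched as follows. This is a minimal illustration on toy rows, not the procedure actually run for this study: the function names, the seed and the toy class sizes are assumptions. As a rough consistency check on the reported figures, a test set of 7,638 rows at a 20% split implies about 38,190 balanced rows, i.e. roughly 19,095 per class, which is in line with the class 1 count reported in Section 4.1.

```python
import random


def undersample(rows, label_key, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row in rows:
        by_class[row[label_key]].append(row)
    n_minority = min(len(by_class[0]), len(by_class[1]))
    balanced = []
    for members in by_class.values():
        balanced.extend(rng.sample(members, n_minority))
    rng.shuffle(balanced)
    return balanced


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction of them as the test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical imbalanced toy data: 950 non-orders and 50 orders.
rows = [{"ordered": 0} for _ in range(950)] + [{"ordered": 1} for _ in range(50)]
balanced = undersample(rows, "ordered")
train, test = holdout_split(balanced)
print(len(balanced), len(train), len(test))
```

Because the split is drawn from the balanced data, both classes appear in the held-out set, which is what makes per-class scores and ROC curves computable in method 2.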
Because under-sampling has been applied and 20% of the data is in the test set, the test data contains 7,638 observations in total. Only 81 observations are predicted wrongly during evaluation on the test set. The AUC score for class 0 is 99% and for class 1 it is 98.7%. The most important feature is checked_delivery_detail.

Figure 4.4.2.1 showing the most important feature using decision tree analysis.

SVM achieved 99.18% across the accuracy, precision, recall and F1-score performance metrics; its AUC score for class 0 is 99.3% and for class 1 it is 99%. The most important features are saw_checkout and checked_delivery_detail.

Figure 4.4.2.2 showing the most important features using the SVM model.

Naïve Bayes achieved 98.7% across the accuracy, precision, recall and F1-score performance metrics; its AUC score for class 0 is 98.8% and for class 1 it is 98.6%. The most important features are sign_in, saw_checkout and checked_delivery_detail.

Figure 4.4.2.3 showing the most important features at the x-axis using Naïve Bayes model

RF achieved 99.05% across the accuracy, precision, recall and F1-score performance metrics. It predicted 95 observations wrongly and its AUC score for both classes is 99%. The sign-in, checked delivery details, saw checkout and basket parameters are important.

Figure 4.4.2.4 showing the most important features using Random Forest

4.5 Discussion
To further answer the sub-research question (which machine learning techniques give the best answers to the prediction problem?), the results of all four models are examined with respect to accuracy and AUC. It is clear that SVM out-performed the other models and has the highest performance.
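The AUC scores used in this comparison can be understood through the ranking interpretation of the metric: the AUC is the probability that a randomly chosen positive observation receives a higher score than a randomly chosen negative one. The sketch below implements that definition directly. The function name and the four example scores are assumptions for illustration; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC is undefined when only one class is present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted purchase probabilities for four sessions.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75

# With a single-class test set, as in method 1, the metric cannot be computed.
try:
    roc_auc([0, 0, 0], [0.2, 0.1, 0.3])
except ValueError as err:
    print(err)
```

The second call shows why no ROC curve could be drawn for method 1: without any class 1 observations there are no positive/negative pairs to rank.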
Unlike method 1, the models in method 2 can be used for real-time observations because the data used for training is not biased, both classes were approximately equally present in the test set, and the trained models achieved above 95% accuracy, which makes them suitable for predicting results in real time. The RF model was the second-best model and it captured the same features as the correlation analysis.

In answering the original question (how can a customer be classified as a purchasing or a non-purchasing customer by using the customer activities, i.e. clickstream data, on an online store?), I noticed that the most important parameter across all models and the correlation analysis was checked_delivery_detail, i.e. did the visitor view the delivery FAQ area on a product page? So, we can say that a visiting user who clicks on the delivery FAQ area on a product page has the highest tendency to purchase a product. The marketing manager should therefore spend the most resources on creating advertisements on the checked_delivery_detail page. This can increase sales, as we know that customers who view this page have a higher tendency to purchase products. The marketing manager can also spend more resources on the saw_checkout and sign_in pages, as visitors to these pages also show a tendency to make an order.

In countries with a significant middle class, the one-click shopping culture seems common. Advertisements placed on webpages such as checked_delivery_detail give the purchasing customer easy access to more products, and they also promise higher revenue for the business as long as effective marketing schemes are created. The information obtained from the preferences of purchasing customers can help the department in charge of Artificial Intelligence suggest to the warehousing managers which stock is most likely to be sold by the company.
This will support same-day and instant delivery of products by a company; for example, Amazon uses artificial intelligence and delivers to 72% of its consumers within a day (Whiting, 2021). This strategy can help increase a company's competitive edge and win it a higher market share than its competitors. The marketing management can further develop programs that encourage loyalty among these purchasing customers, such as coupons, discounts and sales offers. They can also use these purchasing customers to collect reviews about their products and services to enable the company to improve its operations.

Chapter 5 Conclusion

5.1 Conclusion
In this research, I made use of a clickstream (activities) dataset from an online store. The dataset reflected data on the customers' propensity to purchase a product. The intention was to be able to classify a customer as a purchasing or a non-purchasing customer. The research showed that, with the use of exploratory analysis and machine learning models, we can classify customers into purchasing and non-purchasing customers. Four different machine learning models were used and the Support Vector Machine was found to give the best results. In the work of Schaeffer and Sanchez (2020), SVM was also found to give the best results in churn prediction, a problem similar to the one in this work.

5.2 Recommendation
The dataset does not include the characteristics of the visiting users. This makes it difficult for the marketing management to narrow the range of their marketing programs. A dataset that includes consumer characteristics can help improve target marketing accuracy. More research should be done that comprises both dataset categories. Furthermore, the dataset comprises data on daily actions on an online website. More data can be collected across different time frames to create timely purchase predictions.
In this way, the company management can have further information on cyclical patterns of customer behaviour. The class distribution is uneven across the dataset, and class 1 observations are not available in the test set. A dataset that is biased in this way leads to very high values across the performance metrics. Purchase prediction is only one classification problem. Another problem is forecasting whether a customer will leave a shopping session. An attempt to solve this problem can help reduce customer churn, as the results can help the management develop marketing programs to retain consumers.

Bibliography
A.Elsalamony, H., 2014. Bank Direct Marketing Analysis of Data Mining Techniques. International Journal of Computer Applications, 85(7), pp.12-22.
Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A., Thirion, B. and Varoquaux, G., 2014. Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8.
Adami, C., 1998. Introduction to Artificial Life.
Al-Mashraie, M., Chung, S. and Jeon, H., 2020. Customer switching behavior analysis in the telecommunication industry via push-pull-mooring framework: A machine learning approach. Computers & Industrial Engineering, 144, p.106476.
Anderson, E. and Mittal, V., 2000. Strengthening the Satisfaction-Profit Chain. Journal of Service Research, 3(2), pp.107-120.
Bahmani, B., Mohammadi, G., Mohammadi, M. and Tavakkoli-Moghaddam, R., 2013. Customer churn prediction using a hybrid method and censored data. Management Science Letters, pp.1345-1352.
Baxter, N., Collings, D. and Adjali, I., 2003. Agent based modelling. BT Technology Journal, 21(2), pp.126-132.
Bellman, S., Lohse, G. and Johnson, E., 1999. Predictors of online buying behavior. Communications of the ACM, 42(12), pp.32-38.
Benoit, D. and Van den Poel, D., 2010. Binary quantile regression: a Bayesian approach based on the asymmetric Laplace distribution.
Journal of Applied Econometrics, 27(7), pp.1174-1188.
Bowen, J. and Chen, S., 2001. The relationship between customer loyalty and customer satisfaction. International journal of contemporary hospitality management, 13(5), pp.213-217.
Breiman, L., 1996. Bagging Predictors. Boston: Kluwer Academic Publishers, pp.123-140.
Breiman, L., 2001. Random Forests. Kluwer Academic Publishers, pp.5-32.
Buckinx, W. and Van den Poel, D., 2005. Customer base analysis: partial defection of behaviourally loyal clients in a non-contractual FMCG retail setting. European Journal of Operational Research, 164(1), pp.252-268.
Cao, J., Yu, X. and Zhang, Z., 2015. Integrating OWA and data mining for analysing customers churn in E-commerce. Journal of Systems Science and Complexity, 28(2), pp.381-392.
Castanedo, F., Valverde, G., Zaratiegui, J. and Vazquez, A., 2014. Using Deep Learning to Predict Customer Churn in a Mobile Telecommunication Network. [ebook] Available at: <http://wiseathena.com> [Accessed 12 June 2021].
Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C. and Wirth, R., 2000. CRISP-DM 1.0: Step-by-step data mining guide. Copenhagen.
Clulow, V., Gerstman, J. and Barry, C., 2003. The resource‐based view and sustainable competitive advantage: the case of a financial services firm. Journal of European Industrial Training, 27(5), pp.220-232.
Coelho, P. and Henseler, J., 2012. Creating customer loyalty through service customization. European Journal of Marketing, 46(3/4), pp.331-356.
Coston, R., 2020. Five Stages of Your Customers’ Buying Journey. [Blog] Jottful, Available at: <https://jottful.com/blog/the-5-stages-of-the-customer-journey/> [Accessed 12 June 2021].
Cortes, C. and Vapnik, V., 1995. Support-vector networks. Machine Learning, 20(3), pp.273-297.
Cronin, J. and Taylor, S., 1992. Measuring Service Quality: A Re-examination and Extension. Journal of Marketing, 56(3), p.55.
Dias, J., Godinho, P. and Torres, P., 2021.
Machine Learning for Customer Churn Prediction in Retail Banking. In: Gervasi O. et al. (eds) Computational Science and Its Applications – ICCSA 2020. ICCSA 2020. Lecture Notes in Computer Science, [online] 12251, pp.576-589. Available at: <https://doi.org/10.1007/978-3-030-58808-3_42> [Accessed 7 June 2021].
Dingli, A., Marmara, V. and Fournier, N., 2017. Comparison of Deep Learning Algorithms to Predict Customer Churn within a Local Retail Industry. International Journal of Machine Learning and Computing, 7(5), pp.128-132.
Epstein, J., 2009. Modelling to contain pandemics. Nature, 460(7256), pp.687-687.
F., S., 2018. Machine-Learning Techniques for Customer Retention: A Comparative Study. International Journal of Advanced Computer Science and Applications, 9(2).
Fajta, J., 2014. Online visitor classification based on mbti model. M.Sc. Technische Universiteit Eindhoven.
Farmer, J. and Foley, D., 2009. The economy needs agent-based modelling. Nature, 460(7256), pp.685-686.
Furness, P., 2001. Techniques for Customer Modelling in CRM. Journal of Financial Services Marketing, 5(4), pp.293-307.
Gerpott, T. and Ahmadi, N., 2015. Regaining drifting mobile communication customers: Predicting the odds of success of winback efforts with competing risks regression. Expert Systems with Applications, 42(21), pp.7917-7928.
Gervasi, O., Murgante, B., Misra, S., Garau, C., Blečić, I., Taniar, D., Apduhan, B., Rocha, A., Tarantino, E., Torre, C. and Karaca, Y., 2020. Computational Science and Its Applications. In: Cagliari, Italy. ICCSA 2020, 20th International Conference.
Gremler, D. and Brown, S., 1996. Service Loyalty: Its Nature, Importance, and Implications.
Han, J., Pei, J. and Kamber, M., 2011. Data Mining: Concepts and Techniques. 3rd ed. New York: Elsevier.
He, B., Shi, Y., Wan, Q. and Zhao, X., 2014. Prediction of Customer Attrition of Commercial Banks based on SVM Model.
In: 2nd International Conference on Information Technology and Quantitative Management, ITQM 2014. [online] Procedia Computer Science, pp.423-430. Available at: <https://doi.org/10.1016/j.procs.2014.05.286> [Accessed 23 June 2021].
Heskett, J. and Sasser, E., 1997. The Service Profit Chain.
Hidasi, B., Karatzoglou, A., Baltrunas, L. and Tikk, D., 2016. Session-based Recommendations with Recurrent Neural Networks. International Conference on Learning Representations (ICLR).
Hoe, L. and Mansori, S., 2018. The Effects of Product Quality on Customer Satisfaction and Loyalty: Evidence from Malaysian Engineering Industry. International Journal of Industrial Marketing, 3(1), p.20.
Hofmann, M., 2006. Support Vector Machines: Kernels and the Kernel Trick. An elaboration for the Hauptseminar “Reading Club: Support Vector Machines”.
Hop, W., 2013. Web-shop order prediction using machine learning. M.Sc. Erasmus University.
Hsu, C., Chang, C. and Lin, C., 2008. A Practical Guide to Support Vector Classification.
Huang, Y. and Kechadi, T., 2013. An effective hybrid learning system for telecommunication churn prediction. Expert Systems with Applications, 40(14), pp.5635-5647.
Jones, T., 1996. Why Satisfied Customers Defect. Journal of Management in Engineering, 12(6), pp.11-11.
Kim, J., Suh, E. and Hwang, H., 2003. A model for evaluating the effectiveness of CRM using the balanced scorecard. Journal of Interactive Marketing, 17(2), pp.5-19.
Kirui, C., Hong, L., Cheruiyot, W. and Kirui, H., 2014. Predicting Customer Churn in Mobile Telephony Industry Using Probabilistic Classifiers in Data Mining. International Journal of Computer Science, (10), pp.165-172.
Kotu, V. and Deshpande, B., 2014. Predictive Analytics and Data Mining. Morgan Kaufmann.
Korpusik, M., Sakaki, S., Chen, F. and Chen, Y., 2016. Recurrent Neural Networks for Customer Purchase Prediction on Twitter. Boston: CBRecSys: Workshop on New Trends in Content-Based Recommender Systems at ACM Recommender Systems Conference.
Lang, T. and Rettenmeier, M., 2017. Understanding Consumer Behavior with Recurrent Neural Networks.
Lemmens, A. and Croux, C., 2006. Bagging and Boosting Classification Trees to Predict Churn. Journal of Marketing Research, 43(2), pp.276-286.
Lo, C., Frankowski, D. and Leskovec, J., 2016. Understanding Behaviors that Lead to Purchasing: A Case Study of Pinterest. KDD Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge and Data Mining, pp.531-540.
Lovelock, C., 1983. Classifying Services to Gain Strategic Marketing Insights. Journal of Marketing, 47(3), p.9.
Ma, H., 1999. Anatomy of competitive advantage: a SELECT framework. Management Decision, 37(9), pp.709-718.
Macal, C. and North, M., 2010. Tutorial on agent-based modelling and simulation. Journal of Simulation, 4(3), pp.151-162.
Magrabi, A., 2016. Top 5 Machine Learning Applications for E-Commerce. [Blog] techblog.commercetools, Available at: <https://techblog.commercetools.com/top-5-machine-learning-applications-for-e-commerce-268eb1c89607> [Accessed 14 June 2021].
Manning, C., Raghavan, P. and Schütze, H., 2008. Introduction to Information Retrieval. Cambridge: Cambridge University Press.
Martínez, A., Schmuck, C., Pereverzyev, S., Pirker, C. and Haltmeier, M., 2020. A machine learning framework for customer purchase prediction in the non-contractual setting. European Journal of Operational Research, 281(3), pp.588-596.
McMullan, R. and Gilmore, A., 2008. Customer loyalty: an empirical study. European Journal of Marketing, 42(9/10), pp.1084-1094.
Mgbemena, C., 2016. A Data-driven Framework for Investigating Customer Retention. Doctor of Philosophy (Ph.D). Brunel University London.
Miguéis, V., Van den Poel, D., Camanho, A. and Falcão e Cunha, J., 2012. Modeling partial customer churn: On the value of first product-category purchase sequences. Expert Systems with Applications, 39(12), pp.11250-11256.
Moe, W., 2003.
Buying, Searching, or Browsing: Differentiating Between Online Shoppers Using In-Store Navigational Clickstream. Journal of Consumer Psychology, 13(1), pp.29-39.
Moore, K., 2021. How Emerging Marketplaces Are Simplifying (And Gamifying) Online Shopping. [online] Forbes. Available at: <https://www.forbes.com/sites/kaleighmoore/2020/10/21/how-emerging-marketplaces-are-simplifying-and-gamifying-online-shopping/?sh=17bf0b2164d7> [Accessed 7 June 2021].
Moro, S., Cortez, P. and Rita, P., 2014. A data-driven approach to predict the success of bank telemarketing. Decision Support Systems, 62, pp.22-31.
Mutanen, T., Nousiainen, S. and Ahola, J., 2010. Customer churn prediction – a case study in retail banking.
Nakayama, Y., 2009. The impact of e-commerce: It always benefits consumers, but may reduce social welfare. Japan and the World Economy, 21(3), pp.239-247.
Neslin, S., Gupta, S., Kamakura, W., Lu, J. and Mason, C., 2006. Defection Detection: Measuring and Understanding the Predictive Accuracy of Customer Churn Models. Journal of Marketing Research, 43(2), pp.204-211.
Niu, X., Li, C. and Lu, X., 2017. Predictive analytics of e-commerce search behavior for conversion. In: Americas Conference on Information Systems (AMCIS) Data Science and Analytics for Decision Support (SIGDSA). Americas Conference on Information Systems (AMCIS).
North, M., Macal, C., Aubin, J., Thimmapuram, P., Bragen, M., Hahn, J., Karr, J., Brigham, N., Lacy, M. and Hampton, D., 2010. Multiscale agent-based consumer market modeling. Complexity, p.NA-NA.
Oliver, R., 2010. Satisfaction. 1st ed. New York: Routledge.
Peppard, J., 2000. Customer Relationship Management (CRM) in financial services. European Management Journal, 18(3), pp.312-327.
Piatetsky, G., 2021. CRISP-DM, still the top methodology for analytics, data mining, or data science projects - KDnuggets. [online] KDnuggets.
Available at: <https://www.kdnuggets.com/2014/10/crisp-dm-top-methodology-analytics-data-mining-data-science-projects.html> [Accessed 12 June 2021].
Python.org. 2021. Python Documentation by Version. [online] Available at: <https://www.python.org/doc/versions/> [Accessed 7 June 2021].
Requena, B., Cassani, G., Tagliabue, J., Greco, C. and Lacasa, L., 2020. Shopper intent prediction from clickstream e-commerce data with minimal browsing information. Scientific Reports, 10(1).
Rokach, L. and Maimon, O., 2014. Data Mining with Decision Trees. 2nd ed. Singapore: World Scientific Publishing Company.
Salehi, F., Abdollahbeigi, B., Langroudi, A. and Salehi, F., 2012. The Impact of Website Information Convenience on E-commerce Success of Companies. Procedia - Social and Behavioral Sciences, 57, pp.381-387.
Schaeffer, S. and Rodriguez Sanchez, S., 2020. Forecasting client retention: A machine-learning approach. Journal of Retailing and Consumer Services, 52, p.101918.
Sieppel, H., 2018. Customer purchase prediction through machine learning. M.Sc. University of Twente.
Statista. 2021. Global retail e-commerce market size 2014-2024 | Statista. [online] Available at: <https://www.statista.com/statistics/379046/worldwide-retail-e-commerce-sales/> [Accessed 7 June 2021].
Sower, V., Duffy, J., Kilbourne, W., Kohers, G. and Jones, P., 2001. The Dimensions of Service Quality For Hospitals: Development and Use of the KQCAH Scale. Health Care Management Review, 26(2), pp.47-59.
Suchacka, G. and Stemplewski, S., 2017. Application of Neural Network to Predict Purchases in Online Store. In: Information Systems Architecture and Technology: Proceedings of 37th International Conference on Information Systems Architecture and Technology – ISAT 2016 – Part IV. Advances in Intelligent Systems and Computing, vol 524. [online] Cham: Springer. Available at: <https://doi.org/10.1007/978-3-319-46592-0_19> [Accessed 14 June 2021].
Suchacka, G., 2014.
Analysis of Aggregated Bot and Human Traffic on E-Commerce Site. In: Federated Conference on Computer Science and Information Systems. [online] Poland: IEEE, pp.1123-1130. Available at: <https://doi.org/10.15439/2014F346> [Accessed 14 June 2021].
Sundarkumar, G. and Ravi, V., 2015. A novel hybrid undersampling method for mining unbalanced datasets in banking and insurance. Engineering Applications of Artificial Intelligence, 37, pp.368-377.
Tan, P., Steinbach, M., Karpatne, A. and Kumar, V., n.d. Introduction to data mining.
Tan, Y., Xu, X. and Liu, Y., 2016. Improved Recurrent Neural Networks for Session-based Recommendations. In: Deep Learning for Recommender Systems. [online] Boston: ACM, pp.17-22. Available at: <https://dl.acm.org/doi/10.1145/2988450.2988452> [Accessed 22 June 2021].
Tounsi, Y., Hassouni, L. and Anoun, H., 2017. Credit scoring in the age of Big Data – A State-of-the-Art. International Journal of Computer Science and Information Security (IJCSIS), 15(7), pp.134-145.
Twomey, P. and Cadman, R., 2002. Agent‐based modelling of customer behaviour in the telecoms and media markets. info, 4(1), pp.56-63.
Ullah, I., Raza, B., Malik, A., Imran, M., Islam, S. and Kim, S., 2019. A Churn Prediction Model Using Random Forest: Analysis of Machine Learning Techniques for Churn Prediction and Factor Identification in Telecom Sector. IEEE Access, 7, pp.60134-60149.
Vafeiadis, T., Diamantaras, K., Sarigiannidis, G. and Chatzisavvas, K., 2015. A comparison of machine learning techniques for customer churn prediction. Simulation Modelling Practice and Theory, 55, pp.1-9.
Verbraken, T., Verbeke, W. and Baesens, B., 2013. A Novel Profit Maximizing Metric for Measuring Classification Performance of Customer Churn Prediction Models. IEEE Transactions on Knowledge and Data Engineering, 25(5), pp.961-973.
Wang, X., Ryoo, J., Bendle, N. and Kopalle, P., 2020. The role of machine learning analytics and metrics in retailing research. Journal of Retailing.
Whiting, K., 2021.
Online shopping is polluting the planet - but it's not too late. [online] World Economic Forum. Available at: <https://www.weforum.org/agenda/2020/01/carbon-emissions-online-shopping-solutions/> [Accessed 7 June 2021].
Woodruff, R., 1997. Customer value: The next source for competitive advantage. Journal of the Academy of Marketing Science, 25(2), pp.139-153.
Yang, Z. and Peterson, R., 2004. Customer perceived value, satisfaction, and loyalty: The role of switching costs. Psychology and Marketing, 21(10), pp.799-822.
Ye, Y., Liu, S. and Li, J., 2008. A Multiclass Machine Learning Approach to Credit Rating Prediction. In: International Symposiums on Information Processing, [online] IEEE, pp.57-61. Available at: <https://doi.org/10.1109/ISIP.2008.37> [Accessed 23 June 2021].
Zhang, Y., Dai, H., Xu, C., Feng, J., Wang, T., Bian, J., Wang, B. and Liu, T., 2014. Sequential click prediction for sponsored search with recurrent neural networks. Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence.
Zhou, Z. and Feng, J., 2017. Deep Forest: Towards an Alternative to Deep Neural Networks. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), pp.3553-3559.
Zou, K., O’Malley, A. and Mauri, L., 2007. Receiver-Operating Characteristic Analysis for Evaluating Diagnostic Tests and Predictive Models. Circulation, 115(5), pp.654-657.

Appendix

A Method 1 Analysis
A.1 Decision Tree Classification Report and Confusion Matrix
A.2 Support Vector Machine Classification Report and Confusion Matrix
A.3 Naïve Bayes Classification Report and Confusion Matrix
A.4 Random Forest Classification Report and Confusion Matrix

B Method 2 Analysis
B.1 Decision Tree Classification Report and Confusion Matrix
Decision Tree AUC Curve
B.2 Support Vector Machine Classification Report and Confusion Matrix
SVM ROC Curve
B.3 Naïve Bayes Classification Report and Confusion Matrix
Naïve Bayes ROC Curve
B.4 Random Forest Classification Report and Confusion Matrix
ROC Curve