IM S5028 Customer Analytics
Data Mining Techniques and applications to CRM: decision trees and neural networks

Data mining techniques
• Data mining, or knowledge discovery, is the process of discovering valid, novel and useful patterns in large data sets
• Many different data mining algorithms exist:
  – Statistics
  – Decision trees
  – Neural networks
  – Clustering algorithms
  – Rule induction algorithms
  – Rough sets
  – Genetic algorithms
• Decision trees and neural networks are widely used; they are part of most data mining tools

Supervised vs unsupervised techniques
• Supervised learning techniques are guided by a known output (decision trees, most neural network types)
• Unsupervised learning techniques use inputs only and rely on similarity measures:
  – Clustering algorithms
  – Kohonen Feature Maps (a type of neural network)

Decision trees
• Tree-shaped structures
• Can be converted to rules

Decision trees
• Decision trees are built by an iterative process of splitting the data into partitions
• Many different algorithms exist; the most common are ID3, C4.5, CART and CHAID
• The algorithms differ in the number of splits and in the diversity function used, e.g. the Gini index or entropy

Decision tree example (adapted from Information Discovery Inc., 1996)
Develop a tree to predict Profit from the following table:

  Manufacturer  State  City         Product Color  Profit
  Smith         CA     Los Angeles  Blue           High
  Smith         AZ     Flagstaff    Green          Low
  Adams         NY     NYC          Blue           High
  Adams         AZ     Flagstaff    Red            Low
  Johnson       NY     NYC          Green          Avg.
  Johnson       CA     Los Angeles  Red            Avg.

First step: the tree algorithm splits the original table into three tables using the "State" attribute.

  Table 1A: For State = AZ (every row has Profit = Low, so this partition is classified)
  Manufacturer  State  City       Product Color  Profit
  Smith         AZ     Flagstaff  Green          Low
  Adams         AZ     Flagstaff  Red            Low

  Table 1B: For State = CA (mixed, must be split further)
  Manufacturer  State  City         Product Color  Profit
  Smith         CA     Los Angeles  Blue           High
  Johnson       CA     Los Angeles  Red            Avg.

  Table 1C: For State = NY (mixed, must be split further)
  Manufacturer  State  City  Product Color  Profit
  Adams         NY     NYC   Blue           High
  Johnson       NY     NYC   Green          Avg.

[Figure: a decision tree derived from this table]

Corresponding rules
• The decision tree above can be translated into a set of rules as follows:
  1. IF State = AZ THEN Profit = Low
  2. IF State = CA AND Manufacturer = Smith THEN Profit = High
  3. IF State = CA AND Manufacturer = Johnson THEN Profit = Avg
  4. IF State = NY AND Manufacturer = Adams THEN Profit = High
  5. IF State = NY AND Manufacturer = Johnson THEN Profit = Avg
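To make the splitting process concrete, here is a minimal sketch (an illustration only, not the lecture's tool; it assumes Python with the pandas library installed, and the column names mirror the example table) that reproduces the first split on "State", treats the pure AZ partition as a leaf, and splits the mixed CA and NY partitions on "Manufacturer":

```python
import pandas as pd

# The example table from the slide above.
data = pd.DataFrame({
    "Manufacturer": ["Smith", "Smith", "Adams", "Adams", "Johnson", "Johnson"],
    "State": ["CA", "AZ", "NY", "AZ", "NY", "CA"],
    "City": ["Los Angeles", "Flagstaff", "NYC", "Flagstaff", "NYC", "Los Angeles"],
    "Product Color": ["Blue", "Green", "Blue", "Red", "Green", "Red"],
    "Profit": ["High", "Low", "High", "Low", "Avg", "Avg"],
})

# First step: partition the table on "State", as in Tables 1A-1C above.
for state, part in data.groupby("State"):
    outcomes = part["Profit"].unique()
    if len(outcomes) == 1:
        # Pure partition (Table 1A): it becomes a leaf and yields a rule directly.
        print(f"IF State = {state} THEN Profit = {outcomes[0]}")
    else:
        # Mixed partition (Tables 1B and 1C): split again on "Manufacturer".
        for maker, leaf in part.groupby("Manufacturer"):
            profit = leaf["Profit"].iloc[0]
            print(f"IF State = {state} AND Manufacturer = {maker} THEN Profit = {profit}")
```

The five printed rules match the set above (pandas emits them in alphabetical order); the tree itself is just this nesting of partitions.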
Tree 2
[Figure: an alternative decision tree for the same table]

Exercise
• Note: different trees are possible
  1. Derive a different tree using the "Manufacturer" attribute first and then "State"
  2. Derive a tree starting with the "Colour" attribute
• Represent the trees as rules
• Compare the rules

Classification And Regression Trees (CART) algorithm
• Based on binary recursive partitioning
• Looks at all possible splits for all variables and searches through them all
• Rank-orders each splitting rule on the basis of a quality-of-split criterion, measured by a diversity function (e.g. the Gini rule or entropy): how well the splitting rule separates the classes contained in the parent node

Entropy
• Entropy is a measure of disorder
• If an object can be classified into n classes (C1, ..., Cn) and the probability of an object belonging to class Ci is p(Ci), the entropy of the classification is

  H = − Σ(i=1..n) p(Ci) log2 p(Ci)

CART: Classification And Regression Trees (continued)
• Once the best split is found, the search is repeated for each child node, until further splitting is impossible or some other stopping criterion is met
• The tree is then tested on a testing sample (prediction/classification rate, error rate)

Decision trees: applications
• Decision trees can be used for:
  – Prediction (e.g. churn prediction)
  – Classification (e.g. into good and bad accounts)
  – Exploration
  – Segmentation

Decision trees can be used for segmentation
[Figure: a tree splitting All Customers by age (Young / Old) and income (high / low) into Segment 1, Segment 2 and Segment 3 (Zikmund et al 2003)]

Decision trees can be used for churn modelling (adapted from Berson et al, 2000)

  All customers: 50 churners / 50 non-churners
    New technology? = new: 30 churners / 50 non-churners
      Years as customer <= 2.3: 25 churners / 10 non-churners
        Age <= 45: 20 churners / 0 non-churners
        Age > 45: 5 churners / 10 non-churners
      Years as customer > 2.3: 5 churners / 40 non-churners
    New technology? = old: 20 churners / 0 non-churners

Case study: data analysis for marketing
• Groth resources: ftp://ftp.prenhall.com/pub/ptr/c++_programming.w-050/groth/
• Case study 8

Data mining exercises: a tutorial in building models using decision trees
• Students are to complete the decision tree tutorial from the Groth text, pp. 127-147
• Tool: KnowledgeSEEKER
• Download and install it (next week's tutorial) from ftp://ftp.prenhall.com/pub/ptr/c++_programming.w-050/groth/

References
• Groth R. (2000), Data Mining
• Also:
  – "Rules are Much More than Decision Trees", Information Discovery Inc.
  – www.thearling.com
  – Salford Systems White Paper Series, "An Overview of CART Methodology"
  – Berson A., Smith S., Thearling K. (2000), Building Data Mining Applications for CRM, McGraw-Hill
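Finally, to connect the entropy formula to CART's quality-of-split ranking, here is a minimal sketch (plain Python 3, no external libraries; the data is the profit example from earlier, and the function names are illustrative, not taken from KnowledgeSEEKER or any other tool) that scores each candidate first split by weighted entropy and weighted Gini:

```python
import math
from collections import Counter

# The profit example table: (Manufacturer, State, Product Color, Profit).
rows = [
    ("Smith",   "CA", "Blue",  "High"),
    ("Smith",   "AZ", "Green", "Low"),
    ("Adams",   "NY", "Blue",  "High"),
    ("Adams",   "AZ", "Red",   "Low"),
    ("Johnson", "NY", "Green", "Avg"),
    ("Johnson", "CA", "Red",   "Avg"),
]
ATTRS = {"Manufacturer": 0, "State": 1, "Product Color": 2}
PROFIT = 3

def entropy(labels):
    # H = -sum p(Ci) * log2 p(Ci), the formula above; 0 means a pure node.
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    # Gini index = 1 - sum p(Ci)^2, the other diversity function mentioned.
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

print("Root entropy:", round(entropy([r[PROFIT] for r in rows]), 3))

# Score each candidate split by the weighted diversity of its partitions;
# lower is better, and a split into pure partitions scores 0.
for name, col in ATTRS.items():
    parts = {}
    for r in rows:
        parts.setdefault(r[col], []).append(r[PROFIT])
    for fn in (entropy, gini):
        score = sum(len(p) / len(rows) * fn(p) for p in parts.values())
        print(f"Split on {name}: weighted {fn.__name__} = {round(score, 3)}")

# On this tiny table every attribute scores the same (weighted entropy 0.667,
# weighted Gini 0.333), which is why the exercise above can produce several
# different but equally valid trees.
```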