Data Mining
Executive Overview

Alan Montgomery
VP Business Development, SPSS
amontgomery@spss.com

“Data mining makes the difference”

Agenda
• What is data mining?
• Who is using data mining, and for what?
• How data mining fits into an IT system
• Some myths about data mining

Information: Internet
• SPSS: http://www.spss.com
• Two Crows Corp (Herb Edelstein): http://www.twocrows.com
• Andy Pryke’s Data Mine: http://www.cs.bham.ac.uk/~anp/TheDataMine.html
• Knowledge Discovery Mine: http://www.kdnuggets.com

Bibliography (by Herb Edelstein)
• M. Berry, G. Linoff, Data Mining Techniques, John Wiley, 1997
• William S. Cleveland, The Elements of Graphing Data, Hobart Press, 1994
• Howard Wainer, Visual Revelations, Copernicus, 1997
• R. Kennedy, Lee, Reed, Van Roy, Solving Pattern Recognition Problems, Prentice Hall, 1998
• U. Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy, Advances in Knowledge Discovery and Data Mining, MIT Press, 1996
• Dorian Pyle, Data Preparation for Data Mining, Morgan Kaufmann, 1999
• C. Westphal, T. Blaxton, Data Mining Solutions, John Wiley, 1998
• Vasant Dhar, Roger Stein, Seven Methods for Transforming Corporate Data into Business Intelligence, Prentice Hall, 1997
• Joseph P. Bigus, Data Mining With Neural Networks, McGraw-Hill, 1996
• L. Breiman, Friedman, Olshen, Stone, Classification and Regression Trees, Wadsworth, 1984
• J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1992

Data holds Knowledge
• Data can hold an organization’s operations history: what we did . . . and what the outcome was
• Can we find which actions gave good (bad) outcomes?
• If so, we can learn from our past failures and successes to do better in future.

What we learn from data
• Marketing - who’s likely to buy?
• Forecasts - what demand will we have?
• Loyalty - who’s likely to defect?
• Credit - which loans were profitable?
• Fraud - when did it occur?
In each case can we:
• find the signs?
• . . . find others showing similar signs?
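The two questions above (find the signs, then find others showing similar signs) can be illustrated with a very small Python sketch. Everything in it is an assumption made for illustration: the records, the field names (`age_band`, `claims`, `defected`) and the helper functions `find_signs` and `find_similar` are hypothetical and do not come from the presentation or from any SPSS product. Real data mining tools use far richer models than this simple rate count.

```python
from collections import defaultdict

# Hypothetical history: what we knew about each past customer, and the outcome.
HISTORY = [
    {"age_band": "18-25", "claims": "many", "defected": True},
    {"age_band": "18-25", "claims": "many", "defected": True},
    {"age_band": "18-25", "claims": "few", "defected": False},
    {"age_band": "40-60", "claims": "few", "defected": False},
    {"age_band": "40-60", "claims": "many", "defected": True},
    {"age_band": "40-60", "claims": "few", "defected": False},
]

# Hypothetical current customers whose outcome is not yet known.
CURRENT = [
    {"name": "A. Example", "age_band": "40-60", "claims": "many"},
    {"name": "B. Example", "age_band": "18-25", "claims": "few"},
]


def find_signs(history, outcome_key, min_rate=0.75, min_count=2):
    """Find the signs: attribute values whose outcome rate is at least min_rate.

    Returns a dict mapping (field, value) to the observed outcome rate.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in history:
        for field, value in record.items():
            if field == outcome_key:
                continue
            totals[(field, value)] += 1
            if record[outcome_key]:
                hits[(field, value)] += 1
    signs = {}
    for key, total in totals.items():
        rate = hits[key] / total
        # Ignore values seen too rarely to be trusted.
        if total >= min_count and rate >= min_rate:
            signs[key] = rate
    return signs


def find_similar(customers, signs):
    """Find others showing similar signs: customers matching at least one sign."""
    return [
        customer["name"]
        for customer in customers
        if any(customer.get(field) == value for field, value in signs)
    ]


if __name__ == "__main__":
    signs = find_signs(HISTORY, "defected")
    print("Signs of defection:", signs)
    print("Customers showing the signs:", find_similar(CURRENT, signs))
```

The sketch keeps the two steps separate on purpose: the signs are learned once from history where the outcome is known, and are then applied to customers whose outcome is still open.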
Data mining is natural
• This process is simply “learning from experience”.
• It is a totally natural and routine part of every successful business.
• Data mining just helps you do it more quickly, accurately, and systematically.

An example
Winterthur Insurance, Spain

Winterthur: Customer Loyalty or “Churn”
• Churn is a common data mining issue.
• What’s at stake? Losing car insurance clients at a rate of 13.25% a year ($$$$).
• Business goal: retain profitable clients.
• Data mining goal: predict which clients are likely to resign their policy.
• Winterthur can then take action.

Approach to churn
• Select data on customers who resigned
• Divide this sample into:
  – a training set to learn from;
  – a test set to check the results.
• Compare leavers in the training set with similar customers who did not leave.
• Learn the signature of likely churners.

Winterthur Application
• Two complementary approaches
• In both we learn from a training set, and build a model.
  1 Classify customers into leavers and non-leavers. Model gives Yes/No answer.
  2 Predict “likelihood” of people leaving. Generates a “propensity to leave”, or “score”, for each case. Model gives numeric answer.

Winterthur Results
Result on churn classification.
• Achieved > 91.5% accuracy predicting churn (Yes/No) on the test set.
• This was 20% better than next competitor!

Summary
• Data mining means
  – finding patterns in your data
  – which you can use
  – to do your business better.
• Decisions from data
• It is a completely natural business process
• . . . with a very wide range of applicability.

Applications of Data Mining
Four Case Studies
• Reuters
• BBC
• Halfords
• Survey of other users and applications

Reuters
Validating Forex Data
• Reuters gets currency prices from many sources
• May contain errors
• Easy to spot afterwards (spikes, dips)
• Conventional checking systems spot only obvious errors
• What’s at stake?
  Reuters’ reputation, and therefore sales

Reuters Real-time Forex Data
[Figure: two plots of the £/$ rate against time up to “NOW”, one labelled OK and one labelled ERROR]

Reuters - Validating Forex Data
• Used historical Forex data
• Derived dynamic, time-based descriptors
• Built models (neural networks, rules) to predict price movements
• Report deviations from predictions

BBC TV Audience Prediction
• What’s at stake? Survival of BBC!
• Business goal – increase audience for TV programs
• Proposed business action – better scheduling of programs
• Data mining goal – predict audience share a programme will achieve in a particular slot

BBC Results
• Neural network trained on 1 year’s data
  – predicts audience share within 4%
  – equals best (> 2 years) human schedulers
• Some problem programmes
  – human schedulers had same problems!
• Rules gave insight into “reasons”
• . . . but beware of reasons . . .

Take care with “explanations”
• “Any program (X) which follows a UK “soap” will achieve 6% less share than if X is put anywhere else”
• So UK “soaps” cause the audience to turn off??
• No! The competition is at work!

Halfords - Predicting Sales
• Halfords are a retail organization
• . . . planning to open new stores
• What’s at stake? $10M investment / store
• Goal: predict sales from a new store
• 500 stores to learn from, many factors:
  – site, competition, catchment area, management practice, . . .

Halfords - Predicting Sales
• Clementine models much more accurate than previous statistical models
[Figure: two plots of predicted sales against actual sales, one for the Clementine model (3w) and one for the regression model (6m)]

Who is using data mining?
Manufacturing
• Daimler Benz
• Ford
• British Steel
• Caterpillar
• Du Pont
• Unilever

Finance
• Reuters
• Barclays
• National Westminster
• Citibank

Retail
• Boots
• Tandy
• ICL Retail
• Halfords

Pharmaceutical
• Glaxo-Wellcome
• Pfizer

Government
• HM Customs & Excise
• IRS
• The Home Office
• DERA

Telcos
• AT&T
• Cable & Wireless
• Cellnet
• Airtouch Cellular
• Singapore Telecoms

Value of Reducing Attrition by 5%
[Figure: bar chart of increase in profitability (%, axis 0 to 100) by industry: Auto/Home Insurance, Branch Bank Deposits, Credit Card, Industrial Brokerage, Industrial Distribution, Life Insurance, Publishing, Software]
Based on The Loyalty Effect; Frederick F. Reichheld, Thomas Teal; Harvard Business School Press, 1996

Two Crows Survey Results
[Figure: bar chart of % of respondents (axis 0 to 80) by type of application: credit risk analysis, fraud detection, attrition management, market basket analysis, targeted marketing, customer profiling]

Evolution of Marketing
• Market products to
  – Everyone
  – Segments
  – Customers based on behavior (RFM: recency, frequency, monetary value)
  – Customers and non-customers based on demographics and psychographics

Evolution of Marketing Technology
• Mailing list management
• Ad-hoc segmentation
• RFM
• Statistical selection: clustering, regression, logistic regression, etc.
• Statistical selection: CHAID
• Statistical selection: data mining

Lift
• Lift measures the improvement between two treatments of the data
[Figure: number of responses (0 to 12,000) against size of mailing (0 to 1,000 thousand), comparing Random and Scored selection]

Return on Investment
[Figure: ROI (-40% to 100%) against % of total population (0 to 100), comparing Random and Scored selection]

Typical Applications
• Finance and Financial Services
  – Lending risk assessment
  – Prediction of customer profitability
  – Targeting direct marketing
  – Predicting market rates
  – Fraud detection
  – Calculating insurance claim profiles

Typical Applications
• Utilities
  – Electricity demand forecasting
  – Modeling energy pricing
  – Developing control algorithms
• Retail
  – “Basket Analysis” (shopping patterns)
  – Promotions analysis
  – Analysis of personnel data

Typical Applications
• Science and Healthcare
  – Drug discovery
  – Predicting corrosivity of chemicals
  – Assessing treatment effectiveness
  – Monitoring intensive care patients
  – Predicting crop yield from environmental factors
  – Choosing dental treatment for children
  – Predicting recovery time
  – Analysis of child care projects

Typical Applications
• Market Research
  – Increasing response rates to surveys
  – Estimating missing values in data
• Manufacturing/Defence
  – Analyzing equipment failures
  – Managing spares, warranty claims, recalls
  – Quality management
  – Supply logistics

Customer relationships
• Profit modeling
  – which customers generate most, or least, profit?
• Forecasting
  – what demand will we have?
• Loyalty
  – who’s likely to defect?
• Credit analysis
  – what loans are the most risky?
• Fraud detection
  – when did it occur; what were the signs?
  – do others show the same signs?

Summary
• Data mining has a very broad range of applications
• It is already being used by leading companies in many sectors world-wide

Agenda
• What is data mining?
• Who is using data mining, and for what?
• Systems Architecture for data mining
• Some myths about data mining

Recall the decision-value pyramid
[Figure: pyramid of increasing decision value, from top to bottom]
• Knowledge: Data Mining
• Management information: RD/B, EIS, OLAP
• Data from operational systems: TPS, D/B, Management Reports

“Typical” multi-level IS
[Figure: Orders, Receipts and Invoices feed Transaction Databases, which feed a Data Warehouse and Data Marts; these serve Operations management, Supervisory Management and Strategy]
• Transaction databases
  – Designed for: short transactions, resilience.
  – Big danger: killer SQL query
• Data warehouse
  – Designed for: killer SQL query.
  – Big dangers: size? politics? unclean data?

BI architecture
[Figure: BI architecture, with these labelled groups]
• Data sources: ERP systems, legacy databases, functional department systems, other transaction systems, external data, data collection software
• Data preparation: extract, cleanse, transform, enrich, impute, calculate, load, manage
• Data storage: data warehouse, data marts
• Data analysis & data mining: segmentation, classification, profiling, scoring, forecasting, simulation, optimization, exception detection, pattern recognition
• Deployment: reporting, paper reports, OLAP, web server, browser, desktop software
• Users: model builders, knowledge workers, information consumers
• Services / application development / prototyping

DM in an Information System
• The only requirements for data mining are
  – a business problem
  – some relevant data
• The data can come from any data source
• . . . or combination of data sources
• Successful data mining requires two viewpoints
  – knowledge of the business meaning of the data
  – some common-sense analytical knowledge

Data Mining Process in a multi-level IS
[Figure: Transaction Databases, Data Warehouse, Data Marts and other data (e.g. geographic, demographic, etc.) feed the data mining process, leading to “Eureka??”]
[Figure inputs: Orders, Receipts, Invoices]

Business intelligence tools
The data “mine”
[Figure: a spectrum of tools, from user driven to automatic]
• Query, SQL, Spreadsheets: user driven, low dimensionality, little predictive value
• On Line Analytical Processing (OLAP), data visualisation, statistics
• Tree builders, rule induction, neural networks: automatic, high dimensionality, non-linear relations, highly predictive

Business intelligence compared
• Query/Reporting
  – Validation driven
  – Manual
  – ‘What were sales of product X in October’
  – Output: reports & graphs
• OLAP
  – Visualisation-driven
  – Manual
  – ‘Drill down October Sales of product X at 4% profit level, all regions’
  – Output: reports & graphs
• Data Mining
  – Goal-driven
  – Automatic
  – Goal = ‘significant loss’: ‘If period = week 40 and product = BBQ then profit level = significant loss’
  – Output: executable decision model

Discovered Knowledge
is a non-trivial pattern in data
• classification: these people will buy; those people will not
• association: people who buy beer also buy nuts
• sequence: after marriage, people buy insurance
• clustering/segmentation: health, convenience, luxury food eaters . . .

Select appropriate modeling technique
• Categorize your customers or clients: Classification
  – rule induction, neural networks, tree generators
• Forecast future sales or usage: Prediction
  – rule induction, neural networks, regression
• Group similar customers or clients: Segmentation
  – Kohonen networks, rule induction, k-means
• Discover products that are purchased together: Association
  – web diagrams, Apriori, rule induction
• Find patterns and trends over time: Sequence
  – trend functions, rule induction, neural networks

Decision models
• The ideal result is actionable knowledge
• … executable software which makes a decision
  – market to these people out of the list
  – accept/decline this loan application
  – predicted revenue from this store is $205M
  – weight this premium by -5%
  – sales in this area are below par: investigate!
• Models (software agents) can be deployed wherever appropriate in the existing IS

Models deployed in an IS
Decision models (“agents”) in action
[Figure: models deployed across Transaction Databases (fed by Orders, Receipts, Invoices), Data Warehouse, Data Marts and Reports; a model is also used for a new process (New product? New promotion?)]

Warehousing and mining
• Data Warehouse: $0.5-5M
  – Storage, Management, Organisation, Control
• Data Mining: $30-200K
  – Discovery, Understanding, Modelling
• Warehouse not required for data mining...
• ... but it is usually an excellent platform
• Warehouse cleans data and solves politics
  – mine first, learn what the warehouse should hold
  – mine first, use the savings to pay for warehouse!

Data mining is natural
• DM automates the oldest, most natural process: learning from experience
• Finds models of best business practice that can be deployed throughout the enterprise
[Figure: enterprise learning feedback loop: Data, Data Mining, Deploy models for best practice]

The Vision
decision-enabled enterprises that continually adapt to new customer and market situations

Summary of this section
• Data mining automates “learning from experience”
• . . . helps create organizations that adapt
• there is no limit to the number of applications
• only requirement is business problem plus relevant data
• results can be reports, but better as active best practice models learned from data
• models provide benefit only when deployed!
• you don’t need to have a warehouse, . . . but it can help.

Agenda
• What is data mining?
• Who is using data mining, and for what?
• How data mining fits into an IT system
• Some myths about data mining

Data mining myths
• Myth: “data mining is something algorithms do to large volumes of data; algorithms can discover new knowledge”
• Fact: “Data mining is something people do on their businesses.” High-value results are often obtained with modest amounts of data.
• Myth: Data mining requires a high degree of analytical skills (e.g.
  a PhD in statistics)
• Fact: The best data miner is someone who knows and understands the business.

Data mining vendors: the myth-makers!
• Vendors position DM to sell their:
  – parallel machines or large disks
  – expensive parallel algorithms
  – dramatic visualisation
  – high-power external consulting
• Some problems need these (and their cost); many do not.

Mine data intelligently
• Data mining is not blundering blindly about in data using the most powerful shovel (algorithm).
• Though it is smart to have a lot of quality tools (algorithms) available.
• Contrast:
  – hydraulic mining by washing away mountains
  – mining by intelligent prospecting

Hydraulic mining at Malakoff Diggins
[Picture]

Hydraulic data mining?
[Picture from a Tandem(TM) advertisement]

Good Data Mining is:
. . . “intelligent prospecting”
• decide what you are looking for first,
• then apply knowledge (cf. geology, mineralogy..),
• then take samples,
• assay the results from the samples,
• finally mine.

Good Data Mining is:
. . . best with
• a known business problem / opportunity
• patterns to learn from (known buyers, bad debts, fraud cases, good promotions, profitable lines . . .)
• This determines:
  – business goals and goal variables,
  – data that is rich in information for this problem,
  – the analysis strategy to follow

Understand the Business Problem First
[Figure: a business problem plus what you know is applied to the data (e.g. clustering into C1, C2) to produce insight: increase revenue, improve processes]

DM rarely requires massive data during the prospecting phase
Case of the mysterious disappearing Terabytes
• “Can Clementine handle our data base? We have 3 TB going back 20 years, 17M clients.”
• “Probably, tell us what you want to investigate.”
• “Account closure patterns, to reduce churn”
• “How many occur each month?” (1700): 10^-4
• “What’s important?” (age, marriage, . . .): 10^-5
• “When did you start saving this?” (2 years ago): 10^-6
• “When do closure signs begin?” (3 months): 10^-7
(The power of ten after each answer is the approximate fraction of the original data that remains relevant.)

Winterthur Result
Recall the Winterthur “churn” problem
• Result on churn classification.
• Achieved > 91.5% accuracy predicting churn (Yes/No) on the (unseen) test set.
• This was 20% better than next competitor! (SAS EM, IBM IM, HNC, Thinking Machines Inc.)

Halfords - Predicting Sales
Recall the store sales prediction result
[Figure: two plots of predicted sales against actual sales, one for the Clementine model (3w) and one for the regression model (6m)]
Why?

The data is not the business

Business Data
Name       Age  Income  Mar/Sin/Div  Car  C Card  Purch  Val    Last Purch  Children  Source
F. Bloggs  25   25000   Single       Yes  M/C     5      23.5   34          0         L1
J. Smith   37   33000   Mar.         Yes  VISA    3      123.4  102         2         L2
J. Dow     45   40000   Div.         No   VISA    12     15.2   48          1         L1

The Business
Business deals with the real world
• Most of what is interesting to business is fuzzy - customers, customers’ behaviour
• Hard to give a numeric value.
• Business/market people know strengths and weaknesses in the data
• Garbage (or bias) in = garbage (or bias) out.

What’s in the chasm?
• Business knowledge that’s in your head (or library, or in another department)
• Data we aren’t yet using, e.g. MR data.
• E.g. company launched new product
  – either 90% of our non-buyers are close to buying
  – or 90% of our non-buyers will never buy
• Same transaction data, but dramatically different prospects

Business knowledge
• Which factors are relevant?
  – quality/blend of raw materials
  – time of year / weather
• Maybe key predictors must be derived
  – a sum: household income
  – a trend: rate of sales decrease
  – a ratio: sales/sq ft.
• Business/Market knowledge is the key

Halfords’ application
• Higher accuracy than previous statistical models. Why?
[Figure: two plots of predicted sales against actual sales. Regression (6 months): external statistics company. Clementine (3 weeks): in-house business manager]

Halfords - Merging data

Halfords - Adding market knowledge
1 Split into train and test data
2 Train models
3 Test the models

Rationale for Clementine(TM)
• Algorithms have no business knowledge or common sense
• Need to use algorithms alongside business/market expertise
• DM is a creative/discovery process. We need fluency to follow a train of thought (hunches).
• Hunching is hard if the business user must keep telling a technology expert what to do.

Clementine objectives
• A data mining system which users can drive themselves
• Many fully-packaged algorithms (no one silver bullet)
• Can follow up clues discovered in the data
• Easy to input own ideas / knowledge
• As easy as a spreadsheet

Clementine
SPSS’ data mining workbench of the future
[Figure: layered architecture]
• User interface: Clementine
• Algorithms: Clementine, SPSS, other algorithms
• Infrastructure: scalable architecture, common deployment vehicles

Data mining: decisions from data to do your business better
[Figure: a business problem plus what you know is applied to the data (e.g. clustering into C1, C2) to produce insight: increase revenue, improve processes]

Thank you for listening.
Any Questions?
amontgomery@spss.com